List of Contributors

H. Ackermann, Department of General Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, Osianderstr. 24, Tübingen, Germany
R. Adolphs, Division of the Humanities and Social Sciences, California Institute of Technology, Humanities and Social Sciences 228-77, 331B Baxter Hall, Pasadena, CA 91125, USA
K. Alter, School of Neurology, Neurobiology and Psychiatry, Newcastle upon Tyne, UK
S. Anders, Institute of Medical Psychology and Behavioral Neurobiology, University of Tübingen, Gartenstr. 29, 72074 Tübingen, Germany
R. Assadollahi, Department of Psychology, University of Konstanz, P.O. Box D25, D-78457 Konstanz, Germany
T. Bänziger, Swiss Center for Affective Sciences, University of Geneva, 7 rue des Battoirs, 1205 Geneva, Switzerland
S. Baron-Cohen, Department of Psychiatry, Autism Research Centre, University of Cambridge, Douglas House, 18B Trumpington Road, Cambridge CB2 2AH, UK
M.M. Bradley, Department of Psychology, Psychology Building Room 114, University of Florida, P.O. Box 112250, Gainesville, FL 32611, USA
R. Cardinale, Department of Psychology, University of Bologna, Viale Berti Pichat 5, I-40127 Bologna, Italy
M.A. Cato Jackson, Nemours Children's Clinic, Neurology Division, 807 Children's Way, Jacksonville, FL 32207, USA
B. Chakrabarti, Department of Psychiatry, Autism Research Centre, University of Cambridge, Douglas House, 18B Trumpington Road, Cambridge CB2 2AH, UK
S.D. Chiller-Glaus, Department of Psychology, University of Zurich, Zurich, Switzerland
M. Codispoti, Department of Psychology, University of Bologna, Viale Berti Pichat 5, I-40127 Bologna, Italy
B. Crosson, Brain Rehabilitation Research Center, Malcolm Randall VA Medical Center, 1601 SW Archer Rd, Gainesville, FL 32608-1197, USA
D.W. Cunningham, Department Bülthoff, Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany
M. Davis, Behavioral Sciences and Psychology, School of Medicine, Center for Behavioral Neuroscience, Emory University, 1639 Pierce Dr., Suite 4000, Atlanta, GA 30322, USA
A. De Cesarei, Department of Psychology, University of Bologna, Viale Berti Pichat 5, I-40127 Bologna, Italy
T. Demirakca, Central Institute of Mental Health, J5, Division of Neuroimaging, 68159 Mannheim, Germany
S. Dietrich, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, 04103 Leipzig, Germany
K. Döhnel, Department of Psychiatry, Psychotherapy and Psychosomatics, University of Regensburg, Universitätstrasse 84, D-93053 Regensburg, Germany
G. Ende, Central Institute of Mental Health, J5, Division of Neuroimaging, 68159 Mannheim, Germany
T. Ethofer, Section of Experimental MR of the CNS, Department of Neuroradiology, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany
V. Ferrari, Department of Psychology, University of Bologna, Viale Berti Pichat 5, I-40127 Bologna, Italy
I. Fischler, Department of Psychology, Psychology Building Room 114, University of Florida, P.O. Box 112250, Gainesville, FL 32611, USA
T. Flaisch, Department of Psychology, Institute of Psychology, University of Konstanz, Universitätstrasse 10, 78457 Konstanz, Germany
V. Gazzola, BCN Neuro-Imaging-Centre, University Medical Center Groningen, University of Groningen, A. Deusinglaan 2, 9713 AW Groningen, The Netherlands
D. Grandjean, Swiss Center for Affective Sciences, University of Geneva, 7 rue des Battoirs, 1205 Geneva, Switzerland
G. Hajak, Department of Psychiatry, Psychotherapy and Psychosomatics, University of Regensburg, Universitätstrasse 84, D-93053 Regensburg, Germany
A. Hennenlotter, Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, D-04103 Leipzig, Germany
C. Herbert, Department of Psychology, University of Konstanz, P.O. Box D25, D-78457 Konstanz, Germany
M. Junghöfer, Institute for Biosignalanalysis and Biomagnetism, University of Münster, Münster 48149, Germany
A. Keil, Department of Psychology, ZPR Building 029, C 516, University of Konstanz, Box D23, D-78457 Konstanz, Germany
C. Keysers, BCN Neuro-Imaging-Centre, University Medical Center Groningen, University of Groningen, A. Deusinglaan 2, 9713 AW Groningen, The Netherlands
J. Kissler, Department of Psychology, University of Konstanz, P.O. Box D25, D-78457 Konstanz, Germany
S.A. Kotz, Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, 04103 Leipzig, Germany
B. Kreifelts, Department of Psychiatry, University of Tübingen, Osianderstr. 24, 72076 Tübingen, Germany
K. Kucharska-Pietura, Whitchurch Hospital, Cardiff and Vale NHS Trust, Cardiff CF4 7XB, UK
P.J. Lang, NIMH Center for the Study of Emotion and Attention, Department of Clinical and Health Psychology, University of Florida, 2800 SW Archer Road, Building 772, Gainesville, FL 32610, USA
S. Leiberg, Institute of Medical Psychology and Behavioral Neurobiology, University of Tübingen, MEG Center, Otfried-Müller-Str. 47, 72076 Tübingen, Germany
J. Meinhardt, Ludwig-Maximilian University, Munich, Germany
M. Meyer, Department of Neuropsychology, Institute for Psychology, University of Zurich, Zurich, Switzerland
J.L. Müller, Department of Psychiatry, Psychotherapy and Psychosomatics, University of Regensburg, Universitätstrasse 84, D-93053 Regensburg, Germany
S. Paulmann, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, 04103 Leipzig, Germany
M.D. Pell, School of Communication Sciences and Disorders, McGill University, Neuropragmatics and Emotion Lab, 1266 Avenue des Pins Ouest, Montréal, QC H3G 1A8, Canada
P. Peyk, Department of Psychology, University of Basel, Basel, Switzerland
H. Pihan, Department of Neurology, Schulthess Klinik, Lengghalde 2, 8008 Zurich, Switzerland
G. Pourtois, Neurology and Imaging of Cognition, Clinic of Neurology and Department of Neuroscience, University Medical Centre, University of Geneva, Geneva, Switzerland and Swiss Center for Affective Sciences, University of Geneva, Switzerland
D. Sabatinelli, NIMH Center for the Study of Emotion and Attention, University of Florida, Building 772 SURGE, 2800 SW Archer Road, Gainesville, FL 32608, USA
K.R. Scherer, Swiss Center for Affective Sciences, University of Geneva, 7 rue des Battoirs, 1205 Geneva, Switzerland
U. Schroeder, Klinik Holthausen, Am Hagen 20, D-45527 Hattingen, Germany
H.T. Schupp, Department of Psychology, Institute of Psychology, University of Konstanz, Universitätstrasse 10, 78457 Konstanz, Germany
A. Schwaninger, Department Bülthoff, Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, first floor, 72076 Tübingen, Germany
J. Schwerdtner, Department of Psychiatry, Psychotherapy and Psychosomatics, University of Regensburg, Universitätstrasse 84, D-93053 Regensburg, Germany
M. Sommer, Department of Psychiatry, Psychotherapy and Psychosomatics, University of Regensburg, Universitätstrasse 84, D-93053 Regensburg, Germany
M. Spezio, Division of the Humanities and Social Sciences, California Institute of Technology, Humanities and Social Sciences 228-77, Pasadena, CA 91125, USA
J. Stockburger, Department of Psychology, Institute of Psychology, University of Konstanz, Universitätstrasse 10, 78457 Konstanz, Germany
D.P. Szameitat, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, 04103 Leipzig, Germany
H. Tost, Central Institute of Mental Health, J5, Division of Neuroimaging, 68159 Mannheim, Germany
P. Vuilleumier, Neurology and Imaging of Cognition, Clinic of Neurology and Department of Neuroscience, University Medical Centre, University of Geneva, Geneva, Switzerland and Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
C. Wallraven, Department Bülthoff, Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany
S. Wiens, Department of Psychology, Stockholm University, Frescati Hagväg, 10691 Stockholm, Sweden
D. Wildgruber, Department of Psychiatry, University of Tübingen, Osianderstr. 24, 72076 Tübingen, Germany
Preface

Common sense and everyday experience imply that feelings, affects, and emotions dominate large parts of our everyday lives and particularly our social interactions. Emotions seem to drive much of our behavior, albeit apparently not always for the good, leading ancient philosophers to reason that we might be better off without emotions and that, perhaps, the purpose of reason could be the control of these obscure corners of our inner lives. In the 17th century, René Descartes proposed a dualism between body and mind which has guided much of Western philosophy and science until recently. Perhaps as a Cartesian heritage, affects, emotions, and feelings were all viewed as not lending themselves easily to exact scientific study, although Descartes himself viewed emotion, unlike reason, as having a physiological base (in his Traité des passions de l'âme, 1649, he even argued that the control of the physical expression of emotion would control the emotions themselves, a view which has repercussions even today, but is rarely traced back to Descartes).

Recent developments in the rapidly growing discipline of neuroscience, driven by the enormous methodological and technological advances in neurophysiology, neuroimaging, and computational neurosciences, are heralding a shift of paradigm. Emotions are no longer regarded as too elusive to be approached with scientific methods, and some have proposed that human survival and success depend critically on the functioning of the neural networks that are thought to underlie affect, emotions, and, perhaps, feelings. But what are the results of the scientific study of emotion beyond such increasingly commonplace statements? What are the data that allow us to draw such seemingly simple conclusions? Are these conclusions indeed simple? And, most importantly, what are the implications of emotion research for our understanding of human social interaction and communication?

Researchers from all branches of the behavioral and physiological sciences are trying to specify the mechanisms that normally guide emotional processing and the malfunctions that may give rise to emotional disorders. The present volume brings many of them together, spanning a wide spectrum of experimental approaches in animals and humans; instrumentation from behavioral testing to neurophysiology and neuroimaging; and paradigms from passive, uninstructed stimulus perception to complex social interaction in different processing domains. The common drive behind all contributions is to elucidate emotion as a social phenomenon that does not affect individuals in isolation, but is rapidly conveyed between individuals, some of these processes being implicit and automatic, others more explicit and conscious.

This volume is the result of a conference held in September 2004 at the Freudental Castle near Konstanz in southwestern Germany. This conference and this volume would not have been possible without the generous support of the Heidelberg Academy of Sciences and Humanities in Heidelberg and the Center for Junior Researchers at the University of Konstanz, which is gratefully acknowledged.

May 2006

Silke Anders, Tübingen
Gabriele Ende, Mannheim
Markus Junghöfer, Münster
Johanna Kissler, Konstanz
Dirk Wildgruber, Tübingen
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.)
Progress in Brain Research, Vol. 156
ISSN 0079-6123
Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 1
Emotion, motivation, and the brain: Reflex foundations in animal and human research

Peter J. Lang¹ and Michael Davis²

¹ NIMH Center for the Study of Emotion and Attention, Department of Clinical and Health Psychology, University of Florida, FL 32610-0165, USA
² Department of Psychiatry, Behavioral Sciences and Psychology, Emory University, Yerkes National Primate Center and the Center for Behavioral Neuroscience, Atlanta, GA 30322, USA

Corresponding author. Tel.: +1-352 392-2439; Fax: +1-352 392-6047; E-mail: [email protected]fl.edu

DOI: 10.1016/S0079-6123(06)56001-7
Abstract: This review will focus on a motivational circuit in the brain, centered on the amygdala, that underlies human emotion. This neural circuitry of appetitive/approach and defensive/avoidance was laid down early in our evolutionary history in primitive cortex, sub-cortex, and mid-brain, to mediate behaviors basic to the survival of individuals and the propagation of genes to coming generations. Thus, events associated with appetitive rewards, or that threaten danger or pain, engage attention and prompt information gathering more than other input. Motive cues also occasion metabolic arousal and anticipatory responses, and they mobilize the organism to prepare for action. Findings are presented from research with animals, elucidating these psychophysiological (e.g., cardiovascular, neuro-humoral) and behavioral (e.g., startle potentiation, "freezing") patterns in emotion, and defining their mediating brain circuits. Parallel results are described from experiments with humans, showing similar activation patterns in brain and body in response to emotion cues, co-varying with participants' reports of affective valence and increasing emotional arousal.

Keywords: fear; startle; amygdala; arousal; conditioning; valence
Introduction

This essay presents a motivational framework, integrating animal and human data, with the aim of explicating the biological foundation of human emotion. The raw data of emotion are threefold: affective language, behavior, and physiology (Lang, 1985, 1994). The emphasis here is on the latter two data sources — considering emotion's actions and physiology to be founded on evolved neural circuits that are shared by humans and other animals.

We cannot know if animals experience emotion in the same way that humans do. However, it is clear that motivated behavior, prompted by appetitive or aversive cues, can be similar across the mammalian phyla: When a rat first perceives a predatory cat, the rodent's behavioral and physiological reactions are very much like those of a human who, for example, is abruptly aware of an intruder in her/his home. Both species initially "freeze," and both show parallel changes in heart rate and respiration. Both release similar chemicals into their blood stream. In each case these events prompt attention to the potential threat and a readying of the body for defensive action. Furthermore, if an animal or a human escapes the danger, brain circuits change: The survivor will have learned to react to tell-tale signs of the potential predator, and to be aroused and be more wary.
These adaptive patterns of behavior have been carefully preserved in evolution because they are successful in promoting survival. Significantly, it is in anticipation of, or subsequent to, these survival situations that humans report the most intense emotional experiences. Such reports occur when motivational systems (reflexive, goal-relevant behaviors and their determining neural circuits) are activated, but action is delayed, or in the aftermath of such actions, or when signs or signals appear that recall previous encounters. Thus, much can be learned about the biological foundation of expressed emotion by studying how humans and less complex animals confront appetitive or aversive events.

Working on a definition of emotion

Emotion cannot be operationally defined by a single measure. Emotions involve multiple responses that are variously sequenced in time. Events that are pleasant/appetitive or aversive/threatening initially engage heightened attention. They prompt information gathering, and do so more readily than other less motivationally relevant cues. Motive cues also occasion general metabolic arousal, anticipatory responses that are oriented towards the engaging event, and neuromuscular mobilization of the body for action. These reflex reactions are differently paced and form different patterns, reflecting a changing organismic state as punishment or reward become more imminent, and specific goal-oriented actions are deployed. Reports of emotional experience occur in the context of these response events, or with their inhibition or delay, but correlations with any specific affective report are notoriously modest.

In this chapter we will try to explicate what is special about emotional information processing in the brain. We will propose that neural networks underlying expressed emotion include direct connections to the brain's primary motivational systems, appetitive and defensive. These neural circuits were laid down early in evolutionary history: In primitive cortex, sub-cortex, and mid-brain. They determine the deployment of attentional resources, systemic mobilization, approach, and defensive behaviors, and the formation of conditioned associations fundamental to the survival of individuals.
Affective valence and arousal

The motivated behavior of a simple organism such as the flatworm can be almost entirely characterized by two survival movements: Direct approach to appetitive stimuli and withdrawal from aversive stimuli (Schneirla, 1959). These are the only available tools in achieving consummation or escape. This modest repertoire is woefully insufficient, however, for more complex species that must implement many critical sub-goals and cope with a richly perceived sensory environment. Humans have proved to be successful survivors, seeming to surmount niche limitations, adapting readily to a greater variety of environments. Much of this human success can be attributed to language — the ability to communicate, to manipulate symbols in problem solving, and to label and catalog our experience of the world. Although language is not the major focus of this discourse, it is a primary way through which we infer emotional experience in others. Thus, the relationship between emotion's language and motivationally determined behavior and physiology merits a careful prefatory consideration.

The language of emotion includes thousands of words with myriad shades of feeling, degrees of redundancy, and shared meaning. Faced with this plethora, philosophers, psychologists, and psycholinguists have all tried to condense the list into a few primary emotions, or alternatively, to define dimensions of meaning that might underlie this vast vocabulary. The view that affects — subjective reports of experienced emotion — might be organized under a limited number of overarching factors was proposed by Wundt (1896) in the 19th century. Contemporary studies of natural language categories (Shaver et al., 1987; Ortony et al., 1988) suggest that emotional knowledge is hierarchically organized, and that the superordinate division is between positivity (pleasant states: love, joy) and negativity (unpleasant states: anger, sadness, fear). Osgood and his associates (e.g., Osgood et al., 1957), using the semantic differential, showed that emotional descriptors were distributed primarily along a bipolar dimension of affective valence — ranging from attraction and pleasure to aversion and displeasure. A dimension of activation — from calm to aroused — also accounted for
substantial variance. Other investigators have drawn similar conclusions from factor analysis of verbal reports (e.g., Mehrabian and Russell, 1974; Tellegen, 1985), and even of facial expressions (Schlosberg, 1952). No other factors have ever approached the generality and significance of these two simple variables.

We should not be too surprised, perhaps, to learn that affective valence and arousal find a parallel in motivational theories based on conditioning research with animals and humans. Konorski (1967), for example, proposed a motivational typology of unconditioned reflexes, keyed to the reflex's survival role. Exteroceptive reflexes were either preservative (e.g., ingestion, copulation, nurture of progeny) or protective (e.g., withdrawal from or rejection of noxious agents). He further suggested that affective states were consistent with this bi-phasic typology: Preservative emotions include such affects as sexual passion, joy, and nurturance; fear and anger are protective affects. Dickinson and Dearing (1979) developed Konorski's dichotomy into a theory of two opponent motivational systems, aversive and attractive, each activated by a different, but equally wide, range of unconditioned stimuli, determining perceptual-motor patterns and the course of learning. In this general view, affective valence is determined by the dominant motive system: The appetitive system (preservative/attractive) prompts positive affect; the defense system (protective/aversive) is the source of negative affect. Affective arousal reflects the "intensity" of motivational mobilization, appetitive or defensive, determined mainly by degree of survival need and the imminence or probability of nociception or appetitive consummation. From this perspective, individual reported emotions would be based on differing, situation-determined action dispositions, as "fear" indicates a disposition to avoid or escape and "anger" is a disposition to attack.
Attention, perception, and emotion

For both humans and other animals, the first reaction to any cue is reflexive, directional orientation to the stimulus (Pavlov, 1927). If the input is motivationally irrelevant, this "orienting reflex" rapidly
habituates (Sokolov, 1963). Cues to appetite or aversion, however, lead to systemic adjustments that facilitate sustained perceptual processing. For example, an animal (reptile or mammal) orienting to the appearance of a distant predator shows a profound deceleration in heart rate — "fear bradycardia" — not found in response to other events (Campbell et al., 1997). "Freezing" — a statue-like inhibition of movement — accompanies the change in heart rate, along with increased overall sensory acuity. If the predator approaches (shows stalking behavior), somatic and autonomic activation increases progressively, culminating in defensive action. Humans show similar attention and action readiness when confronted with motivational cues (in life and in the laboratory), responding reflexively even if stimuli are not actual events, but media representations. In fact, stories, pictures, and films all prompt patterns of bodily change in observers that co-vary with the rated affective valence (pleasant or unpleasant) and arousal (intensity) of their emotional experience.
Emotional perception

In recent years, the psychophysiology of emotional perception has been systematically studied, using a set of standard photographic picture stimuli, calibrated for affective response. There are currently nearly 1000 pictures in the International Affective Picture System (IAPS — Lang et al., 1999) rated for experienced pleasure and arousal by a large normative subject sample. A representative distribution of IAPS pictures is presented in Fig. 1 (Bradley, 2000), located in a Cartesian space formed by independent dimensions of rated pleasure and arousal. Similar distributions have also been obtained for collections of acoustic stimuli (International Affective Digitized Sounds (IADS): Bradley et al., 1998a) as well as verbal materials (Affective Norms for English Words (ANEW): Bradley et al., 1998b). Studies of IAPS picture stimuli have uncovered highly reliable patterns of physiological and behavioral responses that vary consistently with the factor structure uncovered in studies of emotional language (see Fig. 1: Greenwald et al., 1989, 1998; Bradley et al., 2003).
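The dimensional bookkeeping behind such a stimulus set can be made concrete in a few lines of code. The sketch below uses invented rating values, not actual IAPS norms, and the picture labels are hypothetical; it simply shows how mean pleasure and arousal ratings from a normative sample locate each picture in the two-dimensional affective space, and how high-arousal pleasant and unpleasant subsets might then be selected for an experiment.

```python
from statistics import mean

# Hypothetical normative ratings on 1-9 scales (NOT actual IAPS norms):
# each picture is rated by a sample of viewers for pleasure and arousal.
ratings = {
    "puppies":    {"pleasure": [8, 9, 8, 7], "arousal": [5, 6, 5, 6]},
    "mushroom":   {"pleasure": [5, 5, 6, 5], "arousal": [2, 3, 2, 2]},
    "mutilation": {"pleasure": [1, 2, 1, 2], "arousal": [7, 8, 8, 7]},
    "erotica":    {"pleasure": [7, 8, 7, 8], "arousal": [7, 7, 8, 7]},
}

def affective_space(ratings):
    """Locate each stimulus in the valence x arousal Cartesian space."""
    return {name: (mean(r["pleasure"]), mean(r["arousal"]))
            for name, r in ratings.items()}

space = affective_space(ratings)

# Select the high-arousal appetitive and defensive quadrants, as one
# would when assembling pleasant, neutral, and unpleasant picture sets.
pleasant_arousing = [n for n, (v, a) in space.items() if v > 6 and a > 5]
unpleasant_arousing = [n for n, (v, a) in space.items() if v < 4 and a > 5]

for name, (valence, arousal) in space.items():
    print(f"{name:11s} valence={valence:.2f} arousal={arousal:.2f}")
print("appetitive, high arousal:", pleasant_arousing)
print("defensive, high arousal:", unpleasant_arousing)
```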
Fig. 1. In the upper right of the figure, pictures from the International Affective Picture System (Lang et al., 1999) are plotted in a two-dimensional (Cartesian) space. Picture location is defined by mean ratings of judged pleasure and emotional arousal as reported by a normative sample. The vectors in the upper and lower portions of the affective space describe hypothesized increasing activation levels in appetitive and defensive motivation that co-vary with reported arousal. The other three quadrants of the figure contain graphic representations of the co-variation between physiological reactions to a sample of these picture stimuli and self-ratings of the emotional experience of research participants. Pictures are rank ordered on the abscissa by either each participant's ratings of affective valence or their ratings of affective arousal. The mean change in corrugator muscle activity (top left) and heart rate (bottom left) are presented across ranked ratings of affective valence (pleasure). Mean skin conductance responses and cortical event-related potentials (bottom right) are plotted as a function of ranked affective arousal rating. The correlations are in all cases significant (p < 0.01).
Thus, when affective valence ratings are ranked by picture from the most to the least pleasant image, for each subject, facial muscle activity during picture viewing shows a strong monotonic relationship with levels of affective valence: Corrugator (frown) muscle action increases linearly as pictures are rated as more unpleasant; conversely, zygomatic (smile) muscle activity increases with judged pleasantness. Heart rate is also responsive to differences in affective valence: Unpleasant pictures generally prompt marked
deceleration during viewing (recalling the "fear bradycardia" seen in animals), more pronounced than that seen when subjects view pleasant pictures. Other physiological responses vary with changes in rated emotional arousal, rather than affective valence. Skin conductance — a good general index of autonomic activation — increments monotonically with increases in rated arousal, regardless of picture valence. Electroencephalographic measurement (EEG) shows a distinct, voltage-positive cortical response evoked directly by the picture stimuli. This is also positively correlated with stimulus arousal (i.e., it is similarly enhanced for both pleasant and unpleasant arousing pictures: Cuthbert et al., 2000; Keil et al., 2002). These measures appear to index the intensity or activation level of the current motivational state, but are silent about motivational direction (i.e., appetitive or defensive). Behaviors elicited in the context of emotional picture perception (in reaction to secondary stimuli) also covary with motivational engagement. When first exposed to a new picture, reaction time responses to probes are significantly slower for emotionally arousing than for affectively calm pictures (Bradley et al., 1992). These data suggest that new activating images may require more attentional resources at encoding. Furthermore, when
participants control viewing time, they look longer at emotionally arousing pictures, both pleasant and unpleasant, than at neutral pictures (Lang et al., 1993). This latter relationship is not found, however, if pictures evoke very high levels of distress: When phobics view pictures specific to their fear, viewing time is dramatically reduced (see Hamm et al., 1997). They also show heart rate acceleration (rather than deceleration), consistent with a precipitous increase in defense motivation and mobilization for active escape. As the phobia data imply, relationships between specific measures can vary widely for individuals and to some extent between particular groups. Gender differences, for example, are highly reliable: Pleasantness ratings covary more closely with facial muscle activity in females than in males; on the other hand, skin conductance changes are more closely correlated with arousal ratings in males than in females (Lang et al., 1993).

The results of factor analyses of affect self-report, physiological, and behavioral measures are presented in Table 1. The data were obtained from large groups of young, healthy participants. The obtained two-factor solution is clearly very strong: Pleasantness ratings, heart rate, and facial muscles load on a first, affective valence factor; arousal and interest ratings, viewing time, skin conductance, and cortical EEG load on a second, affective arousal factor. The cross-loadings for all measures are very low.
Table 1. Factor analyses of measures of emotional picture processing

Sorted loadings of dependent measures on principal components (Lang et al., 1993)

Measure                 Factor 1 (Valence)   Factor 2 (Arousal)
Valence ratings               0.86                –0.00
Corrugator muscle (a)        –0.85                 0.19
Heart rate                    0.79                –0.14
Zygomatic muscle (a)          0.58                 0.29
Arousal ratings               0.15                 0.83
Interest ratings              0.45                 0.77
Viewing time                 –0.27                 0.76
Skin conductance             –0.37                 0.74

Sorted loadings of dependent measures on principal components (Schupp et al., 1994)

Measure                 Factor 1 (Valence)   Factor 2 (Arousal)
Valence ratings               0.89                 0.07
Corrugator muscle (a)        –0.83                –0.10
Heart rate                    0.73                –0.02
Arousal ratings              –0.11                 0.89
Cortical slow wave           –0.06                –0.79
Skin conductance              0.19                 0.77

(a) Bioelectric potentials from muscles that mediate facial expression.
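The logic of these factor analyses can be reproduced in a short script. The example below is a sketch only: the data are synthetic, generated from two latent variables standing in for valence and arousal with loadings patterned loosely on Table 1, and scikit-learn's FactorAnalysis is just one of several estimators that could be used.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200  # simulated picture-viewing observations

# Two latent motivational dimensions standing in for valence and arousal.
valence = rng.normal(size=n)
arousal = rng.normal(size=n)

def noise():
    return rng.normal(scale=0.5, size=n)

# Synthetic measures: the signs mimic Table 1, all values are invented.
measures = {
    "valence_rating":  0.9 * valence + noise(),
    "corrugator_emg": -0.8 * valence + noise(),
    "heart_rate":      0.8 * valence + noise(),
    "arousal_rating":  0.9 * arousal + noise(),
    "skin_conduct":    0.8 * arousal + noise(),
    "viewing_time":    0.8 * arousal + noise(),
}

X = np.column_stack(list(measures.values()))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each measure

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print(f"{'measure':15s} factor1  factor2")
for name, (l1, l2) in zip(measures, fa.components_.T):
    print(f"{name:15s} {l1:7.2f}  {l2:7.2f}")
```

On such synthetic data, the valence-driven measures load on one factor and the arousal-driven measures on the other, with small cross-loadings, mirroring the qualitative two-factor structure reported in the table.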
The data are consistent with the view that reported affective experience is determined in significant part by the individual's motivational state. That is, negative affective valence (unpleasant experience) is associated with activation of the defense system; positive valence (pleasant feelings) is associated with activation of the appetitive system. Reports of arousal are associated with both states, reflecting an increase in incentive strength and organismic mobilization. The motivational states elicited by these affective cues (and the somatic, cortical, and autonomic substrates of their perception) appear to be fundamentally similar to those occurring when other complex animals "stop, look, and listen," sifting through the environmental buzz for cues of danger, social meaning, or incentives to appetite.
Neural substrates of emotion: Attention, action and the role of the amygdala

Much recent research has shown that a brain area called the amygdala is a crucial structure in a neural network that mediates motivated attention and preparation for action. In man, this almond-shaped structure lies deep in the brain's temporal lobe. It is composed of several different nuclei that serve different network functions. The basolateral amygdala (Bla), which includes the lateral, basal and basomedial nuclei, is of particular significance, as it receives information from the thalamus, hippocampus, and cerebral cortex (see McDonald, 1998), and then projects (Fig. 2) to other amygdala nuclei, as well as to targets elsewhere in the brain relevant to emotional memory and action.
Fig. 2. Schematic diagram of the outputs of the basolateral nucleus of the amygdala to various target structures, and the subsequent outputs and targets of the amygdala's central nucleus (CeA) and the bed nucleus of the stria terminalis (BNST: the "extended" amygdala). Known and possible functions of these connections are briefly described.
The Bla's projections to the central nucleus of the amygdala (CeA) and the bed nucleus of the stria terminalis (BNST; or "extended amygdala") are relayed from these sites to specific hypothalamic and brainstem target areas that mediate most of the visceral and striate muscle events that index emotional processing. Their projections target the lateral hypothalamus — a key center activating the sympathetic branch of the autonomic nervous system in emotion (LeDoux et al., 1988). In addition, direct projections from the BNST go to the dorsal motor nucleus of the vagus, the nucleus of the solitary tract, and the ventrolateral medulla. These brainstem nuclei are known to regulate heart rate and blood pressure (Schwaber et al., 1982), and may thus modulate cardiovascular responses in emotion. Projections to the parabrachial nucleus are likely to be involved in emotion's respiratory changes (with additional, perhaps indirect effects on the cardiovascular system), as electrical stimulation and lesions of this nucleus alter breathing patterns. Finally, indirect projections from the amygdala's central nucleus to the paraventricular nucleus (via the BNST and preoptic area) may mediate neuroendocrine responses that are particularly prominent when emotional stimuli are sustained.
Attention, vigilance, and conditioned fear

During emotional stimulation, projections from the central nucleus or BNST to the ventral tegmental area appear to mediate increases in dopamine metabolites in the prefrontal cortex (Goldstein et al., 1996). Cells in the locus coeruleus, which release norepinephrine into the brain, are also activated, perhaps mediated by projections to its dendritic field, or indirectly, via projections to the paragigantocellularis nucleus (Redmond, 1977; Aston-Jones et al., 1996). Furthermore, there are direct projections to the lateral dorsal tegmental nucleus and parabrachial nuclei. These latter nuclei have cholinergic neurons that project to the thalamus and could mediate increased synaptic transmission of its sensory relay neurons. The sensory thalamus is, of course, a primary processor of environmental input. Thus, this sequence of projections, by augmenting cholinergic activation and facilitating thalamic transmission, may contribute to the increased vigilance and superior signal detection found in the attentional phase of emotional processing.

As already noted, sensory orientation to threat in mammals and reptiles is accompanied by a profound decrease in heart rate ("fear bradycardia": Campbell et al., 1997). Heart rate decrease is associated with attention in humans (Graham and Clifton, 1966), and furthermore, a greater deceleration is generally found in response to stimuli judged to be unpleasant (Bradley, 2000; Lang et al., 1993). Several lines of research suggest that this cardiac response can be mediated by the central nucleus of the amygdala. During Pavlovian aversive conditioning in rabbits, one sees a rapid development of conditioned bradycardia. Pascoe and Kapp (1985) found a high correlation (0.71) between the firing frequency of individual neurons in the amygdala's central nucleus and the extent of heart rate deceleration to a conditioned stimulus. Furthermore, the central nucleus of the amygdala could have indirect, but widespread, effects on cortical activity — mediated by projections to cholinergic neurons that in turn project to the cortex (Kapp et al., 1992). This path may account for changes in the EEG waveform, perhaps associated with enhanced sensory processing, acquired during Pavlovian aversive conditioning at the same rate as conditioned bradycardia.
Motor behavior

Emotion's attentional phase is characterized by immobility, "freezing," mediated in the rat by amygdala (CeA) projections to the ventral periaqueductal gray. In the action phase, projections to the dorsal periaqueductal gray appear to mediate fight/flight responses (Fanselow, 1991). As norepinephrine and serotonin facilitate excitation of motor neurons (McCall and Aghajanian, 1979; White and Neuman, 1980), rapid defensive action could be mediated by lateral BNST activation of norepinephrine release in the locus coeruleus or via its projections to serotonin-containing raphe neurons.

Amygdala stimulation

Electrical stimulation of the amygdala, or abnormal electrical activation via temporal-lobe seizures, produces emotion-like behavioral and autonomic changes (probably activating targets seen in Fig. 2) that humans generally describe as an experience of fear or apprehension (Chapman et al., 1954; Gloor et al., 1981).
In animals, electrical or chemical stimulation of the amygdala can also produce prominent cardiovascular effects, and persistent stimulation may produce gastric ulceration, increases in blood levels of cortisol and epinephrine, and sustained changes in respiration. Stimulation of these same CeA sites can also produce responses associated with attention — both bradycardia (Kapp et al., 1990) and low-voltage fast EEG activity in rabbits (Kapp et al., 1994) and in rats (Dringenberg and Vanderwolf, 1996). Furthermore, depending on the state of sleep, electrical stimulation of the amygdala in some species activates cholinergic cells involved in arousal-like effects. Overall, the orienting reflex has been described as the most common response elicited by electrical stimulation of the amygdala (Ursin and Kaada, 1960; Applegate et al., 1983). In many species, electrical or chemical stimulation of the amygdala prompts cessation of ongoing behavior, facilitating the sensory orienting critical to the attentional phase of emotion. It is associated with "freezing" in rats and with the cessation of operant bar pressing. Electrical stimulation also activates facial motoneurons, eliciting jaw movements, and may be the pathway mediating facial expression of emotion. The broad effects of amygdala stimulation on the motor system include modulating brainstem reflexes, such as the masseteric, baroreceptor, nictitating membrane, eyeblink, and startle reflexes.

In summary, it is assumed that the broad network of amygdala connections shown by the stimulation data is already formed in the adult organism, given that these effects are produced in the absence of explicit, prior learning. This suggests that the behavioral pattern evoked by emotional stimuli is, in significant part, "hard wired" during evolution. Thus, it is only necessary that an initially neutral stimulus activate the amygdala — in association, for example, with an aversive event — for this formerly neutral cue to then produce the full constellation of emotional effects.
Amygdala lesions and drug infusion

Assuming the emotion circuit to be a "hard wired" set of connections, destruction of the amygdala is expected to disrupt or eliminate emotion's sensory processing and motor output. Various investigators have provided data in support of this hypothesis, showing, for example, that lesions of the amygdala block attentional responses to stimuli paired with food (cf. Gallagher et al., 1990), and that lesioned animals in general fail to benefit from procedures normally facilitating attention to conditioned stimuli (Holland and Gallagher, 1993a, b). Other research suggests that the central nucleus and the basolateral nucleus may make different contributions to the processing of emotional stimuli: In odor-aversion learning, rats develop aversions to novel odors associated with illness, but only if the odor is part of a compound stimulus that includes a distinctive taste. Electrolytic (Bermudez-Rattoni et al., 1986) or chemical lesions (Hatfield et al., 1992) of the basolateral nucleus — but not the CeA — block such odor-aversion learning, but do not impede taste-aversion learning. Local infusion of N-methyl-D-aspartate (NMDA) antagonists into the Bla has a similar selective effect (Hatfield and Gallagher, 1995). Thus, it has been suggested that, whereas the central nucleus "regulates attentional processing of cues during associative conditioning" (Hatfield et al., 1996, p. 5265), the Bla is critically involved in "associative learning processes that give conditioned stimuli access to the motivation value of their associated unconditioned stimuli" (Hatfield et al., 1996, p. 5264). The two nuclei appear to work in concert: The motivational significance of input is mediated by the amygdala's basolateral nucleus, and the central nucleus maintains the relevant cue as a focus of attention.
Conditioned emotional states

A large literature indicates that amygdala lesions block many measures used to assess conditioned and unconditioned fear (cf. Davis, 2000), including changes in heart rate, blood pressure, ulcers, and respiration; secretion of ACTH or corticosteroids
into the blood; and release of dopamine, norepinephrine, or serotonin in certain brain areas. They also block behavioral measures: Freezing, fear-potentiated startle, and vocalization, as well as operant conflict test performance, conditioned emotional responses, and shock avoidance. Furthermore, lesions of the amygdala cause a general taming effect in many species (Goddard, 1964), perhaps analogous to the increase in trust found in humans following surgical amygdala lesions (Adolphs et al., 1998).

Effects of local drug infusion

The intensity of fear is determined by the interplay of a variety of brain chemicals, many acting directly in the amygdala. Fear is reduced when GABA (a major inhibitory neurotransmitter) or GABA agonists such as the benzodiazepines (e.g., Valium) are infused into the amygdala. Drugs that decrease excitatory transmission in the amygdala, such as glutamate antagonists, have similar actions. Local infusion studies have also evaluated chemical compounds that increase fear. These include GABA antagonists, or peptides such as corticotropin-releasing hormone (CRH), cholecystokinin (CCK), vasopressin, thyrotropin-releasing hormone (TRH), and opiate antagonists. Extensive tables giving references, substances used, sites of administration (CeA, Bla, etc.), and the effects can be found in Davis (2000).

Approach behavior

Research concerned with the amygdala's role in conditioned appetitive approach (Everitt et al., 2000) has emphasized a projection from the central nucleus to dopamine neurons in the ventral tegmental area that in turn project to the nucleus accumbens. In studies of "autoshaping," a light (CS+) is followed by food reward, delivered in various locations. Another stimulus (CS-) is presented but never followed by food. Under these conditions, rats learn to approach the CS+ light before going to the food hopper to retrieve the food.
Bilateral lesions of the central nucleus, but not lesions of the Bla, markedly disrupt this behavior (Parkinson et al., 2000a). This effect is apparently mediated by the CeA's projection to dopamine-containing neurons in the ventral tegmental area, releasing dopamine in the nucleus accumbens (given that depletion of dopamine in the nucleus accumbens core prevents acquisition of autoshaping; Parkinson et al., 1998). In contrast, approach behavior itself seems to be mediated by the anterior cingulate cortex (Bussey et al., 1997) via projections to the nucleus accumbens core (Parkinson et al., 2000b).

In another series of experiments, rats first trained to associate an auditory conditioned stimulus (CS) with delivery of food were subsequently trained to press a lever to obtain food. Later, presentation of the original auditory CS increased lever pressing. Lesions of the central nucleus of the amygdala (not the Bla) reduced this facilitatory effect — mediated, the authors speculated (Everitt et al., 2000), by projections from the CeA to the mesolimbic dopamine system. Opposite effects have been reported, however, in a similar paradigm measuring actual consumption of food (Gallagher, 2000). In this case, projections from the posterior division of the basolateral amygdala to the hypothalamus are thought to be involved.

Extra-amygdalar projections of the basolateral nucleus

As illustrated in Fig. 2, connections between the amygdala's central nucleus and the BNST are critically involved in many of emotion's autonomic and motor responses. However, projections from the basolateral nucleus to other target areas in the brain are also significant for emotional behavior.

The ventral striatum pathway: Secondary reinforcement

The Bla projects directly to the nucleus accumbens in the ventral striatum (McDonald, 1991), in close apposition to dopamine terminals of A10 cell bodies in the ventral tegmental area (cf. Everitt and Robbins, 1992).
The ventral striatum, Mogenson (1987) suggests, is where affective processes in the limbic forebrain gain access to the subcortical part of the motor system that results in appetitive actions. Projections from the Bla to the nucleus accumbens are critically involved in secondary reinforcement. In a relevant paradigm, a light is paired with food, after which the experimental animals are presented with two levers. One lever turns on the light; pressing the other does nothing. Normal rats press the "light" lever much more than the other lever. Hence, light is a secondary reinforcer, as it prompts new behavior via its prior association with food. Rats with Bla lesions fail to learn this discrimination, whereas rats with lesions of the CeA do (Cador et al., 1989; Burns et al., 1993). Connections between the Bla and the ventral striatum are also involved in conditioned place preference (Everitt et al., 1991). Nevertheless, other research suggests that the central nucleus of the amygdala also has an important modulatory role in secondary reinforcement paradigms. Drugs that release dopamine, like amphetamine, increase lever pressing to cues previously associated with food. The effect occurs with local infusion of amphetamine into the nucleus accumbens (Taylor and Robbins, 1984), but is blocked when 6-OHDA prompts local dopamine depletion (Taylor and Robbins, 1986). However, 6-OHDA does not block the conditioned reinforcement itself, which is consistent with the idea that the reinforcement signal comes from some other brain area, such as the Bla, that projects to the nucleus accumbens. These results suggest that two relatively independent processes operate during conditioned reinforcement. First, information from the amygdala concerning the CS–US (US — unconditioned stimulus) association is sent to the nucleus accumbens to control instrumental behavior as a conditioned reinforcer. Second, dopamine in the nucleus accumbens modulates this instrumental behavior. The central nucleus of the amygdala, via its projections to the mesolimbic dopamine system, seems to be critical for this invigorating or arousing effect of dopamine. Thus, lesions of the central nucleus block the increase in bar pressing normally produced by infusion of amphetamine into the nucleus accumbens (Robledo et al., 1996), probably by preventing dopamine release in the nucleus accumbens shell (Everitt et al., 2000).
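The secondary-reinforcement logic of the lever paradigm can be illustrated with a toy simulation. This is not a model used in the studies cited; it is a minimal Rescorla-Wagner-style sketch with arbitrary parameter values, in which the light first acquires value through light-food pairings, after which that acquired value alone (no food ever being delivered) comes to support pressing the lever that turns the light on.

```python
ALPHA = 0.2  # learning rate (arbitrary value for illustration)

def update(value, reinforcement):
    """Delta-rule update of an associative value toward the outcome."""
    return value + ALPHA * (reinforcement - value)

# Phase 1: Pavlovian pairing -- the light predicts food (reinforcement = 1).
light_value = 0.0
for _ in range(30):
    light_value = update(light_value, reinforcement=1.0)
print(f"value of light after pairing: {light_value:.2f}")

# Phase 2: instrumental test -- no food is delivered. Pressing the 'light'
# lever produces the light, whose learned value now acts as the reinforcer;
# pressing the other lever produces nothing.
lever = {"light_lever": 0.0, "other_lever": 0.0}
for _ in range(30):
    lever["light_lever"] = update(lever["light_lever"], light_value)
    lever["other_lever"] = update(lever["other_lever"], 0.0)

print(f"light lever: {lever['light_lever']:.2f}, "
      f"other lever: {lever['other_lever']:.2f}")
# The light lever dominates, as in normal rats; after a Bla lesion the
# light's acquired value would fail to transfer to the new lever response.
```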
The dorsal striatum pathway

The amygdala modulates memory in a variety of tasks such as inhibitory avoidance, motor or spatial learning (McGaugh et al., 1992, 1993; Packard et al., 1994; Cahill and McGaugh, 1998; Packard and Teather, 1998). Thus, post-training intercaudate injections and intra-hippocampal infusions of amphetamine have task-specific effects on memory — intercaudate injections enhance memory in a visible-platform water maze task, but have no effect when the platform is hidden (Packard et al., 1994; Packard and Teather, 1998); the hippocampal infusion result is the opposite: Enhanced memory for the hidden-platform task and no advantage for the visible platform. However, post-training intra-amygdala injections of amphetamine enhance memory in both water maze tasks (Packard et al., 1994; Packard and Teather, 1998), suggesting that the amygdala may have broad influence, modulating both the hippocampal and caudate–putamen memory systems. Perhaps similarly, lesions of the central nucleus block freezing, but not escape, to a tone previously paired with shock, whereas lesions of the basal nucleus of the basolateral complex have just the opposite effect (Amorapanth et al., 2000). However, lesions of the lateral nucleus block both freezing and escape. Lesions of the Bla (but not the CeA) also block avoidance of a bar associated with shock (Killcross et al., 1997), suggesting that basolateral outputs to the dorsal or the ventral striatum may be important in escape or avoidance learning.
Projections to the cortex Primate research shows that the basal nucleus of the amygdala projects to several areas in the inferior temporal cortex, continuing into prestriate and striate areas of the occipital lobe (Amaral and Price, 1984; Iwai and Yukie, 1987). Furthermore, the lateral nucleus of the amygdala gets input from an adjacent site in the visual system (TEO), which in turn receives hierarchical projections from the several nuclei along the ventral visual stream, extending to the retinal mapping area of the calcarine fissure. These projections could potentially ‘‘close
13
the loop’’ with the visual system (Amaral et al., 1992) representing an amygdala feedback circuit that may be significant for the sustained perceptual evaluation seen in the early stages of emotional processing. Following Pavlovian conditioning, presentation of a conditioned stimulus appears to elicit some neural representation of the US with which it was paired — as the sound of an electric can opener might elicit a representation of food and signal approach behavior in the family cat. On the basis of a procedure called ‘‘US devaluation,’’ several studies suggest that the basolateral amygdala — perhaps via connections with cortical areas such as the perirhinal cortex (cf. Gewirtz and Davis, 1998) — is critical for retaining these US representations (e.g., Hatfield et al., 1996). Second-order conditioning also depends on a US representation elicited by a CS. Again, lesions of the Bla, but not the CeA, block second-order conditioning (Everitt et al., 1989, 1991; Hatfield et al., 1996). This same effect occurs with local infusions of NMDA antagonists into the basolateral nucleus of the amygdala (Gewirtz and Davis, 1997). Converging evidence also now suggests that the connection between the basolateral nucleus and the prefrontal cortex is critically involved in the way in which a representation of a US (e.g., very good, somewhat good, somewhat bad, very bad) guides approach or avoidance behavior. Analogous to the animal data, patients with lesions of the orbital regions of the prefrontal cortex frequently ignore important information that could usefully guide their actions and decision-making (Damasio, 1994; Bechara et al., 1997; Anderson et al., 1999). Studies using single-unit recording techniques in rats indicate that cells in both the Bla and the orbitofrontal cortex fire differentially to an odor, depending on whether the odor predicts a positive (e.g., sucrose) or negative (e.g., quinine) US. These differential responses emerge before the development of consistent approach or avoidance behavior elicited by that odor (Schoenbaum et al.,1998). Many cells in the Bla reverse their firing pattern during reversal training (i.e., the cue that used to predict sucrose now predicts quinine and vice versa — Schoenbaum et al., 1999), although this has not always been observed (e.g., Sanghera et al., 1979). In contrast, many fewer cells in the orbitofrontal
cortex showed selectivity before the behavioral criterion was reached and many fewer reversed their selectivity during reversal training (Schoenbaum et al., 1999). These investigators suggest that cells in the Bla encode the associative significance of cues, whereas cells in the orbitofrontal cortex are active when that information, relayed from the Bla, is required to guide motivated choice behavior, presumably via to both the motor cortex and to the dorsal striatum. The significance of Bla and frontal cortex in US representation and in guiding motivated (and choice) behavior is also supported by lesion studies with rhesus monkeys (Baxter et al., 2000). That is, when both areas were lesioned, monkeys continued to approach a cue associated with food on which they had recently been satiated; whereas, control monkeys showed appropriate choice behavior and consistently switched to a new cue. Studies of the amygdala in humans In the past, the primary source of data on human amygdala functioning was based on the behavior of patients with temporal-lobe lesions — either from accident, disease, or surgical treatment. Overall, these data suggest that the consequences are emotional deficits in emotion perception (e.g., Adolphs et al., 1994; Adolphs and Tranel, 1999) and expression (e.g., Lee et al., 1998). However, these findings are considerably less consistent and specific than those available from studies of experimentally lesioned animals. More recently, however, the advent of neural imaging techniques, Positron emission tomography (PET), and functional magnetic resonance imaging (fMRI), has opened a new, non-invasive window for the study of regional changes in the human brain. Brain imaging Functional MRI and PET do not directly measure neural action, but rather, measure enhanced blood flow in the capillaries of the cerebral parenchyma. This effect is, however, a reliable vascular sequel to regional neural firing. Thus, the method can be used to assess functional anatomy mediating language,
reflexes (autonomic and somatic), and behavioral actions that are emotion’s output. As already noted, appetitive and threatening stimuli capture attention and appear to accentuate processing in primary sensory areas. Primate research indicates, furthermore, that the amygdala projects to occipital and ventral temporal processing areas of the visual system (Amaral et al., 1992). To evaluate emotional processing in the visual system, Lane et al. (1997) using PET and Lang et al. (1998) using fMRI presented evocative picture stimuli (IAPS) to normal subjects and recorded blood flow changes in the caudal cortex. Compared to affectively neutral pictures, participants showed dramatically larger areas of activation (in lateral occipital, posterior parietal, and inferior temporal cortex) for pictures with emotionally evocative content. Subsequent fMRI research (Bradley et al., 2003; Sabatinelli et al., 2004) has further shown that activation in these secondary-processing areas of the visual system progressively increases, covarying monotonically, with the judged emotional arousal of picture stimuli. The significance of heightened emotional/motivational intensity is also highlighted by PET research with phobic participants (Fredrikson et al., 1993, 1995), showing enhanced occipital activation when viewing pictures of specific fear objects. Consistent with the animal experiments, neuroimaging researchers have generally reported amygdala activation when human participants process emotional stimuli. Thus, Pavlovian fearconditioned cues prompt increased fMRI amygdala signal (cf. Davis and Whalen, 2001). Irwin et al. (1996) reported that fMRI signal intensity is also greater in the amygdala when subjects view graphic photographs of unpleasant material (e.g., mutilated human bodies), compared with neutral pictures. Comparable increases in amygdala activity to emotion-evoking visual input have also been found with PET (Lane et al., 1997; Reiman et al., 1997); and Cahill et al. (1996) reported a relationship between emotion-induced amygdala activation and recall of the visual content. In more recent research, Sabatinelli et al. (2005) found that activation of the amygdala covaried closely with the activation in the inferotemporal, visual processing area. Furthermore, both sites increased
in activation with rated level of increased emotional arousal of picture content. Finally, both sites showed heightened activation when phobics viewed pictures of feared stimuli, relative the response of non-fearful participants (see Fig. 3). There are several studies suggesting that pictures of emotional faces engage the amygdala, but the findings are difficult to interpret. That is, face stimuli generally do not arouse strong emotion (as defined by autonomic activation or verbal report), and facial expressions vary in communicative clarity (e.g., fear faces are less reliably distinguished than angry faces). One view holds that the amygdala functions as an initial stimulus discriminator that screens for motivational significance, but subsides when the stimulus is resolved (e.g., Whalen, 1998). Interestingly, in both non-human and human subjects, several amygdala-mediated responses (Applegate et al., 1983; Whalen and Kapp, 1991) reach their peak early during conditioning and subside thereafter (Schneiderman, 1972; Masur et al., 1974; Weisz and McInerney, 1990; also see Kapp et al., 1990). Moreover, when stimulus contingencies change (e.g., when a CS is suddenly not followed by shock at the beginning of extinction), there is a re-emergence of single-unit activity in the lateral amygdala nucleus in rats (Quirk et al., 1995). Under analogous conditions, humans show a resurgence of amygdalar blood flow (LaBar et al., 1998). Considering the animal research, one could speculate that blood flow occurring ‘‘in the region of the amygdala’’ during face discrimination might indicate Bla activation, and not coincident activation of the CeA (with implications for autonomic reflexes and an ‘‘experience’’ of emotion). Unfortunately, the spatial resolution of our current imaging tools are not yet sufficient to reliably discriminate individual nuclei.
Emotional arousal: Physiology and cognition From the perspective of the animal model presented here, input to the amygdala’s Bla begins the sequence of neural events that underlay emotion, namely orienting and attention, approach, and defensive behaviors such as avoidance. Bla outputs to the CeA and the extended amygdala appear to
Fig. 3. Effects of emotional intensity and fear relevance on amygdala and inferior temporal regions of interest (ROIs) as measured with functional magnetic resonance imaging (fMRI: from Sabatinelli et al., 2005). Areas of functional activity during picture processing (superimposed on the average brain) are presented in the right panel. Average standardized blood oxygen level dependent (BOLD) signal changes in the control group (left panel, top) and the snake-fearful group (left panel, bottom) are presented for the amygdala and inferotemporal cortex, as these signals varied over different picture content categories. Error bars represent standard errors of the mean. The picture types are ordered on the abscissas (left to right) as they increase in average affective arousal ratings (International Affective Picture System standardization groups: Lang et al., 1999).
be critical in the increased processing of emotionally significant stimuli, be they pleasant or aversive. Outputs from the CeA and BNST in turn mediate many of the autonomic and somatic components of overt action. Direct output to the dorsal striatum, or indirect output via the orbital frontal cortex, appears to be involved in the actual avoidance response. Furthermore, outputs from
Bla to the ventral striatum as well as the orbitofrontal cortex are also likely contributors to the execution of approach and choice behavior. The above circuitry constitutes a motivational system that is finely tuned to the probability that events will require survival action, e.g., that a remote threat will become an imminent danger or that a sexual provocation will likely lead to
pleasant consummation. In animals, increasing imminence prompts a more general mobilization of the organism, mediated by various neurotransmitters such as acetylcholine, dopamine, and norepinephrine, as well as many peptides such as corticotropin-releasing hormone (CRH). These substances act either within the amygdala or at various central target areas to facilitate neural transmission, and they are associated with increasing intensity of appetitive or defensive motivation (and roughly correlated with arousal reports in humans). How might the above model relate to the experience of an emotion? Or, from a neurochemical perspective: How does looking at a picture, thinking pleasant or unpleasant thoughts, or remembering a painful experience lead to a release of acetylcholine, dopamine, norepinephrine, or corticotropin-releasing hormone? Considering the animal research, we know that the CeA and the BNST have direct connections to the neurons in the brainstem that release acetylcholine, dopamine, and norepinephrine, and to neurons in the basal forebrain that release acetylcholine. Electrical stimulation of the amygdala has been shown to increase cell firing in many of these neuronal groups. Activation of cells in the brainstem that release norepinephrine and epinephrine is especially important in modulating memory in many different brain areas via activation of norepinephrine receptors in the Bla (McGaugh, 2004). Activation of acetylcholine receptors is also critical for memory formation (Gold, 2003; Power et al., 2003). In addition, cells in the lateral division of the amygdala’s central nucleus send CRH-containing terminals to the BNST (Sakanaka et al., 1986), where many of the actions of CRH may actually be mediated (cf. Walker et al., 2003). Thus, more arousing images and thoughts could activate more cells in the amygdala, automatically leading to a release of these neurochemicals and helping to stamp in these images for later recall.
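The proposed chain (more arousing input recruits more amygdala cells; more cells release more neuromodulator; greater release strengthens consolidation) can be restated as a toy dose-response model. The sketch below is only a schematic rendering of that argument; the sigmoid form, cell counts, and constants are invented for illustration, not drawn from the cited studies.

```python
# Toy dose-response sketch of the argument above: rated arousal ->
# number of active amygdala cells -> neuromodulator release ->
# memory-trace strength. The sigmoid form and all constants are
# invented for illustration; this is not a published model.
import math

def cells_recruited(arousal, max_cells=1000, slope=6.0, midpoint=0.5):
    """Sigmoid recruitment of amygdala cells by rated arousal (0-1)."""
    return max_cells / (1.0 + math.exp(-slope * (arousal - midpoint)))

def memory_strength(arousal, release_per_cell=0.002):
    """Consolidation assumed proportional to total neuromodulator release."""
    return cells_recruited(arousal) * release_per_cell

for arousal in (0.1, 0.5, 0.9):
    print(f"arousal {arousal:.1f}: ~{cells_recruited(arousal):4.0f} cells, "
          f"relative memory strength {memory_strength(arousal):.2f}")
```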
Cognitive networks

We can roughly describe the neural structure and chemistry of emotional arousal in the context of imminent nociceptive stimulation or in anticipation
of consummatory pleasures. We can consider this physiology in the context of conditioning procedures, explaining links between behavior and simple lights and tones, invoking concepts such as contiguity, primary reinforcement, or generalization. However, we understand less well how the neurophysiology and chemistry of arousal connect to human experience and to the wider information processing functions of the brain. Although it is possible that some emotional stimuli might be ‘‘hard wired’’ (Seligman, 1970; Öhman and Mineka, 2001) for association to avoidance or approach, most affective cues are indirectly learned, remote associations, complexly connected to memories of appetite or aversion, and reflect the unique reactional biographies of individuals. Cognitive psychologists (e.g., Anderson and Bower, 1974; Kintsch, 1974) suggested that our knowledge about events is represented in networks of representations, linked by laws of association, and instantiated by stimuli that match elements in the network. Lang (1979, 1994) suggested that emotional memories are networks of this type. For example, the fear network of a snake-fearful individual would include information about the stimulus (a coiled snake, but also the snake’s skin, as in a belt, and a rustling sound), the stimulus context (swimming at the lake, alone in the woods), physiological and behavioral responses (changes in heart rate, vasomotor activity, and respiration; running away), and related interpretive elaborations (snakes are slimy; snakes can kill). When enough input cues match units in the network, activity in one unit is transmitted to adjacent units, and depending on the strength of activation, the entire memory structure may be engaged. The probability of network processing is likely to be increased with the number of units initially stimulated. The cognitive network is presumed to overlay a neural network — perhaps, an organization of Hebbian cell assemblies (Hebb, 1949). Of course, only a fraction of its representational units would have higher-level language representations that — passing through awareness — are the formative stuff of affective reports. How do emotional networks differ from other knowledge structures in the brain? It is proposed that emotional networks are singular because they
include associative connections to the primary motivational systems in the brain that are the focus of this discourse. In brief, reciprocal projections from cortex to the amygdala circuits engage the somatic and autonomic reflexes that evolved to ensure the survival of individuals and species.

Levels of activation

It may be that stimuli evoke differences in physiological arousal because networks activate different numbers of cells in the amygdala, depending on the associative history of those stimuli. In fact, most stimuli or situations that produce an emotional reaction do so by virtue of prior conditioning. Monkeys reared in the lab, where serpents are not normally encountered, are generally not afraid of snakes when compared to monkeys raised in the wild (Mineka et al., 1984). A baby with its fingers in a light socket does not feel afraid when the light switch is turned on, whereas a child who was once shocked in a similar situation will be terrified. After this association is formed, putting a finger in a socket may be presumed to engage many cells in the amygdala, leading to a large release of neurochemicals and strong activation of the defense motivation system. Similar but perhaps less robust amygdala activations are assumed to occur whenever emotion networks are activated — by distressing thoughts, imagery, and when viewing emotionally evocative media (e.g., pictures, written or spoken text). For example, just telling people they might get a painful shock is enough to increase blood flow in the amygdala (Phelps et al., 2001).

EEG measures of network activation

The EEG is recorded from electrodes on the surface of the scalp that transmit the brain’s neural bioelectric activity in real time. There is now considerable evidence that emotional networks, activated by picture stimuli, prompt distinct event-related potential (ERP) waveforms in the EEG. The most reliable ERP component is a slow positive-going voltage, typically beginning 400 ms after picture onset and continuing until offset (Cacioppo et al., 1994; Keil et al., 2002; Schupp et al., 2004). The amplitude of this waveform, or late
positive potential, is systematically related to participants’ ratings of their experience of emotional arousal (Cuthbert et al., 2000). Sabatinelli et al. (in press, Fig. 4) have shown, furthermore, that this EEG response to emotional pictures is highly correlated with an increased fMRI blood oxygen level dependent (BOLD) signal in visual cortex (extrastriate occipital, parietal, and inferior temporal), when this fMRI response is prompted by the same picture stimuli. Thus, it is suggested that both the BOLD response and the late positive ERP result from activity in a common neuronal pool in the posterior cortex that determines the semantic meaning of visual input. In this case, the positive voltage ERP reflects volume conduction to the scalp of bioelectric changes associated with actual neural-cell firing, while the delayed fMRI response is the replacement flow of oxygenated blood subsequent to metabolism in this same cell complex. As already noted, emotional BOLD activation in sensory cortex (particularly, inferotemporal cortex) has been shown, in turn, to correlate highly with activation in the amygdala (Sabatinelli et al., 2005). Taken together, these findings suggest the late positive ERP is a valid indicator of emotional network processing. From a cognitive perspective, emotional arousal occurs when a stimulus activates a matching representation in an emotional network. The immediacy and intensity of the affective reaction depend on the general associative strength of the network and, specifically, on the strength of connections to the amygdala circuit. In humans, this net is broadly cast, and affects can be prompted by representations that are not readily discriminated linguistically (and are, therefore, outside awareness). Thus, many different stimuli, varying across individuals, can prompt an amygdala-dependent release of neurochemicals, with a potentially widespread modulation of sensory and motor systems (and, in humans, reports of increasing ‘‘arousal’’).
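This network account can be stated compactly in computational terms: representational units linked by weighted associations, activation spreading from cue-matched units, and emotional engagement arising when amygdala-connected units are driven strongly enough. The sketch below is a schematic reading of Lang's proposal, not an implementation from the cited work; the units, link weights, and propagation rule are all invented for illustration.

```python
# Schematic spreading-activation sketch of an emotional network in the
# sense of Lang (1979, 1994). Units, weights, and the propagation rule
# are invented for illustration.

# Weighted associative links between representational units.
links = {
    "coiled shape":    {"snake": 0.9},
    "rustling sound":  {"snake": 0.6},
    "snake":           {"alone in woods": 0.5, "snakes can kill": 0.7,
                        "amygdala circuit": 0.8},
    "snakes can kill": {"amygdala circuit": 0.6},
    "alone in woods":  {"amygdala circuit": 0.3},
}

def spread(cues, steps=3):
    """Propagate activation outward from the cue-matched units."""
    act = {unit: 1.0 for unit in cues}
    for _ in range(steps):
        new = dict(act)
        for unit, a in act.items():
            for target, w in links.get(unit, {}).items():
                new[target] = max(new.get(target, 0.0), a * w)
        act = new
    return act

# More matching input cues -> stronger drive on the amygdala unit.
for cues in (["rustling sound"], ["rustling sound", "coiled shape"]):
    activation = spread(cues)
    print(cues, "-> amygdala drive:",
          round(activation.get("amygdala circuit", 0.0), 2))
```

Adding a second matching cue raises the drive on the amygdala-connected unit, echoing the claim that the probability of full network engagement grows with the number of units initially stimulated.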
The startle reflex and emotional priming

From an evolutionary perspective, human emotions can be considered dispositions to action — as the experience of anger is a throwback to attack,
Fig. 4. The event-related potential (ERP) scalp topography obtained from the electroencephalograph (EEG: upper left) and the fMRI BOLD contrast overlay (lower left) are compared for pleasant, neutral, and unpleasant picture presentations. The ERP map represents average microvolt change from 400 to 900 ms after picture onset, with red indicating positive and blue indicating negative voltage. The BOLD contrast overlay, coronal slice y = 68, represents a random effects analysis of picture-elicited activity, peaking roughly 8 s after picture onset, with red indicating more reliable increases in oxygenated blood flow and yellow indicating a threshold of p < .000000001. Time-course waveforms of the ERP and the fMRI BOLD response prompted by pleasant, neutral, and unpleasant picture processing are presented in the right panel. The ERP waveform represents the average of 27 centro-parietal sensors; the BOLD waveform represents the average of activity in inferotemporal, lateral occipital, and parietal regions of interest. The arrow at time zero on the abscissas indicates picture onset. The correlations between the late positive potential (400–900 ms) and BOLD signal were 0.60 for the lateral occipital cortex, 0.51 for the parietal cortex, and 0.52 for the inferotemporal cortex (all p values less than 0.001; adapted from Sabatinelli et al., in press).
or fear is an evolved disposition to flee. In humans, the ‘‘emotion’’ generally occurs in a context of inhibition or delay, and the primed action may or may not ultimately occur. In contrast, affect is not easily inferred in more primitive animals, as the overt response — attack or flight — is immediate and automatic. A few of these automatic survival responses have been preserved in humans — notably, the startle reflex. Any abrupt sensory event will prompt a startle response, a chained series of rapid extensor-flexor movements that cascade throughout the body (Landis and Hunt, 1939). This reaction is a defensive reflex, facilitating escape in many species (e.g., the hermit crab; Elwood et al., 1998), and it may still serve a protective function in mammals (i.e., in avoiding organ injury as in the eyeblink, or in retraction of the head and torso in the full-body startle reflex to avoid attack from above; Li and Yeomans, 1999). Abruptness is the key to startle elicitation: When effective, the rise time of the eliciting stimulus is perceptually instantaneous. In
human subjects, sudden closure of the eyelids is one of the first, fastest (occurring within 25 to 40 ms after startle stimulus onset), and most stable elements in the reflex sequence. It is the preferred measure of startle in humans. In rats, whole body startle is measured in specially designed cages. Although startled mammals typically do not show actual escape, the amplitude of the startle reflex is augmented when they are under threat of pain or predation. The first laboratory demonstration of this was reported by Brown et al. (1951), who showed that the amplitude of the acoustically elicited startle reflex in rats was increased when elicited in the presence of a light previously paired with footshock. This effect is considered an animal model of fear or anxiety because drugs like diazepam, which reduce fear or anxiety in humans, block the increase in startle in the presence of the conditioned light stimulus but do not affect the basic startle reflex itself (see Davis et al., 1993, for a review). In contrast, during an appetitive state, the startle reflex appears to be
partially inhibited, i.e., startle amplitude is reduced when elicited in the presence of a light previously paired with food (Schmid et al., 1995) or rewarding electrical brain stimulation (Yeomans et al., 2000). These effects are very like what cognitive psychologists call ‘‘priming’’ in research on human associative behavior. Cognitive priming occurs when a prior stimulus raises the activation level of an associated stimulus–response event. For example, the prime ‘‘bread’’ prompts a faster reaction time response to the word ‘‘butter.’’ States of the organism may also prime particular behavior. Thus, clinically depressed individuals respond to nearly all cues with associations that are affectively negative. The potentiated startle observed in animal conditioning can be understood as an instance of motivational state priming. That is, the induced defensive state of the organism primes (augments) an independently instigated reflex that is connected to the defense system, i.e., the startle response. According to the motivational priming hypothesis (Lang et al., 1990, 1997), the defensive startle reflex will be of significantly greater amplitude (and faster) when a prior stimulus has induced a consonant, defensive motivational state. Alternatively, if the appetitive system has been activated, as when a pleasant stimulus is perceived, the defensive startle reflex will undergo a reciprocal attenuation. Thus, the startle reflex can serve as a non-volitional, objective measure of affective valence; and as will be seen, it is a method that solidly links animal neuroscience research and human psychophysiology.
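Stated formally, the motivational priming hypothesis amounts to a signed modulation of a baseline reflex: a primed defensive state adds to probe-startle amplitude, while a primed appetitive state subtracts from it. A minimal sketch follows; the baseline and gain values are invented for illustration and carry no empirical weight.

```python
# Minimal sketch of the motivational priming hypothesis (Lang et al.,
# 1990, 1997): a defensive foreground state augments the probe-startle
# reflex; an appetitive state reciprocally attenuates it. Baseline and
# gain values are invented for illustration.

BASELINE = 100.0         # hypothetical blink magnitude, arbitrary units
DEFENSIVE_GAIN = 60.0    # potentiation per unit of defensive arousal
APPETITIVE_GAIN = 40.0   # inhibition per unit of appetitive arousal

def startle_amplitude(valence, arousal):
    """valence: 'pleasant', 'neutral', or 'unpleasant'; arousal in [0, 1]."""
    if valence == "unpleasant":     # defense system primed: potentiation
        return BASELINE + DEFENSIVE_GAIN * arousal
    if valence == "pleasant":       # appetitive system primed: inhibition
        return BASELINE - APPETITIVE_GAIN * arousal
    return BASELINE                 # neutral foreground: no modulation

for valence in ("pleasant", "neutral", "unpleasant"):
    print(valence, [round(startle_amplitude(valence, a), 1)
                    for a in (0.2, 0.5, 0.9)])
```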
Startle modulation in humans

Like rats, human subjects show elevated startle amplitude in the presence of cues previously paired with shock (Hamm et al., 1993; Lipp et al., 1994; Grillon and Davis, 1997) or simply when they are told they might receive a shock (Grillon et al., 1991, 1993). However, this potentiation phenomenon is not restricted to anticipated pain; rather, the probe-startle reaction is modulated (up or down) in almost any emotion-evoking perceptual context (Lang et al., 1990).
When startle probes are administered while subjects view pictures that vary systematically in emotional valence, results have consistently conformed to the motivational priming hypothesis: The probe-startle reflex is inhibited when participants view pleasant stimuli and is potentiated when pictures are judged to be unpleasant (Vrana et al., 1988; Lang et al., 1990; Lang, 1995; see Bradley, 2000 for a review). These emotion-perceptual effects have also been reported in five-month-old infants viewing smiling, neutral, and angry faces (Balaban, 1995). Affective modulation of startle is observed for picture stimuli regardless of whether the startle probe is visual, acoustic, or tactile (e.g., Bradley et al., 1990; Hawk and Cook, 1997), suggesting that the effect is not modality specific. Furthermore, affective modulation is not confined to visual percepts: When the foreground stimuli consist of short, 6 s sound clips of various affective events (e.g., sounds of love making, babies crying, bombs bursting), and the startle probe is a visual light flash, the same affective modulation of the probe reflex is observed (Bradley et al., 1994). Researchers have also found startle potentiation in subjects smelling unpleasant odors (Miltner et al., 1994; Ehrlichman et al., 1995).
Attention and arousal

Consistent with the motivational priming hypothesis, modulatory effects on the startle reflex appear to increase with greater activation of a motive system. However, the direction of effect is opposite for appetitive and defensive activation: The most arousing pleasant pictures (e.g., romantic and erotic couples) prompt the greatest startle inhibition (Cuthbert et al., 1996). Conversely, when unpleasant arousing pictures are probed, the startle reflex is strongly potentiated. Thus, when phobic individuals look at pictures of their fear object (e.g., snakes or spiders), they show dramatically augmented startle potentiation compared to normal subjects (Hamm et al., 1997; Sabatinelli et al., 2001). Graham (1992) previously proposed that startle reflex inhibition was an index of attentional engagement, and recent research supports the view
Fig. 5. Mean startle blink magnitude in response to probe noise stimuli while participants viewed specific unpleasant picture contents. Picture contents are ordered on the abscissa by rated affective arousal from left to right (low to high arousal). The correlation between content arousal rank and blink magnitude was 0.86, p < 0.01.
that attention and emotional arousal are interacting factors in probe-startle modulation. Thus, probes presented during low-arousal, unpleasant stimuli — pictures of sad events, pollution, people who are ill — actually prompt some reflex inhibition relative to less interesting, neutral stimuli. Startle magnitude increases linearly, however, as unpleasant pictures are reported to be more arousing — with threat to the viewer (a gunman pointing his weapon at the viewer, or animal attack) prompting the greatest potentiation (Cuthbert et al., 1996; Bradley et al., 2000; see Fig. 5). There is also evidence that startle reflexes are reduced for all picture contents when attention is initially engaged (i.e., the first few hundred milliseconds after the onset of a stimulus; see Bradley et al., 2003), and furthermore, that this attentional
‘‘pre-pulse’’ inhibition may be greatest for emotionally arousing pictures (irrespective of valence). The early startle inhibition could indicate initial attention capture that is preferential for motivationally relevant input. In the subsequent seconds, as the picture input is more fully processed, arousing unpleasant pictures prompt the expected reflex potentiation, while startle reflexes during pleasant pictures remain inhibited. These data suggest that relative reflex inhibition is the modal reaction when attention is engaged, and that this reduced response increases with more ‘‘interesting’’ and more pleasantly exciting foregrounds. In this view, potentiated startle is the exceptional response, occurring only when attention is focused on very upsetting or threatening stimulus contents. The fact that inhibition of motor action is the early, first reaction to a stimulus, and that defensive potentiation occurs later (and only to the most arousing unpleasant stimuli), is consistent with the defense motivational sequence, proceeding from attention to action, that has been discussed previously in a wider timeframe. That is, the first reaction to threat is inevitably to stop, look, and listen (as prey does when first perceiving a predator in the distance), and only after information is gathered, a strategy is devised, and the attack is clearly imminent are the acts of fight or flight deployed.
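This two-factor account (early, valence-blind attentional inhibition; later, valence- and arousal-dependent modulation) can be summarized as a simple function of probe time. The sketch below is a caricature of the described pattern; the time window, gains, and percentages are invented for illustration.

```python
# Caricature of the attention-then-action account of probe-startle
# modulation: an early, valence-blind inhibition (deeper for arousing
# content), followed by valence-dependent modulation that grows with
# arousal. The time window, gains, and percentages are invented.

def startle_modulation(valence, arousal, probe_ms):
    """Percent change from a neutral baseline at a given probe time."""
    if probe_ms < 300:                    # early attentional window
        return -10.0 - 15.0 * arousal     # inhibition for all contents
    if valence == "unpleasant":           # later: defense mobilization
        return 25.0 * arousal             # potentiation grows with arousal
    if valence == "pleasant":
        return -20.0 * arousal            # pleasant contents stay inhibited
    return 0.0

for probe_ms in (150, 3000):
    for valence in ("pleasant", "unpleasant"):
        change = startle_modulation(valence, arousal=0.8, probe_ms=probe_ms)
        print(f"probe at {probe_ms} ms, {valence}: {change:+.1f}%")
```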
The amygdala and conditioned fear: Startle modulation in the rat

Figure 6 shows pathways believed to mediate fear-potentiated startle in rats during conditioned visual stimulation. Information from the retina projects to both the dorsal lateral geniculate nucleus and the lateral posterior nucleus of the thalamus. The dorsal lateral geniculate nucleus projects to the visual cortex, whereas the lateral posterior nucleus projects both directly to the Bla and to the perirhinal cortex (Shi and Davis, 1996), which in turn projects to the lateral and/or basolateral nuclei of the amygdala. Next, the basolateral nucleus projects to the central nucleus, which in turn has neural projections to the startle pathway.
Fig. 6. Schematic diagram of pathways believed to be involved in the fear-potentiated startle effect in rats using a visual conditioned stimulus. Visual information goes from the retina through the lateral posterior nucleus of the thalamus either directly to the basolateral amygdala or indirectly through the perirhinal cortex to the amygdala. Visual information also goes from the retina to the dorsal lateral geniculate nucleus to the visual cortex and then through the perirhinal cortex to the amygdala. Shock information goes through a series of parallel pathways to eventually reach the basolateral amygdala. The basolateral nucleus of the amygdala projects to the central nucleus, which then projects directly to the startle pathway at the level of the nucleus reticularis pontis caudalis (PnC), as well as indirectly via a synapse in the deep layers of the superior colliculus and mesencephalic reticular formation (Deep SC/Me). A pathway in the brainstem and spinal cord mediates the startle reflex itself. Afferents from the cochlea synapse onto a small group of neurons embedded in the auditory nerve, called cochlear root neurons (CRN), which send heavy projections to the PnC. Axons from cells in the PnC form part of the reticulospinal tract and make both monosynaptic and polysynaptic connections onto motoneurons in the spinal cord.
Electrolytic or chemical lesions of the visual thalamus (Shi and Davis, 1996), the perirhinal cortex (Rosen et al., 1992; Campeau and Davis, 1995a) or the Bla (Sananes and Davis, 1992; Campeau and Davis, 1995b) completely block the expression of fear-potentiated startle using a visual CS. None of these lesions affects startle amplitude itself. Lesions (Hitchcock and Davis, 1986; Campeau and Davis, 1995b) or local infusion of glutamate antagonists into the CeA (Kim et al., 1993; Walker and Davis, 1997) also block fear-potentiated startle. Both conditioned fear and sensitization of startle by footshocks appear to modulate startle at the level of the nucleus reticularis pontis caudalis (PnC) (Berg and Davis, 1985; Boulis and Davis, 1989; Krase et al., 1994). The CeA projects directly to the nucleus reticularis pontis caudalis (Rosen et al., 1991), and electrolytic lesions along this pathway block the expression of fear-potentiated startle (Hitchcock and Davis, 1991). However, an obligatory synapse appears to exist in this pathway, because fiber-sparing chemical lesions in the deep layers of the superior colliculus or the periaqueductal gray also block fear-potentiated startle (Frankland and Yeomans, 1995; Fendt et al., 1996), as does local infusion of muscimol or glutamate antagonists into a part of the mesencephalic reticular formation (Meloni and Davis, 1999; Zhao and Davis, 2004).
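The inferential logic of these lesion studies is straightforward: fear-potentiated startle requires an intact route from the visual CS to the startle pathway, whereas baseline startle needs only the brainstem circuit, so a lesion that abolishes potentiation while sparing baseline startle localizes the fear signal to the CS route. The sketch below restates that logic; the CS route is collapsed into one serial chain, omitting the parallel pathways of Fig. 6, purely for illustration.

```python
# Toy restatement of the lesion logic: fear-potentiated startle needs
# an intact route from the visual CS to the startle pathway, whereas
# baseline startle needs only the brainstem circuit. The CS route is
# collapsed into one serial chain (parallel routes of Fig. 6 omitted)
# purely for illustration.

CS_ROUTE = ["retina", "LP thalamus", "perirhinal cortex", "Bla",
            "CeA", "deep SC/Me", "PnC"]
STARTLE_ROUTE = ["cochlea", "cochlear root neurons", "PnC",
                 "spinal motoneurons"]

def intact(route, lesioned):
    """A route functions only if none of its structures is lesioned."""
    return not any(node in lesioned for node in route)

for lesioned in (set(), {"perirhinal cortex"}, {"Bla"}, {"CeA"}):
    baseline = "intact" if intact(STARTLE_ROUTE, lesioned) else "blocked"
    potentiation = "intact" if intact(CS_ROUTE, lesioned) else "blocked"
    print(f"lesion {sorted(lesioned) or 'none'}: "
          f"baseline startle {baseline}, potentiation {potentiation}")
```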
Pleasure-attenuated startle

As mentioned earlier, startle amplitude is decreased when elicited in the presence of a cue previously paired with food (Schmid et al., 1995). Moreover, the nucleus accumbens is important for this ‘‘pleasure-attenuated startle effect’’ because pre-training local infusion of the neurotoxin 6-OHDA into the nucleus accumbens, which markedly reduces dopamine, blocks pleasure-attenuated startle (Koch et al., 1996). The connection between the Bla and the ventral striatum may also be involved in pleasure-attenuated startle. A unilateral lesion of the Bla on one side of the brain, combined with ablation of the nucleus accumbens on the other side, would test this hypothesis.
Startle and amygdala in the human brain

There is limited direct evidence that the amygdala’s role in determining startle reflex amplitude is similar in rodents and humans. Bower et al. (1997) have provided suggestive data: They studied epileptic patients who were treated by resection of the temporal lobe, and found that greater post-surgery amygdala volume was associated with larger baseline startle reflexes. Funayama et al. (2001) also looked at epilepsy patients with temporal pole resection and found deficits in fear-potentiated startle that were task and hemisphere specific. Patients with damage to the right temporal lobe did not potentiate to unpleasant pictures but did when told they might be shocked. Patients with left temporal-lobe damage showed the opposite pattern, and one patient with bilateral damage did not potentiate in either case. This suggests that many of the lateralized effects that have variously been reported may turn out to be task specific. An fMRI study of amygdala reactions to faces of different races was also suggestive (Phelps et al., 2000): The researchers showed that a measure of implicit negative evaluation and startle potentiation both co-varied with increased left amygdala activation. There are, of course, imaging studies showing amygdala activation in human conditioning studies similar to those conducted with animals (e.g., Morris et al., 1996). However, startle is rarely included as an additional dependent variable.
Emotional processing: From attention to action

When a wild rat sees a human at some distance, the rat freezes in a highly alert, attentive posture. As the human (or other potential predator) gradually approaches, the rat suddenly darts away, if escape is possible. If escape is not possible and the human gets very close, the rat will attack (Blanchard et al., 1986). Defensive behaviors increase systematically with a reduction in distance from predators and other dangerous or potentially painful stimuli (Blanchard and Blanchard, 1987): Given an available escape route, proximity is associated with an increased probability of active flight. In the absence of an escape option, the best
defense may be attack. When the threat stimulus is distant, however, the rat freezes and orients toward the predator. The Blanchards noted further that increases in ‘‘the amplitude of the startle response to sudden stimuli accompany decreasing defensive distance’’ (p. S5). Using concepts introduced by Timberlake and colleagues (Timberlake and Lucas, 1989; Timberlake, 1993), Fanselow (1991, 1994) has similarly analyzed fear behavior, describing three stages of increasing prey–predator proximity: Pre-encounter defense, pre-emptive behavior that occurs in a foraging area where predators were previously encountered; post-encounter defense, responses prompted by the detection of a distant predator; and circa-strike defense, behaviors that occur in the region of physical contact or its close imminence. Behavior shifts from pre-emptive threat vigilance at pre-encounter, to post-encounter freezing and orienting to a specific predator cue, to the circa-strike stage, when the organism, beyond vigilance, engages in vigorous defensive action. Mild amygdala stimulation (electrical) first stops ongoing behavior (freezing), prompting bradycardia and EEG activation. As stimulation increases, the animal becomes more active; at high levels of stimulation, it vigorously attempts escape from the source of stimulation. It appears that a predator at some distance prompts mild activation of the amygdala, functionally increasing attention. As the predator comes closer, amygdala activation increases, now mediating overt defensive behavior, including escape. As Fanselow (1991, 1994) proposed, the switch from an attentional mode to active defense may involve a switch in activation from ventral to dorsal periaqueductal gray. Bandler and others (cf. Bandler and Shipley, 1994) have shown that the ventral periaqueductal gray projects both to cardiovascular centers mediating bradycardia and to centers mediating motor system inhibition. In contrast, the dorsal periaqueductal gray projects to centers mediating tachycardia and active escape behavior. Assuming a low threshold for projected amygdala activation in the ventral periaqueductal gray, we might expect this structure to mediate attentional responses, subsequent to the amygdala’s activation by a novel event. If the dorsal periaqueductal gray had a higher threshold for amygdala activation, however, then
its function would depend on greater activation in the amygdala (e.g., as with predator approach), prompting an abrupt switch from passive vigilance to full-scale action. Lang et al. (1997) proposed an adaptation of the predator stage model for explicating human brain and psychophysiological reactions to unpleasant and threatening stimuli. They suggest that humans, when first seated in the psychophysiological laboratory, are functionally similar to an animal at the pre-encounter stage, i.e., they are moderately alert in an unfamiliar environment. For humans and other animals, presentation of an aversive stimulus (post-encounter) prompts focused attention, ‘‘freezing,’’ apprehension, and arousal. Human participants show this pattern when viewing threatening films or pictures, or attending to a previously neutral cue that augurs a contingent aversive event, such as electric shock. During post-encounter, physiological indices of attention at first increase with imminence and severity of the threatened consequence — greater skin conductance, increased heart rate deceleration, and some inhibition of the probe-startle reflex. During this period, however, brain and body are also mobilizing for possible action. One could conjecture that amygdala transmissions to the central gray are increasing with greater arousal (imminence of threat), ultimately switching the impact site from the ventral (freezing) to the dorsal (action) gray. The first motor consequence of this switch may be the change in direction of probe-startle modulation, i.e., from an initial moderate reflex inhibition to reflex potentiation. Startle potentiation then progressively increases in magnitude with the aversiveness of stimuli (as probes occur more proximal to the circa-strike stage in the animal model). With a further increment in threat, the heart rate response also reverses the direction of change from orienting to defense (Sokolov, 1963; Graham, 1979) — from a parasympathetically mediated fear bradycardia to action mobilization and sympathetically mediated cardiac acceleration. The biological model of emotion presented here suggests that, depending on the level of stimulus aversion (threat, apprehension), patterns of physiological change (and, in humans, reports of experienced emotional arousal) will systematically vary with the level of defense system activation. Furthermore, the
overt behaviors that may result with increasing imminence (closer to the circa-strike region) could look fearful or angry, or, given overwhelming stress and no available coping behavior, could lead to autonomic and somatic collapse, hopelessness, and depression. The model’s assumed predator is, of course, engaged in a parallel dance — first quietly observing the distant prey, then stalking slowly forward, increasingly mobilized for what becomes, at the circa-strike stage, a final charge. Overall, it is a parallel progression from attention to action, with a coincident increment in appetitive arousal. While there are currently few data on the predator’s anticipatory pleasures — joy of the hunt, satisfaction in its consummation — the psychophysiology of the process is likely to parallel what is observed in defense.
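The proposed ventral-to-dorsal switch in the periaqueductal gray amounts to a two-threshold readout of amygdala output. The sketch below restates that idea; the threshold values and the 0-1 scaling of amygdala output are invented for illustration.

```python
# Two-threshold sketch of the proposed ventral/dorsal periaqueductal
# gray switch: modest amygdala output engages the ventral column
# (freezing, bradycardia, startle inhibition), while strong output
# recruits the dorsal column (escape, tachycardia, startle
# potentiation). Threshold values and the 0-1 scale are invented.

VENTRAL_THRESHOLD = 0.2   # low: crossed with a distant threat
DORSAL_THRESHOLD = 0.7    # high: crossed only near circa-strike

def defense_mode(amygdala_output):
    """Map amygdala output (0-1, rising with threat imminence) to mode."""
    if amygdala_output >= DORSAL_THRESHOLD:
        return "dorsal PAG: active defense, tachycardia, startle potentiation"
    if amygdala_output >= VENTRAL_THRESHOLD:
        return "ventral PAG: freezing, bradycardia, startle inhibition"
    return "pre-encounter: vigilance only"

for output in (0.1, 0.4, 0.9):
    print(f"amygdala output {output:.1f} -> {defense_mode(output)}")
```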
Emotion and the brain: Concluding thoughts

We have proposed here that basic emotions are founded on brain circuitry in which the amygdala is a central component. We conceive of emotion not as a single reaction, but as the reflection of a motivational process. Emotions reflect sequenced somatic and autonomic reflexes, organized by brain circuits that developed over evolutionary history to ensure the survival of individuals and the propagation of genes across generations. Thus, events that are positive/appetitive or aversive/threatening have an initial advantage in capturing attention. Furthermore, these events prompt further information gathering, evaluation, and action planning — more so, according to their degree of survival significance. Motive cues also occasion the release of neurochemicals, metabolic arousal, anticipatory responses that are oriented towards the engaging event, and a general mobilization of the organism preparatory to action. Sometimes these motivational sequences play out in humans in the same stimulus-driven way that they do in less complex organisms. More often, given an elaborate brain, equipped with language, interactive memory storage, and a vast behavioral repertoire, motivational reflexes may only be engaged in part, with overt action inhibited, delayed, or complexly adapted to
the context. Nevertheless, it is this reflex bedrock that generally prompts humans to say that they are emotionally aroused — joyful, angry, fearful, anxious, or sad and hopeless. It is emotion’s reflex automaticity, moreover, that may have much to do with our sense of being less in control when emotionally aroused, of feeling driven by unbidden impulses and helpless in emotion’s thrall.
Acknowledgments

This work was supported in part by National Institute of Mental Health Grants MH 47840, MH 57250, MH 58922, MH 59906 and the Woodruff Foundation to MD, NSF Contract No. IBN987675 to Emory University, and P50 MH52384, MH 37757, and MH43975 to PJL.
References

Adolphs, R. and Tranel, D. (1999) Intact recognition of emotional prosody following amygdala damage. Neuropsychologia, 37: 1285–1292.
Adolphs, R., Tranel, D. and Damasio, A.R. (1998) The human amygdala in social judgment. Nature, 393: 470–474.
Adolphs, R., Tranel, D., Damasio, H. and Damasio, A.R. (1994) Impaired recognition of emotion in facial expressions following bilateral damage to the human amygdala. Nature, 372: 669–672.
Amaral, D.G. and Price, J.L. (1984) Amygdalo-cortical projections in the monkey (Macaca fascicularis). J. Comp. Neurol., 230: 465–496.
Amaral, D.G., Price, J.L., Pitkanen, A. and Carmichael, S.T. (1992) Anatomical organization of the primate amygdaloid complex. In: Aggleton, J.P. (Ed.), The Amygdala: Neurobiological Aspects of Emotion, Memory and Mental Dysfunction. Wiley, New York, pp. 1–66.
Amorapanth, P., LeDoux, J.E. and Nader, K. (2000) Different lateral amygdala outputs mediate reactions and actions elicited by a fear-arousing stimulus. Nat. Neurosci., 3: 74–79.
Anderson, S.W., Bechara, A., Damasio, H., Tranel, D. and Damasio, A.R. (1999) Impairment of social and moral behavior related to early damage in human prefrontal cortex. Nat. Neurosci., 2: 1032–1037.
Anderson, J.R. and Bower, G.H. (1974) A propositional theory of recognition memory. Memory Cogn., 2: 406–412.
Applegate, C.D., Kapp, B.S., Underwood, M.D. and McNall, C.L. (1983) Autonomic and somatomotor effects of amygdala central n. stimulation in awake rabbits. Physiol. Behav., 31: 353–360.
Aston-Jones, G., Rajkowski, J., Kubiak, P., Valentino, R.J. and Shipley, M.T. (1996) Role of the locus coeruleus in emotional activation. Prog. Brain Res., 107: 379–402.
Balaban, M. (1995) Affective influences on startle in five-month-old infants: reactions to facial expressions of emotion. Child Dev., 66: 23–36.
Bandler, R. and Shipley, M.T. (1994) Columnar organization in the midbrain periaqueductal gray: modules for emotional expression? Trends Neurosci., 17: 379–389.
Baxter, M.G., Parker, A., Lindner, C.C.C., Izquierdo, A.D. and Murray, E.A. (2000) Control of response selection by reinforcer value requires interaction of amygdala and orbital prefrontal cortex. J. Neurosci., 20: 4311–4319.
Bechara, A., Damasio, H., Tranel, D. and Damasio, A.R. (1997) Deciding advantageously before knowing the advantageous strategy. Science, 275: 1293–1294.
Berg, W.K. and Davis, M. (1985) Associative learning modifies startle reflexes at the lateral lemniscus. Behav. Neurosci., 99: 191–199.
Bermudez-Rattoni, F., Grijalva, C.V., Kiefer, S.W. and Garcia, J. (1986) Flavor-illness aversion: the role of the amygdala in acquisition of taste-potentiated odor aversions. Physiol. Behav., 38: 503–508.
Blanchard, R.J. and Blanchard, D.C. (1987) An ethoexperimental approach to the study of fear. Psychol. Rec., 37: 305–316.
Blanchard, R.J., Flannelly, K.J. and Blanchard, D.C. (1986) Defensive behavior of laboratory and wild Rattus norvegicus. J. Comp. Psychol., 100: 101–107.
Boulis, N. and Davis, M. (1989) Footshock-induced sensitization of electrically elicited startle reflexes. Behav. Neurosci., 103: 504–508.
Bower, D., Eckert, M., Gilmore, R., Leonard, C.M., Bauer, R., Roper, S., Bradley, M.M., Lang, P.J. and Barta, P. (1997) Startle eyeblink magnitude in humans depends on extent of right amygdala removal. Soc. Neurosci. Abstr., 23: 570.
Bradley, M.M. (2000) Emotion and motivation. In: Cacioppo, J.T., Tassinary, L.G. and Berntson, G. (Eds.), Handbook of Psychophysiology. Cambridge University Press, New York, pp. 602–642.
Bradley, M.M., Codispoti, M., Cuthbert, B.N. and Lang, P.J. (2000) Emotion and picture perception: emotion and motivation I: Defensive and appetitive reactions in picture processing. Emotion, 1: 276–298.
Bradley, M.M., Cuthbert, B.N. and Lang, P.J. (1998a) International affective digitized sounds (IADS). The Center for Research in Psychophysiology. University of Florida, Gainesville, FL.
Bradley, M.M., Greenwald, M.K., Petry, M. and Lang, P.J. (1992) Remembering pictures: pleasure and arousal in memory. J. Exp. Psychol. Learn. Mem. Cogn., 18: 379–390.
Bradley, M.M., Lang, P.J. and Cuthbert, B.N. (1990) Startle reflex modification: emotion or attention. Psychophysiology, 27: 513–523.
Bradley, M.M., Lang, P.J. and Cuthbert, B.N. (1998b) Affective norms for English words (ANEW). In: Technical manual and affective ratings. The Center for the Study of Emotion and Attention. University of Florida, Gainesville, FL.
Bradley, M.M., Sabatinelli, D., Lang, P.J., Fitzsimmons, J.R., King, W. and Desai, P. (2003) Activation of the visual cortex in motivated attention. Behav. Neurosci., 117: 369–380.
Bradley, M.M., Zack, J. and Lang, P.J. (1994) Cries, screams, and shouts of joy: affective responses to environmental sounds. Psychophysiology, 31(S29).
Brown, J.S., Kalish, H.I. and Farber, I.E. (1951) Conditioned fear as revealed by magnitude of startle response to an auditory stimulus. J. Exp. Psychol., 41: 317–328.
Burns, L.H., Robbins, T.W. and Everitt, B.J. (1993) Differential effects of excitotoxic lesions of the basolateral amygdala, ventral subiculum and medial prefrontal cortex on responding with conditioned reinforcement and locomotor activity potentiated by intra-accumbens infusions of D-amphetamine. Behav. Brain Res., 55: 167–183.
Bussey, T.J., Everitt, B.J. and Robbins, T.W. (1997) Dissociable effects of cingulate and medial frontal cortex lesions on stimulus–reward learning using a novel Pavlovian autoshaping procedure in rats: implications for the neurobiology of emotion. Behav. Neurosci., 111: 908–919.
Cacioppo, J.T., Crites Jr., S.L., Gardner, W.L. and Berntson, G.G. (1994) Bioelectrical echoes from evaluative categorizations: I. A late positive brain potential that varies as a function of trait negativity and extremity. J. Pers. Soc. Psychol., 67: 115–125.
Cador, M., Robbins, T.W. and Everitt, B.J. (1989) Involvement of the amygdala in stimulus–reward associations: interaction with the ventral striatum. Neuroscience, 30(1): 77–86.
Cahill, L., Haier, R.J., Fallon, J., Alkire, M.T., Tang, C., Keator, D., Wu, J. and McGaugh, J.L. (1996) Amygdala activity at encoding correlated with long-term, free recall of emotional information. Proc. Natl. Acad. Sci. USA, 93: 8016–8021.
Cahill, L. and McGaugh, J.L. (1998) Mechanisms of emotional arousal and lasting declarative memory. Trends Neurosci., 21: 294–299.
Campbell, B.A., Wood, G. and McBride, T. (1997) Origins of orienting and defense responses: an evolutionary perspective. In: Lang, P.J., Simons, R.F. and Balaban, M.T. (Eds.), Attention and Orienting: Sensory and Motivational Processes. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 41–67.
Campeau, S. and Davis, M. (1995a) Involvement of subcortical and cortical afferents to the lateral nucleus of the amygdala in fear conditioning measured with fear-potentiated startle in rats trained concurrently with auditory and visual conditioned stimuli. J. Neurosci., 15: 2312–2327.
Campeau, S. and Davis, M. (1995b) Involvement of the central nucleus and basolateral complex of the amygdala in fear conditioning measured with fear-potentiated startle in rats trained concurrently with auditory and visual conditioned stimuli. J. Neurosci., 15: 2301–2311.
Chapman, W.P., Schroeder, H.R., Guyer, G., Brazier, M.A.B., Fager, C., Poppen, J.L., Solomon, H.C. and Yakovlev, P.I. (1954) Physiological evidence concerning the importance of the amygdaloid nuclear region in the integration of circulatory function and emotion in man. Science, 129: 949–950.
Cuthbert, B.N., Bradley, M.M. and Lang, P.J. (1996) Probing picture perception: activation and emotion. Psychophysiology, 33: 103–111.
Cuthbert, B.N., Schupp, H.T., Bradley, M.M., Birbaumer, N. and Lang, P.J. (2000) Brain potentials in affective picture processing: covariation with autonomic arousal and affective report. Biol. Psychol., 52: 95–111.
Damasio, A.R. (1994) Descartes’ Error. Grosset/Putnam, New York.
Davis, M. (2000) The role of the amygdala in conditioned and unconditioned fear and anxiety. In: Aggleton, J.P. (Ed.) The Amygdala, Vol. 2. Oxford University Press, Oxford, UK, pp. 213–287.
Davis, M., Falls, W.A., Campeau, S. and Kim, M. (1993) Fear-potentiated startle: a neural and pharmacological analysis. Behav. Brain Res., 58: 175–198.
Davis, M. and Whalen, P. (2001) The amygdala: vigilance and emotion. Mol. Psychiatry, 6: 13–34.
Dickinson, A. and Dearing, M.F. (1979) Appetitive-aversive interactions and inhibitory processes. In: Dickinson, A. and Boakes, R.A. (Eds.), Mechanisms of Learning and Motivation. Erlbaum, Hillsdale, NJ, pp. 203–231.
Dringenberg, H.C. and Vanderwolf, C.H. (1996) Cholinergic activation of the electrocorticogram: an amygdaloid activating system. Exp. Brain Res., 108: 285–296.
Ehrlichman, H., Brown, S., Zhu, J. and Warrenburg, S. (1995) Startle reflex modulation during exposure to pleasant and unpleasant odors. Psychophysiology, 32: 150–154.
Elwood, R.W., Wood, K.E., Gallagher, M.B. and Dick, J.T.A. (1998) Probing motivational state during agonistic encounters in animals. Nature, 393: 66–68.
Everitt, B.J., Cador, M. and Robbins, T.W. (1989) Interactions between the amygdala and ventral striatum in stimulus–reward associations: studies using a second-order schedule of sexual reinforcement. Neuroscience, 30: 63–75.
Everitt, B.J., Cardinal, R.N., Hall, J., Parkinson, J.A. and Robbins, T.W. (2000) Differential involvement of amygdala subsystems in appetitive conditioning and drug addiction. In: Aggleton, J.P. (Ed.) The Amygdala, Vol. 2. Oxford University Press, Oxford, UK, pp. 353–390.
Everitt, B.J., Morris, K.A., O’Brien, A. and Robbins, T.W. (1991) The basolateral amygdala-ventral striatal system and conditioned place preference: further evidence of limbic-striatal interactions underlying reward-related processes. Neuroscience, 42(1): 1–18.
Everitt, B.J. and Robbins, T.W. (1992) Amygdala-ventral striatal interactions and reward related processes. In: Aggleton, J.P. (Ed.), The Amygdala: Neurobiological Aspects of Emotion, Memory and Mental Dysfunction. Wiley-Liss, New York, pp. 401–429.
Fanselow, M.S. (1991) The midbrain periaqueductal gray as a coordinator of action in response to fear and anxiety. In: Depaulis, A. and Bandler, R. (Eds.), The Midbrain Periaqueductal Gray Matter: Functional, Anatomical and Neurochemical Organization. Plenum Publishing Co., New York, pp. 151–173.
Fanselow, M.S. (1994) Neural organization of the defensive behavior system responsible for fear. Psychon. Bull. Rev., 1: 429–438.
Fendt, M., Koch, M. and Schnitzler, H.U. (1996) Lesions of the central gray block conditioned fear as measured with the potentiated startle paradigm. Behav. Brain Res., 74: 127–134.
Frankland, P.W. and Yeomans, J.S. (1995) Fear-potentiated startle and electrically evoked startle mediated by synapses in rostrolateral midbrain. Behav. Neurosci., 109: 669–680.
Fredrikson, M., Wik, G., Annas, P., Ericson, K.A.J. and Stone-Elander, S. (1995) Functional neuroanatomy of visually elicited simple phobic fear: additional data and theoretical analysis. Psychophysiology, 32: 43–48.
Fredrikson, M., Wik, G., Greitz, T., Stone-Elander, S., Ericson, K.A.J. and Sedvall, G. (1993) Regional cerebral blood flow during experimental phobic fear. Psychophysiology, 30: 126–130.
Funayama, E.S., Grillon, C.G., Davis, M. and Phelps, E.A. (2001) A double dissociation in the affective modulation of startle in humans: effects of unilateral temporal lobectomy. J. Cogn. Neurosci., 13: 721–729.
Gallagher, M. (2000) The amygdala and associative learning. In: Aggleton, J.P. (Ed.) The Amygdala, Vol. 2. Oxford University Press, Oxford, UK.
Gallagher, M., Graham, P.W. and Holland, P.C. (1990) The amygdala central nucleus and appetitive Pavlovian conditioning: lesions impair one class of conditioned behavior. J. Neurosci., 10: 1906–1911.
Gewirtz, J. and Davis, M. (1997) Second-order fear conditioning prevented by blocking NMDA receptors in the amygdala. Nature, 388: 471–474.
Gewirtz, J.C. and Davis, M. (1998) Application of Pavlovian higher-order conditioning to the analysis of the neural substrates of learning and memory. Neuropharmacology, 37: 453–460.
Gloor, P., Olivier, A. and Quesney, L.F. (1981) The role of the amygdala in the expression of psychic phenomena in temporal lobe seizures. In: Ben-Ari, Y. (Ed.), The Amygdaloid Complex. Elsevier/North-Holland, New York, pp. 489–507.
Goddard, G.V. (1964) Functions of the amygdala. Psychol. Bull., 62: 89–109.
Gold, P.E. (2003) Acetylcholine modulation of neural systems involved in learning and memory. Neurobiol. Learn. Mem., 80: 194–210.
Goldstein, L.E., Rasmusson, A.M., Bunney, B.S. and Roth, R.H. (1996) Role of the amygdala in the coordination of behavioral, neuroendocrine, and prefrontal cortical monoamine responses to psychological stress in the rat. J. Neurosci., 16(15): 4787–4798.
Graham, F.K. (1979) Distinguishing among orienting, defense, and startle reflexes. In: Kimmel, H.D., van Olst, H. and Orelebeke, F. (Eds.), The Orienting Reflex in Humans. An International Conference Sponsored by the Scientific Affairs Division of the North Atlantic Treaty Organization. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 137–167.
Graham, F.K. (1992) Attention: the heartbeat, the blink, and the brain. In: Campbell, B.A., Hayne, H. and Richardson, R.
(Eds.), Attention and Information Processing in Infants and Adults. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 3–29.
Graham, F.K. and Clifton, R.K. (1966) Heart rate change as a component of the orienting response. Psychol. Bull., 65: 305–320.
Greenwald, M.K., Bradley, M.M., Cuthbert, B.N. and Lang, P.J. (1998) Sensitization of the startle reflex in humans following aversive electric shock exposure. Behav. Neurosci., 112: 1069–1079.
Greenwald, M.K., Cook, E.W.I. and Lang, P.J. (1989) Affective judgment and psychophysiological response: dimensional covariation in the evaluation of pictorial stimuli. J. Psychophysiol., 3: 51–64.
Grillon, C., Ameli, R., Woods, S.W., Merikangas, K. and Davis, M. (1991) Fear-potentiated startle in humans: effects of anticipatory anxiety on the acoustic blink reflex. Psychophysiology, 28: 588–595.
Grillon, C., Ameli, R., Merikangas, K., Woods, S.W. and Davis, M. (1993) Measuring the time course of anticipatory anxiety using the fear-potentiated startle reflex. Psychophysiology, 30: 340–346.
Grillon, C. and Davis, M. (1997) Fear-potentiated startle conditioning in humans: effects of explicit and contextual cue conditioning following paired vs. unpaired training. Psychophysiology, 34: 451–458.
Hamm, A.O., Cuthbert, B.N., Globisch, J. and Vaitl, D. (1997) Fear and startle reflex: blink modulation and autonomic response patterns in animal mutilation fearful subjects. Psychophysiology, 34: 97–107.
Hamm, A.O., Greenwald, M.K., Bradley, M.M. and Lang, P.J. (1993) Emotional learning, hedonic change, and the startle probe. J. Abnorm. Psychol., 102: 453–465.
Hatfield, T. and Gallagher, M. (1995) Taste-potentiated odor conditioning: impairment produced by infusion of an N-methyl-D-aspartate antagonist into basolateral amygdala. Behav. Neurosci., 109(4): 663–668.
Hatfield, T., Graham, P.W. and Gallagher, M. (1992) Taste-potentiated odor aversion: role of the amygdaloid basolateral complex and central nucleus. Behav. Neurosci., 106: 286–293.
Hatfield, T., Han, J.S., Conley, M., Gallagher, M. and Holland, P. (1996) Neurotoxic lesions of basolateral, but not central, amygdala interfere with Pavlovian second-order conditioning and reinforcer devaluation effects. J. Neurosci., 16: 5256–5265.
Hawk, L.W. and Cook, E.W. (1997) Affective modulation of tactile startle. Psychophysiology, 34: 23–31.
Hebb, D.O. (1949) The Organization of Behavior. Wiley and Sons, New York.
Hitchcock, J.M. and Davis, M. (1986) Lesions of the amygdala, but not of the cerebellum or red nucleus, block conditioned fear as measured with the potentiated startle paradigm. Behav. Neurosci., 100: 11–22.
Hitchcock, J.M. and Davis, M. (1991) The efferent pathway of the amygdala involved in conditioned fear as measured with the fear-potentiated startle paradigm. Behav. Neurosci., 105: 826–842.
Holland, P.C. and Gallagher, M. (1993a) The effects of amygdala central nucleus lesions on blocking and unblocking. Behav. Neurosci., 107: 235–245.
Holland, P.C. and Gallagher, M. (1993b) Amygdala central nucleus lesions disrupt increments, but not decrements, in conditioned stimulus processing. Behav. Neurosci., 107: 246–253.
Irwin, W., Davidson, R.J., Lowe, M.J., Mock, B.J., Sorenson, J.A. and Turski, P.A. (1996) Human amygdala activation detected with echo-planar functional magnetic resonance imaging. NeuroReport, 7: 1765–1769.
Iwai, E. and Yukie, M. (1987) Amygdalofugal and amygdalopetal connections with modality-specific visual cortical areas in Macaques (Macaca fuscata, M. mulatta and M. fascicularis). J. Comp. Neurol., 261: 362–387.
Kapp, B.S., Supple, W.F. and Whalen, P.J. (1994) Effects of electrical stimulation of the amygdaloid central nucleus on neocortical arousal in the rabbit. Behav. Neurosci., 108: 81–93.
Kapp, B.S., Whalen, P.J., Supple, W.F. and Pascoe, J.P. (1992) Amygdaloid contributions to conditioned arousal and sensory information processing. In: Aggleton, J.P. (Ed.), The Amygdala: Neurobiological Aspects of Emotion, Memory, and Mental Dysfunction. Wiley-Liss, New York, pp. 229–254.
Kapp, B.S., Wilson, A., Pascoe, J.P., Supple, W.F. and Whalen, P.J. (1990) A neuroanatomical systems analysis of conditioned bradycardia in the rabbit. In: Gabriel, M. and Moore, J. (Eds.), Neurocomputation and Learning: Foundations of Adaptive Networks. Bradford Books, New York, pp. 55–90.
Keil, A., Bradley, M.M., Hauk, O., Rockstroh, B., Elbert, T. and Lang, P.J. (2002) Large-scale neural correlates of affective picture-processing. Psychophysiology, 39: 641–649.
Killcross, S., Robbins, T.W. and Everitt, B.J. (1997) Different types of fear-conditioned behavior mediated by separate nuclei within amygdala. Nature, 388: 377–380.
Kim, M., Campeau, S., Falls, W.A. and Davis, M. (1993) Infusion of the non-NMDA receptor antagonist CNQX into the amygdala blocks the expression of fear-potentiated startle. Behav. Neural Biol., 59: 5–8.
Kintsch, W. (1974) The Representation of Meaning in Memory. Erlbaum, Hillsdale, NJ.
Koch, M., Schmid, A. and Schnitzler, H.U. (1996) Pleasure-attenuation of startle is disrupted by lesions of the nucleus accumbens. NeuroReport, 7: 1442–1446.
Konorski, J. (1967) Integrative Activity of the Brain: An Interdisciplinary Approach. The University of Chicago Press, Chicago.
Krase, W., Koch, M. and Schnitzler, H.U. (1994) Substance P is involved in the sensitization of the acoustic startle response by footshock in rats. Behav. Brain Res., 63: 81–88.
LaBar, K.S., Gatenby, J.C., Gore, J.C., LeDoux, J.E. and Phelps, E.A. (1998) Human amygdala activation during conditioned fear acquisition and extinction: a mixed-trial fMRI study. Neuron, 20: 937–945.
Landis, C. and Hunt, W. (1939) The Startle Pattern. Farrar and Rinehart, New York.
Lane, R.D., Reiman, E.M., Bradley, M.M., Lang, P.J., Ahern, G.L., Davidson, R.J. and Schwartz, G.E. (1997) Neuroanatomical correlates of pleasant and unpleasant emotion. Neuropsychologia, 35: 1437–1444.
Lang, P.J. (1979) A bio-informational theory of emotional imagery. Psychophysiology, 16: 495–512.
Lang, P.J. (1985) The cognitive psychophysiology of emotion: fear and anxiety. In: Tuma, A.H. and Maser, J.D. (Eds.) Anxiety and the Anxiety Disorders, Vol. 3. Erlbaum, Hillsdale, NJ, pp. 131–170.
Lang, P.J. (1994) The motivational organization of emotion: affect-reflex connections. In: VanGoozen, S., Van De Poll, N.E. and Sargeant, J.A. (Eds.), Emotions: Essays on Emotion Theory. Erlbaum, Hillsdale, NJ, pp. 61–93.
Lang, P.J. (1995) The emotion probe. Am. Psychol., 50: 372–385.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1990) Emotion, attention, and the startle reflex. Psychol. Rev., 97: 377–395.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1997) Motivated attention: affect, activation and action. In: Lang, P.J., Simons, R.F. and Balaban, M.T. (Eds.), Attention and Orienting: Sensory and Motivational Processes. Lawrence Erlbaum Associates, Inc., NJ, pp. 97–135.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1999) International Affective Picture System (IAPS). The Center for Research in Psychophysiology. University of Florida, Gainesville, FL.
Lang, P.J., Bradley, M.M., Fitzsimmons, J.R., Cuthbert, B.N., Scott, J.D., Moulder, B. and Nangia, V. (1998) Emotional arousal and activation of the visual cortex: an fMRI analysis. Psychophysiology, 35: 1–13.
Lang, P.J., Greenwald, M.K., Bradley, M.M. and Hamm, A.O. (1993) Looking at pictures: affective, facial, visceral and behavioral reactions. Psychophysiology, 30: 261–273.
LeDoux, J.E., Iwata, J., Cicchetti, P. and Reis, D.J. (1988) Different projections of the central amygdaloid nucleus mediate autonomic and behavioral correlates of conditioned fear. J. Neurosci., 8: 2517–2529.
Lee, G.P., Bechara, A., Adolphs, R., Arena, J., Meador, K.J., Loring, D.W. and Smith, J.R. (1998) Clinical and physiological effects of stereotaxic bilateral amygdalotomy for intractable aggression. J. Neuropsychiatry Clin. Neurosci., 10: 413–420.
Li, L. and Yeomans, J.S. (1999) Summation between acoustic and trigeminal stimuli evoking startle. Neuroscience, 90: 139–152.
Lipp, O.V., Sheridan, J. and Siddle, D.A. (1994) Human blink startle during aversive and nonaversive Pavlovian conditioning. J. Exp. Psychol. Anim. Behav. Process., 20: 380–389.
Masur, J.D., Dienst, F.T. and O’Neal, E.C. (1974) The acquisition of a Pavlovian conditioned response in septally damaged rabbits: role of a competing response. Physiol. Psychol., 2: 133–136.
McCall, R.B. and Aghajanian, G.K. (1979) Serotonergic facilitation of facial motoneuron excitation. Brain Res., 169: 11–27.
McDonald, A.J. (1998) Cortical pathways to the mammalian amygdala. Prog. Neurobiol., 55: 257–332.
McDonald, A.J. (1991) Topographic organization of amygdaloid projections to the caudatoputamen, nucleus accumbens, and related striatal-like areas of the rat brain. Neuroscience, 44(1): 15–33.
McGaugh, J.L. (2004) The amygdala modulates the consolidation of memories of emotionally arousing experiences. Annu. Rev. Neurosci., 27: 1–28.
McGaugh, J.L., Introini-Collison, I.B., Cahill, L., Castellano, C., Dalmaz, C., Parent, M.B. and Williams, C.L. (1993) Neuromodulatory systems and memory storage: role of the amygdala. Behav. Brain Res., 58: 81–90.
McGaugh, J.L., Introini-Collison, I.B., Cahill, L., Kim, M. and Liang, K.C. (1992) Involvement of the amygdala in neuromodulatory influences on memory storage. In: Aggleton, J.P. (Ed.), The Amygdala: Neurobiological Aspects of Emotion, Memory, and Mental Dysfunction. Wiley-Liss, New York, pp. 431–451.
Mehrabian, A. and Russell, J.A. (1974) An Approach to Environmental Psychology. MIT Press, Cambridge, MA.
Meloni, E.G. and Davis, M. (1999) Muscimol in the deep layers of the superior colliculus/mesencephalic reticular formation blocks expression but not acquisition of fear-potentiated startle in rats. Behav. Neurosci., 113: 1152–1160.
Miltner, W., Matjak, M., Braun, C. and Diekmann, H. (1994) Emotional qualities of odors and their influence on the startle reflex in humans. Psychophysiology, 31: 107–110.
Mineka, S., Davidson, M., Cook, M. and Keir, R. (1984) Observational conditioning of snake fear in rhesus monkeys. J. Abnorm. Psychol., 93: 355–372.
Mogenson, G.J. (1987) Limbic-motor integration. In: Epstein, A. and Morrison, A.R. (Eds.), Progress in Psychobiology and Physiological Psychology. Academic Press, New York, pp. 117–170.
Morris, J.S., Frith, C.D., Perrett, D.I. and Rowland, D. (1996) A differential neural response in the human amygdala to fearful and happy facial expression. Nature, 383: 812–815.
Öhman, A. and Mineka, S. (2001) Fears, phobias, and preparedness: toward an evolved module of fear and fear learning. Psychol. Rev., 108: 483–522.
Ortony, A., Clore, G.L. and Collins, A. (1988) The Cognitive Structure of Emotions. Cambridge Press, Cambridge.
Osgood, C., Suci, G. and Tannenbaum, P. (1957) The Measurement of Meaning. University of Illinois, Urbana, IL.
Packard, M.G., Cahill, L. and McGaugh, J.L. (1994) Amygdala modulation of hippocampal-dependent and caudate nucleus-dependent memory processes. Proc. Natl. Acad. Sci. USA, 91: 8477–8481.
Packard, M.G. and Teather, L.A. (1998) Amygdala modulation of multiple memory systems: hippocampus and caudate–putamen. Neurobiol. Learn. Mem., 69: 163–203.
Parkinson, J.A., Dally, J.W., Bamford, A., Fehrent, B., Robbins, T.W. and Everitt, B.J. (1998) Effects of 6-OHDA lesions of the rat nucleus accumbens on appetitive Pavlovian conditioning. J. Psychopharmacol., 12: A8.
Parkinson, J.A., Robbins, T.W. and Everitt, B.J. (2000a) Dissociable roles of the central and basolateral amygdala in appetitive emotional learning. Eur. J. Neurosci., 12: 405–413.
Parkinson, J.A., Willoughby, P.J., Robbins, T.W. and Everitt, B.J. (2000b) Disconnection of the anterior cingulate cortex and nucleus accumbens core impairs Pavlovian approach behavior: further evidence for limbic cortical–ventral striatopallidal systems. Behav. Neurosci., 114: 42–63.
Pascoe, J.P. and Kapp, B.S. (1985) Electrophysiological characteristics of amygdaloid central nucleus neurons during Pavlovian fear conditioning in the rabbit. Behav. Brain Res., 16: 117–133.
Pavlov, I.P. (1927) Conditioned Reflexes. Oxford University Press, Oxford, UK.
Phelps, E.A., O'Connor, K.J., Cunningham, W.A., Funayama, E.S., Gatenby, J.C., Gore, J.C. and Banaji, M.R. (2000) Performance on indirect measures of race evaluation predicts amygdala activation. J. Cogn. Neurosci., 12: 729–738.
Phelps, E.A., O'Connor, K.J., Gatenby, J.C., Gore, J.C., Grillon, C. and Davis, M. (2001) Activation of the left amygdala to a cognitive representation of fear. Nat. Neurosci., 4: 437–441.
Power, A.E., Vazdarjanova, A. and McGaugh, J.L. (2003) Muscarinic cholinergic influences in memory consolidation. Neurobiol. Learn. Mem., 80: 178–193.
Quirk, G.J., Repa, J.C. and LeDoux, J.E. (1995) Fear conditioning enhances short-latency auditory responses of lateral amygdala neurons: parallel recordings in the freely behaving rat. Neuron, 15: 1029–1039.
Redmond Jr., D.E. (1977) Alterations in the function of the nucleus locus coeruleus: a possible model for studies on anxiety. In: Hanin, I.E. and Usdin, E. (Eds.), Animal Models in Psychiatry and Neurology. Pergamon Press, Oxford, UK, pp. 292–304.
Reiman, E.M., Lane, R.D., Ahern, G.L., Schwartz, G.E., Davidson, R.J., Friston, K.J., Yun, L. and Chen, K. (1997) Neuroanatomical correlates of externally and internally generated human emotion. Am. J. Psychiatry, 154: 918–925.
Robledo, P., Robbins, T.W. and Everitt, B.J. (1996) Effects of excitotoxic lesions of the central amygdaloid nucleus on the potentiation of reward-related stimuli by intra-accumbens amphetamine. Behav. Neurosci., 110: 981–990.
Rosen, J.B., Hitchcock, J.M., Miserendino, M.J.D., Falls, W.A., Campeau, S. and Davis, M. (1992) Lesions of the perirhinal cortex but not of the frontal, medial prefrontal, visual, or insular cortex block fear-potentiated startle using a visual conditioned stimulus. J. Neurosci., 12: 4624–4633.
Rosen, J.B., Hitchcock, J.M., Sananes, C.B., Miserendino, M.J.D. and Davis, M. (1991) A direct projection from the central nucleus of the amygdala to the acoustic startle pathway: anterograde and retrograde tracing studies. Behav. Neurosci., 105: 817–825.
Sabatinelli, D., Bradley, M.M., Cuthbert, B.N. and Lang, P.J. (2001) Affective startle modulation in anticipation and perception. Psychophysiology, 38: 719–722.
Sabatinelli, D., Flaisch, T., Bradley, M.M., Fitzsimmons, J.R. and Lang, P.J. (2004) Affective picture perception: gender differences in visual cortex. Neuroreport, 15: 1109–1112.
Sabatinelli, D., Bradley, M.M., Fitzsimmons, J.R. and Lang, P.J. (2005) Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance. NeuroImage, 24: 1265–1270.
Sabatinelli, D., Lang, P.J., Keil, A. and Bradley, M.M. (in press) Emotional perception: correlation of functional MRI and event-related potentials. Cereb. Cortex.
Sakanaka, M., Shibasaki, T. and Lederis, K. (1986) Distribution and efferent projections of corticotropin-releasing factor-like immunoreactivity in the rat amygdaloid complex. Brain Res., 382: 213–238.
Sananes, C.B. and Davis, M. (1992) N-methyl-D-aspartate lesions of the lateral and basolateral nuclei of the amygdala block fear-potentiated startle and shock sensitization of startle. Behav. Neurosci., 106: 72–80.
Sanghera, M.K., Rolls, E.T. and Roper-Hall, A. (1979) Visual responses of neurons in the dorsolateral amygdala of the alert monkey. Exp. Neurol., 63: 610–626.
Schlosberg, H. (1952) The description of facial expressions in terms of two dimensions. J. Exp. Psychol., 44: 229–237.
Schmid, A., Koch, M. and Schnitzler, H.U. (1995) Conditioned pleasure attenuates the startle response in rats. Neurobiol. Learn. Mem., 64: 1–3.
Schneiderman, N. (1972) Response system divergencies in aversive classical conditioning. In: Black, A.H. and Prokasy, W.F. (Eds.), Classical Conditioning, Vol. II: Current Research and Theory. Appleton-Century-Crofts, New York.
Schneirla, T. (1959) An evolutionary and developmental theory of biphasic processes underlying approach and withdrawal. In: Jones, M. (Ed.), Nebraska Symposium on Motivation. University of Nebraska Press, Lincoln, pp. 1–42.
Schoenbaum, G., Chiba, A.A. and Gallagher, M. (1998) Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning. Nat. Neurosci., 1: 155–159.
Schoenbaum, G., Chiba, A.A. and Gallagher, M. (1999) Neural encoding in orbitofrontal cortex and basolateral amygdala during olfactory discrimination learning. J. Neurosci., 19: 1876–1884.
Schupp, H.T., Cuthbert, B.N., Bradley, M.M., Hillman, C.H., Hamm, A.O. and Lang, P.J. (2004) Brain processes in emotional perception: motivated attention. Cogn. Emotion, 18(5): 593–611.
Schwaber, J.S., Kapp, B.S., Higgins, G.A. and Rapp, P.R. (1982) Amygdaloid and basal forebrain direct connections with the nucleus of the solitary tract and the dorsal motor nucleus. J. Neurosci., 2: 1424–1438.
Seligman, M.E.P. (1970) On the generality of the laws of learning. Psychol. Rev., 77: 406–418.
Shaver, P., Schwartz, J., Kirson, D. and O'Connor, C. (1987) Emotion knowledge: further exploration of a prototype approach. J. Pers. Soc. Psychol., 52: 1061–1086.
Shi, C. and Davis, M. (1996) Anatomical tracing and lesion studies of visual pathways involved in fear conditioning measured with fear-potentiated startle. Soc. Neurosci. Abstr., 22: 1115.
Sokolov, E.N. (1963) Perception and the Conditioned Reflex. Pergamon Press, Oxford, UK.
Taylor, J.R. and Robbins, T.W. (1984) Enhanced behavioral control by conditioned reinforcers following microinjections of D-amphetamine into the nucleus accumbens. Psychopharmacology, 84: 405–412.
Taylor, J.R. and Robbins, T.W. (1986) 6-Hydroxydopamine lesions of the nucleus accumbens, but not the caudate nucleus, attenuate enhanced responding with reward-related stimuli produced by intra-accumbens D-amphetamine. Psychopharmacology, 90: 390–397.
Tellegen, A. (1985) Structures of mood and personality and their relevance to assessing anxiety, with an emphasis on self-report. In: Tuma, A.H. and Maser, J.D. (Eds.), Anxiety and the Anxiety Disorders. Lawrence Erlbaum, Hillsdale, NJ, pp. 681–706.
Timberlake, W. (1993) Behavior systems and reinforcement: an integrative approach. J. Exp. Anal. Behav., 60: 105–128.
Timberlake, W. and Lucas, G.A. (1989) Behavior systems and learning: from misbehavior to general principles. In: Klein, S.B. and Mowrer, R.R. (Eds.), Instrumental Conditioning Theory and the Impact of Biological Constraints on Learning. Erlbaum, Hillsdale, NJ.
Ursin, H. and Kaada, B.R. (1960) Functional localization within the amygdaloid complex in the cat. Electroencephalogr. Clin. Neurophysiol., 12: 109–122.
Vrana, S.R., Spence, E.L. and Lang, P.J. (1988) The startle probe response: a new measure of emotion? J. Abnorm. Psychol., 97: 487–491.
Walker, D.L. and Davis, M. (1997) Double dissociation between the involvement of the bed nucleus of the stria terminalis and the central nucleus of the amygdala in light-enhanced versus fear-potentiated startle. J. Neurosci., 17: 9375–9383.
Walker, D.L., Toufexis, D.J. and Davis, M. (2003) Role of the bed nucleus of the stria terminalis versus the amygdala in fear, stress, and anxiety. Eur. J. Pharmacol., 463: 199–216.
Weisz, D.J. and McInerney, J. (1990) An associative process maintains reflex facilitation of the unconditioned nictitating membrane response during the early stages of training. Behav. Neurosci., 104: 21–27.
Whalen, P.J. (1998) Fear, vigilance, and ambiguity: initial neuroimaging studies of the human amygdala. Curr. Dir. Psychol. Sci., 7: 177–188.
Whalen, P.J. and Kapp, B.S. (1991) Contributions of the amygdaloid central nucleus to the modulation of the nictitating membrane reflex in the rabbit. Behav. Neurosci., 105: 141–153.
White, S.R. and Neuman, R.S. (1980) Facilitation of spinal motoneuron excitability by 5-hydroxytryptamine and noradrenaline. Brain Res., 185: 1–9.
Wundt, W. (1896) Grundriss der Psychologie. Engelmann, Leipzig, Germany.
Yeomans, J.S., Steidle, S. and Li, L. (2000) Conditioned brain-stimulation reward attenuates the acoustic startle reflex in rats. Soc. Neurosci. Abstr., 30: 1252.
Zhao, Z. and Davis, M. (2004) Fear-potentiated startle in rats is mediated by neurons in the deep layers of the superior colliculus/deep mesencephalic nucleus of the rostral midbrain through the glutamate non-NMDA receptors. J. Neurosci., 24: 10326–10334.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 2
Emotion and attention: event-related brain potential studies

Harald T. Schupp1,*, Tobias Flaisch1, Jessica Stockburger1 and Markus Junghöfer2

1 Department of Psychology, University of Konstanz, Konstanz, Germany
2 Institute for Biomagnetism and Biosignalanalysis, Münster University Hospital, Münster, Germany
Abstract: Emotional pictures guide selective visual attention. A series of event-related brain potential (ERP) studies is reviewed demonstrating the consistent and robust modulation of specific ERP components by emotional images. Specifically, pictures depicting natural pleasant and unpleasant scenes are associated with an increased early posterior negativity, late positive potential, and sustained positive slow wave compared with neutral contents. These modulations are considered to index different stages of stimulus processing including perceptual encoding, stimulus representation in working memory, and elaborate stimulus evaluation. Furthermore, the review includes a discussion of studies exploring the interaction of motivated attention with passive and active forms of attentional control. Recent research is reviewed exploring the selective processing of emotional cues as a function of stimulus novelty, emotional prime pictures, learned stimulus significance, and in the context of explicit attention tasks. It is concluded that ERP measures are useful to assess the emotion–attention interface at the level of distinct processing stages. Results are discussed within the context of two-stage models of stimulus perception brought out by studies of attention, orienting, and learning.

Keywords: emotion; attention; EEG; ERP; EPN; LPP

Humans live in an environment presenting a seemingly endless stream of stimuli. Only a subset of the available information is consciously recognized, becomes the focus of sustained attention, and is subjected to controlled and elaborated processing. Mechanisms of selective attention assure the prioritized processing of some objects, events, or locations, and multiple avenues to command attention are indicated by distinguishing among active and passive forms of attentional control1 (Öhman et al., 2000b). In passive attention, the power to capture attention derives from simple qualities of the stimulus such as intensity, suddenness of onset, or novelty. In active attention, priority processing reflects the intentional effort to look for selected stimuli based on instructions, self-generated intentions, or associative learning. In addition, certain kinds of stimuli trigger selective attention due to their biological meaning. Organisms respond to environmental cues according to their emotional/motivational significance (Lang et al., 1997; Öhman et al., 2000b). The attention capture of emotionally relevant stimuli has been dubbed 'motivated attention', referring to a natural state of selective attention, "… similar to that occurring in an animal as it forages in a field, encounters others, pursues prey or sexual partners, and tries
*Corresponding author. Tel.: +49-7531-882504; Fax: +49-7531-882971; E-mail: [email protected]

1 Similar to the distinction between active and passive attention, other classification schemata distinguish between automatic vs. controlled, bottom-up vs. top-down, reflexive vs. instructed, and implicit vs. explicit attention.

DOI: 10.1016/S0079-6123(06)56002-9
to avoid predators and comparable dangers" (Lang et al., 1997, p. 97). An evolutionary perspective suggests that this form of attentional control is highly adaptive for survival, giving primacy — in terms of attentional selection and response — to appetitively and aversively relevant events (Lang et al., 1997). One aim of the present review is to present theory and data regarding the emotional guidance of selective attention in distinct processing stages. The review will discuss the effects of motivated attention from the perspective of the biphasic approach to emotion. Furthermore, we predominantly consider empirical evidence derived from event-related brain potential (ERP) studies.2 ERP measures have the unique advantage of high temporal resolution, and it will be concluded that they provide a useful tool to study the emotional guidance of attention at the level of distinct processing stages. A second aim of the present review is to extend this perspective to recent approaches examining the emotion–attention interface. Specifically, recent research is considered examining effects of motivated attention in relation to active and passive attention and learning. In the final section, the ERP indices of selective attentional orienting to emotional cues are discussed from the perspective of two-stage models of stimulus perception advanced in research on attention, orienting, and learning.

2 Discussions of the interaction of emotion and attention based on behavioral, autonomic, reflex, somatic, and neuroimaging measures are provided in recent reviews (Lang et al., 1997; Mogg and Bradley, 1999; Öhman et al., 2000a; Pessoa et al., 2002a; Compton, 2003; Vuilleumier, 2005).

A biphasic view of emotion

Although emotional expressions, bodily changes, and reported feelings vary idiosyncratically according to dispositional and situational factors, many theorists claim that the emotional or affect system retains a much simpler biphasic organization of two distinct motivational subsystems (Schneirla, 1959; Konorski, 1967; Dickinson and Dearing, 1979; Lang et al., 1990, 1997; Lang, 1995; Cacioppo et al., 1999; Dickinson and Balleine, 2002). The self-preservative appetitive
system determines foraging, ingestion, copulation, and nurture of progeny, and is accompanied by affectively pleasant states. The protective defensive system coordinates withdrawal from and defense against nociceptive agents, and is associated with the experience of unpleasant affects. The view that emotion is in part organized by underlying motivational factors is supported by research utilizing verbal reports, which consistently demonstrates the primacy of the valence dimension. Furthermore, both motivational subsystems can vary in terms of engagement or activation, reflecting the arousal level, which is reliably observed as a second dimension in studies of natural language and verbal reports (Lang et al., 1990, 1997). Accordingly, emotions can be functionally considered as action dispositions preparing the organism for either avoidance or approach-related actions, interrupting ongoing behavior and mental processes (Frijda, 1986; Lang et al., 1997). Consistent with this view, a large number of studies demonstrated reliable modulations in autonomic, somatic, and reflex response measures while participants viewed pleasant, neutral, and unpleasant pictures (Bradley, 2000; Hamm et al., 2003). Presumably, emotional modulations are not limited to the behavioral output channels. They should also be evident during the preceding perceptual and evaluative processing of emotional cues. Specifically, efficient preparation and organization of appropriate behavioral responses require a rapid extraction of critical information from the environment. In this respect, emotional cues direct attentional resources (Öhman et al., 2000a). Furthermore, motivated attention reflects our evolutionary heritage and is therefore most apparent for stimuli with high evolutionary significance, that is, prototypical stimuli related to threat and survival strongly engaging basic motivational systems (Bradley et al., 2001).

Emotion and attention: event-related brain potentials

ERP measures provide a unique window into the brain's processing of emotional cues, assisting in detailing information processing at the level of
distinct stages. Already, the study of selective processing triggered by reflexive or explicit attention has been greatly facilitated by the use of ERP measures. Specifically, an array of experimental tasks including the attentional blink, refractory period, visual search, and spatial cuing paradigms revealed modulations of specific ERP components thought to reflect perceptual encoding, working memory, and motor preparation processes (Luck and Hillyard, 2000; Luck et al., 2000). Similar to the domains of explicit and reflexive attention, it is expected that ERPs provide valuable information regarding the emotional guidance of selective attention. Three issues are of particular interest: (1) identifying distinct processing stages at which emotional cues are selectively processed; (2) determining whether the differential processing of emotional cues already affects obligatory processing stages; and (3) comparing the ERP signature of motivated and explicit attentional control. ERPs provide a voltage measurement of neural activity that can be recorded noninvasively from multiple scalp regions (Birbaumer et al., 1990). More specifically, ERPs are considered to reflect summed postsynaptic potentials generated in the process of neural transmission and passively conducted through the brain and skull to the skin surface, where they contribute to the electroencephalogram (EEG). Since ERPs are usually hidden in the larger background EEG activity, it is necessary to use multiple stimulus presentations and stimulus-locked signal averaging to extract the ERP signal from the background EEG activity. Biophysical considerations suggest that large-amplitude ERP components reflect widespread, synchronous sources in cortical regions (Lutzenberger et al., 1987). Brain activity locked to the processing of a stimulus becomes apparent as positive and negative deflections in the ERP waveform. The amplitude and latency of specific ERP components provide information regarding the strength and time course of underlying neural processes. Furthermore, given appropriate spatial sampling (Tucker, 1993), the topography of ERP components can be used to estimate the neural generator sites by advanced analytic tools such as Current-Source-Density (CSD; Perrin, Bertrand and Pernier, 1987) or the L2-Minimum-Norm-Estimate
(L2-MNE; Hämäläinen and Ilmoniemi, 1994; Hauk et al., 2002). In the experiments summarized here, pleasant, neutral, and unpleasant pictures from the International Affective Picture System (IAPS; Lang et al., 2005) were presented in a passive task context in which subjects were instructed to simply view the pictures. Some of the reviewed studies used a 6 s picture presentation time and extended intertrial intervals, enabling the simultaneous assessment of autonomic and somatic measures of affect processing (Cuthbert et al., 2000; Schupp et al., 2004a). Other studies utilized the timing parameters of the modified oddball paradigm developed by Cacioppo and colleagues (Cacioppo et al., 1993), with shorter presentation times (1.5 s) and faster presentation rates (Schupp et al., 2000a, 2003a, 2004b). In addition, more recent studies using the rapid serial visual presentation (RSVP) paradigm are described, in which the pictures were presented briefly (333 ms) and as a continuous stream (Junghöfer et al., 2001; Schupp et al., 2003b, 2006, submitted; Wieser et al., 2006). Furthermore, while earlier studies used sparse sensor sampling, more recent research provided improved spatial sampling by using 128 or 256 sensor recordings. Owing to the limited head coverage and sensor placement of earlier studies, a reliable differentiation of emotional and neutral contents within the first 300 ms was only brought out in comparatively recent research (Junghöfer et al., 2001; Schupp et al., 2003b). Finally, earlier research exploring emotion processing is acknowledged, complementing the data reported in this review (Johnston et al., 1986; Diedrich et al., 1997; Johnston and Oliver-Rodriguez, 1997; Palomba et al., 1997). Across studies, the differential processing of emotional compared to neutral pictures was consistently and reliably reflected by three ERP components, which are summarized in the following according to their temporal occurrence in the processing stream.
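To make the extraction logic concrete, the sketch below implements stimulus-locked signal averaging of the kind just described: epochs are cut around each picture onset, baseline-corrected, and averaged per condition so that the background EEG averages out. It is a minimal Python/NumPy illustration under assumed parameters (sampling rate, epoch window, simulated data), not the recording or analysis pipeline of the reviewed studies.

```python
import numpy as np

FS = 500  # sampling rate in Hz (assumed)

def erp_average(eeg, onsets, pre=0.1, post=0.9):
    """Stimulus-locked averaging: cut an epoch around each picture onset,
    subtract the mean of the pre-stimulus baseline, and average over trials.

    eeg    : array (n_channels, n_samples), continuous recording
    onsets : iterable of onset sample indices for one picture category
    """
    n_pre, n_post = int(pre * FS), int(post * FS)
    epochs = []
    for t in onsets:
        if t - n_pre < 0 or t + n_post > eeg.shape[1]:
            continue  # skip onsets too close to the recording edges
        epoch = eeg[:, t - n_pre:t + n_post].astype(float)
        epoch -= epoch[:, :n_pre].mean(axis=1, keepdims=True)  # baseline
        epochs.append(epoch)
    # background EEG shrinks roughly with 1/sqrt(n_trials), while the
    # time-locked ERP survives the averaging
    return np.mean(epochs, axis=0)

# usage with simulated data: 32 channels, 5 minutes of 'EEG' at 500 Hz
rng = np.random.default_rng(0)
eeg = rng.normal(size=(32, FS * 300))
onsets = {"emotional": np.arange(FS, FS * 290, 3 * FS),
          "neutral": np.arange(2 * FS, FS * 290, 3 * FS)}
erps = {cond: erp_average(eeg, idx) for cond, idx in onsets.items()}
```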
Early posterior negativity

The first ERP component reflecting the differential processing of emotional compared to neutral stimuli is the early posterior negativity (EPN). The temporal
and spatial appearance of this early indicator of selective processing of emotional pictures is illustrated in Fig. 1, presenting data from research utilizing the RSVP paradigm (Junghöfer et al., 2001). A pronounced ERP difference for the processing of emotionally arousing and neutral pictures developed around 150 ms, which was maximally pronounced around 250–300 ms. This differential ERP appeared as a negative deflection over temporo-occipital sensor sites and a corresponding polarity reversal over fronto-central regions. Despite differences in the overall topography, research presenting pictures discretely and with longer presentation times (1.5 s and 1.5 s intertrial interval) also demonstrated a more pronounced negative potential for emotional pictures in the same latency range over temporo-occipital sensors (Schupp et al., 2003a, 2004c). Additional analyses served to test the prediction derived from the biphasic view of emotion that the differential processing of pleasant and unpleasant cues varies
as a function of emotional arousal. Consistent with this notion, the EPN co-varied with the arousal level of the emotional pictures. Specifically, highly arousing picture contents of erotic scenes and mutilations elicited a more pronounced posterior negativity compared to less arousing categories of the same valence (Junghöfer et al., 2001; Schupp et al., 2004b). Given that these studies used natural scenes varying widely in terms of physical characteristics, it is important to explore to what extent the EPN effect is related to systematic differences in physical characteristics of the stimulus materials. To examine the effects of color of the stimulus materials, Junghöfer and colleagues (2001) included a control condition presenting the same materials as gray-scaled images. An almost identical affect-modulated early posterior negativity was observed as for the corresponding color images, showing that the early discrimination of emotional from neutral pictures did not depend on color. Additional
Fig. 1. Time course and topography of the early posterior negativity. Left side: Upper panel: Grand-averaged ERP waveforms at a right occipital sensor while viewing high and low arousing (left side), and high, moderate, and low arousing pictures (right side), respectively. Pictures were presented at 3 Hz. Lower panel: Grand-averaged ERP waveforms at the same right occipital sensor while viewing high and low arousing pictures at 5 Hz. The relative difference component peaking around 64 ms reflects the EPN of the preceding emotional picture. Right side: Top (upper panel) and back (lower panel) side views of the topographical distribution of the difference waves of (high–low) arousing pictures for both the 3 Hz (left side) as well as the 5 Hz (right side) presentation rates. Scalp topography is shown in terms of current source density distribution estimating difference activities in bilateral occipito-temporal and right hemispheric parietal cortex areas.
control analyses determined that increased EPN amplitudes to emotional pictures were not secondary to differences in physical characteristics such as brightness, measures of perceptual complexity, or spatial frequency. However, a recent study provided a more systematic investigation of the EPN component as a function of picture content (objects vs. people), emotional arousal (low vs. high), and picture type (single foreground item vs. scattered scenes and multiple items; Löw et al., 2005). Each of these variables affected the EPN component, suggesting that the EPN is not specifically reflecting emotional arousal but is presumably more generally related to perceptual variables determining selective attention. For instance, greater EPN amplitudes were obtained for people compared to objects, and for pictures with a clear foreground object as opposed to more complex scenes. Interestingly, emotional effects were observed independently of picture type and content. However, in this study, high compared to low arousing pictures elicited augmented EPN amplitudes only for pictures depicting people. Accordingly, to determine the effects of emotional arousal, secondary explanations based on picture type and content need to be experimentally controlled by selecting appropriate picture materials (Schupp et al., 2004a, b). From a theoretical perspective, it is unknown to what extent variables such as salient perceptual features, high evolutionary significance, and expert knowledge contribute to the emotional EPN modulation. Future research is needed to examine whether the EPN effect is limited to selected classes of emotional stimuli. It might be informative to briefly outline the reasons that led to the decision to refer to this modulation as an early posterior negativity associated with emotion processing (rather than a reduced positivity). First, estimates of the neural generators by current source density (cf., Fig. 1; Junghöfer et al., 2001) and minimum norm analysis (cf., Fig. 3; Schupp et al., 2006) suggested that the difference in potential between emotional and neutral pictures has its sources over occipito-temporo-parietal sites. Second, with the notion of a stronger positive potential for neutral compared to emotional images, some readers might infer stronger neural activation for neutral materials. In fact, our accompanying functional magnetic resonance imaging (fMRI) studies indicate increased activation of the extended visual cortex while viewing emotional pictures in rapid serial visual presentations (Junghöfer et al., 2005, 2006). Third, paying explicit attention to specific stimulus features (such as color, orientation, or shape) and higher-order categories (e.g., animal vs. non-animal) is also reflected by a temporo-occipital negativity with a similar latency (Thorpe et al., 1996; Codispoti et al., in press). In sum, a negative potential over temporo-occipital sensor sites, developing around 150 ms poststimulus and most pronounced around 200–300 ms, reliably reflects the differential processing of emotional compared to neutral visual stimuli. These findings have been considered from the perspective of 'natural selective attention,' proposing that perceptual encoding is in part directed by underlying motivational systems of avoidance and approach.3

3 The present analyses might be extended to discrete states of emotion, considered to reflect a subordinate level of emotional organization. For instance, focusing on the fear system, we recently obtained evidence that threatening faces are associated with increased EPN and LPP amplitudes compared to neutral and friendly faces (Schupp et al., 2004c).
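To illustrate how an EPN effect of this kind is typically quantified, the sketch below computes the mean amplitude difference (emotional minus neutral) over a set of temporo-occipital sensors in the 200–300 ms window. The channel indices, window boundaries, and simulated condition averages are illustrative assumptions, not the sensor groupings of the original studies.

```python
import numpy as np

FS, PRE = 500, 0.1  # sampling rate (Hz) and pre-stimulus baseline (s), assumed

def mean_window(erp, t_start, t_end, channels):
    """Mean amplitude of an averaged ERP within a post-stimulus window,
    pooled over the given channel indices (times in seconds)."""
    i0, i1 = int((PRE + t_start) * FS), int((PRE + t_end) * FS)
    return erp[channels, i0:i1].mean()

# simulated condition averages: (n_channels, n_samples), 1 s epochs at 500 Hz
rng = np.random.default_rng(0)
erp_emotional = rng.normal(0.0, 0.2, (32, 500))
erp_neutral = rng.normal(0.0, 0.2, (32, 500))
occipital = [24, 25, 26, 27]  # hypothetical temporo-occipital channel indices

# EPN effect: relatively more negative potential for emotional contents,
# quantified as the mean amplitude difference in the 200-300 ms window
epn = (mean_window(erp_emotional, 0.20, 0.30, occipital)
       - mean_window(erp_neutral, 0.20, 0.30, occipital))
print(f"EPN (emotional - neutral): {epn:.2f} (simulated units)")
```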
Late positive potential

As shown in Fig. 2, subsequent to the modulation during perceptual encoding, it is consistently observed that emotional (pleasant and unpleasant) pictures elicit increased late positive potentials (LPPs) over centro-parietal regions, most apparent around 400–600 ms poststimulus (also referred to as P3b; Palomba et al., 1997; Cuthbert et al., 2000; Schupp et al., 2000a, 2003a, 2004a, b; Keil et al., 2002; Amrhein et al., 2004). This LPP modulation appears sizable and can be observed in nearly every individual when calculating simple difference scores (emotional–neutral). In addition, the LPP is also specifically enhanced for pictures that are more emotionally intense (i.e., described by viewers as more arousing, and showing a heightened skin conductance response). Extending the valence by arousal interaction, a recent study examined the LPP amplitude associated with the processing of specific categories of human
Fig. 2. Time course and topography of the late positive potential. (A) Illustration of the experimental paradigm used. Pictures are shown in blocks of six stimuli for 1.5 s each, separated by variable Intertrial Intervals (ITI) (1.5 s). (B) Left side: Grand-averaged ERP waveforms of a right parietal sensor while viewing pleasant, neutral, and unpleasant pictures. Right side: Difference scalp potential maps (emotional–neutral) reveal the topography of the LPP modulation in the time interval of 400–600 ms poststimulus. Illustrated is a left and right-side view of the model head.
experience. Focusing on specific pleasant and unpleasant picture contents, it was found that contents of high evolutionary significance, such as erotica and sexual contents as well as threat and mutilations, were associated with enlarged LPP amplitudes compared to picture categories of the same valence but of less evolutionary significance (Schupp et al., 2004a). In the cognitive domain, it is a hallmark finding that increased LPP amplitudes are associated with the meaning of task-relevant stimuli rather than just simple physical stimulus characteristics (Johnson, 1988; Picton, 1992). In addition, a close
correspondence between instruction to attend to specified stimuli and the LPP amplitude associated with target detection has been reported. Moreover, dual-task studies revealed that the LPP amplitude to a primary counting task was reciprocally related to the LPP amplitude observed for a competing secondary task (Donchin et al., 1986). These findings suggest that the LPP is a sensitive measure of attention manipulations indicating the operation of a capacity-limited system. Functionally, the LPP amplitude has been considered to reflect the representation of stimuli in working memory (Donchin and Coles, 1988). Consistent with this view, research
with the attentional blink paradigm indicates that the LPP might reflect a gateway to conscious recognition (Luck et al., 2000; Kranczioch et al., 2003). Considering these findings from cognitive research, it might be proposed that the increased LPP amplitude elicited by emotional cues reflects their intrinsic significance, similar to the distinct representation in working memory of task-relevant stimuli. An alternative interpretation might consider the LPP findings from the perspective of internal stimulus probability rather than emotional relevance. In fact, ample research documents that rare stimuli are associated with increased LPP amplitudes (Johnson, 1988). However, the LPP modulations discussed here appear to be unrelated to the probability of picture categories. When subjects had to spontaneously categorize the pictures on the basis of valence, the categories of pleasant, neutral, and unpleasant contents were equally probable. Alternatively, assuming subjects saw the pictures as forming only two categories (emotional vs. neutral), the emotional stimuli were twice as frequent as the neutral contents, leading to the opposite prediction from what was empirically observed: neutral pictures should then have elicited larger LPP amplitudes compared to emotional categories. To resolve this issue more conclusively, the effect of stimulus probability and emotional context was explored in a follow-up study using the modified oddball paradigm. In this study, increased LPP amplitudes to emotional compared to neutral pictures were observed when these contents were interspersed in blocks of either neutral, pleasant, or unpleasant context stimuli (Schupp et al., 2000b). Thus, these data strongly suggest that emotional arousal is considerably more potent in determining LPP amplitude than local probability.
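As a concrete reading of the simple difference scores mentioned above, the following sketch derives a per-subject LPP difference (emotional minus neutral) from a 400–600 ms centro-parietal window and counts how many subjects show the effect. The subject-level arrays are simulated and all sensor indices are illustrative assumptions, not the montage of the reviewed experiments.

```python
import numpy as np

# per-subject averaged ERPs: shape (n_subjects, n_channels, n_samples);
# simulated here, but in practice obtained from per-subject averaging
rng = np.random.default_rng(1)
n_subj, n_chan, n_samp = 20, 32, 500
erp_emotional = rng.normal(1.0, 1.0, (n_subj, n_chan, n_samp))
erp_neutral = rng.normal(0.0, 1.0, (n_subj, n_chan, n_samp))

FS, PRE = 500, 0.1                      # sampling rate and baseline (assumed)
centro_parietal = [10, 11, 12, 13]      # hypothetical sensor indices
i0, i1 = int((PRE + 0.4) * FS), int((PRE + 0.6) * FS)  # 400-600 ms window

# simple difference score (emotional - neutral) per subject
lpp_diff = (erp_emotional[:, centro_parietal, i0:i1].mean(axis=(1, 2))
            - erp_neutral[:, centro_parietal, i0:i1].mean(axis=(1, 2)))
print(f"{(lpp_diff > 0).sum()} of {n_subj} subjects show an enlarged LPP")
```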
Positive slow wave

Research using longer picture presentation times revealed that the LPP is followed by an extended positive slow wave associated with the processing of emotional cues throughout the 6 s picture-viewing period (Cuthbert et al., 1995, 2000). In the cognitive domain, positive slow waves have been found to be sensitive to manipulations requiring sustained perceptual operations and memory processes (Ruchkin et al., 1988; Ritter and Ruchkin, 1992). Accordingly, it was suggested that the positive slow wave reflects sustained attention to visual emotional stimuli. This hypothesis is supported by research using a secondary acoustic probe presented during picture viewing. In these experiments, the P3 component of the probe ERP is reliably smaller when viewing pleasant or unpleasant, compared to neutral, pictures (Schupp et al., 1997, 2004a; Cuthbert et al., 1998). This result parallels findings from experiments on instructed attention. When participants are told to attend to one stimulus and ignore others (Johnson, 1988), a smaller P3 response to a secondary probe is consistently found during the attended stimulus. This result is held to reflect the reduced availability of attentional resources for the probe, assuming that the resource pool is limited and there is high allocation to the primary stimulus (Donchin et al., 1986).

Summary

Summarizing these findings, a number of observations merit reemphasis: (1) The emotional guidance of selective attention is reflected by three distinct ERP components associated with different stages of stimulus processing, including perceptual encoding, stimulus representation in working memory, and elaborate stimulus evaluation. (2) ERP modulations induced by emotional cues were reliable and consistently observed early in the processing stream (<300 ms). (3) Indices of selective attention observed for motivated and instructed attention appear strikingly similar.

Exploring the emotion–attention interface

Building upon these findings, recent research was directed towards understanding the effects of motivated attention in interaction with active and passive forms of attentional control. One goal of these studies was to determine the boundary conditions of differential emotion processing. Thus, interference with selective emotion processing was
determined as a function of stimulus novelty, competition by primary cognitive tasks, and the processing of emotional prime pictures. Another interest was to determine effects of cooperation, that is, when selective attention is devoted to emotional stimuli. Furthermore, learning and experience may shape attentional orienting, as investigated in selected subject populations (e.g., simple phobia, substance abuse).

Stimulus novelty and emotional perception

In passive attention, stimuli may capture attentional resources simply because they are novel. In the studies described above, no extensive stimulus repetitions were implemented. Therefore, it remains to be determined to what extent the observed emotional attention capture depends on stimulus novelty. Widely studied in the context of the orienting response, the repeated presentation of sensory stimuli usually prompts habituation, that is, a response decrement across several response systems (reviewed in Öhman et al., 2000b). Recent studies have demonstrated the importance of stimulus novelty for differential emotion processing by measuring various motor output responses. Specifically, when habituation was studied in the processing of emotional stimuli, skin conductance responses, heart rate, and corrugator muscle activity habituated rather quickly. In contrast, the startle reflex magnitude continued to be augmented for unpleasant compared to pleasant picture contents across repeated stimulus presentations (Bradley et al., 1993). Thus, habituation effects differ among measures of motor output and response preparation, and a series of recent studies extended the database to ERP measures of stimulus perception and evaluation. Given that the attention capture by emotional cues may reflect a rather automatic phenomenon (Öhman et al., 2000a), stimulus novelty might be less critical during perceptual processing and stimulus representation in working memory. To separate emotional meaning and stimulus novelty, the processing of erotica, neutral contents, and mutilations was examined across 90 picture repetitions (Schupp et al., 2006). Replicating previous results, emotional stimuli were associated with increased EPN amplitudes compared to
neutral pictures. Interestingly, differential emotion processing did not vary as a function of stimulus repetition and was similarly expressed across blocks of picture presentation (cf., Fig. 3). One might assume that presenting the pictures as a continuous stream may be particularly effective in triggering attentional orienting to each stimulus, in effect preventing differential emotion processing from habituating. To pursue this possibility, a follow-up study presented the pictures for 120 ms, followed by a blank period of 880 ms. In this instance, three pictures displaying erotic, neutral, or mutilation scenes were repeated 700 times (Schupp et al., unpublished data). Again, the erotic and mutilation pictures elicited an increased EPN compared to the neutral scene, revealing no habituation as a function of excessive stimulus repetition. This issue was examined in yet another study focusing on the habituation of differential emotion processing indexed by the LPP amplitude (Codispoti et al., 2006). Three pictures displaying pleasant, neutral, or unpleasant scenes were repeated 60 times. Furthermore, to obtain autonomic and somatic indices of emotion processing, pictures were presented at slow rates. Results revealed augmented LPP amplitudes to pleasant and unpleasant compared to neutral pictures, an effect that was maintained across repeated presentations of the stimuli. These results concur with findings in the cognitive domain observing pronounced LPP amplitudes in visual search tasks promoting automatic target detection (Hoffman et al., 1983). These studies suggest that stimulus novelty is not critical to observe the differential processing of emotional compared with neutral stimuli reflected by the EPN and LPP components. It has been suggested that perceptual processing in the cortex is regulated by subcortical structures involved in appetitive or defensive responding (Derryberry and Tucker, 1991; Davis and Whalen, 2001; Junghöfer et al., 2005; Sabatinelli et al., 2005). The present findings appear inconsistent with this hypothesis, considering that several studies observed relatively rapid habituation effects in the amygdala to emotional facial expressions. In addition, habituation effects were also demonstrated for the cingulate cortex, the hippocampus, and the dorsolateral prefrontal cortex (Breiter
Fig. 3. Effects of emotion and stimulus repetition. (a) Grand-averaged ERP waveforms for a selected right occipital sensor as a function of affect and first and last block of picture viewing. (b) Difference scalp potential maps (emotional–neutral) for the first and last block of picture presentation separately. (c) Illustration of the statistical effects observed in repeated-measures ANOVAs calculated for each sensor and mean time interval. (d) Calculation of the L2-MNE for the scalp potential difference (emotional–neutral) separately for the first and last block of picture presentation. All maps display a right-side view.
et al., 1996; Wright et al., 2001; Phan et al., 2003). However, a number of critical issues and findings need to be taken into account. First, as observed in a classical conditioning fMRI study, habituation may vary for the various anatomical subregions of the amygdala serving distinct functions (Morris et al., 2001). Second, a recent study observed that patients with amygdala lesions did not show enhanced activity to fearful faces in the fusiform and occipital gyri, which was indeed found for a group of healthy controls (Vuilleumier et al., 2004). Finally, sizeable BOLD activations in the amygdala have been observed for highly arousing emotional materials (Junghöfer et al., 2005; Sabatinelli et al., 2005), and it remains to be determined in future studies whether the amygdala also reveals rapid habituation to these emotionally evocative stimuli. Thus, more research is needed to evaluate the hypothesis that the preferential processing of emotional images in the extended visual cortex is secondary to the appraisal of affective significance in the amygdala. Alternatively, as discussed in associative learning, limbic structures such as the amygdala might modulate the associative strength of cortical stimulus representations of emotionally significant materials (Büchel and Dolan, 2000). Taken together, habituation studies have shed light on the role of stimulus novelty in emotional picture processing. Across several studies, we
observed that the differential processing of emotional compared to neutral stimuli, indexed by the EPN and LPP components, did not depend on stimulus novelty and was maintained across numerous stimulus repetitions. These findings suggest that detecting emotionally significant stimuli in the environment might be an obligatory task of the organism, apparently not habituating as a function of passive exposure.
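One simple way to formalize such a habituation test is to compare the emotional–neutral difference score between early and late repetition blocks, for instance with a paired t-test. The sketch below does this on simulated block-wise scores; it only illustrates the logic and is not the analysis reported in the cited studies.

```python
import numpy as np
from scipy import stats

# emotional-neutral EPN difference scores per subject and repetition block,
# shape (n_subjects, n_blocks); simulated here for illustration
rng = np.random.default_rng(2)
epn_diff = rng.normal(-1.0, 0.5, (16, 6))  # a stable negative (EPN) effect

first, last = epn_diff[:, 0], epn_diff[:, -1]

# does the emotion effect differ between the first and last block?
t, p = stats.ttest_rel(first, last)
print(f"first block: {first.mean():.2f}, last block: {last.mean():.2f}")
print(f"paired t-test first vs. last: t = {t:.2f}, p = {p:.3f}")
# habituation would appear as a reliable shrinkage of the (negative)
# difference toward zero across blocks; in the studies reviewed above,
# no such decline was observed
```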
Emotion processing in the context of primary attention tasks

Recent studies approached the interaction of emotion and attention from the perspective of competition by concurrently presenting emotional stimuli and nonemotional task-relevant stimuli (Vuilleumier and Schwartz, 2001; Pessoa et al., 2002b; Anderson et al., 2003). For instance, relying on a primary spatial attention task, Vuilleumier and Schwartz (2001) demonstrated the selective activation of the amygdala and fusiform gyrus to fearful faces independent of whether the faces were presented at attended or unattended spatial locations. In another study, Pessoa and colleagues (2002b) observed that selective emotion processing depended on the availability of attentional resources. In their critical condition, subjects had to discriminate the orientation of eccentrically presented bars while maintaining fixation on centrally presented emotional or neutral faces. Contrasting with control conditions without additional task load, emotional faces did not elicit increased activation in either visual processing areas or the amygdala. These data were interpreted from the perspective of the biased competition model of visual attention, assuming that emotional and task-relevant representations competed for processing resources in the visual cortex (Desimone and Duncan, 1995; Pessoa et al., 2002a). Complementary evidence was provided by recent ERP research. For instance, Pourtois and colleagues (2004) observed that task-relevant bar stimuli cued by fearful rather than neutral faces were associated with increased P1 components over lateral occipital leads. In addition, Holmes and colleagues (2003) observed a reduced N1 peak over frontal
sites to fearful as compared to neutral faces specifically when the faces were presented at attended locations while absent when presented at nonattended locations. Taken together, these data suggest the interference of selective emotion processing when attentional resources are directed to locations of explicitly task-relevant stimuli. Pursuing the interaction of emotion and attention, Schupp and colleagues (2003b) studied the hypothesis of obligatory selective emotional processing while subjects performed a feature-based explicit attention task. Towards this end, task-irrelevant pleasant, neutral, and unpleasant images were presented while subjects had to detect target checkerboard images interspersed in the picture sequence. In order to increase perceptual load, stimuli were presented as a rapid continuous stream with individual presentation times of 333 ms. The findings revealed increased EPN amplitudes to pleasant and unpleasant stimuli, particularly pronounced for stimuli of high evolutionary significance. Additionally, behavioral and electrophysiological responses to the task stimuli showed successful top-down attentional control to nonemotional stimuli. This research illustrates the selective processing of emotional cues while these stimuli were irrelevant to the primary cognitive task. A stronger test for the hypothesis of the automatic nature of selective emotion processing was implemented by a study presenting emotional pictures and task-relevant information concurrently. As shown in Fig. 4(A), target and distracter stimuli were created by grids of horizontal and vertical lines, which were presented overlaying the IAPS pictures. To systematically vary the task load, the proportion of stimuli presenting the IAPS pictures with or without the overlaid task-relevant grid-pattern, respectively, was changed in four experimental conditions. Specifically, in separate blocks, task-relevant stimuli occurred with 10%, 50%, or 100% probability, respectively, while an additional control condition presented no task stimuli in order to replicate previous findings. The order of the four experimental conditions was balanced across subjects. Results closely replicated previous findings for the control condition with pleasant and unpleasant pictures eliciting a pronounced EPN amplitude relative to neutral
Fig. 4. Interaction of emotion and explicit attention. (A) Illustration of the RSVP paradigm used in two of four conditions. Upper row: passive viewing condition with IAPS pictures. Lower row: task condition with overlaid grid patterns in each trial. In this example, the fourth trial represents a rare target stimulus. (B) Upper panel: Grand-averaged ERP waveforms of pleasant, neutral, and unpleasant pictures in the four experimental conditions for a right occipital sensor. The gray-shaded area marks the analyzed EPN time interval from 200 to 300 ms. Lower panel: Topographical difference maps for pleasant–neutral and unpleasant–neutral in the four conditions projected on the back view of a model head.
cues. Furthermore, occasionally interspersed task stimuli (10% condition) did not interfere with the processing of emotional compared to neutral stimuli. Again, an active task set did not interfere with
selective emotion processing of pictures not presenting task-related information (Schupp et al., 2003b). However, interference effects were pronounced in those conditions with higher task load.
The differentiation between emotional and neutral stimuli was greatly attenuated during the 50% and 100% conditions (cf., Fig. 4). As expected by an interference account, ERP indices of selective attention to the task-related stimuli were obtained. Specifically, target compared to distracter stimuli revealed increased EPN and LPP amplitudes. Taken together, consistent with the notion of capacity limitations during perceptual encoding, these data suggest that explicit attention to task-related stimuli interferes with the selective processing of emotional 'background' information. Interestingly, a similar pattern of results was obtained when the visual oddball task was substituted by an auditory one (Schupp et al., unpublished data). The same ERP component, which was found to be insensitive to stimulus novelty, was modified by explicit attention control mechanisms, suggesting that the stimulus-driven attention capture interacts with the goal-driven control of attention.
Emotional perception as a function of prime stimuli

Interference effects may not be limited to concurrently presented stimuli, but may extend in time. In the real world, stimuli often appear not in isolation but follow each other, raising the question to what extent the selective encoding of emotional pictures varies as a function of the emotion category of a preceding 'prime' picture. For instance, is the processing of an erotic scene facilitated when preceded by another pleasant image rather than a scene of threat? Considered from the perspective of motivated attention (Lang et al., 1997), processing resources are automatically captured and sustained by emotional cues (Cuthbert et al., 2000; Keil et al., 2002). Accordingly, one might expect that an emotional prime picture, drawing heavily from limited processing resources, hampers the processing of subsequently presented stimuli. Thus, the motivated attention hypothesis assumes that target picture processing will vary as a function of the processing resources devoted to both prime and target pictures. Specifically, emotional target pictures are associated with a larger EPN compared to neutral images, and emotional prime pictures, themselves associated with an increased EPN, should reduce
the posterior negativity of subsequent target pictures. While congruence in hedonic valence between a prime and a target picture has no special status in a motivated attention framework, such effects might be predicted from the perspective of affective priming. A typical finding in behavioral priming tasks is that target words are evaluated faster when preceded by a short-duration (e.g., 300 ms) prime of the same hedonic valence (for reviews see Bargh, 1989; Klauer and Musch, 2003). Affective priming may reflect spreading activation (Ferguson and Bargh, 2003), and this perspective would predict an interaction of prime and target picture category due to the facilitated processing of target pictures preceded by prime pictures of the same valence category. To examine this issue (Flaisch et al., submitted), subjects viewed a continuous stream of pleasant, neutral, and unpleasant pictures, presented for 335 ms each. In order to examine the effects of emotional primes on target picture processing, separate average waveforms were calculated for nine experimental cells (three emotional picture categories for the prime and target picture, respectively). As expected, emotional target pictures were associated with a larger early posterior negativity compared to neutral ones. Moreover, it was found that the magnitude of the EPN for a target picture varied systematically as a function of the preceding prime (cf., Fig. 5). When a prime picture was emotional (and itself elicited an enhanced early posterior negativity), the EPN of the subsequent target picture was reduced. This effect of an emotional prime image was identical regardless of whether the target picture was pleasant, neutral, or unpleasant. Thus, the data revealed no evidence for affective priming in the perceptual domain. Rather, regardless of whether the hedonic valence of the prime was congruent or incongruent with the following target, the occipital negativity of the target picture was decreased if the prime picture was affectively engaging. The novel finding of this study is that target processing is affected not only by the emotional content of the current image but also varies systematically with the emotional content of the preceding prime picture. These findings imply that the capture of processing resources extends in
Fig. 5. Effects on target picture processing as a function of preceding prime pictures. (A) Illustration of the prime–target picture combinations examined in this study. (B) Upper panel: ERP waveforms for a representative right-occipital sensor illustrating the main effects of prime (right part) and target (left part) valence. The prime effect is illustrated by reduced posterior negativities following emotional compared to neutral prime pictures, averaged across pleasant, neutral, and unpleasant target pictures. Lower panel: Detailed illustration of the prime effect on target processing by presenting waveforms separately for pleasant, neutral, and unpleasant target pictures. (C) Scalp potential maps of the difference waves (pleasant–neutral) and (unpleasant–neutral) in the time interval of 248–288 ms post-target picture onset reveal the topographical distributions of the modulation of target picture processing as a function of prime valence.
time and interferes with successively presented stimulus materials. These data are consistent with the notion that successively presented pictures are characterized by a distinct neural representation, which may compete for a limited pool of processing resources (Kastner and Ungerleider, 2000; Keysers and Perrett, 2002; Potter et al., 2002). However, rather than being limited to concurrent stimulus processing, the present results suggest that processing resources allocated to target pictures systematically vary with the amount of processing resources captured by preceding emotional prime stimuli. Furthermore, previous studies assessed competition effects mainly by introducing an explicit attention task. In contrast,
the pattern of results was observed here in the absence of any active task instruction, presumed to reflect the implicit capture of attention by emotional cues (Öhman et al., 2000a).

Paying explicit attention to emotions

Competition designs explore the boundary conditions of the differential processing of emotional cues. In real life, however, both forces assuring attentional orienting, namely motivated and explicit attention, often cooperate. This raises the question whether paying attention to emotional rather than neutral contents is even more effective in drawing attentional resources. Thus, the
interplay of attention and emotion was examined under conditions when these two forces pull in the same direction (Schupp et al., submitted). In this study, subjects viewed a rapid and continuous stream of pictures displaying erotic, neutral, and mutilation contents while the EEG was recorded with a dense 128-sensor array. In separate runs, each of the three picture categories served as the target category in a counting task. Target stimuli were defined according to their emotional valence, and therefore, both avenues to induce selective attention were assessed within the same experimental context. It is worth noting that detecting emotional and neutral target stimuli elicited increased occipital negativity and late positive potentials with similar topography and latency as in previous studies requiring subjects to detect animal targets (Thorpe et al., 1996; Delorme et al., 2004; Codispoti et al., in press). As expected, selective emotional processing closely replicated previous findings (Cuthbert et al., 2000; Junghöfer et al., 2001; Keil et al., 2002; Schupp et al., 2003b, 2004b), and was associated with a similar cortical signature as instructed attention. Establishing these cortical indices of selective attention due to implicit emotional and explicit task-defined significance provides the foundation to meaningfully interpret the observed interplay of attention and emotion. Interestingly, overadditive effects of paying attention to emotional rather than neutral contents were only observed at later stages of processing indexed by the LPP (cf., Fig. 6). This effect appeared sizable: the amplitude of the late positive potential almost doubled when attending to emotional rather than neutral contents. In contrast, effects of emotion and task relevance were independent of each other during earlier perceptual processes indexed by the EPN. Thus, ERP indices tracking the unfolding of stimulus processing revealed a shift in the interplay of directed attention and emotional stimulus significance. Attending to emotional cues appeared particularly efficient in boosting the LPP component. This may point to a capacity-limited serial processing system, possibly involved with short-term memory representations needed for focused attention and conscious recognition. The shift in the interplay of target and emotion relevance from independent to overadditive
effects is presumably secondary to changes in neural structures controlling the expression of selective attention. The structures believed to be involved in selective visual processing of target and emotional task relevance are heavily interconnected, providing possible anatomical pathways to implement boosted processing of emotional targets. For instance, heavy interconnections are described for the dorsolateral prefrontal cortex, a key structure in the frontoparietal network organizing explicit attention effects, and the ventromedial prefrontal cortex, part of the paralimbic and limbic network implicated in emotional stimulus evaluation (Barbas, 1995; Davis and Whalen, 2001; Ghashghaei and Barbas, 2002; Bar, 2003). Consistent with anatomical data, recent fMRI studies revealed the interplay among these structures when attention was directed to the location of emotionally relevant stimuli or when target detection was compared to the processing of emotional distracters (Armony and Dolan, 2002; Yamasaki et al., 2002). However, considering the interplay of attention and emotion from the perspective of specific subprocesses, a recent theory is relevant that links late positive potentials to activity in a specific neuromodulatory system, namely the ascending locus coeruleus–norepinephrine (LC–NE) system, which is presumed to increase the gain of cortical representations (Nieuwenhuis et al., 2005). From this perspective, the overadditive effects of paying attention to emotional targets might be secondary to the engagement of the neuromodulatory LC–NE system, a hypothesis awaiting more direct evidence from future research. Taken together, there is initial evidence that obligatory emotion processing interacts with explicit attention processes. However, providing support for the claim of an interaction between attention and emotion at the level of specific subprocesses, the relationship between emotion and attention varied across processing stages. Explicit attention effects and emotional significance operated additively during early perceptual stimulus categorization, while synergistic effects of implicit emotion and explicit attention were observed during the stage of higher-order conceptual stimulus representation in working memory.
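The contrast between additive and overadditive effects described above can be expressed as a simple 2 × 2 interaction contrast on mean LPP amplitudes (target vs. non-target crossed with emotional vs. neutral). The sketch below illustrates this logic on simulated per-subject means; the design cells and numerical values are assumptions for demonstration, not the reported data.

```python
import numpy as np
from scipy import stats

# per-subject mean LPP amplitudes in a 2 x 2 design; columns:
# (non-target/neutral, non-target/emotional, target/neutral, target/emotional)
rng = np.random.default_rng(3)
base = rng.normal(2.0, 0.5, (18, 1))
lpp = (np.hstack([base, base + 1.0, base + 1.5, base + 3.5])
       + rng.normal(0, 0.3, (18, 4)))

# interaction contrast: the (target - non-target) gain for emotional
# contents minus the same gain for neutral contents
interaction = (lpp[:, 3] - lpp[:, 1]) - (lpp[:, 2] - lpp[:, 0])
t, p = stats.ttest_1samp(interaction, 0.0)
# purely additive effects predict a contrast near zero; an overadditive
# (synergistic) boost for attended emotional stimuli yields a reliably
# positive contrast
print(f"interaction contrast: {interaction.mean():.2f}, "
      f"t = {t:.2f}, p = {p:.4f}")
```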
Fig. 6. Effects of cooperation between emotional and explicit attention. (A) Upper panel: scalp difference maps (target minus non-target stimuli) for erotic, neutral, and mutilation images in the EPN time window. Lower panel: L2-MNE calculated for the scalp potential difference, illustrating emotion and explicit attention effects. (B) Upper panel: scalp difference maps (target minus non-target stimuli) for erotic, neutral, and mutilation images in the LPP time window. Lower panel: L2-MNE calculated for the scalp potential difference, illustrating the interaction of emotion and attention. All maps display a right-side view.
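For orientation, the L2-MNE referenced in the figure is the linear inverse solution that, among all source configurations compatible with the measured scalp data, selects the one with the smallest L2 norm (Hämäläinen and Ilmoniemi, 1994). With lead-field matrix L, measured potentials d(t), and a regularization parameter \lambda, the estimated source distribution can be written as

\hat{s}(t) = L^{\top} \, (L L^{\top} + \lambda I)^{-1} \, d(t)

The exact regularization scheme behind the maps in Fig. 6 is not specified here, so this formula should be read as the generic form of the estimator rather than the precise computation used for the figure.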
Learned stimulus relevance and attentional orienting

The perspective that emotional cues capture attentional resources provides a framework for investigating subject populations that respond in a highly emotional manner to selected stimuli. Accordingly, this raises the issue of how individual experience and learning shape attentional orienting to these stimuli
in distinct processing stages. Previous studies have already provided evidence for pronounced attention capture by fear-relevant stimuli in specific phobia. For instance, subjects with small-animal phobia display pronounced defensive reactivity to fear-relevant stimuli, that is, augmented defensive startle magnitude as well as heightened autonomic responsivity and increased activation in the amygdala and infero-temporal cortex (Hamm et al., 1997; Cuthbert et al., 2003; Sabatinelli et al., 2005). Building upon these findings, we recently used rapid serial visual presentations to explore whether fear-related pictures are associated with enhanced EPN amplitudes in snake or spider phobics. Consistent with this hypothesis, the EPN was largest for snake and spider pictures in snake and spider phobics, respectively. However, the effect appeared more robust in snake phobics (Schupp et al., unpublished data). Focusing on spider phobia, a follow-up study including a larger sample revealed only marginally increased EPN amplitudes to fear-relevant materials in spider phobics compared to healthy controls. This study did, however, observe markedly enlarged LPP and slow-wave amplitudes for the processing of fear-relevant stimuli in phobics compared to control participants (Michalowski et al., 2005). Similar results have also been reported by Miltner and colleagues (2005). Yet another line of research examined this issue focusing on addiction and the processing of drug-associated cues such as heroin, alcohol, cocaine, and tobacco (Franken et al., 2004). Increased LPP and slow-wave amplitudes have been observed in heroin-dependent compared to control individuals for drug-associated pictures depicting cocaine and heroin (Franken et al., 2003, 2004). Taken together, evidence is accumulating that ERP measures derived from research on emotion processing in healthy individuals might contribute to the understanding of clinically relevant issues. Previous research provides ample evidence that learning and experience trigger pronounced and fast physiological responses to clinically relevant stimuli (Globisch et al., 1999; Öhman et al., 2000b). Extending these data, disorder-related stimuli appear to trigger exaggerated responding in specific ERP components, partitioning the attention capture into substages. However,
current results are not fully consistent and await more conclusive research. In particular, future studies using dense sensor ERP arrays may provide a better assessment of early selective stimulus processing. Also, examining clinically relevant stimuli as well as standard pleasant, neutral, and unpleasant materials in a range of disorder groups appears particularly informative. One promise of this endeavor is that ERP measurements may become a sensitive tool for evaluating effects of behavior therapy, complementing fMRI and startle probe measurements (Vrana et al., 1992; Straube et al., 2005).

Summary

Experimental approaches exploring the emotion–attention interface further the understanding of the emotional capture of attentional resources in distinct processing stages. One issue is to determine to what extent the EPN component reflects an automatic phenomenon, that is, the unintentional, unconscious, and effortless processing of emotional cues. Suggestive of automaticity, the EPN modulation to emotional cues is observed across repeated presentations of the same materials and occurs in the absence of an active task instruction. Thus, in the absence of interference by concurrent stimulus processing, selective emotional processing appears to be unintentional and effortless, and presumably precedes conscious recognition. Conversely, the EPN is subject to interference by intentional goals held in mind and by the processing of immediately preceding emotional cues. Thus, concurrently and successively presented pictures are characterized by distinct neural representations, which may compete for a limited pool of processing resources, attenuating or abolishing selective emotion processing. Taken together, albeit preliminary, the results appear to support component features rather than all-or-nothing concepts of automaticity (Bargh, 1989). Further research is needed to explore the boundary conditions of the EPN modulation with regard to the various component features of automaticity. Another interesting issue is to delineate dissociations among the various processing stages reflecting
the emotional attention capture. In passive viewing contexts, emotional EPN and LPP modulation appear similar in many respects, that is, not subject to habituation and most pronounced for high-arousing pleasant and unpleasant materials. However, when paying attention to emotion, boosted processing effects were only observed for the LPP, suggesting that the relationship of emotion and attention changes across time from independent (EPN) to synergistic (LPP) effects. Thus, interaction effects of explicit attention and implicit emotion processes vary across distinct processing stages. Future studies are needed to extend these findings to subject populations (e.g., phobia, substance abuse) demonstrating exaggerated attentional orienting to specific stimuli.
Selective attention to emotion: two-stage models of stimulus perception

It might be informative to consider the ERP findings reviewed above from the broader perspective of two-stage models of stimulus perception. Integrating research on cognitive psychology, the orienting reflex, and associative learning (Öhman, 1979, 1986), a two-stage model was proposed to explicate how emotional, novel, and task-relevant stimuli guide selective attention. The model proposed a first, large-capacity perceptual scanning stage providing a more or less complete analysis of sensory information. Upon detection of significant stimuli, the perceptual system may emit a call for processing resources in a capacity-limited second stage of stimulus processing, which acts as a gateway to focused attention and conscious recognition. Empirical support for this view comes from research on the attentional blink and rapid serial visual presentations. This research determined that pictures can be recognized and comprehended at rapid rates, while memory probed immediately afterwards appears quite poor. Potter (Chun and Potter, 1995; Potter, 1999) suggested that stimulus recognition occurs rapidly within the first few hundred milliseconds. However, for conscious stimulus recognition, this fleeting stage of processing needs to be followed by a second stage of consolidation in short-term memory. Two-stage models of
stimulus perception may provide a framework for deriving more specific hypotheses regarding the functional significance of ERP indices of selective attention. Specifically, the EPN associated with the perceptual encoding of emotional material may reflect the call for resources in the capacity-limited second stage of processing. However, to become consciously represented, stimuli need to gain access to the second processing stage, which might be indexed by the LPP component. Furthermore, the elaborate and sustained attentive processing of emotional stimuli is presumably reflected by the sustained positive slow waves. Thus, the call for processing resources triggered after initial stimulus categorization assures that emotional stimuli have priority access to a capacity-limited stage required for working memory consolidation and conscious recognition. Considered from a functional and evolutionary perspective, this priority processing of emotional cues facilitates adaptive behavior, ultimately promoting survival and reproductive success (Lang et al., 1997; Öhman et al., 2000a). The various two-stage models of stimulus perception suggest rapid perceptual categorization of stimuli. Consistent with this view, available data suggest that the differential processing of emotional cues is consequent upon access to semantic meaning. Indirect evidence is provided by the similar latency of emotion discrimination and explicit attention effects obtained in simple feature and higher-order object-based attention tasks (Smid et al., 1999; Potts and Tucker, 2001; Delorme et al., 2004; Codispoti et al., in press). More direct evidence was provided by a recent study examining explicit attention to emotional cues (Schupp et al., submitted; see above). Findings revealed that explicit attention effects and selective emotion processing appeared with similar latency, suggesting that perceptual categorization up to the level of semantic meaning had been achieved. One advantage of considering the ERP findings in the context of two-stage models of stimulus perception is that a common frame is provided for the effects of the various active and passive forms of attentional orienting. Specifically, the call for processing resources is thought to reflect short- and long-term memory mechanisms (Öhman et al., 2000a). Emotional stimuli capture
attention because they include 'tags' on memory representations denoting significant elements in the environment. Active attention effects are considered to reflect the expectancy of certain objects, that is, the temporary activation of long-term memories. In contrast, passive attention effects, as observed in classical orienting responses, are thought to reflect the failure to match current stimuli to the contents of the short-term memory store. The competition and cooperation studies of the emotion–attention interaction suggest that the calls for processing resources triggered by the various forms of attentional control are independent of each other and may compete for processing resources. Furthermore, consistent with the main theme of the present review of considering the emotional guidance of attention in distinct processing stages, interaction patterns changed across processing stages and showed synergistic effects in the second stage associated with stimulus representation in working memory. Clearly, these hypotheses are speculative and await future research. One promising avenue for pursuing this view is to complement the temporal dynamics revealed by ERPs with the structural information obtained with fMRI. Overall, reviewing the ERP findings from the perspective of two-stage models specifies hypotheses regarding the functional significance of distinct ERP components within a broader theoretical context and suggests multiple avenues for exploring the emotion–attention interaction in future research.
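As an illustrative caricature of this architecture (not a model fitted to any data), the following Python sketch implements the two stages in their simplest form: a parallel first stage that issues a call for any stimulus whose significance (emotional tag or target match) exceeds a threshold, and a capacity-limited second stage that consolidates one item at a time, so that calls arriving while consolidation is under way are lost, as in the attentional blink. All parameter values are arbitrary.

from dataclasses import dataclass

@dataclass
class Stimulus:
    onset_ms: int
    significance: float  # combined emotional/target relevance, 0..1

def two_stage_perception(stream, call_threshold=0.5, consolidation_ms=400):
    """Stage 1: large-capacity parallel analysis flags significant items.
    Stage 2: capacity-limited consolidation into working memory; items whose
    call arrives while stage 2 is busy never reach conscious report."""
    busy_until = 0
    reported = []
    for stim in sorted(stream, key=lambda s: s.onset_ms):
        if stim.significance >= call_threshold:   # stage-1 "call for resources"
            if stim.onset_ms >= busy_until:       # stage 2 is free
                busy_until = stim.onset_ms + consolidation_ms
                reported.append(stim)
    return reported

# RSVP-like stream at 100 ms per item: the second significant item falls
# inside the consolidation window opened by the first one and is lost.
stream = [Stimulus(i * 100, s) for i, s in enumerate([0.1, 0.9, 0.2, 0.8, 0.1])]
print([s.onset_ms for s in two_stage_perception(stream)])  # -> [100]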
Abbreviations

CSD     current source density
EEG     electroencephalogram
EPN     early posterior negativity
ERP     event-related potential
fMRI    functional magnetic resonance imaging
LPP     late positive potential
L2-MNE  L2 minimum norm estimate
RSVP    rapid serial visual presentation
Acknowledgments

This work was supported by the German Research Foundation (DFG) Grants Schu 1074/7-3 and 1074/10-1. Address reprint requests to Harald T. Schupp at the Department of Psychology, University of Konstanz, Universitätsstr. 10, 78457 Konstanz, Germany.
References

Amrhein, C., Mühlberger, A., Pauli, P. and Wiedemann, G. (2004) Modulation of event-related brain potentials during affective picture processing: a complement to startle reflex and skin conductance response? Int. J. Psychophysiol., 54: 231–240.
Anderson, A.K., Christoff, K., Panitz, D., De Rosa, E. and Gabrieli, J.D. (2003) Neural correlates of the automatic processing of threat facial signals. J. Neurosci., 23: 5627–5633.
Armony, J.L. and Dolan, R.J. (2002) Modulation of spatial attention by fear-conditioned stimuli: an event-related fMRI study. Neuropsychologia, 40: 817–826.
Bar, M. (2003) A cortical mechanism for triggering top-down facilitation in visual object recognition. J. Cogn. Neurosci., 15: 600–609.
Barbas, H. (1995) Anatomic basis of cognitive-emotional interactions in the primate prefrontal cortex. Neurosci. Biobehav. Rev., 19: 499–510.
Bargh, J.A. (1989) Conditional automaticity: varieties of automatic influence in social perception and cognition. In: Uleman, J.S. and Bargh, J.A. (Eds.), Unintended Thought. Guilford Press, New York, pp. 3–51.
Birbaumer, N., Elbert, T., Canavan, A.G. and Rockstroh, B. (1990) Slow potentials of the cerebral cortex and behavior. Physiol. Rev., 70: 1–41.
Bradley, M.M. (2000) Emotion and motivation. In: Cacioppo, J.T., Tassinary, L.G. and Berntson, G. (Eds.), Handbook of Psychophysiology. Cambridge University Press, New York, pp. 602–642.
Bradley, M.M., Codispoti, M., Cuthbert, B.N. and Lang, P.J. (2001) Emotion and motivation I: defensive and appetitive reactions in picture processing. Emotion, 1: 276–298.
Bradley, M.M., Lang, P.J. and Cuthbert, B.N. (1993) Emotion, novelty, and the startle reflex: habituation in humans. Behav. Neurosci., 107: 970–980.
Breiter, H.C., Etcoff, N.L., Whalen, P.J., Kennedy, W.A., Rauch, S.L., Buckner, R.L., et al. (1996) Response and habituation of the human amygdala during visual processing of facial expression. Neuron, 17: 875–887.
Büchel, C. and Dolan, R.J. (2000) Classical fear conditioning in functional neuroimaging. Curr. Opin. Neurobiol., 10: 219–223.
Cacioppo, J.T., Crites Jr., S.L., Berntson, G.G. and Coles, M.G. (1993) If attitudes affect how stimuli are processed, should they not affect the event-related brain potential? Psychol. Sci., 4: 108–112.
Cacioppo, J.T., Gardner, W.L. and Berntson, G.G. (1999) The affect system has parallel and integrative processing components: form follows function. J. Pers. Soc. Psychol., 76: 839–855.
Chun, M.M. and Potter, M.C. (1995) A two-stage model for multiple target detection in rapid serial visual presentation. J. Exp. Psychol. Hum. Percept. Perform., 21: 109–127.
Codispoti, M., Ferrari, V. and Bradley, M.M. (2006) Repetitive picture processing: autonomic and cortical correlates. Brain Res., 1068: 213–220.
Codispoti, M., Ferrari, V., Junghöfer, M. and Schupp, H.T. (in press) The categorization of natural scenes: brain attention networks revealed by dense sensor ERPs. Neuroimage.
Compton, R.J. (2003) The interface between emotion and attention: a review of evidence from psychology and neuroscience. Behav. Cogn. Neurosci. Rev., 2: 115–129.
Cuthbert, B.N., Schupp, H.T., Bradley, M., McManis, M. and Lang, P.J. (1998) Probing affective pictures: attended startle and tone probes. Psychophysiology, 35: 344–347.
Cuthbert, B.N., Schupp, H.T., McManis, M.H., Hillman, C., Bradley, M.M. and Lang, P.J. (1995) Cortical slow waves: emotional perception and processing [Abstract]. Psychophysiology, 32: S27.
Cuthbert, B.N., Lang, P.J., Strauss, C., Drobes, D., Patrick, C.J. and Bradley, M.M. (2003) The psychophysiology of anxiety disorder: fear memory imagery. Psychophysiology, 40: 407–422.
Cuthbert, B.N., Schupp, H.T., Bradley, M.M., Birbaumer, N. and Lang, P.J. (2000) Brain potentials in affective picture processing: covariation with autonomic arousal and affective report. Biol. Psychol., 52: 95–111.
Davis, M. and Whalen, P.J. (2001) The amygdala: vigilance and emotion. Mol. Psychiatry, 6: 13–34.
Delorme, A., Rousselet, G.A., Mace, M.J. and Fabre-Thorpe, M. (2004) Interaction of top-down and bottom-up processing in the fast visual analysis of natural scenes. Brain Res. Cogn. Brain Res., 19: 103–113.
Derryberry, D. and Tucker, D.M. (1991) The adaptive base of the neural hierarchy: elementary motivational controls of network function. In: Dienstbier, A. (Ed.), Nebraska Symposium on Motivation. University of Nebraska Press, Lincoln, NE, pp. 289–342.
Desimone, R. and Duncan, J. (1995) Neural mechanisms of selective visual attention. Annu. Rev. Neurosci., 18: 193–222.
Dickinson, A. and Balleine, B.W. (2002) The role of learning in motivation. In: Gallistel, C.R. (Ed.), Steven's Handbook of Experimental Psychology, Vol. 3, Learning, Motivation and Emotion, 3rd edn. Wiley, New York, pp. 497–533.
Dickinson, A. and Dearing, M.F. (1979) Appetitive-aversive interactions and inhibitory processes. In: Dickinson, A. and Boakes, R.A. (Eds.), Mechanisms of Learning and Motivation. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 203–231.
Diedrich, O., Naumann, E., Maier, S.G.B. and Bartussek, D. (1997) A frontal positive slow wave in the ERP associated with emotional slides. J. Psychophysiol., 11: 71–84.
Donchin, E. and Coles, M.G. (1988) Is the P300 component a manifestation of context updating? Behav. Brain Sci., 11: 357–427.
Donchin, E., Kramer, A.E. and Wickens, C. (1986) Applications of brain event-related potentials to problems in engineering psychology. In: Coles, M.G.H., Donchin, E. and Porges, S.W. (Eds.), Psychophysiology: Systems, Processes, and Applications. Guilford Press, New York, pp. 702–718.
Ferguson, M.J. and Bargh, J.A. (2003) The constructive nature of automatic evaluation. In: Musch, J. and Klauer, K.C. (Eds.), The Psychology of Evaluation: Affective Processes in Cognition and Emotion. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 169–188.
Flaisch, T., Junghöfer, M., Bradley, M.M., Schupp, H.T. and Lang, P.J. (submitted) Rapid picture processing: affective primes and targets.
Franken, I.H., Hulstijn, K.P., Stam, C.J., Hendriks, V.M. and van den Brink, W. (2004) Two new neurophysiological indices of cocaine craving: evoked brain potentials and cue modulated startle reflex. J. Psychopharmacol., 18: 544–552.
Franken, I.H., Stam, C.J., Hendriks, V.M. and van den Brink, W. (2003) Neurophysiological evidence for abnormal cognitive processing of drug cues in heroin dependence. Psychopharmacology, 170: 205–212.
Frijda, N.H. (1986) The Emotions. Cambridge University Press, Cambridge.
Ghashghaei, H.T. and Barbas, H. (2002) Pathways for emotion: interactions of prefrontal and anterior temporal pathways in the amygdala of the rhesus monkey. Neuroscience, 115: 1261–1279.
Globisch, J., Hamm, A.O., Esteves, F. and Öhman, A. (1999) Fear appears fast: temporal course of startle reflex potentiation in animal fearful subjects. Psychophysiology, 36: 66–75.
Hämäläinen, M.S. and Ilmoniemi, R.J. (1994) Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput., 32: 35–42.
Hamm, A.O., Cuthbert, B.N., Globisch, J. and Vaitl, D. (1997) Fear and the startle reflex: blink modulation and autonomic response patterns in animal and mutilation fearful subjects. Psychophysiology, 34: 97–107.
Hamm, A.O., Schupp, H.T. and Weike, A.I. (2003) Motivational organization of emotions: autonomic changes, cortical responses, and reflex modulation. In: Davidson, R.J., Scherer, K. and Goldsmith, H.H. (Eds.), Handbook of Affective Sciences. Oxford University Press, Oxford, pp. 188–211.
Hauk, O., Keil, A., Elbert, T. and Müller, M.M. (2002) Comparison of data transformation procedures to enhance topographical accuracy in time-series analysis of the human EEG. J. Neurosci. Methods, 113: 111–122.
Hoffman, J.E., Simons, R.F. and Houck, M. (1983) The effects of automatic and controlled processing on the P300. Psychophysiology, 20: 625–632.
Holmes, A., Vuilleumier, P. and Eimer, M. (2003) The processing of emotional facial expression is gated by spatial attention: evidence from event-related brain potentials. Brain Res. Cogn. Brain Res., 16: 174–184.
Johnson Jr., R. (1988) The amplitude of the P300 component of the event-related potential: review and synthesis. In: Ackles, P.K., Jennings, J.R. and Coles, M.G.H. (Eds.), Advances in Psychophysiology, Vol. 3. JAI Press, Greenwich, pp. 69–138.
Johnston, V.S., Miller, D.R. and Burleson, M.H. (1986) Multiple P3s to emotional stimuli and their theoretical significance. Psychophysiology, 23: 684–694.
Johnston, V.S. and Oliver-Rodriguez, J.C. (1997) Facial beauty and the late positive component of event-related potentials. J. Sex Res., 34: 188–198.
Junghöfer, M., Bradley, M.M., Elbert, T.R. and Lang, P.J. (2001) Fleeting images: a new look at early emotion discrimination. Psychophysiology, 38: 175–178.
Junghöfer, M., Sabatinelli, D., Bradley, M.M., Schupp, H.T., Elbert, T.R. and Lang, P.J. (2006) Fleeting images: rapid affect discrimination in the visual cortex. Neuroreport, 17: 225–229.
Junghöfer, M., Schupp, H.T., Stark, R. and Vaitl, D. (2005) Neuroimaging of emotion: empirical effects of proportional global signal scaling in fMRI data analysis. Neuroimage, 25: 520–526.
Kastner, S. and Ungerleider, L.G. (2000) Mechanisms of visual attention in the human cortex. Annu. Rev. Neurosci., 23: 315–341.
Keil, A., Bradley, M.M., Hauk, O., Rockstroh, B., Elbert, T. and Lang, P.J. (2002) Large-scale neural correlates of affective picture processing. Psychophysiology, 39: 641–649.
Keysers, C. and Perrett, D.I. (2002) Visual masking and RSVP reveal neural competition. Trends Cogn. Sci., 6: 120–125.
Klauer, K.C. and Musch, J. (2003) Affective priming: findings and theories. In: Musch, J. and Klauer, K.C. (Eds.), The Psychology of Evaluation: Affective Processes in Cognition and Emotion. Erlbaum, Mahwah, NJ, pp. 7–49.
Konorski, J. (1967) Integrative Activity of the Brain: An Interdisciplinary Approach. University of Chicago Press, Chicago.
Kranczioch, C., Debener, S. and Engel, A.K. (2003) Event-related potential correlates of the attentional blink phenomenon. Brain Res. Cogn. Brain Res., 17: 177–187.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1990) Emotion, attention, and the startle reflex. Psychol. Rev., 97: 377–395.
Lang, P.J. (1995) The emotion probe: studies of motivation and attention. Am. Psychol., 50: 371–385.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1997) Motivated attention: affect, activation, and action. In: Lang, P.J., Simons, R.F. and Balaban, M. (Eds.), Attention and Emotion: Sensory and Motivational Processes. Erlbaum, Mahwah, NJ, pp. 97–135.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (2005) International affective picture system (IAPS): digitized photographs, instruction manual and affective ratings. Technical Report A-6. University of Florida, Gainesville, FL.
Löw, A., Lang, P.J. and Bradley, M.M. (2005) What pops out during rapid picture presentation? [Abstract]. Psychophysiology, 42: S81.
Luck, S.J. and Hillyard, S.A. (2000) The operation of selective attention at multiple stages of processing: evidence from human and monkey electrophysiology. In: Gazzaniga, M.S. (Ed.), The Cognitive Neurosciences (2nd edn). MIT Press, Cambridge.
Luck, S.J., Woodman, G.F. and Vogel, E.K. (2000) Event-related potential studies of attention. Trends Cogn. Sci., 4: 432–440.
Lutzenberger, W., Elbert, T. and Rockstroh, B. (1987) A brief tutorial on the implications of volume conduction for the interpretation of the EEG. J. Psychophysiol., 1: 81–90.
Michalowski, J.M., Melzig, C.A., Schupp, H.T. and Hamm, A.O. (2005) Cortical processing of emotional pictures in spider phobic students [Abstract]. Psychophysiology, 42: S89.
Miltner, W.H., Trippe, R.H., Krieschel, S., Gutberlet, I., Hecht, H. and Weiss, T. (2005) Event-related brain potentials and affective responses to threat in spider/snake-phobic and non-phobic subjects. Int. J. Psychophysiol., 57: 43–52.
Mogg, K. and Bradley, B.P. (1999) Selective attention and anxiety: a cognitive-motivational perspective. In: Dalgleish, T. and Power, M.J. (Eds.), Handbook of Cognition and Emotion. Wiley, Chichester, pp. 145–170.
Morris, J.S., Büchel, C. and Dolan, R.J. (2001) Parallel neural responses in amygdala subregions and sensory cortex during implicit fear conditioning. Neuroimage, 13: 1044–1052.
Nieuwenhuis, S., Aston-Jones, G. and Cohen, J.D. (2005) Decision making, the P3, and the locus coeruleus-norepinephrine system. Psychol. Bull., 131: 510–532.
Öhman, A. (1979) The orienting response, attention, and learning: an information-processing perspective. In: Kimmel, H.D., van Olst, E.H. and Orlebek, J.F. (Eds.), The Orienting Reflex in Humans. Erlbaum, Hillsdale, NJ, pp. 443–471.
Öhman, A. (1986) Face the beast and fear the face: animal and social fears as prototypes for evolutionary analyses of emotion. Psychophysiology, 23: 123–145.
Öhman, A., Flykt, A. and Lundqvist, D. (2000a) Unconscious emotion: evolutionary perspectives, psychophysiological data and neuropsychological mechanisms. In: Lane, R.D. and Nadel, L. (Eds.), Cognitive Neuroscience of Emotion. Oxford University Press, Oxford, pp. 296–327.
Öhman, A., Hamm, A. and Hugdahl, K. (2000b) Cognition and the autonomic nervous system: orienting, anticipation, and conditioning. In: Cacioppo, J.T., Tassinary, L.G. and Berntson, G.G. (Eds.), Handbook of Psychophysiology (2nd edn.). Cambridge University Press, Cambridge, UK, pp. 533–575.
Palomba, D., Angrilli, A. and Mini, A. (1997) Visual evoked potentials, heart rate responses and memory to emotional pictorial stimuli. Int. J. Psychophysiol., 27: 55–67.
Pessoa, L., Kastner, S. and Ungerleider, L.G. (2002a) Attentional control of the processing of neutral and emotional stimuli. Brain Res. Cogn. Brain Res., 15: 31–45.
Pessoa, L., McKenna, M., Gutierrez, E. and Ungerleider, L.G. (2002b) Neural processing of emotional faces requires attention. Proc. Natl. Acad. Sci. USA, 99: 11458–11463.
Phan, K.L., Liberzon, I., Welsh, R.C., Britton, J.C. and Taylor, S.F. (2003) Habituation of rostral anterior cingulate cortex to repeated emotionally salient pictures. Neuropsychopharmacology, 28: 1344–1350.
Picton, T.W. (1992) The P300 wave of the human event-related potential. J. Clin. Neurophysiol., 9: 456–479.
Potter, M.C. (1999) Understanding sentences and scenes: the role of conceptual short term memory. In: Coltheart, V. (Ed.), Fleeting Memories. MIT Press, Cambridge, MA, pp. 13–46.
Potter, M.C., Staub, A. and O'Connor, D.H. (2002) The time course of competition for attention: attention is initially labile. J. Exp. Psychol. Hum. Percept. Perform., 28: 1149–1162.
Potts, G.F. and Tucker, D.M. (2001) Frontal evaluation and posterior representation in target detection. Brain Res. Cogn. Brain Res., 11: 147–156.
Pourtois, G., Grandjean, D., Sander, D. and Vuilleumier, P. (2004) Electrophysiological correlates of rapid spatial orienting towards fearful faces. Cereb. Cortex, 14: 619–633.
Ritter, W. and Ruchkin, D.S. (1992) A review of event-related potential components discovered in the context of studying P3. Ann. NY Acad. Sci., 658: 1–32.
Ruchkin, D.S., Johnson Jr., R., Mahaffey, D. and Sutton, S. (1988) Toward a functional categorization of slow waves. Psychophysiology, 25: 339–353.
Sabatinelli, D., Bradley, M.M., Fitzsimmons, J.R. and Lang, P.J. (2005) Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance. Neuroimage, 24: 1265–1270.
Schneirla, T. (1959) An evolutionary and developmental theory of biphasic processes underlying approach and withdrawal. In: Jones, M. (Ed.), Nebraska Symposium on Motivation. University of Nebraska Press, Lincoln, pp. 1–42.
Schupp, H.T., Cuthbert, B.N., Bradley, M.M., Birbaumer, N. and Lang, P.J. (1997) Probe P3 and blinks: two measures of affective startle modulation. Psychophysiology, 34: 1–6.
Schupp, H.T., Cuthbert, B.N., Bradley, M.M., Cacioppo, J.T., Ito, T. and Lang, P.J. (2000a) Affective picture processing: the late positive potential is modulated by motivational relevance. Psychophysiology, 37: 257–261.
Schupp, H.T., Cuthbert, B.N., Bradley, M.M., Hillman, C.H., Hamm, A.O. and Lang, P.J. (2004a) Brain processes in emotional perception: motivated attention. Cogn. Emotion, 18: 593–611.
Schupp, H.T., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2003a) Emotional facilitation of sensory processing in the visual cortex. Psychol. Sci., 14: 7–13.
Schupp, H.T., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2003b) Attention and emotion: an ERP analysis of facilitated emotional stimulus processing. Neuroreport, 14: 1107–1110.
Schupp, H.T., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2004b) The selective processing of briefly presented affective pictures: an ERP analysis. Psychophysiology, 41: 441–449.
Schupp, H.T., Öhman, A., Junghöfer, M., Weike, A.I., Stockburger, J. and Hamm, A.O. (2004c) The facilitated processing of threatening faces: an ERP analysis. Emotion, 4: 189–200.
Schupp, H.T., Stockburger, J., Codispoti, M., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2006) Stimulus novelty and emotion perception: the near absence of habituation in the visual cortex. Neuroreport, 17: 365–369.
Schupp, H.T., Stockburger, J., Codispoti, M., Junghöfer, M., Weike, A.I. and Hamm, A.O. (submitted) Selective visual attention to emotion.
Schupp, H.T., Weike, A.I. and Hamm, A. (2000b) Affect and evaluative context: high-density ERP recordings during picture processing [Abstract]. Psychophysiology, 37: S88.
Smid, H.G., Jakob, A. and Heinze, H.J. (1999) An event-related brain potential study of visual selective attention to conjunctions of color and shape. Psychophysiology, 36: 264–279.
Straube, T., Glauer, M., Dilger, S., Mentzel, H.J. and Miltner, W.H. (2005) Effects of cognitive-behavioral therapy on brain activation in specific phobia. Neuroimage, 29: 125–135.
Thorpe, S., Fize, D. and Marlot, C. (1996) Speed of processing in the human visual system. Nature, 381: 520–522.
Tucker, D.M. (1993) Spatial sampling of head electrical fields: the geodesic sensor net. Electroencephalogr. Clin. Neurophysiol., 87: 154–163.
Vrana, S.R., Constantine, J.A. and Westman, J.S. (1992) Startle reflex modification as an outcome measure in the treatment of phobia: two case studies. Behav. Assess., 14: 279–291.
Vuilleumier, P. (2005) How brains beware: neural mechanisms of emotional attention. Trends Cogn. Sci., 9: 585–594.
Vuilleumier, P., Richardson, M.P., Armony, J.L., Driver, J. and Dolan, R.J. (2004) Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat. Neurosci., 7: 1271–1278.
Vuilleumier, P. and Schwartz, S. (2001) Beware and be aware: capture of spatial attention by fear-related stimuli in neglect. Neuroreport, 12: 1119–1122.
Wieser, M.J., Mühlberger, A., Alpers, G.W., Macht, M., Ellgring, H. and Pauli, P. (2006) Emotion processing in Parkinson's disease: dissociation between early neuronal processing and explicit ratings. Clin. Neurophysiol., 117: 94–102.
Wright, C.I., Fischer, H., Whalen, P.J., McInerney, S.C., Shin, L.M. and Rauch, S.L. (2001) Differential prefrontal cortex and amygdala habituation to repeatedly presented emotional stimuli. Neuroreport, 12: 379–383.
Yamasaki, H., LaBar, K.S. and McCarthy, G. (2002) Dissociable prefrontal brain systems for attention and emotion. Proc. Natl. Acad. Sci. USA, 99: 11447–11451.
CHAPTER 3
Implicit and explicit categorization of natural scenes

Maurizio Codispoti, Vera Ferrari, Andrea De Cesarei and Rossella Cardinale

Department of Psychology, University of Bologna, Viale Berti Pichat 5, 40127 Bologna, Italy
Abstract: Event-related potential (ERP) studies have consistently found that emotionally arousing (pleasant and unpleasant) pictures elicit a larger late positive potential (LPP) than neutral pictures in a window from 400 to 800 ms after picture onset. In addition, an early ERP component has been reported to vary with emotional arousal in a window from about 150 to 300 ms, with affective, compared to neutral, stimuli prompting significantly less positivity over occipito-temporal sites. Similar early and late ERP components have been found in explicit categorization tasks, suggesting that selective attention to target features results in similar cortical changes. Several studies have shown that the affective modulation of the LPP persists even when the same pictures are repeated several times, when they are presented as distractors, or when participants are engaged in a competing task. These results indicate that categorization of affective stimuli is an obligatory process. On the other hand, perceptual factors (e.g., stimulus size) seem to affect the early ERP component but not the affective modulation of the LPP. Although early and late ERP components both vary with stimulus relevance, they are differentially affected by stimulus and task manipulations and thus appear to index different facets of picture processing.

Keywords: emotion; attention; categorization; habituation; natural scenes

The ability to organize the world into meaningful and appropriate groupings is necessary for an organism to achieve adaptive responses in different situations. A number of studies using simple stimuli and explicit tasks suggest marked limits to the capacity of visual attention. In contrast, despite the perceptual complexity of natural scenes, the processing of everyday scenes is almost instantaneous and feels effortless. Several recent studies have confirmed this notion and have experimentally demonstrated the impressive speed with which natural scenes are categorized. Moreover, it has been demonstrated that when natural scenes are motivationally relevant, they rapidly capture attentional resources, leading to automatic categorization. This paper provides an overview of recent studies on natural scene categorization and motivated attention.
Categorization of natural scenes

Since the 1970s, Mary Potter and colleagues have crafted a careful series of rapid serial visual presentation (RSVP) studies showing that natural objects belonging to a target category may be classified remarkably quickly (Potter, 1975, 1976). In these studies, a rapid sequence of unrelated pictures is presented and an immediate semantic detection is required; for example, participants had to respond when they saw a picture of "a dog" (which they had never seen before) presented within a sequence of 16 pictures. The results of these studies suggested that picture scenes are understood and become immune to visual masking within about 100 ms.
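As a concrete illustration of the paradigm, the sketch below shows the core presentation loop of an RSVP sequence in Python, assuming the PsychoPy library and hypothetical image file names. Timing via core.wait is only approximate; actual experiments lock stimulus durations to the monitor refresh.

from psychopy import core, visual

# Hypothetical stimulus set: 16 unrelated scene photographs.
image_paths = [f"scene_{i:02d}.jpg" for i in range(16)]

win = visual.Window(size=(1024, 768), color="grey", units="pix")
stims = [visual.ImageStim(win, image=path) for path in image_paths]

for stim in stims:
    stim.draw()
    win.flip()      # picture appears on the next screen refresh
    core.wait(0.1)  # roughly 100 ms per picture, i.e., about 10 pictures/s

win.close()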
Corresponding author. Tel.: +39-51-2091836; Fax: +39-051-243086; E-mail: [email protected]

DOI: 10.1016/S0079-6123(06)56003-0
Evidence from human event-related potentials (ERPs) also gives cause to expect that "early" processing occurs considerably more quickly than is commonly thought. Thorpe et al. (1996) investigated how long the visual system takes to perceive and understand a complex natural image using a paradigm in which subjects were asked to decide whether or not a briefly displayed colour picture contained an animal. This study revealed an early ERP difference between target (image with animal) and distractor (image without animal) trials that started roughly 150 ms after stimulus onset. Similar findings have been reported when participants had to categorize complex scenes on the basis of the presence or absence of a clearly artificial category: means of transport (VanRullen and Thorpe, 2001). Together with other recent findings (Li et al., 2002; Kirchner and Thorpe, 2005), the early differential activity starting at 150 ms from stimulus onset has been interpreted as evidence of a specific mode of visual processing, termed "ultra-rapid visual categorization", that relies on a parallel and automatic feedforward mechanism (Fabre-Thorpe et al., 2001). These findings challenge the conventional view that focused attention is required for high-level visual processing (Evans and Treisman, 2005). Indeed, several studies have shown that many visual tasks necessitate focused attention, including feature binding to form objects and change detection in complex scenes (Treisman and Gelade, 1980; Wolfe and Bennett, 1997; O'Regan et al., 1999). Evans and Treisman (2005) recently presented evidence that when participants correctly detected that an animal target was present in a rapid serial visual sequence, they frequently failed to identify the specific exemplar, suggesting that detection was based on partial processing. Also, the authors showed that detection of the target was considerably worse in sequences that also contained humans as distractors, presumably because of shared features between animal and human scenes. Based on these findings, Evans and Treisman suggested that natural scene categorization is characterized by two different stages of processing. A first stage involves rapid and parallel detection of disjunctive sets of unbound features of intermediate
complexity that characterize a target category. These features of intermediate complexity might be used to discriminate between scenes that do or do not contain the target without necessarily fully identifying it. If the target stimulus is still present after this detection stage, focused attention binds these features to form a more specific representation of the target stimulus. Initially, the early differential ERP activity was considered to reflect anterior neural generator sites implicated in the inhibition of inappropriate behavioural responses during NoGo trials (Thorpe et al., 1996). Subsequent research has demonstrated that the early (150 ms) larger negativity elicited by targets compared to distractor stimuli is largely due to the involvement of occipito-temporal regions in the semantic processing of visual stimuli (Delorme et al., 2004; Rousselet et al., 2004). Similarly, in the same time window, Rudell described a posterior evoked potential named the recognition potential (RP), an electrical brain response that occurs when a subject views a recognizable image such as a word (Rudell, 1991; Rudell and Hua, 1997) or a picture (Rudell, 1992). More recently, Hinojosa et al. (2000) showed that the neural generator of the RP is located mainly in the infero-temporal cortex, and they interpreted this potential as related to the processing of semantic-conceptual aspects. The role of infero-temporal neurons in object recognition has been known since the seminal work of Gross and co-workers (1973); furthermore, an important role has recently been ascribed to the prefrontal cortex, which receives highly processed visual information from the infero-temporal cortex and orchestrates voluntary, goal-directed behaviours (Riesenhuber and Poggio, 2002; Freedman et al., 2003; Miller et al., 2003). The role of the occipito-temporal cortices in natural scene categorization was examined by Fize and co-workers (2000) in an event-related fMRI study. The results indicate differential activity in occipito-temporal cortical structures elicited by natural target and distractor scenes (Fize et al., 2000). Unfortunately, because of its low temporal resolution, fMRI cannot disentangle differences in BOLD activation due to early and late processing of the target. However, selective
attention to a specific target stimulus is also reflected at the post-perceptual level of stimulus analysis. Specifically, the P3 component is a hallmark of selective attention. It is a positive-polarity ERP component, typically observed in a time window between 300 and 700 ms post-stimulus, and reflects the amount of attention devoted to task-relevant stimuli (Johnson, 1987; Kok, 2001). Following Kok (2001), the P3 is considered to reflect components of attention and working memory, with the "event categorization process" as a core element. Most previous ERP studies on the categorization of natural scenes used a go/no-go task, making the late differential activity difficult to interpret because the target and non-target conditions were unbalanced in terms of motor activation (Rousselet et al., 2004; Macé et al., 2005). In a recent study, we investigated cortical indicators of the selective attention underlying categorization based on target features in natural scenes (Codispoti et al., 2006b). In particular, we were interested in whether the early differential ERP activity exclusively reflects neural generators in the visual cortex responsible for the biased processing of target stimuli or, additionally, sources related to the generation of a biasing signal in prefrontal areas. Furthermore, with regard to the late differential ERP activity, we tested whether this component reflects target categorization and continued perceptual processing (Ritter and Ruchkin, 1992; Kok, 2001), which would suggest the presence of neural generators in higher-order visual processing areas. As in previous studies by Thorpe and associates (1996), participants had to categorize images according to whether or not they contained an animal. However, our study did not require a go/no-go response; instead, subjects responded as to whether an animal was present in the image or not (De Cesarei et al., 2006) using a two-alternative forced-choice task. Pictures were drawn from a large database containing 1200 exemplars, with target and non-target pictures occurring with equal probability; each picture was shown for 24 ms, with an inter-stimulus interval of 3–4 s. Replicating previous findings, the early differential ERP activity appeared as a positive deflection over fronto-central sensor sites and as a
negative deflection over temporo-occipital regions (see Fig. 1). Furthermore, source estimation techniques (current source density (CSD) and L2 minimum norm estimate) suggested primary sources of the early differential ERP activity in posterior, visual-associative brain regions, as well as a contribution of anterior sources. Also, in a time interval 300–600 ms after stimulus onset, target scenes were associated with augmented late positive potentials (LPPs) over centro-parietal sensor sites (see Fig. 1). Together, these findings seem to indicate that top-down influences early in processing (150 ms) may shape activity in the ventral visual pathway during selective attention, facilitating categorization based upon target features in natural scenes, and that later (300 ms), task-relevant stimuli determine a larger allocation of both perceptual and central resources compared with non-target stimuli. Future studies should further clarify the functional and neural meaning of these early and late ERP components.
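A skeleton of this kind of analysis in Python, assuming the MNE-Python package, preprocessed epochs stored in a hypothetical file, and the event labels "target" and "distractor" (all placeholders, not the actual pipeline used in the study above):

import mne

# Hypothetical file of preprocessed, artifact-corrected epochs.
epochs = mne.read_epochs("scenes-epo.fif")

# Target-minus-distractor difference wave.
target = epochs["target"].average()
distractor = epochs["distractor"].average()
diff = mne.combine_evoked([target, distractor], weights=[1, -1])

# Reference-free current source density transform of the difference wave.
diff_csd = mne.preprocessing.compute_current_source_density(diff)

# Mean differential amplitude in the early (150-300 ms) and late
# (300-600 ms) windows, one value per sensor.
early = diff.copy().crop(0.15, 0.30).data.mean(axis=1)
late = diff.copy().crop(0.30, 0.60).data.mean(axis=1)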
Perception and categorization of emotional scenes

Typically, studies on categorization, object recognition and selective attention employ explicit tasks in which participants are asked to classify stimuli according to verbal instructions. A special type of categorization is represented by the processing of emotional stimuli. In fact, to the extent that natural images represent motivationally relevant cues, they capture attentional resources and are therefore quickly categorized (Lang et al., 1997). Emotional processing has been widely investigated by presenting affective pictures, which are effective cues for evoking a broad range of emotional reactions, varying in intensity and involving both pleasant and unpleasant affect (Lang et al., 1997). For instance, skin conductance responses are larger when viewing emotionally arousing (pleasant or unpleasant) pictures compared to neutral pictures, whereas heart rate varies with affective valence, with more deceleratory heart rate responses elicited when viewing unpleasant,
Fig. 1. ERP waveforms showing early brain potentials over occipito-temporal (PO8) and frontal (F4) sites and late positive potentials over a parietal site (Pz) during explicit categorization of natural scenes. Scalp potential maps also reveal the topography of the early and late differential ERP activity (Target minus distractor).
relative to neutral pictures (Bradley et al., 2001). In addition, it has been shown that affective pictures are effective cues for evoking not only autonomic responses but also a broad range of neuroendocrine changes (Codispoti et al., 2003). The emotional context elicited by photographic slides also affects the startle reflex. It is well established that the magnitude of the blink response to a startling acoustic or visual probe varies according to the affective valence of the foreground
picture stimuli (Lang et al., 1990). Specifically, the startle reflex is larger when people view unpleasant rather than pleasant pictures (Vrana et al., 1988; Bradley et al., 2006). Consistent with the motivational hypothesis, it has been shown that the strongest reports of emotional arousal, the largest skin conductance responses, and the greatest modulation of the startle reflex occur when participants view pictures depicting threats, mutilated bodies and erotica (Bradley et al., 2001, 2006).
A number of recent experimental findings have shown that ERPs are modulated by emotionally significant stimuli. In particular, emotionally arousing (pleasant and unpleasant) pictures elicit larger LPPs than neutral stimuli in a window from 400 to 800 ms after picture onset (Cacioppo et al., 1994; Cuthbert et al., 2000; Schupp et al., 2000), and occipital and posterior parietal regions have been suggested as a possible origin of the arousal-modulated late positive wave (Keil et al., 2002). This effect has been linked to the concept of motivated attention, which proposes that motivationally significant stimuli are selectively processed because they naturally engage attentional resources (Lang et al., 1997). Furthermore, recent studies also found an early selective processing of affective cues, in which emotional pictures prompt less positivity than neutral pictures over occipito-temporal sites, starting at 150 ms after picture onset and lasting for about 100 ms (Schupp et al., 2004). The affective modulation of this early time interval has been interpreted as an indication that the emotional content of visual cues facilitates the sensory encoding of these stimuli (Schupp et al., 2003). Moreover, affective modulation of early and late ERP components does not rely on voluntary evaluation of the hedonic content (Cuthbert et al., 1995; Codispoti et al., 1998; Junghöfer et al., 2001; Keil et al., 2002; Schupp et al., 2003). For instance, Cuthbert et al. (1995) compared the LPP during passive viewing and during an evaluative rating task, and found similar modulation, suggesting that affective evaluation, as measured by the LPP, may be a relatively obligatory process. These cortical and autonomic changes during affective picture processing are obtained when participants view pictures for a sustained time period (e.g., 6 s). In a series of studies, we investigated whether brief presentations are also able to engage the defensive and appetitive motivational systems that mediate emotional responding (Codispoti et al., 2001, 2002). In particular, pictures were presented for different exposure times (from 25 ms to 6 s); in one condition they were followed by a blank screen (unmasked condition), while in another condition they were followed by a masking stimulus
(masked condition). The masking stimulus was a scrambled image. In the unmasked condition, affective modulation in terms of subjective and cortical reactions was only slightly affected by the pictures' exposure time, while heart rate modulation appeared to rely on the presence of the stimulus. Moreover, in the masked condition, subjective and cortical reactions were modulated by the affective content of the stimulus even when pictures were presented very briefly (450 ms), whereas longer exposure times were needed to observe autonomic changes as a function of picture content. Taken together, these findings indicate that stimulus categorization, reflected in cortical and subjective changes, occurs even when affective scenes are briefly presented. In contrast, autonomic changes, probably reflecting stimulus intake and preparation for action, are not activated when the emotional impact of the stimulus is reduced as a consequence of a short picture exposure time. A brief presentation might be considered analogous to a distant (rather than imminent) predator or prey, which should determine less intense appetitive and defensive activation, leading to weaker autonomic changes associated with preparation for action (Lang et al., 1997). Consistent with the prediction of larger emotional responses to proximal stimuli, a study comparing the reactions of snake-phobic participants to snakes presented at various distances (Teghtsoonian and Frost, 1982) showed a linear increase in autonomic responses and self-reported fear as the distance decreased. Since distance and retinal size are strictly related (Loftus and Harley, 2005), it can be expected that changes in stimulus size modulate arousal similarly to distance. Moreover, in an evolutionary framework, the physical size of an encountered object or organism may determine its motivational relevance for the observer. This possibility is supported by the results of a recent study (Reeves et al., 1999) which investigated autonomic responses following arousing and non-arousing stimuli presented in different sizes, suggesting a more pronounced emotional response for larger stimuli. Recently, we assessed the possibility that changes in stimulus size may influence the affective
modulation of early and late ERPs (De Cesarei and Codispoti, 2006). Assuming that size reduction lowers the relevance of the scene to the observer, a reduction in affective modulation at both early and late stages of processing was expected for smaller compared to larger images. Alternatively, since size reduction also results in decreased discriminability due to the loss of fine detail in the scene (Loftus and Harley, 2005), the earlier time window, reflecting stages of perceptual analysis, could be expected to be more affected by size reduction than the LPP, which is thought to reflect processes initiated after stimulus recognition. While a decrease in the modulation of the early ERP time interval (150–300 ms) as a function of stimulus content was observed following image size reduction, the affective modulation of the LPP did not change as a function of picture size, suggesting that at this stage a semantic, size-invariant representation of the stimulus has been attained (see Fig. 2). Similarly, other studies using explicit categorization tasks have shown that when the discriminability of the image is reduced, early modulation in the 150–300 ms latency range is largely reduced (Goffaux et al., 2003; Macé et al., 2005).
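The geometric link between viewing distance, physical size, and retinal size invoked above is the standard visual-angle relation; a small Python helper makes it explicit (the example sizes and viewing distance are arbitrary, not those of the study):

import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by a stimulus of a given physical
    size viewed from a given distance: theta = 2 * atan(size / (2 * d))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Halving the image size (or doubling the viewing distance) roughly halves
# the visual angle, the quantity varied across the S/1..S/8 size steps.
print(round(visual_angle_deg(40, 100), 1))  # ~22.6 deg
print(round(visual_angle_deg(20, 100), 1))  # ~11.4 deg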
Explicit categorization and affective modulation: emotional scenes as distractors

In everyday life, while monitoring the environment, we evaluate relevant stimuli even when not explicitly intending to do so. We carry out such implicit categorization in order to understand and encode the context. Implicit categorization can be considered the act of responding differently to objects and events in the environment that belong to separate classes or categories, even when this is not explicitly required. Recently, we examined whether the selective attention underlying categorization based on target features and motivated attention share similar mechanisms, and whether emotional modulation of the LPP is reduced when affective stimuli are presented as distractors during an explicit categorization task (Cardinale et al., 2005). Participants performed two categorization tasks in counterbalanced order: in one condition participants were asked to decide whether a picture contained an animal, and in the other condition they had to decide whether the picture contained a human. The stimuli comprised three categories of natural scenes: animal, nonliving (natural landscape scenes and pictures of
Fig. 2. Early and late ERP affective modulation to pictures presented in different sizes, ranging from smallest (left end) to largest (right end). Labels represent the ratio between the actual and the 100% stimulus size. Horizontal and vertical visual angles subtended by the images are 3° × 2° (S/8), 5° × 4° (S/4), 11° × 8° (S/2) and 21° × 16° (S/1).
Fig. 3. Scalp potential maps reveal similar topographies of the early and late differential ERP activity during an explicit categorization task as well as during the perception of emotional non-target stimuli (animal target condition). Bilateral foci of occipito-temporal negativity appeared for the differential ERP activity in the time window from 150 to 300 ms. Later, in the classical P3 window (300–600 ms), target as well as emotional pictures elicited a larger positive potential compared to distractor and neutral images, maximally pronounced over centro-parietal sensor sites. Illustrated is the right-side view of the model head.
objects) and human (erotic couples, neutral people, mutilated bodies). The results suggest that regardless of the source of a stimulus's relevance, whether task-relevant (target) or inherently significant (emotional), similar cortical changes are involved (see Fig. 3). Also, affective modulation of the LPP persisted even when these stimuli (human pictures) were distractors (animal target condition). Specifically, when animal pictures were the target, they elicited a larger LPP compared to neutral people, and this positivity did not differ from that elicited by emotional stimuli, suggesting that selective attention to target features and motivated attention share similar neural mechanisms (see Fig. 3). Interestingly, in this study the task did not affect ERP responses to animal pictures in the early time interval (150–300 ms), in which similar early brain potentials were observed across the two conditions (animal task and human task). This finding was interpreted as a consequence of the similarities between the two target categories in terms of features of intermediate complexity, and it is consistent with the evidence from Evans and Treisman (2005), discussed above, showing that detection of the target in an RSVP study was considerably worse in sequences that also contained humans as distractors.
These findings seem to indicate not only that affective modulation of the LPP does not depend on the evaluative nature of the task, but also that it is present even when participants perform a categorization task in which pleasant, neutral and unpleasant images are distractors. Although these studies suggest that emotional pictures are automatically categorized and capture attentional resources, it should be noted that in these studies not only was the competing task undemanding, but the pictures had to be perceived and categorized in order to establish whether or not they belonged to the target category. Pessoa and co-workers (Pessoa and Ungerleider, 2004; Pessoa et al., 2005) have shown that the processing of emotional visual stimuli might be modulated by the availability of attentional resources. In a recent fMRI study, faces with neutral or fearful expressions were presented centrally along with peripheral bars. During the bar-orientation task, subjects were asked to indicate whether the orientation of the bars was the same or not, and the difficulty level of the task was manipulated via the angular difference of the bars in order to investigate the effect of attentional resources on emotionally relevant distractor stimuli. The results showed that in the
amygdala, differential responses to fearful compared to neutral faces were observed only during low-load conditions. These findings seem to be consistent with Lavie's (1995; Lavie et al., 2004) proposal that if the processing load of a target task exhausts available capacity, stimuli irrelevant to that task will not be processed. Similarly, in a recent experiment, we manipulated the perceptual load of a competing task to evaluate to what extent emotional stimuli are automatically categorized as a function of available resources. In this study, participants were presented with a picture (pleasant, neutral or unpleasant) in the left or right hemifield for 100 ms while simultaneously performing a foveal task in which they were asked to detect an X or a Z presented alone (low perceptual load condition) or flanked by other letters (high load condition). Results indicated that the early (150–300 ms) modulation of the ERP over occipito-temporal regions was affected by the perceptual load of the foveal task. In fact, while emotional pictures elicited less positivity compared to neutral pictures in the low perceptual load condition, this differential activity disappeared in the high perceptual load condition. On the other hand, the late positive potential was unaffected by the foveal task, suggesting that the LPP might index a post-perceptual stage of processing and that the emotional content of the picture is processed by the brain even in a highly demanding condition. Interestingly, as discussed above, similar results were found when the size of the images was manipulated (De Cesarei and Codispoti, 2006). In fact, picture size affected the modulation of the early ERP time interval (150–300 ms) as a function of stimulus content, while affective modulation of the LPP did not change as a function of picture size. Taken together, these findings suggest that the affective modulation of the LPP, probably related to post-perceptual stages of processing, is not influenced by perceptual factors and persists even when pictures are presented as distractors while participants are engaged in a demanding competing task. By contrast, two different perceptual factors (the perceptual load of a competing task and picture size) affect the modulation of the early ERP time
interval (150–300 ms) as a function of stimulus content over occipito-temporal regions.
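For readers who wish to make these two measures concrete, the sketch below illustrates, under assumed data structures (the sampling rate, epoch layout, and condition labels are ours, not taken from the studies reviewed here), how the early occipito-temporal effect (150–300 ms) and the LPP (400–800 ms) could be quantified as mean amplitudes in the corresponding time windows.

```python
# Minimal sketch with a hypothetical data layout: mean ERP amplitude per
# time window and picture content category, from single-sensor epoch arrays.
import numpy as np

SFREQ = 250      # assumed sampling rate (Hz)
T0 = -0.1        # assumed epoch start relative to picture onset (s)

def window_mean(epochs, tmin, tmax):
    """Mean amplitude in [tmin, tmax] seconds; epochs: (n_trials, n_samples)."""
    i0 = int(round((tmin - T0) * SFREQ))
    i1 = int(round((tmax - T0) * SFREQ))
    return epochs[:, i0:i1].mean()

def affective_modulation(epochs_by_content, tmin, tmax):
    """epochs_by_content: dict with 'pleasant', 'neutral', 'unpleasant' arrays."""
    m = {c: window_mean(e, tmin, tmax) for c, e in epochs_by_content.items()}
    emotional = (m['pleasant'] + m['unpleasant']) / 2.0
    return emotional - m['neutral']

# Usage (hypothetical epochs from single sensors):
# early = affective_modulation(occipito_temporal_epochs, 0.150, 0.300)  # expected < 0
# lpp   = affective_modulation(centro_parietal_epochs, 0.400, 0.800)    # expected > 0
```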
Affective modulation and habituation

The findings reviewed above indicate that the affective modulation of the ERPs is an automatic process that does not rely on voluntary evaluation of the hedonic content of the stimulus. In line with this reasoning, several behavioural studies have found involuntary semantic processing of affective stimuli (Pratto and John, 1991; Stenberg et al., 1998; McKenna and Sharma, 2004). Moreover, behavioural studies (Bargh et al., 1996; Hermans et al., 2001) have shown that, in a priming paradigm, emotional pictures speed up the response when prime and target share the same valence compared to trials in which prime and target are of opposite valence. Similar findings have also been observed in the absence of an explicitly evaluative context, by having participants merely pronounce or make a lexical decision about, rather than evaluate, target words after briefly viewing picture primes (Wentura, 1998; Giner-Sorolla et al., 1999; for different findings see Storbeck and Robinson, 2004). One important feature of an obligatory process is resistance to habituation. There is considerable evidence that prior exposure to a stimulus affects subsequent attentional processes and the orienting response (Sokolov, 1963; Siddle, 1991; Bradley and Lang, 2000), influences perceptual facilitation associated with "neural suppression" (Tulving and Schacter, 1990; Henson and Rugg, 2003), and leads to changes in subjective ratings of pleasantness and arousal (Fechner, 1876; Zajonc, 1968; Bornstein, 1989; Raymond et al., 2003; Codispoti et al., 2006a). Although a single repetition of a stimulus facilitates subsequent recognition (e.g., repetition priming), with increasing repetitions the salience of the stimulus is reduced. Thus, habituation can be defined as an unlearned behavioural process that results in a diminution of response (i.e., decreased response magnitude and/or increased response latency) to stimuli that are repeatedly presented (Harris, 1943; Thompson and Spencer, 1966).
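Habituation so defined can be summarized quantitatively. One common descriptive choice, offered here purely as an illustration and not drawn from the studies reviewed, is an exponential decrement toward an asymptote, R(n) = R_inf + (R_0 - R_inf) * exp(-n / tau). The sketch below simulates such a series and recovers the parameters; all numerical values are invented.

```python
# Illustrative only: fit an exponential habituation curve to simulated data.
import numpy as np
from scipy.optimize import curve_fit

def habituation(n, r_inf, r0, tau):
    """Exponential response decrement toward an asymptote r_inf."""
    return r_inf + (r0 - r_inf) * np.exp(-n / tau)

reps = np.arange(1, 13)                      # e.g., 12 presentations of a picture
rng = np.random.default_rng(0)
resp = habituation(reps, 0.2, 1.0, 3.0) + rng.normal(0.0, 0.03, reps.size)

(r_inf, r0, tau), _ = curve_fit(habituation, reps, resp, p0=(0.1, 1.0, 2.0))
print(f"asymptote={r_inf:.2f}, initial response={r0:.2f}, decay constant={tau:.1f} trials")
```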
Habituation of affective modulation was first investigated by Bradley and colleagues (1993). In this experiment, the same pleasant, neutral and unpleasant picture stimuli were repeatedly presented (12 repetitions of each picture). Startle reflex habituation was assessed and compared with the habituation patterns of autonomic responses (heart rate and skin conductance). Results indicated that whereas all responses showed general habituation over trials, affective modulation of the blink reflex was not affected by picture repetition. As found in previous studies, the blink response was potentiated when startle probes were presented during processing of unpleasant pictures (relative to neutral stimuli) and reduced when viewing pleasant pictures, and this pattern persisted even after several repetitions of the same stimulus. Recently, we examined the affective modulation of the late positive potential as it varied with stimulus repetition (Codispoti et al., 2006a). Pleasant, neutral and unpleasant pictures were presented up to 60 times. If the LPP reflects automatic affective evaluation, we expected no difference in the modulation of this component across stimulus repetitions. We also expected that the autonomic responses (heart rate and skin conductance) would habituate rapidly, providing evidence that the affective impact of the stimulus was maximal early within the habituation phase. Results showed that, although the amplitude of the late positive potential during picture viewing declined with stimulus repetition, affective modulation remained intact. On the other hand, autonomic responses habituated rapidly with stimulus repetition (see Fig. 4).

Fig. 4. (A) Late positive potential amplitude (Pz; 400–800 ms) and (B) skin conductance changes elicited when viewing pleasant, neutral and unpleasant pictures during each of the three blocks within the habituation phase.

These findings suggest that stimulus repetition does not change the associative strength of connections to subcortical motivational systems, as reflected in the LPP, but does change the output to systems involved in orienting and action. In this study, a relatively sparse electrode array (3 sensors) did not provide an opportunity to assess how picture repetition affects the early ERP component over frontal and occipital sites, or whether picture repetition differentially affects early and late ERP components that vary with picture emotionality. In addition, one factor that might have encouraged sustained processing of each stimulus was the use of a relatively long inter-picture interval (e.g., 10–20 s), which may have increased the "novelty" of each picture despite its continued repetition. Thus, we decided to further investigate affective habituation in a new study, employing a dense sensor electrode array to assess early and late ERP components as they varied with repetition of affective
pictures, using a shorter inter-stimulus interval (2–3 s) to reassess the effects of picture repetition on the late positive potential (Ferrari et al., 2005). This array allowed us to assess ERPs over both occipital and frontal cortex, in order to examine the early ERP differences reported in previous studies as well as to reassess the late positive potential, which is maximal over centro-parietal sites. Consistent with previous data (Codispoti et al., 2006a), emotionally arousing stimuli continued to prompt larger LPPs than neutral pictures, regardless of repetition. Nonetheless, the magnitude of the late positive potential decreased somewhat for both pleasant and unpleasant pictures following multiple repetitions. On the other hand, there were no effects of repetition on ERPs measured in the earlier time window, suggesting that early effects of emotional arousal may reflect a stimulus-driven process that occurs automatically. Recently, Schupp and colleagues (2006) reported similar findings using rapid visual presentation in which pictures were shown for 330 ms with no inter-stimulus interval, further confirming that the early affective modulation of the ERP over occipito-temporal regions is resistant to habituation even when pictures are presented without an inter-picture interval. Interestingly, using an explicit categorization task, Fabre-Thorpe and co-workers showed that repeated presentation of natural scenes (14 days over a 3-week period) did not facilitate information processing. That is, the early (150 ms) differential ERP effect indicated that visual processing was just as fast for completely novel images as for images with which the subjects were highly familiar (Fabre-Thorpe et al., 2001). However, since in our previous affective habituation studies participants were only asked to look at the pictures, without an additional task, we cannot effectively rule out the possibility that affective modulation of the LPP results from a process in which more attention is voluntarily allocated to viewing affective pictures. A strategy for determining whether affective modulation of the LPP and its resistance to habituation is due to obligatory or voluntary processes would be to present affective pictures when participants are
engaged in a competing task. This issue was further investigated in a recent experiment (Biondi et al., 2005) in which a picture flanked by two numbers was presented for 150 ms and participants were asked to decide whether the two numbers had the same parity, while ignoring the image. Results showed that reaction times were markedly longer for arousing pictures compared with neutral pictures, but that after several repetitions of the same stimuli this effect vanished. Even though attention was occupied by the competing task, affective modulation of the early and late ERP components persisted after several repetitions of the same stimulus, suggesting that categorization of unattended affective stimuli is an obligatory process that occurs whenever a sensory stimulus is presented.
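A minimal sketch of the kind of analysis this implies (the data layout and names are assumptions of ours): compute the emotional-minus-neutral LPP difference separately within each repetition block; persistence of affective modulation corresponds to that difference remaining positive across blocks even as overall amplitudes decline.

```python
# Sketch: affective modulation of the LPP per repetition block.
import numpy as np

def lpp_modulation_by_block(lpp, block):
    """lpp, block: dicts keyed 'emotional'/'neutral' holding, respectively,
    single-trial LPP window means and matching block labels (same lengths)."""
    result = {}
    for b in np.unique(block['neutral']):
        emo = lpp['emotional'][block['emotional'] == b].mean()
        neu = lpp['neutral'][block['neutral'] == b].mean()
        result[int(b)] = emo - neu   # affective modulation within this block
    return result
```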
Summary and future directions

Event-related potentials measured during picture viewing vary with emotional arousal: affective (either pleasant or unpleasant) pictures, compared to neutral ones, elicit a larger LPP in the 300–600 ms time interval over centro-parietal regions. This component has been interpreted as reflecting enhanced attention to motivationally relevant pictures. In addition, an early ERP component has also been reported to vary with emotional arousal in a window from about 150 to 300 ms, with affective, compared to neutral, stimuli prompting significantly less positivity over occipito-temporal sites. Similarly, in explicit categorization tasks with complex natural scenes, target and non-target ERPs diverge sharply around 150 ms after stimulus onset, and this early differential ERP activity (target minus non-target) appears as a negative deflection over occipito-temporal regions. Furthermore, in a time interval 300–600 ms after stimulus onset, target scenes are associated with an augmented LPP over centro-parietal sites, suggesting that similar mechanisms are involved in selective attention to target features and to motivationally relevant stimuli. Several studies have shown that the affective modulation of the LPP persists even when the same pictures are repeated several times, when
they are presented as distractors, or when participants are engaged in a competing task, indicating that categorization of affective stimuli is an obligatory process. On the other hand, while the affective modulation of the LPP is not influenced by perceptual factors (e.g., stimulus size), these same factors strongly reduce the modulation of the early ERP time interval (150–300 ms) as a function of stimulus content. Although early and late ERP components both vary with stimulus relevance, they are differentially affected by stimulus and task manipulations and thus appear to reflect different sensory and attentional processes. Future studies should further investigate the functional and neural mechanisms underlying natural scene categorization, and examine how and when motivational systems (e.g., appetitive and defensive) modulate the processing of stimulus features. Further work should also clarify the nature of the perceptual features that affect early stages of processing in picture perception.
References

Bargh, J.A., Chen, M. and Burrows, L. (1996) Automaticity of social behavior: direct effects of trait construct and stereotype activation on action. J. Pers. Soc. Psychol., 71: 230–244.
Biondi, S., De Cesarei, A., Cardinale, R. and Codispoti, M. (2005) What is the fate of unattended emotional stimuli? Cortical and behavioural correlates of affective habituation. Psychophysiology, 42: S36.
Bornstein, R.F. (1989) Exposure and affect: overview and meta-analysis of research, 1968–1987. Psychol. Bull., 106: 265–289.
Bradley, M.M. and Lang, P.J. (2000) Emotion and motivation. In: Cacioppo, J.T., Tassinary, L.G. and Berntson, G. (Eds.), Handbook of Psychophysiology (3rd ed.). Cambridge University Press, New York.
Bradley, M.M., Lang, P.J. and Cuthbert, B.N. (1993) Emotion, novelty, and the startle reflex: habituation in humans. Behav. Neurosci., 107: 970–980.
Bradley, M.M., Codispoti, M., Cuthbert, B.N. and Lang, P.J. (2001) Emotion and motivation I: defensive and appetitive reactions in picture processing. Emotion, 1: 276–298.
Bradley, M.M., Codispoti, M. and Lang, P.J. (2006) A multiprocess account of startle modulation during affective perception. Psychophysiology, 43: 421–428.
Cacioppo, J.T., Crites Jr., S.L., Gardner, W.L. and Berntson, G.G. (1994) Bioelectrical echoes from evaluative categorizations: I. A late positive brain potential that varies as a function of trait negativity and extremity. J. Pers. Soc. Psychol., 67: 115–125.
Cardinale, R., Ferrari, V., De Cesarei, A., Biondi, S. and Codispoti, M. (2005) Implicit and explicit categorization of natural scenes. Psychophysiology, 42: S41.
Codispoti, M., Bradley, M.M., Cuthbert, B.N., Montebarocci, O. and Lang, P.J. (1998) Stimulus complexity and affective contents: startle reactivity over time. Psychophysiology, 35: S25.
Codispoti, M., Bradley, M.M. and Lang, P.J. (2001) Affective modulation for briefly presented pictures. Psychophysiology, 38: 474–478.
Codispoti, M., Mazzetti, M. and Bradley, M.M. (2002) Exposure time and affective modulation in picture perception. Psychophysiology, 39: S62.
Codispoti, M., Gerra, G., Montebarocci, O., Zaimovic, A., Raggi, M.A. and Baldaro, B. (2003) Emotional perception and neuroendocrine changes. Psychophysiology, 40: 863–868.
Codispoti, M., Ferrari, V. and Bradley, M.M. (2006a) Repetitive picture processing: autonomic and cortical correlates. Brain Res., 1068: 213–220.
Codispoti, M., Ferrari, V., Junghöfer, M. and Schupp, H.T. (2006b) The categorization of natural scenes: brain attention networks revealed by dense sensor ERPs. Neuroimage, 31: 881–890.
Cuthbert, B.N., Schupp, H.T., McManis, M., Hillman, C., Bradley, M.M. and Lang, P.J. (1995) Cortical slow waves: emotional perception and processing. Psychophysiology, 32: S26.
Cuthbert, B.N., Schupp, H.T., Bradley, M.M., Birbaumer, N. and Lang, P.J. (2000) Brain potentials in affective picture processing: covariation with autonomic arousal and affective report. Biol. Psychol., 52: 95–111.
De Cesarei, A., Codispoti, M., Schupp, H.T. and Stegagno, L. (2006) Selectively attending to natural scenes after alcohol consumption: an ERP analysis. Biol. Psychol., 72: 35–45.
De Cesarei, A. and Codispoti, M. (2006) When does size not matter? Effects of stimulus size on affective modulation. Psychophysiology, 43: 207–215.
Delorme, A., Rousselet, G.A., Mace, M.J. and Fabre-Thorpe, M. (2004) Interaction of top-down and bottom-up processing in the fast visual analysis of natural scenes. Brain Res. Cogn. Brain Res., 19: 103–113.
Evans, K.K. and Treisman, A. (2005) Perception of objects in natural scenes: is it really attention free? J. Exp. Psychol. Hum. Percept. Perform., 31: 1476–1492.
Fabre-Thorpe, M., Delorme, A., Marlot, C. and Thorpe, S.J. (2001) A limit to the speed of processing in ultra-rapid visual categorization of novel natural scenes. J. Cogn. Neurosci., 13: 171–180.
Fechner, G.T. (1876) Vorschule der Ästhetik. Breitkopf and Härtel, Leipzig.
Ferrari, V., Codispoti, M. and Bradley, M.M. (2005) Not just the same old thing: cortical and autonomic measures of affective habituation. Psychophysiology, 42: S55.
Fize, D., Boulanouar, K., Chatel, Y., Ranjeva, J.P., Fabre-Thorpe, M. and Thorpe, S.J. (2000) Brain areas involved in rapid categorization of natural images: an event-related fMRI study. Neuroimage, 11: 634–643.
Freedman, D.J., Riesenhuber, M., Poggio, T. and Miller, E.K. (2003) A comparison of primate prefrontal and inferior temporal cortices during visual categorization. J. Neurosci., 23: 5235–5246.
Giner-Sorolla, R., Garcia, M.T. and Bargh, J.A. (1999) The automatic evaluation of pictures. Social Cogn., 17: 76–96.
Goffaux, V., Gauthier, I. and Rossion, B. (2003) Spatial scale contribution to early visual differences between face and object processing. Cogn. Brain Res., 16: 416–424.
Gross, C.G. (1973) Visual functions of the inferotemporal cortex. In: Austrum, H., Jung, R., Loewenstein, W.R., Mackay, D.M. and Teuber, H.L. (Eds.), Handbook of Sensory Physiology: Central Processing of Visual Information, Vol. 7. Springer, Berlin, pp. 451–482.
Harris, J.D. (1943) Habituatory response decrement in the intact organism. Psychol. Bull., 40: 385–422.
Henson, R.N. and Rugg, M.D. (2003) Neural response suppression, haemodynamic repetition effects, and behavioural priming. Neuropsychologia, 41: 263–270.
Hermans, D., Houwer, J. and Eelen, P. (2001) A time course analysis of the affective priming effect. Cogn. Emotion, 15: 143–165.
Hinojosa, J.A., Martin-Loeches, M., Gomez-Jarabo, G. and Rubia, F.J. (2000) Common basal extrastriate areas for the semantic processing of words and pictures. Clin. Neurophysiol., 111: 552–560.
Johnson Jr., R. (1987) The amplitude of the P300 component of the event-related potential: review and synthesis. In: Ackles, P.K., Jennings, J.R. and Coles, M.G.H. (Eds.), Advances in Psychophysiology, Vol. III. JAI Press, Greenwich, CT.
Junghöfer, M., Bradley, M.M., Elbert, T.R. and Lang, P.J. (2001) Fleeting images: a new look at early emotion discrimination. Psychophysiology, 38: 175–178.
Keil, A., Bradley, M.M., Hauk, O., Rockstroh, B., Elbert, T. and Lang, P.J. (2002) Large-scale neural correlates of affective picture processing. Psychophysiology, 39: 641–649.
Kirchner, H. and Thorpe, S.J. (2005) Ultra-rapid object detection with saccadic eye movements: visual processing speed revisited. Vision Res., 46: 1762–1776.
Kok, A. (2001) On the utility of P3 amplitude as a measure of processing capacity. Psychophysiology, 38: 557–577.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1990) Emotion, attention, and the startle reflex. Psychol. Rev., 97: 377–395.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1997) Motivated attention: affect, activation and action. In: Lang, P.J., Simons, R.F. and Balaban, M.T. (Eds.), Attention and Orienting: Sensory and Motivational Processes. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 97–135.
Lavie, N. (1995) Perceptual load as a necessary condition for selective attention. J. Exp. Psychol. Hum. Percept. Perform., 21: 451–468.
Lavie, N., Hirst, A., de Fockert, J.W. and Viding, E. (2004) Load theory of selective attention and cognitive control. J. Exp. Psychol. Gen., 133: 339–354.
Li, F.F., VanRullen, R., Koch, C. and Perona, P. (2002) Rapid natural scene categorization in the near absence of attention. Proc. Natl. Acad. Sci. USA, 99: 9596–9601.
Loftus, G.R. and Harley, E.M. (2005) Why is it easier to identify someone close than far away? Psychon. Bull. Rev., 12(1): 43–65.
Macé, M.J.M., Thorpe, S.J. and Fabre-Thorpe, M. (2005) Rapid categorization of achromatic natural scenes: how robust at very low contrasts? Eur. J. Neurosci., 21: 2007–2018.
McKenna, F.P. and Sharma, D. (2004) Reversing the emotional Stroop effect: the role of fast and slow components. J. Exp. Psychol. Learn. Mem. Cogn., 30: 382–392.
Miller, E.K., Nieder, A., Freedman, D.J. and Wallis, J.D. (2003) Neural correlates of categories and concepts. Curr. Opin. Neurobiol., 13: 198–203.
O'Regan, J.K., Rensink, R.A. and Clark, J.J. (1999) Change-blindness as a result of 'mudsplashes'. Nature, 398(6722): 34.
Pessoa, L., Padmala, S. and Morland, T. (2005) Fate of unattended fearful faces in the amygdala is determined by both attentional resources and cognitive modulation. Neuroimage, 28: 249–255.
Pessoa, L. and Ungerleider, L.G. (2004) Neural correlates of change detection and change blindness in a working memory task. Cereb. Cortex, 14: 511–520.
Potter, M.C. (1975) Meaning in visual search. Science, 187: 965–966.
Potter, M.C. (1976) Short-term conceptual memory for pictures. J. Exp. Psychol. Learn. Mem. Cogn., 2: 509–522.
Pratto, F. and John, O.P. (1991) Automatic vigilance: the attention-grabbing power of negative social information. J. Pers. Soc. Psychol., 63: 380–391.
Raymond, J.E., Fenske, M.J. and Tavassoli, N.T. (2003) Selective attention determines emotional responses to novel visual stimuli. Psychol. Sci., 14: 537–542.
Reeves, B., Lang, A., Kim, E.Y. and Tatar, D. (1999) The effects of screen size and message on attention and arousal. Media Psychol., 1: 49–67.
Riesenhuber, M. and Poggio, T. (2002) Neural mechanisms of object recognition. Curr. Opin. Neurobiol., 12: 162–168.
Ritter, W. and Ruchkin, D. (1992) A review of event-related potential components discovered in the context of studying the P3. In: Friedman, D. and Bruder, G. (Eds.), Psychophysiology and Experimental Psychopathology. The New York Academy of Sciences, New York, pp. 1–32.
Rousselet, G.A., Thorpe, S.J. and Fabre-Thorpe, M. (2004) How parallel is visual processing in the ventral pathway? Trends Cogn. Sci., 8: 363–370.
Rudell, A.P. and Hua, J. (1997) The recognition potential, word difficulty, and individual reading ability: on using event-related potentials to study perception. J. Exp. Psychol. Hum. Percept. Perform., 23: 1170–1195.
Rudell, A.P. (1991) The recognition potential contrasted with the P300. Int. J. Neurosci., 60: 85–111.
Rudell, A.P. (1992) Rapid stream stimulation and the recognition potential. Electroencephalogr. Clin. Neurophysiol., 83: 77–82.
Schupp, H.T., Cuthbert, B.N., Bradley, M.M., Cacioppo, J.T., Ito, T. and Lang, P.J. (2000) Affective picture processing: the late positive potential is modulated by motivational relevance. Psychophysiology, 37: 257–261.
Schupp, H.T., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2003) Attention and emotion: an ERP analysis of facilitated emotional stimulus processing. Neuroreport, 14: 1107–1110.
Schupp, H.T., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2004) The selective processing of briefly presented affective pictures: an ERP analysis. Psychophysiology, 41: 441–449.
Schupp, H.T., Stockburger, J., Codispoti, M., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2006) Stimulus novelty and emotion perception: the near absence of habituation in the visual cortex. Neuroreport, 17: 365–369.
Siddle, D.A. (1991) Orienting, habituation, and resource allocation: an associative analysis. Psychophysiology, 28: 245–259.
Sokolov, E.N. (1963) Perception and the Conditioned Reflex. Macmillan, New York.
Stenberg, G., Wilking, S. and Dahl, M. (1998) Judging words at face value: interference in a word processing task reveals automatic processing of affective facial expressions. Cogn. Emotion, 12: 755–782.
Storbeck, J. and Robinson, M.D. (2004) When preferences need inferences: a direct comparison of the automaticity of cognitive versus affective priming. Pers. Soc. Psychol. Bull., 30: 81–93.
Teghtsoonian, R. and Frost, R.O. (1982) The effect of viewing distance on fear of snakes. J. Behav. Ther. Exp. Psychiatry, 13(3): 181–190.
Thompson, R.F. and Spencer, W.A. (1966) Habituation: a model phenomenon for the study of neuronal substrates of behavior. Psychol. Rev., 73: 16–43.
Thorpe, S., Fize, D. and Marlot, C. (1996) Speed of processing in the human visual system. Nature, 381: 520–522.
Treisman, A. and Gelade, G. (1980) A feature integration theory of attention. Cogn. Psychol., 12: 97–136.
Tulving, E. and Schacter, D.L. (1990) Priming and human memory systems. Science, 247: 301–306.
VanRullen, R. and Thorpe, S.J. (2001) The time course of visual processing: from early perception to decision-making. J. Cogn. Neurosci., 13: 454–461.
Vrana, S., Spence, E.L. and Lang, P.J. (1988) The startle probe response: a new measure of emotion? J. Abnorm. Psychol., 97: 487–491.
Wentura, D. (1998) Affektives Priming in der Wortentscheidungsaufgabe: Evidenz für postlexikalische Urteilstendenzen [Affective priming in the lexical decision task: evidence for post-lexical judgment tendencies]. Sprache und Kognition, 17: 125–137.
Wolfe, J.M. and Bennett, S.C. (1997) Preattentive object files: shapeless bundles of basic features. Vision Res., 37: 25–43.
Zajonc, R.B. (1968) Attitudinal effects of mere exposure. J. Pers. Soc. Psychol., 9: 1–27.
CHAPTER 4
Dynamics of emotional effects on spatial attention in the human visual cortex

Gilles Pourtois and Patrik Vuilleumier

Neurology & Imaging of Cognition, Clinic of Neurology, University Hospital & Department of Neurosciences, University Medical Center, University of Geneva, Geneva, Switzerland; Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
Abstract: An efficient detection of threat is crucial for survival and requires an appropriate allocation of attentional resources toward the location of potential danger. Recent neuroimaging studies have begun to uncover the brain machinery underlying the reflexive prioritization of spatial attention to locations of threat-related stimuli. Here, we review functional brain imaging experiments using event-related potentials (ERPs) and functional magnetic resonance imaging (fMRI) in a dot-probe paradigm with emotional face cues, in which we investigated the spatio-temporal dynamics of attentional orienting to a visual target when the latter is preceded by either a fearful or happy face, at the same (valid) location or at a different (invalid) location in the visual periphery. ERP results indicate that fearful faces can bias spatial attention toward threat-related locations, and enhance the amplitude of the early exogenous visual P1 activity generated within the extrastriate cortex in response to a target following a valid rather than invalid fearful face. Furthermore, this gain control mechanism in extrastriate cortex (at 130–150 ms) is preceded by an earlier modulation of activity in posterior parietal regions (at 40–80 ms) that may provide a critical source of top-down signals on visual cortex. Happy faces produced no modulation of ERPs in extrastriate and parietal cortex. fMRI data also show increased responses in the occipital visual cortex for valid relative to invalid targets following fearful faces, but in addition reveal significant decreases in intraparietal cortex and increases in orbitofrontal cortex when targets are preceded by an invalid fearful face, suggesting that negative emotional stimuli may not only draw but also hold spatial attention more strongly than neutral or positive stimuli. These data confirm that threat may act as a powerful exogenous cue and trigger reflexive shifts in spatial attention toward its location, through a rapid temporal sequence of neural events in parietal and temporo-occipital areas, with dissociable neural substrates for engagement benefits in attention affecting activity in extrastriate occipital areas and increased disengagement costs affecting intraparietal cortex. These brain-imaging results reveal how emotional signals related to threat can play an important role in modulating spatial attention to afford flexible perception and action.
Introduction

Negative emotions such as fear or anger imply both the appraisal of the presence of some nuisance and the elicitation of appropriate behavioral reactions. Accordingly, emotional processes may be intimately linked to action tendencies (Lang, 1979; Frijda, 1986) that can be mapped onto either a defensive motivational system (for negative emotions) or an appetitive motivational system (for pleasant emotions). Heightened vigilance and enhanced allocation of cognitive resources toward motivationally relevant information in the environment,
particularly when potentially threatening, is certainly among the most crucial adaptive aspects of emotional processing (Frijda, 1986; Eysenck, 1992; Cacioppo and Gardner, 1999), presumably shaped during evolution across many species to promote adequate actions and behaviors in the service of survival (Ohman and Mineka, 2001). In humans, abundant empirical evidence has converged from different fields in clinical psychology and cognitive sciences to indicate that processing of threat-related stimuli can exert strong influences on attentional mechanisms (Fox, 2002; Vuilleumier et al., 2004a), which constitute one of the most central cognitive abilities controlling perception and action. Furthermore, some of these interactions of emotion with attentional and cognitive functions are subject to important modulation by individual or contextual factors, including for instance anxiety, personality traits, learning, or goals and values. Thus, studies of emotion and attention interactions can provide valuable insights into the basic functional architecture of the human mind as well as into more subtle individual differences. Here, we review our recent brain imaging work using event-related potentials (ERPs) and functional magnetic resonance imaging (fMRI) that has attempted to uncover the neuroanatomical substrates and temporal dynamics underlying the regulation of attentional resources by emotional processes. In particular, we will review recent data concerning the networks and time-course of neural activity in the healthy human brain that may control the spatial distribution of visual attention in response to threat-related cues in the peripheral visual field (Fox, 2002; Vuilleumier, 2002). We used electrical and hemodynamic brain-imaging approaches during a very simple, well-standardized experimental paradigm (i.e., the dot-probe task; Bradley et al., 2000), allowing us to explore the sequence of neural events at stake during the different processing stages necessary to orient spatial attention toward behaviorally relevant stimuli (e.g., shift, engage, and disengage; Posner et al., 1980), and thus enabling us to examine how emotional signals of threat might influence these different attentional mechanisms. In this classical dot-probe paradigm (Fig. 1A), derived from the spatial orienting task originally designed by Posner and colleagues (Posner et al., 1980;
Navon and Margalit, 1983), subjects are required to detect a neutral visual target (i.e., a dot or probe) whose location is uncertain from trial to trial but unpredictably preceded by another, emotionally significant stimulus (i.e., a cue). Critically, this emotional cue can appear either at the same location as the subsequent target (valid trials) or at another location in the visual field (invalid trials), such that if the emotional value of the cue can exert some capture on the spatial distribution of attention, it will lead to a facilitation of target processing in the former (valid) situation and/or to a distraction from target processing in the latter (invalid) situation (Mogg and Bradley, 1999b). Accordingly, such effects on spatial orienting have been documented by several behavioral studies using a variety of threat-related cues, including pictures of aversive scenes (Mogg and Bradley, 1998; Mogg et al., 2000), aversive words (MacLeod et al., 1986; Mogg et al., 1994), emotionally negative faces (Mogg and Bradley, 1999b; Bradley et al., 2000), or aversively conditioned stimuli (Armony and Dolan, 2002; Koster et al., 2004b, 2005). Moreover, these effects have often been found to vary as a function of individual anxiety levels, even in people without clinical manifestations (Fox et al., 2001; Koster et al., 2004b). However, behavioral studies have provided somewhat conflicting results as to whether the emotional effects on spatial attention in the dot-probe task (and the influence of anxiety on these processes) might primarily result from a facilitation of orienting to targets following valid emotional cues (i.e., via shifting and engaging processes), or conversely from an interference with reorienting to targets following invalid emotional cues (i.e., on disengaging processes) (Fox et al., 2001; Koster et al., 2004a; Mathews et al., 2004; see Fig. 1B). In our studies, by combining ERPs and fMRI during the dot-probe task, we could distinguish between brain responses to the cues themselves and their potential effect on the subsequent target as a function of cue validity, and thus show distinct emotional influences on orienting and reorienting stages in attention (Fox et al., 2001). While ERPs provide brain-imaging measures with a high temporal resolution allowing the registration of
Fig. 1. (A) Example of a valid (left panel) and an invalid (right panel) trial used in classical dot-probe tasks (Mogg and Bradley, 1999b). The target (here a dot probe) can unpredictably appear at one of two possible locations, either replacing (after a short time interval) the location previously occupied by the emotional stimulus (the so-called valid trial; here a schematic angry face) or replacing another, neutral position (the so-called invalid trial; here a schematic neutral face). Subjects are required to indicate the position (left or right) of the dot and to ignore the face display. (B) Typical behavioral results obtained in dot-probe tasks (see Bradley et al., 2000; Fox et al., 2001), showing either a facilitation in RTs for valid fear relative to invalid fear trials (in low-anxious participants, reflecting an engagement benefit, left panel) or conversely an interference in RTs for invalid fear compared to valid fear trials (in high-anxious participants, reflecting a disengagement cost, right panel). No such modulations are reported with positive/happy cues.
neural activity from the scalp on a millisecond time-scale, which is optimal for separating cue and target processing, fMRI provides a measure with much poorer temporal resolution (of the order of 1–4 s) but with excellent, millimetric anatomical resolution. In addition, we used modern source localization techniques to assess the
temporal dynamics of the cascade of successive stages implicated in spatial orienting. Taken together, our imaging data converge to suggest (i) that the deployment of spatial attention toward threat signals (here conveyed by fearful faces) can occur rapidly and produce an enhanced sensory response to targets within the extrastriate
visual cortex (at <200 ms); (ii) that this effect might result from a direct modulation of visual cortex by emotional inputs which can facilitate spatial orienting toward the target, through mechanisms controlled by the posterior parietal cortex and activated prior to the enhancement of target processing (at <100 ms); and (iii) that activation of these neural circuits by invalid emotional cues can produce a cost in disengaging attention toward a target presented at another location, by reducing the activity of orienting mechanisms within posterior parietal cortex, but at the same time inducing concomitant increases in the response of other brain regions within the ventromedial prefrontal cortex. These data therefore indicate that neural mechanisms responsible for orienting spatial attention toward threat-related stimuli may partly overlap with those controlling shifts of spatial attention through nonemotional (neutral) cues, but also involve partly distinct mechanisms. We discuss these brain-imaging observations in a more general framework (Vuilleumier, 2005) proposing that emotion and attention may operate through parallel neural pathways that can exert additive modulatory influences on sensory processing in the visual cortex.
Emotion and spatial attention

A facilitation of spatial attentional orienting by threat signals has been documented in a variety of behavioral tasks (for review see Vuilleumier, 2002), including not only covert orienting in dot-probe tasks where emotional stimuli can act as exogenous spatial cues (MacLeod et al., 1986; Mogg and Bradley, 1999b; Bradley et al., 2000; Fox et al., 2001), but also visual search tasks where emotionally negative targets are detected faster than neutral or positive targets (Hansen and Hansen, 1988; Fox et al., 2000; Eastwood et al., 2001; Ohman et al., 2001). In many studies, these emotional effects on attention were obtained with negative stimuli, such as fearful or angry faces, whereas positive stimuli produced less consistent effects, suggesting a special role of threat-related cues in neural systems governing emotion and attention interactions. Moreover, some effects of positive emotional cues
on attention might be more frequently found in visual search tasks (Juth et al., 2005; Williams et al., 2005) than in dot-probe orienting tasks, possibly reflecting some intrinsic differences in the attentional processes engaged in these two situations. Orienting to dot probes typically involves a single, rapid shift of attention toward brief stimuli presented in the peripheral visual field, whereas visual search requires serial shifts of attention among stimuli that can be processed by foveal vision and remain visible throughout the task, potentially recruiting other exploratory strategies, including eye movement control. In the dot-probe task, emotional stimuli are generally irrelevant to the task, and subjects are instructed to respond only to the dot-probe target, which can unpredictably appear at one of two possible locations (Fig. 1A), either replacing the location previously occupied by the emotional stimulus (the so-called valid trial) or replacing another, neutral position (the so-called invalid trial). In the initial study using this paradigm (MacLeod et al., 1986), words (one threatening, one neutral) were presented at two separate spatial locations (in the upper and lower visual field) and followed after a short time interval by a small dot probe at the spatial location of one of the two words. Participants were asked to respond as fast as possible to the dot probe. Results indicated that people with higher anxiety levels directed their attention more readily to the location of threat-related words, suggesting a facilitated detection of threat with anxiety (Eysenck, 1992). Subsequent variants of this paradigm (Fox, 1993; Bradley et al., 1997) have used different types of cues to examine emotional influences on spatial attention in both (subclinically) anxious and nonanxious individuals (see Fox, 2002 for an overview). In many studies, stimuli have been presented laterally in either visual hemifield (rather than vertically), but only a few have found reliable differences between hemifields (Mogg and Bradley, 1999a; Hartikainen et al., 2000; Fox, 2002). Remarkably, similar emotional biases in spatial attention have been observed with different types of targets and tasks, including simple detection or more complex discrimination tasks (see Mogg and Bradley, 1999b; Bradley et al., 2000).
The dot-probe task can thus provide a useful snapshot of attentional allocation toward emotional stimuli by computing the difference in reaction times and/or accuracy rates for dot-probe targets appearing at invalid minus valid locations (Bradley et al., 2000). Note, however, that the "attentional bias" measure obtained by this invalid minus valid difference might in principle reflect either faster orienting to targets at valid locations, or slower reorienting to targets at invalid locations, or both (Fox et al., 2001; see Fig. 1B); a computational sketch of these indices follows at the end of this section. In several studies (Fox et al., 2001; Koster et al., 2004a; Mathews et al., 2004), target detection was found to be slowed more by threat cues than by neutral or positive cues presented at an invalid location, even though threat produced no advantage over neutral or positive cues when the targets were presented at valid locations, suggesting that threat stimuli may not necessarily act by attracting attention to their own location, but rather by influencing the disengagement of attention from current fixation. However, other findings demonstrate a substantial facilitation in visual processing for targets presented at locations validly cued by an emotionally negative stimulus (Phelps et al., 2006). Furthermore, the fact that such attentional biases can arise in response to task-irrelevant emotional stimuli also suggests that these effects may operate in a remarkably involuntary manner, even when potentially harmful for optimal performance (e.g., on invalid trials). Some studies reported significant biases under conditions where emotional cues (faces) were masked and rendered invisible (Mogg et al., 1994, 1995; Mogg and Bradley, 1999a). In our own experiments, similar effects were often observed in subjects who failed to note any emotional features in facial cues presented in their visual periphery (Pourtois et al., 2004). Further behavioral studies are needed to investigate whether changes in strategic set (responding to peripheral events vs. focusing on central stimuli) may affect the degree of involuntary spatial orienting to emotional cues. However, some studies have reported significant interference by peripheral emotional flankers in tasks where subjects always responded to central stimuli only (Fenske and Eastwood, 2003). Similarly, neuropsychological studies in brain-damaged patients with impaired spatial attention and hemi-neglect following parietal-lobe lesions
have shown that their detection of stimuli in the contralesional (neglected) visual field is better for emotional than for neutral pictures (Vuilleumier and Schwartz, 2001a, b; Vuilleumier et al., 2002; Fox, 2002). Note that contralesional deficits in these patients were reduced but not abolished for emotional stimuli, indicating that spatial attention was facilitated toward these contralesional events but remained necessary, and clearly compromised, for normal awareness. Importantly, these results suggest that spatial attention can be captured by contralesional emotional stimuli despite a profound impairment in directing attention toward that side. This may already indicate at least a partial dissociation between the brain mechanisms responsible for controlling spatial attention in response to emotional vs. nonemotional cues, such that damage to spatial attention networks in fronto-parietal areas leading to neglect (see Driver and Vuilleumier, 2001; Kerkhoff, 2001; Corbetta and Shulman, 2002) may still leave intact some effects on attention that are triggered by emotionally charged cues (through their valence and/or arousal properties, see Lang et al., 1998; Sabatinelli et al., 2005). Emotional signals may also influence attention during Stroop tasks (Pratto and John, 1991; Williams et al., 1996) and attentional blink experiments (Anderson, 2005). These nonspatial effects are beyond the scope of the present review (see Vuilleumier et al., 2004a), but also indicate a prioritization of processing for emotionally relevant information.
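As noted above, the attentional bias score is the invalid minus valid difference in RTs; with a neutral-cue baseline it can further be decomposed into an engagement benefit and a disengagement cost (cf. Fox et al., 2001). A minimal sketch with invented numbers; the function names are ours:

```python
# Sketch of the behavioral indices discussed in this section.
import numpy as np

def attentional_bias(rt_valid, rt_invalid):
    """Overall bias toward the emotional cue (positive = bias toward threat)."""
    return np.mean(rt_invalid) - np.mean(rt_valid)

def engagement_benefit(rt_valid, rt_neutral):
    """Faster orienting to targets at the threat-cued location."""
    return np.mean(rt_neutral) - np.mean(rt_valid)

def disengagement_cost(rt_invalid, rt_neutral):
    """Slower reorienting away from the threat-cued location."""
    return np.mean(rt_invalid) - np.mean(rt_neutral)

# With invented RTs (ms): attentional_bias([510, 505], [540, 548]) -> 36.5
```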
Time-course of spatial orienting to threat locations

In a recent ERP study (Pourtois et al., 2004), we used a modified version of the dot-probe task (adapted from Mogg et al., 1994) in normal (nonanxious) adult participants while we recorded high-density EEG (Fig. 2B) to track the time-course of spatial orienting toward the location of emotional stimuli. Cues were faces with fearful, happy, or neutral expressions, appearing briefly prior to a single neutral target presented at the same location as one of these faces (see Fig. 2A). On each trial, two faces were first shown together,
Fig. 2. (A) Dot-probe paradigm used in our event-related potential (ERP) and functional magnetic resonance imaging (fMRI) experiment, showing the sequence of events within a trial. A bar-probe (target) could unpredictably appear in the upper visual field on either side, following a pair of faces in which one had a neutral expression and one an emotional expression (fearful or happy). The face-target interval was randomly jittered between 100 and 300 ms. At the time of target onset, the vertical or horizontal segment of the fixation cross was slightly thickened to indicate the relevant orientation to be monitored for targets. Participants had to judge the orientation of the peripheral target relative to the thick cross segment. Faces were by themselves always irrelevant to the task. (B) Spatial layout of the 62 sites used in our ERP experiment (Pourtois et al., 2004), positioned according to the extended international 10–20 System (Oostenveld and Praamstra, 2001). Two positions over the occipital lobe are marked: a right occipito-temporal electrode (PO8, where the P1 was maximal, in black) and an occipito-parietal electrode along the midline (POZ, where the negative C1 to upper visual stimuli was maximal, in gray). (C) Grand average ERPs time-locked to the onset of the face display (fearful and happy faces collapsed). Clear C1, lateral occipital P1, and occipito-parietal N1 components were recorded (the C1 was larger for fearful compared to happy faces, data not shown). (D) Grand average ERPs time-locked to the onset of the bar-probe (all conditions collapsed). Again, conspicuous C1, lateral occipital P1, and occipito-parietal N1 components were recorded.
for a duration of 100 ms, one in the left visual field (LVF) and one in the right visual field (RVF), one neutral and one with an emotional expression (fearful or happy, Ekman and Friesen, 1976). The faces were then replaced by a small bar-probe (duration of 150 ms), oriented either vertically or horizontally, appearing at the position just occupied
by one of the faces (Fig. 2A). All stimuli (faces and bar-probe) were presented in the upper visual field to allow us to measure early retinotopic responses in ERPs (Jeffreys and Axford, 1972; Clark et al., 1995). Participants were asked to perform a go/no-go matching task in which they had to judge, on each trial, whether the orientation of
the bar-probe (in the LVF or RVF) matched that of the thicker line-segment within the fixation cross. The task was to press a button only when the bar orientation was the same as the thicker line of the cross (infrequent go trials), but to withhold responses otherwise (more frequent no-go trials). This task ensured that participants maintained their gaze on the central fixation cross (as confirmed subsequently by eye-tracking data, see below) and that all visual inputs were indeed restricted to the upper (peripheral) visual field. This also allowed us to record ERPs uncontaminated by any motor-related activity, since only ERPs for no-go trials were analyzed (Fig. 2C, D). Critically, the bar-probe could appear either on the side of the emotional face (valid condition) or on the side of the neutral face (invalid condition), in an unpredictable (50% valid and 50% invalid) and randomized manner. However, faces were entirely irrelevant to the participants' task. Thus, spatial validity of the target-bar was arbitrarily defined by the position of the preceding emotional facial expression. Moreover, since participants had to fixate the central cross, emotional cues appeared in a truly unattended location, which allowed us to properly assess any spatial biases in the distribution of attention to peripherally presented probes (see Fox et al., 2001). We used only short time intervals between the face pair and the bar onset (100–300 ms, systematically randomized) to tap exogenous mechanisms of spatial orienting (Egeth and Yantis, 1997). Our main question was whether sensory responses to the peripheral bar-probes would be enhanced when replacing an emotional (valid) face rather than a neutral (invalid) face, as predicted if spatial attention was involuntarily oriented toward that particular location (see Bradley et al., 2000; Armony and Dolan, 2002; Vuilleumier, 2002), and whether such an attentional bias would differ between negative (fearful) and positive (happy) emotional cues. Our main comparison therefore concerned the amplitude or latency of ERPs generated by the exact same bar-probe as a function of the different emotional values of the preceding face context (Fig. 2D). We predicted that any spatial orienting of attention should affect early visual processing stages activated by the target-bar, consistent with
sensory gain or enhanced vigilance mechanisms (Heinze et al., 1990; Luck, 1995; Hopfinger and Mangun, 1998; Hillyard et al., 1998; Carrasco et al., 2000; Keil et al., 2005). Behaviorally, our modified version of the dot-probe paradigm successfully triggered exogenous shifts of spatial attention toward the side of the negative face cues (Mogg et al., 1994; Mogg and Bradley, 1999b; Bradley et al., 2000). Participants showed better discrimination of bar orientation when the latter appeared at valid rather than invalid locations, as demonstrated by higher d′ values from signal detection theory (Green and Swets, 1966), and this spatial validity effect was significantly greater for fearful than happy faces (see Mogg and Bradley, 1999b). Conventional analyses (Picton et al., 2000) of the exogenous visual ERPs confirmed that fearful faces (but not happy faces) significantly modulated the early sensory processing of bar-probes appearing at the same location. The lateral occipital P1 component, peaking at 135 ms post-stimulus onset (Heinze et al., 1990; Luck et al., 1990), was significantly enhanced when the target-bar replaced a valid fearful face as compared with an invalid neutral face (Fig. 3A–C), even though the bars were always physically identical and differed only in the preceding emotional face context. This effect on P1 was equally present for bar-probes shown in the upper LVF or RVF, in agreement with our behavioral results, which did not show any hemispheric asymmetry in spatial orienting to threat faces (but see Mogg and Bradley, 1999b; Fox, 2002 for greater facilitation of RTs in the LVF in a similar paradigm). Source estimation methods (Pascual-Marqui et al., 1994) further confirmed that the P1 component was generated in the extrastriate visual cortex (Fig. 3D), including the middle occipital gyrus and inferior temporal gyrus (Pourtois et al., 2004). Such neural sources are consistent with previous ERP studies on P1 responses that were found to be modulated by spatial attention in tasks using nonemotional stimuli (Clark et al., 1995; Martinez et al., 1999; Di Russo et al., 2002, 2003). These results therefore suggest an amplification of sensory responses to a neutral visual stimulus (bar-probes) taking place at early processing stages
Fig. 3. (A) Grand average waveforms in the fear condition (electrode PO8). The black vertical line indicates the onset of the bar-probe (target). The P1 (highlighted by the orange shaded area) was larger for fear valid compared to fear invalid trials, although the target stimulus was exactly the same in these two conditions. (B) Mean global field power (GFP, see Lehmann and Skrandies, 1980) recorded 130–140 ms post bar-probe onset (corresponding to a temporal window where P1 amplitude was maximal) across the different experimental conditions. A signal increase is observed in the fear valid condition relative to the three other conditions (fear invalid, happy valid, and happy invalid), as indicated by a significant validity × emotion interaction. (C) Voltage maps for the P1 in the fear valid and fear invalid conditions (in the same 130–140 ms time interval following bar-probe onset), showing a more prominent P1 scalp topography in the former than the latter condition but without any qualitative change in the dipolar configuration of this map across conditions (amplitude modulation only). (D) Inverse solution by LAURA (Grave de Peralta Menendez et al., 2004) for the P1, revealing distributed brain sources in the extrastriate occipital cortex, including the middle occipital gyrus (red line) and inferior temporal gyrus (blue line).
within extrastriate visual cortex, induced by the preceding emotional face presented at the same location. The greater amplitude of P1 activity in extrastriate visual cortex in response to bar-probes following a fearful face also extends the related ERP findings of Stormark et al. (1995), who used emotion words (rather than facial expressions) and found enhanced P1 and P3 components for invalid
trials, but with a longer SOA between the cue and the target (600 ms) as compared with the present study. In our study, we found no effect on the latency of ERPs (Pourtois et al., 2004). The effect of threat signals on visual processing is therefore strikingly similar to the effect previously obtained with explicit manipulations of spatial attention (Heinze et al., 1990; Hillyard et al., 1998),
which is usually thought to operate by gain control mechanisms imposed on visual pathways through top-down signals from parietal and frontal areas. Such boosting of visual cortical processing may thus provide a neural substrate for recent psychophysical findings showing that the presence of a fearful (as opposed to neutral) face can enhance contrast sensitivity and reduce the detection threshold for a following visual stimulus (Gabor patch), an effect that is also magnified with transient covert attention (Phelps et al., 2006). By contrast, we found that positive emotional cues conveyed by happy faces did not produce any effect on P1 responses to bar-probes appearing at valid vs. invalid locations (Fig. 3B). Importantly, in a control ERP experiment in a different group of participants, we could further ascertain that the amplitude increases for P1 responses to bar-probes following fearful vs. neutral or happy faces were truly driven by the preceding facial expression rather than by any low-level pictorial differences in the facial cues. Thus, no modulation of P1 was found when we used inverted (as opposed to upright) faces, which are known to impair the normal recognition of emotional expression in faces (Searcy and Bartlett, 1996). Moreover, a quantitative analysis of our face stimuli showed no significant difference in mean luminance, contrast, surface, or central spatial frequency content between fearful, happy, and neutral faces (Pourtois et al., 2004). Notably, the effect of fearful faces on ERPs to subsequent target-bars was selective for the lateral occipital P1 component and did not affect other exogenous visual components, such as the earlier C1 component arising from the primary visual cortex (see Clark et al., 1995) or the subsequent N1 component presumably generated by higher extrastriate areas within occipito-parietal cortex (Vogel and Luck, 2000; Fig. 3A). Spatial validity effects produced by nonemotional exogenous cues (e.g., a light flash or abrupt onset) have also been found to affect predominantly the lateral occipital P1 component on valid trials (see Clark and Hillyard, 1996; van der Lubbe and Woestenburg, 1997; Hopfinger and Mangun, 1998), whereas the N1 component is classically not affected by exogenous cues (Hopfinger and
Mangun, 1998) and is more sensitive to attentional manipulations requiring feature discrimination rather than detection (Vogel and Luck, 2000). Likewise, the C1 component is thought to reflect initial processing in primary visual cortex that is not affected by spatial attention (Martinez et al., 1999; Di Russo et al., 2003), although primary visual cortex might be modulated at a later delay through feedback mechanisms from higher cortical areas (Martinez et al., 1999; Noesselt et al., 2002). Thus, our results suggest that the spatial orienting induced by fearful faces may operate through a modulation of early exogenous visual responses in extrastriate areas, and act on the same processing pathways as traditional influences of spatial attention controlled by fronto-parietal mechanisms.
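The low-level stimulus checks mentioned above (mean luminance, contrast, spatial frequency content) can be made concrete as follows. This is a generic illustration, with metrics chosen by us (RMS contrast and an amplitude-weighted spatial-frequency centroid) rather than the exact measures of Pourtois et al. (2004):

```python
# Sketch: simple low-level statistics for matching stimulus categories.
import numpy as np

def image_stats(img):
    """img: 2-D grayscale array scaled to [0, 1]."""
    lum = img.mean()                          # mean luminance
    rms = img.std()                           # RMS contrast
    amp = np.abs(np.fft.fftshift(np.fft.fft2(img - lum)))
    ny, nx = img.shape
    yy, xx = np.indices((ny, nx))
    radius = np.hypot(yy - ny / 2.0, xx - nx / 2.0)   # cycles per image
    sf_centroid = (radius * amp).sum() / amp.sum()    # amplitude-weighted centroid
    return lum, rms, sf_centroid

# Compare categories, e.g.:
# from scipy.stats import ttest_ind
# ttest_ind([image_stats(i)[1] for i in fearful_faces],
#           [image_stats(i)[1] for i in neutral_faces])
```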
Early responses to emotional faces preceding visual targets

One limitation of behavioral measures in the dot-probe task is that attentional biases can only be measured at the time when the target-probe is presented, while any effect triggered by the emotional cues themselves, prior to the subsequent attentional consequence on target processing, cannot directly be registered, since behavioral measures provide only a snapshot of attentional focus by comparing the different responses to targets (Mogg and Bradley, 1999b). In our ERP study, however, continuous EEG recordings could be obtained not only in response to the bar-probe (as described above) but also in response to the preceding face pair (Fig. 2C). This allowed us to assess whether fearful or happy faces in the initial cue display actually produced any differential brain response already prior to the onset of the target (bar-probe). Thus, although our main question concerned how the location of emotional faces in the cue display affected ERPs time-locked to the subsequent bar-probe (see above), we also analyzed ERPs time-locked to the face pairs (Pourtois et al., 2004). This analysis revealed a striking effect of emotion on very early visual responses to faces, affecting the C1 component (Clark et al., 1995), which was selectively enhanced by fearful expressions. Thus,
C1 had a significantly higher amplitude for displays with a fearful face than a happy face, irrespective of the visual hemifield in which the fearful face was presented, with a mean latency of 90 ms post-onset. The scalp topography of this effect showed a negative polarity that corresponded to the expected response to visual stimulation (bilateral face display) presented in the upper visual field (Clark et al., 1995). A distributed source localization estimate also confirmed that this C1 response was evoked by main generators in two regions of the occipital visual cortex, including the cuneus and lingual gyrus, corresponding to early striate cortex (see Pourtois et al., 2004). This early valence effect of face expressions on ERPs to the cue display was not observed for the subsequent P1 or N170 components elicited by these faces. Moreover, this also allowed us to rule out the possibility that the modulation of P1 amplitude to targets could reflect a mere carry-over effect, for example, due to some slow potential wave following emotional faces that would persist and contaminate subsequent ERPs to bar-probes (see Walter et al., 1964), despite our carefully randomized SOAs between faces and targets. Furthermore, analysis of low-level visual features in the faces did not suggest any difference that could have caused the early C1 increases (see above), and our control experiment with inverted faces did not produce a similar effect. Such early effects of emotional expression have not previously been reported for ERPs to fearful faces, which usually evoke later modulations in the P1, N170, and/or subsequent components (Pizzagalli et al., 1999, 2002; Krolak-Salmon et al., 2001; Eimer and Holmes, 2002; Batty and Taylor, 2003; Ashley et al., 2004; see Vuilleumier and Pourtois, in press, for a recent overview). However, most studies on emotional faces have presented stimuli only centrally or aligned on the horizontal meridian, which may prevent a reliable C1 component evoked by more peripheral stimulation in the upper or lower visual fields (see Jeffreys and Axford, 1972; Clark et al., 1995). Nevertheless, a previous MEG study (Halgren et al., 2000) has already reported a similar early visual effect for centrally presented emotional faces (sad vs. happy faces) arising in the occipital striate cortex. The
early latency (110 ms) of this emotion effect was puzzling, given that the authors found a reliable difference between faces and scrambled faces (i.e., a basic categorical effect) only at a later latency, around 165 ms post-stimulus onset, with sources located more anteriorly within the fusiform gyrus. According to Halgren et al. (2000), this early differential response to emotional face expression in early visual areas (V1) could serve to rapidly decode these socially relevant stimuli in distant regions such as the amygdala, which begins to respond to faces at 120 ms (Halgren et al., 1994) and which receives projections from early visual areas (Amaral et al., 2003). This could explain the preserved emotional processing of faces sometimes reported in prosopagnosic patients with occipitotemporal damage (Tranel et al., 1988). The nature of this C1 effect remains unclear and needs further replication; it might reflect a rapid modulation of primary visual cortex by reentrant feedback from emotion-related areas such as the amygdala (Anderson and Phelps, 2001; Amaral et al., 2003; Vuilleumier et al., 2004b), possibly activated at similar or even earlier latencies (Halgren et al., 1994; Krolak-Salmon et al., 2004; see Kawasaki et al., 2001 for early visual responses to emotional stimuli in the ventral prefrontal cortex), or alternatively reflect some other, deeper sources activated at the same latency and contributing to a similar occipito-parietal scalp topography as the classic C1 component. We further asked whether this early effect of fearful faces on C1 responses might be functionally related to the subsequent enhancement of responses to the bar-probes. To address this issue indirectly, we examined whether there was any relationship between the amplitude of the C1 evoked by the face cues and the magnitude of the validity effect on the amplitude of the P1 evoked by the bar target (Pourtois et al., 2004). Strikingly, we found a significant positive correlation between the C1 time-locked to faces and the enhancement of the P1 time-locked to the subsequent bar-probe, arising selectively in the fear condition, whereas there was no significant correlation in the happy condition. This correlation suggests that, even though the time interval between the two stimuli varied randomly, the larger the C1 response to a fearful face in the
peripheral visual field, the larger the subsequent validity effect on the occipital P1 evoked by a bar-probe appearing at the same location. These data provide indirect support for the idea that direct feedback from the amygdala on early visual cortex might induce a sustained boosting of sensory processing and attention to visual stimuli (Amaral et al., 2003; Vuilleumier et al., 2004b); and they raise further questions about whether such boosting can be retinotopic or hemifield/hemisphere-specific. In keeping with this idea, several previous fMRI studies have shown that fearful faces can induce greater activation of face-sensitive regions in fusiform cortex as compared with neutral stimuli (Morris et al., 1998; Vuilleumier et al., 2001; Armony and Dolan, 2002; Surguladze et al., 2003; see Sabatinelli et al., 2005, for emotional scenes), but similar emotional enhancements were also found in primary visual cortex (Vuilleumier et al., 2001; Pessoa et al., 2002) and in more lateral occipital areas (Lane et al., 1998). These fusiform and occipital increases to emotional faces are likely to depend on direct feedback from the amygdala, because they are abolished in patients with amygdala lesions (Vuilleumier et al., 2004b) but persist in patients with parietal lesions (Vuilleumier et al., 2002). Moreover, such effects of emotion in early visual cortex may interact with selective attention and be further augmented when emotional stimuli appear at task-relevant locations (Vuilleumier et al., 2001; Pessoa et al., 2002). It remains unclear, however, to what extent these increases affect only the processing of the emotional stimuli themselves, or also the processing of neutral stimuli following emotional cues, as in our dot-probe paradigm.
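To make the logic of this across-subject correlation analysis concrete, the following sketch illustrates it in Python under stated assumptions: the per-subject component amplitudes are hypothetical arrays standing in for values that, in the actual study, were measured from the averaged ERPs; the variable names and simulated numbers are ours, not those of Pourtois et al. (2004).

```python
# Minimal sketch (hypothetical data) of the C1-P1 correlation analysis:
# does the C1 amplitude evoked by fearful-face cues predict the P1
# validity effect (valid minus invalid) on the subsequent bar-probe?
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_subjects = 20

# Per-subject mean amplitudes (in microvolts) within the component windows;
# in a real analysis these would be extracted from the averaged ERPs.
c1_fear = rng.normal(-2.0, 0.5, n_subjects)     # C1 to fearful-face cues
p1_valid = rng.normal(3.5, 0.6, n_subjects)     # P1, fear valid trials
p1_invalid = rng.normal(3.0, 0.6, n_subjects)   # P1, fear invalid trials

# Validity effect on the probe-locked P1: valid minus invalid amplitude.
p1_validity_effect = p1_valid - p1_invalid

r, p = pearsonr(c1_fear, p1_validity_effect)
print(f"fear condition: r = {r:.2f}, p = {p:.3f}")
# The same computation, run on the happy condition, serves as a control.
```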
Cascade of neural events and source of sensory gain in extrastriate visual cortex

Our findings of increased P1 amplitude (without any change in the latency) for visual targets cued by fearful faces converge with similar effects observed in electrophysiological studies of spatial attention using nonemotional cues (van der Lubbe and Woestenburg, 1997; Hopfinger and Mangun, 1998; Hillyard and Anllo-Vento, 1998).
Importantly, in these studies, the enhancement of P1 amplitude for attended relative to unattended stimuli was not associated with any concomitant change in latency, waveform, or scalp voltage topography of this component, suggesting that the effect of spatial attention on target processing may primarily correspond to a gain control mechanism arising in identical visual pathways (Luck, 1995; Hillyard et al., 1998; Carrasco et al., 2002). According to this model, spatial attention is thought to operate as an amplification of visual processing via top-down signals from fronto-parietal areas, which are activated prior to target onset in the case of preparatory/endogenous attention, or at an earlier latency post-stimulus onset in the case of reflexive/exogenous attention. These fronto-parietal areas can then in turn enhance the neural responses in extrastriate cortex (see Kastner and Ungerleider, 2000; Hopfinger et al., 2000; Corbetta and Shulman, 2002). In a follow-up study using our dot-probe paradigm (Pourtois et al., 2005), we therefore tested whether any differential neural activity (e.g., within the fronto-parietal network) might precede the amplitude modulation of P1 responses to bar-probes and thus correspond to a possible source of attentional biases in spatial attention. To this aim, we employed a different approach from our previous waveform analysis and turned to topographical segmentation methods (Lehmann and Skrandies, 1980; Michel et al., 1999) that allow the identification of subtle changes in the topographic configuration of the scalp EEG over time. Such topographic changes can arise independently of component waveforms and independently of differences in field strength (i.e., the amplitude of waveforms; see Lehmann and Skrandies, 1980). A standard topographic analysis (Lehmann and Skrandies, 1980; Michel et al., 1999) was performed on ERP data from the same dot-probe task as used previously (Pourtois et al., 2004), providing a spatio-temporal segmentation of the successive field configurations activated during the sequence of stimulus processing (usually referred to as "microstates"; Lehmann and Skrandies, 1980). The rationale of this approach is to identify a series of statistically distinct topographic configurations (i.e., activity maps) over the
time-course of the evoked neural responses, reflecting the succession of different functional states engaged by stimulus processing, with the underlying assumption that topographic changes may denote the activation of distinct neural sources (Brandeis and Lehmann, 1986). Given that only landscape or topographic differences are of interest in this spatio-temporal cluster analysis, dominant maps are normalized to unitary strength values by dividing the voltage at each electrode by a measure of the global field power (Lehmann and Skrandies, 1980; this normalization step is sketched in code at the end of this section). Here again, we focused on responses evoked by the bar-probes, as a function of the preceding emotional face context. This new analysis (Pourtois et al., 2005) first showed that EEG activity during the time-range of the P1 component evoked by bar-probes (120–160 ms) did not exhibit any differences in topographic configuration across the different emotion and validity conditions of the preceding face cues, whereas there was a significant increase in the strength of this P1 topography (as indexed by a higher global field power; Lehmann and Skrandies, 1980) for targets following a valid fearful face as compared with an invalid fearful face (Fig. 3C). This pattern fully agrees with our hypothesis of a gain control mechanism enhancing visual target processing subsequent to the orienting of spatial attention toward threat-related cues (Pourtois et al., 2004), as typically found with attentional orienting based on nonemotional cues (Hillyard et al., 1998). More importantly, this analysis revealed the existence of an early (<100 ms post bar-probe onset) and stable (40–80 ms) topographical map that reliably distinguished valid from invalid target trials in the fear condition (Fig. 4A, B), just
preceding the topographical maps corresponding to P1. No differential activity was seen for valid vs. invalid trials in the happy condition, although the same map was also present prior to P1. The neural sources estimated for this map were clearly distinct from the extrastriate occipital sources associated with P1, and instead involved cortical generators in posterior temporal and posterior parietal regions (Fig. 4C). In other words, these data indicate that an early topographic microstate (at 40–80 ms post-target onset) was differentially activated when targets appeared at the same vs. a different location as a fearful face; and that this distinctive configuration of neural activity preceded a subsequent microstate (at 120–160 ms) corresponding to P1, whose generators did not differ but whose amplitude was enhanced for valid vs. invalid targets following fearful cues (Pourtois et al., 2005). These results are consistent with the idea that a first sweep of activity in posterior temporal and parietal regions might take place rapidly after visual target onset (Bullier, 2001; Foxe and Simpson, 2002) and possibly provide the signal for subsequent top-down control of target processing (Hopfinger et al., 2000; Kastner and Ungerleider, 2000; Bullier, 2001). Remarkably, we also found that these two consecutive neural events were positively correlated (Pourtois et al., 2005; Fig. 4E), suggesting some functional coupling between the early posterior parietal activity (40–80 ms) and the subsequent P1 activity (120–160 ms). The variance of the early 40–80 ms map (indexing the strength of its expression across subjects) showed a significant positive linear correlation with the variance of the next 120–160 ms P1 map, which was present across all conditions but selectively enhanced for valid targets following fearful faces, and selectively suppressed for invalid targets following fearful faces (Fig. 4E). This enhanced coupling between parietal and extrastriate activity might provide a plausible neural mechanism underlying the facilitation in orienting spatial attention toward targets appearing at the location of threat-related cues (Fox, 2002; Vuilleumier, 2002). If early activity in posterior parietal and temporal regions following target onset (<100 ms) is implicated in the generation of top-down signals that influence ongoing visual processing in occipito-temporal areas (120–200 ms) (Kastner and Ungerleider, 2000; Bullier, 2001), then these functional influences appear to be enhanced for targets at valid locations and disrupted for targets at invalid locations following fearful faces, with no such effect for happy faces. Conversely, our topographical analyses revealed that ERPs to bar-probes invalidly cued by fearful faces were associated with a distinctive pattern of activity at the same early latency (40–80 ms), replacing the posterior parietal and temporal activation related to spatial orienting (Fig. 4B, D). Neural sources for this distinct map were located within ventromedial prefrontal areas, including the rostral anterior cingulate cortex (ACC) (Fig. 4D). Such activation in rostral ACC may be consistent with a role in controlling attention in conflict situations (MacDonald et al., 2000) and error processing (Carter et al., 1999), particularly based on affective or motivational signals (Bush et al., 2000; Bishop et al., 2004a), or related to breaches in implicit expectations generated by invalid cues (Nobre et al., 1999). If spatial attention was reflexively oriented toward the location of the threat-relevant stimulus in the face display, then a target appearing on the opposite side might require resolving a potential conflict between responding to the task-relevant stimulus and disengaging from the emotionally alerting stimulus. This selective invalidity effect of fearful faces in ERPs also converges with behavioral findings suggesting that negative stimuli may not only draw spatial attention more easily, but also hold it more strongly, than neutral stimuli, and thus lead to a greater invalidity cost rather than a greater validity benefit in covert orienting tasks (see Fox et al., 2001; Koster et al., 2004b). Problems in disengaging from threat signals might be particularly important in people with higher anxiety (even at subclinical levels, see Fox et al., 2001; Mathews et al., 2004). Accordingly, increases in rostral ACC activity when ignoring emotional stimuli (Vuilleumier et al., 2001) have been found to be greater in anxious than nonanxious individuals (Bishop et al., 2004a). Taken together, our ERP data (Pourtois et al., 2004, 2005) reveal a precise cascade of neural events involved in spatial orienting to peripheral visual targets, with selective influences of emotional cues conveyed by fearful faces. Targets following valid fearful faces evoked an enhanced exogenous visual response in extrastriate cortex (presumably through a gain control mechanism), preceded by a specific enhancement of activity in posterior parietal cortex and posterior temporal regions; whereas the same targets appearing on the side opposite to a fearful face evoked no differential visual responses but greater activation of medial prefrontal regions. Collectively, these results have begun to provide novel insights into the brain mechanisms by which fear-related signals can mobilize processing resources (Ohman and Mineka, 2001) and trigger a prioritization of spatial attention toward their location (Vuilleumier, 2002, 2005). In a follow-up fMRI study, we identified brain regions underlying this capture of spatial attention by threat-related stimuli.

Fig. 4. (A) Grand average waveforms in the fear condition (electrode PO8) time-locked to the onset of the bar-probe (valid and invalid trials collapsed). Before the onset of the P1 (orange shaded area), there was a significant topography difference between valid and invalid trials in the fear condition, although no exogenous electric component was detectable at this lateral occipito-temporal site (PO8) during this early time-period (40–80 ms post bar-probe onset). (B) Voltage maps in the fear valid and fear invalid conditions at 40–80 ms post bar-probe onset, showing a significant modulation of the global scalp configuration (with no change in amplitude). (C) Statistical parametric mapping provided by LAURA indicated that the brain regions more activated by fear valid than fear invalid trials at 40–80 ms post bar-probe onset were mainly located in the left posterior parietal cortex (p<0.001, uncorrected). (D) Conversely, fear invalid trials evoked more activation than fear valid trials in medial frontal regions, corresponding to the rostral anterior cingulate cortex (ACC; p<0.01, uncorrected), during the same time interval (40–80 ms post bar-probe onset). (E) There was an enhanced positive correlation between this early scalp map (40–80 ms post-stimulus onset) and the directly following P1 map in the fear valid condition (r = 0.55, p = 0.03) but a clear attenuation of this correlation in the fear invalid condition (r = 0.004, p = 0.50), suggesting an enhanced coupling between these two successive functional microstates in the fear valid condition.
fMRI correlates of benefits and costs in spatial attention produced by threat cues

Converging neurophysiological, neuropsychological, and neuroimaging studies (see Driver and Vuilleumier, 2001; Kanwisher, 2001) have now clearly established how nonemotional (neutral) exogenous cues (e.g., a light flash or an abrupt onset/offset) or endogenous/symbolic cues recruit specific brain regions associated with visual spatial attention in a variety of paradigms (Gitelman et al., 1999; Corbetta et al., 2000; Hopfinger et al., 2000; Woldorff et al., 2004). A distributed cortical network of dorsal regions in fronto-parietal cortex
including the intraparietal sulcus (IPS) and frontal eye field (FEF), is crucially involved in the voluntary or endogenous control of spatial attention (Mesulam, 1998; Kastner and Ungerleider, 2000), whereas a more ventral cortical network in the ventrolateral prefrontal cortex (VLPFC) and temporoparietal junction (TPJ) contributes to the detection of unexpected, behaviorally relevant, or salient stimuli, with the latter ventral system interacting with the dorsal system during involuntary or exogenous shifts of attention (Downar et al., 2000; Corbetta and Shulman, 2002). Moreover, the more dorsal areas (FEF and IPS) may be rapidly activated following stimulus onset (see Bullier, 2001; Foxe and Simpson, 2002) to act as a regulatory source and directly contribute to the top-down selection of stimuli and responses in distant brain areas located within the temporal, frontal, and occipital lobes. In this perspective, the dorsal network plays a crucial role in the deployment of spatial attention and can bias activity in remote visual areas to narrow sensory processing onto relevant visual targets (Hopfinger et al., 2000). Posterior parietal regions are then responsible for imposing feedback signals on sensory areas to enhance the processing of behaviorally relevant, attended events (Kastner et al., 1999). Moreover, when attention is oriented reflexively toward behaviorally relevant or salient (but emotionally neutral) stimuli, the same dorsal fronto-parietal network can be activated together with the TPJ and VLPFC, to promote a shift of processing resources toward the new events (Downar et al., 2000; Corbetta and Shulman, 2002; Peelen et al., 2004). However, very few PET or fMRI studies have investigated whether the same fronto-parietal areas are also differentially activated during shifts of spatial attention in response to emotionally threatening stimuli. In a pioneering study, Fredrikson et al. (1995) found increased activation in superior parietal and frontal regions for fear-conditioned stimuli and suggested that such effects might reflect increased attention toward these stimuli. Another brain imaging study, by Armony and Dolan (2002), was the first to use a dot-probe task in which aversively conditioned faces served as
cues and were presented in either visual field in an event-related manner, preceding the dot-probe at either a valid or an invalid location. Armony and Dolan (2002) reported an activation of fronto-parietal areas during shifts of attention toward a unilateral dot-probe when the aversively conditioned face stimulus (CS+) was briefly presented at a different location in the visual field prior to target onset (see Mogg et al., 1994; Bradley et al., 2000), suggesting an involuntary capture of attention by the aversive face. We recently conducted an event-related fMRI study (Pourtois et al., 2006) using the same dot-probe paradigm as in our previous ERP work (Pourtois et al., 2004; Fig. 2A) to identify the neural substrates responsible for a spatially selective modulation of attention when threat cues were conveyed by fearful expressions in faces, rather than by prior explicit conditioning (unlike earlier studies, see Fredrikson et al., 1995; Armony and Dolan, 2002). Our aim was to determine how brain responses to a neutral visual target might be altered when preceded by emotional signals at valid or invalid locations, as previously investigated through the modulation of visual ERP components recorded on the scalp (Pourtois et al., 2004). However, given the slow temporal resolution of fMRI (Bandettini, 1999), it was not directly possible to separate neural effects related to the processing of emotion in the face pairs (cue) from those related to the processing of subsequent bar-probes (target). Moreover, in our ERP study, we used short temporal intervals between faces and targets (100–300 ms) to assess purely reflexive orienting mechanisms (Egeth and Yantis, 1997), such that any fMRI activation in this context could only correspond to a compound of neural responses to both the face pair (cue) and the bar-probe (target). For this reason, to identify a true modulation of target processing, distinct from the effect of the emotional faces by themselves, we designed a slightly modified version of the dot-probe paradigm during fMRI and introduced "cue-only" trials (face pairs with no subsequent target) that were unpredictably intermingled with the "cue-plus-target" trials (face pairs followed by a unilateral target at a valid or invalid location, similar to our ERP study, see Pourtois et al., 2004). A similar approach has been used in neuroimaging studies of spatial
attention during typical, nonemotional versions of the Posner orienting task (see Corbetta et al., 2000; Shulman et al., 2002; Woldorff et al., 2004), allowing a clear separation of brain activity related to cueing effects from that related to target processing. A second change in this fMRI experiment, as compared with our initial ERP study (Pourtois et al., 2004), concerned the task: in the fMRI experiment, subjects were asked to respond on each "cue-plus-target" trial and to judge whether the orientation of the bar-probe matched (50%) or did not match (50%) the orientation of the thicker line-segment of the fixation cross on that particular trial (by pressing one of two different keys; see Pourtois et al., 2006). Recall that in our earlier ERP study, subjects were required to press a button only in a minority of trials where the peripheral bar's orientation matched the thick fixation-cross segment (see Pourtois et al., 2004). Thus, in our fMRI experiment, the dual-task requirement and the frequent withholding of responses did not provide a pure measure of spatially selective orienting effects on valid trials, even though different detection or discrimination judgments have been reported to produce similar attentional biases in emotional dot-probe tasks (Mogg and Bradley, 1999b). Otherwise, all stimuli and conditions were the same in this fMRI study as in our previous ERP study (with fearful and happy emotional expressions, unpredictably shown at valid and invalid locations, equally probable and randomized). Critically, the fMRI results in our modified dot-probe task disclosed a pattern of brain activation clearly indicating that fearful faces again had a unique impact on the spatial orienting of attention toward the subsequent bar-probe target, whereas happy faces did not produce similar effects. First, following cue displays with a fearful face, bar-probes appearing at valid locations were found to produce an increased neural response in lateral occipital cortex, as compared with bar-probes appearing at invalid locations (Fig. 5). There was no difference in activation of the lateral occipital cortex when comparing fearful and happy faces on "cue-only" trials, indicating that this occipital activation reflected target processing, not face processing. Thus, consistent with previous ERPs showing an enhanced P1 response in extrastriate
regions, fMRI showed that occipital responses to visual targets were enhanced by their presentation on the same (valid) rather than the opposite (invalid) side as a preceding fearful face, even though the bars and their orientation were otherwise identical across these two conditions, and the faces were always task-irrelevant. The lateral occipital cortex is known to be critically involved in visual shape recognition (Grill-Spector et al., 2001; Kourtzi and Kanwisher, 2001), and its activity might therefore be enhanced by perceived threat signals (see also Lane et al., 1998). Note, however, that we found no difference in these occipital responses as a function of the side of the fearful faces, and no effect of emotion or validity in early visual areas within the lingual gyrus that exhibited different retinotopic responses to targets in the LVF or RVF. These data do not seem to support the idea that fearful stimuli produce retinotopic increases in early visual areas. More importantly, the comparison of event-related fMRI responses to valid vs. invalid targets following fearful faces showed a unique pattern of spatially selective effects in the IPS of both hemispheres — an area playing a major role in spatial attention (Corbetta and Shulman, 2002). The right IPS was unresponsive to targets in the RVF after an invalid fearful face presented contralaterally on the left side, whereas the left IPS was conversely unresponsive to targets in the LVF after an invalid fearful face in the contralateral RVF, even though both IPS could respond to targets on either side in the other conditions (see Fig. 6). IPS responses to peripheral targets were thus selectively reduced when targets were presented in the ipsilateral visual field after a fearful face in the contralateral (invalid) hemifield (Fig. 6). These data therefore suggest some suppression of the processing of ipsilateral targets when attention was focused on the contralateral side during invalid fear trials. By contrast, IPS was strongly activated by targets in the ipsilateral hemifield when preceded by a fearful face at the same (valid) location. In other words, for both hemispheres, attentional processing in IPS was apparently restricted to contralateral targets following a fearful face on that same side, whereas IPS responded more bilaterally to targets on either side in the other conditions. These fear-dependent spatial invalidity effects led to a significant emotion
× validity interaction in IPS, found symmetrically in both hemispheres. These spatially selective effects on target processing in IPS are highly consistent with the idea that the presentation of a fearful face in the visual periphery might lead to a transient focusing of attentional resources, mediated by contralateral parietal areas, toward that side, with a relative suppression of responses to visual events occurring at other locations (Fig. 6). Moreover, no such effect was found in IPS when comparing contralateral and ipsilateral fearful faces in "cue-only" trials, suggesting a spatially selective modulation of IPS responses to targets by invalid fearful cues (see Kincade et al., 2005) rather than a response to emotional faces alone. Again, we found no effect of happy faces (valid or invalid) in IPS. Furthermore, eye-tracking data acquired during fMRI allowed us to ensure that our subjects made no systematic eye movements toward either the left or right upper visual field across the different stimulus conditions, but correctly maintained fixation on the central cross, ruling out the possibility that IPS activity might reflect different saccadic behavior during the task (Pourtois et al., 2006); a minimal sketch of such a fixation check is given below, after the figure captions. These brain-imaging data clearly show that threat-related cues may not only draw covert attention more efficiently than neutral or positive cues, and thus produce benefits in visual (i.e., occipital) responses to targets presented at the same/valid location; but may also hold attention more strongly and produce a greater cost on reorienting (i.e., in IPS) when targets are presented at a different/invalid location (the "disengage" effect; see Fox et al., 2001; Koster et al., 2004b for behavioral evidence). In addition, we found that bar-probes invalidly cued by fearful faces also increased activation in the left lateral orbitofrontal cortex, as compared with validly cued trials. This may be consistent with a role of this region in regulating the allocation of processing resources during breaches of expectation implicitly generated by emotional cues (Nobre et al., 1999), or in the presence of affective or motivational conflicts (Vuilleumier et al., 2001; Bishop et al., 2004a).

Fig. 5. (A) Right extrastriate cortex activation (random-effects analysis carried out in SPM2, p<0.005, uncorrected) obtained in the fear valid > fear invalid contrast, irrespective of the spatial location of targets. This fMRI result is consistent with the ERP result showing an early enhanced sensory processing in the extrastriate occipital cortex for bar-probes in the fear valid condition. (B) Mean beta signal estimates (proportional to percentage of signal change) for this right occipital region, across the different conditions of cue-plus-target trials (* indicates a significant comparison).

Fig. 6. (A) Left IPS activation (random-effects analysis carried out in SPM2, p<0.001, uncorrected) obtained in the fear valid > fear invalid contrast for bar-probes presented in the LVF, overlaid on a single-subject T1-weighted anatomical template (normalized to MNI space). (B) Mean beta signal estimates (proportional to percentage of signal change) for this left IPS region, across the different conditions of cue-plus-target trials. (C) Right IPS activation (p<0.001, uncorrected) obtained in the fear valid > fear invalid contrast for bar-probes presented in the RVF. (D) Mean beta signal estimates for this right IPS region, across the same conditions. In both plots, the critical conditions compared in these contrasts are highlighted in gray (* indicates a significant comparison).
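The following sketch shows, under our own assumptions about the data layout and an illustrative 1.5-degree tolerance, how such a fixation check can be computed from eye-tracking samples; the data here are simulated, and the threshold and array shapes are not those reported by Pourtois et al. (2006).

```python
# Minimal sketch (hypothetical data and threshold) of a fixation check:
# verify that gaze stayed near the central cross in every condition, so
# that condition effects cannot be explained by systematic eye movements.
import numpy as np

def fraction_fixating(gaze_deg, limit_deg=1.5):
    """gaze_deg: (n_trials, n_samples, 2) gaze positions, in degrees of
    visual angle relative to the fixation cross. Returns the fraction of
    trials in which gaze never left a circular window around fixation."""
    radial = np.linalg.norm(gaze_deg, axis=-1)   # deviation per sample
    within = (radial < limit_deg).all(axis=-1)   # per-trial criterion
    return within.mean()

rng = np.random.default_rng(2)
for condition in ("fear valid", "fear invalid", "happy valid", "happy invalid"):
    gaze = rng.normal(0.0, 0.3, size=(40, 250, 2))   # simulated samples
    print(condition, f"{fraction_fixating(gaze):.2f}")
```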
This activation of orbitofrontal regions on trials where a fearful face called for attention on one side and a visual target subsequently appeared on the opposite side converges with a similar effect in our previous ERP study (Pourtois et al., 2005), in which neural sources in ventro-medial prefrontal areas were found (Fig. 4D) in the early phase of the orienting response to invalid bar-probes, again following fearful but not happy faces.
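To make the statistics behind these analyses concrete, here is a sketch of how the fear valid > fear invalid contrast and the emotion × validity interaction could be tested on per-subject signal estimates extracted from a region such as IPS. The beta values are simulated, and the paired t-test is a generic stand-in, not the SPM2 random-effects analysis actually used.

```python
# Minimal sketch (simulated betas) of the condition contrasts described
# above: fear valid > fear invalid, and the emotion x validity interaction.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(3)
n = 15  # subjects
beta = {  # hypothetical per-subject signal estimates for one region
    ("fear", "valid"): rng.normal(0.8, 0.3, n),
    ("fear", "invalid"): rng.normal(0.3, 0.3, n),
    ("happy", "valid"): rng.normal(0.5, 0.3, n),
    ("happy", "invalid"): rng.normal(0.5, 0.3, n),
}

# Simple contrast: fear valid > fear invalid.
t, p = ttest_rel(beta[("fear", "valid")], beta[("fear", "invalid")])
print(f"fear valid > fear invalid: t = {t:.2f}, p = {p:.4f}")

# Emotion x validity interaction: does the validity effect (valid minus
# invalid) differ between fear and happy cues?
val_fear = beta[("fear", "valid")] - beta[("fear", "invalid")]
val_happy = beta[("happy", "valid")] - beta[("happy", "invalid")]
t, p = ttest_rel(val_fear, val_happy)
print(f"emotion x validity interaction: t = {t:.2f}, p = {p:.4f}")
```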
fMRI responses to peripheral faces alone

All fear-selective effects in our fMRI study described above were identified by examining brain activations for "cue-plus-target" trials, but not "cue-only" trials, indicating that these effects likely resulted from a spatially selective modulation of
target processing by the position of the preceding fearful faces, and not just from emotional face processing. However, just as in our ERP study (Pourtois et al., 2004), in which we could examine ERPs time-locked to the target onset as well as ERPs time-locked to the face cues, here we could test for fMRI responses not only during "cue-plus-target" trials but also during "cue-only" trials, to determine any effect of peripheral faces alone. Our analysis of "cue-only" trials showed that fearful but not happy faces produced an increased response of the right precuneus in medial occipito-parietal cortex, regardless of the side of the fearful faces but with a contralateral predominance. This medial occipital response to fearful faces might be consistent with the previous EEG (Pourtois et al., 2004) and MEG (Halgren et al., 2000) results showing increased activity of occipital sources arising at an early latency post face onset, corresponding to the C1 time-range (see above), and potentially reflecting some general alerting or arousal effect triggered by fearful faces (Lang et al., 1998; Thiel et al., 2004). On the other hand, we found that peripheral fearful faces in "cue-only" trials produced a selective activation of the inferior temporo-parieto-occipital junction on the side opposite to the fearful faces (with no effect of happy faces). These increases might be consistent with a role of ventral cortical regions within the attentional networks, regions more critically concerned with the detection of behaviorally relevant or unexpected stimuli than with the top-down selection or focusing controlled by the more dorsal IPS (Corbetta and Shulman, 2002). However, unlike a study using similar displays with bilateral faces presented in the peripheral visual field (Noesselt et al., 2005), we found no significant increases in fusiform cortex or amygdala in the hemisphere contralateral to fearful faces. Nevertheless, several previous findings suggest that the amygdala (and fusiform) is consistently activated by fearful faces (see Vuilleumier et al., 2004a), and that such amygdala activation might play an important role in triggering subsequent visual orienting of attention to threat locations (Amaral et al., 2003; Vuilleumier, 2005). We believe that this lack of amygdala effects might result from a number of methodological factors,
including habituation due to repetition of the same faces (Breiter et al., 1996; Phillips et al., 2001; Fischer et al., 2003) and the presence of one fearful or happy face in all bilateral cue displays (Pessoa et al., 2002; Zald, 2003); nonetheless, amygdala responses to fearful faces are likely to play a crucial role in eliciting attentional orienting behaviors toward threat (Holland and Gallagher, 1999). Taken together, these fMRI data have provided a valuable refinement of our knowledge of the wide network of brain areas within extrastriate occipital cortex, superior and inferior parietal cortex, and ventromedial prefrontal regions that is implicated in attentional biases and spatially selective modulations of target processing induced by emotional cues such as fearful faces. More generally, our brain-imaging results show how emotional stimuli may act as exogenous cues for spatial attention by modulating activity in a network of brain areas that partly overlaps with the cortical systems previously associated with the control of attention for nonemotional stimuli (see Corbetta and Shulman, 2002), but also partly involves distinct neural systems.
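The logic of intermingling "cue-only" and "cue-plus-target" trials can be illustrated with a toy design matrix: because the two trial types receive separate, HRF-convolved regressors, the GLM can attribute variance to the cue alone and to the cue-plus-target compound separately. The timings, TR, and double-gamma HRF below are generic assumptions for illustration, not the parameters of the actual study.

```python
# Minimal sketch of an event-related design with separate regressors for
# 'cue-only' and 'cue-plus-target' trials, using a generic double-gamma HRF.
import numpy as np
from scipy.stats import gamma

tr, n_scans = 2.0, 200               # assumed repetition time and run length
t = np.arange(0, 30, tr)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)   # canonical-like shape
hrf /= hrf.sum()

def event_regressor(onsets_s):
    """Delta train at the given onset times (s), convolved with the HRF."""
    x = np.zeros(n_scans)
    x[(np.asarray(onsets_s) / tr).astype(int)] = 1.0
    return np.convolve(x, hrf)[:n_scans]

# Hypothetical, randomly intermingled onset times for the two trial types.
cue_only = event_regressor([10, 70, 130, 250, 330])
cue_plus_target = event_regressor([30, 90, 170, 210, 290, 350])

X = np.column_stack([cue_only, cue_plus_target, np.ones(n_scans)])
# Per-voxel betas would then follow by least squares, e.g.:
# betas, *_ = np.linalg.lstsq(X, bold_timeseries, rcond=None)
print(X.shape)  # (200, 3): two condition regressors plus a constant
```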
Role of anxiety in emotion–attention interactions

Many behavioral studies (e.g., Fox et al., 2001; Koster et al., 2004b) have shown that the attentional biases induced by emotionally threatening cues in dot-probe tasks can be significantly exaggerated in people with higher anxiety levels, even below clinical levels. Whereas some findings suggest that anxiety may facilitate orienting or engaging attention toward emotional cues (Mogg and Bradley, 1998), other findings indicate a greater difficulty in reorienting away or disengaging from threat cues (Fox et al., 2001; Koster et al., 2004b; Mathews et al., 2004). However, the neural correlates of these effects of anxiety remain unclear. A number of recent fMRI studies have pointed to heightened responses to negative emotional stimuli in highly anxious subjects in various regions, including the amygdala (Bishop et al., 2004b; Etkin et al., 2004; Sabatinelli et al., 2005) and ACC (Bishop et al., 2004a). However, in our own studies, we have found no reliable modulation of amygdala responses to fearful faces in relation to general scores
of trait or state anxiety in similar tasks (Vuilleumier et al., 2001, unpublished data), but we observed more consistent effects in ventromedial prefrontal regions (e.g., see Sander et al., 2005). Moreover, in the current series of ERP and fMRI studies using the dot-probe paradigm, we systematically recorded anxiety scores in our participants. Our preliminary analysis, however, failed to identify straightforward correlations between anxiety levels and either the ERP indices related to P1 amplitude or the fMRI indices related to validity effects. A tentative explanation is the weak variation and low scores in state and trait anxiety (Spielberger, 1983) across our non-preselected participants. In any case, further research is needed to explore more systematically the anatomical substrates and time-course of emotional biases in perception and attention, contrasting large but preselected (and representative) samples of anxious/fearful vs. nonanxious participants (based on a careful selection of more extreme anxiety scores in either the subclinical or clinical range) and using well-controlled tasks that may provide more sensitive measures of different attentional subprocesses.
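For illustration, such an individual-differences analysis amounts to correlating each participant's questionnaire score with a behavioral or neural bias index. The sketch below uses simulated STAI scores and reaction times, with the bias defined as invalid minus valid RT; it does not reproduce any actual values from our studies.

```python
# Minimal sketch (simulated scores) of correlating trait anxiety with a
# dot-probe attentional bias index (invalid minus valid reaction times).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
n = 24
stai_trait = rng.normal(38, 8, n)       # hypothetical STAI trait scores
rt_valid = rng.normal(520, 40, n)       # mean RT (ms), valid fear trials
rt_invalid = rng.normal(535, 40, n)     # mean RT (ms), invalid fear trials

bias = rt_invalid - rt_valid            # positive = vigilance toward threat
r, p = pearsonr(stai_trait, bias)
print(f"r = {r:.2f}, p = {p:.3f}")
# Note: a restricted range of anxiety scores, as in a non-preselected
# sample, weakens the power of such a correlation (see text).
```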
Conclusions

Selective attention is an essential cognitive mechanism governing the capacity-limited processing resources of our brains (Marois and Ivanoff, 2005) and promoting an efficient selection of salient or goal-related information in the environment (Posner et al., 1980). Until recently, attention has mainly been studied in conditions where such selection was based on relatively simple bottom-up sensory-driven mechanisms (e.g., pop-out) or higher-level top-down influences (Egeth and Yantis, 1997; Kastner and Ungerleider, 2000). Here we have reviewed recent neuroimaging data (ERP and fMRI) indicating that emotional values, such as threat signals conveyed by fearful faces, can also influence the spatial distribution of selective attention (Mogg and Bradley, 1998; Vuilleumier et al., 2001, 2004a; Fox, 2002; Dolan and Vuilleumier, 2003). Using the classical dot-probe paradigm (Bradley et al., 2000), we could show that
healthy (nonanxious) participants may orient covertly and reflexively to the position briefly occupied by an irrelevant and nonpredictive fearful face in the (upper) visual field, such that this orienting modifies their behavioral performance and brain responses to a subsequent target appearing at the same location. These results suggest that fearful faces may act as powerful exogenous cues and produce a transient involuntary capture of spatial attention, somewhat similar to an abrupt onset or offset, or a sudden luminance change (Egeth and Yantis, 1997). By contrast, happy faces do not seem to produce similar effects, suggesting a special role of threat cues in such interactions with mechanisms of covert spatial attention. More specifically, brain imaging data obtained with both ERPs (Stormark et al., 1995; Pourtois et al., 2004, 2005) and fMRI (Armony and Dolan, 2002; Pourtois et al., 2006) during the dot-probe task (Posner et al., 1980; Bradley et al., 2000) are now converging to delineate the precise brain pathways and spatio-temporal dynamics underlying emotional biases in spatial attention. We suggest a working model that implicates both direct effects of emotional signals from the amygdala on sensory processing and indirect effects on attentional systems subserved by parietal cortical areas (Vuilleumier, 2005). Taken together, the extant fMRI results (Vuilleumier et al., 2004b; Noesselt et al., 2005; Pourtois et al., 2006) combined with ERP results (Pourtois et al., 2004, 2005) indicate that a fearful face cue in the peripheral visual field may not only activate the amygdala but also induce rapid feedback to visual cortex that enhances face-sensitive areas as well as earlier occipital areas, possibly within less than 100 ms post-onset for the latter regions (and probably within 170–200 ms for the former). The effect of direct feedback to occipital areas may then outlast the presentation of the facial threat cue, reducing sensory thresholds in retinotopic or nonretinotopic regions of early occipital cortex during a brief period (e.g., see Phelps et al., 2006), and leading to a subsequent facilitation of the spatial selection mechanisms directing attention to the same location (or same side), and thus to an enhanced activation of the posterior parietal and posterior temporal regions implicated in orienting to a subsequent visual
target (through processes presumably activated at <80 ms post-target onset, see Bullier, 2001). This may in turn lead to enhanced sensory processing of the target in lateral extrastriate occipital cortex, and generate enhanced P1 responses to the target (at ~130–150 ms post-target onset), in a manner similar to the preparatory baseline shifts of activity imposed by endogenous top-down biases or by other exogenous signals in attention (Hillyard et al., 1998; Kastner et al., 1999; Super et al., 2003; Tallon-Baudry et al., 2005; Liu et al., 2005). This amplitude modulation of early sensory responses to visual targets following a valid emotional cue seems compatible with a gain control mechanism (Hillyard et al., 1998). Its electrophysiological properties (i.e., latency, polarity, topography, and neural sources in the extrastriate cortex) are entirely consistent with those previously reported in ERP studies of spatial attention using nonemotional exogenous cues (e.g., Hopfinger and Mangun, 1998; Hillyard and Anllo-Vento, 1998; Van der Lubbe and Woestenburg, 2000), suggesting a similar substrate and time-course but distinct sources for the top-down bias signal (Pourtois et al., 2004). On the other hand, when a target follows a threat cue at another location, posterior parietal responses to the target are reduced in the ipsilateral hemisphere, disclosing a spatially selective restriction of attention to the location invalidly cued by the preceding threat; this is consistent with behavioral observations that fearful stimuli not only draw spatial attention more readily than neutral or positive stimuli, but also hold attention more durably (Fox et al., 2001). In ERPs, this corresponds to a relative suppression of an early microstate in posterior parietal and posterior temporal cortex (<80 ms post-onset) associated with orienting attention to the target. These results reveal dissociable neural substrates for the engage and disengage components of spatial attention with threat-related cues. Concomitantly, disengaging from an invalid fearful face to reorient toward a task-relevant target on the opposite side involves motivational processes in ventromedial frontal cortex and/or executive control processes in rostral ACC that may become activated within the same time-range to resolve any competition
between reflexive emotional signals and the goal-driven attentional set. Collectively, our electrical and hemodynamic brain-imaging results highlight the complexity of the spatio-temporal dynamics underlying the prioritization of attentional resources to threat. These data converge with other evidence suggesting that some emotional influences originating from phylogenetically ancient systems in "limbic" brain regions can act in parallel with the top-down influences traditionally associated with selective attention or the executive functions of fronto-parietal cortical areas (LeDoux, 1996; Ohman and Mineka, 2001; Vuilleumier, 2005). In this perspective, emotion is not separated from cognition (Zajonc, 1980) but plays a fundamental role in regulating brain functions involved in perception, attention, and adaptive behavior.
Acknowledgements

This work is supported by a grant from the Swiss National Science Fund to PV (grant # 632.065935) and by the Swiss National Centre for Competence in Research in Affective Sciences (NCCR grant # 51A240-104897).
References

Amaral, D.G., Behniea, H. and Kelly, J.L. (2003) Topographic organization of projections from the amygdala to the visual cortex in the macaque monkey. Neuroscience, 118: 1099–1120.
Anderson, A.K. (2005) Affective influences on the attentional dynamics supporting awareness. J. Exp. Psychol. Gen., 134: 258–281.
Anderson, A.K. and Phelps, E.A. (2001) Lesions of the human amygdala impair enhanced perception of emotionally salient events. Nature, 411: 305–309.
Armony, J.L. and Dolan, R.J. (2002) Modulation of spatial attention by fear-conditioned stimuli: an event-related fMRI study. Neuropsychologia, 40: 817–826.
Ashley, V., Vuilleumier, P. and Swick, D. (2004) Time course and specificity of event-related potentials to emotional expressions. Neuroreport, 15: 211–216.
Bandettini, P.A. (1999) The temporal resolution of functional MRI. In: Moonen, C.T.W. and Bandettini, P.A. (Eds.), Functional MRI. Springer, Berlin, pp. 205–220.
Batty, M. and Taylor, M.J. (2003) Early processing of the six basic facial emotional expressions. Brain Res. Cogn. Brain Res., 17: 613–620.
Bishop, S., Duncan, J. and Lawrence, A.D. (2004a) Prefrontal cortical function and anxiety: controlling attention to threat-related stimuli. Nat. Neurosci., 7: 184–188.
Bishop, S., Duncan, J. and Lawrence, A.D. (2004b) State anxiety modulation of the amygdala response to unattended threat-related stimuli. J. Neurosci., 24: 10364–10368.
Bradley, B.P., Mogg, K., Millar, N., Bonham Carter, C., Fergusson, E., Jenkins, J. and Parr, M. (1997) Attentional biases for emotional faces. Cogn. Emotion, 11: 25–42.
Bradley, B.P., Mogg, K. and Millar, N.H. (2000) Covert and overt orienting of attention to emotional faces in anxiety. Cogn. Emotion, 14: 789–808.
Brandeis, D. and Lehmann, D. (1986) Event-related potentials of the brain and cognitive processes: approaches and applications. Neuropsychologia, 24: 151–168.
Breiter, H.C., Etcoff, N.L., Whalen, P.J., Kennedy, W.A., Rauch, S.L., Buckner, R.L., Strauss, M.M., Hyman, S.E. and Rosen, B.R. (1996) Response and habituation of the human amygdala during visual processing of facial expression. Neuron, 17: 875–887.
Bullier, J. (2001) Integrated model of visual processing. Brain Res. Brain Res. Rev., 36: 96–107.
Bush, G., Luu, P. and Posner, M.I. (2000) Cognitive and emotional influences in anterior cingulate cortex. Trends Cogn. Sci., 4: 215–222.
Cacioppo, J.T. and Gardner, W.L. (1999) Emotion. Annu. Rev. Psychol., 50: 191–214.
Carrasco, M., Penpeci-Talgar, C. and Eckstein, M. (2000) Spatial covert attention increases contrast sensitivity across the CSF: support for signal enhancement. Vision Res., 40: 1203–1215.
Carrasco, M., Williams, P.E. and Yeshurun, Y. (2002) Covert attention increases spatial resolution with or without masks: support for signal enhancement. J. Vis., 2: 467–479.
Carter, C.S., Botvinick, M.M. and Cohen, J.D. (1999) The contribution of the anterior cingulate cortex to executive processes in cognition. Rev. Neurosci., 10: 49–57.
Clark, V.P., Fan, S. and Hillyard, S.A. (1995) Identification of early visual evoked potential generators by retinotopic and topographic analyses. Hum. Brain Mapp., 2: 170–187.
Clark, V.P. and Hillyard, S.A. (1996) Spatial selective attention affects early extrastriate but not striate components of the visual evoked potential. J. Cogn. Neurosci., 8: 387–402.
Corbetta, M., Kincade, J.M., Ollinger, J.M., McAvoy, M.P. and Shulman, G.L. (2000) Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nat. Neurosci., 3: 292–297.
Corbetta, M. and Shulman, G.L. (2002) Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci., 3: 201–215.
Di Russo, F., Martinez, A. and Hillyard, S.A. (2003) Source analysis of event-related cortical activity during visuo-spatial attention. Cereb. Cortex, 13: 486–499.
Di Russo, F., Martinez, A., Sereno, M.I., Pitzalis, S. and Hillyard, S.A. (2002) Cortical sources of the early components of the visual evoked potential. Hum. Brain Mapp., 15: 95–111.
Dolan, R.J. and Vuilleumier, P. (2003) Amygdala automaticity in emotional processing. Ann. NY Acad. Sci., 985: 348–355.
Downar, J., Crawley, A.P., Mikulis, D.J. and Davis, K.D. (2000) A multimodal cortical network for the detection of changes in the sensory environment. Nat. Neurosci., 3: 277–283.
Driver, J. and Vuilleumier, P. (2001) Perceptual awareness and its loss in unilateral neglect and extinction. Cognition, 79: 39–88.
Eastwood, J.D., Smilek, D. and Merikle, P.M. (2001) Differential attentional guidance by unattended faces expressing positive and negative emotion. Percept. Psychophys., 63: 1004–1013.
Egeth, H.E. and Yantis, S. (1997) Visual attention: control, representation, and time course. Annu. Rev. Psychol., 48: 269–297.
Eimer, M. and Holmes, A. (2002) An ERP study on the time course of emotional face processing. Neuroreport, 13: 427–431.
Ekman, P. and Friesen, W.V. (1976) Pictures of Facial Affect. Consulting Psychologists Press, Palo Alto.
Etkin, A., Klemenhagen, K.C., Dudman, J.T., Rogan, M.T., Hen, R., Kandel, E.R. and Hirsch, J. (2004) Individual differences in trait anxiety predict the response of the basolateral amygdala to unconsciously processed fearful faces. Neuron, 44: 1043–1055.
Eysenck, M.W. (1992) Anxiety: The Cognitive Perspective. Erlbaum Ltd, Hove, UK.
Fenske, M.J. and Eastwood, J.D. (2003) Modulation of focused attention by faces expressing emotion: evidence from flanker tasks. Emotion, 3: 327–343.
Fischer, H., Wright, C.I., Whalen, P.J., McInerney, S.C., Shin, L.M. and Rauch, S.L. (2003) Brain habituation during repeated exposure to fearful and neutral faces: a functional MRI study. Brain Res. Bull., 59: 387–392.
Fox, E. (1993) Allocation of visual attention and anxiety. Cogn. Emotion, 7: 207–215.
Fox, E. (2002) Processing emotional facial expressions: the role of anxiety and awareness. Cogn. Affect Behav. Neurosci., 2: 52–63.
Fox, E., Lester, V., Russo, R., Bowles, R.J., Pichler, A. and Dutton, K. (2000) Facial expressions of emotion: are angry faces detected more efficiently? Cogn. Emotion, 14: 61–92.
Fox, E., Russo, R., Bowles, R. and Dutton, K. (2001) Do threatening stimuli draw or hold visual attention in subclinical anxiety? J. Exp. Psychol. Gen., 130: 681–700.
Foxe, J.J. and Simpson, G.V. (2002) Flow of activation from V1 to frontal cortex in humans. A framework for defining "early" visual processing. Exp. Brain Res., 142: 139–150.
Fredrikson, M., Wik, G., Fischer, H. and Andersson, J. (1995) Affective and attentive neural networks in humans: a PET study of Pavlovian conditioning. Neuroreport, 7: 97–101.
Frijda, N.H. (1986) The Emotions. Cambridge University Press, New York.
Gitelman, D.R., Nobre, A.C., Parrish, T.B., LaBar, K.S., Kim, Y.H., Meyer, J.R. and Mesulam, M. (1999) A large-scale distributed network for covert spatial attention: further
anatomical delineation based on stringent behavioural and cognitive controls. Brain, 122(Pt 6): 1093–1106.
Grave de Peralta Menendez, R., Murray, M.M., Michel, C.M., Martuzzi, R. and Gonzalez Andino, S.L. (2004) Electrical neuroimaging based on biophysical constraints. Neuroimage, 21: 527–539.
Green, D. and Swets, J. (1966) Signal Detection Theory and Psychophysics. Robert E. Krieger Publishing Company, Huntington, NY.
Grill-Spector, K., Kourtzi, Z. and Kanwisher, N. (2001) The lateral occipital complex and its role in object recognition. Vision Res., 41: 1409–1422.
Halgren, E., Baudena, P., Heit, G., Clarke, J.M., Marinkovic, K. and Clarke, M. (1994) Spatio-temporal stages in face and word processing. I. Depth-recorded potentials in the human occipital, temporal and parietal lobes [corrected]. J. Physiol. Paris, 88: 1–50.
Halgren, E., Raij, T., Marinkovic, K., Jousmaki, V. and Hari, R. (2000) Cognitive response profile of the human fusiform face area as determined by MEG. Cereb. Cortex, 10: 69–81.
Hansen, C.H. and Hansen, R.D. (1988) Finding the face in the crowd: an anger superiority effect. J. Pers. Soc. Psychol., 54: 917–924.
Hartikainen, K.M., Ogawa, K.H. and Knight, R.T. (2000) Transient interference of right hemispheric function due to automatic emotional processing. Neuropsychologia, 38: 1576–1580.
Heinze, H.J., Luck, S.J., Mangun, G.R. and Hillyard, S.A. (1990) Visual event-related potentials index focused attention within bilateral stimulus arrays. 1. Evidence for early selection. Electroencephalogr. Clin. Neurophysiol., 75: 511–527.
Hillyard, S.A. and Anllo-Vento, L. (1998) Event-related brain potentials in the study of visual selective attention. Proc. Natl. Acad. Sci. USA, 95: 781–787.
Hillyard, S.A., Vogel, E.K. and Luck, S.J. (1998) Sensory gain control (amplification) as a mechanism of selective attention: electrophysiological and neuroimaging evidence. Philos. Trans. R. Soc. Lond. B Biol. Sci., 353: 1257–1270.
Holland, P.C. and Gallagher, M. (1999) Amygdala circuitry in attentional and representational processes. Trends Cogn. Sci., 3: 65–73.
Hopfinger, J.B., Buonocore, M.H. and Mangun, G.R. (2000) The neural mechanisms of top-down attentional control. Nat. Neurosci., 3: 284–291.
Hopfinger, J.B. and Mangun, G.R. (1998) Reflexive attention modulates processing of visual stimuli in human extrastriate cortex. Psychol. Sci., 9: 441–447.
Jeffreys, D.A. and Axford, J.G. (1972) Source locations of pattern-specific components of human visual evoked potentials. 1. Component of striate cortical origin. Exp. Brain Res., 16: 1–21.
Juth, P., Lundqvist, D., Karlsson, A. and Ohman, A. (2005) Looking for foes and friends: perceptual and emotional factors when finding a face in the crowd. Emotion, 5: 379–395.
Kanwisher, N. (2001) Neural events and perceptual awareness. Cognition, 79: 89–113.
Kastner, S., Pinsk, M.A., De Weerd, P., Desimone, R. and Ungerleider, L.G. (1999) Increased activity in human visual
cortex during directed attention in the absence of visual stimulation. Neuron, 22: 751–761.
Kastner, S. and Ungerleider, L.G. (2000) Mechanisms of visual attention in the human cortex. Annu. Rev. Neurosci., 23: 315–341.
Kawasaki, H., Kaufman, O., Damasio, H., Damasio, A.R., Granner, M., Bakken, H., Hori, T., Howard, M.A., III and Adolphs, R. (2001) Single-neuron responses to emotional visual stimuli recorded in human ventral prefrontal cortex. Nat. Neurosci., 4: 15–16.
Keil, A., Moratti, S., Sabatinelli, D., Bradley, M.M. and Lang, P.J. (2005) Additive effects of emotional content and spatial selective attention on electrocortical facilitation. Cereb. Cortex, 15: 1187–1197.
Kerkhoff, G. (2001) Spatial hemineglect in humans. Prog. Neurobiol., 63: 1–27.
Kincade, J.M., Abrams, R.A., Astafiev, S.V., Shulman, G.L. and Corbetta, M. (2005) An event-related functional magnetic resonance imaging study of voluntary and stimulus-driven orienting of attention. J. Neurosci., 25: 4593–4604.
Koster, E.H.W., Crombez, G., Van Damme, S., Verschuere, B. and De Houwer, J. (2004a) Does imminent threat capture and hold attention? Emotion, 4: 312–317.
Koster, E.H.W., Crombez, G., Van Damme, S., Verschuere, B. and De Houwer, J. (2005) Signals for threat modulate attentional capture and holding: fear-conditioning and extinction during the exogenous cueing task. Cogn. Emotion, 19: 771–780.
Koster, E.H.W., Crombez, G., Verschuere, B. and De Houwer, J. (2004b) Selective attention to threat in the dot probe paradigm: differentiating vigilance from difficulty to disengage. Behav. Res. Ther., 42: 1083–1092.
Kourtzi, Z. and Kanwisher, N. (2001) Representation of perceived object shape by the human lateral occipital complex. Science, 293: 1506–1509.
Krolak-Salmon, P., Fischer, C., Vighetto, A. and Mauguiere, F. (2001) Processing of facial emotional expression: spatio-temporal data as assessed by scalp event-related potentials. Eur. J. Neurosci., 13: 987–994.
Krolak-Salmon, P., Henaff, M.A., Vighetto, A., Bertrand, O. and Mauguiere, F. (2004) Early amygdala reaction to fear spreading in occipital, temporal, and frontal cortex: a depth electrode ERP study in human. Neuron, 42: 665–676.
Lane, R.D., Reiman, E.M., Axelrod, B., Yun, L.S., Holmes, A. and Schwartz, G.E. (1998) Neural correlates of levels of emotional awareness: evidence of an interaction between emotion and attention in the anterior cingulate cortex. J. Cogn. Neurosci., 10: 525–535.
Lang, P.J. (1979) A bio-informational theory of emotional imagery. Psychophysiology, 16: 495–512.
Lang, P.J., Bradley, M.M., Fitzsimmons, J.R., Cuthbert, B.N., Scott, J.D., Moulder, B. and Nangia, V. (1998) Emotional arousal and activation of the visual cortex: an fMRI analysis. Psychophysiology, 35: 199–210.
LeDoux, J. (1996) The Emotional Brain: The Mysterious Underpinnings of Emotional Life. Simon & Schuster, New York.
Lehmann, D. and Skrandies, W. (1980) Reference-free identification of components of checkerboard-evoked multichannel potential fields. Electroencephalogr. Clin. Neurophysiol., 48: 609–621.
Liu, T., Pestilli, F. and Carrasco, M. (2005) Transient attention enhances perceptual performance and fMRI response in human visual cortex. Neuron, 45: 469–477.
Luck, S.J. (1995) Multiple mechanisms of visual-spatial attention: recent evidence from human electrophysiology. Behav. Brain Res., 71: 113–123.
Luck, S.J., Heinze, H.J., Mangun, G.R. and Hillyard, S.A. (1990) Visual event-related potentials index focused attention within bilateral stimulus arrays. 2. Functional dissociation of P1 and N1 components. Electroencephalogr. Clin. Neurophysiol., 75: 528–542.
MacDonald, A.W., Cohen, J.D., Stenger, V.A. and Carter, C.S. (2000) Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science, 288: 1835–1838.
MacLeod, C., Mathews, A. and Tata, P. (1986) Attentional bias in emotional disorders. J. Abnorm. Psychol., 95: 15–20.
Marois, R. and Ivanoff, J. (2005) Capacity limits of information processing in the brain. Trends Cogn. Sci., 9: 296–305.
Martinez, A., Anllo-Vento, L., Sereno, M.I., Frank, L.R., Buxton, R.B., Dubowitz, D.J., Wong, E.C., Hinrichs, H., Heinze, H.J. and Hillyard, S.A. (1999) Involvement of striate and extrastriate visual cortical areas in spatial attention. Nat. Neurosci., 2: 364–369.
Mathews, A., Yiend, J. and Lawrence, A.D. (2004) Individual differences in the modulation of fear-related brain activation by attentional control. J. Cogn. Neurosci., 16: 1683–1694.
Mesulam, M.M. (1998) From sensation to cognition. Brain, 121(Pt 6): 1013–1052.
Michel, C.M., Seeck, M. and Landis, T. (1999) Spatiotemporal dynamics of human cognition. News Physiol. Sci., 14: 206–214.
Mogg, K. and Bradley, B.P. (1998) A cognitive-motivational analysis of anxiety. Behav. Res. Ther., 36: 809–848.
Mogg, K. and Bradley, B.P. (1999a) Orienting of attention to threatening facial expressions presented under conditions of restricted awareness. Cogn. Emotion, 13: 713–740.
Mogg, K. and Bradley, B.P. (1999b) Some methodological issues in assessing attentional biases for threatening faces in anxiety: a replication study using a modified version of the probe detection task. Behav. Res. Ther., 37: 595–604.
Mogg, K., Bradley, B.P. and Hallowell, N. (1994) Attentional bias to threat: roles of trait anxiety, stressful events, and awareness. Q. J. Exp. Psychol.-A, 47: 841–864.
Mogg, K., Bradley, B.P. and Williams, R. (1995) Attentional bias in anxiety and depression: the role of awareness. Brit. J. Clin. Psychol., 34: 17–36.
Mogg, K., McNamara, J., Powys, M., Rawlinson, H., Seiffer, A. and Bradley, B.P. (2000) Selective attention to threat: a test of two cognitive models of anxiety. Cogn. Emotion, 14: 375–399.
Morris, J.S., Friston, K.J., Buchel, C., Frith, C.D., Young, A.W., Calder, A.J. and Dolan, R.J. (1998) A neuromodulatory role for the human amygdala in processing emotional facial expressions. Brain, 121(Pt 1): 47–57.
Navon, D. and Margalit, B. (1983) Allocation of attention according to informativeness in visual recognition. Q. J. Exp. Psychol.-A, 35: 497–512. Nobre, A.C., Coull, J.T., Frith, C.D. and Mesulam, M.M. (1999) Orbitofrontal cortex is activated during breaches of expectation in tasks of visual attention. Nat. Neurosci., 2: 11–12. Noesselt, T., Driver, J., Heinze, H.J. and Dolan, R. (2005) Asymmetrical activation in the human brain during processing of fearful faces. Curr. Biol., 15: 424–429. Noesselt, T., Hillyard, S.A., Woldorff, M.G., Schoenfeld, A., Hagner, T., Jancke, L., Tempelmann, C., Hinrichs, H. and Heinze, H.J. (2002) Delayed striate cortical activation during spatial attention. Neuron, 35: 575–587. O¨hman, A., Lundqvist, D. and Esteves, F. (2001) The face in the crowd revisited: a threat advantage with schematic stimuli. J. Pers. Soc. Psychol., 80: 381–396. O¨hman, A. and Mineka, S. (2001) Fears, phobias, and preparedness: toward an evolved module of fear and fear learning. Psychol. Rev., 108: 483–522. Oostenveld, R. and Praamstra, P. (2001) The five percent electrode system for high-resolution EEG and ERP measurements. Clin. Neurophysiol., 112: 713–719. Pascual-Marqui, R.D., Michel, C.M. and Lehmann, D. (1994) Low resolution electromagnetic tomography: a new method for localizing electrical activity in the brain. Int. J. Psychophysiol., 18: 49–65. Peelen, M.V., Heslenfeld, D.J. and Theeuwes, J. (2004) Endogenous and exogenous attention shifts are mediated by the same large-scale neural network. Neuroimage, 22: 822–830. Pessoa, L., McKenna, M., Gutierrez, E. and Ungerleider, L.G. (2002) Neural processing of emotional faces requires attention. Proc. Natl. Acad. Sci. USA, 99: 11458–11463. Phelps, E.A., Ling, S. and Carrasco, M. (2006) Emotion facilitates perception and potentiates the perceptual benefit of attention. Psychol. Sci., 17: 292–299. Phillips, M.L., Medford, N., Young, A.W., Williams, L., Williams, S.C., Bullmore, E.T., Gray, J.A. and Brammer, M.J. (2001) Time courses of left and right amygdalar responses to fearful facial expressions. Hum. Brain Mapp., 12: 193–202. Picton, T.W., Bentin, S., Berg, P., Donchin, E., Hillyard, S.A., Johnson, R., Miller, G.A., Ritter, W., Ruchkin, D.S., Rugg, M.D. and Taylor, M.J. (2000) Guidelines for using human event-related potentials to study cognition: recording standards and publication criteria. Psychophysiology, 37: 127–152. Pizzagalli, D., Regard, M. and Lehmann, D. (1999) Rapid emotional face processing in the human right and left brain hemispheres: an ERP study. Neuroreport, 10: 2691–2698. Pizzagalli, D.A., Lehmann, D., Hendrick, A.M., Regard, M., Pascual-Marqui, R.D. and Davidson, R.J. (2002) Affective judgments of faces modulate early activity (approximately 160 ms) within the fusiform gyri. Neuroimage, 16: 663–677. Posner, M.I., Snyder, C.R.R. and Davidson, B.J. (1980) Attention and the detection of signals. J. Exp. Psychol. Gen., 109: 160–174. Pourtois, G., Grandjean, D., Sander, D. and Vuilleumier, P. (2004) Electrophysiological correlates of rapid spatial orienting towards fearful faces. Cereb. Cortex, 14: 619–633.
Pourtois, G., Schwartz, S., Seghier, M.L., Lazeyras, F. and Vuilleumier, P. (2006) Neural systems for orienting attention to the location of threat signals: an event-related fMRI study. Neuroimage, 31: 920–933. Pourtois, G., Thut, G., Grave de Peralta, R., Michel, C. and Vuilleumier, P. (2005) Two electrophysiological stages of spatial orienting towards fearful faces: early temporo-parietal activation preceding gain control in extrastriate visual cortex. Neuroimage, 26: 149–163. Pratto, F. and John, O.P. (1991) Automatic vigilance — the attention-grabbing power of negative social information. J. Pers. Soc. Psychol., 61: 380–391. Sabatinelli, D., Bradley, M.M., Fitzsimmons, J.R. and Lang, P.J. (2005) Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance. Neuroimage, 24: 1265–1270. Sander, D., Grandjean, D., Pourtois, G., Schwartz, S., Seghier, M.L., Scherer, K.R. and Vuilleumier, P. (2005) Emotion and attention interactions in social cognition: brain regions involved in processing anger prosody. Neuroimage, 28: 848–858. Searcy, J.H. and Bartlett, J.C. (1996) Inversion and processing of component and spatial-relational information in faces. J. Exp. Psychol. Hum., 22: 904–915. Shulman, G.L., d'Avossa, G., Tansy, A.P. and Corbetta, M. (2002) Two attentional processes in the parietal lobe. Cereb. Cortex, 12: 1124–1131. Spielberger, C.D. (1983) Manual for the State-Trait Anxiety Inventory. Consulting Psychologists Press, Palo Alto, CA. Stormark, K.M., Nordby, H. and Hugdahl, K. (1995) Attentional shifts to emotionally charged cues — behavioral and ERP data. Cogn. Emotion, 9: 507–523. Super, H., van der Togt, C., Spekreijse, H. and Lamme, V.A. (2003) Internal state of monkey primary visual cortex (V1) predicts figure-ground perception. J. Neurosci., 23: 3407–3414. Surguladze, S.A., Brammer, M.J., Young, A.W., Andrew, C., Travis, M.J., Williams, S.C. and Phillips, M.L. (2003) A preferential increase in the extrastriate response to signals of danger. Neuroimage, 19: 1317–1328. Tallon-Baudry, C., Bertrand, O., Henaff, M.A., Isnard, J. and Fischer, C. (2005) Attention modulates gamma-band oscillations differently in the human lateral occipital cortex and fusiform gyrus. Cereb. Cortex, 15: 654–662. Thiel, C.M., Zilles, K. and Fink, G.R. (2004) Cerebral correlates of alerting, orienting and reorienting of visuospatial attention: an event-related fMRI study. Neuroimage, 21: 318–328. Tranel, D., Damasio, A.R. and Damasio, H. (1988) Intact recognition of facial expression, gender and age in patients with impaired recognition of face identity. Neurology, 38: 690–696. van der Lubbe, R.H. and Woestenburg, J.C. (1997) Modulation of early ERP components with peripheral precues: a trend analysis. Biol. Psychol., 45: 143–158. Van der Lubbe, R.H. and Woestenburg, J.C. (2000) Location selection in the visual domain. Psychophysiology, 37: 662–676.
Vogel, E.K. and Luck, S.J. (2000) The visual N1 component as an index of a discrimination process. Psychophysiology, 37: 190–203. Vuilleumier, P. (2002) Facial expression and selective attention. Curr. Opin. Psychiatr., 15: 291–300. Vuilleumier, P. (2005) How brains beware: neural mechanisms of emotional attention. Trends Cogn. Sci., 9: 585–594. Vuilleumier, P., Armony, J.L., Clarke, K., Husain, M., Driver, J. and Dolan, R.J. (2002) Neural response to emotional faces with and without awareness: event-related fMRI in a parietal patient with visual extinction and spatial neglect. Neuropsychologia, 40: 2156–2166. Vuilleumier, P., Armony, J.L. and Dolan, R.J. (2004a) Reciprocal links between emotion and attention. In: Frackowiak, R.J.S. and Mazziotta, J. (Eds.), Human Brain Function. Elsevier, London, pp. 419–444. Vuilleumier, P., Armony, J.L., Driver, J. and Dolan, R.J. (2001) Effects of attention and emotion on face processing in the human brain: an event-related fMRI study. Neuron, 30: 829–841. Vuilleumier, P. and Pourtois, G. (in press) Distributed and interactive brain mechanisms during emotion face perception: evidence from functional neuroimaging. Neuropsychologia. Vuilleumier, P., Richardson, M.P., Armony, J.L., Driver, J. and Dolan, R.J. (2004b) Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat. Neurosci., 7: 1271–1278. Vuilleumier, P. and Schwartz, S. (2001a) Beware and be aware: Capture of spatial attention by fear-related stimuli in neglect. Neuroreport, 12: 1119–1122. Vuilleumier, P. and Schwartz, S. (2001b) Emotional facial expressions capture attention. Neurology, 56: 153–158. Walter, W.G., Cooper, R., Aldridge, V.J., McCallum, W.C. and Winter, A.L. (1964) Contingent negative variation: An electric sign of sensorimotor association and expectancy in the human brain. Nature, 203: 380–384. Williams, J.M.G., Mathews, A. and MacLeod, C. (1996) The emotional stroop task and psychopathology. Psychol. Bull., 120: 3–24. Williams, M.A., Moss, S.A., Bradshaw, J.L. and Mattingley, J.B. (2005) Look at me, I’m smiling: visual search for threatening and nonthreatening facial expressions. Vis. Cogn., 12: 29–50. Woldorff, M.G., Hazlett, C.J., Fichtenholtz, H.M., Weissman, D.H., Dale, A.M. and Song, A.W. (2004) Functional parcellation of attentional control regions of the brain. J. Cogn. Neurosci., 16: 149–165. Zajonc, R.B. (1980) Feeling and thinking — preferences need no inferences. Am. Psychol., 35: 151–175. Zald, D.H. (2003) The human amygdala and the emotional evaluation of sensory stimuli. Brain Res. Brain Res. Rev., 41: 88–123.
Anders, Ende, Jungho¨fer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 5
The neural basis of narrative imagery: emotion and action

Dean Sabatinelli1,*, Peter J. Lang1, Margaret M. Bradley1 and Tobias Flaisch2

1 NIMH Center for the Study of Emotion and Attention, University of Florida, PO Box 100165 HSC, Gainesville, FL 32608, USA
2 Department of Psychology, University of Konstanz, Universita¨tstrasse 10, 78457 Konstanz, Germany
Abstract: It has been proposed that narrative emotional imagery activates an associative network of stimulus, semantic, and response (procedural) information. In previous research, predicted response components have been demonstrated through psychophysiological methods in the peripheral nervous system. Here we investigate central nervous system concomitants of pleasant, neutral, and unpleasant narrative imagery with functional magnetic resonance imaging. Subjects were presented with brief narrative scripts over headphones, and then imagined themselves engaged in the described events. During script perception, auditory association cortex showed enhanced activation during affectively arousing (pleasant and unpleasant), relative to neutral, scripts. Structures involved in language processing (left middle frontal gyrus) and spatial navigation (retrosplenium) were also active during script presentation. At the onset of narrative imagery, activity increased in supplementary motor area, lateral cerebellum, and left inferior frontal gyrus, with enhanced signal change during affectively arousing (pleasant and unpleasant), relative to neutral, scripts. These data are consistent with a bioinformational model of emotion that considers response mobilization as the measurable output of narrative imagery.

Keywords: emotion; script imagery; fMRI
*Corresponding author. Tel.: +1-352-392-2439; Fax: +1-352-392-6047; E-mail: sabat@ufl.edu
DOI: 10.1016/S0079-6123(06)56005-4

It has been proposed that narrative imagery is characterized by a network assembly of stimulus, response, and semantic or 'meaning' representations (Lang, 1979, 1985, 1994). Considering emotion as a disposition toward action (Frijda, 1986; Lang, 1994), emotional imagery may be considered essentially equivalent to emotional responding, with overt motor aspects 'gated out.' From this perspective, emotional engagement during narrative imagery is defined by the extent of response mobilization measurable in the central and peripheral nervous system. This conception has been corroborated by experimental paradigms in which narrative emotional imagery, relative to neutral imagery, is accompanied by heightened physiological reactivity. This 'efferent leakage' can be demonstrated in cardiovascular, autonomic, and somato-motor systems (Lang, 1977; Miller et al., 1987; Cook et al., 1988; Vrana and Lang, 1990; McNeil et al., 1993; for a detailed review see Cuthbert et al., 1991). While heightened peripheral reactivity during narrative emotional imagery has been thoroughly demonstrated, central indices of response information processing have only begun to be explored. Several neuroimaging studies of mental imagery have examined simple sensorimotor tasks (as opposed to dynamic, affect-relevant behavior). Focusing on the visual system, imagery paradigms
commonly present a stimulus and ask subjects to imagine the stimulus after it has been removed. Visual cortical activation during imagery can be seen in categorically specific regions of inferior temporal lobe (O'Craven and Kanwisher, 2000; Mechelli et al., 2004) as well as functionally specific regions of occipital lobe (Kosslyn et al., 1995; Thompson et al., 2001), in one report with retinotopic consistency (Slotnick et al., 2005); however, the involvement of primary visual cortex in visual imagery is inconsistently reported (Goebel et al., 1998; Ishai et al., 2000; see Kosslyn and Thompson, 2003). Motor imagery studies have exploited similar, simple tasks, intermixing motor execution and motor imagery trials, and report premotor and cerebellar activity during both tasks (Hanakawa et al., 2003; Nair et al., 2003; Solodkin et al., 2004). The involvement of primary motor areas in motor imagery has been variously reported as partial (Ehrsson et al., 2003) and absent (Dechent et al., 2004). Mental imagery of diagnosis-relevant narratives has been explored in a few clinical studies in an attempt to understand the neural effects of therapy, which often employs narrative imagery as a desensitization method. These studies typically contrast symptom-relevant (therefore aversive) imagery with neutral imagery, and report increased activity in limbic and prefrontal cortex (Rauch et al., 1996; Shin et al., 1997, 2004; Bystritsky et al., 2001; Lanius et al., 2003). The methodologies used in these experiments were variants of narrative imagery paradigms that originated in earlier psychophysiological research (Lang, 1977). Here we directly investigate the neural concomitants of pleasant, neutral, and unpleasant narrative imagery using functional MRI (fMRI) to probe potential response-related activation that has been associated with emotional imagery. We adapted an established imagery paradigm to the scanner environment, presenting brief story-scripts to a nonclinical sample of young, healthy participants, and asked them to vividly imagine themselves engaged in the emotional and neutral scenes described while blood oxygen level-dependent (BOLD) signal was recorded throughout the brain. We expect to identify language processing structures involved in the perceptual intake of narrative
script contents, which may reflect a similar pattern of enhanced BOLD signal intensity during emotional, relative to neutral, imagery script processing, as is the case in the visual system during emotional, relative to neutral, picture processing (Lang et al., 1998; Bradley et al., 2003; Sabatinelli et al., 2004, 2005). As subjects vividly imagine the script narratives, we hypothesize that response-related neural circuitry will become active, and potentially show enhanced activation during emotional, relative to neutral, contents. Past studies have included only aversive and neutral imagery scripts; therefore, differences identified may reflect differences in emotional intensity (which may be high in appetitive and aversive contexts) or emotional valence. The inclusion of pleasant as well as unpleasant imagery scripts will enable us to more specifically investigate the roles of emotional arousal and emotional valence in brain activation.

Method

Procedure

Twenty-one introductory psychology students (11 male) at the University of Florida participated in the experiment for course credit. All volunteers consented to participate after reading a description of the study, approved by the local human subjects' review board. Before entering the scanner room, participants read all imagery scripts and recorded ratings of pleasantness and arousal, using the Self-Assessment Manikin (Lang, 1980). The scripts consisted of 12 exemplars of pleasant scene contents, 6 of neutral scene contents, and 12 of unpleasant scene contents1 (see appendix for scripts). Prior to entering the bore of the Siemens 3 T Allegra MR scanner, subjects were fitted with MR-compatible, shielded headphones, a button response paddle, and a patient-alarm squeezeball
that was positioned within easy reach. Padding and explicit verbal instruction were used to limit head motion. Once comfortable, subjects were moved into the bore, and an 8 min 3D structural volume was collected. T1-weighted structural scanning involved the collection of 160 sagittal slices with 1 mm isotropic voxels. After a brief delay, the functional images were acquired. The functional prescription included fifty 2.5-mm-thick coronal slices (0.5 mm gap) and covered the entire cortex in most subjects. The session included 580 images, with 3 s temporal resolution and 16 µl voxel size (3 s TR, 35 ms TE, 160 mm FOV, 64 × 64 matrix). Subjects were asked to close their eyes, and imagery trials were presented in a continuous series (see Fig. 1). A trial began with an auditory script presentation for 12 s. To limit the masking effect of scanner noise, a thin sound-deadening mat was attached to the inside of the bore, and the headphones were also shielded for sound. To be clearly audible over the background of scanner noise, yet not uncomfortable to the listener, auditory scripts were presented at 98 dB. All subjects reported that the scripts were clear and decipherable. Each auditory script was followed immediately by a tone signaling subjects to vividly imagine themselves fully engaged in the scene described. A second tone signaled the end of the 12 s imagery period, after which subjects listened to a series of nine numbers, one per second, and made a button-press when a prespecified number was presented. A delay of 3–9 s was followed by the next trial, in pseudo-random order by content (not more than two trials of the same content in succession); the entire series lasted 29 min. Script presentation order was unique for each subject.

1 Two additional specific fear-relevant script contents (dental and snake fear) were excluded from the current analyses. These script contents were included in the design in order to allow for comparisons with clinical populations.
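The acquisition parameters above are internally consistent, which can be verified with a little arithmetic. The sketch below (Python) simply recomputes the in-plane resolution, the nominal voxel volume, and the session length from the values reported in the preceding paragraph; no values beyond those in the text are assumed.

# Back-of-the-envelope check of the reported acquisition parameters.
fov_mm = 160.0          # field of view
matrix = 64             # in-plane matrix size (64 x 64)
slice_thickness = 2.5   # mm
tr_s = 3.0              # repetition time
n_volumes = 580         # functional images per session

in_plane = fov_mm / matrix                                # 2.5 mm
voxel_volume_ul = in_plane * in_plane * slice_thickness   # mm^3 equals microliters
session_min = n_volumes * tr_s / 60.0

print(f"in-plane resolution: {in_plane:.1f} mm")          # 2.5 mm
print(f"voxel volume: {voxel_volume_ul:.1f} ul")          # 15.6 ul, ~16 ul as reported
print(f"session length: {session_min:.0f} min")           # 29 min, as reported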
Data analyses

The functional time series was motion-corrected, mean-intensity adjusted, linearly detrended, high-pass filtered at 0.02 Hz, and spatially smoothed with a 5 mm spatial filter using BrainVoyager QX (www.brainvoyager.com). All subjects' preprocessed functional activity was coregistered to their respective structural volumes, after transformation into standardized Talairach coordinate space (Talairach and Tournoux, 1988). These standardized functional maps were then analyzed using ANOVA to identify voxels whose intensity was associated with auditory script presentation as well as script imagery, after convolution of the event timing with a standard hemodynamic response function. A conservative false discovery rate (FDR) correction of p < 0.00001 revealed clusters of activity associated with script presentation and script imagery, from which regions of interest (ROI) were sampled in individual subjects and broken down by script content. ROIs were sampled from each subject's ANOVA map of significant (p < 0.01) script- and imagery-associated activity, within the area of interest, yet allowing the center of the 100 µl ROI sample to be sensitive to the individual's specific neuroanatomy. Peak signal in each ROI (average percent BOLD signal 3–9 s post script/imagery onset) for each subject was entered into an ANOVA including script content (erotica, pleasant arousing, neutral, contamination, and threat). This two-step analysis process enables conservative identification of active clusters in the average group activity map while remaining sensitive to individual differences in structural variation, which is typically lost after spatial standardization. ROI analyses also enable the examination of event-related time courses of clusters of activity.
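To make the two generic ingredients of this analysis concrete, here is a minimal Python sketch of an event regressor built by convolving a boxcar with a hemodynamic response function, and of a Benjamini-Hochberg false discovery rate screen at the conservative q = 0.00001 level used above. The double-gamma HRF parameters are common textbook defaults, not necessarily those applied by BrainVoyager QX, and the helper names are ours; this is a sketch of the approach, not the authors' implementation.

import numpy as np
from scipy.stats import gamma

def hrf(t):
    # Simple double-gamma HRF (peak near 5 s, undershoot near 15 s);
    # an assumed canonical form, not BrainVoyager's exact function.
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def make_regressor(onsets_s, duration_s, n_vols, tr_s):
    # Boxcar for each 12 s event, convolved with the HRF, sampled at the TR.
    res = 0.1  # build at 100 ms resolution, then downsample to the TR
    n = round(n_vols * tr_s / res)
    boxcar = np.zeros(n)
    for onset in onsets_s:
        i0 = round(onset / res)
        boxcar[i0:i0 + round(duration_s / res)] = 1.0
    h = hrf(np.arange(0.0, 30.0, res))
    conv = np.convolve(boxcar, h)[:n]
    return conv[::round(tr_s / res)]  # one value per volume

def fdr_bh(pvals, q=0.00001):
    # Benjamini-Hochberg step-up procedure: boolean mask of significant tests.
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * (np.arange(1, m + 1) / m)
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask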
Fig. 1. Trial structure. Narrative imagery scripts were presented over headphones as subjects lay in the magnet. Immediately after script presentation, subjects imagined themselves as actively engaged in the scene described. After 12 s, a tone signaled subjects to listen to a 9 s series of numbers over the headphones, and make a button-press when a predetermined number was presented. After a delay of 3–9 s, the next trial was presented. The entire series lasted approximately 29 min.
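The ordering constraint in the procedure (pseudo-random by content, with not more than two trials of the same content in succession, and a unique order per subject) is easy to implement. A small sketch follows; rejection sampling and the function name are assumptions for illustration, as the chapter does not say how the orders were actually generated.

import random

def script_order(n_pleasant=12, n_neutral=6, n_unpleasant=12, seed=None):
    # Shuffle until no three consecutive trials share a content category,
    # i.e., not more than two of the same content in succession.
    rng = random.Random(seed)
    pool = (['pleasant'] * n_pleasant + ['neutral'] * n_neutral
            + ['unpleasant'] * n_unpleasant)
    while True:
        rng.shuffle(pool)
        if all(not (pool[i] == pool[i - 1] == pool[i - 2])
               for i in range(2, len(pool))):
            return list(pool)

# One unique order per subject, e.g., seeded by subject number:
order = script_order(seed=7)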
Fig. 2. Brain activation during narrative script presentation. Significant clusters of activity were evoked by auditory scripts in primary (A) and secondary auditory cortex (B) as well as retrosplenium (C) and the left MFG (D). The minimum threshold of significance in the random-effects analysis was p < 1 × 10⁻⁶. ROI analyses from individual subjects yielded each cluster's time course of signal change, with respect to script presentation (0–12 s). Blue waveforms represent signal change during pleasant scripts, green during neutral scripts, and red during unpleasant scripts. Error bars represent standard errors of the mean.
Results

Script ratings

The narrative scripts were rated (1–9 scale) as intended, with neutral scripts (7.0, SD 0.6) falling between pleasant (8.5, SD 0.3) and unpleasant (2.4, SD 0.6) scripts in terms of valence, and ranking below both pleasant (7.3, SD 1.3) and unpleasant (6.7, SD 1.1) scripts in terms of emotional arousal (neutral: 3.0, SD 1.1). Pleasant and unpleasant scripts were rated equivalently in emotional arousal (t = 1.20, ns).

BOLD signal during script presentation

The presentation of auditory scripts elicited activity in primary and secondary auditory cortex
(see Fig. 2A, B, left panel). In addition, script presentation triggered activation in bilateral retrosplenial cortex (Fig. 2C, left) and left middle frontal gyrus (MFG; Fig. 2D, left).2 ROI time courses (waveform panels of Fig. 2) demonstrate that peak signal change was significantly affected by script content in secondary auditory cortex (Fig. 2B, right panel), F(2,40) = 21.75, p < 0.001, with greater signal change during pleasant and unpleasant, relative to neutral, scripts (quadratic trend F(1,20) = 36.75, p < 0.001). No main effects of hemisphere, and no content × hemisphere interactions, were found in tests of primary or secondary auditory BOLD signal.

2 Clusters of activation during script presentation that did not meet our conservative statistical threshold were seen in ventromedial prefrontal cortex and parahippocampal gyrus. During script imagery, below-threshold clusters of activity were present in caudate, hippocampus, and posterior parietal lobe.
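The dependent measure behind these tests (average percent BOLD signal change 3–9 s after script or imagery onset, as defined in the Data analyses section) can be written compactly. In the sketch below, the two-volume pre-onset baseline is an illustrative assumption; the chapter specifies only the 3–9 s post-onset window.

import numpy as np

def peak_percent_signal(roi_ts, onsets_vol, tr_s=3.0, baseline_vols=2):
    # roi_ts: mean raw signal in the ROI across the run (one value per volume).
    # Averages percent change over volumes 3-9 s post onset, relative to a
    # short pre-onset baseline (the baseline choice is assumed, not reported).
    lo, hi = round(3 / tr_s), round(9 / tr_s) + 1  # volumes at 3, 6, 9 s post onset
    changes = []
    for onset in onsets_vol:
        base = roi_ts[onset - baseline_vols:onset].mean()
        post = roi_ts[onset + lo:onset + hi].mean()
        changes.append(100.0 * (post - base) / base)
    return float(np.mean(changes))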
Script content also affected signal change in left MFG, F(2,38) = 7.11, p < 0.01, again with greater signal change during pleasant and unpleasant, relative to neutral, scripts (quadratic trend F(1,19) = 8.77, p < 0.01). In retrosplenium, script content modulated signal change as well, F(2,34) = 4.71, p < 0.05, but with greater signal change during neutral relative to pleasant and unpleasant script presentations (quadratic trend F(1,17) = 7.53, p < 0.05).
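The quadratic trends reported throughout test whether the neutral condition falls below (or, for retrosplenium, above) the two arousing conditions. For three ordered conditions the standard quadratic contrast weights are (1, −2, 1), and for a single-degree-of-freedom within-subject contrast the trend F(1, n − 1) equals the squared one-sample t on the per-subject contrast scores. The sketch below shows that equivalence; it is not the authors' actual computation, which the chapter does not detail, and the example data are synthetic, purely for illustration.

import numpy as np
from scipy import stats

def quadratic_trend_F(pleasant, neutral, unpleasant):
    # Per-subject quadratic contrast score with weights (1, -2, 1);
    # F(1, n-1) is the squared one-sample t against zero.
    L = np.asarray(pleasant) - 2 * np.asarray(neutral) + np.asarray(unpleasant)
    t, p = stats.ttest_1samp(L, 0.0)
    return t ** 2, p

# Illustrative synthetic data (not the study's): 21 subjects, arousal effect.
rng = np.random.default_rng(0)
neutral = rng.normal(0.3, 0.1, 21)
pleasant = neutral + rng.normal(0.15, 0.1, 21)
unpleasant = neutral + rng.normal(0.15, 0.1, 21)
F, p = quadratic_trend_F(pleasant, neutral, unpleasant)
print(f"quadratic trend F(1,20) = {F:.2f}, p = {p:.5f}")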
BOLD signal during mental imagery

The onset of the imagery period triggered an increase in BOLD signal in supplementary motor area (SMA; Fig. 3A), left inferior frontal gyrus (IFG; Fig. 3B), and right lateral cerebellum
(Fig. 3C). ROI waveforms (right panels of Fig. 3A–C) show that the peak level of signal change in SMA was sensitive to imagery content, F(2,38) = 3.94, p < 0.05, with greater BOLD signal evoked during pleasant and unpleasant, relative to neutral, script imagery (quadratic trend F(1,19) = 5.02, p < 0.05). Left IFG showed the same sensitivity to imagery content, F(2,32) = 11.91, p < 0.001, with greater signal change during pleasant and unpleasant relative to neutral script imagery (quadratic trend F(1,16) = 26.50, p < 0.001). Signal change in the right lateral cerebellum showed a marginal sensitivity to imagery content, F(2,34) = 3.44, p = 0.07, with a trend toward greater signal change during pleasant and unpleasant, relative to neutral, imagery (quadratic trend F(1,17) = 3.93, p = 0.06).
Fig. 3. Brain activity during narrative imagery. Significant clusters of activity were evoked by narrative imagery in SMA (A), left IFG (B), and right lateral cerebellum (C). The minimum threshold of significance in the random-effects analysis was p < 1 × 10⁻⁶. Region of interest analyses from individual subjects yielded each cluster's time course of signal change, with respect to the imagery period (12–24 s). Blue waveforms represent signal change during pleasant imagery, green during neutral imagery, and red during unpleasant imagery. Error bars represent standard errors of the mean. In panel D, the relative coronal locations of the left MFG cluster active during script presentation (Fig. 2D, presented here in blue) and the left IFG cluster active during script imagery (in red) are highlighted.
Interestingly, two of the three ROIs that showed signal increases at the onset of the imagery task (SMA and IFG) also showed significant effects of script content 3–9 s after the presentation of the auditory script (see Fig. 3, right panels). These signal increases were greater during pleasant and unpleasant relative to neutral script presentation (SMA, F(2,38) = 6.71, p < 0.01, quadratic trend F(1,19) = 9.36, p < 0.01; IFG, F(2,32) = 4.01, p < 0.05, quadratic trend F(1,16) = 4.85, p < 0.05). The script content effect is not continuous from presentation to imagery, however, as neither SMA nor IFG shows content effects at the end of the script presentation period, prior to the onset of the imagery period (SMA, F(2,38) = 1.56, ns; IFG, F(2,32) = 2.94, ns).
Discussion

The act of imagining oneself engaged in a narrative recruits areas of the brain involved in planning and executing action: supplementary motor area, prefrontal cortex, and cerebellum. These effects are consistent with fMRI studies of explicit motor imagery (Hanakawa et al., 2003; Nair et al., 2003; Solodkin et al., 2004; Cunnington et al., 2005). In this dataset, it is demonstrated that the intensity of preparatory motor activation reflects in part the nature of the narrative; scenes characterized by emotional events evoke stronger signal increases. These data support a bioinformational account of emotional imagery, in which response components of associative networks are thought to be more strongly activated by emotional than neutral narrative imagery. Prior reports of enhanced physiological mobilization observed in the peripheral nervous system during emotional imagery (Lang et al., 1980; Miller et al., 1987; Cook et al., 1988; Vrana and Lang, 1990; McNeil et al., 1993) can thus be associated with response-related central nervous system correlates. Emotional imagery appears to reflect a neural disposition toward action (e.g., approach, withdrawal), which can be observed as enhanced preparatory motor activation. Immediately prior to each imagery period, subjects listened as a recorded speaker read the narrative in an emotionally muted voice. Despite
the background of scanner noise, the presentation of the auditory script led to sharp and widespread BOLD signal increases in primary and secondary auditory cortex (Fig. 2). As in the visual system during picture perception (Lang et al., 1998; Bradley et al., 2003; Sabatinelli et al., 2004, 2005), the intensity of signal change in auditory association cortex was modulated by emotional intensity, i.e., the arousing pleasant and unpleasant scripts evoked (equivalently) greater signal change than did the neutral scripts. Interestingly, the primary auditory cortex did not show an effect of script emotionality at the peak of signal change, yet the descending leg of signal change (the first 6 s after script offset; see Fig. 2) did show a reliable emotional arousal effect (F(2,40) = 25.02, p < 0.001, quadratic trend F(1,20) = 47.64, p < 0.001). Outside the auditory cortex, the activity initiated during script presentation in retrosplenium and left IFG tended to return to baseline much more slowly. This gradual decline in the BOLD signal may reflect the involvement of these structures in both the decoding and the imagination of the narrative script. It might also be a result of the tendency for subjects to begin the imagery task while the script is being presented. For the current dataset, it is defensible to suggest that the retrosplenium and IFG are involved in narrative script intake, and perhaps script imagery as well. Beyond the timing of signal change, it is clear that the activity in retrosplenial cortex shows a unique pattern with respect to the other ROIs — a reverse arousal effect, with greater signal change associated with neutral script processing relative to pleasant and unpleasant script processing. In neuroimaging and neuropsychological studies, the retrosplenium has been closely tied to environmental navigation and orienting in large-scale space (Aguirre et al., 1998; Gro¨n et al., 2000; for review see Maguire, 2001). We can speculate that in the absence of an emotional focus, our subjects were processing the spatial details of the imagined local environment to a greater extent. Future studies in which the degree of environmental detail is manipulated could address this issue. The narrative scripts used in the current study were constructed to evoke pleasant, neutral, and
unpleasant contexts, but were not explicitly controlled for the represented level of physical activity. Considering that emotion itself can be defined as a disposition toward action, the separation of emotion and activity may be impossible to achieve. However, it might be possible in future work to manipulate the level of physical activity represented within each script category — to include active, inactive, and neutral as well as emotional narratives. In this way, the potentially entangled roles of action and emotion may be investigated. The distinction between subsections of left prefrontal cortex during script presentation and script imagery (Fig. 3D) is consistent with functional distinctions commonly identified in studies of phonological and semantic language processing. In the current dataset, encoding of the narrative scripts evoked activity in a superior, posterior subsection of the left MFG that has been shown to be involved in phonological processing. During script imagery, activity increases were seen in a more inferior and anterior subsection of the left IFG, a region associated with semantic language processing (Wagner et al., 2001; Gold et al., 2005; Gitelman et al., 2005; see Hagoort, 2005). Perhaps as subjects listen to the script, content is encoded, but semantic elaboration is delayed until the entire text is understood and the imagery period is signaled. Investigations of imagery-induced clinical symptoms have primarily reported activity in limbic and prefrontal structures. In panic disorder patients, relative to healthy controls, Bystritsky et al. (2001) reported limbic activation during anxiety-provoking narrative imagery, including inferior and orbitofrontal cortex, anterior and posterior cingulate, and hippocampus. Shin et al. (1997), using positron emission tomography (PET) in combat veterans with and without posttraumatic stress disorder (PTSD), identified stronger activation during trauma-related imagery in orbitofrontal cortex, insula, anterior cingulate, and amygdala. A more recent PET study (Shin et al., 2004) suggested amygdala hyperactivity in PTSD to be inversely related to hyporeactivity in medial frontal cortex. However, in a similar design, traumatized subjects with diagnosed PTSD (Lanius et al., 2003) showed relatively less anterior cingulate
and thalamic activity during trauma-relevant imagery than traumatized subjects without PTSD, and no amygdala activation was reported. The variability in reports of limbic activity during traumatic imagery may in part reflect individual differences in dissociative symptoms within PTSD samples (Lanius et al., 2006). The lack of limbic activation in the current study may in part be a result of our conservative analyses, as a more liberal statistical threshold during imagery (FDR p < 0.05) revealed subcortical activity in caudate and hippocampus, but no clusters in orbitofrontal cortex or amygdala. It may be that subcortical recruitment in emotional imagery is specially potentiated in clinical populations. Another possibility is that subcortical motivational circuits are more readily engaged by perceptual, rather than imagined, stimuli. In summary, these data demonstrate several effects. First, auditory cortex activation reflects the emotional intensity of narrative scripts during perception. Despite the need to decipher language, affective modulation of cortical response in secondary, and perhaps primary, auditory cortex is evident while the script is heard. The current design preexposed subjects to the script stimuli prior to the experimental session; thus some of the dynamic aspects of narrative presentation were removed, potentially converting the onset of each script into a static cue for memory retrieval. Future work in which preexposure is manipulated will provide more information regarding the dependence of this effect on stimulus dynamics. In any case, this sensory cortical effect is consistent with picture processing effects in the visual system, in which lateral occipital and inferior temporal regions show reliably greater activation during affectively arousing, relative to neutral, picture processing (Lang et al., 1998; Bradley et al., 2003; Jungho¨fer et al., 2006). Thus, there may be a modality-nonspecific mechanism by which emotionally arousing stimuli evoke enhanced cortical processing. Second, the neural effects of narrative imagery appear to be essentially response-related. Activity in premotor cortex, cerebellum, and prefrontal cortex shows clear time-locked increases at the onset of the imagery task. Considering that response
readiness is especially critical in emotionally charged situations, it is perhaps not surprising that emotional imagery triggered greater activation in these structures.

Abbreviations

BOLD: blood oxygen level-dependent
FDR: false discovery rate
fMRI: functional magnetic resonance imaging
IFG: inferior frontal gyrus
MFG: middle frontal gyrus
MR: magnetic resonance
PET: positron emission tomography
PTSD: posttraumatic stress disorder
SMA: supplementary motor area
Acknowledgments

This work was supported by a grant from the National Institute of Mental Health (P50 MH072850-01).

Appendix

Pleasant scripts

1. You are lying together, legs over legs, arms around bodies — kisses deep and sweet. In love on a blanket, beneath a tree, on a warm summer day.
2. You tense as the roller coaster reaches the crest. Then, you are all plunging down, screaming above the roar, together, laughing, and waving your arms.
3. As soon as you saw each other, the affair began. You remember beautiful eyes looking straight into yours — your heart in your throat, at the first touch.
4. The band is terrific. The room vibrates with sound and your skin tingles. You're dancing together, moving effortlessly with the music. You're feeling great!
5. The mountain air is clear and cold. The sun glistens on the powder as you head down the slope in gliding turns, mastering the mountain, moving with a sure, easy grace.
6. Music murmurs in the background. You're together in the big bed, naked but apart, eyes locked. You feel fingers barely touching, gliding softly along your thigh.
7. A moan of pleasure. Your body responds slowly at first, languorously, and then with a more urgent rhythm. You feel gentle hands, a soft mouth, your back arches.
8. It's a beautiful day and you're heading a new convertible to the beach. The CD player is blasting, and you're singing along at the top of your voice.
9. It's the last few minutes of the big game and it's close. The crowd explodes in a deafening roar. You jump up, cheering. Your team has come from behind to win.
10. You shiver as your bodies brush together. You reach out. You want to touch everywhere, kiss everywhere. You hear the words, 'I love you'.
11. The registered letter says 'You have just won ten million dollars!' It's amazing — you bought the winning ticket in the lottery. You cry, scream, jump with joy!
12. You are both aroused, breathless. You fall together on the couch. Kisses on your neck, face — warm hands fumbling with clothing, hearts pounding.

Neutral scripts

1. You run the comb through your hair, straighten your collar, smooth out the shirt's wrinkles. Water is running in the sink. You turn it off and leave.
2. You are relaxing on a lawn chair, looking out into the garden. A child's tricycle is abandoned on the grass. You hear the low buzz of a lawn mower in the distance.
3. It's good to be able to do nothing and just stretch out on the couch. The television is on with the sound off. You can hear the low rumble of traffic in the distance.
4. You unfold the map, spread it out on the table, and with your finger trace a route south toward the beach. You refold the map, pick up your bag, and leave.
5. It's a quiet day without much to do. You're sitting around your place, resting, reading, and looking out the window — where leaves swirl gently in the wind.
6. You are sitting at the kitchen table with yesterday's newspaper in front of you. You push back the chair when you hear the coffee maker slow to a stop.

Unpleasant scripts

1. The garbage can is upset. Maggots crawl on the rotted food spilling out on the floor, staining the carpet. Your throat tightens with a wave of nausea, but you must clean it up.
2. A night landing in high winds: your hands clutch the seat-arms in the swaying plane. Stomach queasy. The engine coughs; stops; restarts with a strange whine.
3. A vagrant, wino, approaches you, yellow teeth and scabs on his face, clothes smelling of mold and urine. You cringe as his hand touches your sleeve.
4. As you ease the car onto the wooden bridge, it groans. In the headlights, a broken railing swings in the wind. A swift current rams against the pilings below.
5. The bathroom is filthy, toilet overflowing onto the floor spreading toward your feet. The smell is overwhelming and you run for the door.
6. You're alone in the alley in a bad part of the city. A street gang slowly surrounds you, knives out, laughing with menace. Your heart pounds as they close in.
7. It's late at night in a poorly lit parking lot. You are tense, clutching the keys. Your car stands alone in the distance, when footsteps sound behind you.
8. You are leaving the concert when a drunk, smelling of smoke and alcohol, stumbles into you and throws up on your jacket. You retch, as vomit drips onto your hand.
9. You jump back, muscles tense, as the large dog strains against the chain, slobbering with teeth bared, leaping, and snarling in a crazy rage.
10. You gag, seeing a roach moving slowly over the surface of the pizza. You knock the pie on the floor. Warm cheese spatters on your shoes.
11. You bite hungrily into the hamburger, and abruptly catch the putrid smell of spoiled meat. You spit out, and a greasy piece falls down your chin onto your pants.
12. You flinch, at the screech of brakes; you look up, and see the speeding car slam into your friend. Her leg is crushed, the artery torn, and blood pumps on the road.
References

Bradley, M.M., Sabatinelli, D., Lang, P.J., Fitzsimmons, J.R., King, W. and Desai, P. (2003) Activation of the visual cortex in motivated attention. Behav. Neurosci., 117: 369–380. Bystritsky, A., Pontillo, D., Powers, M., Sabb, F.W., Craske, M.G. and Bookheimer, S.Y. (2001) Functional MRI changes during panic anticipation and imagery exposure. Neuroreport, 12: 3953–3957. Cook III, E.W., Melamed, B.G., Cuthbert, B.N., McNeil, D.W. and Lang, P.J. (1988) Emotional imagery and the differential diagnosis of anxiety. J. Consult. Clin. Psychol., 56: 734–740. Cunnington, R., Windischberger, C. and Moser, E. (2005) Premovement activity of the pre-supplementary motor area and the readiness for action: studies of time-resolved event-related functional MRI. Hum. Mov. Sci., 24: 644–656. Cuthbert, B.N., Vrana, S.R. and Bradley, M.M. (1991) Imagery: function and physiology. In: Ackles, P.K., Jennings, J.R. and Coles, M.G.H. (Eds.), Advances in Psychophysiology, Vol. 4. JAI, Greenwich, CT, pp. 1–42. Dechent, P., Merboldt, K.D. and Frahm, J. (2004) Is the human primary motor cortex involved in motor imagery? Brain Res. Cogn. Brain Res., 19: 138–144. Ehrsson, H.H., Geyer, S. and Naito, E. (2003) Imagery of voluntary movement of fingers, toes, and tongue activates corresponding body-part-specific motor representations. J. Neurophysiol., 98: 3304–3316. Frijda, N.H. (1986) The Emotions. Cambridge University Press, New York. Gitelman, D.R., Nobre, A.C., Sonty, S., Parrish, T.B. and Mesulam, M.M. (2005) Language network specializations: an analysis with parallel task designs and functional magnetic resonance imaging. Neuroimage, 26: 975–985. Goebel, R., Khorram-Sefat, D., Muckli, L., Hacker, H. and Singer, W. (1998) The constructive nature of vision: direct evidence from functional magnetic resonance imaging studies
of apparent motion and motion imagery. Eur. J. Neurosci., 10: 1563–1573. Gold, B.T., Balota, D.A., Kirchhoff, B.A. and Buckner, R.L. (2005) Common and dissociable activation patterns associated with controlled semantic and phonological processing: evidence from fMRI adaptation. Cereb. Cortex, 15: 1438–1450. Gro¨n, G., Wunderlich, A.P., Spitzer, M., Tomczak, R. and Riepe, M.W. (2000) Brain activation during human navigation: gender-different neural networks as substrate of performance. Nat. Neurosci., 3: 404–408. Hagoort, P. (2005) On Broca, brain, and binding: a new framework. Trends Cogn. Sci., 9: 416–423. Hanakawa, T., Immisch, I., Toma, K., Dimyan, M.A., Van Gelderen, P. and Hallett, M. (2003) Functional properties of brain areas associated with motor execution and imagery. J. Neurophysiol., 89: 989–1002. Ishai, A., Ungerleider, L.G. and Haxby, J.V. (2000) Distributed neural systems for the generation of visual images. Neuron, 28: 979–990. Jungho¨fer, M., Sabatinelli, D., Bradley, M.M., Schupp, H.T., Elbert, T.R. and Lang, P.J. (2006) Fleeting images: rapid affect discrimination in the visual cortex. Neuroreport, 17: 225–229. Kosslyn, S.M. and Thompson, W.L. (2003) When is early visual cortex activated during visual mental imagery? Psychol. Bull., 129: 723–746. Kosslyn, S.M., Thompson, W.L., Kim, I.J. and Alpert, N.M. (1995) Topographical representations of mental images in primary visual cortex. Nature, 378: 496–498. Lang, P.J. (1977) Imagery in therapy: an information processing analysis of fear. Behav. Ther., 8: 862–886. Lang, P.J. (1979) A bio-informational theory of emotional imagery. Psychophysiology, 16: 495–512. Lang, P.J. (1980) Behavioral treatment and bio-behavioral assessment: computer applications. In: Sidowski, J.B., Johnson, J.H. and Williams, T.A. (Eds.), Technology in Mental Health Care Delivery Systems. Ablex Publishing, Norwood, NJ, pp. 119–137. Lang, P.J. (1985) The cognitive psychophysiology of emotion: fear and anxiety. In: Tuma, A.H. and Maser, J.D. (Eds.), Anxiety and the Anxiety Disorders. Erlbaum, Hillsdale, NJ, pp. 131–170. Lang, P.J. (1994) The motivational organization of emotion: affect-reflex connections. In: VanGoozen, S., Van de Poll, N.E. and Sergeant, J.A. (Eds.), Emotions: Essays on Emotion Theory. Erlbaum, Hillsdale, NJ, pp. 61–93. Lang, P.J., Bradley, M.M., Fitzsimmons, J.R., Cuthbert, B.N., Scott, J.D., Moulder, B. and Nangia, V. (1998) Emotional arousal and activation of the visual cortex: an fMRI analysis. Psychophysiology, 35: 199–210. Lanius, R.A., Bluhm, R., Lanius, U. and Pain, C. (2006) A review of neuroimaging studies in PTSD: heterogeneity of response to symptom provocation. J. Psychiatr. Res., in press. Lanius, R.A., Williamson, P.C., Hopper, J., Densmore, M., Boksman, K., Gupta, M.A., Neufeld, R.W., Gati, J.S.
and Menon, R.S. (2003) Recall of emotional states in posttraumatic stress disorder: an fMRI investigation. Biol. Psychiatry, 53: 204–210. Maguire, E.A. (2001) The retrosplenial contribution to human navigation: a review of lesion and neuroimaging findings. Scand. J. Psychol., 42: 225–238. McNeil, D.W., Vrana, S.R., Melamed, B.G., Cuthbert, B.N. and Lang, P.J. (1993) Emotional imagery in simple and social phobia: fear versus anxiety. J. Abnorm. Psychol., 102: 212–225. Mechelli, A., Price, C.J., Friston, K.J. and Ishai, A. (2004) Where bottom-up meets top-down: neuronal interactions during perception and imagery. Cereb. Cortex, 14: 1256–1265. Miller, G.A., Levin, D.N., Kozak, M.J., Cook III, E.W., McLean Jr., A. and Lang, P.J. (1987) Individual differences in imagery and the psychophysiology of emotion. Cogn. Emotion, 1: 367–390. Nair, D.G., Purcott, K.L., Fuchs, A., Steinberg, F. and Kelso, J.A. (2003) Cortical and cerebellar activity of the human brain during imagined and executed unimanual and bimanual action sequences: a functional MRI study. Brain Res. Cogn. Brain Res., 15: 250–260. O'Craven, K.M. and Kanwisher, N. (2000) Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J. Cogn. Neurosci., 12: 1013–1023. Rauch, S.L., van der Kolk, B.A., Fisler, R.E., Alpert, N.M., Orr, S.P., Savage, C.R., Fischman, A.J., Jenike, M.A. and Pitman, R.K. (1996) A symptom provocation study of posttraumatic stress disorder using positron emission tomography and script-driven imagery. Arch. Gen. Psychiatry, 53: 380–387. Sabatinelli, D., Bradley, M.M., Fitzsimmons, J.R. and Lang, P.J. (2005) Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance. Neuroimage, 24: 1265–1270. Sabatinelli, D., Flaisch, T., Bradley, M.M., Fitzsimmons, J.R. and Lang, P.J. (2004) Affective picture perception: gender differences in visual cortex? Neuroreport, 15: 1109–1112. Shin, L.M., Kosslyn, S.M., McNally, R.J., Alpert, N.M., Thompson, W.L., Rauch, S.L., Macklin, M.L. and Pitman, R.K. (1997) Visual imagery and perception in posttraumatic stress disorder: a positron emission tomographic investigation. Arch. Gen. Psychiatry, 54: 233–241. Shin, L.M., Orr, S.P., Carson, M.A., Rauch, S.L., Macklin, M.L., Lasko, N.B., Peters, P.M., Metzger, L.J., Dougherty, D.D., Cannistraro, P.A., Alpert, N.M., Fischman, A.J. and Pitman, R.K. (2004) Regional cerebral blood flow in the amygdala and medial prefrontal cortex during traumatic imagery in male and female Vietnam veterans with PTSD. Arch. Gen. Psychiatry, 61: 168–176. Slotnick, S.D., Thompson, W.L. and Kosslyn, S.M. (2005) Visual mental imagery induces retinotopically organized activation of early visual areas. Cereb. Cortex, 15: 1570–1583. Solodkin, A., Hlustik, P., Chen, E.E. and Small, S.L. (2004) Fine modulation in network activation during motor execution and motor imagery. Cereb. Cortex, 14: 1246–1255.
Talairach, J. and Tournoux, P. (1988) Co-planar stereotaxic atlas of the human brain. 3-dimensional proportional system: an approach to cerebral imaging. Thieme Medical Publishers, Inc, New York. Thompson, W.L., Kosslyn, S.M., Sukel, K.E. and Alpert, N.M. (2001) Mental imagery of high- and low-resolution gratings activates area 17. Neuroimage, 14: 454–464.
Vrana, S.R. and Lang, P.J. (1990) Fear imagery and the startle probe reflex. J. Abnorm. Psychol., 99: 189–197. Wagner, A.D., Maril, A., Bjork, R.A. and Schacter, D.L. (2001) Prefrontal contributions to executive control: fMRI evidence for functional distinctions within lateral prefrontal cortex. Neuroimage, 14: 1337–1347.
Anders, Ende, Jungho¨fer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 6
Subliminal emotion perception in brain imaging: findings, issues, and recommendations

Stefan Wiens1,2,*

1 Department of Psychology, Stockholm University, Frescati Hagva¨g, 106 91 Stockholm, Sweden
2 Section of Psychology, Department of Clinical Neuroscience, Karolinska Institute, 171 76 Stockholm, Sweden

Abstract: Many theories of emotion propose that emotional input is processed preferentially due to its relevance for the organism. Further, because consciousness has limited capacity, these considerations imply that emotional input ought to be processed even if participants are perceptually unaware of the input (subliminal perception). Although brain imaging has studied effects of unattended, suppressed (in binocular rivalry), and visually masked emotional pictures, conclusions regarding subliminal perception have been mixed. The reason is that subliminal perception demands a concept of an awareness threshold or limen, but there is no agreement on how to define and measure this threshold. Although different threshold concepts can be identified in psychophysics (signal detection theory), none maps directly onto perceptual awareness. Whereas it may be tempting to equate unawareness with the complete absence of objective discrimination ability (d′ = 0), this approach is incompatible with lessons from blindsight and denies the subjective nature of consciousness. This review argues that perceptual awareness is better viewed as a continuum of sensory states than as a binary state. When levels of awareness are characterized carefully in terms of objective discrimination and subjective experience, findings can be informative regarding the relative independence of effects from awareness and the potentially moderating role of awareness in processing emotional input. Thus, because the issue of a threshold concept may never be resolved completely, the emphasis should be not to prove subliminal perception but to compare effects at various levels of awareness.

Keywords: consciousness; attention; emotion; brain imaging; subliminal perception; backward masking

*Corresponding author. Tel.: +46-8-163933; Fax: +46-8-159342; E-mail: [email protected], [email protected]
DOI: 10.1016/S0079-6123(06)56006-6

Because humans are complex organisms, many processes need to occur automatically to permit proper functioning and survival. Although consciousness is clearly insufficient to mediate all of these processes (e.g., blood pressure adjustments during postural changes), our own experience suggests that consciousness plays a critical role in important mental processes. For example, to evaluate an external event as good or bad, or for it to affect our behavior, we would need to be consciously aware of it. That is, to evaluate a picture of another human face as threatening, and to respond to it, we would first need to become consciously aware of the facial expression. The importance of perceptual awareness in responding to emotional events has been challenged by evolutionary considerations and theories of emotion (O¨hman, 1986; Robinson, 1998; LeDoux, 2000; Dolan and Vuilleumier, 2003; O¨hman and Wiens, 2003). In particular, because of their relevance to organisms, threatening situations need to be registered and handled swiftly. However, because consciousness is limited and slow (Shevrin and Dickman, 1980; Roser and Gazzaniga, 2004; Marois and Ivanoff, 2005), these considerations
suggest that emotional input needs to be processed partly unconsciously to ensure survival. Several research paradigms have been developed to study the role of perceptual awareness in processing of emotional pictures. The most commonly used approach is the dissociation paradigm (Holender, 1986). The goal of this approach is to present emotional pictures of which people are not consciously aware, and to study whether or not these emotional pictures have effects despite people's unawareness. If so, such findings would provide evidence that emotional pictures are processed in the absence of awareness. Stated differently, because emotional effects would be obtained after elimination of perceptual awareness, awareness may not be necessary for their occurrence. Further, because people are considered either aware or unaware, perception is treated as a dichotomous state. Therefore, research on subliminal (or implicit) perception studies the degree to which visual input is processed below the threshold ('limen') of perceptual awareness. This review summarizes main results from recent brain imaging findings on subliminal perception of emotional visual input. However, even though there has been a surge of findings on this topic, no agreement about the existence of subliminal perception has been reached. This paper reviews the main issues and presents alternative strategies for future research. Although this review focuses on subliminal perception of emotional pictures, the issues and strategies are generally applicable to research on subliminal perception.
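The abstract above frames one candidate criterion for objective unawareness as d′ = 0, the signal detection theory index of discrimination sensitivity. For reference, a minimal sketch of the standard computation from a yes/no detection table follows; the log-linear correction for extreme rates is a common convention assumed here, not something the chapter prescribes.

from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    # d' = z(hit rate) - z(false-alarm rate). Adding 0.5 to each cell
    # (log-linear correction) avoids infinite z at rates of 0 or 1.
    h = (hits + 0.5) / (hits + misses + 1.0)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(h) - norm.ppf(fa)

# An awareness check where the observer is exactly at chance:
print(d_prime(50, 50, 50, 50))  # 0.0, i.e., 'objectively unaware' by this criterion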
Findings

Perceptual awareness can be manipulated in a number of ways (Frith et al., 1999; Kim and Blake, 2005). Although visual masking has traditionally been used in subliminal perception research, alternative approaches such as manipulations of attention might have comparable effects on awareness (Merikle and Joordens, 1997). This review focuses on brain imaging findings with emotional pictures of which participants were unaware as a result of manipulation of attention (i.e., unattended pictures), binocular rivalry (i.e., suppressed pictures), and visual masking (i.e., masked pictures).
Participants can be instructed to attend to particular visual input, and responses to unattended input can be studied. If participants attend to some pictures but fail to notice pictures that are outside of their attentional focus, responses to the unattended pictures might qualify as subliminal perception (Merikle and Joordens, 1997). Several brain imaging studies with unattended emotional pictures suggest that unattended pictures are processed even though participants are unaware of the pictures (Vuilleumier et al., 2001; Anderson et al., 2003; Bishop et al., 2004; Williams et al., 2004). In one study, participants were simultaneously shown two houses and two faces with either neutral or fearful expressions (Vuilleumier et al., 2001). Pictures of the same category were positioned left and right, or above and below, fixation. For example, one house each was shown left and right of fixation, and one fearful face each above and below fixation. During different blocks of trials, participants were instructed to attend to either the horizontal or vertical picture pairs, and determine whether the two attended pictures were the same or different. Results showed that the amygdala responded more to fearful than neutral faces irrespective of whether or not the faces were attended. In a separate behavioral study with a surprise awareness test, participants could not report the facial expression (fearful or neutral), gender, or identity of the unattended face pair from the preceding trial. These findings suggest that the amygdala differentiated between fearful and neutral expressions even though participants were unaware of the faces (as these were unattended). In another study (Anderson et al., 2003), pictures of places were colored in green and superimposed with faces in red, and participants were instructed to attend to the places or the faces. When attending to the places, participants rated whether the place was inside or outside; when attending to the faces, they rated whether the face was male or female. Results showed that the amygdala responded to fearful faces irrespective of whether or not they were attended. In another study, pairs of emotional faces were shown superimposed on houses in the periphery. Again, amygdala activation was greater to unattended than attended fearful faces (Williams et al., 2004). Taken together, these findings suggest that fearful faces elicit
amygdala activation even if participants are unaware of them because they are unattended. However, this conclusion has been challenged by results from two studies. In one study, participants were shown a fearful, happy, or neutral face in the center of the screen together with a small bar in the left and right periphery (Pessoa et al., 2002). During different blocks, participants rated either the gender of the face or judged whether the two bars had the same orientation. Results showed that the amygdala differentiated among the expressions when participants rated the faces but not when they performed the bar task. In a similar study, participants also judged bar orientation while faces were presented centrally, but task difficulty was manipulated in three levels (Pessoa et al., 2005). Results showed that the amygdala differentiated between unattended fearful and neutral faces only when the bar task was simple. Based on these findings, Pessoa et al. concluded that unattended pictures are not processed outside of awareness if attention is focused sufficiently. Although the studies did not include a manipulation check to determine that awareness of the faces was actually reduced by the bar task (cf. Williams et al., 2005), findings from these and other studies suggest that whether or not unattended faces are processed depends on how unaware participants are of them. In addition, individual differences such as state anxiety may play a moderating role (Bishop et al., 2004). Accordingly, it is a matter of debate whether studies of unattended faces provide evidence for subliminal perception (Pessoa et al., 2005; Vuilleumier, 2005). Subliminal perception can also be studied in binocular rivalry. In these studies, two pictures are presented simultaneously in different colors (e.g., red and green), and participants wear glasses with differently colored lenses on each side (e.g., red on the left and green on the right). Under these conditions, participants are typically aware of only one picture at a time. For example, if a red house is presented to one eye and a green face to the other eye, participants might be aware only of the house. Thus, responses to the suppressed face can be studied. Two studies support the idea that fearful faces are processed under conditions of binocular suppression (Pasley et al., 2004; Williams et al., 2005). Williams et al. (2005) showed brief pictures of fearful, happy, and neutral faces simultaneously with
houses. During the experiment, participants performed a one-back task in which they had to report the repetition of the same image on consecutive trials. Although participants detected repetitions of nonsuppressed pictures on most trials, they missed all repetitions of suppressed pictures. Nonetheless, the amygdala differentiated between suppressed fearful and neutral faces. Similarly, Pasley et al. (2004) presented houses to one eye and suppressed fearful faces or chairs to the other eye. After excluding the small number of trials in which participants detected the presence of suppressed faces or chairs, results showed that the amygdala responded more strongly to the suppressed fearful faces than to the chairs. Taken together, as suppressed fearful faces were processed even though participants reported that they were unaware of the faces, these findings provide evidence for subliminal perception. However, it has been argued that mixed states of perception are possible in binocular rivalry (Kim and Blake, 2005; Pessoa, 2005). This possibility suggests that the measures of awareness used in studies of binocular rivalry may not be sensitive enough to assess participants' awareness. For example, because participants performed a one-back task and monitored repetition of the clearly visible nonsuppressed pictures, they might have been distracted from reporting the suppressed pictures. That is, although aware of the suppressed pictures, they did not report them. To conclude, as with studies of unattended emotional pictures, there is a debate as to whether or not the findings provide unequivocal evidence for subliminal perception.

Although manipulations of attention and binocular rivalry are important methods to study subliminal perception, visual masking has a long tradition in research on subliminal perception (Holender, 1986; Öhman and Soares, 1994). In visual masking, a target picture is shown briefly and is typically followed by another, irrelevant picture (the mask). When picture parameters are adjusted carefully, people often report that they are not consciously aware of the target pictures. Thus, subliminal perception of masked target pictures can be studied. During the last few years, there has been a surge of brain imaging studies using visual masking (Morris et al., 1998; Whalen et al., 1998; Rauch et al., 2000; Sheline et al., 2001; Critchley et al., 2002; Hendler et al.,
2003; Carlsson et al., 2004; Etkin et al., 2004; Killgore and Yurgelun-Todd, 2004; Phillips et al., 2004; Whalen et al., 2004; Liddell et al., 2005; Pessoa et al., 2006). However, because research has shown that small changes in picture parameters can have strong effects on perceptual awareness (Esteves and Öhman, 1993), it is important to use a setup that controls picture durations reliably. If picture duration is not held constant over repeated presentations, this variability can confound results (Wiens and Öhman, 2005b). For example, if a certain level of awareness is targeted but picture duration is unreliable over trials, it is difficult to maintain that level of awareness. Similarly, if trials are to be sorted after the experiment on the basis of individual responses, differences in responding might be due to variable picture duration rather than to differences in participants' processing of the pictures. Although traditional (bulky) cathode ray tube (CRT) monitors have adequate reliability, they cannot be used in functional magnetic resonance imaging because of magnetic interference with the imaging process (Wiens et al., 2004). Also, the reliability of recent flat-panel technologies based on liquid crystal displays (LCD) and thin-film transistors (TFT) is often poorer than researchers assume and manufacturers claim (Wiens et al., 2004). However, reliable picture presentation is possible with a setup involving two data projectors and mechanical high-speed shutters (Wiens and Öhman, 2005b). This setup permits precise control of picture durations in milliseconds rather than refresh cycles. Because many studies do not describe their setup and do not provide convincing evidence that picture presentation was reliable, it cannot be ruled out that many brain imaging results from visual masking are confounded.
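To see why refresh-based displays make precise masking parameters difficult, consider the following sketch (a generic illustration with hypothetical values, not a description of any cited setup) of how picture duration on a conventional monitor is quantized to whole refresh cycles:

    # Illustrative sketch: on a conventional monitor, a picture can be shown
    # only for whole refresh cycles, so a requested duration is rounded to a
    # multiple of the frame period. All values are hypothetical.
    def achievable_duration(requested_ms: float, refresh_hz: float) -> float:
        """Return the nearest duration (in ms) realizable as whole frames."""
        frame_ms = 1000.0 / refresh_hz
        n_frames = max(1, round(requested_ms / frame_ms))
        return n_frames * frame_ms

    for hz in (60, 75, 100):
        print(f"{hz} Hz: requested 30.0 ms -> {achievable_duration(30.0, hz):.1f} ms")
    # At 60 Hz, the realizable durations nearest to 30 ms are 16.7 and 33.3 ms,
    # so a nominal "30 ms" presentation is actually 33.3 ms or jitters between
    # frame counts.

A projector-and-shutter setup of the kind described above avoids this quantization because exposure time is not tied to the refresh cycle.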
Early brain imaging studies of subliminal perception used masked facial expressions and reported findings that supported subliminal perception (Morris et al., 1998; Whalen et al., 1998). In these and subsequent studies, emotional facial expressions have mainly been masked with neutral faces. Although the use of faces as both targets and masks might confound the results, faces continue to be used because it is easier to mask faces with other faces (Costen et al., 1994); this procedure is necessary given the limitations in minimum picture duration of regular display devices (Wiens et al., 2004; Wiens and Öhman, 2005b). In one study (Whalen et al., 1998), participants were shown fearful and happy faces that were masked by neutral faces. After the experiment, participants were interviewed about their awareness and asked to point out the faces that they had seen during the experiment. Although participants did not report seeing any emotional expressions and pointed only at neutral faces, results showed that the amygdala responded more strongly to fearful than to happy faces. Similarly, in another study (Morris et al., 1998), participants were shown two angry faces, one of which was fear-conditioned (i.e., paired with a loud noise). Participants were instructed that after each picture they should respond "yes" or "no" depending on whether they detected either angry face. Although participants responded yes only when the pictures were nonmasked, the amygdala differentiated between the two angry faces even when they were masked. Findings from other studies are broadly consistent with these results for faces (Rauch et al., 2000; Sheline et al., 2001; Critchley et al., 2002; Etkin et al., 2004; Killgore and Yurgelun-Todd, 2004; Liddell et al., 2005) and other emotional pictures (Hendler et al., 2003; Carlsson et al., 2004). However, although findings in visual masking are consistent with subliminal perception, many studies have used only an indirect awareness measure (a preference measure, e.g., Critchley et al., 2002), provided insufficient information about whether and how awareness was measured (Carlsson et al., 2004; Liddell et al., 2005), or reported evidence for partial awareness (Rauch et al., 2000; Sheline et al., 2001; Killgore and Yurgelun-Todd, 2004; Whalen et al., 2004). Also, other studies have not found amygdala activation to masked fearful faces (Phillips et al., 2004; Pessoa et al., 2006) or have suggested a moderating role of individual differences (Etkin et al., 2004). In sum, it is a matter of debate (Pessoa, 2005) whether visual masking provides evidence for subliminal perception.

Issues

Despite these numerous approaches and findings, researchers continue to disagree on whether these
data provide convincing evidence for subliminal perception. To end this debate, researchers would have to concur on a concept of a threshold (limen) to determine whether emotional processing can occur below this threshold of awareness. However, there is no threshold concept that researchers agree upon. Indeed, research on subliminal perception has often blurred the distinction between concepts of measurement and threshold (Reingold and Merikle, 1990). That is, awareness is defined only indirectly by the measure that is used. Also, although awareness measures are often distinguished in terms of objective and subjective measures, this distinction is vague, and its relation to contemporary psychophysics is unclear (Macmillan, 1986; Macmillan and Creelman, 1991). The remainder of this paper discusses pros and cons of objective and subjective measures, and reviews potential candidates for threshold concepts that fit with contemporary psychophysics, in particular signal detection theory.

Requirements for a valid measure of awareness are that it should be exhaustive and exclusive (Reingold and Merikle, 1990; Merikle and Reingold, 1998; Wiens and Öhman, 2002). This means that it should capture all aspects of conscious processing (i.e., exhaustive) but no unconscious processes (i.e., exclusive). To illustrate, if a measure is not sensitive enough to capture conscious processes completely (i.e., it is not exhaustive), any emotional effects could be due to the conscious processes that were not captured by the measure. Similarly, if a measure is too sensitive and captures unconscious as well as conscious processes (i.e., it is not exclusive), the apparent absence of unconscious emotional effects could be due to this mislabeling of unconscious as conscious processes.

Numerous measures have been proposed as valid indexes of awareness. However, two forms of awareness threshold are often distinguished, namely objective and subjective thresholds (Cheesman and Merikle, 1984, 1986). Cheesman and Merikle (1984) defined the subjective threshold as the "level at which subjects claim not to be able to discriminate perceptual information at better than a chance level", and the objective threshold as the "level at which perceptual information is actually discriminated at a chance level" (p. 391).
Whereas subjective measures assess participants' self-reported (subjective) ability to discriminate the stimuli, objective measures assess participants' actual (objective) ability to discriminate the stimuli. In this context, the terms subjective and objective refer to the content rather than the quality of measurement. That is, if a subjective measure requires participants to report on their awareness by pressing buttons, it can be as reliable (i.e., objective in a measurement sense) as an objective measure of discrimination ability.

Objective measures are often favored over subjective measures. The most important reason is that objective measures typically allow one to separate discrimination ability from response criterion (Eriksen, 1960). That is, even though participants may not differ in their actual awareness of the pictures, they might have different notions about their level of awareness. For example, some participants might already report that they saw a face if they noticed only a pair of eyes, whereas others might do so only if they could clearly identify eyes, nose, and mouth. If so, the latter participants would perform less well on a subjective measure than the former participants, and the subjective measure would incorrectly suggest that the participants differed in awareness. Response criteria are affected by demand characteristics (Eriksen, 1960). Also, participants tend to underestimate their performance in difficult perception tasks (Björkman et al., 1993). In support, when awareness is assessed both objectively and subjectively, objective measures often show evidence of discrimination ability in the absence of subjective awareness. Thus, subjective measures of awareness tend to be less sensitive than objective measures (Cheesman and Merikle, 1984, 1986). Further, because the experimenter has little control over the participant's criterion, "we are in fact using as many different criteria of awareness as we have experimental subjects" (Eriksen, 1960, pp. 292–293). In contrast, objective measures typically allow one to control for differences in response criterion. However, this is true only if performance measures are used that allow one to separate discrimination ability and response criterion. That is, even if awareness is measured with an objective task, it is possible that the performance measure might be
confounded by response bias. As such, it would not be a pure (objective) index of participants' discrimination ability but might represent a measure of participants' subjective awareness. For example, although yes–no detection tasks are objective measures, the commonly used measure of performance (percent correct) is affected by response bias as well as discrimination ability (Macmillan and Creelman, 1991). That is, if participants are instructed to detect whether or not a target is shown, percent correct is affected by their predisposition to respond that a target was shown. Thus, a participant with a particular discrimination ability might score anywhere from 50% to 95% correct on a yes–no detection task (see Figure 1 in Azzopardi and Cowey, 1997).
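This dependence of percent correct on the response criterion is easy to demonstrate numerically. The following sketch (an illustration under the standard equal-variance Gaussian model of signal detection theory, with hypothetical values, not an analysis from any cited study) holds discrimination ability constant and varies only the criterion:

    from statistics import NormalDist

    # Illustrative sketch: percent correct in a yes-no detection task as a
    # function of the response criterion, with discrimination ability (d')
    # held constant. Equal-variance Gaussian model; values are hypothetical.
    norm = NormalDist()

    def percent_correct(d_prime: float, criterion: float) -> float:
        """Percent correct with equally many signal and no-signal trials."""
        hit_rate = 1 - norm.cdf(criterion - d_prime)  # yes on signal trials
        fa_rate = 1 - norm.cdf(criterion)             # yes on no-signal trials
        return 100 * (hit_rate + (1 - fa_rate)) / 2

    for c in (-1.0, 0.0, 1.0, 2.0):   # from lax to strict criterion
        print(f"criterion {c:+.1f}: {percent_correct(1.5, c):.0f}% correct")
    # The same d' = 1.5 yields roughly 58% to 77% correct depending solely
    # on where the criterion is placed.

With identical discrimination ability, percent correct thus varies substantially with criterion placement alone.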
Although objective tasks can be used to assess discrimination ability, a potential problem with objective measures is the potentially confounding effect of lack of motivation. When masking parameters are chosen so that the masked pictures are barely visible, participants might have no motivation to perform a discrimination task (Duncan, 1985; Merikle, 1992), as it might be experienced as meaningless. Because participants would lose motivation, they might push buttons randomly and thus perform at chance. Hence, objective measures might not necessarily index the objective threshold but the subjective threshold (i.e., participants' self-reported unawareness). This discussion highlights the issue of whether or not the use of objective measures guarantees that an objective threshold is assessed. Indeed, Merikle and Daneman (2000) suggested that factors such as an insufficient number of trials on the objective measure as well as insufficient motivation of the participants could explain why many reports of unconscious processes have been comparable for studies in which unawareness was assessed with an objective measure (purportedly indexing the objective threshold) or a subjective measure.

Aside from these issues, a major problem is that unawareness is commonly defined in terms of statistical deviations from chance performance instead of an absolute level of performance. For example, mean performance is measured for an individual or a group, and if the mean is not significantly different from the level of performance expected by chance, it is concluded that awareness was absent. In this view, the absolute level of performance is almost irrelevant, as long as mean performance does not differ significantly from chance. However, because unawareness is defined on the basis of results from a statistical significance test, the outcome of this test depends greatly on statistical power. In fact, this definition of unawareness based on statistical testing can result in nonsensical conclusions (for review, see Wiens and Öhman, 2005a). That is, if the absolute level of performance is constant and only the number of trials or participants changes, one would have to conclude that an individual suddenly became aware when more trials were run, or that the whole group suddenly became aware when more participants were run. To illustrate, results from a statistical test (e.g., a one-sample t test) depend on the number of observations and on the variability among observations. Statistical power increases with the number of observations and decreases with the variability among observations. Thus, when the number of observations is increased and everything else is held constant, a lower observed significance value (p-value) is obtained. For example, assume that the observed mean performance is 60% and chance level is 50%. For ease of argument, assume further that percent correct is unbiased (see above). Then, at a constant mean performance of 60%, an individual might perform significantly better than chance (by definition, become aware) when the objective measure consists of 40 trials but not when it consists of 20 trials. Similarly, a group might perform significantly better than chance (by definition, become aware) when the sample consists of 40 participants but not when it consists of 20 participants. Also, because statistical power decreases with variability among observations, heterogeneity among participants might result in nonsignificance. However, these data may not suggest that all participants were unaware of the pictures, but may indicate that there is substantial variation among participants. Thus, the mean may not be representative of the group. Although variability among participants can be evaluated in terms of confidence intervals (Cumming and Finch, 2005) and correlations with variables of interest, this is not commonly discussed in research. Further, because researchers are mainly interested in retaining the null hypothesis (i.e., participants do not perform better than chance and are thus unaware), the commonly used α = 0.05 is probably too low. However, there is no general agreement on which α (0.20, 0.10) to use.
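How strongly this definition of unawareness depends on the number of trials can be made concrete with an exact binomial test (a hypothetical calculation matching the 60%-correct example above; the exact trial counts at which significance is reached depend on the test used):

    from math import comb

    # Illustrative sketch: identical observed performance (60% correct,
    # chance = 50%) is or is not "significant" depending only on the number
    # of trials. One-sided exact binomial test; the numbers are hypothetical.
    def binom_p_value(n_correct: int, n_trials: int, chance: float = 0.5) -> float:
        """P(X >= n_correct) if responding were at chance."""
        return sum(
            comb(n_trials, k) * chance**k * (1 - chance)**(n_trials - k)
            for k in range(n_correct, n_trials + 1)
        )

    for n in (20, 40, 100):
        k = round(0.6 * n)  # always 60% correct
        print(f"{k}/{n} correct: p = {binom_p_value(k, n):.3f}")
    # With few trials, p stays above 0.05 ("unaware" by this definition);
    # with enough trials, the same 60% performance becomes significant.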
Another problem with objective and subjective measures is that they are often measured on different time scales. Objective measures typically require participants to respond after each picture. For example, on each trial, participants indicate whether they saw a happy or a fearful face by pushing a button. In contrast, subjective measures often require participants to respond only after a series of pictures. For example, participants are interviewed at the end of the experiment about whether they saw any emotional facial expressions. This makes it difficult to compare results from the two tasks, as any differences might be partly due to contextual differences during measurement (Lovibond and Shanks, 2002). That is, because subjective measures often ask participants about their integrated experience over trials, confounding effects of forgetting are more problematic for subjective than for objective measures.

Yet another problem with both objective and subjective measures is that, because the definitions are rather vague, it is unclear which specific measures ought to be used to capture these thresholds. For example, Cheesman and Merikle (1986) measured objective thresholds in terms of performance on either a detection task (Experiment 2) or an identification task (Experiment 3). That is, performance was indexed by participants' ability to detect whether or not a word was presented (detection task) or which word was presented (identification task). However, because other research has used detection tasks as the objective measure and identification tasks as the measure of unconscious processing (Snodgrass et al., 2004b), it is unclear which measure is the correct one to index a supposed objective threshold. Indeed, dissociations among measures might be expected by chance (Eriksen, 1960) or result from subtle differences in task requirements or performance indexes (Fisk and Haase, 2005). To give a simple example, Öhman and Soares (1994) measured participants' ability to classify masked pictures of spiders, snakes, flowers, and mushrooms. However, as Lovibond and
Shanks (2002) argued, participants might be able to discriminate among the masked pictures without being able to verbally label them. Because the two measures assess different kinds of discrimination ability, it is unclear which measure indexes the supposed objective threshold. Similarly, if participants are shown masked spiders and snakes but report that they did not see any spiders or snakes, it might be concluded that they were unaware (subjective threshold). However, if participants also reported that they noticed that masked pictures were shown even though they could not tell whether the pictures were spiders or snakes, these findings would suggest that participants had some subjective awareness of the masked pictures. Therefore, it is unclear whether it can be concluded that participants were unaware at the subjective threshold. In fact, this decision might be particularly difficult for tasks that combine subjective with objective features. For example, Kunimoto et al. (2001) described a measure that combines (objective) discrimination ability with (subjective) confidence.

On the face of it, objective measures have the advantage of objectifying awareness by removing individual differences in response criteria and assessing pure discrimination ability. However, this is also their greatest drawback, as they ignore the principally subjective nature of awareness. That is, because awareness refers to phenomenological experience, it may be more relevant to index what people notice subjectively rather than what they can discriminate objectively (Bowers, 1984; Wiens and Öhman, 2002). In analogy, the phenomenological experience of pain cannot be indexed in terms of whether people can discriminate stimuli objectively but only in terms of whether they experience them subjectively as painful (Wiens, 2006). Because awareness is a process that is closer to "noticing" than to "discriminating," a valid measure of awareness ought to capture what participants notice rather than what they can discriminate. Indeed, because research has shown that objective measures often provide evidence for discrimination ability despite subjective unawareness, objective measures might fulfill the validity requirement of exhaustiveness but not that of exclusiveness (Merikle and Reingold, 1998). That is, objective measures might
capture not only conscious aspects but also unconscious processing, and their apparent greater sensitivity might be due to this violation of exclusiveness. Hence, objective measures might not be valid indexes of awareness. Indeed, proponents of objective measures of awareness would have to infer that performance better than chance necessarily demonstrates that people are aware of the pictures. This reasoning is inconsistent with findings from studies of brain-damaged patients. In these studies, objective measures were used to demonstrate that people could perform a task although they were unaware of the target stimuli, as indexed by subjective measures. The most famous example is blindsight (Weiskrantz et al., 1974; Weiskrantz, 1986; Cowey and Stoerig, 1995; Cowey, 2004). Blindsight is observed in patients with damaged primary visual cortex (V1) who report that they are completely unaware of the stimuli in their damaged visual field; nonetheless, they can discriminate among them (de Gelder et al., 1999; Morris et al., 2001; for blindsight studies with emotional stimuli and brain imaging, see Anders et al., 2004; Pegna et al., 2005).

Blindsight is commonly demonstrated by a dissociation of performance on two visual tasks: a localization task (objective measure) and a classification task (subjective measure). In the localization task (objective), subjects focus on the center of the screen. When they push a button to initiate a trial, a light flashes somewhere, and subjects are instructed to touch the screen at that position. Typical results are that subjects can localize the flashes accurately, even in their damaged visual field. In the classification task (subjective), subjects have an additional response option to indicate that an empty trial (i.e., no light flash) was presented. Here, typical results are that subjects classify light flashes as blanks only when the flashes are presented in their damaged visual field. Hence, although subjects can localize the light flashes in their damaged visual field accurately when forced to point at the location, they choose the blank button when given the option to do so. Such discrepant results between tasks have been obtained for both humans and monkeys with similar lesions in V1 (Cowey and Stoerig, 1995; Stoerig et al., 2002). In humans, similar performance on the subjective classification task is obtained irrespective of whether they are instructed to verbally report their awareness
or to indicate their responses with button presses. Also, although monkeys first had to be trained on the tasks, they showed dissociations in task performance similar to those of human blindsight patients. These findings imply that monkeys have perceptual awareness (Cowey, 2004). Critically, they also challenge discrimination ability as a valid index of awareness: because objective measures (i.e., the localization task) were used to index performance outside of awareness, these data show that above-chance performance on objective measures does not necessarily demonstrate that people are aware of the target stimuli.

However, because it is possible that patients are not reporting accurately on their awareness, potential alternative explanations must be considered. Regarding fellow humans, it may be tempting to trust their self-reports. In contrast, when considering data from animal research, it is easier to remain skeptical and to think of possible confounding variables. The role of potentially confounding variables has been studied extensively in monkey models of blindsight (Cowey and Stoerig, 1995; Stoerig et al., 2002). The main challenge to blindsight is the argument that patients are perceptually aware of the visual input in their damaged visual field, but because the pictures are perceived as weaker than in the undamaged visual field, patients report that they are unaware of them. That is, targets in the damaged visual field are classified as blanks because they appear more similar to blanks than to targets in the undamaged visual field. In signal detection terms, this argument can be conceptualized as stimuli falling below the response criterion. Although signal detection theory will be explained below, this means that people use an arbitrary level of visibility at which they report whether or not they are aware of a target. Accordingly, targets in the damaged visual field fall below this criterion and are thus reported as blanks. However, a series of experiments have addressed this potential confound (e.g., Stoerig et al., 2002). First, when target visibility was manipulated so that target contrast was near threshold in the undamaged visual field and maximal in the damaged field, subjects reported only a small proportion of targets in the undamaged field as targets. Still, subjects continued to report targets
in the damaged field as blanks. Importantly, these findings were obtained although performance on the localization task was apparently better for the maximal-contrast targets in the damaged field than for the near-threshold targets in the undamaged visual field. These findings are inconsistent with the argument of decreased visibility in the damaged field, as one would have expected that, with similarly poor target visibility in the undamaged and damaged visual fields, differences in ratings would disappear. Second, when the number of trials was greater for the damaged than the undamaged visual field, subjects continued to classify targets in their damaged visual field as blanks. Importantly, even though monkeys received rewards for reporting targets correctly and did not receive a reward when they misclassified a target in the damaged field as a blank, they reported targets in the damaged field as blanks and thus did not receive any rewards on the majority of trials. These findings argue against the possibility that less visible targets in the damaged visual field were classified as blanks because they occurred less often than targets in the undamaged visual field or because they had different outcomes associated with them. Taken together, the most parsimonious explanation for these findings is that greater-than-chance performance on a task (e.g., localization) does not demonstrate awareness per se. If this conclusion is rejected, one must conclude that blindsight does not exist, as greater-than-chance performance would then necessarily indicate that patients are aware of the target stimuli. In fact, this reasoning makes it logically impossible to demonstrate performance without awareness (Bowers, 1984). Accordingly, because there is no a priori reason why emotional effects should not themselves be considered indexes of perceptual awareness, any form of discriminative responding could actually be viewed as evidence of awareness (Wiens and Öhman, 2002). However, in the absence of a convincing alternative explanation, the lessons from blindsight suggest that discrimination ability per se is not indicative of awareness.

To conclude, the distinction between objective and subjective measures makes intuitive sense because it reflects differences between actual discrimination performance and phenomenological aspects of noticing. However, the underlying concepts and their
measurement are rather unclear. As illustrated above, different measures are used interchangeably as indexes of the same process, and identical measures are sometimes used as indexes of either conscious or unconscious processing. As a potential solution, it has been advocated to use awareness measures based on signal detection analyses (e.g., Hannula et al., 2005). However, the theory of signal detection makes no reference to awareness (Macmillan, 1986; Macmillan and Creelman, 1991). So, how can signal detection measures be used to index something that is not part of the theory of signal detection?

Threshold concepts in signal detection theory

Although signal detection theory (SDT) makes no reference to perceptual awareness, several concepts can be construed as thresholds (Macmillan, 1986; Macmillan and Creelman, 1991). These threshold concepts can then be evaluated for their usefulness in indexing the threshold idea in subliminal perception. From a psychophysics perspective, four thresholds might be distinguished: sensory, criterion, empirical, and energy thresholds. For ease of argument, the following discussion illustrates these concepts with a simple yes–no detection task. For example, in a face detection task, face and no-face trials are presented, and participants decide after each trial whether or not they detected a face (yes or no).

Sensory threshold

A sensory or observer threshold is a hypothetical threshold that is internal to the participant and determines whether or not a stimulus is sensed. This sensory threshold cannot be mapped directly onto participants' overt responses. Thus, responding yes or no in a detection task does not correspond directly to sensory states that fall above or below the sensory threshold, respectively. The concept of a sensory threshold is probably closest to the notion of a limen and the distinction between subliminal and supraliminal. Although this concept was included in many early theories of psychophysics, there is little evidence that supports the idea of a
sensory threshold. In fact, SDT was introduced as an alternative model that can account for many findings without resorting to the concept of a sensory threshold (Macmillan and Creelman, 1991). SDT argues against a sensory threshold in favor of an internal continuum of sensory states (more generally called strength of evidence; Pastore et al., 2003). According to SDT, each presentation of a stimulus (signal) occurs against a variable background of internal noise. In theory, the variability of this noise over trials can be captured by presenting no-signal trials repeatedly, measuring the values of these no-signal trials on the sensory continuum, and forming a histogram of these values. This noise distribution characterizes the mean and variability of the internal noise on the continuum of sensory states. Because background noise is always present, a signal is presented in the context of noise; that is, a signal is superimposed on the noise (i.e., signal plus noise). As with no-signal trials, a hypothetical distribution for signal (plus noise) trials can be plotted. Theoretically, there is no point on the continuum that allows one to determine unmistakably whether or not a signal was presented. That is, a signal can sometimes evoke a relatively weak internal response, whereas the absence of a stimulus (no-signal trial) can sometimes evoke a relatively strong internal response. Accordingly, it is possible to make only probability statements. For example, if the point on the continuum is high relative to the mean of the noise distribution, it is likely that a signal was presented, whereas if the point is low, it is likely that no signal (noise) was presented. However, the more the signal (plus noise) and noise distributions overlap, the more difficult it is to distinguish between the two types of trials. If the distributions overlap perfectly, signal and noise trials cannot be distinguished at all.

The locations of the noise and signal (plus noise) distributions cannot be measured directly. Instead, they need to be inferred. To do that, many signal and no-signal trials need to be presented, and participants are asked to make a response on each trial. For example, in the face detection task, face and no-face trials are shown, and participants decide after each trial whether or not they detected a face (yes or no). According to SDT, when observers are asked to make overt responses in a detection task, they
choose an arbitrary level on the sensory continuum as a cutoff score (criterion). Above this criterion, they respond yes, and below this criterion, they respond no. Based on the criterion, it is possible to distinguish among hits (responding yes on signal trials), false alarms (responding yes on no-signal trials), misses (responding no on signal trials), and correct rejections (responding no on no-signal trials). Then, the probabilities for hits and false alarms can be used to determine the relative locations of the signal and noise distributions. This is commonly expressed in terms of d′ (d prime), which is the distance between the means of the signal and noise distributions in z scores. Sensitivity (d′) is high if the signal and noise distributions are far apart and low if the distributions overlap closely. Alternative indexes of sensitivity such as Br and A′ have been proposed. However, Br is based on a different threshold model, and A′ has been criticized for alleged claims that it is nonparametric (Snodgrass and Corwin, 1988; Macmillan and Creelman, 1990; Pastore et al., 2003). In practice, however, these indexes often give comparable results. Nonetheless, to calculate z scores and thus d′, hit and false alarm rates must be greater than zero and less than one. This is a potential problem when the perceptual input is degraded, as participants might never report that they detected a signal. To permit calculation of d′, extreme scores are often dealt with by adding 0.5 to the numerator and 1 to the denominator when calculating hit rates and false alarm rates (Snodgrass and Corwin, 1988).

A critical feature of SDT is that the observer's placement of the criterion does not affect the estimate of discrimination ability, because the distance between the signal and noise distributions does not depend on where the criterion is placed. Different indexes of criterion placement have been proposed (Snodgrass and Corwin, 1988; Macmillan and Creelman, 1991). The likelihood ratio β (beta) is the ratio of the heights of the signal and noise distributions at the criterion, and the criterion C is the distance, as a z score, from the point where the signal and noise distributions intersect. A neutral criterion, or absence of response bias, is present if participants set their criterion at the point where signal and noise are equally likely (i.e., where the distributions cross). That is, β = 1 and C = 0. If participants position the
criterion more toward the lower end of the sensory continuum, they exhibit a lax or liberal response bias (as they are more willing to respond yes); if they position it more toward the higher end of the sensory continuum, they exhibit a strict or conservative response bias (as they are less willing to respond yes). A liberal response bias results in β < 1 and C < 0, whereas a conservative response bias results in β > 1 and C > 0. Although β is used more commonly as an index of response bias, a number of arguments favor C (Macmillan and Creelman, 1990). Although the location of the criterion is chosen arbitrarily by the observer, it is often affected by the payoff associated with different response outcomes. For example, if there is a large reward for detecting a signal, observers are more willing to respond yes (lax response bias). In contrast, if there is a punishment for false alarms, observers are less willing to respond yes (strict response bias).
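To make these quantities concrete, the following sketch (a generic implementation of the standard equal-variance formulas, with hypothetical trial counts, not an analysis from any cited study) computes d′, C, and β from the outcome counts of a yes–no detection task, using the correction for extreme scores described above:

    from math import exp
    from statistics import NormalDist

    # Illustrative sketch: computing d', criterion C, and beta from the
    # counts of a yes-no detection task. Equal-variance Gaussian model;
    # the trial counts below are hypothetical.
    norm = NormalDist()

    def sdt_indexes(hits, misses, false_alarms, correct_rejections):
        n_signal = hits + misses
        n_noise = false_alarms + correct_rejections
        # Correction for extreme scores (Snodgrass and Corwin, 1988):
        # add 0.5 to the numerator and 1 to the denominator.
        hit_rate = (hits + 0.5) / (n_signal + 1)
        fa_rate = (false_alarms + 0.5) / (n_noise + 1)
        z_hit, z_fa = norm.inv_cdf(hit_rate), norm.inv_cdf(fa_rate)
        d_prime = z_hit - z_fa            # distance between distributions
        criterion = -(z_hit + z_fa) / 2   # C = 0 marks a neutral criterion
        beta = exp(d_prime * criterion)   # likelihood ratio at the criterion
        return d_prime, criterion, beta

    d, c, b = sdt_indexes(hits=30, misses=20,
                          false_alarms=5, correct_rejections=45)
    print(f"d' = {d:.2f}, C = {c:.2f}, beta = {b:.2f}")

For these hypothetical counts, the observer says yes on only 35 of 100 trials, a conservative pattern (C > 0, β > 1), yet d′ is about 1.5.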
Criterion threshold

The criterion itself might be viewed as a threshold of awareness. For example, applied to the face detection task, participants could be instructed to respond yes only if they were consciously aware of the faces. Thus, the placement of the criterion would correspond to a subjective measure of awareness. However, participants might have different notions about their awareness. In the face detection task, some might respond yes if they noticed only the eyes (lax response bias), whereas others might respond yes only if they noticed eyes, nose, and mouth (strict response bias). Indeed, in SDT the criterion is considered a pure index of response bias that says nothing about awareness. However, if participants receive clear instructions about how they should place their criterion (e.g., respond yes in the face detection task only if they can clearly see eyes, nose, and mouth), individual differences might be reduced. Of course, this requires that the experimenter has a clear and explicit definition of awareness (e.g., which experiences constitute awareness of a face). But if these instructions are clear, the criterion might be a useful measure of the subjective aspect
of awareness. Also, because a response is collected on every trial, it has an advantage over other subjective measures that are assessed only across a number of trials. A drawback, however, is that it is unclear how this measure can be averaged over trials. For example, if participants report awareness on 12% of the signal trials, it is unclear whether they should be considered aware or unaware. Intuitively, one might consider whether participants reported awareness of faces even when no faces were shown. However, the false alarm rate cannot be used to make inferences about participants' subjective awareness, as this would merely assess their discrimination ability (because d′ is calculated from hit and false alarm rates).

Another approach to using the criterion as a threshold might be to sort signal trials into detected and undetected trials (i.e., above and below the criterion). Then, effects of interest could be studied for undetected signals, that is, signals of which participants claim to be unaware. This approach has a long tradition in experimental psychology (for review, see Merikle et al., 2001). However, SDT can account for the seemingly surprising finding that undetected stimuli can be discriminated. For example, undetected faces might be discriminated in terms of their facial expressions. The reason is that undetected signals do not necessarily indicate that discrimination ability is absent (d′ = 0) (Macmillan, 1986; Haase et al., 1999). Because the relative location of the signal and noise distributions is unaffected by the location of the criterion, discrimination ability between signal and noise might be quite high even if the signal is not detected on 95% of the signal trials (strict response bias). Hence, there would be nothing mysterious if participants could discriminate undetected faces in terms of facial expressions. However, SDT would predict that if detection ability were indeed absent (d′ = 0), participants should be unable to discriminate among signals, as the signal and noise distributions would overlap perfectly. Although this point is debated, evidence suggests that observed effects are small and might be due to slight differences in task setup (Snodgrass, 2002; Haase and Fisk, 2004; Holender and Duscherer, 2004; Reingold, 2004; Snodgrass et al., 2004a, b; Fisk and Haase, 2005).
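A small simulation makes this point concrete (hypothetical parameters under the equal-variance Gaussian model, not data from any cited study): an observer with good sensitivity but a strict criterion reports almost no signals, yet the undetected signal trials still carry signal information:

    import random
    from statistics import mean

    # Illustrative sketch: an observer with good sensitivity (d' = 1.5) but
    # a strict criterion (c = 3.1) "detects" only about 5% of signals, yet
    # the undetected signal trials still differ systematically from noise
    # trials. All parameters are hypothetical.
    random.seed(1)
    D_PRIME, CRITERION, N = 1.5, 3.1, 100_000

    signal = [random.gauss(D_PRIME, 1) for _ in range(N)]  # signal trials
    noise = [random.gauss(0.0, 1) for _ in range(N)]       # no-signal trials

    hit_rate = sum(x > CRITERION for x in signal) / N
    undetected = [x for x in signal if x <= CRITERION]
    print(f"signals reported as seen: {hit_rate:.1%}")
    print(f"mean evidence, undetected signals: {mean(undetected):.2f}")
    print(f"mean evidence, noise trials:       {mean(noise):.2f}")
    # The undetected signals average well above the noise trials, so
    # responses driven by this evidence would discriminate "unseen" stimuli.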
Further, although effects below a subjective threshold can be studied, this threshold seems arbitrary. For example, instead of asking participants to respond yes or no, one could ask them to rate their level of awareness (visibility) on a continuous scale (Sergent and Dehaene, 2004). When subjective awareness is considered on a continuum, it comes as no surprise that unreported (i.e., below-threshold) stimuli are processed. That is, participants may not report pictures below a particular cutoff on a continuous scale (e.g., 6 on a 10-point scale) but may still be able to discriminate among these pictures.
Empirical threshold

An empirical threshold is defined arbitrarily as a particular level of behavioral performance. For example, different empirical thresholds might correspond to various performance levels (e.g., d′ = 1 or 2). Because there is no theory of awareness that equates particular empirical thresholds with awareness, empirical thresholds appear to have limited usefulness in indexing awareness and unawareness. However, findings of qualitative differences would provide some support for the validity of particular empirical thresholds. As discussed by Merikle and colleagues (e.g., Merikle et al., 2001), the distinction between subliminal (unconscious) and supraliminal (conscious) processes is supported if they have qualitatively different effects. In fact, the distinction between subliminal and supraliminal perception might be interesting only if the two have effects that differ qualitatively rather than quantitatively. For example, Merikle and Cheesman (1987) found that reaction time effects in the Stroop task went in opposite directions for masked (subliminal) and nonmasked (supraliminal) words. Because awareness was indexed by whether or not participants reported awareness of the masked words, these findings support this measure of awareness. However, although interesting, qualitative differences have been reported for only a few experimental conditions (Merikle et al., 2001). Also, this validation is rather indirect, as qualitative differences might indicate only that awareness is a marker or correlate of qualitative differences rather than a causal mechanism (Kunimoto et al., 2001; Wiens and Öhman, 2005a).
Energy threshold

The energy threshold is the level at which performance is null, that is, d′ = 0. Thus, it can be conceptualized as a specific empirical threshold. The energy threshold is the common denominator among researchers in subliminal perception. That is, researchers agree that people are unaware if their performance is null (d′ = 0). However, debate ensues as to whether performance above null reflects awareness. Some models propose that subliminal perception ought to be studied at d′ = 0 (Snodgrass, 2002; Haase and Fisk, 2004; Snodgrass et al., 2004a, b; Fisk and Haase, 2005). However, this approach has methodological and conceptual problems. A major problem is that it attempts to prove the null hypothesis. This endeavor is generally known to be difficult if not impossible. It requires thousands of trials to obtain a reliable estimate of d′, and a lax significance criterion (α = 0.20) to guard against a type II error of retaining the null (i.e., the participant is unaware) even though the alternative hypothesis (i.e., the participant is aware) is true. Also, because signal and noise trials will be barely distinguishable, it is doubtful that participants will stay motivated during this task. Therefore, it is likely that participants will give up and start pushing buttons randomly. As a consequence, d′ = 0 may reflect not an absence of discrimination ability but a lack of motivation (Merikle and Daneman, 2000). Further, this approach concludes that performance above null (d′ > 0) necessarily indicates perceptual awareness. However, it is not intuitive to conclude that any deviation from 0, however small, indicates awareness. Critically, as discussed above, this ignores the subjective nature of perceptual awareness as well as the lessons from blindsight.
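The trial-number problem can be quantified with a rough power calculation (a sketch using the normal approximation to the binomial and assuming a neutral criterion; all parameter choices are hypothetical):

    from math import sqrt
    from statistics import NormalDist

    # Illustrative sketch: approximate number of trials needed to detect a
    # small true sensitivity in a yes-no task (one-sided test of accuracy
    # against chance, normal approximation). Values are hypothetical.
    norm = NormalDist()

    def trials_needed(d_prime, alpha=0.20, power=0.80, chance=0.5):
        """Approximate trials needed to reject d' = 0."""
        p1 = norm.cdf(d_prime / 2)  # accuracy with a neutral criterion
        z_a, z_b = norm.inv_cdf(1 - alpha), norm.inv_cdf(power)
        n = ((z_a * sqrt(chance * (1 - chance))
              + z_b * sqrt(p1 * (1 - p1))) / (p1 - chance)) ** 2
        return round(n)

    for dp in (0.3, 0.2, 0.1):
        print(f"true d' = {dp}: about {trials_needed(dp)} trials")
    # The required number of trials grows rapidly as the true d' shrinks.

With these assumptions, even at the lax α = 0.20 mentioned above, a residual sensitivity near d′ = 0.1 requires on the order of two thousand trials to be detected reliably.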
In sum, central aspects of contemporary psychophysics, and in particular SDT, are that there is a continuum of sensory states (i.e., there is no actual sensory threshold) and that the relationship between stimulus events and sensory states is probabilistic due to constant background noise in the sensory system. It is therefore impossible to deduce unequivocally from a given sensory activation whether it resulted from a signal or a noise trial. Also, although a response (e.g., yes or no) is obtained on each trial, it does not reflect awareness per se but the cutoff point on the continuum of sensory states that participants choose arbitrarily to separate different response alternatives (e.g., yes and no).
Recommendations

It is clearly an oversimplification to treat perceptual awareness as a unitary concept. Perceptual input often consists of various aspects, each with its own threshold for awareness. For example, a face comprises aspects such as features (eyes, nose, and mouth), expression, gender, race, age, and so on. Similarly, awareness of words might be differentiated in terms of awareness of individual characters and of whole words (Kouider and Dupoux, 2004). Awareness of these various aspects is likely to differ, as are the underlying mechanisms (e.g., Stoerig, 1996). Therefore, future studies ought to include measures that capture the relevant stimulus dimension of interest.

Further, several masking studies have not measured awareness at all but have argued that their picture parameters were similar to those of other experiments (e.g., a 30 ms SOA). However, the reliability and luminance curves of picture parameters vary substantially across display technologies (Wiens et al., 2004; Wiens and Öhman, 2005b), and small differences in picture parameters can have strong effects on perceptual awareness (Esteves and Öhman, 1993). Therefore, it is recommended that the reliability of the display equipment be demonstrated and that participants' awareness be measured explicitly rather than assumed. In fact, to rule out potentially confounding effects from individual differences in awareness, individual performance needs to be assessed. Also, if a particular level of performance is targeted, specific stimulus parameters could be selected on the basis of an awareness test prior to the actual experiment. This approach is recommended from a psychophysics perspective, but a potential drawback is that participants might habituate to the target pictures.
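Such a pre-experiment calibration is often implemented as an adaptive staircase. The sketch below is a generic 1-up/2-down staircase (a hypothetical illustration; the function present_masked_trial is a stand-in for one masked presentation plus the participant's detection response, here simulated), which converges on the duration yielding roughly 71% correct detection:

    import random

    # Illustrative sketch: a 1-up/2-down staircase that adjusts picture
    # duration toward the duration supporting about 71% correct detection
    # (Levitt's transformed up-down rule). All names and values here are
    # hypothetical.
    def present_masked_trial(duration_ms: float) -> bool:
        # Simulated observer: detection becomes more likely with duration.
        return random.random() < min(0.95, duration_ms / 60.0)

    def calibrate_duration(start_ms: float = 50.0, step_ms: float = 5.0,
                           n_trials: int = 80, floor_ms: float = 10.0) -> float:
        duration, correct_in_a_row = start_ms, 0
        for _ in range(n_trials):
            if present_masked_trial(duration):
                correct_in_a_row += 1
                if correct_in_a_row == 2:                         # two correct:
                    duration = max(floor_ms, duration - step_ms)  # make harder
                    correct_in_a_row = 0
            else:                                                 # one error:
                duration += step_ms                               # make easier
                correct_in_a_row = 0
        return duration

    random.seed(2)
    print(f"calibrated duration: {calibrate_duration():.0f} ms")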
Most studies lack an explicit definition of awareness (Reingold and Merikle, 1990). Because nobody challenges the conclusion that participants who are completely unable to discriminate visual input are unaware, researchers might have been tempted to use this definition of unawareness. In SDT terms, this definition corresponds to the energy threshold, or an empirical threshold set at d′ = 0. However, because this approach attempts to prove the null, any null findings on the awareness task might be challenged on the grounds of insufficient statistical power or confounding effects from lack of motivation (Merikle and Daneman, 2000). In fact, the conclusion that brain imaging studies of subliminal perception are based on d′ = 0 has been questioned (Hannula et al., 2005; Pessoa, 2005). Therefore, if researchers intend to argue that d′ = 0, they need to provide convincing evidence that participants were actually unable to discriminate the stimuli.

A possible reason why researchers tend not to be explicit about their definition of awareness may be that they actually feel uncomfortable about equating unawareness with d′ = 0. Indeed, if d′ > 0 is inevitably equated with awareness, then this approach denies the subjective nature of awareness (Bowers, 1984). Also, it tends to make it logically impossible to demonstrate subliminal perception, and it implies that findings from blindsight patients are invalid (Wiens, 2006). Although many researchers might agree with this conclusion, they might be unsure about which measure to use to capture subjective experience. Participants might be instructed to make a yes–no decision about their own awareness (i.e., placement of the criterion). If participants are not instructed where to place this criterion, there will probably be as many definitions of awareness as there are participants (Eriksen, 1960). However, if participants are presented with an explicit definition of subjective awareness, they will probably use this definition accurately. Thus, subjective experience can be assessed objectively. Despite methodological difficulties, awareness needs to be treated and assessed as a subjective state. Indeed, shortcomings in dichotomizing performance on subjective measures do not argue against subjective measures in general but against a conceptualization of awareness as a binary state. Hence, conclusions about awareness may be more realistic and informative in
terms of relative awareness rather than of awareness as present or absent. Because awareness might be treated more accurately as a continuum, a psychophysics approach lends itself to studying stimulus–response relationships between awareness and effects of interest. First, participants might be asked to rate their perceptual awareness on a continuous scale to determine whether awareness changes gradually or dichotomously. For example, Sergent and Dehaene (2004) proposed that participants experience the attentional blink as dichotomous. In general, when a series of pictures is presented briefly and participants have to detect two target pictures, participants often fail to detect the second target if it follows about 200–500 ms after the first target (i.e., the attentional blink). In their study, participants were instructed to rate the visibility of the second target on a 21-point scale. Results showed a bimodal distribution of visibility ratings for the second target. This bimodal distribution was probably not due to response biases, as participants gave gradually higher visibility ratings for detected targets when the duration of the target was lengthened. Second, a psychophysics approach allows one to study how perceptual input is processed at different levels of awareness. For example, in a follow-up study on the attentional blink, Sergent et al. (2005) sorted trials based on visibility ratings to study their neural correlates. This example illustrates that continuous measures of awareness can be powerful tools for indexing different levels of awareness and studying the associated brain responses. Therefore, if facial expressions, compared to other pictures, were processed similarly at various levels of awareness, such results would suggest that awareness does not play a critical role in the processing of facial expressions. Alternatively, if emotional input can be shown to have (qualitatively) different effects at different levels of awareness, this would suggest that awareness (as indexed by a particular measure) plays a moderating role in the perception of emotional faces. A similar approach might be useful for studying the role of attention in the processing of emotional input (Pessoa et al., 2005; Vuilleumier, 2005).
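In practice, such an analysis amounts to binning trials by visibility rating and examining the effect of interest within each bin, as in the following sketch (the trial data and variable names are hypothetical placeholders):

    from collections import defaultdict
    from statistics import mean

    # Illustrative sketch: sort trials into visibility bins and average a
    # response measure of interest (e.g., an amygdala estimate) per bin.
    # The trial data below are hypothetical placeholders.
    trials = [
        {"visibility": 1, "response": 0.21},
        {"visibility": 1, "response": 0.35},
        {"visibility": 3, "response": 0.40},
        {"visibility": 5, "response": 0.55},
        {"visibility": 5, "response": 0.62},
    ]

    by_visibility = defaultdict(list)
    for trial in trials:
        by_visibility[trial["visibility"]].append(trial["response"])

    for rating in sorted(by_visibility):
        print(f"visibility {rating}: "
              f"mean response {mean(by_visibility[rating]):.2f}")
    # A flat profile across bins would suggest processing independent of
    # awareness; a graded or step-like profile would suggest a moderating
    # role of awareness.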
So, even though findings for unattended, suppressed, and masked emotional pictures may not permit absolute statements about subliminal perception, findings that suggest a relative independence of effects from awareness, or that yield different effects depending on awareness, are informative. However, in order to characterize particular effects in terms of awareness, it is necessary to document awareness carefully. To conclude, because the debate about defining and measuring awareness is conceptual, results from brain imaging cannot solve this issue. Nonetheless, by adopting an eclectic approach that uses subjective and objective measures and treats awareness as a continuum, brain imaging can provide informative insights into how the brain processes emotional input at various levels of awareness. Thus, past and future findings from brain imaging studies should not be evaluated in terms of whether or not they demonstrate subliminal perception but instead in terms of if and how effects differ at different levels of awareness.
Acknowledgements

Preparation of this article was funded by grants from the Swedish Research Council.
References

Anders, S., Birbaumer, N., Sadowski, B., Erb, M., Mader, I., Grodd, W. and Lotze, M. (2004) Parietal somatosensory association cortex mediates affective blindsight. Nat. Neurosci., 7: 339–340.

Anderson, A.K., Christoff, K., Panitz, D., De Rosa, E. and Gabrieli, J.D.E. (2003) Neural correlates of the automatic processing of threat facial signals. J. Neurosci., 23: 5627–5633.

Azzopardi, P. and Cowey, A. (1997) Is blindsight like normal, near-threshold vision? Proc. Natl. Acad. Sci. USA, 94: 14190–14194.

Bishop, S.J., Duncan, J. and Lawrence, A.D. (2004) State anxiety modulation of the amygdala response to unattended threat-related stimuli. J. Neurosci., 24: 10364–10368.

Björkman, M., Juslin, P. and Winman, A. (1993) Realism of confidence in sensory discrimination — the underconfidence phenomenon. Percept. Psychophys., 54: 75–81.

Bowers, K.S. (1984) On being unconsciously influenced and informed. In: Bowers, K.S. and Meichenbaum, D. (Eds.), The Unconscious Reconsidered. Wiley, New York, pp. 227–272.

Carlsson, K., Petersson, K.M., Lundqvist, D., Karlsson, A., Ingvar, M. and Öhman, A. (2004) Fear and the amygdala: manipulation of awareness generates differential cerebral
responses to phobic and fear-relevant (but nonfeared) stimuli. Emotion, 4: 340–353.

Cheesman, J. and Merikle, P.M. (1984) Priming with and without awareness. Percept. Psychophys., 36: 387–395.

Cheesman, J. and Merikle, P.M. (1986) Distinguishing conscious from unconscious perceptual processes. Can. J. Psychol., 40: 343–367.

Costen, N.P., Shepherd, J., Ellis, H. and Craw, I. (1994) Masking of faces by facial and non-facial stimuli. Vis. Cogn., 1: 227–251.

Cowey, A. (2004) The 30th Sir Frederick Bartlett lecture: fact, artefact, and myth about blindsight. Q. J. Exp. Psychol. A, 57: 577–609.

Cowey, A. and Stoerig, P. (1995) Blindsight in monkeys. Nature, 373: 247–249.

Critchley, H.D., Mathias, C.J. and Dolan, R.J. (2002) Fear conditioning in humans: the influence of awareness and autonomic arousal on functional neuroanatomy. Neuron, 33: 653–663.

Cumming, G. and Finch, S. (2005) Inference by eye — confidence intervals and how to read pictures of data. Am. Psychol., 60: 170–180.

de Gelder, B., Vroomen, J., Pourtois, G. and Weiskrantz, L. (1999) Non-conscious recognition of affect in the absence of striate cortex. Neuroreport, 10: 3759–3763.

Dolan, R.J. and Vuilleumier, P. (2003) Amygdala automaticity in emotional processing. Ann. N. Y. Acad. Sci., 985: 348–355.

Duncan, J. (1985) Two techniques for investigating perception without awareness. Percept. Psychophys., 38: 296–298.

Eriksen, C.W. (1960) Discrimination and learning without awareness: a methodological survey and evaluation. Psychol. Rev., 67: 279–300.

Esteves, F. and Öhman, A. (1993) Masking the face: recognition of emotional facial expressions as a function of the parameters of backward masking. Scand. J. Psychol., 34: 1–18.

Etkin, A., Klemenhagen, K.C., Dudman, J.T., Rogan, M.T., Hen, R., Kandel, E.R. and Hirsch, J. (2004) Individual differences in trait anxiety predict the response of the basolateral amygdala to unconsciously processed fearful faces. Neuron, 44: 1043–1055.

Fisk, G.D. and Haase, S.J. (2005) Unconscious perception or not? An evaluation of detection and discrimination as indicators of awareness. Am. J. Psychol., 118: 183–212.

Frith, C., Perry, R. and Lumer, E. (1999) The neural correlates of conscious experience: an experimental framework. Trends Cogn. Sci., 3: 105–114.

Haase, S.J. and Fisk, G.D. (2004) Valid distinctions between conscious and unconscious perception? Percept. Psychophys., 66: 868–871.

Haase, S.J., Theios, J. and Jenison, R. (1999) A signal detection theory analysis of an unconscious perception effect. Percept. Psychophys., 61: 986–992.

Hannula, D.E., Simons, D.J. and Cohen, N.J. (2005) Opinion — imaging implicit perception: promise and pitfalls. Nat. Rev. Neurosci., 6: 247–255.
Hendler, T., Rotshtein, P., Yeshurun, Y., Weizmann, T., Kahn, I., Ben-Bashat, D., Malach, R. and Bleich, A. (2003) Sensing the invisible: differential sensitivity of visual cortex and amygdala to traumatic context. Neuroimage, 19: 587–600.

Holender, D. (1986) Semantic activation without conscious identification in dichotic-listening, parafoveal vision, and visual masking: a survey and appraisal. Behav. Brain Sci., 9: 1–23.

Holender, D. and Duscherer, K. (2004) Unconscious perception: the need for a paradigm shift. Percept. Psychophys., 66: 872–881.

Killgore, W.D. and Yurgelun-Todd, D.A. (2004) Activation of the amygdala and anterior cingulate during nonconscious processing of sad versus happy faces. Neuroimage, 21: 1215–1223.

Kim, C.Y. and Blake, R. (2005) Psychophysical magic: rendering the visible 'invisible'. Trends Cogn. Sci., 9: 381–388.

Kouider, S. and Dupoux, E. (2004) Partial awareness creates the "illusion" of subliminal semantic priming. Psychol. Sci., 15: 75–81.

Kunimoto, C., Miller, J. and Pashler, H. (2001) Confidence and accuracy of near-threshold discrimination responses. Conscious Cogn., 10: 294–340.

LeDoux, J.E. (2000) Emotion circuits in the brain. Annu. Rev. Neurosci., 23: 155–184.

Liddell, B.J., Brown, K.J., Kemp, A.H., Barton, M.J., Das, P., Peduto, A., Gordon, E. and Williams, L.M. (2005) A direct brainstem-amygdala-cortical 'alarm' system for subliminal signals of fear. Neuroimage, 24: 235–243.

Lovibond, P.F. and Shanks, D.R. (2002) The role of awareness in Pavlovian conditioning: empirical evidence and theoretical implications. J. Exp. Psychol. Anim. Behav. Process., 28: 3–26.

Macmillan, N.A. (1986) The psychophysics of subliminal perception. Behav. Brain Sci., 9: 38–39.

Macmillan, N.A. and Creelman, C.D. (1990) Response bias: characteristics of detection theory, threshold theory, and nonparametric indexes. Psychol. Bull., 107: 401–413.

Macmillan, N.A. and Creelman, C.D. (1991) Detection Theory: A User's Guide. Cambridge University Press, New York.

Marois, R. and Ivanoff, J. (2005) Capacity limits of information processing in the brain. Trends Cogn. Sci., 9: 296–305.

Merikle, P.M. (1992) Perception without awareness: critical issues. Am. Psychol., 47: 792–795.

Merikle, P.M. and Cheesman, J. (1987) Current status of research on subliminal perception. Adv. Consum. Res., 14: 298–302.

Merikle, P.M. and Daneman, M. (2000) Conscious vs. unconscious perception. In: Gazzaniga, M.S. (Ed.), The New Cognitive Neurosciences. MIT Press, Cambridge, MA, pp. 1295–1303.

Merikle, P.M. and Joordens, S. (1997) Parallels between perception without attention and perception without awareness. Conscious Cogn., 6: 219–236.

Merikle, P.M. and Reingold, E.M. (1998) On demonstrating unconscious perception: comment on Draine and Greenwald (1998). J. Exp. Psychol. Gen., 127: 304–310.
Merikle, P.M., Smilek, D. and Eastwood, J.D. (2001) Perception without awareness: perspectives from cognitive psychology. Cognition, 79: 115–134. Morris, J.S., DeGelder, B., Weiskrantz, L. and Dolan, R.J. (2001) Differential extrageniculostriate and amygdala responses to presentation of emotional faces in a cortically blind field. Brain, 124: 1241–1252. Morris, J.S., Öhman, A. and Dolan, R.J. (1998) Conscious and unconscious emotional learning in the human amygdala. Nature, 393: 467–470. Öhman, A. (1986) Face the beast and fear the face: animal and social fears as prototypes for evolutionary analyses of emotion. Psychophysiology, 23: 123–145. Öhman, A. and Soares, J.J.F. (1994) Unconscious anxiety: phobic responses to masked stimuli. J. Abnorm. Psychol., 103: 231–240. Öhman, A. and Wiens, S. (2003) On the automaticity of autonomic responses in emotion: an evolutionary perspective. In: Davidson, R.J., Scherer, K. and Goldsmith, H.H. (Eds.), Handbook of Affective Sciences. Oxford University Press, New York, pp. 256–275. Pasley, B.N., Mayes, L.C. and Schultz, R.T. (2004) Subcortical discrimination of unperceived objects during binocular rivalry. Neuron, 42: 163–172. Pastore, R.E., Crawley, E.J., Berens, M.S. and Skelly, M.A. (2003) "Nonparametric" A′ and other modern misconceptions about signal detection theory. Psychon. Bull. Rev., 10: 556–569. Pegna, A.J., Khateb, A., Lazeyras, F. and Seghier, M.L. (2005) Discriminating emotional faces without primary visual cortices involves the right amygdala. Nat. Neurosci., 8: 24. Pessoa, L. (2005) To what extent are emotional visual stimuli processed without attention and awareness? Curr. Opin. Neurobiol., 15: 188–196. Pessoa, L., Japee, S., Sturman, D. and Ungerleider, L.G. (2006) Target visibility and visual awareness modulate amygdala responses to fearful faces. Cereb. Cortex, 16: 366–375. Pessoa, L., McKenna, M., Gutierrez, E. and Ungerleider, L.G. (2002) Neural processing of emotional faces requires attention. Proc. Natl. Acad. Sci. USA, 99: 11458–11463. Pessoa, L., Padmala, S. and Morland, T. (2005) Fate of unattended fearful faces in the amygdala is determined by both attentional resources and cognitive modulation. Neuroimage, 28: 249–255. Phillips, M.L., Williams, L.M., Heining, M., Herba, C.M., Russell, T., Andrew, C., Brammer, M.J., Williams, S.C.R., Morgan, M., Young, A.W. and Gray, J.A. (2004) Differential neural responses to overt and covert presentations of facial expressions of fear and disgust. Neuroimage, 21: 1484–1496. Rauch, S.L., Whalen, P.J., Shin, L.M., McInerney, S.C., Macklin, M.L., Lasko, N.B., Orr, S.P. and Pitman, R.K. (2000) Exaggerated amygdala response to masked facial stimuli in posttraumatic stress disorder: a functional MRI study. Biol. Psychiatry, 47: 769–776. Reingold, E.M. (2004) Unconscious perception and the classic dissociation paradigm: a new angle? Percept. Psychophys., 66: 882–887.
Reingold, E.M. and Merikle, P.M. (1990) On the inter-relatedness of theory and measurement in the study of unconscious processes. Mind Lang., 5: 9–28. Robinson, M.D. (1998) Running from William James' bear: a review of preattentive mechanisms and their contributions to emotional experience. Cogn. Emotion, 12: 667–696. Roser, M. and Gazzaniga, M.S. (2004) Automatic brains — interpretive minds. Curr. Dir. Psychol. Sci., 13: 56–59. Sergent, C., Baillet, S. and Dehaene, S. (2005) Timing of the brain events underlying access to consciousness during the attentional blink. Nat. Neurosci., 8: 1391–1400. Sergent, C. and Dehaene, S. (2004) Is consciousness a gradual phenomenon? Evidence for an all-or-none bifurcation during the attentional blink. Psychol. Sci., 15: 720–728. Sheline, Y.I., Barch, D.M., Donnelly, J.M., Ollinger, J.M., Snyder, A.Z. and Mintun, M.A. (2001) Increased amygdala response to masked emotional faces in depressed subjects resolves with antidepressant treatment: an fMRI study. Biol. Psychiatry, 50: 651–658. Shevrin, H. and Dickman, S. (1980) The psychological unconscious — a necessary assumption for all psychological theory. Am. Psychol., 35: 421–434. Snodgrass, J.G. and Corwin, J. (1988) Pragmatics of measuring recognition memory: applications to dementia and amnesia. J. Exp. Psychol. Gen., 117: 34–50. Snodgrass, M. (2002) Disambiguating conscious and unconscious influences: do exclusion paradigms demonstrate unconscious perception? Am. J. Psychol., 115: 545–579. Snodgrass, M., Bernat, E. and Shevrin, H. (2004a) Unconscious perception at the objective detection threshold exists. Percept. Psychophys., 66: 888–895. Snodgrass, M., Bernat, E. and Shevrin, H. (2004b) Unconscious perception: a model-based approach to method and evidence. Percept. Psychophys., 66: 846–867. Stoerig, P. (1996) Varieties of vision: from blind responses to conscious recognition. Trends Neurosci., 19: 401–406. Stoerig, P., Zontanou, A. and Cowey, A. (2002) Aware or unaware: assessment of cortical blindness in four men and a monkey. Cereb. Cortex, 12: 565–574. Vuilleumier, P. (2005) How brains beware: neural mechanisms of emotional attention. Trends Cogn. Sci., 9: 585–594. Vuilleumier, P., Armony, J.L., Driver, J. and Dolan, R.J. (2001) Effects of attention and emotion on face processing in the human brain: an event-related fMRI study. Neuron, 30: 829–841. Weiskrantz, L. (1986) Blindsight. A Case Study and Implications. Oxford University Press, New York. Weiskrantz, L., Warrington, E.K., Sanders, M.D. and Marshall, J. (1974) Visual capacity in hemianopic field following a restricted occipital ablation. Brain, 97: 709–728. Whalen, P.J., Kagan, J., Cook, R.G., Davis, F.C., Kim, H., Polis, S., McLaren, D.G., Somerville, L.H., McLean, A.A., Maxwell, J.S. and Johnstone, T. (2004) Human amygdala responsivity to masked fearful eye whites. Science, 306: 2061. Whalen, P.J., Rauch, S.L., Etcoff, N.L., McInerney, S.C., Lee, M.B. and Jenike, M.A. (1998) Masked presentations of
emotional facial expressions modulate amygdala activity without explicit knowledge. J. Neurosci., 18: 411–418. Wiens, S. (2006) Remain aware of awareness [Correspondence]. Nat. Rev. Neurosci. Wiens, S., Fransson, P., Dietrich, T., Lohmann, P., Ingvar, M. and Öhman, A. (2004) Keeping it short: a comparison of methods for brief picture presentation. Psychol. Sci., 15: 282–285. Wiens, S. and Öhman, A. (2002) Unawareness is more than a chance event: comment on Lovibond and Shanks (2002). J. Exp. Psychol. Anim. Behav. Process, 28: 27–31. Wiens, S. and Öhman, A. (2005a). Probing unconscious emotional processes: on becoming a successful masketeer.
In: Coan, J.A., Allen, J.J.B. (Eds.), The Handbook of Emotion Elicitation and Assessment. Series in Affective Sciences. Oxford University Press, Oxford, in press. Wiens, S. and Öhman, A. (2005b) Visual masking in magnetic resonance imaging. Neuroimage, 27: 465–467. Williams, M.A., McGlone, F., Abbott, D.F. and Mattingley, J.B. (2005) Differential amygdala responses to happy and fearful facial expressions depend on selective attention. Neuroimage, 24: 417–425. Williams, M.A., Morris, A.P., McGlone, F., Abbott, D.F. and Mattingley, J.B. (2004) Amygdala responses to fearful and happy facial expressions under conditions of binocular suppression. J. Neurosci., 24: 2898–2904.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 7
Neuroimaging methods in affective neuroscience: Selected methodological issues

Markus Junghöfer1,*, Peter Peyk2, Tobias Flaisch3 and Harald T. Schupp3

1Institute for Biosignalanalysis and Biomagnetism, University of Münster, Münster, Germany
2Department of Psychology, University of Basel, Basel, Switzerland
3Department of Psychology, University of Konstanz, Konstanz, Germany
Abstract: A current goal of affective neuroscience is to reveal the relationship between emotion and dynamic brain activity in specific neural circuits. In humans, noninvasive neuroimaging measures are of primary interest in this endeavor. However, methodological issues, unique to each neuroimaging method, have important implications for the design of studies, interpretation of findings, and comparison across studies. With regard to event-related brain potentials, we discuss the need for dense sensor arrays to achieve reference-independent characterization of field potentials and improved estimates of cortical brain sources. Furthermore, limitations and caveats regarding sparse sensor sampling are discussed. With regard to event-related magnetic field (ERF) recordings, we outline a method to achieve magnetoencephalography (MEG) sensor standardization, which improves effect sizes in typical neuroscientific investigations, avoids the finding of ghost effects, and facilitates comparison of MEG waveforms across studies. Focusing on functional magnetic resonance imaging (fMRI), we question the unjustified application of proportional global signal scaling in emotion research, which can greatly distort statistical findings in key structures implicated in emotional processing and possibly contribute to conflicting results in affective neuroscience fMRI studies, in particular with respect to limbic and paralimbic structures. Finally, a distributed EEG/MEG source analysis with statistical parametric mapping is outlined, providing a common software platform for hemodynamic and electromagnetic neuroimaging measures. Taken together, to achieve consistent and replicable patterns of the relationship between emotion and neuroimaging measures, methodological aspects associated with the various neuroimaging techniques may be of similar importance as the definition of emotional cues and task context used to study emotion.

Keywords: EEG; MEG; fMRI; average reference; sensor standardization; proportional global signal scaling; SPM of EEG/MEG distributed source estimations
*Corresponding author. Tel.: +49-251-8-356-987; E-mail: [email protected]
DOI: 10.1016/S0079-6123(06)56007-8

Neuroimaging methods have been increasingly used to explore the neural substrate of emotion. Over the last decade, a multitude of studies utilized functional magnetic resonance imaging (fMRI) to indirectly reveal brain activity by measuring blood-flow-dependent signal changes in magnetic resonance (Murphy et al., 2003; Phan et al., 2004; Phelps, 2004). However, the inherent time lag of hemodynamic responses limits the temporal resolution of fMRI to reveal the dynamics of brain activity (Bandettini et al., 1992; Blamire et al., 1992). Recordings of the brain's magnetic and electrical fields provide data with the high temporal precision needed to determine the brain dynamics of emotional processes. Availability of dense sensor electroencephalography (EEG; up to 256 channels) and magnetoencephalography (MEG;
up to 275 channels) enables more adequate spatial assessment of electromagnetic fields, thereby improving the ability of these measures to uncover brain sources. Thus, hemodynamic and electromagnetic neuroimaging measures provide complementary information regarding brain processes, and progress may be expected in combining fMRI and EEG measures (Salek-Haddadi et al., 2003; Debener et al., 2005; Wang et al., 2006). Studying emotional perception from the perspective of biphasic emotion theory, we utilized event-related potentials (ERPs), event-related magnetic field recordings (ERFs), and functional magnetic resonance imaging (fMRI) to reveal the brain dynamics and neural structures of the processing of emotional visual cues (Schupp et al., this volume; Kissler et al., this volume; Lang et al., this volume; Sabatinelli et al., this volume). In this research, methodological problems and challenges were encountered, unique to each neuroimaging method, with important implications for the interpretation of data, design of analyses, and comparison across studies. Specifically, we first discuss the need for adequate spatial assessment in event-related brain potential studies and highlight limitations associated with popular reference sites such as linked mastoids or earlobes. Next, we provide reasons and ways to implement sensor standardization in MEG research. Regarding fMRI analyses, while proportional global signal scaling is routinely used, we report findings showing that violations of the assumptions of this correction can dramatically impact statistical results. None of these problems is specific to emotion processing and, indeed, the points of concern raised by this review have been articulated repeatedly in cognitive neuroscience. However, we hope that illustrating these issues with specific examples from emotion research raises sensitivity to methodological issues in affective neuroscience. Finally, we describe a source space analysis of EEG/MEG data that achieves electromagnetic data analyses with the same SPM (Statistical Parametric Mapping) routines as established for hemodynamic measures.

Event-related brain potentials and brain sources

An enduring problem inherent to event-related potential (ERP) research is the limitation to
draw inferences about underlying brain sources. One challenge in determining neural sources more precisely in ERP research is the adequate spatial assessment of brain field potentials. Technological advances enable the routine recording of dense sensor ERPs with up to 256 channels (Tucker et al., 2003), approaching the ideal of a reference-independent characterization of brain field potentials. The difficulty of achieving a reference-independent characterization of brain field potentials, and the reference dependency of sparse sampling, reflects the biophysics of EEG signal generation. It is widely assumed that the EEG represents signals meeting two requirements: First, because individual neuron activity generates small field potentials, activity has to be synchronous in thousands of neurons to allow for summation (Creutzfeldt et al., 1966). Second, activity has to occur in spatially aligned neurons, assuring that summation is effective (rather than cancellation as in closed fields; Lorente de No, 1947; Coles et al., 1990). It is for this reason that larger ERP components are considered to reflect predominantly excitatory postsynaptic potentials of cortical pyramidal cells. Neuronal activity is volume conducted through the skull to the scalp, where it can be recorded through surface sensors (Caton, 1875; Berger, 1929). At a distance of a few millimeters or more from the source, the measured potential can be approximated by dipoles, with negligible contribution from multipole field potentials (Lutzenberger et al., 1987; Nunez, 1989). The potential measured at the head surface can be described as the superposition of the individual dipolar potentials of all active generators. Thus, assuming complete coverage of the head surface (including the neck) and homogeneous conductivity, the integral ERP activity would be zero. This logic underlies the calculation of the so-called average reference (Offner, 1950; Bertrand, 1985). Thus, after recording, the average reference can be computed by subtracting the mean of all sensors from each individual site. If the potentials at all body-surface points were known, an average reference transformation would provide a truly inactive reference and thus complete reference independence of the EEG.
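As a minimal illustration (a sketch, not code from the original studies), the average reference transformation amounts to subtracting, at every time sample, the mean across all channels from each channel; the toy data shape and variable names here are purely illustrative:

```python
import numpy as np

def average_reference(eeg):
    """Re-reference EEG data to the average reference.

    eeg : array of shape (n_channels, n_samples), potentials measured
          against some arbitrary recording reference.
    Subtracts the mean across channels at every time sample, so the
    sum over channels becomes (numerically) zero.
    """
    return eeg - eeg.mean(axis=0, keepdims=True)

# Toy example: 128 channels, 500 time samples
rng = np.random.default_rng(0)
eeg = rng.normal(size=(128, 500))
eeg_avg = average_reference(eeg)
print(np.allclose(eeg_avg.sum(axis=0), 0.0))  # True: zero-sum property
```

With sparse montages the zero-sum property still holds numerically, but, as discussed next, uncovered scalp regions mean the computed average no longer approximates the true zero integral over the head surface.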
However, EEG recordings according to the traditional international 10/20 system use a comparably small number of electrodes, and even the international 10/10 system provides limited coverage of the head surface. Similarly, even though some researchers consider sensor arrays of more than 30 sensors sufficient to calculate the average reference, the averaged potential across uncovered head-surface areas may reveal residual mean activity, leading to an inaccuracy of the average reference (Katznelson, 1981; Dien, 1998). Expanded EEG sensor head coverage (Tucker, 1993; Gevins et al., 1994) increasingly approximates the requirements of the average reference, and the residual average activity approaches the expected zero potential. Simulation studies show that the step from a 10/20 system to a 128-sensor whole head recording reduces the residual average reference activity by approximately 50%. The improved coverage of inferior frontal, temporal, and occipital regions in 256-sensor arrays provides a further substantial reduction of the residual average reference activity (Junghöfer et al., 1999). Residual average reference activity in regions not covered by electrodes can be estimated by extrapolation of the measured potential distribution. The compensation for residual average reference activity can approach a so-called "infinite" reference (Junghöfer et al., 1999; Yao, 2001). However, extrapolation of activity in uncovered areas is not unique even if physiological constraints about reasonable field propagation are taken into account, and the extrapolation accuracy shrinks dramatically with an increasing extent of uncovered regions. For instance, international 10/20 system recordings are not sufficient for any reasonable extrapolation. Simulation studies demonstrate that a 128-sensor system allows compensation of roughly a third, while a 256-sensor system can approximate more than half of the residual average reference activity (Junghöfer et al., 1999). A further means to achieve a reference-free characterization of ERPs is to calculate the current source density (CSD; Perrin et al., 1987, 1989; Gevins et al., 1991), the negative second spatial derivative of the scalp voltage distribution. As the derivative of a constant value is zero, the CSD extracts the globally constant effect of a reference. The calculation of the CSD as well as
the fundamentally equivalent methods "Laplacian" or "Cortical Mapping" (Junghöfer et al., 1997) are mathematically unique transformations compensating for the strong spatial lowpass filtering effect of the head as volume conductor, predominantly the "blurring" effect of the skull. These "deblurring" methods do not demand any a priori constraints or assumptions and are not affected by the ambiguity of the "inverse problem." Thus, the CSD recommends itself as a reference-independent method to uncover local cortical generator sources, and simulation studies demonstrate that progressively more detail about the cortical potential distribution can be obtained as spatial sampling is increased even beyond 128 channels (Srinivasan et al., 1996). In CSD solutions, a focal generator source is indicated by a sink/source pattern of inward/outward current flow, whereas activation of multiple adjacent generator sources produces a more complicated distribution of multiple inward and outward currents. For instance, Junghöfer et al. (2001) used the CSD to provide increased spatial resolution of the early posterior negativity (EPN) observed in emotion processing. Specifically, using the difference in evoked potentials of subjects viewing emotionally arousing or neutral pictures, the CSD revealed bilateral symmetric sources in occipital areas accompanied by right-lateralized twin parietal sink sources (see Schupp et al., this volume). However, the goodness of "deblurring" methods depends heavily on a sufficient spatial sampling ratio (Srinivasan et al., 1996; Junghöfer et al., 1997), as the sampling needs to meet the constraint of the Nyquist sampling theorem in order to avoid ghost effects consequent upon spatial aliasing. For scalp potential interpolation, CSD computation, and "Cortical Mapping," any choice of Green's spline functions could be used. However, optimized spline functions can be derived from additional information such as physiological conductivity properties and the estimated depth of generator structures, as described in Junghöfer et al. (1997). A further requirement for applying the CSD is a high signal-to-noise ratio, because the CSD technique profoundly emphasizes high spatial frequencies and spurious findings may emerge from noisy data.
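The spherical-spline CSD of Perrin et al. is beyond a short sketch, but the idea of a negative second spatial derivative can be illustrated with the classic nearest-neighbor (Hjorth-style) approximation, in which each channel's CSD estimate is its potential minus the mean of its immediate neighbors. The neighbor table is an assumption of this sketch; a real montage would supply actual adjacency and inter-electrode distances:

```python
import numpy as np

def hjorth_csd(eeg, neighbors):
    """Crude surface-Laplacian (CSD) estimate per channel.

    eeg       : (n_channels, n_samples) scalp potentials.
    neighbors : dict mapping channel index -> list of indices of its
                immediate spatial neighbors on the scalp.
    Each channel's estimate is V_i minus the mean of its neighbors,
    a discrete form of the (negative) second spatial derivative.
    """
    csd = np.empty_like(eeg)
    for ch, nbrs in neighbors.items():
        csd[ch] = eeg[ch] - eeg[nbrs].mean(axis=0)
    return csd
```

Because a constant added to all channels cancels in the difference, the estimate is reference independent, mirroring the property discussed above; production analyses would use spherical-spline CSD rather than this crude stencil.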
The calculation of inverse distributed source estimations such as the minimum-norm least squares (MNLS; Hämäläinen and Ilmoniemi, 1994) or low-resolution tomography (LORETA; Pascual-Marqui et al., 1994) provides further methods to achieve a reference-independent characterization of ERP potentials. These models use a large number of distributed test dipoles varying in strength to represent the scalp-measured field potentials. The MNLS, as a linear estimation technique, is based on the assumption that the measured scalp potential distribution (U) at each point in time can be described as the product of a so-called leadfield matrix (L), specifying each electrode's sensitivity to each of the distributed sources of the model head, and the generator activation (G): U = LG. In order to estimate the generator distribution, G = L⁻¹U, the inverse of the leadfield matrix L has to be multiplied with the measured scalp potential distribution. However, this matrix inversion is only defined if the number of rows of L (given by the number of sensors in U) and the number of columns (given by the number of sources in G) were identical and L had maximal rank. With distributed source models, the number of sources by far exceeds the number of sensors, and thus L⁻¹ has to be replaced by the pseudoinverse leadfield matrix L+, leading to G = L+U. In this case, the inverse equation is underdetermined and the inverse problem is ill posed, i.e., the scalp potential may be represented by an infinite number of solutions that would produce identical measured field potentials. Thus, selection of the "most realistic" solution in distributed source models requires further constraints or criteria. In addition to representing the measured scalp field, the MNLS estimate uses the additional criterion that the solution minimizes the mean power (least squares) of the estimated current density of the sources. In contrast, LORETA selects the distributed source solution that is maximally smooth, assuming that the higher spatial resolution of other solutions is usually not justified for EEG data.
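The two criteria just described can be written compactly; the following is the standard textbook construction rather than a formula quoted from this chapter. Among all generator distributions reproducing the measurement, the minimum-norm choice is the one of least power,

\[
\hat{G} \;=\; \arg\min_{G}\ \|G\|^2 \quad \text{subject to}\quad LG = U
\quad\Rightarrow\quad
\hat{G} \;=\; L^{+}U \;=\; L^{\mathsf{T}}\big(LL^{\mathsf{T}}\big)^{-1}U ,
\]

and, when the data contain noise, a Tikhonov-regularized variant is used:

\[
\hat{G}_{\lambda} \;=\; \arg\min_{G}\ \Big( \|U - LG\|^2 + \lambda\,\|G\|^2 \Big)
\;=\; L^{\mathsf{T}}\big(LL^{\mathsf{T}} + \lambda I\big)^{-1}U .
\]

The first identity assumes L has full row rank; λ trades fidelity to the measured field against source power, and depth weighting amounts to replacing the norm on G by a weighted norm.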
While the most important advantage of distributed source modeling is that these techniques do not depend on assumptions regarding the location or number of brain generators, the assumptions introduced by these methods need to be considered as limitations for the interpretation of the findings. Distributed source estimations considerably limit the detection of nearby focal sources and provide estimations of rather distributed neural generators. Thus, the nonuniqueness of the inverse estimation requires criteria that do not automatically reveal the correct solution, and, in the absence of independent further support, solutions are probably best viewed in relation to broader anatomical regions rather than specific neural structures. While the spatial high-pass filter characteristic of the CSD diminishes the impact of extended potential distributions of deeper neural activities and thus overestimates superficial sources, the MNLS and LORETA tend to explain activity in deeper structures by widely distributed superficial activity, an effect which can be compensated to some extent by depth weighting (Fuchs et al., 1994; Pascual-Marqui et al., 1994). Similar to the CSD, inverse distributed source estimates demand a good signal-to-noise ratio, because projecting electrophysiological data from the two-dimensional (2D) signal space into the 3D source space may strongly magnify spatial variance. Taken together, dense sensor arrays provide multiple avenues to achieve reference-independent characterization of electrophysiological recordings. Adequate spatial sampling is required by these techniques and may heavily impact the outcome of their application. As each method invokes unique assumptions, converging evidence across different analysis tools and complementary neuroimaging methods is particularly desirable. For instance, the early differential occipital negativity elicited by emotional compared to neutral pictures is revealed by brain maps based on the average reference, by CSD sink/source patterns in occipito-temporo-parietal regions, and by distributed posterior sources in minimum-norm solutions (cf. Fig. 2; Junghöfer et al., 2001; Schupp et al., 2006). Furthermore, fMRI results and electromagnetic recordings provide independent evidence for the increased activation of visual-associative structures by emotional cues, amounting to converging evidence across several neuroimaging methods and types of analyses (cf. Figs. 4 and 5; Junghöfer et al., 2005a,b, 2006).
Sparse sensor sampling and the reference issue

As discussed above, sparse sampling of field potentials seriously violates the assumption underlying the calculation of the average reference (Katznelson, 1981; Dien, 1998). Sparse sampling arrays provide a reference-dependent depiction of brain field potentials because EEG amplifier systems must obtain voltage recordings as the difference between two locations on the head or body surface. In the past, many researchers hoped to minimize the recording of brain activity from one recording site by choosing positions such as the linked earlobes, mastoids, or nose. The hope of approaching monopolar recordings has been dubbed a "convenient myth" (Davidson et al., 2000, p. 33) or "EEG/ERP folklore" (Nunez, 1990, p. 25). Considering that electrical field potentials are volume conducted throughout the head, there is no site on the head surface showing consistent zero activity across all possible brain sources. Specifically, depending on the location of brain sources, electrodes considered to represent an inactive reference may reveal substantial field potentials varying dynamically across time. To provide a potentially more intuitive analogy, consider temperature measurements obtained as differences between locations, compared to absolute (true) values. On a sunny day, the temperature may be 20 °C, 25 °C, and 30 °C at locations A, B, and C. Using A as reference, we measure temperatures of 5 °C and 10 °C for B and C, respectively. Using B as reference, we obtain −5 °C and +5 °C for locations A and C. Without information about the reference temperature, absolute temperatures are not recoverable. It is for this reason that the choice of reference can have rather dramatic effects on the appearance of ERP recordings; the interpretation of ERP findings therefore needs to consider the choice of reference. Applying these issues to the field of emotion research, a large number of studies utilized visually presented stimuli, and many of these studies relied on the popular linked mastoid reference when studying the processing of emotional pictures (Cuthbert et al., 2000; Schupp et al., 2000; Kemp et al., 2002; Pause et al., 2003; Amrhein et al., 2004; Carretie et al., 2004, 2005), emotional facial
expressions (de Gelder et al., 2002; Holmes et al., 2003, 2005; Pourtois et al., 2004, 2005), or emotional words (Chapman et al., 1980; Bernat et al., 2001; Pauli et al., 2005). Other research utilized dense sensor recordings providing an improved description of the field potentials by calculation of the average reference in emotional picture (Junghöfer et al., 2001; Keil et al., 2002, 2005; Schupp et al., 2004a,b, 2006; Stolarova et al., 2006; Flaisch et al., 2005), emotional face (Batty and Taylor, 2003; Schupp et al., 2003a,b; Meeren et al., 2005), or emotional word processing (Skrandies et al., 2003; Ortigue et al., 2004; Herbert et al., 2006; Kissler et al., this volume). In the following, significant effects of the reference choice (average reference vs. linked mastoids) are illustrated using data from a recent study in which subjects viewed a continuous stream of emotionally arousing and neutral pictures, each presented for 1 s (Junghöfer et al., 2003). Figure 1 illustrates the time course of anterior, posterior, and right inferior-lateral scalp potential activity using mid-frontal, mid-occipital, and occipito-temporal electrode sites. On the basis of the average reference, the occipital sensor (Fig. 1c) revealed a more negative potential for subjects viewing emotional compared to neutral cues. The relative negative difference component appeared sizable, developed with the falling slope of the P100, and was maximally pronounced around 220 ms. Polarity reversal was observed over anterior sensor sites, as illustrated for a frontal sensor (Fig. 1a). Specifically, emotional pictures were associated with enhanced positivity compared to neutral items. A much different pattern of results emerged for the linked mastoid reference. Of most relevance, the relative difference potential "Emotional minus Neutral" appears small at the occipital sensor (Fig. 1d). Furthermore, frontal positivity effects of the difference potential are greatly amplified compared to the average reference recordings (Fig. 1b). These differences in the appearance of the effects of differential emotion processing are easy to explain: mastoid sensors are most sensitive to generator sources in occipito-temporal brain regions engaged during the processing of visual cues. Consistent with this notion, pronounced ERP activity for the right
Fig. 1. While subjects viewed a continuous stream of emotionally arousing and neutral pictures, each presented for 1 s, ERPs were measured with a 128-sensor whole head EEG system. On the basis of an average reference (left row), emotional pictures evoke a relative negative ERP difference component (EPN) over occipital and occipito-temporal regions starting with the falling slope of the P100 and finding its maximum around 225 ms at occipital (c) and roughly 250 ms at occipito-temporal (mastoid) sites (e). If referenced to linked mastoids (right row) the strong negative difference component at the occipito-temporal mastoid sites is subtracted from all other electrodes strongly diminishing the posterior negativity at occipital leads (d) and significantly increasing the corresponding positive potential differences at frontal sites (b). The spreading brain activation from occipital to temporal areas leads to artificial latency shifts between posterior negativities and frontal positivities if referenced to linked mastoids.
mastoid site is revealed with the average reference (similar left mastoid activity is omitted for brevity). In contrast, the mean activity of both mastoids gets subtracted from all other sensors when using a linked mastoid reference, leading to a constant zero potential at the mastoids (Fig. 1e).
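The conversion between the two montages in Fig. 1 is itself a one-line operation, which makes the reference dependence easy to demonstrate: re-referencing to linked mastoids subtracts the mastoid mean from every channel, forcing the mastoids to zero and pushing their genuine occipito-temporal signal, sign-inverted, into all other channels. A sketch continuing the NumPy convention used above (the channel indices are illustrative, not a standard layout):

```python
import numpy as np

def linked_mastoid_reference(eeg, mastoid_idx=(56, 107)):
    """Re-reference to linked mastoids.

    eeg         : (n_channels, n_samples) data in any reference
                  (e.g., average reference).
    mastoid_idx : indices of the left/right mastoid channels
                  (illustrative positions).
    The mastoid mean is subtracted from every channel, so the
    mastoids become flat zero lines and any genuine activity they
    carried reappears, sign-inverted, at all other sites.
    """
    mastoid_mean = eeg[list(mastoid_idx)].mean(axis=0, keepdims=True)
    return eeg - mastoid_mean
```

Applied to the data of Fig. 1, this is exactly the transformation that shrinks the occipital EPN difference and inflates the frontal positivity.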
Brain potential maps (Fig. 2) further detail the effects of linked mastoids on the appearance of the ERP difference of emotional and neutral cues. Obviously, while the spatial characteristics of the potential distribution remain unaffected by choice of reference, the absolute magnitude of the
Fig. 2. Reference dependency of the EPN potential distribution at its maximum around 225 ms: The linked mastoid reference artificially converts the posterior (a) and right hemispheric dominant (c) EPN into an anterior (b) positivity without clear laterality effects (d). If based on the scalp potential distribution only, interpretations with respect to underlying generator structures are difficult. Additional information from reference-independent analysis techniques like the CSD (e) or the minimum-norm-least-square (f) or other neuroimaging methods such as fMRI (see Figs. 4 and 5) is needed.
potential characteristic depends entirely on the chosen reference location. In effect, with a mastoid reference (Fig. 2b) the ERP correlate of differential processing over distributed temporo-occipital areas is strongly suppressed if compared to the average reference (Fig. 2a), which in contrast reveals a pronounced differential occipital negativity for emotional pictures compared to neutral pictures. In addition to such quite obvious spatial effects, the choice of reference may also have more subtle
effects on the latency at which condition effects appear. For instance, on the basis of a linked mastoid reference, latency differences between earlier negative posterior and later positive anterior potential differences in visual processing have been considered as evidence for distinct generators in anterior/posterior brain regions (e.g., Bentin et al., 1996; Eimer, 2000). Similarly, in the example at hand, the peak latency of emotional discrimination also appears distinctly later for the fronto-central positivity (around 245 ms;
Fig. 1b) compared to the residual occipital negativity (around 220 ms; Fig. 1d) when applying a linked mastoid reference. However, the different peak latencies might be attributed to electrical activity generated in occipito-temporal brain regions and picked up by the mastoid sensors. Possibly reflecting the spread of brain activation from occipital to occipito-temporal brain regions, the peak latency of emotional discrimination appears earlier at occipital sites (225 ms; Fig. 1c) and roughly 20 ms later at occipito-temporal electrodes such as the right mastoid (245 ms; Fig. 1e). Thus, neglecting the reference issue and the principle that voltage differences reflect the difference between bipolar sensor sites, the mastoid reference is suggestive of anterior/posterior latency differences by pushing peaks at sites more posterior than the mastoids to earlier latencies and peaks at more anterior sites to later latencies. In contrast, the average reference montage provides a rather different view, suggesting that mid-occipital and lateral occipital sensor sites have different peak latencies with regard to emotion discrimination. It is important to note that this example illustrates distinct latency effects for different reference choices only and should of course not be taken as evidence against early affect-driven difference activities in frontal brain areas. In addition to latency effects and spatial effects in the posterior–anterior direction, the choice of reference may also affect the appearance of hemispheric differences in emotion processing. With respect to affective processing of visual stimuli, the occipito-temporal areas of the right hemisphere often showed stronger effects of motivated attention for emotional pictures (Junghöfer et al., 2001) and faces (Schupp et al., 2004b), while the left hemisphere showed dominance for the processing of emotional words (Kissler et al., this volume). As shown in Fig. 2c, the right-hemispheric dominance for emotional picture processing can be readily observed in the posterior negativity with average reference recording. However, forcing effects away from the occipito-temporal references, linked mastoids appear to sandwich difference effects from both sides toward the frontal midline, as illustrated by the laterally symmetric anterior positivity (Fig. 2d). Thus, the attenuated posterior
negative difference potential observed for the linked mastoid reference may also affect findings of hemispheric dominance.

Summary

ERP recordings are increasingly used to reveal the brain dynamics of emotion processing. Dense sensor arrays provide multiple avenues to achieve a reference-independent characterization of field potentials and considerably improve the estimate of cortical brain sources. Caveats specific to each of these analyses need to be considered, as well as the general principle that ERP field potentials reflect the superimposition of all active brain sources. However, limitations and shortcomings notwithstanding, inference of brain sources appears possible and reasonable, in particular when converging evidence is provided by other neuroimaging methods. The use of sparse sensor arrays imposes limitations with regard to the assessment of surface potentials. Active brain sites cannot be inferred, and the reference issue needs to be considered when interpreting data and comparing across studies.

Event-related magnetic fields: effects of sensor standardization

The MEG detects weak magnetic fields generated by the flow of intracellular postsynaptic currents (Williamson and Kaufman, 1981; Hari and Forss, 1999) of pyramidal cells, which constitute two-thirds of the neurons of the cerebral cortex (Creutzfeldt, 1995). It has been estimated that a small area of about 40 mm² including tens of thousands of synchronously active neurons can yield a net dipole moment of about 10 nAm, which is strong enough to be detected extracranially by MEG (Hämäläinen, 1993). While the EEG measures the electric potential of the secondary currents, the MEG measures the additive overlay of the weak magnetic fields of both primary and volume currents (Sarvas, 1987). MEG is principally sensitive to sources that are oriented tangentially to the skull, and much less sensitive to those oriented radially. Hence, MEG is mainly constrained to cortical areas that lie in the walls of fissural cortex, and the amplitude of the measured MEG signal decreases rapidly as the source depth increases.
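The tangential/radial asymmetry follows directly from the spherically symmetric conductor model: in the Sarvas (1987) solution, the field outside the sphere depends on the source only through Q × r0, so a radial dipole (Q parallel to its position vector r0) produces no external field at all. A small numerical illustration of that formula, given here as a sketch under the usual spherical-head assumptions (positions in meters, dipole moments in Am):

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability

def sarvas_field(q, r0, r):
    """Magnetic field B(r) of a current dipole q at position r0 inside
    a spherically symmetric conductor centered at the origin
    (Sarvas, 1987). q, r0, r are 3-vectors in SI units."""
    a_vec = r - r0
    a, R = np.linalg.norm(a_vec), np.linalg.norm(r)
    F = a * (R * a + R**2 - r0 @ r)
    grad_F = (a**2 / R + a_vec @ r / a + 2 * a + 2 * R) * r \
             - (a + 2 * R + a_vec @ r / a) * r0
    return MU0 / (4 * np.pi * F**2) * (F * np.cross(q, r0)
                                       - (np.cross(q, r0) @ r) * grad_F)

r0 = np.array([0.0, 0.0, 0.07])          # source 7 cm above sphere center
r = np.array([0.0, 0.02, 0.10])          # sensor just outside the head
tangential = np.array([1e-8, 0.0, 0.0])  # 10 nAm, tangential orientation
radial = np.array([0.0, 0.0, 1e-8])      # 10 nAm, radial orientation

print(np.linalg.norm(sarvas_field(tangential, r0, r)))  # finite (~pT) field
print(np.linalg.norm(sarvas_field(radial, r0, r)))      # 0: radially silent
```

Because both terms contain Q × r0, the radial case vanishes identically, which is the "radially silent" property stated above.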
MEG measures provide a reference-independent characterization of magnetic fields. Furthermore, being differentially sensitive to tangential and radial generator orientations, MEG provides a different view of neural generator sources compared to EEG. Combining EEG and MEG measures may therefore provide complementary evidence of neuronal emotion processing. However, results of ERP and ERF studies are difficult to compare because MEG analysis is traditionally performed predominantly in the generator source space, while ERP analysis is most often based on the measured sensor space and subsequently extended to source space. The main reason for these different approaches is that a standardized alignment of magnetic field sensors in MEG is almost unachievable, as sensor positions are fixed within the MEG scanner and thus cannot be adjusted for individual head sizes and head shapes. In contrast, sensor positioning in EEG is usually standardized with respect to head landmarks (nasion, inion, vertex, and mastoids). Thus, standardized sensor positioning across different subjects is almost impossible for MEG, and even exact repositioning of a subject in the MEG scanner for a second measurement is much harder to achieve than in EEG. Consequently, to allow within-subject comparisons across different sessions, comparisons across different subjects, or comparisons across different MEG systems (e.g., magnetometer vs. gradiometer systems), extrapolation of the individually measured fields onto a standard or target sensor configuration is necessary. Techniques for such MEG sensor standardization have been developed and recommended (Hämäläinen, 1992; Numminen et al., 1995; Burghoff et al., 2000; Knösche, 2002) but, until now, have not found acceptance as a standard application in MEG research. For instance, neither MEG manufacturer data analysis software (4D-Neuroimaging, VSM MedTech Ltd., or Elekta Neuromag Oy) nor the most popular commercial EEG/MEG analysis software programs (e.g., BESA® or CURRY®) include MEG sensor standardization techniques.
One reason that MEG sensor standardization is not routinely employed may be that MEG research often has the goal of using inverse modeling to localize brain activity. The application of inverse methods, like multiple equivalent current dipoles (ECD; Brazier, 1949; de Munck et al., 1988) or distributed source estimations like MNLS (Hämäläinen and Ilmoniemi, 1994) or LORETA (Pascual-Marqui et al., 1994), is based on the individual head-sensor configuration without the necessity of sensor standardization across sessions or individuals. However, except for special cases where the number of underlying sources involved in neural processing (ECDs) is fairly well known (Scherg and von Cramon, 1985), inverse source estimation is not unique, i.e., different generator distributions can lead to identical magnetic field measures (Helmholtz, 1853). The nonuniqueness of the inverse estimation, as well as the substantial enlargement of the signal space with the transformation from the 2D "sensor space" into the 3D "source space," adds significant spatial variance and might eventually decrease statistical power compared to analysis in the sensor space. Consequently, and similar to the EEG, statistical analysis in MEG sensor space can be favorable if temporal aspects are the main focus of interest and regional localization of neural activities is sufficient. Moreover, due to (i) the better signal-to-noise ratio, (ii) the undisturbed DC coupling, (iii) the reference independence, and (iv) the higher spatial resolution resulting from less blurred topographies, MEG analysis in the sensor space may still offer some advantages compared to EEG sensor space analysis. Furthermore, procedures for MEG sensor standardization have already been validated on the basis of simulated data or phantom measurements (Knösche, 2002). MEG sensor standardization is based on the principle that a magnetic field distribution measured from some distance (the target sensor configuration) can be determined uniquely from a magnetic field distribution known at all sites of a closed surface enclosing all neural generators (Yamashita, 1982; Gonzalez et al., 1991). However, in reality the individual field is not known at all points of a closed surface but only at some points (the magnetometer or gradiometer positions) of a
partial surface, and thus the accuracy of field extrapolation depends on adequate spatial sampling (density of sensors), sufficient coverage of the magnetic fields generated by neural activity, and adequate extrapolation functions. Demands for dense sensor coverage are even higher in MEG than in EEG because magnetic fields usually reveal higher spatial frequencies than the EEG, mainly a consequence of the low conductivity of the cranial bone strongly affecting EEG recordings. However, modern whole head MEG systems provide adequate spatial sampling as well as sufficient head coverage (Knösche, 2002). The extrapolation of the 2D "sensor space" onto a standard sensor configuration (forward solution) poses similar problems as the "inverse solution" in source modeling. In the same way as inverse methods try to minimize ambiguities by application of physiological constraints, e.g., information about reasonable source locations in the gray matter (Nunez, 1990) and orientations perpendicular to the gyri (Lorente de No, 1938), extrapolation functions should take into account corresponding physiological constraints with regard to reasonable magnetic field propagation. An optimal extrapolation should thus estimate the "most reasonable" neural generator distribution explaining the magnetic field topography measured at an individual sensor configuration and use forward modeling to reveal the corresponding magnetic field at the standardized target positions. Similar to inverse modeling, different techniques to estimate neural sources (e.g., MNLS, LORETA) provide somewhat differing inverse solutions, extending in this case to differing extrapolations in the standardized target space. However, the important point is that the extrapolated 2D magnetic field distributions based on differing inverse solutions reveal much less variance than solutions in the 3D source space. Thus, estimation of the magnetic field in standardized sensor space may have increased statistical power compared to analyses in "source space." To illustrate the effects of MEG analyses with and without sensor standardization, we used whole head MEG data (275 channels) obtained during a classical conditioning experiment (Junghöfer et al., 2005a). Sensor standardization was achieved by the application of inverse/forward
sensor extrapolations based on inverse MNLS estimations, which has been suggested as a sensor standardization procedure in magnetocardiography (MCG) by Numminen et al. (1995) (see Knösche, 2002 for a similar procedure). In the alternative stream, data analysis was performed without compensation for sensor positioning, similar to previous MEG studies (Costa et al., 2003; Susac et al., 2004). Furthermore, to compare statistical power, the original sample (n = 24) was reduced to either 18 or 12 subjects. Figure 3 illustrates the statistical effects comparing CS+ and CS− (CS: conditioned stimulus) processing with paired t-tests for the auditory N1 peak (95–125 ms) in the original and reduced samples. Areas with statistically significant (p < 0.01) enhanced outgoing magnetic fields for CS+ compared to CS− stimulus processing are marked by '+' signs, and areas with significantly enhanced ingoing fields are indicated by '−' signs, respectively. The statistical maps reflect a tangential N1 dipole field with negative ingoing fields over centro-parietal regions and positive outgoing magnetic fields over inferior fronto-temporal regions. Obviously, sensor standardization of the same data results in larger areas of statistically significant sensors detecting classical conditioning effects at the N1 peak, and the advantages of sensor standardization are distinctly more pronounced for smaller subject samples. In addition, although superficially similar, the topographies of the effects differ for the original sample. Sensor standardization is superior in detecting conditioning effects over inferior temporal regions. Without sensor standardization, the "effective sensor coverage" is limited to areas covered by the majority of subjects. These data suggest that MEG sensor standardization should be considered as a standard routine in MEG sensor space analyses. Statistical power and "effective sensor coverage" were increased in the present example, and there are reasons to suspect that the standardization effects were rather underestimated in the present study. Specifically, effects of sensor standardization may become significantly stronger with MEG recordings using fewer sensors and smaller coverage compared to the 275-channel whole head VSM MedTech Ltd. system. Advantages are
Fig. 3. Effects of sensor standardization for MEG sensor space data analysis. Statistical effects of paired t-tests comparing the processing of affectively conditioned CS+ and CS− tones with a 275-sensor whole head MEG are shown. Areas with statistically significant (p < 0.01) enhanced outgoing (+) and ingoing (−) magnetic fields of the auditory N1 peak (95–125 ms) are projected onto a model head. Sensor standardization leads to increased statistically significant sensor areas and widens the effective sensor coverage, in particular over inferior temporal regions. The advantages of sensor standardization are distinctly more pronounced for smaller subject samples.
not limited to the increased statistical power and "effective sensor coverage." Sensor standardization is a prerequisite for averaging across subjects or sessions in EEG and MEG research to avoid ghost effects arising as a pure consequence of inconsistent sensor configurations. Furthermore, sensor standardization may facilitate comparison of MEG waveforms across studies.
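In pipeline terms, the inverse/forward standardization described above is a two-step matrix operation: estimate distributed sources from the individual leadfield, then project them forward through the leadfield of the target sensor configuration. A schematic NumPy sketch, assuming precomputed leadfields (obtaining them from a head model, and the trace-based scaling of the regularizer, are assumptions of this sketch):

```python
import numpy as np

def standardize_meg(b_indiv, L_indiv, L_target, lam=0.05):
    """Extrapolate an individually measured field onto a target
    sensor configuration via a regularized minimum-norm inverse
    followed by a forward projection (cf. Numminen et al., 1995).

    b_indiv  : (n_sensors_indiv,) measured field at one time point.
    L_indiv  : (n_sensors_indiv, n_sources) individual leadfield.
    L_target : (n_sensors_target, n_sources) leadfield of the
               standard (target) sensor positions.
    lam      : Tikhonov regularization parameter, scaled here by the
               mean eigenvalue of L L^T (one common convention).
    """
    gram = L_indiv @ L_indiv.T
    reg = lam * np.trace(gram) / gram.shape[0] * np.eye(gram.shape[0])
    g_hat = L_indiv.T @ np.linalg.solve(gram + reg, b_indiv)  # inverse step
    return L_target @ g_hat                                   # forward step
```

Because the final comparison happens in the common 2D target sensor space, the magnified variance of the 3D source space never enters the group statistics, which is the rationale given above for standardizing rather than analyzing in source space.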
Summary

As a consequence of both the nonuniqueness of the inverse solution and the higher dimensionality of the "source" compared to the "sensor space," statistical analysis in MEG "sensor space" can be favorable. However, sensor standardization is a necessary prerequisite for such an analysis. Three-dimensional extrapolation functions taking into account physiological constraints about reasonable magnetic field propagation provide a high goodness of extrapolation if they are based on adequate sensor density and sufficient head coverage. Sensor standardization significantly improves effect sizes in typical neuroscientific investigations, avoids the finding of ghost effects, and facilitates comparison of MEG waveforms across studies.

Functional magnetic resonance imaging: effects of proportional global signal scaling
ERP and ERF measures provide excellent temporal resolution but are limited with respect to determining the exact location of brain generators. In contrast, fMRI (Bandettini et al., 1992; Kwong et al., 1992; Ogawa et al., 1992) is an important method for determining the location of activated brain regions during emotion processing with
much better spatial resolution. This method provides an indirect assessment of brain activity by measuring local concentration variations of oxygenated hemoglobin in the blood, presumed to be mostly induced by synaptic events of neuronal activity (Logothetis et al., 2001; Raichle, 2001). However, utilizing this method in emotion research poses important challenges. Limbic and paralimbic target structures are notoriously difficult to assess and often require specific protocols. For instance, sampling of the amygdala is improved by acquiring thin coronal slices that minimize signal loss due to susceptibility artifacts (Ojemann et al., 1997; Merboldt et al., 2001). Similarly, a number of recommendations for an optimal assessment of blood oxygen level dependent (BOLD) activity in the orbitofrontal cortex, which also suffers from strong susceptibility artifacts, are outlined by Kringelbach and Rolls (2004). In addition to such challenges in the assessment of neural target structures, methodological issues in data analysis may also be of concern when studying emotional processes. A still unresolved controversy is the use of proportional global signal scaling (PGSS) of BOLD signals in fMRI analysis. Global variations of the BOLD-fMRI signal are changes common to the entire brain volume and have been considered to reflect background activity rather than signal changes related to experimental manipulations (Ramsay et al., 1993). Consequently, global variations of the BOLD signal are commonly treated as nuisance effects, contributing unwanted sources of variance such as hardware scanner drifts, physiological movements, or pulsations. While agreeing on the fundamental necessity of global signal correction, discussion has centered on the proper method of normalization, identifying global signal changes as either "additive" or "multiplicative" relative to the regional effects of interest (Fox et al., 1988; Friston et al., 1990; Arndt et al., 1996). However, normalization of global signal changes may itself turn into a confound, i.e., significantly change the outcome of the analysis, when the global signal is not orthogonal to the experimental paradigm (Andersson, 1997).
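Concretely, proportional scaling divides every volume by its own global mean (so that each volume ends up with a common global value), and the critical diagnostic is the correlation between the global-signal time course and the task regressor. A sketch of both steps; variable names and the target value of 100 are illustrative choices, not taken from the original analysis:

```python
import numpy as np

def proportional_global_scaling(volumes, target=100.0):
    """Proportional global signal scaling (PGSS) of an fMRI run.

    volumes : (n_volumes, n_voxels) BOLD time series.
    Each volume is divided by its own global (whole-brain) mean and
    multiplied by a common target value, so all volumes share the
    same global mean afterwards. Returns scaled data and the global
    signal time course.
    """
    global_signal = volumes.mean(axis=1)  # one value per volume
    return volumes / global_signal[:, None] * target, global_signal

def orthogonality_check(global_signal, task_regressor):
    """Correlation of global signal with the task regressor.
    A substantial correlation violates the orthogonality assumption;
    proportional scaling would then remove (or even invert) genuine
    task-related variance."""
    return np.corrcoef(global_signal, task_regressor)[0, 1]
```

In the study discussed next, exactly this correlation was substantial for high-arousing emotional pictures, which is what renders the scaling unjustified in that condition.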
We recently explored reservations regarding the use of PGSS in the domain of emotion research, comparing emotional and neutral pictures taken from the International Affective Picture System (IAPS; Lang et al., 2005). One specific concern was to consider the effects of emotional intensity. As suspected, the strong and distributed BOLD activations elicited by high-arousing emotional contents dominated the global signal variance, violating the orthogonality assumption of global signal and experimental condition for high-arousing emotional materials but not for low-arousing emotional contents. In line with previous reports (cf. Aguirre et al., 1998), this violation of the orthogonality assumption resulted in widely differing outcomes when comparing two streams of fMRI analysis with and without proportional global signal scaling: As shown in Fig. 4, the unjustified application of PGSS leads to an attenuated effect of emotional activation in structures with a positive correlation of local and global BOLD signal ("activation"; indicated by reddish colors). Omitting proportional global signal scaling, structures associated with pronounced BOLD signal activations when contrasting high-arousing and neutral picture conditions were apparent in uni- and heteromodal sensory processing areas in the occipital, parietal, and temporal lobes. Invariably, although still significant, application of PGSS was reflected by reduced effect sizes and cluster volumes. However, both streams of analysis also differed qualitatively, i.e., revealing significant findings only in the analyses without PGSS, when focusing on structures with moderate effect sizes in the parieto-temporo-occipital cortex as well as limbic and paralimbic structures. Focusing on "deactivations," the use of global signal covariates augmented effects in structures with a negative correlation of emotional arousal and BOLD signal (indicated by bluish colors in Fig. 4a). Specifically, areas revealing significant deactivation in the analysis without PGSS, such as parietal, occipital, and temporal structures, revealed enlarged effect sizes and cluster volumes in the PGSS analysis. Furthermore, the PGSS analysis revealed significant deactivations not observed when omitting global signal covariates, especially in left parietal and right-hemispheric subcortical structures. As can be seen in Fig. 4b, analyses with and without PGSS appeared similar
Fig. 4. Effects of fMRI proportional global signal scaling (PGSS) in emotion research (random effects analysis; 21 subjects; k = 20; DF = 20). Significant BOLD signal differences contrasting high-arousing emotional (a) or low-arousing emotional (b) with neutral picture conditions are shown. Reddish colors indicate a stronger BOLD signal for the emotional stimuli compared to the neutral stimuli; bluish colors indicate stronger activations for neutral conditions compared to emotional picture conditions. The unjustified application of PGSS greatly distorts statistical findings in key structures implicated in emotional processing and might contribute to conflicting results in affective neuroscience fMRI studies, in particular with respect to limbic and paralimbic structures.
when comparing low-arousing emotional and neutral picture contents, restricting the detrimental effects of unjustified global scaling to emotional stimuli of high intensity. Emotional stimuli have been associated with activations in limbic and paralimbic structures (Morris et al., 1998; Vuilleumier et al., 2001; Hamann and Mao, 2002; Pessoa et al., 2002). However, results have been rather inconsistent regarding the
emotion-driven activation of these structures across studies (Davis and Whalen, 2001; Phan et al., 2002). The present data suggest that the unjustified use of global signal scaling might contribute to inconsistencies in findings. In contrast to the analysis omitting global scaling, the PGSS-based analysis failed to reveal significant activations in limbic and paralimbic regions for high-arousing emotional materials. It seems noteworthy that this
effect was apparent for the amygdala, a key structure of emotional processing in current emotion theories (LeDoux et al., 2000; Lang et al., 2000; Lang and Davis, this volume). Thus, the unjustified application of global signal covariates appears to be a strong confound with regard to limbic and paralimbic structures, possibly contributing to inconsistencies in the literature. It appears likely that the untested use of global signal covariates impedes generalization across studies, subject groups, and stimulus materials (Davis and Whalen, 2001; Phan et al., 2002; Wager et al., 2003). Particularly troublesome errors of interpretation might arise when comparing results of experimental conditions with and without significant global signal correlation, and thus this information should be provided in publications (Aguirre et al., 1998).

Summary

Taken together, this study demonstrated that the concerns and precautions questioning the standard use of PGSS in the cognitive domain also apply to emotion research. The unjustified application of PGSS in emotion research greatly distorts statistical findings in key structures implicated in emotional processing and might contribute to conflicting results in affective neuroscience fMRI studies, in particular with respect to limbic and paralimbic structures. Reiterating Aguirre et al. (1998), it is recommended to report the correlation of global signal and experimental condition when using PGSS, and to omit this confound in cases where the global signal and the experimental condition show a significant relationship.

Distributed EEG/MEG source analysis with statistical parametric mapping

A current challenge for progress in determining the neural underpinnings of emotion processing is to integrate ERP/ERF and fMRI measures. In the present example, we provide an approach to analyzing ERP/ERF data (Junghöfer et al., 2005a) that allows application of the statistical parametric mapping toolbox SPM established for functional
imaging (Friston et al., 1995). Please note that a similar approach will be implemented in future releases of SPM (www.fil.ion.ucl.ac.uk/spm). The idea behind this approach is that the distributed neural activity generating measurable scalp potentials or magnetic fields can be estimated by inverse distributed source methods like MNLS (Hämäläinen and Ilmoniemi, 1994), LORETA (Pascual-Marqui et al., 1994), sLORETA (Pascual-Marqui, 2002; Liu et al., 2005), or even beamformer techniques (Van Veen et al., 1997; Robinson and Vrba, 1999; Sekihara et al., 2001; Huang et al., 2004), and the resulting 3D volumes of estimated neural activity can be treated and analyzed in a similar fashion as fMRI volumes of BOLD activity. Thus, event-related single-trial EEG or MEG data can be averaged across all time samples in an interval of interest, the underlying generator activity of this time interval can be estimated using inverse distributed source modeling, and these functional volumes of brain activity can then be submitted to the parametric analysis of SPM. The number of epochs in EEG or MEG studies often outnumbers the number of functional scans in fMRI examinations. However, the ratio is usually modest, and state-of-the-art workstations can easily process this amount of data in reasonable time. In order to demonstrate the practicability of this method, we present a brief example in which MEG data of 16 subjects who viewed a rapid serial visual presentation (RSVP; 333 ms per picture; no ISI) of 200 alternating aversive and neutral pictures (IAPS; Lang et al., 1999) were reanalyzed with the suggested method. As a first step, the event-related fields were edited on the basis of a method for statistical control of artifacts in high-density EEG/MEG measures (Junghöfer et al., 2000), which includes the rejection of globally noise-contaminated epochs and the interpolation of regional contaminations within trials. In a second step, the Tikhonov-regularized MNLS with a regularization value of λ = 0.05 was calculated for the mean magnetic field in the EPN time interval (150–230 ms; see Fig. 1) for each artifact-corrected epoch.
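A sketch of this per-epoch step (names illustrative; the distributed source model supplying the leadfield L is specified next; the regularized minimum-norm operator is the same construction as in the standardization sketch above, with its regularizer scaling being an assumption of the sketch):

```python
import numpy as np

def epoch_source_volume(epoch, times, L, lam=0.05, t_win=(0.150, 0.230)):
    """One single-trial epoch -> one vector of source strengths.

    epoch : (n_sensors, n_times) single-trial MEG data.
    times : (n_times,) sample times in seconds.
    L     : (n_sensors, n_sources) leadfield of the distributed model.
    Averages the field over the time window of interest (the EPN
    window from the text), then applies a Tikhonov-regularized
    minimum-norm inverse with lambda = 0.05, the value quoted above.
    """
    mask = (times >= t_win[0]) & (times <= t_win[1])
    b_mean = epoch[:, mask].mean(axis=1)            # mean field in window
    gram = L @ L.T
    reg = lam * np.trace(gram) / gram.shape[0] * np.eye(gram.shape[0])
    g = L.T @ np.linalg.solve(gram + reg, b_mean)   # source strengths
    return g  # subsequently resampled into a 3D voxel grid for SPM
```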
2 × 87, and 2 × 21 dipoles per shell; no radial direction in MEG) was used. The 3D MNLS volumes of estimated neural source activity were then stored in a 3D voxel-based file format (ANALYZE) with a field of view covering the whole inverse model head (51 × 51 × 51 voxels) and an isotropic voxel size of 4 mm (adapted to the smallest spatial distance of neighboring sources). In the following step, the functional MNLS volumes were spatially filtered by SPM (Friston et al., 1995) with an isotropic 20 mm full width at half maximum (FWHM) Gaussian kernel. The motivation for this spatial lowpass filtering of the MNLS estimations is the same as in fMRI data smoothing: the application of Gaussian random field theory (as applied in SPM) assumes smooth Gaussian fields, and smoothing will normalize the error distribution and ensure the validity of the statistical tests. With respect to across-subject random effects analysis, smoothing also allows some compensation for functional and anatomical differences between subjects. Afterwards, the experimental conditions were analyzed using the statistical parametric mapping toolbox SPM. Within this procedure, conditions were modeled with boxcar functions and condition effects were estimated according to the general linear model, while the specific effect of interest (aversive versus neutral picture perception) was tested using a linear contrast. As a second-level analysis, an across-subject random effects analysis was performed on this contrast of interest. With respect to the visualization of the final statistical parametric map, the spherical model head was coregistered onto the Montreal Neurological Institute (MNI) standard brain and the functional results were superimposed on it using the freely available software tool MRIcro (www.sph.sc.edu/comd/rorden/mricro). In the posterior regions of interest, the goodness of coregistration was high, but it was unsatisfactory in anterior regions. Thus, regional spheres, realistic boundary element (BE; Cuffin, 1996), or finite element (FE; Weinstein et al., 2000) head models may be needed to explore activations in prefrontal regions.
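To make the processing steps concrete, the following sketch traces the pipeline in simplified form: a Tikhonov-regularized minimum-norm estimate per epoch, Gaussian smoothing of the resulting volumes, and a second-level test across subjects. It is an illustration under stated assumptions, not the SPM implementation: the leadfield matrix and epoch data are assumed given, and the boxcar GLM is reduced to a difference of condition means, which is equivalent only for a simple two-condition design.

    # Simplified sketch of the described EEG/MEG-to-SPM pipeline. The leadfield
    # (sensors x sources) and epoch data are hypothetical inputs; in the text
    # they come from a four-shell spherical source model and MEG epochs.
    import numpy as np
    from scipy.ndimage import gaussian_filter
    from scipy.stats import ttest_1samp

    def mnls(leadfield, field, lam=0.05):
        """Tikhonov-regularized minimum norm least squares estimate:
        j = L' (L L' + lam * c * I)^-1 b, with c scaling the regularizer to
        the sensor power (one common convention; the chapter only states lam)."""
        gram = leadfield @ leadfield.T
        reg = lam * np.trace(gram) / gram.shape[0] * np.eye(gram.shape[0])
        return leadfield.T @ np.linalg.solve(gram + reg, field)

    def smooth_volume(volume, fwhm_mm=20.0, voxel_mm=4.0):
        """Isotropic Gaussian smoothing; convert FWHM to the Gaussian sigma."""
        sigma_vox = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
        return gaussian_filter(volume, sigma_vox)

    def subject_contrast(epoch_volumes, is_aversive):
        """First-level effect: for a two-condition design the GLM boxcar
        contrast reduces to a difference of condition means."""
        smoothed = np.stack([smooth_volume(v) for v in epoch_volumes])
        return smoothed[is_aversive].mean(0) - smoothed[~is_aversive].mean(0)

    def random_effects(contrasts):
        """Second level: across-subject random effects analysis, here a
        voxelwise one-sample t-test on (n_subjects, 51, 51, 51) contrasts."""
        t_map, p_map = ttest_1samp(contrasts, popmean=0.0, axis=0)
        return t_map, p_map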
The results obtained with this analysis are illustrated in Fig. 5, contrasting the differential processing of aversive compared to neutral pictures in the early posterior negativity (EPN-M) time interval (150–230 ms) and, for comparison, the hemodynamic correlate (BOLD) of the corresponding contrast (aversive vs. neutral picture processing; Junghöfer et al., 2005b). Although the same stimuli were used in both studies and the picture presentation rate was identical, the comparison of the two effects is somewhat limited, since different subjects were investigated and, more importantly, different paradigms were applied: a block design in fMRI and an alternating design in MEG. However, considering these methodological limitations, the results demonstrate converging findings across both methods with respect to posterior brain regions. Both methods suggest that neural activity is elicited as a function of emotional arousal in bilateral occipito-temporal and occipito-parietal areas, somewhat more pronounced in right-hemispheric regions. In contrast, the prefrontal activations revealed by fMRI are not readily apparent in the MEG results, which might be explained by the fact that the chosen isotropic spherical head model is less appropriate for prefrontal areas, which exhibit strong conductivity inhomogeneities and anisotropies (Wolters et al., 2006). Of course, due to the limited depth resolution of MEG and the MNLS bias towards superficial sources, the MEG analysis did not reveal activations in subcortical structures like the thalamus, hippocampus, or amygdala, which thus appear in the fMRI analysis only. A strong advantage of the statistical parametric mapping of ERP/ERF data is the possible consideration of additional covariates of interest, such as behavioral or autonomic measures.

Summary

Distributed EEG/MEG source analysis with statistical parametric mapping is one approach that uses common software routines across different neuroimaging measures. Applying the same analysis routines across fMRI, MEG, and EEG may help to demonstrate the convergence across measures. Furthermore, data processing of EEG/MEG measures using source localization procedures is usually performed within subject-related coordinate systems.
Fig. 5. Distributed EEG/MEG source analysis with statistical parametric mapping reveals convergent effects of increased blood flow and increased MEG source activity in the EPN time interval (150–230 ms) in subjects observing emotionally arousing as compared to emotionally neutral images.
Even if source localization is performed in standardized coordinate systems, the interindividual variability of the ''source space'' can be considered to influence and bias results to some extent. This is even more important when results are compared across subjects and/or different groups of subjects. The use of SPM together with source analysis procedures opens the door to nonlinear subject normalization methods, a procedure widely used in fMRI analysis, and can therefore be expected to significantly decrease spatial variance across subjects.
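Conceptually, such normalization resamples each subject's source volume through an estimated deformation field; a toy sketch follows. The displacement field is a hypothetical input that a normalization routine such as SPM's would estimate, not something the chapter specifies.

    # Toy illustration of applying a nonlinear subject-to-template warp to a
    # source-activity volume; the displacement field is assumed to be given.
    import numpy as np
    from scipy.ndimage import map_coordinates

    def apply_warp(volume, displacement):
        """volume: (X, Y, Z); displacement: (3, X, Y, Z) voxel offsets mapping
        template coordinates back into the subject's source space."""
        grid = np.indices(volume.shape).astype(float)   # identity sampling grid
        return map_coordinates(volume, grid + displacement, order=1, mode="nearest")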
Conclusion

A current goal of affective neuroscience is to reveal the relationship between emotion and dynamic brain activity in specific neural circuits (LeDoux, 2000; Lang and Davis, this volume). In humans, noninvasive neuroimaging measures are of primary interest in this endeavor. This review raised specific methodological issues that we encountered in our research, while neglecting many other issues deserving similar consideration. However, in our view, progress in the field of affective neuroscience is facilitated by the improvement and standardization of data acquisition and analysis, in particular with regard to the goal of comparing findings across studies. To achieve consistent and replicable patterns of the relationship between emotion and neuroimaging measures, methodological aspects associated with the various neuroimaging techniques may be as important as the definition of the emotional cues and task context used to study emotion (Bradley, 2000).

Abbreviations

BOLD: blood oxygen level dependent
CS: conditioned stimulus
CSD: current source density
DC: direct current
ECD: equivalent current dipole
EEG: electroencephalography
EPN: early posterior negativity
EPN-M: early posterior negativity (MEG)
ERF: event-related magnetic field
ERP: event-related potential
fMRI: functional magnetic resonance imaging
IAPS: International Affective Picture System
LORETA: low-resolution electromagnetic tomography
MCG: magnetocardiography
MEG: magnetoencephalography
MNLS: minimum norm least squares
PGSS: proportional global signal scaling
RSVP: rapid serial visual presentation
Acknowledgments

This work was supported by the German Research Foundation (DFG) Grants FOR 348/3-3 and 348/3-4 and by the Academy of Science, Heidelberg (WIN-Project). Address reprint requests to Markus Junghöfer, Institute for Biosignalanalysis, University of Münster, Malmedyweg 15, 48149 Münster, Germany.
References

Aguirre, G.K., Zarahn, E. and D'Esposito, M. (1998) The inferential impact of global signal covariates in functional neuroimaging analyses. Neuroimage, 8: 302–306.
Amrhein, C., Mühlberger, A., Pauli, P. and Wiedemann, G. (2004) Modulation of event-related brain potentials during affective picture processing: a complement to startle reflex and skin conductance response? Int. J. Psychophysiol., 54(3): 231–240.
Andersson, J.L. (1997) How to estimate global activity independent of changes in local activity. Neuroimage, 6: 237–244.
Arndt, S., Cizadlo, T., O'Leary, D., Gold, S. and Andreasen, N.C. (1996) Normalizing counts and cerebral blood flow intensity in functional imaging studies of the human brain. Neuroimage, 3: 175–184.
Bandettini, P.A., Wong, E.C., Hinks, R.S., Tikofsky, R.S. and Hyde, J.S. (1992) Time course EPI of human brain function during task activation. Magn. Reson. Med., 25: 390–397.
Batty, M. and Taylor, M.J. (2003) Early processing of the six basic facial emotional expressions. Brain Res. Cogn. Brain Res., 17(3): 613–620.
Bentin, S., Allison, T., Puce, A., Perez, A. and McCarthy, G. (1996) Electrophysiological studies of face perception in humans. J. Cogn. Neurosci., 8: 551–565.
Berger, H. (1929) Über das Elektroenkephalogramm des Menschen. Arch. Psychiatr. Nervenkr., 87: 527–570.
Bernat, E., Bunce, S. and Shevrin, H. (2001) Event-related brain potentials differentiate positive and negative mood adjectives during both supraliminal and subliminal visual processing. Int. J. Psychophysiol., 42(1): 11–34.
Bertrand, O., Perrin, F. and Pernier, J. (1985) A theoretical justification of the average reference in topographic evoked potential studies. Electroencephalogr. Clin. Neurophysiol., 62(6): 462–464.
Blamire, A.M., Ogawa, S., Ugurbil, K., Rothman, D., McCarthy, G., Ellermann, J.M., Hyder, F., Rattner, Z. and Shulman, R.G. (1992) Dynamic mapping of the human visual cortex by high-speed magnetic resonance imaging. Proc. Natl. Acad. Sci. USA, 89(22): 11069–11073.
Bradley, M.M. (2000) Emotion and motivation. In: Cacioppo, J.T., Tassinary, L.G. and Berntson, G. (Eds.), Handbook of Psychophysiology. Cambridge University Press, New York, pp. 602–642.
Brazier, M.A.B. (1949) A study of the electric field at the surface of the head. Electroenceph. Clin. Neurophysiol., 2: 38–52.
Burghoff, M., Nenonen, J., Trahms, L. and Katila, T. (2000) Conversion of magnetocardiographic recordings between two different multichannel SQUID devices. IEEE Trans. Biomed. Eng., 47(7): 869–875.
Carretié, L., Hinojosa, J.A., Martín-Loeches, M., Mercado, F. and Tapia, M. (2004) Automatic attention to emotional stimuli: neural correlates. Hum. Brain Mapp., 22: 290–299.
Carretié, L., Hinojosa, J.A., Mercado, F. and Tapia, M. (2005) Cortical response to subjectively unconscious danger. Neuroimage, 24(3): 615–623.
Caton, R. (1875) The electric currents of the brain. Br. Med. J., 2: 278.
Chapman, R.M., McCrary, J.W., Chapman, J.A. and Martin, J.K. (1980) Behavioral and neural analyses of connotative meaning: word classes and rating scales. Brain Lang., 11(2): 319–339.
Coles, M., Gratton, G. and Fabiani, M. (1990) Event-related brain potentials. In: Cacioppo, J. and Tassinary, L. (Eds.), Principles of Psychophysiology. Cambridge University Press, Cambridge, New York, Port Chester, Melbourne, Sydney.
Costa, M., Braun, C. and Birbaumer, N. (2003) Gender differences in response to pictures of nudes: a magnetoencephalographic study. Biol. Psychol., 63(2): 129–147.
Creutzfeldt, O., Watanabe, S. and Lux, H. (1966) Relations between EEG phenomena and potentials of single cortical cells. I. Evoked responses after thalamic and epicortical stimulations. Electroenceph. Clin. Neurophysiol., 20: 1–28.
Cuffin, B.N. (1996) EEG localization accuracy improvements using realistically shaped head models. IEEE Trans. Biomed. Eng., 43(3): 299–303.
Cuthbert, B.N., Schupp, H.T., Bradley, M.M., Birbaumer, N. and Lang, P.J. (2000) Brain potentials in affective picture processing: covariation with autonomic arousal and affective report. Biol. Psychol., 52: 95–111.
Davidson, R.J., Jackson, D.C. and Larson, C.L. (2000) Human electroencephalography. In: Cacioppo, J.T., Tassinary, L.G. and Berntson, G.G. (Eds.), Handbook of Psychophysiology (2nd ed.). Cambridge University Press, New York, pp. 27–52.
Davis, M. and Whalen, P.J. (2001) The amygdala: vigilance and emotion. Mol. Psychiatr., 6: 13–34.
de Gelder, B., Pourtois, G. and Weiskrantz, L. (2002) Fear recognition in the voice is modulated by unconsciously recognized facial expressions but not by unconsciously recognized affective pictures. Proc. Natl. Acad. Sci. USA, 99(6): 4121–4126.
de Munck, J.C., van Dijk, B.W. and Spekreijse, H. (1988) Mathematical dipoles are adequate to describe realistic generators of human brain activity. IEEE Trans. Biomed. Eng., 35(11): 960–966.
Debener, S., Ullsperger, M., Siegel, M., Fiehler, K., von Cramon, D.Y. and Engel, A.K. (2005) Trial-by-trial coupling of concurrent electroencephalogram and functional magnetic resonance imaging identifies the dynamics of performance monitoring. J. Neurosci., 25(50): 11730–11737.
Dien, J. (1998) Issues in the application of the average reference: review, critiques, and recommendations. Behav. Res. Meth. Instrum. Comput., 30: 34–43.
Eimer, M. (2000) Effects of face inversion on the structural encoding and recognition of faces: evidence from event-related brain potentials. Cogn. Brain Res., 10: 145–158.
Flaisch, T., Junghöfer, M. and Schupp, H. (2005) Motivational priming and emotional perception: an ERP-study. J. Psychophysiol., 19(2): 115–115.
Fox, P.T., Mintun, M.A., Reiman, E.M. and Raichle, M.E. (1988) Enhanced detection of focal brain responses using intersubject averaging and change-distribution analysis of subtracted PET images. J. Cereb. Blood Flow Metab., 8: 642–653.
Friston, K.J., Frith, C.D., Liddle, P.F., Dolan, R.J., Lammertsma, A.A. and Frackowiak, R.S. (1990) The relationship between global and local changes in PET scans. J. Cereb. Blood Flow Metab., 10: 458–466.
Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.P., Frith, C.D. and Frackowiak, R.S. (1995) Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp., 2: 189–210.
Fuchs, M., Wagner, M. and Wischmann, H.A. (1994) Generalized minimum norm least squares reconstruction algorithms. In: Skrandies, W. (Ed.), ISBET Newsletter No. 5, pp. 8–11.
Gevins, A., Jian, L., Brickett, P., Reutter, B. and Desmond, J. (1991) Seeing through the skull: advanced EEGs accurately measure cortical activity from the scalp. Brain Topogr., 4: 125–131.
Gevins, A., Le, J., Martin, N.K., Brickett, P., Desmond, J. and Reutter, B. (1994) High resolution EEG: 124-channel recording, spatial deblurring and MRI integration methods. Electroenceph. Clin. Neurophysiol., 90: 337–358.
Gonzalez, S., Grave de Peralta, R., Lütkenhöner, B., Merminghaus, E. and Hoke, M. (1991) A strategy for the solution of the inverse problem using simultaneous EEG and MEG measurements. In: Hoke, M., Okada, Y., Erné, S.N. and Romani, G.L. (Eds.), Advances in Biomagnetism '91: Clinical Aspects. Elsevier, Amsterdam, pp. 99–100.
Hämäläinen, M.S. (1992) Magnetoencephalography: a tool for functional brain imaging. Brain Topogr., 5(2): 95–102.
Hämäläinen, M.S. and Ilmoniemi, R.J. (1994) Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput., 32: 35–42.
Hamann, S. and Mao, H. (2002) Positive and negative emotional verbal stimuli elicit activity in the left amygdala. Neuroreport, 13: 15–19.
Hari, R. and Forss, N. (1999) Magnetoencephalography in the study of human somatosensory cortical processing. Philos. Trans. R. Soc. Lond. B Biol. Sci., 354: 1145–1154.
Hauk, O., Keil, A., Elbert, T. and Müller, M.M. (2002) Comparison of data transformation procedures to enhance topographical accuracy in time-series analysis of the human EEG. J. Neurosci. Meth., 113(2): 111–122.
Helmholtz, H. (1853) Ueber einige Gesetze der Vertheilung elektrischer Ströme in körperlichen Leitern, mit Anwendung auf die thierisch-elektrischen Versuche. Ann. Phys. Chem., 89: 211–233, 353–377.
Herbert, C., Kissler, J., Junghöfer, M., Peyk, P. and Rockstroh, B. (2006) Processing of emotional adjectives: evidence from startle EMG and ERPs. Psychophysiology, 43(2): 197–206.
Holmes, A., Vuilleumier, P. and Eimer, M. (2003) The processing of emotional facial expression is gated by spatial attention: evidence from event-related brain potentials. Brain Res. Cogn. Brain Res., 16(2): 174–184.
Holmes, A., Winston, J.S. and Eimer, M. (2005) The role of spatial frequency information for ERP components sensitive to faces and emotional facial expression. Brain Res. Cogn. Brain Res., 25(2): 508–520.
Huang, M.X., Shih, J.J., Lee, R.R., Harrington, D.L., Thoma, R.J., Weisend, M.P., Hanlon, F., Paulson, K.M., Li, T., Martin, K., Miller, G.A. and Canive, J.M. (2004) Commonalities and differences among vectorized beamformers in electromagnetic source imaging. Brain Topogr., 16(3): 139–158.
Junghöfer, M., Bradley, M.M., Elbert, T.R. and Lang, P.J. (2001) Fleeting images: a new look at early emotion discrimination. Psychophysiology, 38(2): 175–178.
Junghöfer, M., Elbert, T., Leiderer, P., Berg, P. and Rockstroh, B. (1997) Mapping EEG-potentials on the surface of the brain: a strategy for uncovering cortical sources. Brain Topogr., 9(3): 203–217.
Junghöfer, M., Elbert, T., Tucker, D. and Braun, C. (1999) The polar effect of average reference: a bias in estimating the head surface integral in EEG recording. Clin. Neurophysiol., 110(6): 1149–1155.
Junghöfer, M., Elbert, T., Tucker, D. and Rockstroh, B. (2000) Statistical control of artifacts in dense array EEG/MEG studies. Psychophysiology, 37: 523–532.
Junghöfer, M., Keil, A. and Peyk, P. (2003) An early posterior negative EEG difference component, mirroring facilitated processing of emotionally arousing material in the extended visual cortex, is almost independent of presentation rate. Psychophysiology, 40: S51 (Suppl. 1).
Junghöfer, M., Peyk, P., Steinsträter, O., Schupp, H. and Pantev, C. (2005a) 'The sound of fear': MEG-based evidence for affective conditioning in the auditory pathway. Psychophysiology, 42: S5 (Suppl. 1).
Junghöfer, M., Sabatinelli, D., Schupp, H.T., Elbert, T.R., Bradley, M.M. and Lang, P.J. (2006) Fleeting images: rapid affect discrimination in the visual cortex. Neuroreport, 17(2): 225–229.
Junghöfer, M., Schupp, H., Stark, R. and Vaitl, D. (2005b) Neuroimaging of emotion: empirical effects of proportional global signal scaling in fMRI data analysis. Neuroimage, 25: 520–526.
Katznelson, R.D. (1981) EEG recording, electrode placement, and aspects of generator localization. In: Nunez, P.L. (Ed.), Electric Fields of the Brain: The Neurophysics of EEG. Oxford University Press, New York, pp. 76–213.
Keil, A., Bradley, M.M., Hauk, O., Rockstroh, B., Elbert, T.R. and Lang, P.J. (2002) Large-scale neural correlates of affective picture viewing. Psychophysiology, 39: 641–649.
Keil, A., Moratti, S., Sabatinelli, D., Bradley, M.M. and Lang, P.J. (2005) Additive effects of emotional content and spatial selective attention on electrocortical facilitation. Cereb. Cortex, 15(8): 1187–1197.
Kemp, A.H., Gray, M.A., Eide, P., Silberstein, R.B. and Nathan, P.J. (2002) Steady-state visually evoked potential topography during processing of emotional valence in healthy subjects. Neuroimage, 17(4): 1684–1692.
Knösche, T.R. (2002) Transformation of whole-head MEG recordings between different sensor positions. Biomed. Tech. (Berl.), 47(3): 59–62.
Kringelbach, M.L. and Rolls, E.T. (2004) The functional neuroanatomy of the human orbitofrontal cortex: evidence from neuroimaging and neuropsychology. Prog. Neurobiol., 72(5): 341–372.
Kwong, K.K., Belliveau, J.W., Chesler, D.A., Goldberg, I.E., Weisskoff, R.M., Poncelet, B.P., Kennedy, D.N., Hoppel, B.E., Cohen, M.S., Turner, R., Cheng, H.M., Brady, T.J. and Rosen, B.R. (1992) Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc. Natl. Acad. Sci. USA, 89: 5675–5679.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1999) International Affective Picture System (IAPS): Technical Manual and Affective Ratings. The Center for Research in Psychophysiology, Gainesville, FL.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (2005) International Affective Picture System (IAPS): Digitized Photographs. Instruction Manual and Affective Ratings. Technical Report A-6, University of Florida, Gainesville, FL.
Lang, P.J., Davis, M. and Öhman, A. (2000) Fear and anxiety: animal models and human cognitive psychophysiology. J. Affect. Disord., 61: 137–159.
LeDoux, J.E. (2000) Emotion circuits in the brain. Annu. Rev. Neurosci., 23: 155–184.
Liu, H., Schimpf, P.H., Dong, G., Gao, X., Yang, F. and Gao, S. (2005) Standardized shrinking LORETA-FOCUSS (SSLOFO): a new algorithm for spatio-temporal EEG source reconstruction. IEEE Trans. Biomed. Eng., 52(10): 1681–1691.
Logothetis, N.K., Pauls, J., Augath, M., Trinath, T. and Oeltermann, A. (2001) Neurophysiological investigation of the basis of the fMRI signal. Nature, 412(6843): 150–157.
Lorente de Nó, R. (1947) Analysis of the distribution of action currents of nerve in volume conductors. Stud. Rockefeller Inst. Med. Res., 132: 384–482.
Lorente de Nó, R. (1938) Cerebral cortex: architecture, intracortical connections, motor projections. In: Fulton, J.F. (Ed.), Physiology of the Nervous System (Chap. 15). Oxford University Press, Oxford, London.
Lutzenberger, W., Elbert, T. and Rockstroh, B. (1987) A brief tutorial on the implications of volume conduction for the interpretation of the EEG. J. Psychophysiol., 1: 81–90.
Meeren, H.K., van Heijnsbergen, C.C. and de Gelder, B. (2005) Rapid perceptual integration of facial expression and emotional body language. Proc. Natl. Acad. Sci. USA, 102(45): 16518–16523.
Merboldt, K.D., Fransson, P., Bruhn, H. and Frahm, J. (2001) Functional MRI of the human amygdala? Neuroimage, 14(2): 253–257.
Morris, J.S., Öhman, A. and Dolan, R.J. (1998) Conscious and unconscious emotional learning in the human amygdala. Nature, 393: 467–470.
Murphy, F.C., Nimmo-Smith, I. and Lawrence, A.D. (2003) Functional neuroanatomy of emotions: a meta-analysis. Cogn. Affect. Behav. Neurosci., 3(3): 207–233.
Numminen, J., Ahlfors, S., Ilmoniemi, R., Montonen, J. and Nenonen, J. (1995) Transformation of multichannel magnetocardiographic signals to standard grid form. IEEE Trans. Biomed. Eng., 42(1): 72–78.
Nunez, P.L. (1989) Estimation of large scale neocortical source activity with EEG surface Laplacians. Brain Topogr., 2(1–2): 141–154.
Nunez, P.L. (1990) Localization of brain activity with electroencephalography. In: Sato, S. (Ed.), Advances in Neurology, Magnetoencephalography, Vol. 54. Raven Press, New York, pp. 39–65.
Offner, F.F. (1950) The EEG as a potential mapping: the value of the average monopolar reference. Electroenceph. Clin. Neurophysiol., 2: 213–214.
Ogawa, S., Tank, D.W., Menon, R., Ellermann, J.M., Kim, S.G., Merkle, H. and Ugurbil, K. (1992) Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging. Proc. Natl. Acad. Sci. USA, 89: 5951–5955.
Ojemann, J.G., Akbudak, E., Snyder, A.Z., McKinstry, R.C., Raichle, M.E. and Conturo, T.E. (1997) Anatomic localization and quantitative analysis of gradient refocused echo-planar fMRI susceptibility artifacts. Neuroimage, 6(3): 156–167.
Ortigue, S., Michel, C.M., Murray, M.M., Mohr, C., Carbonnel, S. and Landis, T. (2004) Electrical neuroimaging reveals early generator modulation to emotional words. Neuroimage, 21(4): 1242–1251.
Pascual-Marqui, R.D., Michel, C.M. and Lehmann, D. (1994) Low resolution electromagnetic tomography: a new method for localizing electrical activity in the brain. Int. J. Psychophysiol., 18: 49–65.
Pascual-Marqui, R.D. (2002) Standardized low resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find. Exp. Clin. Pharmacol., 24: 5–12.
Pauli, P., Amrhein, C., Mühlberger, A., Dengler, W. and Wiedemann, G. (2005) Electrocortical evidence for an early abnormal processing of panic-related words in panic disorder patients. Int. J. Psychophysiol., 57(1): 33–41.
Pause, B.M., Raack, N., Sojka, B., Goder, R., Aldenhoff, J.B. and Ferstl, R. (2003) Convergent and divergent effects of odors and emotions in depression. Psychophysiology, 40(2): 209–225.
Perrin, F., Bertrand, O. and Pernier, J. (1987) Scalp current density mapping: value and estimation from potential data. IEEE Trans. Biomed. Eng., 34(4): 283–289.
Perrin, F., Pernier, J., Bertrand, O. and Echallier, J.F. (1989) Spherical splines for potential and current density mapping. Electroenceph. Clin. Neurophysiol., 72: 184–187.
Pessoa, L., Kastner, S. and Ungerleider, L.G. (2002) Attentional control of the processing of neutral and emotional stimuli. Brain Res. Cogn. Brain Res., 15: 31–45.
Phan, K.L., Wager, T., Taylor, S.F. and Liberzon, I. (2002) Functional neuroanatomy of emotion: a meta-analysis of emotion activation studies in PET and fMRI. Neuroimage, 16: 331–348.
Phan, K.L., Wager, T.D., Taylor, S.F. and Liberzon, I. (2004) Functional neuroimaging studies of human emotions. CNS Spectr., 9(4): 258–266.
Phelps, E.A. (2004) Human emotion and memory: interactions of the amygdala and hippocampal complex. Curr. Opin. Neurobiol., 14(2): 198–202.
Pourtois, G., Grandjean, D., Sander, D. and Vuilleumier, P. (2004) Electrophysiological correlates of rapid spatial orienting towards fearful faces. Cereb. Cortex, 14(6): 619–633.
Pourtois, G., Thut, G., Grave de Peralta, R., Michel, C. and Vuilleumier, P. (2005) Two electrophysiological stages of spatial orienting towards fearful faces: early temporo-parietal activation preceding gain control in extrastriate visual cortex. Neuroimage, 26(1): 149–163.
Raichle, M.E. (2001) Cognitive neuroscience. Bold insights. Nature, 412(6843): 128–130.
Ramsay, S.C., Murphy, K., Shea, S.A., Friston, K.J., Lammertsma, A.A., Clark, J.C., Adams, L., Guz, A. and Frackowiak, R.S. (1993) Changes in global cerebral blood flow in humans: effect on regional cerebral blood flow during a neural activation task. J. Physiol., 471: 521–534.
Robinson, S.E. and Vrba, J. (1999) Functional neuroimaging by synthetic aperture magnetometry (SAM). In: Yoshimoto, T., et al. (Eds.), Recent Advances in Biomagnetism. Tohoku University Press, Sendai, pp. 302–305.
Salek-Haddadi, A., Friston, K.J., Lemieux, L. and Fish, D.R. (2003) Studying spontaneous EEG activity with fMRI. Brain Res. Brain Res. Rev., 43(1): 110–133.
Sarvas, J. (1987) Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Phys. Med. Biol., 32(1): 11–22.
Scherg, M. and von Cramon, Y.D. (1985) Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalogr. Clin. Neurophysiol., 62: 32–44.
Schupp, H.T., Cuthbert, B.N., Bradley, M.M., Cacioppo, J.T., Ito, T. and Lang, P.J. (2000) Affective picture processing: the
late positive potential is modulated by motivational relevance. Psychophysiology, 37(2): 257–261.
Schupp, H.T., Stockburger, J., Codispoti, M., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2006) Stimulus novelty and emotion perception: the near absence of habituation in the visual cortex. Neuroreport, 17: 365–369.
Schupp, H., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2003a) Attention and emotion: an ERP analysis of facilitated emotional stimulus processing. Neuroreport, 14: 1107–1110.
Schupp, H., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2003b) Emotional facilitation of sensory processing in the visual cortex. Psychol. Sci., 14: 7–13.
Schupp, H., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2004a) The selective processing of briefly presented affective pictures: an ERP analysis. Psychophysiology, 41(3): 441–449.
Schupp, H., Öhman, A., Junghöfer, M., Weike, A. and Hamm, A. (2004b) Emotion guides attention: the selective encoding of threatening faces. Emotion, 4: 189–200.
Sekihara, K., Nagarajan, S.S., Poeppel, D., Marantz, A. and Miyashita, Y. (2001) Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique. IEEE Trans. Biomed. Eng., 48(7): 760–771.
Skrandies, W. and Chiu, M.J. (2003) Dimensions of affective semantic meaning – behavioral and evoked potential correlates in Chinese subjects. Neurosci. Lett., 341(1): 45–48.
Srinivasan, R., Nunez, P.L., Tucker, D.M., Silberstein, R.B. and Cadusch, P.J. (1996) Spatial sampling and filtering of EEG with spline Laplacians to estimate cortical potentials. Brain Topogr., 8(4): 355–366.
Stolarova, M., Keil, A. and Moratti, S. (2006) Modulation of the C1 visual event-related component by conditioned stimuli: evidence for sensory plasticity in early affective perception. Cereb. Cortex, 16(6): 876–887.
Susac, A., Ilmoniemi, R.J., Pihko, E. and Supek, S. (2004) Neurodynamic studies on emotional and inverted faces in an oddball paradigm. Brain Topogr., 16(4): 265–268.
Tucker, D.M. (1993) Spatial sampling of head electrical fields: the geodesic sensor net. Electroenceph. Clin. Neurophysiol., 87: 154–163.
Tucker, D.M., Luu, P., Frishkoff, G., Quiring, J. and Poulsen, C. (2003) Frontolimbic response to negative feedback in clinical depression. J. Abnorm. Psychol., 112(4): 667–678.
Van Veen, B.D., van Drongelen, W., Yuchtman, M. and Suzuki, A. (1997) Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Trans. Biomed. Eng., 44(9): 867–880.
Vuilleumier, P., Armony, J.L., Driver, J. and Dolan, R.J. (2001) Effects of attention and emotion on face processing in the human brain: an event-related fMRI study. Neuron, 30: 829–841.
Wager, T.D., Phan, K.L., Liberzon, I. and Taylor, S.F. (2003) Valence, gender, and lateralization of functional brain anatomy in emotion: a meta-analysis of findings from neuroimaging. Neuroimage, 19: 513–531.
Wang, Z., Ives, J.R. and Mirsattari, S.M. (2006) Simultaneous electroencephalogram-functional magnetic resonance imaging in neocortical epilepsies. Adv. Neurol., 97: 129–139.
Weinstein, D., Zhukov, L. and Johnson, C. (2000) Lead-field bases for electroencephalography source imaging. Ann. Biomed. Eng., 28(9): 1059–1066.
Williamson, S.J. and Kaufman, L. (1981) Biomagnetism. J. Magn. Magn. Mater., 22: 129–201.
Wolters, C.H., Anwander, A., Tricoche, X., Weinstein, D., Koch, M.A. and Macleod, R.S. (2006) Influence of tissue conductivity anisotropy on EEG/MEG field and return
current computation in a realistic head model: a simulation and visualization study using high-resolution finite element modeling. Neuroimage, 30(3): 813–826.
Yamashita, Y. (1982) Theoretical studies on the inverse problem in electrocardiography and the uniqueness of the solution. IEEE Trans. Biomed. Eng., 29: 719–724.
Yao, D., Wang, L., Oostenveld, R., Nielsen, K.D., Arendt-Nielsen, L. and Chen, A.C. (2005) A comparative study of different references for EEG spectral mapping: the issue of the neutral reference and the use of the infinity reference. Physiol. Meas., 26(3): 173–184.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 8
Emotional and semantic networks in visual word processing: insights from ERP studies Johanna Kissler, Ramin Assadollahi and Cornelia Herbert Department of Psychology, University of Konstanz, P.O. Box D25, D-78457 Konstanz, Germany
Abstract: The event-related brain potential (ERP) literature concerning the impact of emotional content on visual word processing is reviewed and related to general knowledge on semantics in word processing: emotional connotation can enhance cortical responses at all stages of visual word processing following the assembly of the visual word form (up to 200 ms), such as semantic access (around 200 ms), allocation of attentional resources (around 300 ms), contextual analysis (around 400 ms), and sustained processing and memory encoding (around 500 ms). Even earlier effects have occasionally been reported with subliminal or perceptual-threshold presentation, particularly in clinical populations. Here, the underlying mechanisms are likely to diverge from those operational in standard natural reading. The variability in the timing of the effects can be accounted for by dynamically changing lexical representations that can be activated as required by the subjects' motivational state, the task at hand, and additional contextual factors. Throughout, subcortical structures such as the amygdala are likely to contribute to these enhancements. Further research will establish whether and when emotional arousal, valence, or additional emotional properties drive the observed effects and how experimental factors interact with these. Meticulous control of other word properties known to affect ERPs in visual word processing, such as word class, length, frequency, and concreteness, as well as the use of more standardized EEG procedures, is vital. Mapping the interplay between cortical and subcortical mechanisms that give rise to amplified cortical responses to emotional words will be of highest priority for future research.

Keywords: emotion; semantics; word processing; event-related potentials; healthy volunteers; clinical populations

Introduction

Influential dimensional approaches to the study of emotion derive their basic dimensions from analyses of written language. Osgood and collaborators, using the 'semantic differential' technique, were the first to empirically demonstrate that affective connotations of words are determined by three principal dimensions, namely evaluation, potency, and activity (Osgood et al., 1957), the first two accounting for the majority of the variance. On the semantic differential, a word's evaluative connotation is determined by ratings on a multitude of seven-point scales, spanned by pairs of antonyms such as hot–cold, soft–hard, happy–sad, etc. Factor analyses of the judgments of many words on such scales, given by large subject populations, reveal a three-dimensional evaluative space, whose structure has been replicated many times and across different cultures (Osgood et al., 1975). Figure 1 provides an illustration of the evaluative space determined by the semantic differential.
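As a toy illustration of this logic, the sketch below simulates a word-by-scale rating matrix generated from three latent dimensions and recovers them by principal component analysis; all numbers are invented, and Osgood's original work used factor analysis of real judgments rather than this simulation.

    # Simulated semantic-differential ratings: 3 latent dimensions (E, P, A)
    # mixed into bipolar rating scales, then recovered via SVD/PCA.
    import numpy as np

    rng = np.random.default_rng(0)
    n_words, n_scales = 200, 12                      # hypothetical design
    latent = rng.normal(size=(n_words, 3))           # "true" E, P, A coordinates
    loadings = rng.normal(size=(3, n_scales))        # how each scale mixes E/P/A
    ratings = latent @ loadings + 0.3 * rng.normal(size=(n_words, n_scales))

    centered = ratings - ratings.mean(axis=0)
    _, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    print("variance explained by the first three components:",
          np.round(explained[:3], 2))                # three factors dominate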
Corresponding author. Tel.: +49-7531-884616; Fax: +49-7531-4601; E-mail: [email protected]
DOI: 10.1016/S0079-6123(06)56008-X
Fig. 1. A three-dimensional affective space of connotative meaning as postulated by Osgood (1957, 1975). The figure depicts the three orthogonal bipolar dimensions, evaluation (E), activity (A) and potency (P), and gives examples of prototypical words for each dimension and polarity. (Adapted from Chapman, 1979.)
Osgood's principal dimensions are at the core of other circumplex theories of affect (Lang, 1979; Russell, 1980). For instance, Lang and colleagues posit that human affective responses are determined by the dimensions valence, arousal, and dominance, again the first two having the largest impact, leading these authors to propose a model of affect defined by the dimensions of arousal and valence. Within such a two-dimensional affective space, different classes of stimuli such as pictures, sounds, and words cluster in a U-shaped manner. Highly arousing stimuli usually receive valence ratings as either highly pleasant or highly unpleasant, whereas low-arousing material is generally regarded as more neutral with regard to valence (see
Fig. 2 for an illustration derived from the ratings of German word stimuli). The impact of valence and arousal on various central nervous and peripheral physiological indicators of affective processing has been repeatedly validated using picture and sound media (Lang et al., 1993; Bradley and Lang, 2000; Junghöfer et al., 2001; Keil et al., 2002). Word stimuli assessed for perceived arousal and valence, such as the 'Affective Norms for English Words' (ANEW; Bradley and Lang, 1998), have also been put to use in physiological research (see, e.g., Fischler and Bradley, this volume), although so far the resulting evidence seems more restricted than that for pictorial material.
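The U-shaped relation can be summarized with a quadratic fit of arousal on valence; the sketch below uses simulated ratings purely for illustration, with an invented generating model (real norms would come from datasets such as ANEW).

    # Illustrating the U-shaped valence-arousal relation with simulated ratings
    # on 9-point SAM scales; the generating model and noise level are invented.
    import numpy as np

    rng = np.random.default_rng(1)
    valence = rng.uniform(1, 9, size=500)
    arousal = 2 + 0.3 * (valence - 5) ** 2 + rng.normal(0, 0.5, size=500)

    # A positive quadratic coefficient captures arousal rising toward both poles.
    quad, lin, const = np.polyfit(valence - 5, arousal, deg=2)
    print(f"quadratic coefficient: {quad:.2f} (positive -> U shape)")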
Fig. 2. Illustration of a two-dimensional affective space spanned by the dimensions arousal and valence. Examples of the relative positions of some German adjectives and nouns used in our studies are given in English translation. Along the x- and y-axes are depicted the arousal and valence scales of the Self-Assessment Manikin (SAM; Bradley and Lang, 1994) used to rate individual emotional responses to the words.
Emotions are generally viewed as culturally universal, largely innate, evolutionarily 'old' signaling and activation systems, residing in 'old', subcortical parts of the brain. They are designed to promote survival in critical situations, i.e., to signal and activate fight, flight or feeding, attachment, and sexual behavior. Reading and writing, by contrast, represent comparatively recent developments in the history of mankind, and in individual development they are acquired much later than oral language. Consequently, these skills are often regarded as
cultural achievements, but during the acquisition of written language considerable regional specialization emerges in the human brain (Warrington and Shallice, 1979, 1980; Dehaene et al., 2005). Reading acts as a secondary process that utilizes the processing capabilities of the earlier acquired auditory language system once the analysis of visual word form is completed (Perfetti, 1998; Everatt et al., 1999; Perfetti and Sandak, 2000). Language and emotion share a communicative function but linguistic communicative functions are obviously not restricted to the communication of affect.
How do the 'emotional brain' and the 'linguistic brain' interact when written words with emotional connotations are encountered? Emotion theories posit that linguistic expressions are stored within semantic networks that encompass links to all aspects of their linguistic and pragmatic usages and emotional connotations (Lang, 1979; Bower, 1981). Thus, the word 'gun', for example, not only represents the object itself, but also includes links to its operations, use, purposes, and their consequences as well as their emotional evaluation (Bower, 1981). A converging view is shared in neurolinguistics (Pulvermüller, 1999) and cognitive semantics (Barsalou, 1999): all information related to a word is stored in a dynamic network. Recent evidence suggests that subnetworks¹ representing different aspects of a word's lexical representation can be separately and dynamically activated. For instance, differential neuromagnetic activations of semantic subnetworks have recently been shown for subclasses of plant or animal names (Assadollahi and Rockstroh, 2005). Moreover, biasing contextual constraints can affect the timing of access to the dominant vs. subordinate meaning of homonyms (Sereno et al., 2003), challenging the modular view that in word processing initially all lexical entries have to be exhaustively accessed. Functional divisions of the semantic system, mirroring functional divisions in the organization of the cortex, have repeatedly been shown for verbs denoting different types of actions (Pulvermüller et al., 2001b; Hauk et al., 2004). Investigating verbs pertaining to movements carried out with different parts of the body, these authors demonstrate that the meaning of action words is reflected by the correlated somatotopic activation of motor and premotor cortex.

¹ The terms subnetwork or sub-representation as used here are not necessarily intended to imply a fixed hierarchical ordering of the neural networks coding for different aspects of semantics, although a certain degree of hierarchical ordering may indeed exist. Instead, subnetwork or sub-representation refers to the fact that different neural networks are likely to code for different aspects of a word's meaning, such as animacy, emotional connotation, grammatical gender, etc.
These patterns of coactivations presumably reflect individual learning history, where the so-called referential meaning has been acquired by the repeated coactivation of the body movement and the descriptive speech pattern, for instance when a child observes or carries out a gesture such as throwing and simultaneously hears the caregiver say the respective word. Later, in the acquisition of written language, this phonological code is mapped onto the visual word form (Perfetti and Sandak, 2000). For emotional concepts, Lang et al. (1993, 1994) assume that not only associated semantic but also physiological and motor response information is coactivated in associative networks. Figure 3 illustrates such a multilevel network representation of an emotional scene, encompassing a semantic code of the given situation as well as associated motor and physiological responses. Thus, the semantic network that codes for 'emotional semantics' could include the neuronal circuitry processing the associated emotion (see also Cato and Crosson, this volume, for a related suggestion). How does emotional content influence different stages of visual word processing? Here, the literature is sparse. Event-related brain potentials (ERPs), the scalp-recorded averaged synchronized activity of several thousand cortical pyramidal cells, have successfully been used to delineate different stages of visual word processing (for reviews see, e.g., Posner et al., 1999; Tarkiainen et al., 1999). A closer look also reveals that a considerable number of electrophysiological studies of emotional processing have employed visually presented word stimuli. Some, particularly early, studies have used the semantic differential as their theoretical vantage point. In fact, in the wake of Osgood's studies, the examination of ERP correlates of emotional semantics generated substantial research interest (Chapman et al., 1978, 1980; Chapman, 1979; Skrandies, 1998; Skrandies and Chiu, 2003). More recently, two-dimensional valence-arousal models have been used as a framework for research (see Fischler and Bradley, this volume). However, many studies used words with emotional connotations as experimentally convenient instances of a broader class of emotional events (Anderson and Phelps, 2001; Dijksterhuis and Aarts, 2003) or, conversely, as a semantic class without much reference to any particular theory of language and/or emotion (Begleiter and Platz, 1969). So far, little systematic knowledge has been gathered on the relationship between the emotion and language
Fig. 3. A network representation of a complex emotional scene (exam situation) illustrates how, in dynamic emotional processing, perceptual, semantic, and response systems are interactively linked. Activation on any level of this system can spread to other subsystems. (Adapted after Lang, 1994.)
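A minimal sketch of spreading activation in such an associative network follows; the nodes, weights, and decay parameter are toy values chosen for illustration (cf. Bower, 1981; Lang, 1994), not part of the original model specification.

    # Toy associative network in the spirit of Fig. 3: activating the "exam"
    # node spreads activation to semantic, motor, and physiological subsystems.
    network = {
        "exam": {"say 'I'm unprepared'": 0.6, "heart races": 0.8, "fidget": 0.4},
        "heart races": {"arousal": 0.9},
        "fidget": {"arousal": 0.3},
    }

    def spread(activation, network, steps=2, decay=0.5):
        """Propagate activation along weighted links for a fixed number of steps."""
        for _ in range(steps):
            updated = dict(activation)
            for node, level in activation.items():
                for target, weight in network.get(node, {}).items():
                    updated[target] = updated.get(target, 0.0) + decay * weight * level
            activation = updated
        return activation

    print(spread({"exam": 1.0}, network))   # activation reaches 'arousal' via links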
systems in visual word processing and possible implications for the underlying neural implementation of meaning. ERP recordings have an excellent temporal resolution, allowing for a fine-grained analysis of the temporal sequence of different processing stages. Their spatial resolution is more restricted, and inferences from the spatial distribution of scalp-measured ERPs to their neural generators can only be made with caution. The number of electrodes and the recording reference used influence the probability of finding effects, the generalizability of these findings, and the accuracy of spatial localization (see also Junghöfer et al., this volume). The present review will summarize and systematize existing studies on the role of emotion in visual word processing, both in healthy volunteers and in clinical populations, and relate this evidence
to the available knowledge on ERP indices of stages of visual word processing. First, we will address studies reporting early effects of emotional content on ERP responses, occurring within the first 300 ms after stimulus onset, separately for healthy volunteers and clinical populations; second, we will review effects of emotional content on the late ERPs to visual words; and third, discuss how an interplay of subcortical and cortical mechanisms of emotion and word processing may give rise to the observed effects. To facilitate comparisons between studies, main methodological parameters and results concerning effects of emotional content, as stated in the reviewed studies, are summarized in two tables in the appendix. Table A1 describes studies with healthy volunteers; Table A2 describes work with clinical populations. There, it becomes immediately apparent that the studies
described vary considerably in theoretical approach and methodology.
Early effects – occurring within 300 ms after stimulus onset

Healthy volunteers

The extent to which early exogenous components of the human event-related potential are subject to modification by nonphysical stimulus characteristics is a matter of ongoing controversy. In visual word processing, a traditional view holds that within the first 150–200 ms after a word has been presented, specific perceptual features of written words, but no meaning-related attributes, are extracted (Schendan et al., 1998; Posner et al., 1999), and for many years the N400 potential, a centro-parietal negativity arising around 400 ms after stimulus onset, has been viewed as 'the' index of semantic processing (Kutas and Federmeier, 2000). Using other types of visually presented stimuli with emotional content, such as faces or pictures, remarkable ERP differences between emotionally significant and neutral stimuli have been found within the first 300 ms after stimulus onset, some even within the first 100 ms. In his single-subject study of ERP responses to emotional words, Lifshitz (1966) failed to find a visually apparent differentiation between emotional and neutral words within the first 500 ms after word onset, although upon visual inspection the difference between the likewise presented erotic and neutral line drawings was sizeable. However, in the meantime a considerable body of evidence indicates that even very early ERP responses can diverge between emotional and neutral words. Thus, probably the first quantitative study on the effects of emotional content on ERP indices of word processing found differences between negative 'taboo words' and neutral words already within the first 200 ms after word presentation (Begleiter and Platz, 1969). Notably, this study was entitled 'Cortical evoked potentials to semantic stimuli', expressing a view of emotion as a vital part of semantics. The technological standards at the time were not very sophisticated and the authors
recorded only from a single right occipital electrode (O2), but ERP differences due to emotional content appeared in the ERP tracings already in the P1–N1 complex: twenty repetitions, presented minimally above threshold, of each of two clearly negative 'four-letter words' led to larger responses than twenty repetitions of each of the words 'tile' and 'page'. This pattern held for both a passive viewing condition and a naming condition. Several years later, Kostandov and Arzumanov (1977) contrasted ERP responses to both subliminally and supraliminally presented 'conflict' and neutral words. In the subliminal condition, the earliest differences between 'conflict' and neutral words appeared around 200 ms after stimulus onset. Subliminally presented 'conflict words', apparently relating to the subjects' relationship conflicts caused by jealousy, led to larger N200 responses than neutral words. Supraliminally presented words led to later differentiations starting around 300 ms, in the P3a window. Unfortunately, the description of the study is somewhat sparse on details regarding the materials used. Chapman et al. (1978) sought to electrophysiologically validate Osgood's connotative dimensions of meaning (see Fig. 1), namely evaluation (E), potency (P), and activity (A). They recorded from one midline electrode (CPz) referenced to linked mastoids while subjects had to name briefly flashed (17 ms) words. Twenty words each from the extreme ends of the three connotative dimensions, E, P, and A, yielding six semantic classes (E+/−, P+/−, and A+/−), were used as stimuli. Random sequences of all these words were presented 12–20 times to the subjects (until a set criterion of artifact-free trials was entered into the average). Although the exact multivariate analysis of the data in the original report is somewhat hard to reconstruct, in essence Chapman et al. (1978) were able to statistically differentiate between all six connotative dimensions using ERP data from a single central channel.
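The logic of such a multivariate differentiation can be reconstructed in modern terms as a cross-validated classification of the six categories from single-channel ERP samples. The sketch below uses simulated data and linear discriminant analysis, which stands in for, and is not identical to, the procedure actually used by Chapman et al. (1978); all dimensions and effect sizes are invented.

    # Hypothetical stand-in for the multivariate analysis: classify six
    # connotative categories (E+/-, P+/-, A+/-) from simulated single-channel ERPs.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    n_per_class, n_timepoints = 100, 128              # invented dimensions
    labels = np.repeat(np.arange(6), n_per_class)
    erps = rng.normal(size=(labels.size, n_timepoints))
    erps[np.arange(labels.size), labels * 20] += 1.0  # toy class-specific peaks

    scores = cross_val_score(LinearDiscriminantAnalysis(), erps, labels, cv=5)
    print(f"cross-validated six-class accuracy: {scores.mean():.2f}")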
Whereas the different connotative dimensions are not mapped onto any specific ERP components, in the original grand-averages a second positive peak at around 260 ms after word onset is noticeable that clearly differentiates between the extreme ends (+ and −) of all three dimensions, but not so much among E, P, and A. In general, the material in this study was well controlled for physical aspects but apparently not for other linguistic attributes such as word frequency, word class, or concreteness. Replication studies by the same group investigated the effects of explicit affective rating tasks on ERPs (Chapman, 1979) and directly compared the effects of affective rating and word naming in one study, yielding similar results (Chapman et al., 1980). Thus, a series of studies demonstrated reliable statistical differentiation between six connotative categories within 500 ms after stimulus onset on the basis of ERP data from a single channel. Extending their 1969 study across a larger selection of stimuli, Begleiter et al. (1979) recorded ERPs elicited by unpleasant, pleasant, and neutral words, as 10 subjects had to either identify the last vowel in a presented word or give their personal affective evaluation of the word. Recordings were made from three electrodes over each hemisphere, referenced to linked mastoids. However, results are reported only for two of these electrodes, namely P3 and P4. Stimuli were the 62 most unpleasant, 62 most neutral, and 62 most pleasant five-letter words derived from a larger pool of words previously assessed with the semantic differential (Osgood et al., 1957). Words were presented very briefly (20 ms). The words' emotional meaning affected N1–P2 peak-to-peak amplitude during emotional evaluation. ERP responses to words evaluated as pleasant, neutral, or unpleasant could be distinguished statistically at both electrodes. Overall, effects of emotional content were somewhat more pronounced over the left hemisphere and were restricted to the affective evaluation condition. At the left-hemispheric electrode, ERPs were also generally larger when the words were shown in the emotional evaluation than in the letter-identification task. Unlike the previous studies, that of Begleiter et al. (1979) was the first to provide evidence for a major impact of task, showing early ERP effects of emotional content only during active evaluation. Skrandies (1998), studying 'ERP correlates of semantic meaning', used an approach conceptually similar to Chapman's (1978, 1979, 1980). A pool of 60 nouns representing the bipolar extremes on Osgood's E, P, and A dimensions was selected and presented in a rapid serial visual presentation
(RSVP) design, i.e., a continuous stream of alternating words without an interstimulus interval. Skrandies (1998) used a comparatively slow RSVP design, presenting each word for 1 s. Ten stimuli per category and polarity were selected and each stimulus was repeated 40 times, yielding 400 averages per category, thus optimizing the signal-to-noise ratio. Subjects were instructed to visualize the words and to remember them for a subsequent memory test in order to ensure active engagement with the stimuli. EEG was recorded from 30 channels referenced to the average reference. Brain responses to the emotional word categories differed in six distinct time windows, with respect to either their peak latency, the associated global field power, or the location of the centroids of the scalp distribution. Remarkably, most of the differences occurred within the first 300 ms after word presentation, starting with the P1 at around 100 ms. These results were recently extended cross-culturally in a virtually identical study of affective meaning in Chinese, yielding a somewhat different but equally complex pattern of differences depending on emotional word content, which were restricted to the first 300 ms after word onset (Skrandies and Chiu, 2003). Together with Begleiter and Platz (1969), Skrandies' studies report probably the earliest meaning-dependent differentiation between words. Consequently, these studies are frequently cited in the visual word processing literature, albeit without reference to the particular emotional semantic contents used. Schapkin et al. (2000) recorded ERPs as subjects evaluated nouns as emotional or neutral. The stimuli consisted of a total of 18 words, 6 pleasant, 6 unpleasant, and 6 neutral, which were matched across emotional categories for word length, frequency, concreteness, and initial letters. Words were presented peripherally, to the left and right visual fields, for 150 ms, while the EEG was recorded from 14 linked-mastoid-referenced electrodes, eight of which were consistently analyzed. Stimuli were repeated 32 times, 16 times in each visual field. The earliest effect of the emotional significance of the words on ERP responses was observed in the P2 window, peaking at 230 ms. At bilateral central sites, P2 responses to pleasant words were larger than responses to unpleasant and neutral ones. While the P2 response was generally larger
over the left hemisphere, the emotion effect was not lateralized. Similar effects of emotional content on ERP responses to words were also observed in later time windows (see below). As already suggested by Kostandov and Arzumanov (1977), the powerful effects of emotional connotation on brain responses appear to extend even below the limits of conscious perception. Bernat et al. (2001), recording from six linked-mastoid-referenced scalp electrodes, report a differentiation between both subliminally and briefly (40 ms) presented unpleasant and pleasant adjectives at left-hemispheric electrode positions already in the P1 and N1 time ranges, as participants simply maintained fixation on a central cross and viewed the computer screen without an explicit behavioral response being required. In the P1 and N1 windows, unpleasant adjectives led to larger ERP responses in the left hemisphere. Overall larger responses to unpleasant as compared to pleasant adjectives were obtained in the subsequent P2, P3a, and LPC time ranges, with the main effects of emotional content having earlier onsets in the subliminal condition. The affective valence of the stimuli had been determined by assessing a larger pool of words on five bipolar scales from the evaluative dimension of the semantic differential. Both the subsequent ERP study subjects and an independent sample had repeatedly rated the stimuli. ERPs to 6 repetitions of the 10 most extremely pleasant and unpleasant as well as to 12 neutral words were recorded. Thus, study subjects had considerable experience with the words. Also, it is unclear whether the stimuli were assessed on other potentially relevant emotional or linguistic dimensions such as arousal and dominance, or word length, frequency, and abstractness. Still, these results, as well as data from Kostandov and Arzumanov (1977) or Silvert et al. (2004) and Naccache et al. (2005), in principle support the possibility of measurable physiological responses to subliminally presented emotional words and add to the evidence of emotional content-dependent P1 differences (Begleiter and Platz, 1969; Skrandies, 1998; Skrandies and Chiu, 2003). Recently, Ortigue et al. (2004) also reported a very early effect of word emotionality in a dense-array ERP study recording from 123 scalp channels. The task consisted of a lexical decision to
very briefly flashed (13 ms) stimuli. Subjects had to indicate which of two simultaneously presented letter combinations in both visual fields constituted an actual word. Stimuli were half neutral and half emotional nouns of both pleasant and unpleasant valence. They were matched for word length and frequency and selected from a larger pool of words pre-rated on a bipolar seven-point scale spanning the neutral–emotional continuum. Overall, emotional words presented in the right visual field were classified most accurately and fastest. However, the relative advantage for emotional words was larger for words presented in the left visual field. Using a source estimation approach (LAURA), the authors identified a stable topographic pattern from the spatiotemporal distribution of the ERP data that accounted for the processing advantage of emotional words in the right visual field. This pattern emerged between 100 and 140 ms after stimulus onset, i.e., mostly in the P1/N1 window. Curiously, it was localized to primarily right-hemispheric extra-striate cortex. Surprisingly, no specific neurophysiological correlate of the even more pronounced advantage for emotional words in the left visual field was identified within the first 250 ms after stimulus presentation, to which this study restricted its analysis. Recent data from our own laboratory also produced evidence for early differences between cortical responses to emotionally arousing (both pleasant and unpleasant) and neutral adjectives and nouns. The words' emotional content had been predetermined in a separate experiment, obtaining valence and arousal ratings on two nine-point rating scales (see Fig. 2) from 45 undergraduate students. According to these ratings, highly arousing pleasant and unpleasant and low-arousing neutral words were selected. Different subsets of these words were used in three studies where ERPs from 64 scalp sites were measured. Neutral and highly arousing pleasant and unpleasant words, matched for word length, frequency, and in one experiment also for concreteness, were presented in RSVP designs to subjects instructed to read the words. Across three different stimulus presentation durations (333, 666, 1000 ms) and regardless of word type (adjectives or nouns), a left-hemispheric dominant occipitotemporal negativity
differentiated emotional (both pleasant and unpleasant) from neutral words. This negativity had its maximum around 260 ms after stimulus onset. The influence of stimulus repetition was assessed, but neither habituation nor sensitization was found for the emotional–neutral difference within the five repetitions used (Fig. 4 illustrates the effect for the 666 ms presentation rate). In one of the studies we also manipulated task demands, instructing subjects to attend to and count one of the two word classes (adjectives or nouns). Interestingly, this manipulation did not affect the enhanced early negativity to emotional words but had a significant impact on the later positivity. Herbert et al. (2006) also recorded ERPs from 64 average-referenced scalp channels, as 26 subjects evaluated the emotional significance of
highly arousing pleasant and unpleasant as well as neutral adjectives. The affective content of the stimuli had been predetermined in a separate population using the above-described procedure. Words were presented for a relatively long period, namely 5 s. In this study the P2 component was the first index of differential processing of emotional vs. neutral words. This P2 component primarily responded to perceived stimulus intensity/arousal and did not differentiate brain responses to pleasant from those to unpleasant words. The same was true for the subsequent P3a component, but the picture changed for the later LPC component and the simultaneously recorded startle response, both of which were more pronounced for pleasant than for unpleasant and neutral words. This sequence of effects is depicted in Fig. 5.
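As an aside, the rating-based stimulus selection described above can be sketched in a few lines of Python. The following is a minimal illustration only; the word entries, rating values, and cut-offs are hypothetical placeholders rather than the materials actually used in these studies.

import statistics

# Hypothetical normative data: (word, valence 1-9, arousal 1-9,
# length in letters, log word frequency).
ratings = [
    ("holiday", 7.9, 6.8, 7, 2.1),
    ("murder",  1.6, 7.2, 6, 2.0),
    ("chair",   5.1, 2.3, 5, 2.2),
    # ... further normed words would follow
]

def select(words, val_lo, val_hi, ar_lo, ar_hi):
    """Keep words whose valence and arousal ratings fall in the given bands."""
    return [w for w in words
            if val_lo <= w[1] <= val_hi and ar_lo <= w[2] <= ar_hi]

pleasant   = select(ratings, 7.0, 9.0, 6.0, 9.0)  # high valence, high arousal
unpleasant = select(ratings, 1.0, 3.0, 6.0, 9.0)  # low valence, high arousal
neutral    = select(ratings, 4.0, 6.0, 1.0, 4.0)  # mid valence, low arousal

# Rough matching check: categories should not differ in length or frequency.
# In practice, one would iterate on the selection or test this statistically.
for name, cat in (("pleasant", pleasant), ("unpleasant", unpleasant),
                  ("neutral", neutral)):
    if cat:
        print(name, statistics.mean(w[3] for w in cat),
              statistics.mean(w[4] for w in cat))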
Fig. 4. Early arousal-driven enhancement of cortical responses to emotional words. Uninstructed reading of both pleasant and unpleasant words in a rapid serial visual presentation paradigm (RSVP, 666 ms stimulus duration) leads to larger occipitotemporal negativities than reading of neutral words. The effect is illustrated at two occipital sensors (O9, O10), and the scalp topography of the difference potential (emotional minus neutral words) is depicted. Grand-averages from 16 subjects are shown.
Fig. 5. Difference maps of cortical activation for emotional minus neutral words in a covert evaluation task. Averaged activity in three time windows is shown: P2 (180–280 ms), P3 (280–400 ms), and LPC (550–800 ms). For P2 and P3 both pleasant and unpleasant words are associated with larger positivities than neutral ones. In the LPC window only processing of pleasant words diverges from neutral. The time course of the activity is shown at electrode Pz. Grand-averages from 26 subjects are shown.
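The difference maps in Fig. 5 follow a standard ERP recipe: average the voltage within a component's latency window for each condition and each electrode, then subtract the neutral from the emotional condition. A minimal numpy sketch of this computation is given below; the sampling rate, baseline length, and array contents are assumed placeholders, not the actual data of the study.

import numpy as np

SFREQ = 500.0     # sampling rate in Hz (assumed)
BASELINE = 0.1    # assumed 100 ms pre-stimulus interval included in each epoch

def window_mean(erp, t_start, t_stop):
    """Mean amplitude per channel within a post-stimulus latency window (s)."""
    i0 = int(round((t_start + BASELINE) * SFREQ))
    i1 = int(round((t_stop + BASELINE) * SFREQ))
    return erp[:, i0:i1].mean(axis=1)

# Grand-averaged ERPs as (n_channels, n_samples) arrays; random placeholders.
rng = np.random.default_rng(0)
erp_emotional = rng.normal(size=(64, 600))
erp_neutral = rng.normal(size=(64, 600))

# Windows as in Fig. 5: P2 (180-280 ms), P3 (280-400 ms), LPC (550-800 ms).
windows = {"P2": (0.18, 0.28), "P3": (0.28, 0.40), "LPC": (0.55, 0.80)}
for label, (t0, t1) in windows.items():
    diff = window_mean(erp_emotional, t0, t1) - window_mean(erp_neutral, t0, t1)
    print(label, "max |emotional - neutral| across channels:", np.abs(diff).max())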
Clinical studies

One of the first studies to use emotional words as a tool to address processing biases (or a lack thereof) in clinical populations is that of Williamson et al. (1991), who investigated behavioral and cortical responses to pleasant, unpleasant, and neutral words in psychopathic and nonpsychopathic prisoners. Subjects had to detect words in a sequence consisting of words and nonwords while their EEG was recorded from five scalp positions referenced to linked mastoids. Stimuli were presented vertically for 176 ms, separately to either visual field, and repeated three times. Stimuli had been matched for length,
number of syllables, word frequency, and concreteness but differed in emotional connotation. Nonpsychopathic subjects had faster reaction times and larger P2 responses to both pleasant and unpleasant emotional words than to neutral ones. These differences induced by emotional arousal extended into the late positive component (LPC) time range in controls but were completely absent in psychopaths. Weinstein (1995) assessed correlates of enhanced processing of threatening and nonthreatening verbal information in university students with elevated or normal trait anxiety levels. Subjects read sentences with threatening and pleasant content that served as primes for subsequently presented threat-related,
neutral, or pleasant words. ERPs in response to the target words were assessed at Fz, Cz, and Pz, as subjects had to decide whether the target word contextually fit the previously shown sentence. Highly anxious subjects were reported to exhibit a larger frontal N1 in the threat-priming condition and an enhanced P400 (i.e., a reduced N400, see below) in a later time window. In retrospect, a number of methodological problems seem to exist in this study, or the data presentation may contain errors. For instance, a large ERP offset at baseline is shown, the presented condition means occasionally do not seem to correspond to the ERPs displayed, and information on the linguistic properties of the stimuli is missing. However, taken at face value, the results indicate heightened selective attention to (N1) and facilitated semantic integration of (N400/P400) threat-related information in students with elevated trait anxiety levels. A similar pattern of early ERP differences was also found in a study investigating the processing of pain-related, body-related, and neutral adjectives in healthy volunteers and pre-chronic pain patients. Knost et al. (1997) report enhanced N1 responses to the pain-related stimuli at a left frontal sensor (F3) in the patient group. ERPs had been recorded from 11 scalp positions referenced to linked mastoids while subjects had to name words that were presented at the individually determined perceptual threshold. In both groups, pain- and body-related words produced larger positivities than neutral ones in a later time window (600–800 ms) and were also associated with larger startle eye-blink responses on separately administered startle probe trials. The authors interpret their findings as evidence for preconsciously heightened attention to unpleasant, pain-related stimuli in the patients. These results were paralleled in an analogous study with chronic pain patients (Flor et al., 1997). Chronic pain patients had a larger frontal N1 response to pain-related words than comparison subjects. Additionally, a general hemispheric asymmetry emerged: N1 responses were larger over the right side of the head for pain-related words and over the left side for neutral words. The enhanced responses to pain-related words in the patient group were also visible in a centro-parietally maximal N2 component. In the P2 window, right-hemispheric responses to pain
words were likewise larger in the patients. In contrast to the first study, no differential effects of emotional category were observed in subsequent time windows (P3 and LPC). The ERP results are taken to reflect heightened preconscious allocation of attention (N1) and stimulus discrimination (N2) for disorder-related words in pain patients, but show no evidence of further evaluative processing. In a similar vein, Pauli et al. (2005) studied cognitive biases in panic patients and healthy volunteers, analyzing ERP responses from nine scalp channels to panic-related unpleasant and neutral words that were presented, in separate runs, at individually determined perceptual thresholds and for 1000 ms. Early ERPs differentiated panic patients from comparison subjects. At threshold presentation, patients showed two enhanced early frontal positivities in response to panic words, one between 100 and 200 ms (P2) and the other (P3a) between 200 and 400 ms post-stimulus onset. Both early effects were absent at the longer exposure duration and did not occur at all in the control group. Interestingly, subsequent positivities between 400 and 600 ms as well as between 600 and 1000 ms differentiated panic words from neutral words in both groups and for both presentation durations. This pattern of data resembles the results of Knost et al. (1997). Kissler and colleagues (in preparation) assessed processing biases in depressed patients and comparison subjects using the above-described RSVP paradigm, recording from 256 scalp electrodes and comparing the amplitude and scalp distribution of the previously described early negativity to pleasant, unpleasant, and neutral adjectives matched for length and frequency. Around 250 ms after word onset (see Fig. 3), comparison subjects displayed the above-described left-hemispheric dominant enhanced negativity for emotional words, pleasant and unpleasant alike. Depressed patients, by contrast, exhibited this enhanced negativity solely in response to the unpleasant words and only in the right hemisphere.

Comparing early emotional and early semantic processing

In sum, numerous studies have found early (<300 ms) amplifications of ERPs in response to
words with emotional content compared with neutral words. The occurrence of such effects is remarkable, since controlled conscious processing has been suggested to arise only with the P3/N400 components (Halgren and Marinkovic, 1995), implying that emotion can affect preconscious stages of word processing. Such effects appear to be more pronounced in various clinical populations, reflecting heightened sensitivity and orienting to unpleasant, disorder-related material (Weinstein, 1995; Flor et al., 1997; Knost et al., 1997; Pauli et al., 2005). Other patient groups, by contrast, seem to selectively lack processing advantages for emotional words, psychopaths showing no cortical differentiation between emotional and neutral words (Williamson et al., 1991) and depressed patients showing preferential processing of unpleasant but not pleasant words (Kissler et al., in preparation). At any rate, processing biases in a number of clinical populations are reflected in their patterns of early responses to pleasant, unpleasant, and neutral words. A debated issue in emotion research pertains to whether, when, and how cortical responses differ as a function of arousal, valence, and additional factors such as dominance, or complex interactions of these. Here, the data must remain somewhat inconclusive, as the studies discussed differed vastly in the dimensions included and the assessment methods used. Studies that assessed their materials with the semantic differential found that brain responses differentiate between all dimensions and polarities within the first 300 ms after stimulus onset (Chapman et al., 1978, 1980; Skrandies, 1998; Skrandies and Chiu, 2003). However, the arising pattern of results is so complex that it is hard to gauge the effect of each individual dimension on brain responses. The vast majority of studies report generally larger ERP responses to emotional than to neutral words, with some studies reporting these effects even in the absence of a task that would explicitly require processing of emotional content or other types of semantic access (Begleiter and Platz, 1969; Bernat et al., 2001; Kissler et al., submitted manuscript). However, occasionally, early emotion effects in word processing were found to be restricted to situations where explicit
processing of the emotion dimension is required by the task (Begleiter et al., 1979). Directly comparing the impact of pleasant vs. unpleasant word content yields mixed results, with some studies finding larger early effects for pleasant words (Schapkin et al., 2000) and others larger effects for unpleasant ones (Bernat et al., 2001). As mentioned above, the subjects' clinical or motivational status may bias their cortical responses in either direction. Task characteristics as well as the timing of stimulus presentation may also have an additional impact, but so far the influence of these parameters is not well understood. Of note, some of the described effects occurred even before 200 ms after word onset, in a time range in which, from a traditional theoretical standpoint, meaning-related processing differences would not be expected (Schendan et al., 1998; Cohen et al., 2000). These very early effects of emotional content on ERP indices of visual word processing are rather heterogeneous with regard to timing, locus, and direction. Some of the inconsistencies are probably related to differences in instrumentation and recording methodology, with the number of electrodes and the choice of reference electrode(s) representing but the most obvious differences. The described studies also differ vastly in the way the emotional content of the stimulus material was assessed, as well as in the extent to which other, nonemotional, linguistic factors such as word class, length, frequency, and concreteness were controlled. Nevertheless, the bulk of the evidence suggests that, indeed, under certain circumstances the emotional connotation of words can affect even the earliest stages of preconscious sensory processing. Thus, the challenge is to specify under which circumstances such emotional modulation of earliest processing may occur and what the underlying mechanisms are. Two experimental factors that may contribute to the emergence of very early emotion effects arise from the reviewed studies: first, very brief stimulus presentation, near or even below the perceptual threshold (Begleiter and Platz, 1969; Kostandov and Arzumanov, 1977; Chapman et al., 1978, 1980; Flor et al., 1997; Knost et al., 1997; Bernat et al., 2001; Ortigue et al., 2004; Pauli et al.,
2005), and second, repeated presentation of comparatively small stimulus sets (Begleiter and Platz, 1969; Chapman et al., 1978, 1980; Skrandies, 1998; Skrandies and Chiu, 2003; Ortigue et al., 2004). None of the cited studies has explicitly assessed the effect of stimulus repetition on the latency of emotional–neutral ERP differences. Our own studies of repetition effects on negative difference waves distinguishing emotional from neutral content around 250 ms after stimulus onset show no evidence of change within five repetitions. However, some of the cited studies used far more than five stimulus repetitions, and studies of early semantic processing indeed suggest an effect of stimulus repetition on the timing of meaning-related differences in cortical activity: Pulvermüller and colleagues report neurophysiological evidence of differences in semantic processing from 100 ms post word onset (Pulvermüller et al., 2001a). They used a task in which a single subject was repeatedly, over several days, presented with a set of 16 words that she had to monitor and hold in active memory, as responses to occasionally presented new words were required. Thus, with above-threshold presentation, preactivation of the cortical networks coding for meaning, by using tasks that require continuous attention to and working-memory engagement with the stimuli as well as many repetitions, may foster earliest semantic processing differences, nonemotional and emotional alike. Further, a recent study on repetition effects in symbol processing found an increase in N1 amplitude (around 150 ms) across three repetitions of initially unfamiliar symbol strings (Brem et al., 2005), supporting the view that stimulus repetition can amplify early cortical responses to word-like stimuli. Thus, repetition effects affecting emotional stimuli more than neutral ones, as a consequence of differential initial capture of attention and rapid perceptual learning, may account for some of the very early ERP effects in emotional word processing. Early effects of emotional content on brain responses to subliminally or near-subliminally presented stimuli have occasionally been accounted for by fast, subcortical short-cut routes (see Wiens, this volume, for a discussion of issues of subliminal stimulus presentation). Evidence from animal
experiments and functional neuroimaging indeed reveals the existence of such subcortical 'short-cut' routes in emotional processing, particularly of fear-relevant stimuli (Davis, 1992; LeDoux, 1995; Morris et al., 1999). Direct pathways from the superior colliculi and the thalamus to the amygdala and the cortex allow for the automatic processing of relevant stimuli outside of conscious awareness, preparing rapid behavioral responses. In humans, this subcortical pathway has been mapped by functional neuroimaging during fear conditioning of subliminally presented faces (Morris et al., 1999) as well as during the subliminal presentation of faces with fearful expressions (Liddell et al., 2005). On a cortical level, its activity may be reflected in transient early responses. Brief, subliminal stimulation with fearful faces has recently been shown to result in a transient enhancement of the N2 and early P3a components, which, however, did not continue in later N4/P3b/LPC windows. For supraliminal stimulation, conversely, N4/P3/LPC but not N2 components responded to emotional content (Liddell et al., 2004), suggesting the operation of a slower, conscious processing and evaluation route. Conceivably, subliminal stimuli receive a temporally limited amount of processing that wanes if it is not confirmed by further supraliminal input, much as in the case of subliminal priming (Greenwald et al., 1996; Kiefer and Spitzer, 2000). Recording ERPs during subliminal and supraliminal semantic priming, Kiefer and Spitzer observed decay of subliminal semantic activation within 200 ms, a delay at which supraliminal priming effects can still be robustly demonstrated. A plastic, maladaptive upregulation of subcortical excitability may account for the early responsiveness to unpleasant and disorder-related words in clinical populations (see, e.g., Pauli et al., 2005 for supportive evidence). Clearly, at present the operation of a fast subcortical route from the thalamus and the amygdala in emotional word processing that could account for near- or subthreshold emotion effects in visual word processing remains a speculative conjecture. A most critical point is that such a mechanism would require at least basic 'reading abilities' in the thalamus. While the case for stimuli such as faces or threatening scenes that by some are
assumed to be part of our 'evolved fear module' (Öhman and Mineka, 2001) can be made much more easily, many would have a hard time believing in rather sophisticated subcortical visual capacities allowing for the discrimination of written words. On the other hand, subcortical structures are also subject to modification by learning, and by the time people take part in experiments they will usually have had about two decades of reading expertise. So far, most of the evidence for the subliminal processing of emotional stimuli is based on studies with aversive material. Accordingly, the above-reviewed studies show extremely early effects primarily for unpleasant words (Flor et al., 1997; Knost et al., 1997; Bernat et al., 2001). An alternative explanation of some of these early enhancements by emotional content in visual word processing, one that would rely neither on subcortical bypass routes nor, therefore, on subcortical vision, invokes reentrant connections between the so-called visual word form area (VWFA) and the emotion processing system. During visual word recognition, the earliest activation of an invariant version of the visual word form (i.e., the font-, size-, and position-invariant representation of the string of letters) occurs from about 100 ms after stimulus onset (Sereno et al., 1998; Assadollahi and Pulvermüller, 2003). Form-invariant, abstract representations of highly overlearned visual objects such as words and faces have been found to originate in the fusiform gyrus (Haxby et al., 1994; Chao et al., 1999; Cohen et al., 2000; Dehaene et al., 2002). Electrophysiological evidence with regard to the onset of word-specific fusiform activity varies, with some authors reporting onsets around 120 ms (Tarkiainen et al., 1999; Assadollahi and Pulvermüller, 2001) and others somewhat later, around 170 ms (Bentin et al., 1999; Cohen et al., 2000). Timing differences may be partly attributable to differences in word familiarity across experiments (King and Kutas, 1998). Immediately after access of the visual word form, meaning can be activated: Assadollahi and Rockstroh (2005) showed that activation differences due to super-ordinate categorical differences (animals vs. plants) can be found in left occipitotemporal
areas between 100 and 150 ms after word onset, whereas activation differentiating between subordinate categories was evident only from 300 ms on. Dehaene (1995) observed the earliest ERP differences between words of different categories (verbs, proper names, animals) 250–280 ms after word onset. Semantic category differences were reflected in the scalp distribution of a left occipitotemporal negativity. Using RSVP designs, a similar occipitotemporal negativity has been identified. This negativity has been termed the 'recognition potential' (RP). It is sensitive to semantic aspects of visual word processing and has its maximum around 250 ms after word onset (Rudell, 1992; Martin-Loeches et al., 2001; Hinojosa et al., 2004). The RP responds to manipulations of the depth of semantic analysis, its amplitude increasing with the meaningfulness and task-relevance of the presented word. Source analysis has placed the origin of the RP in the fusiform gyrus (Hinojosa et al., 2001). Results from our laboratory are consistent with the view that a word's emotional connotation enhances the associated recognition potential (see Fig. 4). Thus, a word's emotional connotation could be directly connected to the abstract representation of its visual form. Moreover, the combined evidence suggests that emotional content amplifies early stages of semantic analysis in much the same way as an instructed attention-enhancing processing task would. If enhanced semantic processing is an important mechanism by which emotional content affects visual word processing, the question again arises as to the causative mechanism: back-projections from the anterior cingulate and the amygdala may give rise to such processing enhancements. In support, amygdala lesions impair the enhanced detection of unpleasant words in an RSVP attentional blink paradigm but not the identification enhancements caused by manipulation of target color (Anderson and Phelps, 2001). Thus, assuming that the amygdala plays a pivotal role in the preferential processing of emotional words, as recently suggested by several neuroimaging and lesion studies (Isenberg et al., 1999; Anderson and Phelps, 2001; Garavan et al., 2001; Hamann and Mao, 2002; Naccache et al., 2005), an alternative model to the above-described thalamo-amygdalo-cortical route
could account for most of the data. Emotional amplification of semantic processing would occur after initial stimulus identification, caused by bidirectional reentrant communication between cortical regions and the amygdala (Amaral et al., 2003). Crucially, cortical analysis would precede and spark subcortical amplification of cortical processing. Clearly, a theoretically crucial priority for future research is to determine the timing of subcortical mechanisms in relation to cortical enhancement of ERP responses to emotional words. Unlike for other semantic categories such as movement-related verbs (Pulvermüller et al., 2000, 2001b), ERP data for emotional words so far suggest little consistent emotion-specific change in topography (but see Skrandies, 1998; Skrandies and Chiu, 2003; Ortigue et al., 2004). A distinct, emotion-associated topography might point to the existence of a homogeneous emotion lexicon localizable in distinct neuronal populations of the brain, as has been suggested for other semantic categories (Martin et al., 1996). Rather, emotional content seems to amplify cortical word processing, much in the same way as it enhances picture (Junghöfer et al., 2001) or face processing (Schupp et al., 2004). However, functional neuroimaging techniques with better spatial resolution of especially deep cortical and subcortical structures (see Cato and Crosson, this volume) and the more consistent use of dense-array EEG studies (Ortigue et al., 2004) may provide additional information.
Late components (after 300 ms)

Healthy volunteers

In relation to traditional stages of visual word processing, effects occurring later than 300 ms after word onset are less puzzling than the previously discussed early ones. Enhanced late positivities in response to emotional word content have most often, but not invariably, been found. In several of the already discussed studies reporting early ERP modulations as a function of emotional content, later enhanced positivities associated with the emotional content of the word stimuli are also apparent. For instance, in the data shown by
Chapman et al. (1978, 1979, 1980; see above) a positivity occurring around 300 ms is discernible and appears to be primarily related to the potency dimension extracted from their data. Using materials from the Begleiter et al. (1969, 1979) and Chapman et al. (1978, 1979) studies, Vanderploeg et al. (1987) assessed ERP responses to visually presented emotional (20 pleasant, 20 unpleasant) and 20 neutral words and face drawings (two per emotion category), which were evaluated during viewing. The EEG was recorded from six electrodes referenced to linked ears in 10 male subjects. During viewing, the visual stimuli were presented for either 80 ms (words) or 100 ms (faces). In the conditioning phase, the face drawings were shown for 1500 ms. For both faces and words, clearly discernible emotion-category-dependent differences in ERP tracings appear from around 300 ms after stimulus onset as parietal positivities. A small but significant effect of the emotional connotation of words, but not faces, on the spatial distribution of the ERP was also evident in the P2 window. Interestingly, although sizeable in appearance, the P3 effect of emotional connotation did not reach statistical significance for words. For a later positivity (positive slow wave/late positive complex) a similar result was obtained; although visible in the presented grand-averages, the difference in parietal positivity between emotional (pleasant and unpleasant) and neutral words does not reach significance in an analysis of the corresponding PCA factors, while it does for faces. The authors, in line with Lifshitz's (1966) early finding, suggest that words may be less powerful (or more heterogeneously evaluated) emotional stimuli than pictures (Vanderploeg et al., 1987). Thus, ERPs from 10 subjects may not yield enough statistical power to assess connotation-dependent ERP differences, particularly in studies using comparatively sparse electrode arrays. Also, the perceptual variance among 20 words of a category may be higher than among two faces. Differential effects may, therefore, also result from the greater consistency or higher frequency of occurrence of the faces. Indeed, subsequent studies have found robust effects of emotional connotation on later ERP components. For instance, Naumann et al. (1992) investigated late positive potentials to adjectives
varying in emotional content. Their key idea was that using ERPs it should be possible to dissociate emotional and cognitive processing and that, following LeDoux (1989), cognitive and emotional processing systems should be functionally and neuronally separable, as reflected in distinct ERP scalp topographies. In an initial experiment, 30 prerated pleasant, unpleasant, and neutral adjectives were presented to 14 subjects who had to either evaluate the words as pleasant, unpleasant, or neutral (affective task) or determine whether a word's length was longer than, shorter than, or equal to six letters (structural task). The EEG was recorded from three midline electrodes (Fz, Cz, and Pz), which were referenced to the left mastoid. ERPs were assessed between 300 and 700 ms after word presentation for the P3 component and between 700 and 1200 ms for the later positive slow wave. For both components and all word categories, ERP amplitudes were more positive going for the affective than for the structural task, particularly at electrodes Fz and Cz. Moreover, P3 amplitudes were also generally more positive in response to emotional than neutral adjectives. The spatial distribution of the subsequent slow wave component varied with emotional category, displaying larger amplitudes at Pz than at Cz and Fz for pleasant and unpleasant words but equal amplitudes at all three electrodes for the neutral words. This pattern was taken as evidence for the hypothesized separateness of affective and cognitive processing systems. Naumann et al. (1992) replicated this result in a second experiment, having calculated an ideal sample size of 106 subjects to minimize the likelihood of false-negative results. In the replication study a between-groups design was used, assigning 53 subjects each to either the structural or the affective task. Again, for both components and all word types, more positive frontal ERPs were obtained for the affective than for the structural task. The P3 component was larger for both pleasant and unpleasant than for neutral adjectives. Furthermore, positivities in response to emotional adjectives were particularly pronounced at Cz and Pz, and this gradient was most evident for the pleasant words. For the slow wave, the scalp distribution likewise exhibited a parietal maximum, and this
was more pronounced for the emotional than for the neutral adjectives. Thus, overall, an emphasis on emotional processing (affective task) caused an anterior shift of the scalp distribution. Furthermore, regardless of task, emotional stimuli led to more pronounced parietal peaks than neutral ones. Again, the authors interpreted their results as evidence for a functional and structural distinctiveness of affective and cognitive functions in the human brain, as suggested by LeDoux (1989). However, a third demonstration of dissociable affective and cognitive processes in visual word processing failed. Naumann et al. (1997) examined a sample of 54 students in three different tasks, namely letter search (structural task), a concrete–abstract decision (semantic task), and an unpleasant–neutral decision (affective task), on a set of nouns varying in emotional content. Fifty-six nouns were used that could be divided into subsets of seven words unambiguously belonging to one of the eight possible combinations of these attributes. ERPs were now recorded from nine scalp locations (Fz, Cz, Pz, and adjacent left and right parallels). There was indeed a considerably larger P3 for unpleasant compared to neutral words, albeit the effect was not general but restricted to the affective task. This casts doubt on the assumption that cognitive and emotional word processing operate along completely separable routes and raises the question of the extent to which larger late positive potentials to emotional word stimuli occur outside the attentional focus. Given the early effects (<300 ms) reported above, it would have been interesting to analyze the data with a focus on early and possibly automatic impacts of emotion on word processing. Naumann et al. (1997) raise a number of conceivable reasons for the reduction of the effect, favoring a familiarity-based explanation. In the experiments that had yielded 'uninstructed' and topographically distinct late responses, subjects had been familiar with the stimuli beforehand. Thus, the affective differences between the stimuli may already have attracted the participants' attention. Moreover, the new design reduced the probability of occurrence of an emotional word, possibly making this dimension less salient; although the converse hypothesis, based on an
oddball effect, would be equally plausible. Moreover, in their initial studies, Naumann et al. (1992) had used adjectives, which may produce somewhat different effects, given that ERP differences between word classes have been reported (e.g., Federmeier et al., 2000; Kellenbach et al., 2002). Fischler and Bradley (this volume) report on a series of well-controlled studies where effects of word emotionality on late positivities are consistently found for both pleasantly and unpleasantly arousing words when the task requires semantic processing of the presented words, but not otherwise. Some studies also show larger late positive potential effects restricted to pleasant words (Schapkin et al., 2000; Herbert et al., 2006), which were missing in the Naumann et al. (1997) study. For instance, Schapkin et al. (2000; see above) report larger P3 and late positive slow wave responses to pleasant compared to both neutral and unpleasant words during evaluative decision. As mentioned above, Herbert et al. (2006) report a study where early ERP responses (P2, P3a) reflected the arousal dimension of the words, differentiating both pleasant and unpleasant from neutral words. The later LPC, however, differentiated pleasant from unpleasant stimuli and was larger for pleasant words (see Fig. 5). Bernat et al. (2001), on the other hand, report enhanced responses to unpleasant as compared to pleasant words across the entire analysis window, until 1000 ms after word onset, encompassing P3 and LPC. Schapkin et al. (2000) additionally assessed late negativities that were labeled N3 (around 550 ms) and N4 (around 750 ms), finding no effect of emotional content. Data from our own studies do show a small effect of emotional content on N4 amplitudes, with a larger N4 to neutral than to emotional words, possibly reflecting a contextual expectancy for emotional content caused by unequal stimulus probabilities. In both studies (Schapkin et al., 2000; Kissler et al., submitted), two-thirds of the stimuli had emotional content (pleasant or unpleasant) and only one-third was neutral.

Late components — clinical studies

Weinstein (1995) provides one of the few reports of a modulation of the integration of emotionally charged
words following a sentence context. Students with high trait anxiety levels had a reduced N400 (or, in Weinstein's terminology, an enhanced P400) to words following a threatening sentence context, indicating facilitated integration of information within threatening contexts. An alternative interpretation might suggest enhanced sustained attention to threatening information in highly anxious subjects, if the potential described were taken to resemble a P3/LPC component, which is not entirely clear on the basis of the presented data. Personality-dependent changes in late cortical responses to emotional words have subsequently been replicated: Kiehl et al. (1999) tried to extend Williamson et al.'s (1991, see above) results of deficient early (P2) and late (LPC) ERP responses to emotionally charged words in psychopathic subjects. They assessed similarities and differences in the processing of abstract–concrete vs. pleasant–unpleasant words in psychopaths and comparison subjects. To address the processing of emotional words, a pleasant–unpleasant decision task was used, although the initial study had not revealed any valence differences. The altered task was apparently motivated by clinical observations suggesting that psychopaths have difficulty in understanding abstract information and in distinguishing pleasant from unpleasant valence. Stimuli were controlled for word length and frequency, syllable number, and concreteness. Word presentation was extended to 300 ms and words were presented only once, centrally and in a horizontal format. EEG was recorded from nine scalp positions, again with a linked-mastoids reference. Analyses now focused on a 300–400 ms post-stimulus window and an LPC window (400–800 ms). Behaviorally, in both groups responses to pleasant words were faster and more accurate than those to unpleasant ones. Cortically, an N350 component differentiated between pleasant and unpleasant words but not between psychopaths and nonpsychopaths, being larger for the pleasant words across groups. Moreover, the valence differentiation was more pronounced over the left hemisphere. In the later time window (400–800 ms), unpleasant words elicited more positive going brain waves than pleasant ones. This left-hemispheric dominant differentiation was absent in psychopaths. In
effect, ERPs to unpleasant words were more positive than ERPs to pleasant words across both time windows, and the differentiation was reduced in psychopaths. It is unclear how the ERP patterns relate to the behavioral data (both groups were faster and more accurate for pleasant words). But more positive-going late potentials for unpleasant stimuli in a binary pleasant–unpleasant decision are in line with the data of Bernat et al. (2001). During a lexical decision task, Williamson et al. (1991) reported a larger LPC to emotional than to neutral words in nonpsychopathic subjects and, to a lesser degree, in psychopaths, but no differentiation between the pleasant and unpleasant words. Schapkin et al. (2000) and Herbert et al. (2006), by contrast, report larger late positivities for pleasant in comparison to both neutral and unpleasant words. Note that neither the Kiehl et al. (1999) nor the Bernat et al. (2001) study reports data on neutral stimuli.
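The group comparisons in these clinical studies typically reduce to testing mean window amplitudes, or emotional-minus-neutral difference scores, within and between groups. The sketch below illustrates this logic with scipy on simulated difference scores; the group sizes, means, and variances are invented for illustration and do not reproduce any of the studies cited.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated per-subject LPC difference scores (emotional minus neutral,
# in microvolts): controls show an effect, the patient group does not.
controls = rng.normal(loc=1.5, scale=1.0, size=20)
patients = rng.normal(loc=0.1, scale=1.0, size=20)

# Within-group tests: does the emotion effect deviate from zero?
print("controls vs. 0:", stats.ttest_1samp(controls, 0.0))
print("patients vs. 0:", stats.ttest_1samp(patients, 0.0))

# Between-group test: is the emotion effect smaller in patients?
print("controls vs. patients:", stats.ttest_ind(controls, patients))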
Comparing late emotional and late semantic word processing

A considerable number of studies have found amplifying effects of emotional word content on electrophysiological cortical activity later than 300 ms after word onset. In contrast to the very early effects, these occur in a time range where modulation of cortical responses by word meaning is not unusual in itself. By 300–400 ms after stimulus onset, ERP tracings reflect conscious processing stages (Halgren et al., 1994a, b) and clearly vary with semantic expectancy (Kutas and Hillyard, 1980, 1984), task relevance (Sutton et al., 1967), and depth of mental engagement (Dien et al., 2004). Thus, it is not surprising that ERPs in this time range can reflect processing differences between words of different emotional content. From a semantics perspective, the N400 might represent an appropriate 'classical' ERP component to assess for emotion effects. Indeed, some studies have found modulations of N400 or N400-like ERP responses to words of emotional categories (Williamson et al., 1991; Weinstein, 1995; Kiehl et al., 1999). However, in line with ERP studies of affective processing of faces (Schupp et al., 2004) and
pictures (Keil et al., 2002), most researchers have focused on an analysis of late positivities. The comparative paucity of reports on N400 modulation by emotional word content may partly reflect a bias on the part of the investigators and appears surprising in view of the fact that the N400 is often regarded as 'the' electrophysiological indicator of semantic processes in the brain. On the other hand, it is becoming increasingly clear that the N400 response does not index lexical access or semantic processing per se but reflects semantic integration within a larger context, created either by expectations about sentence content or by other contextual constraints within experiments (Kutas and Federmeier, 2000). Thus, it is reasonable to assume that single-word studies will only result in N400 modulations if strong expectations about emotional word content are established. Priming studies or experiments establishing an emotional expectation within a sentence context may provide a better testing ground for the issue of N400 modulation by emotional word content. Indeed, Weinstein (1995) followed this rationale, establishing emotional expectations on a sentence level. Recent work from our laboratory also shows N400 modulation by emotional content in a lexical decision task where an emotional expectation (pleasant or unpleasant) was established by a preceding emotional picture. Of note, the pictures were of a similar emotional connotation to the subsequent adjectives, but the words were not descriptive of the picture content (Kissler and Kössler, in preparation). A transient mood induction may have mediated the effect; recently, effects of subjects' emotional states on semantic processing have been reported (Federmeier et al., 2001). When subjects were in a mildly positive mood, their semantic processing was facilitated, as reflected by a smaller N400 potential to more distant members of given categories than when in a neutral mood. Thus, a number of studies suggest that both a word's emotional content and a subject's emotional state may affect the N400 ERP response (but see Fischler and Bradley, this volume). Still, so far the most consistently reported later effects of emotional word categories on the ERP are seen in broadly distributed late positivities with a parietal maximum (see also Fischler and Bradley,
this volume). Such late positivities have generally not been associated with specific aspects of semantic processing but rather with task demands such as attentional capture, evaluation, or memory encoding. In neurolinguistics, late positivities have repeatedly been suggested to index syntactic reanalysis following morphosyntactic violations (Osterhout et al., 1994; Friederici et al., 1996; Hagoort and Brown, 2000). Yet some studies also report modulations of late positivities by semantic attributes of language. For instance, in antonym processing, differential P3 and LPC responses were found depending on whether a word contained a given attribute or lacked it (Molfese, 1985). Contextual semantic constraints and stimulus abstractness have also been reported to affect late positivities (Holcomb et al., 1999). Both contextually expected and unexpected sentence-final words were associated with larger positivities than contextually unconstrained words, the effect being even more pronounced when abstract words were contextually unexpected. Münte et al. (1998) also find late positive responses to language semantics, thereby challenging the account of specific morphosyntactic late positive shifts and corroborating the view that late positivities reflect mental engagement and effortful processing across a wide range of higher cognitive functions (Dien et al., 2004). Late positivities are likely to share a proportion of neural generators and differ on others, reflecting the extent to which the tasks that elicit them draw on shared or different neural systems. Thus, topographically distinct late positive shifts may relate to different aspects of cognitive and emotional functioning, as suggested, for instance, by Naumann et al. (1992). However, in order to unambiguously elucidate topographic changes that reflect shifts in neural generator structure, simultaneous recordings from dozens of electrodes and advanced data analysis techniques are necessary (see also Junghöfer and colleagues, this volume). From the extant studies on the emotional modulation of late components in word processing, it is hard to gauge the extent to which emotion induces genuine topographic changes indicative of the recruitment of additional distinct cortical structures or merely amplifies the activity of a unitary processing system.
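One family of such techniques separates response strength from topography: global field power (GFP) indexes the strength of a scalp map, while global map dissimilarity (DISS) between strength-normalized maps indexes genuine topographic change and hence, potentially, a different generator configuration. A minimal sketch of these two measures, with simulated maps standing in for real average-referenced data:

import numpy as np

def gfp(v):
    """Global field power: spatial standard deviation of an average-referenced map."""
    v = v - v.mean()                  # re-reference to the average
    return np.sqrt((v ** 2).mean())

def diss(u, v):
    """Global map dissimilarity (0 = identical topography, 2 = inverted)."""
    u = (u - u.mean()) / gfp(u)       # normalize out strength differences
    v = (v - v.mean()) / gfp(v)
    return np.sqrt(((u - v) ** 2).mean())

rng = np.random.default_rng(2)
map_neutral = rng.normal(size=64)
map_emotional_amplified = 1.5 * map_neutral   # same topography, stronger
map_emotional_shifted = rng.normal(size=64)   # different topography

print(diss(map_neutral, map_emotional_amplified))  # ~0: pure amplification
print(diss(map_neutral, map_emotional_shifted))    # > 0: generator change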
In emotion research, larger late positivities have consistently been shown during free viewing of emotional vs. neutral pictures (Keil et al., 2002) and faces (Schupp et al., 2004). If primary tasks distract participants from the emotional content of the visual stimuli, late positivities to emotional stimuli are often diminished, reflecting competition for attentional resources. The degree to which, and the circumstances under which, emotion and attention compete for resources, have additive effects, or operate in parallel is a matter of ongoing debate (see Schupp et al., this volume, for a discussion). For visually presented word stimuli the picture is similar; when the primary task requires an evaluative emotional decision (pleasant–unpleasant–neutral, emotional–neutral), emotional words, like pictures or faces, are consistently associated with larger late positivities than neutral ones. When the primary task requires structural stimulus processing, the evidence is mixed, with some studies still finding larger positivities in response to emotional stimuli (Naumann et al., 1992) while others do not (Naumann et al., 1997). During free viewing, a recent study (Kissler et al., submitted manuscript) finds a larger LPC to emotional words, suggesting that when subjects are free to allocate their processing resources as they wish, they process emotional words more deeply than nonemotional ones. Our results also indicate that late responses, around 500 ms, may be more affected by explicit attentional tasks than the simultaneously observed early effects around 250 ms. During lexical decision (Williamson et al., 1991) and naming tasks (Knost et al., 1997; Pauli et al., 2005), emotional words have also been found to be associated with larger late positivities (but see Fischler and Bradley, this volume). Thus, when the task allows for or even requires semantic processing, emotional words are processed more deeply than neutral ones. When the task requires structural processing, this processing advantage is considerably diminished (Naumann et al., 1997). Clearly, the extent to which larger late positivities to emotionally relevant words are driven by arousal, valence, or additional subject-, task-, or situation-specific factors is not quite settled. The matter is complicated by the fact that studies differed on the instruments used to assess emotional word content
and the extent to which the pleasant and unpleasant dimensions were differentiated or collapsed into one 'emotional' category. A fair number of studies employed the empirically well-founded semantic differential technique or the two-dimensional valence–arousal space to assess emotional content, yet others do not even report the criteria by which the emotional content of the material was determined. Although multidimensional models of affect are empirically well founded and the use of numerical rating scales allows for the rapid assessment of large numbers of stimuli on many dimensions, an inherent problem with Likert-type scaling remains. Such scaling techniques assume that subjects will meaningfully assign numbers to psychological stimuli, such that the quantitative relationships between the numbers correctly reflect the psychologically perceived relationships among the stimuli, including conservation of distance or conservation of ratio, yielding interval or even ratio scales. But these assumptions do not always hold, such that it is unclear whether the psychological distance between stimuli rated 2 and 4 on a given scale is really the same as between stimuli rated 6 and 8 (Luce and Suppes, 1965; Kissler and Bäuml, 2000; Wickelmaier and Schmid, 2004). Moreover, the relationship between the behavioral ratings and the physiological impact of emotional stimuli is likely to be nonlinear. Nevertheless, the bulk of the data corroborates the view that during earlier stages of processing, emotion acts as a nonvalence-specific, arousal-driven alerting system (see above). During later stages of processing (>300 ms), the patterns found are more varied and may reflect flexible adaptations to contextual factors. In support, Herbert et al. (2006) recently found arousal-driven amplification of cortical responses to both pleasant and unpleasant words within the first 300 ms and a divergent pattern favoring the processing of pleasant material in a later time window (see Fig. 5). Keil (this volume) discusses a number of task factors that contribute to processing advantages for pleasant or unpleasant material in turn. For language material with emotional content, a general pleasant–unpleasant asymmetry in emotional processing may be important: at low levels of arousal, a 'positivity offset' is often found, in
that the approach system responds more strongly to relatively little input. The withdrawal system, in turn, is activated comparatively more by unpleasant input at high levels of arousal, this latter process being termed the 'negativity bias' (Cacioppo, 2000; Ito and Cacioppo, 2000). Visually presented words are likely to constitute less-arousing stimuli than complex colored pictures, i.e., the word 'cruel' will be less arousing than a photograph of a corresponding scene, even if both stimuli receive comparable ratings. Therefore, in the absence of strong unpleasant personal associations for a given word, which may well be present in various clinical populations (see above), a 'positivity offset' for written verbal material might be expected. Corresponding data are reported, for instance, by Schapkin et al. (2000) and Herbert et al. (2006). As for the early effects, the question arises how the late effects of emotional word content on ERPs come about; subcortical activity has again been implicated. Naccache et al. (2005) have recently, for the first time, recorded field potentials in response to emotional words directly from the amygdala. Three epilepsy patients with depth electrodes implanted for presurgical evaluation performed an evaluative decision task (threatening–nonthreatening) on a series of threat and nonthreat words presented subliminally or supraliminally. In all three patients, larger amygdala potentials to threat than to nonthreat words could be identified around 800 ms after word presentation in the subliminal and around 500–600 ms in the supraliminal condition. The study is pivotal in that it both directly measures amygdala activity during emotional word processing and provides clues as to the timing of this activity. As detailed before, subcortical, primarily amygdala, activity may be a source of the cortical amplifying mechanisms in response to emotional stimuli visible in ERPs. Amygdala activity measured by depth electrodes around 600 ms after stimulus onset may provide a basis for the LPC amplifications evident in the surface ERP. However, the timing of the responses poses new puzzles. If amygdala activity in emotional word processing onsets around 600 ms, how are early effects of emotional word content generated (see discussion above)? Amplified cortical ERP responses reflect the activation of larger
patches of cortex, indicating the spread of activation in a more densely packed neural network. Such denser networks result from life-long associative learning mechanisms, and the effects of emotional learning can be seen in amplified cortical ERP tracings whenever the corresponding semantic network is accessed. Subcortical mechanisms might be active primarily during the acquisition of emotional semantics, reflecting the role of the amygdala in emotional learning even of abstract representations (Phelps et al., 2001). Their impact may be attenuated once a representation has been acquired. Clearly, elucidating the mechanisms by which amplified responses to emotional words are generated is a vital issue for future research. In sum, a considerable number of studies show enhanced late positive responses when people process emotionally laden words, pleasant and unpleasant alike. The responses are not as large as for pictorial or face stimuli, but they have been reliably demonstrated across numerous studies. Major challenges for future research remain in determining the relative roles of arousal and valence and their interactions with task demands. Finally, the question to what extent, and at which points in time, the processing of emotional words recruits specific cortical and subcortical neural circuitries merits further scientific attention.
Processing emotional words — electrophysiological conjectures

The above review demonstrates that emotional word content can amplify word processing at all stages, from access to word meaning (around 200 ms), to contextual integration (around 400 ms), to evaluation and memory encoding (around 600 ms). Occasionally, emotionality-dependent enhancements have been reported even before 200 ms. In neurolinguistics, the timing of lexical access is heatedly debated; reports of the points in time at which particular aspects of a word's lexical information are accessed vary between 100 and 600 ms. Importantly, the interpretation of the N400 has shifted from an index of semantic access to a signature of the interaction between single-word semantics and context. Accordingly, a
growing body of evidence demonstrates that some aspects of word meaning must be active before the N400 is elicited. We propose that there is no fixed point in time at which all semantic information is equally available. Instead, subrepresentations can become activated in a dynamic, possibly cascaded manner. Variations in the timing of semantic signatures could be interpreted in the light of an internal structure of entries in the mental lexicon. Different aspects of semantics could be flexibly prioritized, depending on context, task, and the motivation of the subject. If the tenet of a monolithic lexical entry is given up, the internal structure of a word's lexical representation can be assessed by investigating the timing with which each subrepresentation becomes available (Assadollahi and Rockstroh, 2005) or the contextual constraints that lead to the activation of a particular subrepresentation at a given point in time (Sereno et al., 2003). Emotional semantics may be special; their connection to biologically significant system states and behavioral output tendencies may ensure most rapid activation of the underlying neural network representations. A number of studies endorse a simultaneous impact of emotional word content on both cortical and peripheral responses (Flor et al., 1997; Knost et al., 1997; Herbert et al., 2006), corroborating the view that the neural networks representing emotional concepts dynamically link semantic and response information (Lang et al., 1993; see Fig. 3). Many studies also show surprisingly early cortical responses to emotional word content. Thus, the subrepresentation of a word's emotional content may sometimes, though clearly not always, be activated before other lexical information is available, for example, whether a word denotes an animate or an inanimate entity. Depending on personality, context, and task, emotional semantic networks may become activated rapidly or gradually, may operate over several hundred milliseconds, or their activation may decay quickly when not currently relevant. The specification of the experimental factors contributing to the generation of arousal-, valence-, or even potency-specific effects, or to the temporal gradient of the processing enhancement caused by emotional connotation, still lacks detail. As a
working hypothesis, emotional content amplifies word processing at the earliest stage at which the visual word form is assembled, i.e., no sooner than about 100 ms after word presentation. Learning, top-down, and priming processes may dynamically modify the time point at which the activation of the representation takes place (see above). Arousal is likely to govern this amplification process within the first 300 ms, reflecting a general orienting function of emotion. Under specific experimental manipulations, such as very brief stimulus duration, or in populations with pronounced negative processing biases, earliest advantages for unpleasant material may be obtained, possibly reflecting modality-independent operations of a rudimentary rapid threat detection system (Öhman and Mineka, 2001). In general, the neural representations of abstract linguistic features are less likely to be activated by such a superfast detection system. Later than 300 ms after stimulus onset, the behavioral relevance of the stimulus in a given
situation is likely to determine its further processing, with tasks that require processing of semantic content supporting sustained enhancement of emotional stimuli, and contextual effects determining possible processing advantages for either valence. Under circumstances where structural outweighs semantic processing, or where the stimuli occur very briefly and are not followed by confirmatory input, only transient enhancement of early brain responses and less, if any, amplification in later time windows will result. Subcortical structures, most prominently the amygdala, have been implicated at all stages of this emotional content-driven amplification process, but the dynamics of the interplay between subcortical and cortical structures in processing emotional words await specification. Combining electrophysiological, functional magnetic resonance, and lesion approaches within a network theory of emotion and language will be useful to clarify these issues.
Appendix

Table A1. A summary of experimental design, recording parameters, and results of the reviewed studies on effects of emotional content on ERP measures of word processing in healthy volunteers. For each study, entries list subjects (N, sex); task(s); stimuli (word class; emotional content; control of additional stimulus parameters); presentation (stimulus duration; position of presentation; stimulus repetitions); recording and analysis (number of electrodes; reference; epoch duration poststimulus; statistical analysis); and ERP effects.

Lifshitz (1966)
Subjects: 1, male.
Task(s): viewing, memorizing.
Stimuli: word class not indicated; emotional content determined by author: 40 'dirty' words, 40 neutral words; controlled for: word length.
Presentation: 1000–2000 ms, centrally; number of repetitions not indicated.
Recording and analysis: 4 electrodes (left-hemispheric leads, reference unclear); epoch 500 ms; visual inspection.
ERP effects: visual comparison of dirty words versus neutral words: no marked differences visible.

Begleiter and Platz (1969)
Subjects: 18, all males.
Task(s): viewing, word naming.
Stimuli: nouns; emotional content determined by authors: 2 'taboo' words (shit, fuck), 2 'neutral' words (tile, page), blank flash; controlled for: word length.
Presentation: 10 ms, centrally; 20 repetitions.
Recording and analysis: 1 electrode (O2), linked ears reference; epoch 1024 ms; ANOVA, t-test.
ERP effects: amplitude: 'taboo' > neutral & flash; naming > viewing (100–200 ms and 200–300 ms); latency: 'taboo' < neutral & flash (300–400 ms).

Kostandov and Arzumanov (1977)
Subjects: 37, sex not indicated ('all in state of jealousy'); Experiment 1 (supraliminal): 23; Experiment 2 (subliminal): 14.
Task(s): covert identification and counting of stimulus repetitions.
Stimuli: 'conflict' and 'neutral' words; no further details on experimental stimuli given; additional parameter control unclear.
Presentation: Experiment 1: 200 ms, 50 repetitions(?); Experiment 2: 15 ms, 50 repetitions(?).
Recording and analysis: 2 electrodes (Cz, O1), left mastoid reference; epoch 1000 ms; t-tests.
ERP effects: Experiment 1 (supraliminal): N2 (220 ms): effects of emotional content on peak amplitude and latency n.s.; P3a (270–320 ms): O1: peak amplitude emotional > neutral, peak latency emotional < neutral; Cz: n.s. Experiment 2 (subliminal): N2 (220 ms): peak amplitude at O1 and Cz: emotional > neutral; P3a (300 ms): peak amplitude at O1 and Cz: emotional > neutral; peak latency n.s.

Begleiter et al. (1979)
Subjects: 10, all males.
Task(s): letter identification; affective rating.
Stimuli: word class not indicated; prerating on semantic differential: 62 pleasant, 62 unpleasant, 62 neutral; additional parameter control unclear.
Presentation: 20 ms, centrally; single presentation.
Recording and analysis: 6 electrodes (F3, F4, C3, C4, P3, P4; only P3 and P4 analyzed), linked ears reference; epoch 450 ms; ANOVA, t-test.
ERP effects: N1–P2 peak-to-peak amplitude (140–200 ms): affective rating condition, electrodes P3 and P4: pleasant > neutral, unpleasant > neutral, pleasant > unpleasant; effects: left > right; left electrode (P3): affective rating > letter identification; no emotion effect in letter identification.

Chapman et al. (1978)
Subjects: 10, 4 males.
Task(s): word naming.
Stimuli: word class not indicated; pre-rating on semantic differential(a): 20 E+, 20 E–, 20 P+, 20 P–, 20 A+, 20 A–; controlled for: luminance, distribution of letters.
Presentation: 17 ms, centrally; 15 repetitions.
Recording and analysis: 1 electrode (CPz), linked ears reference; epoch 510 ms; PCA of ERP, ANOVA on PCA components, discriminant analysis.
ERP effects: PCA/stepwise discriminant analysis differentiates among all 6 stimulus types within the first 500 ms; visual inspection of provided figures: N1, P2, and P3a differentiate the + and – poles of the E, P, A dimensions.

Chapman (1979)
Subjects: 10.
Task(s): verbal affective rating.
Stimuli: word class not indicated; pre-rating on semantic differential(a): 20 E+, 20 E–, 20 P+, 20 P–, 20 A+, 20 A–; additional parameter control unclear.
Presentation: 17 ms, centrally; 15 repetitions.
Recording and analysis: 1 electrode (CPz), linked ears reference; epoch 510 ms; PCA of ERP, ANOVA on PCA components, discriminant analysis.
ERP effects: PCA/stepwise discriminant analysis differentiates among all 6 stimulus types within the first 500 ms; differentiation between semantic scales (E, P, A) within 190 ms; differentiation among polarities of word content types (E+/–, P+/–, A+/–) within 330 ms.

Chapman et al. (1980)
Subjects: 10, 4 males.
Task(s): word naming; verbal affective rating.
Stimuli: word class not indicated; pre-rating on semantic differential(a): 20 E+, 20 E–, 20 P+, 20 P–, 20 A+, 20 A–; additional parameter control unclear.
Presentation: 17 ms, centrally; 15 repetitions.
Recording and analysis: 1 electrode (CPz), linked ears reference; epoch 510 ms; PCA of ERP, ANOVA on PCA components, discriminant analysis.
ERP effects: PCA/stepwise discriminant analysis differentiates among all 6 stimulus types within the first 500 ms; visual inspection: N1, P2, P3a differentiate the + and – poles of the E, P, A dimensions.

Vanderploeg et al. (1987)
Subjects: 10, all males.
Task(s): affective rating.
Stimuli: word class not indicated; words from Begleiter et al. (1969, 1979) and Chapman et al. (1977, 1978); additional parameter control unclear.
Presentation: 80 ms, centrally; repeated until 32 artifact-free averages per category obtained.
Recording and analysis: 6 electrodes (Fz, Pz, F7, F8, T5, T6), linked ears reference; epoch 800 ms; PCA, ANOVA on PCA components.
ERP effects: n.s. trend: amplitude: emotional (pleasant & unpleasant) > neutral on P3a (230–420 ms) and LPC (500–624 ms).

Naumann et al. (1992)
Subjects: pilot study: 17, 8 males; replication study: 106, 50 males.
Task(s): word length decision; affective rating.
Stimuli: adjectives; 90 words from affective norms (Hager et al., 1985): 30 pleasant, 30 unpleasant, 30 neutral; controlled for: length, concreteness.
Presentation: 125 ms, centrally; single presentation.
Recording and analysis: 3 electrodes (Fz, Cz, Pz), left mastoid reference; epoch 1500 ms post word; ANOVAs on P3 and positive slow wave amplitude and peak latency.
ERP effects: pilot study: P3 amplitude (300–700 ms): affective rating > word length decision; emotional content (pleasant & unpleasant) > neutral; positive slow wave amplitude (700–1200 ms): affective rating > word length decision; electrode × emotional content: larger positivities at central and parietal electrodes for pleasant and unpleasant. Replication study: P3 amplitude: emotional content: emotional (pleasant & unpleasant) > neutral; electrode × task: Fz: affective rating > word length decision; electrode × emotional content: larger effect for pleasant, particularly at Fz and Cz; P3 peak latency: neutral < emotional (pleasant & unpleasant); positive slow wave amplitude: electrode × emotional content: no parietal maximum for neutral.

Naumann et al. (1997)
Subjects: 54 (26 males; between-subjects design, 18 per condition).
Task(s): letter detection; concreteness decision; valence decision.
Stimuli: nouns; published affective norms (Schwibbe et al., 1981): 28 unpleasant, 28 neutral; controlled for: length, concreteness.
Presentation: 200 ms, centrally; single presentation.
Recording and analysis: 9 electrodes (F4, C4, P4, Fz, Cz, Pz, F3, C3, P3), linked ears reference; epoch 1400 ms post word; ANOVA and Tukey HSD test on P3 and positive slow wave amplitudes and peak latencies.
ERP effects: P3 amplitude: emotional content × task: unpleasant > neutral, effect in the valence decision task only; P3 peak latency: emotional content × task: letter detection & concreteness decision: unpleasant < neutral; valence decision: neutral < unpleasant; positive slow wave amplitude: no effects.

Skrandies (1998)
Subjects: 22, 8 males.
Task(s): visualize and memorize words.
Stimuli: nouns; pre-rating on semantic differential(a) (different population): 10 E+, 10 E–, 10 P+, 10 P–, 10 A+, 10 A–; controlled for: word length, word frequency.
Presentation: 1000 ms, centrally, RSVP(b); 40 repetitions.
Recording and analysis: 30 electrodes (10–20 system), average reference; epoch 1000 ms; ANOVAs on centroids of scalp distributions and peak latencies, Duncan post hoc.
ERP effects: complex sequence of effects, discriminating among E, P, A dimensions and their polarity. P1 (80–130 ms): peak latency A > E and P; rightward shift of the negative centroid and leftward shift of the positive centroid for E– words. N1 (130–195 ms): increased peak latency for E and A words; reduced global field power (GFP) for A words. 195–265 ms: peak latency P < E and A; anterior centroid shift for E–, P–, A– compared with E+, P+, A+. 565–635 ms: peak latency: E+, P+, A+ > E–, P–, A–. 635–765 ms: centroids of P+ and P– shifted anteriorly and posteriorly, respectively. 860–975 ms: peak latency: E+ and P+ > E– and P–; A+ < A–.

Skrandies and Chiu (2003)
Subjects: 23, 10 males.
Task(s): visualize and memorize words.
Stimuli: nouns; pre-rating on semantic differential(a) (different population): 10 E+, 10 E–, 10 P+, 10 P–, 10 A+, 10 A–; controlled for: word length, word frequency.
Presentation: 1000 ms, centrally, RSVP(b); 24(?) repetitions.
Recording and analysis: 32 electrodes (10–20 system), average reference; epoch 1000 ms post word; ANOVAs on peak latency, centroid position, and global field power (GFP).
ERP effects: complex sequence of effects, discriminating between E, P, A dimensions and their polarity. P1 (80–130 ms): latency E+, P+, A+ > E–, P–, A–. N1 (130–195 ms): GFP P– and A– > P+ and A+; posterior–anterior centroid locations: A+, P+, and E+ and E– anterior relative to P– and A–; left–right centroid locations: rightward shift of the A centroid. No effects after 300 ms.

Schapkin et al. (2000)
Subjects: 15, 7 males.
Task(s): emotional vs. neutral classification.
Stimuli: nouns; pre-rating on valence (different population): 6 pleasant, 6 unpleasant, 6 neutral; controlled for: syllables, word length, word frequency, concreteness, imagery, initial letter.
Presentation: 150 ms, left or right visual field; 8 repetitions (4 per visual field).
Recording and analysis: 14 electrodes (only C3, C4, P3, P4, O1, O2 analyzed), linked ears reference; epoch 800 ms post word; ANOVAs on base-to-peak measures of N1, P2, P3, SPW; post hoc t-tests.
ERP effects: P2 amplitude (200–300 ms): pleasant > unpleasant; P3 amplitude (300–450 ms): pleasant > unpleasant; slow positive wave (1000–1800 ms): frontal: pleasant > unpleasant; parietal: pleasant > neutral.

Bernat et al. (2001)
Subjects: 17, 8 males.
Task(s): focus on fixation point.
Stimuli: adjectives; pre-rating on semantic differential in the same population: 10 pleasant, 10 unpleasant, 12 neutral; controlled for: word length, word frequency, luminance.
Presentation: subliminal (1 ms) and supraliminal (40 ms), centrally; 12 repetitions (6 subliminal, 6 supraliminal).
Recording and analysis: 6 electrodes (F3, F4, P3, P4, CPz, Oz), linked ears reference; epoch 1000 ms post word; ANOVAs on P1, N1, P2, P3, LPC mean amplitudes, pleasant vs. unpleasant (analysis of neutral words is not presented).
ERP effects: subliminal: amplitudes: emotion: P2 (100–210 ms), P3 (200–500 ms), LPC (500–900 ms): unpleasant > pleasant; emotion × hemisphere: P1 (40–120 ms), N1 (80–170 ms), P3 (200–500 ms): left: unpleasant > pleasant; right: unpleasant > pleasant. Supraliminal: amplitudes: emotion: P3 (200–500 ms) and LPC (500–900 ms): unpleasant > pleasant; emotion × hemisphere: P1 (20–80 ms), N1 (52–150 ms): left: unpleasant > pleasant; right: unpleasant > pleasant; P3 (200–500 ms): emotional content × hemisphere: unpleasant: left > right.

Ortigue et al. (2004)
Subjects: 13, all males.
Task(s): divided-field lexical decision (in which visual field did a word appear?).
Stimuli: nouns; prerating for emotionality on a seven-point scale (emotional–nonemotional): 16 words (8 neutral, 8 emotional), 96 pseudowords; controlled for: word length, word frequency.
Presentation: 13 ms, left and right visual field simultaneously; 30 repetitions.
Recording and analysis: 123 electrodes (extended 10–20 system), average reference; epoch 250 ms post word; ANOVAs on spatial configuration maps; LAURA source estimation.
ERP effects: amplitude: 100–140 ms: emotional > neutral; distinct right occipital spatial map for emotional words presented in the right visual field.

Herbert et al. (2006)
Subjects: 26, 16 males.
Task(s): covert evaluation.
Stimuli: adjectives; pre-rated on valence and arousal by a different population: 60 unpleasant, 60 pleasant, 60 neutral; controlled for: word length, word frequency.
Presentation: 4000 ms, centrally; single presentation.
Recording and analysis: 64 electrodes (extended 10–20 system), average reference; epoch 1000 ms post word; ANOVAs on channel groups.
ERP effects: amplitude: P2 (180–250 ms) and P3a (250–400 ms): emotional (pleasant & unpleasant) > neutral; LPC (600–750 ms): pleasant > neutral.

Kissler et al. (submitted)
Subjects: 16, 8 males.
Task(s): silent reading.
Stimuli: nouns; pre-rating of valence and arousal by a different population: 60 unpleasant, 60 pleasant, 60 neutral; controlled for: word length, word frequency, concreteness.
Presentation: 333 ms and 1000 ms, RSVP(b); 10 repetitions (5 per presentation speed).
Recording and analysis: 64 electrodes (extended 10–20 system), average reference; epoch 333 ms/1000 ms post word; ANOVA on groups of electrodes.
ERP effects: amplitude: 200–300 ms at occipito-temporal electrodes: emotional (pleasant = unpleasant) > neutral; left: emotional > neutral; right: emotional > neutral.

(a) Dimensions of the semantic differential: E, evaluation; P, potency; A, activity; + and – indicate the positive and negative poles of these dimensions.
(b) RSVP, rapid serial visual presentation of stimuli in a continuous consecutive stream without interstimulus interval.
Table A2. A summary of experimental design, recording parameters, and results of the reviewed studies on effects of emotional content on ERP measures of word processing in clinical populations. Entries follow the same scheme as in Table A1; subjects are characterized by N, sex, and clinical status, and the analysis window (poststimulus) is given in place of epoch duration.

Williamson et al. (1991)
Subjects: 16, all males: 8 psychopaths, 8 non-psychopaths.
Task(s): lexical decision.
Stimuli: word class not indicated; affective norms (Toglia and Battig, 1978): 13 pleasant, 13 unpleasant, 13 neutral, 39 pseudowords; controlled for: word length, frequency, number of syllables, concreteness.
Presentation: 176 ms, parafoveally (±3°), vertical format; 6 repetitions.
Recording and analysis: 5 electrodes (Fz, Cz, Pz, PT3, PT4), linked ears reference; analysis window 2000 ms; ANOVA, t-test.
ERP effects: P240 (225–300 ms): amplitude: non-psychopaths: emotional (pleasant & unpleasant) > neutral; psychopaths: emotional (pleasant & unpleasant) = neutral. LPC (650–800 ms): amplitude: group × emotional content × electrode site: midline: non-psychopaths: emotional (pleasant & unpleasant) > neutral; psychopaths: emotional (pleasant & unpleasant) > neutral.

Weinstein (1995)
Subjects: 20 students: 10 highly anxious (5 males), 10 low anxious (4 males).
Task(s): decision on contextual fit.
Stimuli: 20 threatening and 20 pleasant sentences (5–7 words); 'probe' words: 20 threat, 20 neutral, 20 pleasant (nouns & adjectives); 2 pairings: fitting/not fitting; threat words from MacLeod (1985); emotional content of sentences and words prerated in a different population; control for linguistic parameters not indicated.
Presentation: 1100 ms per word, centrally; 2 repetitions.
Recording and analysis: 3 electrodes (Fz, Cz, Pz), linked ears reference; analysis window 650 ms; ANOVA, t-test.
ERP effects: N100 (90–120 ms): peak amplitude: high-anxiety group threat priming > low-anxiety group threat priming (Fz, Cz). N400/P400 (400–500 ms): peak amplitude: threat condition: high-anxiety group more positive than low-anxiety group (Cz, Pz); peak latency: high-anxiety group < low-anxiety group. (Plots of means appear to diverge from the reported statistics.)

Kiehl et al. (1999), Task 3
Subjects: 29 prisoners, all males: 8 psychopaths, 9 non-psychopaths, 12 mixed.
Task(s): pleasant/unpleasant decision.
Stimuli: word class not indicated; valence norms (Toglia and Battig, 1978): 60 pleasant, 60 unpleasant; controlled for: word length, frequency, number of syllables, imagery, concreteness.
Presentation: 300 ms, centrally; single presentation.
Recording and analysis: 5 electrodes (Fz, Cz, Pz, PT3, PT4), linked ears reference; analysis window 1200 ms; ANOVA, t-test on reaction times and N350, LPC amplitudes.
ERP effects: N350 (300–400 ms): amplitude: all groups: emotional content: pleasant > unpleasant; emotional content × hemisphere: left: pleasant > unpleasant; right: pleasant > unpleasant. LPC (400–800 ms): amplitude: emotional content: non-psychopaths: unpleasant > pleasant; psychopaths: unpleasant = pleasant; emotional content × hemisphere: left: unpleasant > pleasant; right: unpleasant > pleasant. Reaction times: pleasant < unpleasant.

Knost et al. (1997)
Subjects: 38: 19 prechronic pain patients (11 males), 19 controls (11 males).
Task(s): detection and naming.
Stimuli: adjectives; pre-rating on familiarity, body and pain relatedness: 40 pain-related, 40 body-related, 40 neutral; controlled for: word frequency, word length.
Presentation: individual perceptual threshold, centrally; single presentation.
Recording and analysis: 11 electrodes (Fz, F3, F4, Cz, C3, C4, Pz, P3, P4, T3, T4), linked ears reference; analysis window 800 ms; ANOVA, t-test.
ERP effects: N100 (80–180 ms) at electrode F3: prechronic pain patients: amplitude: pain words > body-related & neutral. LPC2 (600–800 ms): amplitude, both groups: pain/body-related > neutral.

Flor et al. (1997)
Subjects: 24: 12 chronic pain patients (5 males), 12 controls (5 males).
Task(s): detection and naming.
Stimuli: adjectives; pre-rating on familiarity, body and pain relatedness: 40 pain-related, 40 body-related, 40 neutral; controlled for: word frequency, word length.
Presentation: perceptual threshold, centrally; single presentation.
Recording and analysis: 11 electrodes (Fz, F3, F4, Cz, C3, C4, Pz, P3, P4, T3, T4), linked ears reference; analysis window 800 ms; ANOVA, t-test.
ERP effects: N100 (80–140 ms) at electrodes F3, Fz, C3, P3, Pz: chronic pain patients only: pain words > body-related & neutral. Both groups: N100 (80–140 ms): pain words: left > right; neutral: right > left. N200 (140–200 ms) at all electrodes: chronic pain patients: pain words > body-related & neutral. P200 (180–280 ms): chronic pain patients: larger responses to pain words over right than left. LPC1 (400–600 ms): n.s.; all: pain words = body-related = neutral. LPC2 (600–800 ms): n.s.; all: pain words = body-related = neutral.

Pauli et al. (2005)
Subjects: 50: 25 panic patients (9 males), 25 controls (9 males).
Task(s): detection and naming.
Stimuli: adjectives and verbs; pre-rated for descriptiveness of panic-disorder symptoms by 2 psychiatrists and 2 clinical psychologists: 48 panic-related, 48 neutral; controlled for: word frequency, length, syllables, distribution of word classes.
Presentation: perceptual threshold and 1000 ms; 2 presentations (one per duration).
Recording and analysis: 21 electrodes (only 9 analyzed: Fz, Cz, Pz, F3, F4, C3, C4, P3, P4), linked ears reference; analysis window 1000 ms; ANOVA, t-test.
ERP effects: threshold presentation: panic patients: N200/P200 (100–200 ms): frontal effect: panic words > neutral; P3a (200–400 ms): panic words > neutral; LPC1 (400–600 ms): amplitude, both groups: panic > neutral. 1000 ms presentation: LPC1 (400–600 ms): amplitude, both groups: panic > neutral; LPC2 (600–800 ms): amplitude, both groups: panic > neutral; emotional content × hemisphere: left: panic > neutral; right: panic > neutral.
Acknowledgments

This work was supported by a grant from the Heidelberg Academy of Sciences (Mind and Brain Program). We thank Anne Hauswald for help in preparation of this manuscript and Christiane Beck, Susanne Kössler, Bistra Ivanona and Irene Winkler for assistance in the experimental work described.
References

Amaral, D.G., Behniea, H. and Kelly, J.L. (2003) Topographic organization of projections from the amygdala to the visual cortex in the macaque monkey. Neuroscience, 118: 1099–1120. Anderson, A.K. and Phelps, E.A. (2001) Lesions of the human amygdala impair enhanced perception of emotionally salient events. Nature, 411: 305–309. Assadollahi, R. and Pulvermüller, F. (2001) Neuromagnetic evidence for early access to cognitive representations. Neuroreport, 12: 207–213. Assadollahi, R. and Pulvermüller, F. (2003) Early influences of word length and frequency: a group study using MEG. Neuroreport, 14: 1183–1187. Assadollahi, R. and Rockstroh, B. (2005) Neuromagnetic brain responses to words from semantic sub- and supercategories. BMC Neurosci., 6: 57. Barsalou, L.W. (1999) Perceptual symbol systems. Behav. Brain Sci., 22: 577–609; discussion 610–660. Begleiter, H. and Platz, A. (1969) Cortical evoked potentials to semantic stimuli. Psychophysiology, 6: 91–100. Begleiter, H., Porjesz, B. and Garozzo, R. (1979) Visual evoked potentials and affective ratings of semantic stimuli. In: Begleiter, H. (Ed.), Evoked Brain Potentials and Behavior. Plenum Press, New York, pp. 127–143. Bentin, S., Mouchetant-Rostaing, Y., Giard, M.H., Echallier, J.F. and Pernier, J. (1999) ERP manifestations of processing printed words at different psycholinguistic levels: time course and scalp distribution. J. Cogn. Neurosci., 11: 235–260. Bernat, E., Bunce, S. and Shevrin, H. (2001) Event-related brain potentials differentiate positive and negative mood adjectives during both supraliminal and subliminal visual processing. Int. J. Psychophysiol., 42: 11–34. Bower, G. (1981) Mood and memory. Am. Psychol., 36: 129–148. Bradley, M.M. and Lang, P.J. (1994) Measuring emotion: the Self-Assessment Manikin and the Semantic Differential. J. Behav. Ther. Exp. Psychiatry, 25(1): 49–59. Bradley, M.M. and Lang, P.J. (1998) Affective norms for English words (ANEW): instruction manual and affective ratings. Technical report A-8, The Center for Research in Psychophysiology, University of Florida.
Bradley, M.M. and Lang, P.J. (2000) Affective reactions to acoustic stimuli. Psychophysiology, 37: 204–215. Brem, S., Lang-Dullenkopf, A., Maurer, U., Halder, P., Bucher, K. and Brandeis, D. (2005) Neurophysiological signs of rapidly emerging visual expertise for symbol strings. Neuroreport, 16: 45–48. Cacioppo, J.T. (2000) Asymmetries in affect laden information processing. In: Banaji, R. and Prentice, D.A. (Eds.), Perspectivism in Social Psychology: The Yin and Yang of Scientific Progress. American Psychological Association Press, Washington, DC, pp. 85–95. Chao, L.L., Haxby, J.V. and Martin, A. (1999) Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nat. Neurosci., 2: 913–919. Chapman, R.M. (1979) Connotative meaning and averaged evoked potentials. In: Begleiter, H. (Ed.), Evoked Brain Potentials and Behavior. Plenum Press, New York, pp. 171–197. Chapman, R.M., McCrary, J.W., Chapman, J.A. and Bragdon, H.R. (1978) Brain responses related to semantic meaning. Brain Lang., 5: 195–205. Chapman, R.M., McCrary, J.W., Chapman, J.A. and Martin, J.K. (1980) Behavioral and neural analyses of connotative meaning: word classes and rating scales. Brain Lang., 11: 319–339. Cohen, L., Dehaene, S., Naccache, L., Lehericy, S., Dehaene-Lambertz, G., Henaff, M.A. and Michel, F. (2000) The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain, 123(Pt 2): 291–307. Davis, M. (1992) The role of the amygdala in fear and anxiety. Annu. Rev. Neurosci., 15: 353–375. Dehaene, S. (1995) Electrophysiological evidence for category-specific word processing in the normal human brain. Neuroreport, 6: 2153–2157. Dehaene, S., Cohen, L., Sigman, M. and Vinckier, F. (2005) The neural code for written words: a proposal. Trends Cogn. Sci., 9: 335–341. Dehaene, S., Le Clec, H.G., Poline, J.B., Le Bihan, D. and Cohen, L. (2002) The visual word form area: a prelexical representation of visual words in the fusiform gyrus. Neuroreport, 13: 321–325. Dien, J., Spencer, K.M. and Donchin, E. (2004) Parsing the late positive complex: mental chronometry and the ERP components that inhabit the neighborhood of the P300. Psychophysiology, 41: 665–678. Dijksterhuis, A. and Aarts, H. (2003) On wildebeests and humans: the preferential detection of negative stimuli. Psychol. Sci., 14: 14–18. Everatt, J., McCorquidale, B., Smith, J., Culverwell, F., Wilks, A., Evans, D., Kay, M. and Baker, D. (1999) Association between reading ability and visual processes. In: Everatt, J. (Ed.), Reading and Dyslexia. Routledge, London, pp. 1–39. Federmeier, K.D., Kirson, D.A., Moreno, E.M. and Kutas, M. (2001) Effects of transient, mild mood states on semantic memory organization and use: an event-related potential investigation in humans. Neurosci. Lett., 305: 149–152.
Federmeier, K.D., Segal, J.B., Lombrozo, T. and Kutas, M. (2000) Brain responses to nouns, verbs and class-ambiguous words in context. Brain, 123(Pt 12): 2552–2566. Flor, H., Knost, B. and Birbaumer, N. (1997) Processing of pain- and body-related verbal material in chronic pain patients: central and peripheral correlates. Pain, 73: 413–421. Friederici, A.D., Hahne, A. and Mecklinger, A. (1996) Temporal structure of syntactic parsing: early and late event-related brain potential effects. J. Exp. Psychol. Learn. Mem. Cogn., 22: 1219–1248. Garavan, H., Pendergrass, J.C., Ross, T.J., Stein, E.A. and Risinger, R.C. (2001) Amygdala response to both positively and negatively valenced stimuli. Neuroreport, 12: 2779–2783. Greenwald, A.G., Draine, S.C. and Abrams, R.L. (1996) Three cognitive markers of unconscious semantic activation. Science, 273: 1699–1702. Hagoort, P. and Brown, C.M. (2000) ERP effects of listening to speech compared to reading: the P600/SPS to syntactic violations in spoken sentences and rapid serial visual presentation. Neuropsychologia, 38: 1531–1549. Halgren, E., Baudena, P., Heit, G., Clarke, J.M., Marinkovic, K., Chauvel, P. and Clarke, M. (1994a) Spatio-temporal stages in face and word processing. 2. Depth-recorded potentials in the human frontal and Rolandic cortices. J. Physiol. (Paris), 88: 51–80. Halgren, E., Baudena, P., Heit, G., Clarke, J.M., Marinkovic, K. and Clarke, M. (1994b) Spatio-temporal stages in face and word processing. I. Depth-recorded potentials in the human occipital, temporal and parietal lobes [corrected]. J. Physiol. (Paris), 88: 1–50. Hamann, S. and Mao, H. (2002) Positive and negative emotional verbal stimuli elicit activity in the left amygdala. Neuroreport, 13: 15–19. Hauk, O., Johnsrude, I. and Pulvermüller, F. (2004) Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41: 301–307. Haxby, J.V., Horwitz, B., Ungerleider, L.G., Maisog, J.M., Pietrini, P. and Grady, C.L. (1994) The functional organization of human extrastriate cortex: a PET-rCBF study of selective attention to faces and locations. J. Neurosci., 14: 6336–6353. Herbert, C., Kissler, J., Junghöfer, M., Peyk, P. and Rockstroh, B. (2006) Processing emotional adjectives: evidence from startle EMG and ERPs. Psychophysiology, 43(2): 197–206. Hinojosa, J.A., Martin-Loeches, M., Munoz, F., Casado, P. and Pozo, M.A. (2004) Electrophysiological evidence of automatic early semantic processing. Brain Lang., 88: 39–46. Hinojosa, J.A., Martin-Loeches, M. and Rubia, F.J. (2001) Event-related potentials and semantics: an overview and an integrative proposal. Brain Lang., 78: 128–139. Holcomb, P.J., Kounios, J., Anderson, J.E. and West, W.C. (1999) Dual-coding, context-availability, and concreteness effects in sentence comprehension: an electrophysiological investigation. J. Exp. Psychol. Learn. Mem. Cogn., 25: 721–742.
Isenberg, N., Silbersweig, D., Engelien, A., Emmerich, S., Malavade, K., Beattie, B., Leon, A.C. and Stern, E. (1999) Linguistic threat activates the human amygdala. Proc. Natl. Acad. Sci. USA, 96: 10456–10459. Ito, T.A. and Cacioppo, J.T. (2000) Electrophysiological evidence of implicit and explicit categorization processes. J. Exp. Soc. Psychol., 36: 660–676. Junghöfer, M., Bradley, M.M., Elbert, T.R. and Lang, P.J. (2001) Fleeting images: a new look at early emotion discrimination. Psychophysiology, 38: 175–178. Keil, A., Bradley, M.M., Hauk, O., Rockstroh, B., Elbert, T. and Lang, P.J. (2002) Large-scale neural correlates of affective picture processing. Psychophysiology, 39: 641–649. Kellenbach, M.L., Wijers, A.A., Hovius, M., Mulder, J. and Mulder, G. (2002) Neural differentiation of lexico-syntactic categories or semantic features? Event-related potential evidence for both. J. Cogn. Neurosci., 14: 561–577. Kiefer, M. and Spitzer, M. (2000) Time course of conscious and unconscious semantic brain activation. Neuroreport, 11: 2401–2407. Kiehl, K.A., Hare, R.D., McDonald, J.J. and Brink, J. (1999) Semantic and affective processing in psychopaths: an event-related potential (ERP) study. Psychophysiology, 36: 765–774. King, J.W. and Kutas, M. (1998) Neural plasticity in the dynamics of human visual word recognition. Neurosci. Lett., 244: 61–64. Kissler, J. and Bäuml, K.H. (2000) Effects of the beholder's age on the perception of facial attractiveness. Acta Psychol. (Amst.), 104: 145–166. Kissler, J., Herbert, C., Peyk, P. and Junghöfer, M. (submitted manuscript) Sex, crime and videotape — enhanced early cortical responses to rapidly presented emotional words. Kissler, J. and Kössler, S. (in preparation) Pleasant pictures facilitate lexical decision. Knost, B., Flor, H., Braun, C. and Birbaumer, N. (1997) Cerebral processing of words and the development of chronic pain. Psychophysiology, 34: 474–481. Kostandov, E. and Arzumanov, Y. (1977) Averaged cortical evoked potentials to recognized and non-recognized verbal stimuli. Acta Neurobiol. Exp. (Wars), 37: 311–324. Kutas, M. and Federmeier, K.D. (2000) Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn. Sci., 4: 463–470. Kutas, M. and Hillyard, S.A. (1980) Reading senseless sentences: brain potentials reflect semantic incongruity. Science, 207: 203–205. Kutas, M. and Hillyard, S.A. (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature, 307: 161–163. Lang, P.J. (1979) Presidential address, 1978. A bio-informational theory of emotional imagery. Psychophysiology, 16: 495–512. Lang, P.J. (1994) The motivational organization of emotion: affect-reflex connections. In: Van Goozen, S.H.M., Van de Poll, N.E. and Sergeant, J.E. (Eds.), Emotions: Essays on
Emotion Theory. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 61–93. Lang, P.J., Greenwald, M.K., Bradley, M.M. and Hamm, A.O. (1993) Looking at pictures: affective, facial, visceral, and behavioral reactions. Psychophysiology, 30: 261–273. LeDoux, J.E. (1989) Cognitive-emotional interactions in the brain. Cogn. Emotion, 3: 267–289. LeDoux, J.E. (1995) Emotion: clues from the brain. Annu. Rev. Psychol., 46: 209–235. Liddell, B.J., Brown, K.J., Kemp, A.H., Barton, M.J., Das, P., Peduto, A., Gordon, E. and Williams, L.M. (2005) A direct brainstem-amygdala-cortical 'alarm' system for subliminal signals of fear. Neuroimage, 24: 235–243. Liddell, B.J., Williams, L.M., Rathjen, J., Shevrin, H. and Gordon, E. (2004) A temporal dissociation of subliminal versus supraliminal fear perception: an event-related potential study. J. Cogn. Neurosci., 16: 479–486. Lifshitz, K. (1966) The averaged evoked cortical response to complex visual stimuli. Psychophysiology, 3: 55–68. Luce, R.D. and Suppes, P. (1965) Preference, utility and subjective probability. In: Luce, R.D., Bush, R.R. and Galanter, E.H. (Eds.), Handbook of Mathematical Psychology, Vol. 3. Wiley, New York, pp. 249–410. Martin, A., Wiggs, C.L., Ungerleider, L.G. and Haxby, J.V. (1996) Neural correlates of category-specific knowledge. Nature, 379: 649–652. Martin-Loeches, M., Hinojosa, J.A., Gomez-Jarabo, G. and Rubia, F.J. (2001) An early electrophysiological sign of semantic processing in basal extrastriate areas. Psychophysiology, 38: 114–124. Molfese, D. (1985) Electrophysiological correlates of semantic features. J. Psycholinguistic Res., 14: 289–299. Morris, J.S., Öhman, A. and Dolan, R.J. (1999) A subcortical pathway to the right amygdala mediating 'unseen' fear. Proc. Natl. Acad. Sci. USA, 96: 1680–1685. Münte, T.F., Heinze, H.J., Matzke, M., Wieringa, B.M. and Johannes, S. (1998) Brain potentials and syntactic violations revisited: no evidence for specificity of the syntactic positive shift. Neuropsychologia, 36: 217–226. Naccache, L., Gaillard, R., Adam, C., Hasboun, D., Clemenceau, S., Baulac, M., Dehaene, S. and Cohen, L. (2005) A direct intracranial record of emotions evoked by subliminal words. Proc. Natl. Acad. Sci. USA, 102: 7713–7717. Öhman, A. and Mineka, S. (2001) Fears, phobias, and preparedness: toward an evolved module of fear and fear learning. Psychol. Rev., 108: 483–522. Ortigue, S., Michel, C.M., Murray, M.M., Mohr, C., Carbonnel, S. and Landis, T. (2004) Electrical neuroimaging reveals early generator modulation to emotional words. Neuroimage, 21: 1242–1251. Osgood, C.E., Miron, M.S. and May, W.H. (1975) Cross-Cultural Universals of Affective Meaning. University of Illinois Press, Urbana, Chicago, London. Osgood, C.E., Suci, G.J. and Tannenbaum, P.H. (1957) The Measurement of Meaning. University of Illinois Press, Urbana, Chicago, and London.
Osterhout, L., Holcomb, P.J. and Swinney, D.A. (1994) Brain potentials elicited by garden-path sentences: evidence of the application of verb information during parsing. J. Exp. Psychol. Learn. Mem. Cogn., 20: 786–803. Pauli, P., Amrhein, C., Muhlberger, A., Dengler, W. and Wiedemann, G. (2005) Electrocortical evidence for an early abnormal processing of panic-related words in panic disorder patients. Int. J. Psychophysiol., 57: 33–41. Perfetti, C.A. (1998) Comprehending written language: a blueprint of the reader. In: Brown, C.M. and Hagoort, P. (Eds.), The Neurocognition of Language. Oxford University Press, Oxford. Perfetti, C.A. and Sandak, R. (2000) Reading optimally builds on spoken language: implications for deaf readers. J. Deaf. Stud. Deaf. Educ., 5: 32–50. Phelps, E.A., O'Connor, K.J., Gatenby, J.C., Gore, J.C., Grillon, C. and Davis, M. (2001) Activation of the left amygdala to a cognitive representation of fear. Nat. Neurosci., 4: 437–441. Posner, M.I., Abdullaev, Y.G., McCandliss, B.D. and Sereno, S.C. (1999) Neuroanatomy, circuitry and plasticity of word reading. Neuroreport, 10: R12–R23. Pulvermüller, F. (1999) Words in the brain's language. Behav. Brain Sci., 22: 253–279; discussion 280–336. Pulvermüller, F., Assadollahi, R. and Elbert, T. (2001a) Neuromagnetic evidence for early semantic access in word recognition. Eur. J. Neurosci., 13: 201–205. Pulvermüller, F., Härle, M. and Hummel, F. (2000) Neurophysiological distinction of verb categories. Neuroreport, 11: 2789–2793. Pulvermüller, F., Härle, M. and Hummel, F. (2001b) Walking or talking? Behavioral and neurophysiological correlates of action verb processing. Brain Lang., 78: 143–168. Rudell, A.P. (1992) Rapid stream stimulation and the recognition potential. Electroencephalogr. Clin. Neurophysiol., 83: 77–82. Russell, J.A. (1980) A circumplex model of affect. J. Pers. Soc. Psychol., 39: 1161–1178. Schapkin, S.A., Gusev, A.N. and Kuhl, J. (2000) Categorization of unilaterally presented emotional words: an ERP analysis. Acta Neurobiol. Exp. (Wars), 60: 17–28. Schendan, H.E., Ganis, G. and Kutas, M. (1998) Neurophysiological evidence for visual perceptual categorization of words and faces within 150 ms. Psychophysiology, 35: 240–251. Schupp, H.T., Öhman, A., Junghöfer, M., Weike, A.I., Stockburger, J. and Hamm, A.O. (2004) The facilitated processing of threatening faces: an ERP analysis. Emotion, 4: 189–200. Sereno, S.C., Brewer, C.C. and O'Donnell, P.J. (2003) Context effects in word recognition: evidence for early interactive processing. Psychol. Sci., 14: 328–333. Sereno, S.C., Rayner, K. and Posner, M.I. (1998) Establishing a time-line of word recognition: evidence from eye movements and event-related potentials. Neuroreport, 9: 2195–2200. Silvert, L., Delplanque, S., Bouwalerh, H., Verpoort, C. and Sequeira, H. (2004) Autonomic responding to aversive words
without conscious valence discrimination. Int. J. Psychophysiol., 53: 135–145. Skrandies, W. (1998) Evoked potential correlates of semantic meaning — a brain mapping study. Brain Res. Cogn. Brain Res., 6: 173–183. Skrandies, W. and Chiu, M.J. (2003) Dimensions of affective semantic meaning — behavioral and evoked potential correlates in Chinese subjects. Neurosci. Lett., 341: 45–48. Sutton, S., Tueting, P., Zubin, J. and John, E.R. (1967) Information delivery and the sensory evoked potential. Science, 155: 1436–1439. Tarkiainen, A., Helenius, P., Hansen, P.C., Cornelissen, P.L. and Salmelin, R. (1999) Dynamics of letter string perception in the human occipitotemporal cortex. Brain, 122(Pt 11): 2119–2132.
Vanderploeg, R.D., Brown, W.S. and Marsh, J.T. (1987) Judgments of emotion in words and faces: ERP correlates. Int. J. Psychophysiol., 5: 193–205. Warrington, E.K. and Shallice, T. (1979) Semantic access dyslexia. Brain, 102: 43–63. Warrington, E.K. and Shallice, T. (1980) Word-form dyslexia. Brain, 103: 99–112. Weinstein, A. (1995) Visual ERPs evidence for enhanced processing of threatening information in anxious university students. Biol. Psychiatry, 37: 847–858. Wickelmaier, F. and Schmid, C. (2004) A Matlab function to estimate choice model parameters from paired-comparison data. Behav. Res. Methods Instrum. Comput., 36: 29–40. Williamson, S., Harpur, T.J. and Hare, R.D. (1991) Abnormal processing of affective words by psychopaths. Psychophysiology, 28: 260–273.
CHAPTER 9
Event-related potential studies of language and emotion: words, phrases, and task effects

Ira Fischler and Margaret Bradley

Psychology Department, PO Box 112250, University of Florida, Gainesville, FL 32611, USA
Abstract: This chapter reviews research that focuses on the effects of emotionality of single words, and of simple phrases, on event-related brain potentials when these are presented visually in various tasks. In these studies, presentation of emotionally evocative language material has consistently elicited a late (c. 300–600 ms post-onset) positive-going, largely frontal–central shift in the event-related potentials (ERPs), relative to neutral materials. Overall, affectively pleasant and unpleasant words or phrases are quite similar in their neuroelectric profiles and rarely differ substantively. This emotionality effect is enhanced in both amplitude and latency when emotional content is task relevant, but is also reliably observed with other semantically engaging tasks. On the other hand, it can be attenuated or eliminated when the task does not involve semantic evaluation (e.g., lexical decisions to words or orthographic judgments about the spelling patterns), or when comprehension of phrases requires integration of the connotative meaning of several words (e.g., compare dead puppy and dead tyrant). Taken together, these studies suggest that the emotionality of written language has a rapid and robust impact on ERPs, which can be modulated by specific task demands as well as the linguistic context in which the affective stimulus occurs.

Keywords: language; emotion; event-related potentials; motivation; phrase

Corresponding author. E-mail: ifisch@ufl.edu
DOI: 10.1016/S0079-6123(06)56009-1

There are good reasons for those who wish to understand human emotion to be interested in emotion and language. Our waking lives are filled with talk, to others and to ourselves — it has been estimated that the "stream of conscious experience" is largely a stream of self-talk (e.g., Klinger and Cox, 1987). We can be deeply moved, frightened, or aroused by what we hear or read. We actively seek out these kinds of materials, in the news articles, stories, and novels we choose to read. We might argue, in fact, that a primary evolutionary purpose of language is motivational — to influence the thoughts, feelings, and therefore the actions of others in our group. Indeed, some have argued that language developed in part as a tool of social dominance and control, with verbal threats and promises providing a virtual forum for the basic appetitive and defensive systems that motivate all behavior (e.g., Lang et al., 1997), and that the exquisite informational capacity of language developed from a regulatory, motivational base. The primacy of language as a skill unique to humans, and its ubiquity across societies and individuals, suggests that in order to understand emotional behavior and cognition the story told by language will be an important one. In the studies reported here, effects of emotionality of single words and word phrases are explored as they affect brain processes as reflected in event-related potentials (ERPs). Individual words are an appropriate starting point, as words and other morphemes (e.g., the -ed in walked) are the building blocks of meaning in language. But meaning
emerges from the sequence of words and phrases, and often is more than the "sum of its lexical parts" — dead is bad, and tyrant is bad, for example, but a dead tyrant is not so bad. In this paper, we report data from a number of different studies that varied the emotionality of individual words, and of simple, two-word phrases.
The motivational organization of emotion

When studying emotion in an experimental context, one immediate decision concerns selecting the affective parameters on which stimuli will vary, regardless of whether one uses words, pictures, sounds, movies, or other stimulus materials to induce affect in the laboratory. In the work reviewed here, we define emotion in terms of differences in rated pleasure and arousal of words and simple phrases, relying on a number of theories that propose that emotion is organized by dimensions of pleasure and arousal (e.g., Osgood et al., 1957; Lang et al., 1997). Supporting these views, empirical factor analyses of evaluative language consistently have found that the primary dimension underlying people's judgments of stimuli (which vary from single words to abstract paintings) is hedonic valence, which accounts for the most variance in judgments of meaning (e.g., Osgood et al., 1957; see also Mehrabian and Russell, 1974; Russell, 1980). Hedonic valence is described by scales that, in aggregate, define a continuous dimension that ranges from unpleasant (unhappy, annoyed, despairing, etc.) to pleasant (happy, pleased, hopeful, etc.). Arousal or intensity is the second major dimension that emerges in these factor analyses, and scales defining this dimension extend from an unaroused state (calm, relaxed, sleepy, etc.) to a state of high energy or arousal (excited, stimulated, wide-awake, etc.).

The relative importance of hedonic valence and arousal has been widely debated. Some have argued that arousal is the primary dimension of emotion. Thus, Lindsley (1951) proposed an influential "activation theory of emotion" (see also earlier formulations by Duffy, 1941 and Arnold, 1945), which was based, in part, on human electroencephalographic (EEG) studies that
showed an association between alpha waves (10–13 Hz) and emotional calm (with even slower waves in sleep) and a progressive increase in frequency with increasing intensity of emotional arousal (from annoyance and anxiety to rage and panic). Other data from research with animal subjects showed that, on the one hand, the brain stem reticular formation (Moruzzi and Magoun, 1949) modulated both EEG frequency and visceral arousal, and on the other, that midbrain lesions blocked emotion, prompting lethargic, apathetic behavior. The importance of emotional arousal in evaluative judgments and affective reactions could be due, in part, to a biphasic organization of emotion arising from its evolutionary and functional heritage. For instance, Lang et al. (1997) proposed that emotion stems from activation of one of two fundamental motive systems in the brain — appetitive and defensive — that have evolved to support behaviors that sustain the life of the individual and the species, accounting for the primacy of the valence dimension in affective expression. These two motivational systems are associated with widespread cortical, autonomic, and behavioral activity that can vary in the intensity of activation. Differences in motivational intensity map onto the arousal parameter in emotion’s dimensional organization. Importantly, arousal is not viewed as a separable system that is independently modulated, but rather as representing the activation (metabolic and neural) initiated by centrally organized appetitive or defensive systems that have evolved to protect and maintain life. In studying how language cues activate the appetitive and defensive motivational systems underlying emotion, it is important to categorize and control verbal stimuli as they vary in hedonic valence and arousal. In a series of individual rating studies, we have collected affective reports of pleasure and arousal for over 1000 English words from many participants. These materials are distributed to researchers in a collection called the Affective Norms for English Words (ANEW; Bradley and Lang, 1998). In each rating study, participants are asked to judge the pleasantness and arousal of each word using the
Self-Assessment Manikin (SAM) affective rating system devised by Lang et al. (1980). SAM is a graphic (nonverbal) figure that depicts each evaluative dimension using a nine-point scale, and ranges from a smiling, happy figure to a frowning, unhappy figure when representing the hedonic valence dimension (see Fig. 1, ordinate); for arousal, SAM ranges from an excited, wide-eyed figure to a relaxed, sleepy figure (see Fig. 1, abscissa). The SAM measures of pleasure and arousal correlate well with ratings on these evaluative dimensions obtained using a much longer, verbal semantic differential scale (Bradley and Lang, 1994).
Fig. 1. Distribution of words in the two-dimensional affective space defined by each word's mean pleasure and arousal rating (ANEW; Bradley and Lang, 1998), using the Self-Assessment Manikin (SAM; Lang et al., 1980). Pleasure is plotted on the ordinate and arousal on the abscissa, with separate symbols for men and women.

In Fig. 1, each word is plotted in a two-dimensional Cartesian space defined by its mean pleasure and arousal rating. There are several characteristic features of the resulting affective space. First, these materials elicit ratings that completely span the hedonic valence dimension, ranging from extremely pleasant to extremely unpleasant. Similarly, a wide range of arousal levels is elicited by these materials. Second, it is clear that pleasant words range continuously along the arousal dimension: the upper half of emotional space has exemplars at many positions along this dimension. Words labeling unpleasant objects and events, however, tend to cluster in the high-arousal quadrant of emotional space, suggesting that events and objects that threaten life are rarely perceived as calm or unarousing. The distribution of words in affective space is remarkably similar to that of pictures, and of sounds (Bradley and Lang, 2000), suggesting that this distribution represents a relatively fundamental organization of the affective world.
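This dimensional bookkeeping is straightforward to make concrete in code. The sketch below, a minimal illustration rather than the authors' actual selection procedure, plots a table of word ratings in valence–arousal space and pulls out the kinds of subsets used in the studies described later in this chapter; the file name anew_ratings.csv, its column names, and the numeric cutoffs are all assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical ratings file: one row per word, with mean SAM ratings on
# the 9-point scales (assumed columns: word, valence, arousal).
ratings = pd.read_csv("anew_ratings.csv")

# Illustrative subsets from different regions of affective space,
# mirroring the stimulus categories used in the studies below.
pleasant_high = ratings[(ratings.valence >= 7) & (ratings.arousal >= 6)]
unpleasant_high = ratings[(ratings.valence <= 3) & (ratings.arousal >= 6)]
neutral_low = ratings[ratings.valence.between(4, 6) & (ratings.arousal <= 4)]

fig, ax = plt.subplots()
ax.scatter(ratings.arousal, ratings.valence, s=8, color="lightgray",
           label="all words")
for subset, label in [(pleasant_high, "pleasant, high arousal"),
                      (unpleasant_high, "unpleasant, high arousal"),
                      (neutral_low, "neutral, low arousal")]:
    ax.scatter(subset.arousal, subset.valence, s=14, label=label)
ax.set_xlabel("Arousal (SAM, 1-9)")
ax.set_ylabel("Pleasure (SAM, 1-9)")
ax.legend(frameon=False)
plt.show()
```

Plotted this way, the ratings form the boomerang-shaped distribution summarized in Fig. 1, with pleasant and unpleasant words fanning out toward high arousal.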
The potency of single words

While a single word may not seem like a very potent emotional stimulus, we know that even brief presentations of individual emotionally evocative words can bias attention. The so-called emotional Stroop effect, in which emotionally evocative words, especially threatening words presented to highly anxious, "hyperattentive" individuals, produce greater interference with naming the color of the word relative to neutral words, suggests an early, possibly preattentive emotional response to these words (see, e.g., Dalgleish, 1995; Williams et al., 1996). Similarly, emotional words presented during a rapid series of briefly presented words (the "RSVP" procedure) can "capture" attention, improving detection of emotional words and targets, and impairing detection of other target words that follow them, the "emotional blink" effect (see, e.g., Keil and Ihssen, 2004; Anderson, 2005; Kaschub and Fischler, 2006).

Chapman and colleagues (e.g., Chapman et al., 1980) explored the pattern of ERPs evoked by single emotional words presented in isolation. Their principal component analyses revealed consistent ERP factors associated with each of Osgood's three dimensions of "connotative meaning" — evaluation, activity, and potency — corresponding roughly to the dimensions of hedonic valence, arousal, and dominance widely used to standardize emotional materials (Bradley and Lang, 1994). A similar set of components was found when the words were being rated on dimensions of the semantic differential, as well as when the words were simply read out loud, suggesting that the emotional response to these words was to some degree independent of the particular way the participants needed to process them.
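The factor-analytic logic of the Chapman studies can be approximated with an ordinary principal component analysis over averaged waveforms: each component is a waveform shape, and each condition receives a score on that shape, which can then be submitted to ANOVAs or discriminant analysis. The sketch below illustrates the idea on synthetic data; it is not a reconstruction of the original analysis, and the array shapes and sampling rate are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# One averaged ERP per condition (rows) over time points (columns),
# e.g., six connotative categories (E+/-, P+/-, A+/-) at a single
# electrode. Synthetic random walks stand in for real recordings.
rng = np.random.default_rng(0)
n_conditions, n_timepoints = 6, 128  # roughly 510 ms at ~250 Hz (assumed)
erps = rng.standard_normal((n_conditions, n_timepoints)).cumsum(axis=1)

# PCA treats time points as variables: components_ holds waveform
# shapes, and each condition gets a score on every shape.
pca = PCA(n_components=3)
scores = pca.fit_transform(erps)   # shape: (conditions, components)
shapes = pca.components_           # shape: (components, time points)

print("variance explained:", pca.explained_variance_ratio_)
print("component scores per condition:\n", scores)
# Chapman-style inference would then ask, via ANOVA or stepwise
# discriminant analysis, whether these scores separate the E, P, and A
# categories and their polarities.
```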
In Chapman et al. (1980), both emotionally pleasant and unpleasant words elicited an increased positivity in cortical ERPs, compared to emotionally neutral words, as found in a number of studies measuring ERPs when processing words in isolation (e.g., Chapman et al., 1978; Begleiter et al., 1979; Williamson et al., 1991; Skrandies, 1998; Bernat et al., 2001). The emotionality effect in the ERPs to words broadly resembles that obtained when participants look at emotionally evocative pictures (e.g., Cuthbert et al., 2000). In both cases, ERPs to emotionally arousing stimuli diverge from those for neutral stimuli as early as 300 ms after stimulus onset, have a wide topographic distribution, and show little lateral asymmetry (though see Cacioppo et al., 1996). The emotionality effect may disappear within a few hundred milliseconds, and is rarely seen at latencies above 800 ms after word onset.
Early vs. late components to emotional words

In view of the effects of word emotionality on tasks like the RSVP detection task noted above, one might expect that ERPs to emotional words would be distinguishable from those to neutral words quite early in processing, and that these early components might be less task dependent than later differences, which in turn would be more responsive to specific task demands. Bernat et al. (2001) reported an early (P1–N1) divergence of ERPs to mood-descriptor adjectives (e.g., lonely, happy), with unpleasant adjectives more positive than pleasant. The early difference was confined to the left hemisphere sites, and was observed both with supraliminal (40 ms) and subliminal (1 ms, unmasked) presentation of the words. On the other hand, each of the 32 descriptors was shown multiple times during the study, and participants had also generated emotionality ratings for those terms on a number of occasions before the physiological phase of the study. In addition, neutral mood terms were not included; these would be important for evaluating whether the effects were driven by emotional arousal or by hedonic valence. More recently, Herbert et al. (in press) presented adjectives that were pleasant, unpleasant, and neutral and asked participants to make a covert judgment regarding their emotional response. The earliest significant differences were ERPs to emotional words (pleasant and unpleasant) becoming more positive than those to neutral words in the P2 component (180–250 ms post-onset), which continued into P3 (250–400 ms). The difference was predominantly in the left hemisphere, and largest at the central and parietal sites.

Pleasantness vs. emotionality

Subjectively, the hedonic valence of a stimulus or event appears to be more salient than differences in emotional arousal, especially for printed words. Indeed, as noted above, ratings of hedonic valence for words range much more widely than ratings of arousal (Bradley and Lang, 1998). Although differences in ERPs between pleasant and unpleasant materials have sometimes been reported, there is no consistent pattern that distinguishes between them. More importantly, when neutral words are included in the stimulus set, the ERP pattern typically indicates greater positivity over central–parietal sensors for both pleasant and unpleasant words, relative to neutral words (e.g., Williamson et al., 1991), with little difference between the two classes of emotional words. When differences have been found between pleasant and unpleasant words, it is more commonly the case that unpleasant words elicit a larger and/or earlier divergence from neutral stimuli than do the pleasant words (e.g., Ito et al., 1998). Taken together, these data suggest that it may be the intensity of motivational activation — whether initiated by activity in appetitive or defensive neural systems — that is critical in modulating the ERP during language processing.

Effects of encoding task on ERPs to single emotional words

One factor that could clearly impact the effects of emotionality on the ERP across studies is that of the encoding task. Most typically, participants are overtly or covertly evaluating the emotionality or pleasantness of the presented words. In some
cases, there has been no primary task other than to read the words silently, a condition which, given the nature of the materials, may default to a consideration of emotionality. In a series of separate experiments, we contrasted the nature of the encoding task when processing the same set of words. In some of these studies, one or the other dimension of word emotionality was relevant. In others, the emotional attributes of the words were irrelevant to the decision. Normative ratings of emotionality were carefully controlled, and we examined the consequence of task manipulations on the latency and morphology of emotionality effects in the ERPs, which were recorded at five scalp locations (see below).

In these studies, a set of 150 words was drawn from the ANEW (Bradley and Lang, 1998). The stimuli included five subsets of words from different regions of affective space: pleasant, high-arousal words (e.g., passion), unpleasant, high-arousal words (e.g., vomit), pleasant, low-arousal words (e.g., bunny), unpleasant, low-arousal words (e.g., ugly), and neutral, low-arousal words (e.g., rattle). For clarity, this chapter focuses on data that compare ERPs for the most arousing emotional stimuli (i.e., high-arousal pleasant and unpleasant words) as they differ from neutral, low-arousal words. In each stimulus set, both nouns and adjectives were included, but there were no words directly describing emotional states of either type (e.g., happy or happiness). Word length, frequency, and imagery ratings were also balanced across the sets.

In each of five studies, each of the 150 critical words was presented once, in random order, during the course of the experiment. Each trial began with a fixation box signaling the start of the trial. Around 1/2 s later, one of the words was presented, centered on the point of fixation. Word duration was 175 ms, but the fixation box remained visible until 2 s after word onset. EEG was sampled at a 125 Hz rate, with a bandpass of 0.1–40 Hz, during the period from 100 ms prior to word onset to 1000 ms after word onset. EEG was then rereferenced offline to the average of the mastoids, and trials with blinks and other artifacts were rejected prior to averaging. Participants with greater than 15% rejection rate were discarded,
and sufficient sessions were conducted to obtain data for 30 participants in each experiment.
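The acquisition and preprocessing steps just listed (0.1–40 Hz bandpass, epochs from 100 ms before to 1000 ms after word onset, offline re-referencing to the averaged mastoids, and artifact rejection before averaging) map onto a few standard calls in a package such as MNE-Python. This is a hedged sketch, not the authors' pipeline: the file name, the mastoid channel labels M1/M2, the event codes, and the simple amplitude-based rejection threshold are all assumptions.

```python
import mne

# Load a continuous EEG recording (file name is hypothetical).
raw = mne.io.read_raw_fif("words_raw.fif", preload=True)
raw.filter(l_freq=0.1, h_freq=40.0)  # 0.1-40 Hz bandpass, as in the text

# Word-onset triggers; the numeric codes per category are assumptions.
events = mne.find_events(raw)
event_id = {"pleasant": 1, "unpleasant": 2, "neutral": 3}

# Epoch from -100 to 1000 ms around word onset, baseline-correct on the
# prestimulus interval, and drop trials with large deflections as a
# crude stand-in for blink/artifact screening.
epochs = mne.Epochs(raw, events, event_id, tmin=-0.1, tmax=1.0,
                    baseline=(None, 0), reject=dict(eeg=100e-6),
                    preload=True)

# Re-reference offline to the average of the two mastoids.
epochs.set_eeg_reference(ref_channels=["M1", "M2"])

# Average the surviving trials within each condition.
evokeds = {cond: epochs[cond].average() for cond in event_id}
```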
Decisions about pleasantness

We began the series by asking a group of participants to judge word pleasantness at encoding. In a pilot study, this task was presented as a two-choice bipolar decision (pleasant, unpleasant) that was to be made as soon as the word was shown. Perhaps not surprisingly, response time for neutral words was significantly slower than for affective words, as the neutral items were neither pleasant nor unpleasant, making the decision difficult. Thus, in the actual Pleasantness Decision study, we allowed three choices at encoding — pleasant, unpleasant, or neutral — and delayed the overt response until the end of each trial. The ERPs for the pleasant, neutral, and unpleasant words from the Pleasantness Decision study are shown in Fig. 2. As illustrated in this figure, ERPs to the pleasant and unpleasant words diverge from that for neutral words around 450 ms after onset, taking the form of an enhanced late positive potential (LPP) peaking around 500 ms. Shortly thereafter, ERPs for pleasant and
unpleasant words become negative through the slow wave (SW) region more rapidly than do the ERPs for neutral stimuli. This overall pattern can be seen at each of the sensors. The contrast between emotional words (pleasant and unpleasant), on the one hand, and neutral words, on the other, was significant both in the LPP and the later SW region, and the size of the difference did not vary across sites. Importantly, there was no difference in ERPs between pleasant and unpleasant words at any time point or sensor site, despite the task relevance of this affective dimension. The lack of ERP differences before nearly 1/2 s after word onset suggests that the earlier onset of emotionality effects in prior studies of single words may be due either to repetition of the target words or (as we believe) to the particular task given the participants. The lack of any differences in the ERPs for the pleasant and unpleasant words, as salient and as task relevant as the valence distinction was in this study, is striking, and is consistent with the primarily arousal-driven nature of the response to these stimuli as seen in the ERPs. The later SW negativity seen for the emotional words compared to the neutral words may be nothing more than an enhanced contingent negative variation (CNV), reflecting the fact that the decision and response selection for the pleasant and unpleasant words continues to be easier than for the neutral words.

Fig. 2. Event-related brain potentials at five scalp locations following presentation of affectively unpleasant, neutral, and pleasant words. In this experiment, participants decided whether each word was pleasant, neutral, or unpleasant.
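Computationally, the LPP contrast reported above reduces to averaging each condition's waveform over a latency window and comparing conditions. Continuing from the hypothetical evokeds dictionary in the preprocessing sketch earlier, and taking the 450–650 ms window from the text, a minimal version might look like this (the all-channel average and the simple pooled emotional-vs-neutral difference are illustrative choices, not the authors' statistics):

```python
import numpy as np

def mean_window_amplitude(evoked, tmin, tmax):
    """Mean amplitude (in volts) across all channels in a latency window."""
    cropped = evoked.copy().crop(tmin=tmin, tmax=tmax)
    return cropped.data.mean()

# LPP window of roughly 450-650 ms post word onset, as in the text.
lpp = {cond: mean_window_amplitude(ev, 0.45, 0.65)
       for cond, ev in evokeds.items()}

emotional = np.mean([lpp["pleasant"], lpp["unpleasant"]])
print(f"LPP, emotional minus neutral: {emotional - lpp['neutral']:.2e} V")
```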
Decisions about emotionality In the second experiment, participants were required to classify the words as emotional (whether pleasant or unpleasant) or as unemotional, neutral words. Note that if this decision were made by first detecting whether the word was pleasant or unpleasant, and inferring that it should be therefore classified as emotional, the task becomes logically identical to the Pleasantness Decision, with an added step that might slow the overall decision, but should have no impact on the ERPs. Interestingly, however, the ERPs look quite different from those of the first experiment. As illustrated in Fig. 3, the ERPs for emotional words now diverge earlier and are visible as early as 300 ms post-onset, are significant in the N400 region and continuing into the LPP. Also in contrast to the results of the data from the Pleasantness Decision study, there is now no sustained difference between the emotional and the neutral words after about 600 ms.
This shift in the ERP emotionality effect indicates that the intensity of motivational activation — whether appetitively engaging or defensively activating — is the aspect of emotionality that is first apparent in the ERP. Both appetitively and defensively engaging stimuli often signal the need for immediate attention (and often action), and the ERP appears to reflect the behavioral and attentional adjustments to stimuli with motivational relevance. That is, the ERP appears to reflect cortical processes of heightened attention and preparation for action that are common to all motivationally salient stimuli, rather than reflecting hedonic distinctions. Note, however, that even with this slight shift in the ERP, there are still no apparent differences in the early components that are commonly associated with the processes that either precede or direct conscious attention and awareness.
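The onset of such effects is typically localized by contrasting conditions within successive time windows. The sketch below runs a paired t-test per 100 ms window on hypothetical per-participant window means; none of the numbers are from the actual experiments.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-participant mean amplitudes (in microvolts) at one sensor,
# one value per 100 ms window from 200 to 700 ms post-onset.
rng = np.random.default_rng(1)
n_subjects, n_windows = 20, 5
emotional = rng.normal(loc=1.0, scale=2.0, size=(n_subjects, n_windows))
neutral = rng.normal(loc=0.0, scale=2.0, size=(n_subjects, n_windows))

# A paired t-test per window; the earliest reliably significant window gives
# a coarse estimate of when the emotionality effect begins.
for w, start in enumerate(range(200, 700, 100)):
    t, p = ttest_rel(emotional[:, w], neutral[:, w])
    print(f"{start}-{start + 100} ms: t = {t:.2f}, p = {p:.3f}")
```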
Silent reading

If, as we speculated earlier, simply reading a series of words that vary in affect results in an emotionality effect in the ERPs, and if emotional arousal, rather than the pleasantness, of
Fig. 3. Event-related brain potentials at five scalp locations following presentation of affectively unpleasant, neutral, and pleasant words. In this experiment, participants decided if each word was emotionally arousing (whether pleasant or unpleasant), or unemotional.
words is the dimension that dominates the ERPs, we might expect the ERPs during silent reading of these words, with no decision required at all, to closely resemble the data from the Emotionality Decision study. In the third experiment, then, the same set of words was presented to participants who were now asked to read the words and "think about their meaning." No overt decision or response was required, however. Results of this study are illustrated in Fig. 4. In general, the magnitude of the ERP emotionality effect is somewhat diminished compared to the previous experiments, but the pattern is clearly more similar to that found for the emotionality decision (Fig. 3) than for the pleasantness decision (Fig. 2). The contrast between the emotional words and neutral words was significant in the LPP region (450–650 ms post-onset), and as for the data from the Emotionality Decision study, there is no difference among the word classes in the SW region of the ERPs. The somewhat smaller effects found in this experiment compared to the previous one are likely due to the lack of task demands that would engage participants in actually reading the words for meaning. In contrast to emotional pictures, for example, it is not surprising that single words may
have less ability to automatically engage comprehension of the meaning of the words. Nonetheless, given that participants could choose to be rather passive in their response to the words, the continued effect of emotionality in the LPP region is an impressive demonstration that even for these wholly symbolic stimuli, the derivation of meaning is, to some degree, hard to avoid, as these single words continued to elicit a measurable emotional response.
Semantic categorization

In each of the preceding experiments, word emotionality was either an explicit or implicit focus of the task. In each case, a robust, arousal-driven LPP effect was seen, beginning as early as 300 ms after word onset and showing little difference between pleasant and unpleasant words. We wondered whether these effects would continue to be seen if the decision involved an attribute of the words that was orthogonal to their emotionality. In the fourth study in this series, we included a set of 60 target words, 30 from each of two taxonomic categories — tools and articles of clothing.
Fig. 4. Event-related brain potentials at five scalp locations following presentation of affectively unpleasant, neutral, and pleasant words. In this experiment, participants read each word silently and made no decision or overt response.
Participants were asked to press a response button whenever they saw a word from one of these two target categories, and to make no response to the remaining (critical) words. The semantic decision is a relatively deep one, requiring that the meaning of each word be derived and a specific semantic attribute evaluated. The use of two categories was intended to discourage a strategy wherein participants could maintain a list of common category exemplars in working memory and match these to presented words. Results from this task are shown in Fig. 5. Despite the irrelevance of emotion to the task, there were again differences due to word emotionality in the ERPs in the LPP region, an effect that was maximal at the Cz location, somewhat larger at the left posterior (P3) site, and absent at the right anterior (F4) site. In contrast to when emotion was task relevant, however, there is no sign of any divergence until well into the rising portion of the LPP, and the effect was not significant until 500 ms after word presentation. Moreover, the difference comes and goes within a span of 200 ms. Again, there is no sign of any emotionality differences in the SW region. Also in contrast to the previous studies, the ERP is significantly different from that for neutral words only
for the unpleasant, but not the pleasant, stimuli. This is quite interesting, particularly in view of the equivalence of the ERPs for pleasant and unpleasant words in the other experiments. It suggests that the effect of hedonic valence in the present task is not merely due to differences in emotional arousal — otherwise, it should have been present or enhanced when arousal was task relevant — but is in fact due to the greater ability of the unpleasant words to attract attention under conditions when emotionality is irrelevant to the task.
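The go/no-go logic of this task is easy to state explicitly. The sketch below uses tiny hypothetical word pools in place of the actual 60 targets and 150 critical words; only the response rule is taken from the design described above.

```python
import random

# Hypothetical miniature stimulus pools (stand-ins for the real lists).
targets = {"hammer": "tool", "wrench": "tool", "jacket": "clothing"}
critical = ["murderer", "banner", "lover"]  # emotional and neutral fillers

trials = list(targets) + critical
random.shuffle(trials)

for word in trials:
    # Respond only to members of the two target categories; the critical
    # (emotional and neutral) words require no overt response.
    expected = "press" if word in targets else "no response"
    print(f"{word:10s} -> {expected}")
```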
Lexical decision

In the semantic decision task, a rather deep semantic analysis of the words was needed in order to categorize them. The continued presence of an emotionality effect in the ERPs for the unpleasant words led us to wonder if a more superficial task that still required lexical access would further diminish or eliminate the emotionality effect. On the one hand, deciding whether or not a string of letters is a word does not require accessing the meaning of the word. On the other hand, a long tradition of research on semantic priming effects in
Fig. 5. Event-related brain potentials at five scalp locations following presentation of affectively unpleasant, neutral, and pleasant words. In this experiment, participants pressed a key if a word was a member of one of two semantic categories (tools or articles of clothing, represented by a set of filler words), and made no response to the critical words, none of which was from those categories.
the lexical decision task (e.g., Meyer and Schvaneveldt, 1971; Fischler, 1977; Holcomb, 1988) suggests that the core meaning of a word may be automatically activated during this task. A final experiment in the series was therefore conducted in which participants were shown the same set of 150 critical words, along with 60 nonwords that had been created by changing the target words from the two taxonomic categories (tools and clothing) by a single letter. Participants were to press a response button whenever a nonword was shown, and to make no overt response to any of the words. The ERPs obtained during this lexical decision task are presented in Fig. 6. The robust emotionality effect for the unpleasant words that was obtained in the Semantic Decision study has been largely eliminated, even over the central sites. Although there is a slight trend toward an emotionality effect at the vertex, the ERP difference among the classes of words did not reach significance at any site or in any temporal window. These results suggest that when words are read but there is no need to consider their meaning, and emotionality is irrelevant to the decision, affective attributes of words have little effect on the reader. Overall, our studies of ERPs to single words show that under a variety of tasks, emotionally
evocative words elicit a fairly rapid brain response as seen in the ERPs. For the most part, pleasant and unpleasant words affect the ERPs similarly, and there is little evidence for effects of the hedonic valence of word stimuli across these studies. The one exception, in the semantic decision task, suggests that unpleasant or threatening words are more likely to elicit a response under conditions when participants are not oriented toward dimensions of emotionality. Clearly, too, there are some limits to the automaticity of even this emotional attention capture by single words, as seen by the lack of emotionality differences in the ERP when people need only to make lexical decisions. The absence of ERP differences as a function of word emotionality earlier than about 300 ms in any of these experiments, even when the encoding task directly focuses on the emotional meaning of words, is striking, and contrasts with at least some demonstrations of such effects with single words discussed earlier (e.g., Bernat et al., 2001). Given that estimates of the time needed for "lexical access" — the activation of the representation of a word as a linguistic object in memory — are on the order of 200 ms, and that presumably the associations that link that word to emotional experiences or ideas must also take time to be activated, it
Fig. 6. Event-related brain potentials at five scalp locations following presentation of affectively unpleasant, neutral, and pleasant words. In this experiment, participants pressed a key if the item was a nonword, and made no response to the words.
perhaps would be more surprising than not to find emotionality effects in cortical ERPs for words prior to 200 ms.
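As a concrete note on materials, the nonwords in the Lexical Decision study were built by changing a single letter of each target word. A sketch of that step, under the simplifying assumption of a random substitution with no check that the result is pronounceable or absent from the lexicon:

```python
import random
import string

def make_nonword(word, rng=random):
    """Change one letter of word to a different random letter.

    A real stimulus set would also verify that the result is not itself a
    word and remains pronounceable; those checks are omitted here.
    """
    i = rng.randrange(len(word))
    alternatives = [c for c in string.ascii_lowercase if c != word[i]]
    return word[:i] + rng.choice(alternatives) + word[i + 1:]

random.seed(2)
for w in ["hammer", "jacket", "wrench"]:
    print(w, "->", make_nonword(w))
```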
When words combine: comprehension of simple emotion phrases

Words are the fundamental semantic units of language. But the combination of words into phrases and sentences provides language with its expressive power, as these combinations often convey meanings that could not be predicted from the meanings of the individual words (e.g., Murphy, 1990). The noun phrase, in a sense, is thus the fundamental unit of discourse and communication through language. The second series of experiments to be reviewed here concerns how the affective response to emotionally potent simple two-word phrases compares to that obtained when words within a phrase are treated as isolated units. Given the stable pattern of ERPs to emotional words in isolation, what might we expect when words are combined into meaningful phrases? It may be that the effects of emotionality on the ERP to words within sentences are no more than the sum of the ERP to each word presented in isolation. If this is the case, we should see an emotionality effect to the first word of the pair with the same time course as that for emotional words in isolation. Hence, a pair such as starving murderer should show an increased positivity, beginning around 250–300 ms, to starving relative to a neutral word such as resting. Note that, as a consequence, the ERP to murderer may also differ in these cases because of carry-over effects from the first to the second word. An alternative hypothesis, more consistent with the view that words in linguistically coherent sequences may be processed differently from those in isolation (see Bellezza, 1984), is that the emotional response to the first word may be deferred until it is known what that word is modifying. Compare, for example, a dead tyrant and a dead puppy. A dead tyrant may well be a good thing; a dead puppy is usually not. On the other hand, a dead letter is fairly neutral. According to this view, effects of emotionality may not develop until the second
word is processed, and may reflect the affective meaning of the word pair rather than that of the second word alone. Hence, pairs such as starving murderer and resting murderer should show no emotionality effect at all for starving and resting, but a larger emotionality effect, as measured on the second word, for the pair starving murderer than for the pair resting murderer. The second question regarding word emotionality addressed in this research concerns the relationship between the hedonic valence of the first and second words. Superimposed on any arousal effects of the modifier and/or noun, the affective congruence of the words of a pair may affect ERPs. Pairs with mismatched affect between modifier and noun (e.g., laughing murderer) may elicit a distinctive ERP pattern compared to pairs that match in hedonic valence (e.g., dead murderer), especially when the task involves comprehension of the pair as a coherent phrase. Note that this effect could be independent of any arousal-based effects of the first and second word separately, since emotional arousal is the same for the matched and mismatched pairs. On the one hand, there is some evidence for emotional congruence effects in word processing tasks, including the so-called affective priming effect, in which decisions about emotional target words are speeded by prior presentation of affectively congruent prime words (e.g., death–aggressive) and slowed by incongruent prime words (e.g., death–happy) (e.g., Fazio et al., 1986; Klauer, 1998; Wentura, 2000). This priming effect appears to require a rather short (c. 200–300 ms) interval between prime and target onset (stimulus onset asynchrony, or SOA; see Bargh et al., 1992). On the other hand, as we reviewed above, there is little evidence that affective valence modulates the ERPs to individual emotional words upon which a valence-driven congruence effect might be based (although see Ito et al., 1998). Moreover, in the present studies, a relatively long SOA between the first and second words of a pair (750 ms) was used, as we wanted to allow enough time for the first word to elicit the emotionality response, and to detect this emotionality component if it is present. We also wanted to maximize the potential impact of the emotionality of the first word on the second
by presenting the second word at a point when the effects of arousal, based on our studies with words in isolation, might be at a peak (500–700 ms post-onset). For these reasons, we anticipated that the effects of matching or mismatching the hedonic valence of the modifier and noun of our pairs might have little impact on the ERPs to the second word in these experiments. In each of the experiments presented here, participants were shown a series of word pairs that formed modifier–noun phrases. In the first experiment, the task was to judge if the pair formed a "linguistically coherent phrase." In the second study, in contrast, the same pairs were presented, but the task was to judge if either word came from a target semantic category. Our hypothesis was that the phrase task would produce diminished emotionality effects to the first word, and enhanced effects to the second, compared to the word task.
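The presentation timing used in both pair experiments (each word visible for 250 ms, with a 750 ms SOA between first and second word onsets, as detailed below) can be written out as an explicit event timeline. In this sketch only the 500 ms fixation duration is an assumption:

```python
# Event schedule for one word-pair trial. Word durations and the SOA come
# from the design described in the text; the fixation duration is assumed.
def pair_trial_schedule(word1, word2, fixation_ms=500):
    t0 = fixation_ms  # first word onset
    return [
        (0,         "fixation on"),
        (t0,        f"show '{word1}'"),
        (t0 + 250,  f"remove '{word1}'"),
        (t0 + 750,  f"show '{word2}'"),   # SOA = 750 ms
        (t0 + 1000, f"remove '{word2}'"),
    ]

for t, event in pair_trial_schedule("starving", "murderer"):
    print(f"{t:5d} ms  {event}")
```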
Comprehension of noun phrases

Nine sets of 20 word pairs each were created, with the emotionality of the first and of the second words (pleasant, neutral, and unpleasant) controlled and varied factorially. Three sets of twenty nouns each were selected from the ANEW (Bradley and Lang, 1998) to be used as second words in these noun phrase pairs. The sets varied in rated pleasantness, with both pleasant (e.g., lover) and unpleasant (e.g., suicide) words rated similarly higher in arousal than were neutral words (e.g., banner; see Table 1).

Table 1. Mean ratings of valence (pleasantness) and arousal (from the Bradley and Lang, 1998, norms) for the words used in the Word Pair experiments (see text)

                First Words          Second Words
                Valence   Arousal    Valence   Arousal
  Pleasant        7.9       6.0        8.2       6.3
  Neutral         4.8       4.4        4.9       4.7
  Unpleasant      2.1       5.5        1.8       6.3

Note: A 1–9 scale is used for the SAM ratings, with 9 the most pleasant/arousing.

The sets were comparable in length,
frequency, and rated imagery value. Three additional sets of twenty words were also selected from the ANEW, again comparable in length, frequency, and imagery, to be used as first words in the pairs (see Table 1). Each of these words was paired with a pleasant, neutral, and unpleasant second word such that each combination made a relatively coherent (though sometimes semantically odd) adjective–noun phrase (e.g., terrible suicide, slow suicide, happy suicide). In addition, a set of 20 word pairs was constructed such that the first and second words did not form a "coherent" phrase, for example, foot sentence. Participants were told that the study concerned brain activity associated with understanding simple phrases and making decisions regarding the coherence of simple phrases. No mention was made of the emotionality of the words. Examples of linguistically "coherent" and "incoherent" word pairs were given, and it was explained that an overt response should be made only to pairs judged as incoherent, which would occur fairly infrequently. On each trial, a fixation mark was followed by the first word, centered at the point of fixation. The word remained visible for 250 ms. The second word was presented in the same location 750 ms after first word onset, and also remained visible for 250 ms. Grand-averaged ERPs were obtained across 20 participants for each of the nine experimental word pair conditions. The ERPs to the first words as a function of emotional class (and averaged across second word class) are shown for five cortical sites in Fig. 7. Most notably, as predicted, there was no difference between the pleasant, neutral, and unpleasant words at any point prior to onset of the second word. The ERPs to the second words of the phrase, averaged across first word emotionality, are shown in Fig. 8. As illustrated in the figure, emotional second words — both pleasant and unpleasant — elicit a more positive-going ERP than do neutral words, diverging around 350 ms post-onset at centroparietal sites and continuing through the end of the recording epoch. The ERP emotionality effect for the word completing the phrase appears broadly similar to that observed when single words are presented and participants are judging their emotionality (see Fig. 4). Effects of emotionality
Fig. 7. Event-related brain potentials at five scalp locations following presentation of an affectively unpleasant, neutral, and pleasant first word of a two-word phrase. In this experiment, participants pressed a key when they judged the complete pair to be "linguistically incoherent" (represented by a set of filler pairs; see text).
Fig. 8. Event-related brain potentials at five scalp locations following presentation of an affectively unpleasant, neutral, and pleasant second word of a two-word phrase. In this experiment, participants pressed a key when they judged the complete pair to be "linguistically incoherent" (represented by a set of filler pairs; see text).
were significant in a 350–450 ms time window, in the LPP time window (450–600 ms), and in the SW window (600–750 ms). In each case, the ERPs to the unpleasant and pleasant nouns were significantly more positive than they were to the neutral
words, although the comparison between pleasant and neutral nouns was only marginal (p < 0.06) in the earliest time window. The difference between pleasant and unpleasant words never approached significance. Moreover, the effects were present at
each recording site, with no significant topographic differences in the ANOVAs on the scaled amplitudes. An effect of emotional congruence on the ERPs in these analyses would have appeared as a significant interaction of first and second word valence in the ERPs to the second word, with matched-valence ERPs (pleasant/pleasant and unpleasant/unpleasant) diverging from mismatched ERPs (pleasant/unpleasant and unpleasant/pleasant). This interaction never reached significance for any site or interval. Also, there were no main effects of first word valence on the ERPs to the second word. Taken together, the pattern of ERPs to the emotional word pairs in this experiment was straightforward. First, emotionality of the first (modifier) words had no impact on the ERPs to these words, despite the use of words whose emotionality ratings were at least as strong, in hedonic valence and emotional arousal, as in earlier studies of words in isolation; despite the use of a task requiring semantic analysis of both words; and despite an interval between first and second words in which emotionality effects had been clearly seen in prior work. Second, there were clear effects of emotionality of the second words — the modified nouns — on the ERPs. Third, as with earlier studies, these ERP effects were driven by the emotional intensity (arousal) of the words rather than by differences in hedonic valence; in no case did the pleasant and unpleasant second words systematically differ from each other in the ERP.
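The congruence analysis just described amounts to testing the interaction of first and second word valence. A sketch of the matched-vs.-mismatched contrast on per-participant second-word amplitudes, with simulated values standing in for the real data:

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical per-participant LPP amplitudes to the second word for the four
# emotional cells of the 3 x 3 design (first-word valence x second-word
# valence: P = pleasant, U = unpleasant); neutral cells omitted for brevity.
rng = np.random.default_rng(3)
n = 20
cells = {pair: rng.normal(loc=2.0, scale=1.5, size=n)
         for pair in ["PP", "PU", "UP", "UU"]}

# Congruence contrast: matched (PP + UU) minus mismatched (PU + UP).
# A reliably nonzero value would indicate an affective congruence effect.
contrast = (cells["PP"] + cells["UU"]) - (cells["PU"] + cells["UP"])
t, p = ttest_1samp(contrast, 0.0)
print(f"congruence contrast: t({n - 1}) = {t:.2f}, p = {p:.3f}")
```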
Comprehension of single words within noun phrases

The lack of emotionality effects for the first words in this experiment is presumably a consequence of the coherence encoding task, which required comprehension of the word pair as a phrase. Evaluation of the affective meaning of the first word thus must await the completion of the entire phrase. To test this hypothesis, the same word pairs were presented with the same timing, using an encoding task that directed attention to the meaning of each individual word of the pair, rather than the phrase as a unit. In the last experiment to be reviewed, this was done by
presenting occasional target words from one of two prespecified semantic categories, and requesting participants to monitor each word pair for the occurrence of one of these target words. There is substantial evidence that, for sentences at least, comprehension of meaning at the phrase level may not be automatic. For example, Fischler et al. (1985; see also Fischler and Raney, 1991) found that the truth value of sentences such as A robin is a bird/A robin has a bird modulated the size of the N400 priming effect when the task was sentence verification, but not when the task was to judge the semantic relatedness of the first and second nouns (see also Brega and Healy, 1999). Our hypothesis, then, was that the focus on individual words rather than phrases in the second experiment in this series would elicit emotionality effects in the ERPs to the first words. In contrast, if the adjectives used were simply not potent enough to elicit an emotional response, then changing the task should have little effect on the ERP results, with first words again showing no emotionality effect. Regarding the pattern of congruence effects, there were several possibilities. Klauer and Stern (1992) have suggested that affective congruence effects depend on the ease with which the prime and target words can be implicitly thought of as a pair; this would imply, however, that congruence effects should have been observed in the phrase task. If we were successful in eliciting an emotionality response to the first word by requiring a decision about it, this might set the stage for such match/mismatch effects between the first and second words. Nonetheless, the SOA remains substantially longer than those typically found to produce affective priming effects (see Bargh et al., 1992; Wentura, 2000), and thus congruence effects were not strongly expected in this experiment. The design and materials of this experiment were identical to those of the phrase coherence task, with the exception that the twenty incoherent pairs were modified so that one of the two words came from one of two semantic categories: tools and articles of clothing. Participants were told to read each word of each pair, and to press a response button when a word from one of the two target categories was shown. Several examples of category instances were
Fig. 9. Event-related brain potentials at five scalp locations following presentation of an affectively unpleasant, neutral, and pleasant first word of a two-word phrase. In this experiment, participants pressed a key when they judged either of the two words of the pair to be a member of one of two semantic categories (tools or articles of clothing, represented by a set of filler pairs; see text).
given. As before, the SOA was 750 ms, and each participant saw all 200 pairs in random order. The ERPs to the first words, as a function of hedonic valence (and averaged across second word valence), are shown for the five cortical sites in Fig. 9. The major morphological features of the ERPs at the frontal and central sites, as well as the more complex waveform parietally, correspond closely to those observed in the noun phrase task. The most notable overall difference from the pair coherence experiment is that there is now a substantial emotionality effect in the ERPs. Emotional words (either pleasant or unpleasant) again prompt more positive potentials than do neutral words, an effect that is apparent around 400 ms and continues into the LPP region at centrofrontal sites. The effects, which were significant in the 350–450 ms interval and marginal just before and after this time window, were entirely due to a difference between the emotional and neutral conditions, with ERPs to pleasant and unpleasant words essentially identical to each other and both different from those elicited by neutral words. The ERPs to the second words as a function of hedonic valence (and averaged across first word
valence) are shown for each recording site in Fig. 10. The only time window resulting in a significant effect of word emotionality is 450–600 ms after onset of the second word. As before, contrasts showed that the difference is due to greater positivity for ERPs to pleasant and unpleasant words, compared to neutral words, with no difference in the ERPs between the emotional words. The effect did not differ across electrode site. There was little evidence of an interaction between first and second word valence at any site except for a slight trend in the 600–750 ms time window at P3. The pattern of amplitudes across conditions at this interval was complex, however, and did not easily lend itself to an affective priming interpretation. In sum, the data clearly show that when the words of a pair that potentially form a coherent phrase must be comprehended as individual words, the emotionality of the first as well as the second word results in an enhanced positive-going ERP. This contrasts with the ERPs to the identical words when the two items are treated as a noun phrase. On the other hand, there was no evidence that emotional congruence of the items affected ERPs to the second word in the pairs.
Fig. 10. Event-related brain potentials at five scalp locations following presentation of an affectively unpleasant, neutral, and pleasant second word of a two-word phrase. In this experiment, participants pressed a key when they judged either of the two words of the pair to be a member of one of two semantic categories (tools or articles of clothing, represented by a set of filler pairs; see text).
Emotionality effects on the ERPs to the first and second words of a word pair, then, were critically affected by the task. When the word pairs were considered as phrases, effects of the emotionality of the first word (i.e., the modifier) were wholly absent, but when the pairs were treated as two separate words, a significant emotionality effect was observed in the ERP elicited by the first word. This demonstrates that the absence of these effects in the noun phrase task is not due to the particular words used, or to the word class of adjectives vs. nouns. Rather, we conclude that when word sequences can form phrases, and are treated as such, the emotional response to the modifying adjectives is deferred until, so to speak, one knows what is being modified: is it a tyrant, a puppy, or a letter that's dead? That is, the hedonic valence is only apparent at the level of the word phrase, rather than at the level of the individual words. The shift from phrase- to word-level tasks had the opposite effect on the emotional response to the second words, reducing the ERP differences in size, as well as in temporal and topographic extent. This reduction is consistent with the view that for phrases, the emotionality effect in the ERPs is wholly driven by the emotionality of the phrase,
rather than determined by a sequential cascade of emotionality effects to each word in turn. The emotionality effect for second words in the noun-phrase task was greatest, in fact, not to the most emotionally arousing individual words, but to the most emotionally arousing phrases, as shown by a post hoc analysis of emotionality ratings of the phrases. The contrasting effects of emotionality in the two experiments suggest limits to the automaticity of comprehension of emotional aspects of word meaning, in at least two ways. First, consistent with the findings of our studies of single words, even when the meaning of words is being processed, as in both experiments here, the response to their emotionality depends on the current task, with some tasks failing to elicit any emotionality effects in the ERPs. Second, consistent with the findings of Fischler et al. (1985) and others, the meaning of phrases and clauses requires attention at that level of analysis. When words in a sequence are treated as isolated lexical events, on the other hand, even if they have the potential to be integrated into phrases, the impact of phrase-level meaning is attenuated. In contrast to the effects of word emotionality and task on the ERPs, there was little effect of the
congruence or incongruence of emotional valence on the ERPs in either experiment. Since we did observe what might be considered a congruence effect in the behavioral data — the coherence ratings were higher, and the probability of "incoherent" responses lower, for emotionally congruent than for incongruent pairs — it may be that the ERP measure is simply insensitive to the hedonic valence dimension of the emotionality of words. However, this seems unlikely given the several positive findings of congruence effects in very different paradigms (e.g., Cacioppo et al., 1993; Chung et al., 1996). Alternatively, such effects with the present materials and tasks may be observable only when the SOA between first and second words is much shorter — under 300 ms — as is the case for the affective priming studies (see above). To summarize the main results, the pattern of emotionality effects for ERPs suggests that when words are combined into phrases, and treated as phrases, the emotional response to these pairs is driven by the emergent meaning of the pair as a linguistic unit, rather than by a sequential unfolding of emotionality effects to individual words that cascades into an overall emotionality effect. This is seen in the absence of emotionality effects to the first words and the enhanced emotionality effects to the second word (compared to our prior work, at least). In this view, the absence of systematic congruence effects between first and second words also makes sense: if there is no emotionality response to the first word as such, then there is nothing to be congruent or incongruent with the second word. What matters is the meaning of the phrase.
Comprehending words, phrases, and sentences

Across these studies, there is consistency in how emotionality affects the brain's response to words. ERP differences generally took the form of a broad increase in the magnitude of a late positive potential for pleasant and unpleasant words, relative to neutral words. This emotionality effect occurs around a half second after word onset, although it can emerge as early as 300 ms post-onset, and can endure for several hundred milliseconds or longer. The effect is maximal at central and
parietal sites, with only a slight trend toward a left-maximal lateral asymmetry. With few exceptions, there is little difference between pleasant and unpleasant words in these ERPs. This pattern of ERP modulation is strikingly similar to that found when people view emotional pictures: in both cases, a late positive potential, emerging around 300–400 ms after stimulus onset and maximal over central–parietal sensors, is apparent, and is heightened for emotionally engaging, compared to neutral, stimuli (e.g., Cuthbert et al., 2000). For both pictures and words, ERPs primarily reflect the intensity or arousal parameter of emotion, rather than differences in hedonic valence. One interpretation is that the late positive component of the ERP reflects heightened attention allocation to motivationally salient stimuli, which is implemented in cortical structures and initiated by activity in subcortical structures differentially engaged by appetitive and defensive stimuli. Because the ERP only weakly reflects differential subcortical activity, however, it instead reflects the heightened attention and engagement that is found when emotional (whether pleasant or unpleasant) cues are encoded. Collectively, the studies reviewed here also show how the specific encoding task, as well as whether the words are presented in isolation or embedded among other words as parts of phrases, impacts the presence, timing, and extent of ERP emotionality effects. It is a reminder that, as claimed by Gestalt psychologists and "interactionist" psycholinguists, a sentence is more than the sum of its lexical parts. To understand how emotion is comprehended through language, it is necessary to consider both the local and global aspects of language processing, as well as the larger context in which we hear or read of emotional events.
Acknowledgments

The research described here was supported by NIMH grants to each author as investigators in the NIMH Center for the Study of Emotion and Attention (P50-MH52384) at the University of Florida. The first series of experiments, on effects of single emotional words, was done in
collaboration with Romko Sikkema, Vincent van Veen, and Michelle Simmons. The second series of experiments, on effects of emotional noun phrases, was done in collaboration with Michael McKay, David Goldman, and Mireille Besson. Portions of the first noun phrase experiment formed the basis of David Goldman's senior thesis at the University of Florida. We acknowledge the assistance of numerous undergraduate students in data collection and analysis, notably Michael Mattingly, Moshe Feldman, Kim Anderson, Carla Hill, Candice Mills, and William Lowenthal.

References

Anderson, A.K. (2005) Affective influences on the attentional dynamics supporting awareness. J. Exp. Psychol. Gen., 134: 258–281.
Arnold, M.B. (1945) Physiological differentiation of emotional states. Psychol. Rev., 52: 35–48.
Bargh, J.A., Chaiken, S., Govender, R. and Pratto, F. (1992) The generality of the automatic attitude activation effect. J. Pers. Soc. Psychol., 62: 893–912.
Begleiter, H., Porjesz, B. and Garosso, R. (1979) Visual evoked potentials and affective ratings of semantic stimuli. In: Begleiter, H. (Ed.), Evoked Brain Potentials and Behavior. Plenum Press, New York.
Bellezza, F.S. (1984) Reliability of retrieval from semantic memory: noun meanings. Bull. Psychon. Soc., 22: 377–380.
Bernat, E., Bunce, S. and Shevrin, H. (2001) Event-related brain potentials differentiate positive and negative mood adjectives during both supraliminal and subliminal visual processing. Int. J. Psychophysiol., 42: 11–34.
Bradley, M.M. and Lang, P.J. (1994) Measuring emotion: the Self-Assessment Manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry, 25: 49–59.
Bradley, M.M. and Lang, P.J. (1998) Affective norms for English words (ANEW): instruction manual and affective ratings. Technical Report A-8, The Center for Research in Psychophysiology, University of Florida.
Bradley, M.M. and Lang, P.J. (2000) Measuring emotion: behavior, feeling and physiology. In: Lane, R. and Nadel, L. (Eds.), Cognitive Neuroscience of Emotion. Oxford University Press, New York.
Brega, A.G. and Healy, A.F. (1999) Sentence interference in the Stroop task. Mem. Cogn., 27: 768–778.
Cacioppo, J.T., Crites, S.L., Berntson, G.G. and Coles, M.G. (1993) If attitudes affect how stimuli are processed, should they not affect the event-related brain potential? Psychol. Sci., 4: 108–112.
Cacioppo, J.T., Crites Jr., S.L. and Gardner, W.L. (1996) Attitudes to the right: evaluative processing is associated with lateralized late positive event-related brain potentials. Pers. Soc. Psychol. Bull., 22: 1205–1219.
Chapman, R.M., McCrary, J.W., Chapman, J.A. and Bragdon, H.R. (1978) Brain responses related to semantic meaning. Brain Lang., 5: 195–205.
Chapman, R.M., McCrary, J.W., Chapman, J.A. and Martin, J.K. (1980) Behavioral and neural analyses of connotative meaning: word classes and rating scales. Brain Lang., 11: 319–339.
Chung, G., Tucker, D.M., West, P., Potts, G.F., Liotti, M. and Luu, P. (1996) Emotional expectancy: brain electrical activity associated with an emotional bias in interpreting life events. Psychophysiology, 33: 218–233.
Cuthbert, B.N., Schupp, H.T., Bradley, M.M., Birbaumer, N. and Lang, P.J. (2000) Brain potentials in affective picture processing: covariation with autonomic arousal and affective report. Biol. Psychol., 52: 95–111.
Dalgleish, T. (1995) Performance on the emotional Stroop task in groups of anxious, expert, and control subjects: a comparison of computer and card presentation formats. Cogn. Emotion, 9: 341–362.
Duffy, E. (1941) An explanation of "emotional" phenomena without the use of the concept "emotion". J. Gen. Psychol., 25: 283–293.
Fazio, R.H., Sanbonmatsu, D.M., Powell, M.C. and Kardes, F.R. (1986) On the automatic activation of attitudes. J. Pers. Soc. Psychol., 50: 229–238.
Fischler, I. (1977) Associative facilitation without expectancy in a lexical decision task. J. Exp. Psychol. Hum. Percept. Perform., 3: 18–26.
Fischler, I., Boaz, T., Childers, D. and Perry, N.W.J. (1985) Lexical and propositional components of priming during sentence verification [Abstract]. Psychophysiology, 22: 576.
Fischler, I. and Raney, G.E. (1991) Language by eye: behavioral and psychophysiological approaches to reading. In: Jennings, J.R. and Coles, M.G.H. (Eds.), Handbook of Cognitive Psychophysiology: Central and Autonomic Nervous System Approaches. Wiley, Oxford, England, pp. 511–574.
Herbert, C., Kissler, J., Junghoefer, M., Peyk, P. and Rockstroh, B. (2006) Processing of emotional adjectives — evidence from startle EMG and ERPs. Psychophysiology, 43: 197–206.
Holcomb, P.J. (1988) Automatic and attentional processing: an event-related brain potential analysis of semantic priming. Brain Lang., 35: 66–85.
Ito, T.A., Larsen, J.T., Smith, N.K. and Cacioppo, J.T. (1998) Negative information weighs more heavily on the brain: the negativity bias in evaluative categorizations. J. Pers. Soc. Psychol., 75: 887–900.
Kaschub, C. and Fischler, I. (2006) The emotional blink to words without perceptual or semantic cuing. Manuscript submitted for publication.
Keil, A. and Ihssen, N. (2004) Identification facilitation for emotionally arousing verbs during the attentional blink. Emotion, 4: 23–35.
Klauer, K.C. (1998) Affective priming. In: Stroebe, W. and Hewstone, M. (Eds.), European Review of Social Psychology, Vol. 8. Wiley, New York, pp. 67–103.
Klauer, K.C. and Stern, E. (1992) How attitudes guide memory-based judgments: a two-process model. J. Exp. Soc. Psychol., 28: 186–206.
Klinger, E. and Cox, W.M. (1987) Dimensions of thought flow in everyday life. Imagin. Cogn. Pers., 7: 105–128.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1997) Motivated attention: affect, activation and action. In: Lang, P.J., Simons, R.F. and Balaban, M.T. (Eds.), Attention and Orienting: Sensory and Motivational Processes. Lawrence Erlbaum Associates, Hillsdale, NJ.
Lang, P.J., Kozak, M.J., Miller, G.A., Levin, D.N. and McLean Jr., A. (1980) Emotional imagery: conceptual structure and pattern of somato-visceral response. Psychophysiology, 17: 179–192.
Lindsley, D.B. (1951) Emotion. In: Stevens, S.S. (Ed.), Handbook of Experimental Psychology. Wiley, Oxford, pp. 473–516.
Mehrabian, A. and Russell, J.A. (1974) A verbal measure of information rate for studies in environmental psychology. Environ. Behav., 6: 233–252.
Meyer, D.E. and Schvaneveldt, R.W. (1971) Facilitation in recognizing pairs of words: evidence of a dependence between retrieval operations. J. Exp. Psychol., 90: 227–234.
Moruzzi, G. and Magoun, H.W. (1949) Brain stem reticular formation and activation of the EEG. Electroenceph. Clin. Neurophysiol., 1: 455–473.
Murphy, G.L. (1990) Noun phrase interpretation and conceptual combination. J. Mem. Lang., 29: 259–288.
Osgood, C.E., Suci, G.J. and Tannenbaum, P.H. (1957) The Measurement of Meaning. University of Illinois Press, Oxford.
Russell, J.A. (1980) A circumplex model of affect. J. Pers. Soc. Psychol., 39: 1161–1178.
Skrandies, W. (1998) Evoked potential correlates of semantic meaning — a brain mapping study. Cogn. Brain Res., 6: 173–183.
Wentura, D. (2000) Dissociative affective and associative priming effects in the lexical decision task: yes versus no responses to word targets reveal evaluative judgment tendencies. J. Exp. Psychol. Learn. Mem. Cogn., 26: 456–469.
Williams, J.M.G., Mathews, A. and MacLeod, C. (1996) The emotional Stroop task and psychopathology. Psychol. Bull., 120: 3–24.
Williamson, S., Harpur, T.J. and Hare, R.D. (1991) Abnormal processing of affective words by psychopaths. Psychophysiology, 28: 260–273.
CHAPTER 10
Emotional connotation of words: role of emotion in distributed semantic systems

M. Allison Cato Jackson1 and Bruce Crosson2,3

1 Nemours Children's Clinic, Neurology Division, 807 Children's Way, Jacksonville, FL 32207, USA
2 Brain Rehabilitation Research Center, Malcolm Randall VA Medical Center, 1601 SW Archer Rd, Gainesville, FL 32608-1197, USA
3 Department of Clinical & Health Psychology, University of Florida, Gainesville, FL, USA

Corresponding author. Tel.: +1-904-390-3665; Fax: +1-904-390-3470; E-mail: [email protected]
DOI: 10.1016/S0079-6123(06)56010-8
Abstract: One current doctrine regarding lexical–semantic functions asserts separate input and output lexicons with access to a central semantic core. In other words, processes related to word form have separate representations for input (comprehension) vs. output (expression), while processes related to meaning are not split along the input–output dimension. Recent evidence from our laboratory suggests that semantic processes related to emotional connotation may be an exception to this rule. The ability to distinguish among different emotional connotations may be linked distinctly both to attention systems that select specific sensory input for further processing and to intention systems that select specific actions for output. In particular, the neuroanatomic substrates for emotional connotation on the input side of the equation appear to differ from the substrates on the output side of the equation. Implications for semantic processing of emotional connotation and its relationship to attention and motivation systems are discussed. Keywords: language; emotion; functional magnetic resonance imaging; semantic processing; neuroimaging
In this paper, we discuss the semantics of the emotional connotation (EC) of words and how the EC of words (positive or negative) is processed in the brain relative to words with neutral EC. We will review findings from three functional magnetic resonance imaging (fMRI) studies that support unique brain activity associated with processing and producing words with EC. This series of studies informs theories of semantic organization in the brain as well as theories of emotional processing. We will review relevant theoretical background in semantics and emotional processing, prior to the presentation of findings from the three neuroimaging studies. In the 'Discussion' section, we will discuss how these three studies inform theories of semantic and emotion processing.

Before we begin, however, it is necessary to define EC and how processing EC differs from the actual experience of emotion. EC refers to knowledge about the emotional property of an object, action, or event. It is not a direct reference to an emotion such as happiness, sadness, anger, or fear. Rather, it is implied knowledge that an object, action, or event is likely to evoke a specific emotion. For example, our knowledge of different animals indicates that bears are dangerous animals and, therefore, are to be feared. Or, our knowledge about birthday parties indicates that these are (usually) happy events. While such knowledge (i.e., EC) undoubtedly is derived, at least in part, from experience with birthday parties, bears, and other objects, actions, or events that evoke emotion, a distinction can be
drawn between EC and emotional experience. While it is unclear whether emotions must be invoked to some degree to understand EC, it is safe to say that it is not necessary to experience an emotion to understand the emotional significance of an object, action, or event. In other words, one does not have to be subjected to the intense fear that one might experience when standing face-to-face with a bear to understand that a bear should evoke this emotion. Indeed, the fact that EC can be understood and evaluated in the absence of intense emotion can be a useful tool for learning about the emotional significance of potentially dangerous situations so that one can take steps to avoid these and the strong emotions they would likely provoke.
Relevant theories of semantic processing

Theories of lexical–semantic processing attempt to explain the relationship between stored knowledge and the means by which we gain, access, and express knowledge. One current doctrine regarding lexical–semantic functions (e.g., Ellis and Young, 1988) asserts the existence of separate input and output lexicons, each of which has access to a central semantic core that processes the meaning of objects, actions, and words (see Fig. 1). An implication of the model is that semantic knowledge would be gained primarily through heard or written words or through seen objects. On the input side, information is filtered through the appropriate lexicon (phonological, orthographic, or structural) before meaning is processed at the level of semantics. On the output side, a concept from semantic processing can be expressed verbally through output lexicons (most commonly phonologic or orthographic). According to this model, separate representations (i.e., lexicons) exist in the brain for each input and output modality. Thus, it is possible to have an impairment in one input or output lexicon that leaves the other lexicons intact. Further, semantic knowledge about objects, actions, and words can be retained in the face of impairment in any or all of the lexicons. This model also suggests that access to the meaning of an object becomes independent of the means by which we learn or access that information. That is, each input or output modality has equal and full
Fig. 1. Model of lexical–semantic processing. Adapted from Ellis and Young (1988). Input routes (heard word → phonologic input lexicon; seen object → structural description system; printed word → orthographic input lexicon) converge on a central semantics component, which feeds output routes (phonologic output lexicon → spoken word; orthographic output lexicon → written word).
access to stored knowledge. Although many case studies in the literature support aspects of this model, in particular, the occurrence of isolated impairment or preservation of one of the lexicons, the model has significant drawbacks. First, it can be of limited utility in understanding broader and more common impairments of language. Second, evidence suggests that the semantic system is neither modular nor unitary. Third, the model fails to take into account attentional systems that select one source of information among multiple sources for further processing or intentional systems that select one action for execution among multiple possible actions. For a long time, lesion studies were the primary means of building and examining models such as that of Ellis and Young (1988). However, over the past decade and a half, functional neuroimaging techniques have provided in vivo methods for measuring brain activity during cognitive challenge. For example, the most commonly used fMRI technique takes advantage of a natural contrast agent, deoxyhemoglobin, which occurs in red blood cells and is paramagnetic. A dynamic change in the proportion of this natural contrast agent to oxygenated hemoglobin creates the blood oxygen level dependent (BOLD) signal. The BOLD signal typically
increases as blood flow increases. Thus, when blood flow increases to perfuse brain regions that subserve a given activity (e.g., primary motor cortex during finger tapping), the BOLD signal increases as well, revealing active brain regions in fMRI studies. Over the past 25 years, lesion and subsequently neuroimaging evidence have contributed to our knowledge about the architecture of the semantic system. Semantic attributes (e.g., visual form or function), semantic category (e.g., living vs. nonliving), and modality of processing (e.g., visual or verbal) have been shown to be important components of the semantic system. Warrington and Shallice (1984) linked semantic attributes to semantic categories, explaining category-specific deficits by the fact that some categories of objects are more closely associated with specific semantic attributes than are others. For example, function is more important for distinguishing between nonliving items, such as tools (i.e., implements), and visual form is more important for distinguishing between living items such as animals. Thus, when visual processing is impaired, semantic processing of animals should be more affected than processing of tools. Warrington and McCarthy (1987) later added an action-oriented domain to explain differences in processing small objects that can be manipulated as opposed to large objects that cannot. While there is much validity to these observations, they suggest an interdependence among semantic attribute, semantic category, and even, to some degree, modality. Yet, one particularly seminal lesion study by Hart and Gordon (1992) demonstrated that modality of processing, semantic attributes, and semantic category could be fractionated from one another, suggesting that the neural substrates for these dimensions might differ in some important ways. In functional neuroimaging studies, disparate patterns of brain activity have been demonstrated for processing information about animals vs. processing information about tools (e.g., Damasio et al., 1996; Martin et al., 1996). One focus of such work has been the ventral visual stream, where different regions appear to be involved in processing living vs. nonliving things (e.g., Chao et al., 1999; Ishai et al., 1999). Further, processing movement or action-related words tends to activate motor or premotor cortices
during semantic processing, and the cortex activated may even vary depending upon the body part used for the action in question (Pulvermüller, 2005). However, these neuroimaging studies have not dissociated attribute, category, or modality. The problem is that the categories of items studied have been closely associated with particular attributes and/or modalities, making it difficult to separate and explore the different components of the semantic system. For example, form (i.e., shape) is usually processed in the visual modality, and when visual cortex is activated for objects relying upon shape for identification (e.g., animals), it is difficult to determine if this cortex is activated because of the specific feature being processed (shape) or because of the modality in which the feature is typically represented. It should also be noted that few studies have explored the impact of engaging attentional vs. intentional systems in semantic processing. We became interested in EC as a semantic attribute because it is relatively independent of the modality of processing and the category of the object being processed. For example, one can experience disgust when presented with numerous living and nonliving objects. Further, one can experience disgust by seeing, hearing, touching, smelling, tasting, or even talking about the disgusting object, action, or event. When an object possesses an EC (whether pleasing, disgusting, or frightening), it is a defining characteristic of that object, i.e., one of the object's semantic attributes. Although one does not have to experience a full-blown emotion to understand EC, it is useful to understand some aspects of emotional processing. Such an understanding will enable us to draw important distinctions between emotional experience and EC. Thus, before discussion of our observations regarding brain activity during processing words with positive or negative EC relative to words with neutral EC, a brief discussion of relevant emotion theory is offered.
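Before turning to emotion theory, the architecture of Fig. 1 can be restated in explicit form. The routing below reproduces the figure's pathways; the code is only an illustrative data structure for the model's separability claims, not an implementation claim about the brain.

```python
# The Fig. 1 architecture as a simple routing table: each input modality has
# its own lexicon feeding a shared semantic core, and each output modality
# has its own lexicon fed by that core.
INPUT_ROUTES = {
    "heard word":   "phonologic input lexicon",
    "seen object":  "structural description system",
    "printed word": "orthographic input lexicon",
}
OUTPUT_ROUTES = {
    "spoken word":  "phonologic output lexicon",
    "written word": "orthographic output lexicon",
}

def comprehension_path(modality):
    return [modality, INPUT_ROUTES[modality], "semantics"]

def production_path(modality):
    return ["semantics", OUTPUT_ROUTES[modality], modality]

# Damage confined to one lexicon leaves the other routes and semantics intact:
print(" -> ".join(comprehension_path("printed word")))
print(" -> ".join(production_path("spoken word")))
```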
Relevant theories of emotion processing

Theories of emotion processing attempt to predict the nature and neural locus of the processing of emotionally salient information. While it is well established that limbic areas, including the orbitofrontal
cortex, the amygdala, and the hypothalamus, are involved in the experience of emotions such as fear, rage, and joy, it is not as clear what brain regions mediate processing of EC. With respect to viewing emotionally salient pictures, evidence has been found to support two competing hypotheses. The right hemisphere hypothesis states that emotionally salient information is biased for processing in the right hemisphere, relative to emotionally neutral material. On the other hand, the bivalent hypothesis suggests that information that is negatively valenced is biased for processing in the right hemisphere, while the left hemisphere mediates processing of positive EC. Evidence for these hypotheses comes from lesion studies (e.g., Blonder et al., 1991; Peper and Irle, 1997) and from EEG and neuroimaging studies in healthy adults (e.g., Davidson, 1995; Lane et al., 1997a, b; Canli et al., 1998; Lang et al., 1998). In most cases, pictures with emotionally salient vs. neutral material have been used to examine brain processing of EC. However, processing words with EC is less likely to evoke emotional reactions in individuals, relative to viewing pictures. Thus, it is likely that processing words with EC will lead to different patterns of activity than viewing pictures of actual items with EC. Fewer studies have examined the brain correlates of processing words with EC. Maddock and Buonocore (1997) used fMRI to examine brain activation during processing of threat-related vs. neutral words. They found left retrosplenial activity that was unique to processing threat-related relative to neutral words. They did not find evidence to support either the right hemisphere or the bivalent hypothesis. In a follow-up study, Maddock et al. (2003) found rostral frontal and posterior cingulate activity when participants evaluated either pleasant or unpleasant compared to neutral words; however, these authors could not distinguish between the roles of these two regions on the basis of their data. Similarly, Beauregard et al. (1997) used PET to investigate brain activity during viewing of words with EC, and found unique activity in left prefrontal and left orbital frontal cortex. Thus, previous neuroimaging studies of the EC of words suggested that left frontal and posterior cingulate regions mediate verbal processing of EC. These previous studies
did not address whether the brain regions that process words with EC differ when the task at hand weighs more heavily on intentional or attentional demands. Intention refers to the selection of one among multiple potential actions for execution and initiation of that action (Fuster, 2003; Heilman et al., 2003; Crosson et al., in press). The relevance of intention to emotion is clear. Emotions have been defined as action dispositions (Lang, 1995). Because emotions act as motivators for action, they will be a factor in selecting one course of action vs. others. Specifically, objects, actions, and events with negative ECs generally will be avoided, and objects, actions, and events with positive ECs normally will be sought out. Attention is the selection of one source of information among multiple sources of information for further processing and initiation of that processing. Emotionally salient stimuli will be attended to because of their implications for guiding behavior. We will develop these themes in greater detail as we discuss the implications of our findings.
Review of three studies from our laboratory

By conceptualizing EC as a semantic attribute and with the support of previous neuroimaging studies of processing words with EC, we predicted that processing words with EC would lead to a pattern of brain activity in the left hemisphere different from that of processing words with no strong EC. In particular, processing the EC of words should engage areas of the brain with access to emotions and/or emotional knowledge. In addition, we suspected that the brain regions associated with processing words with EC would be the same whether the task was more heavily weighted toward processing incoming information (i.e., engaging attentional systems) or toward producing a response (i.e., engaging intentional systems). In other words, we predicted that the brain regions processing EC would be invariant across task demands that emphasize input vs. output. To examine the role of EC in lexical–semantic processing, our group developed three paradigms. Two of these tasks involved generation of multiple
words from given semantic categories; a third, semantic monitoring task involved making semantic judgments about given words. All tasks engaged the phonological input lexicon to some degree (spoken words were the input stimulus in all cases). However, the tasks differed in their relative emphasis on attentional vs. intentional demands. The two word generation tasks relied more on the intentional system, i.e., the system that mediates response selection and production. The semantic monitoring task placed heavier demands on the ability to attend to spoken words and make semantic decisions. This task involved some response selection as well, but the primary processing demands were on the input side. The results of these three studies have been published and described in detail elsewhere (Crosson et al., 1999, 2002; Cato et al., 2004). In this paper, we evaluate the results of the three fMRI tasks to determine what conclusions can be drawn about the role of EC in lexical–semantic processing. In particular, we examine the results to address (1) whether unique brain activity was consistently associated with processing EC and (2) whether the pattern of activity differed across tasks that were more heavily weighted toward attentional vs. intentional demands. As noted above, EC is intimately associated with intention, providing the motive to select which action is performed in many circumstances. In other words, emotions provide an important substrate for action. Likewise, EC is intimately associated with attention, providing the motive to attend to some stimuli and not others. For these reasons, the anatomical units involved in intention and attention will be briefly addressed. Following the suggestion of the Russian neuroanatomist Betz from the late 19th century, Fuster (2003) and Luria (1973) have both described the cerebral cortex as being divided into an anterior realm involved primarily in action and a posterior realm involved primarily in sensory processing. Further, Heilman et al. (2003) and Fuster (2003) have divided selection into two corresponding realms: intention and attention. As noted above, intention selects one among several competing actions for execution and regulates activity in the anterior realm of action. Heilman et al. (2003) have suggested that anterior cingulate cortex is involved in translating emotion/
motivation into action. Attention, in turn, selects one among several competing sources of sensory information for further processing and regulates activity in the posterior realm of sensory processing. Heilman et al. (2003) have suggested that posterior cingulate cortex (including retrosplenial cortex) is involved in selecting the stimuli that are motivationally/emotionally significant. This analysis suggests an alternative to our original hypothesis of no difference in processing EC based on input or output. If processing EC is intimately associated with attention and intention, the degree to which a task emphasizes sensory processing vs. action may determine what mechanisms are engaged to perform the task. Across the three studies that follow, we present findings suggesting, contrary to our original predictions, that the retrosplenial region, in more posterior cortex, may have a more sensory, or attentional, role in semantic processing than the frontal region. The findings also suggest that the rostral frontal region may have more of a role in the intentional aspects of word production.
Question 1: Does processing emotional connotation of words lead to unique patterns of brain activity relative to processing words with neutral emotional connotation?

In the first and second studies, only the left hemisphere was scanned because of limitations in the number of slices we could acquire. In the first study (Crosson et al., 1999), 17 right-handed healthy adults alternated between generating words and repeating words with neutral EC in each of two functional runs (Fig. 2). Each functional run consisted of 6.4 cycles, and word generation and word repetition half cycles lasted 18.4 s each. During each half cycle, participants heard a cue, either 'generate' or 'repeat,' followed by a category or a list of emotionally neutral words to repeat, respectively. When provided only a category, participants silently generated as many words as possible until they were told to stop 18.4 s after presentation of the category. One run consisted of neutral categories and the other run consisted of categories with positive and negative ECs.
Fig. 2. In each of two functional runs during the first experiment, participants alternated between word generation and word repetition. Each functional run consisted of 6.4 cycles, and word generation and word repetition half cycles lasted 18.4 s each. During the word generation task, when provided a category, participants silently generated as many words as possible until told to stop generating. One run used emotionally neutral categories and the other run used categories with positive and negative emotional connotations. (From Crosson et al., 1999.)
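The alternating design in Fig. 2 lends itself to a boxcar-style analysis, in which a reference function encoding the generation and repetition half cycles is compared with each voxel's signal. The following Python sketch is for illustration only; it is not the authors' actual analysis pipeline, and the repetition time and hemodynamic response model are assumptions:

```python
# Minimal sketch of a boxcar analysis for the alternating block design in
# Fig. 2. TR and the HRF shape are assumed, not taken from the studies.
import numpy as np
from scipy.stats import gamma, pearsonr

TR = 2.3                      # assumed repetition time (s)
HALF_CYCLE = 18.4             # generation/repetition half cycles (s)
N_CYCLES = 6.4                # cycles per functional run
n_vols = int(round(2 * HALF_CYCLE * N_CYCLES / TR))

# Boxcar: 1 during word generation, 0 during word repetition.
t = np.arange(n_vols) * TR
boxcar = ((t % (2 * HALF_CYCLE)) < HALF_CYCLE).astype(float)

# Simplified gamma-shaped HRF, convolved with the boxcar to form the
# predicted response; the prediction is then standardized.
hrf_t = np.arange(0, 24, TR)
hrf = gamma.pdf(hrf_t, a=6)          # peak roughly 5-6 s post-onset
pred = np.convolve(boxcar, hrf)[:n_vols]
pred = (pred - pred.mean()) / pred.std()

def voxel_activation(ts: np.ndarray) -> float:
    """Correlate a voxel time series with the task prediction."""
    r, _ = pearsonr(ts, pred)
    return r

# Example with a synthetic 'active' voxel: prediction plus noise.
rng = np.random.default_rng(0)
print(voxel_activation(pred + rng.normal(0, 0.5, n_vols)))
```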
Rate of generation of words to emotional and neutral categories was closely matched by selecting categories with ECs and neutral categories that had yielded similar response rates during a pilot study. As measured by oral generation after the scanning session, the rates of generation for categories with ECs and emotionally neutral categories were closely matched to the rate of word repetition, with no significant differences in rate (see Crosson et al., 1999 for details). In the data analyses, no distinction was made between positive and negative EC. When generation of words for categories with EC was compared to generation of words for categories with neutral EC, an area in the left rostral frontal lobe was active (see Fig. 3). In the second study (Crosson et al., 2002), we compared semantic monitoring of words with EC to semantic monitoring of words with neutral EC (Fig. 4). In each of three functional runs, 16 healthy, right-handed adults alternated between monitoring words and monitoring tones. Each functional run consisted of eight cycles, and tone- and word-monitoring half cycles lasted 28 s each. During each 28 s half cycle of word monitoring, participants heard nine words. During one run, participants listened to words with positive and negative ECs. For this run, they pressed a button each time a word was both nonliving and had a negative connotation (e.g., vomit). The other two runs consisted of emotionally neutral words. In one neutral run, the participants monitored implements for those that required only one hand to use and were used primarily outdoors (e.g., baseball). The other neutral run consisted of monitoring animals having
more than two legs and being primarily land-dwelling (e.g., grasshopper). The baseline task consisted of tone monitoring. During this task, participants pressed a button each time two or more high-pitched tones were played in a tone sequence. Target density was matched across tone- and word-monitoring tasks. Task demands were identical among the three word-monitoring tasks other than the rated emotional valence and arousal of the words. We found that monitoring words with EC for the two characteristics engaged unique brain regions relative to animal or implement monitoring. When monitoring for words with EC was directly compared to monitoring implements or animals, a left rostral frontal region was uniquely activated by semantic processing of EC (see Fig. 5). This region encompassed the same left rostral frontal area as in the first study, in which subjects generated words with EC. In the third study (Cato et al., 2004), we repeated the word generation paradigm but included whole-brain coverage and enough trials to compare the valence (positive or negative) of the categories. In each of four functional runs (Fig. 6), 26 healthy, right-handed adults alternated between word generation and word repetition. At the beginning of each trial within a run, participants heard a cue, either 'generate' or 'repeat,' followed by a category or a list of emotionally neutral words to repeat, respectively. When provided with a category, participants silently generated as many words as possible in 16.5 s. The length of word repetition trials varied between 9.9 and 23.1 s (Fig. 6). Positive, negative, and neutral word generation trials were interspersed in each run. Rate of generation of words to emotional and neutral categories was closely matched by selecting categories with ECs and neutral categories that had yielded similar response rates during a pilot study. As measured by oral generation after the scanning session, the rates of generation for categories with ECs and emotionally neutral categories showed no significant differences (see Cato et al., 2004 for details). Rated arousal was matched between categories with positive and negative EC based on the Affective Norms for English Words (ANEW; Bradley et al., 1988).
[Fig. 3 plot: signal at the left frontal pole across images 1–10, for generation of words with emotional connotations (Emogen) vs. emotionally neutral words (Neutgen).]
Fig. 3. Activity near the left frontal pole was associated with generating words with emotional connotations vs. generating emotionally neutral words (Crosson et al., 1999).
Fig. 4. In each of three functional runs during the second experiment, participants alternated between monitoring words and monitoring tones. Each functional run consisted of eight cycles, and tone- and word-monitoring half cycles lasted 28 s each. During each 28 s half cycle of word monitoring, participants heard nine words. During one run, participants made a semantic judgment about words with positive and negative emotional connotations. The other two runs consisted of emotionally neutral words. In one, the participants made semantic judgments about implements, and in the other, they made semantic judgments about animals. The comparison task consisted of monitoring tone sequences for two or more high-pitched tones. Target density was matched across tone- and word-monitoring tasks. (From Crosson et al., 2002.)
Subjects' ratings of categories demonstrated significant differences between positive, negative, and neutral categories, and arousal ratings were significantly greater for the emotional than for the neutral categories (see Cato et al., 2004 for details). For the third time, robust rostral frontal activity was found, this time bilaterally, though more in the left than the right hemisphere (Fig. 7). This activity was associated with positive and negative, but not neutral, EC. No significant differences in activity were found according to the valence of the categories with EC.
In summary, across the three studies, the left rostral frontal region was associated with semantic processing of EC. In all three studies, we chose baseline tasks carefully in order to isolate the semantic system. For the semantic monitoring task, the tone-monitoring baseline was used to eliminate brain activity associated with auditory processing of sounds. As both the semantic monitoring and the tone-monitoring tasks required a button press, brain activity associated with executing the response was also eliminated. In the case of the two word generation tasks, use of a neutral-EC word repetition baseline subtracted elements of the phonological input as well as elements of making a (silent) verbal response. Thus, in answer to the first question, processing of EC does appear to require unique brain activity in the rostral frontal lobe. We concluded that this rostral frontal area mediates semantic processing of EC.
Question 2: Does the nature of the task influence the pattern of brain activity associated with processing emotional connotation?

A cursory look at these three studies suggests that the nature and location of processing EC do not change when task demands are varied.
Fig. 5. Sagittal and coronal images reveal activity in the left rostral frontal pole that is unique to monitoring words with emotional connotation relative to monitoring tool names or animal names (Crosson et al., 2002).
Fig. 6. In each of four functional runs during the third study, participants alternated between word generation and word repetition trials. At the beginning of each trial within a run, participants heard a cue, either 'generate' or 'repeat,' followed by a category or a list of emotionally neutral words to repeat, respectively. When provided with a category, participants silently generated as many words as possible in 16.5 s. The length of word repetition trials varied between 9.9 and 23.1 s. Positive, negative, and neutral word generation trials were interspersed in each run. Rate of generation of words to emotional and neutral categories was closely matched. Rated arousal was matched between categories with positive and negative emotional connotation. (From Cato et al., 2004.)
In fact, a comparison between studies 1 and 2 reveals that the same frontal region is active for semantic processing of EC during an attentional task and during an intentional task. However, in the third study (Cato et al., 2004), involving a larger sample size and a larger number of blocks, we found another major cluster of functional activity associated with semantic processing in the left retrosplenial region. This region has been implicated frequently in the literature as a candidate region for semantic processing of EC (Maddock and Buonocore, 1997; for review, Maddock, 1999).
Fig. 7. In the third study, comparing word generation for categories with positive, negative, and neutral emotional connotations, the left frontal pole was significantly active during processing of positive and negative, but not neutral, emotional connotations. Panels show the emotional vs. neutral, positive vs. neutral, and negative vs. neutral word contrasts (slices L2–L4, A58–A60).
Surprisingly, time courses significantly differed between the rostral frontal and retrosplenial areas (see Fig. 8). The rostral frontal region remained active throughout the word generation blocks, and activity in this region declined to baseline levels only after the end of the trial.
[Fig. 8 plots: signal (ordinate) vs. image number 1–11 (abscissa) for positive, negative, and neutral categories; slice at X = −2; red = p < 0.005, yellow = p < 0.001.]
Fig. 8. Time course analysis revealed that rostral frontal cortex and retrosplenial cortex may play different roles in processing emotional, but not neutral, connotations (Cato et al., 2004).
The retrosplenial area was active upon hearing the category cue at the beginning of word generation trials, but activity in this area returned to baseline during the participants' responses. The fact that both areas demonstrated activity during generation of words with EC, but not during generation of emotionally neutral words, suggested that both areas are involved in semantic processing of EC. However, the differences in time courses indicated that the rostral frontal region was more involved in semantic processing of EC during word production, while the retrosplenial region was more involved when attention was directed toward incoming (in this case, heard) categories with EC (Fig. 8). The implications for intention and attention in processing EC are discussed below.
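To make the logic of this time course analysis concrete, the following hedged sketch shows how region-of-interest (ROI) signal time courses of the kind plotted in Fig. 8 could be extracted and compared. Array shapes, function names, and the half-recovery summary measure are illustrative assumptions, not the published method:

```python
# Sketch of an ROI time course analysis: average the fMRI signal over an
# ROI's voxels at every image (volume) of a trial, then average over
# trials per condition. Shapes and names are hypothetical.
import numpy as np

def roi_time_course(data: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """data: (n_trials, n_images, x, y, z) percent-signal-change values;
    roi_mask: boolean (x, y, z). Returns (n_trials, n_images)."""
    return data[:, :, roi_mask].mean(axis=-1)

def condition_mean(tc: np.ndarray, labels: np.ndarray, condition) -> np.ndarray:
    """Average the ROI time course over all trials of one condition."""
    return tc[labels == condition].mean(axis=0)

# With masks for the rostral frontal and retrosplenial ROIs, sustained vs.
# transient involvement can be compared, e.g., via the image number at
# which the mean signal falls back to half its peak.
def half_recovery_image(mean_tc: np.ndarray) -> int:
    peak = mean_tc.argmax()
    below = np.nonzero(mean_tc[peak:] < mean_tc[peak] / 2)[0]
    return int(peak + below[0]) if below.size else len(mean_tc) - 1
```

On this kind of summary, a region that stays active through the response period (like the rostral frontal cluster) would show a later half-recovery image than one responding transiently to the category cue (like the retrosplenial cluster).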
Discussion

Returning to the Ellis and Young (1988) model described at the beginning of the paper, it appears that new evidence from brain imaging studies informs a number of the assumptions of this modular model. First, the semantic system does not appear to be unitary in the sense that it consists of a single module. Brain activity differs during semantic processing as a function of the modality, category, and attributes of the information being processed. In particular, processing of semantic attributes is widely distributed, with the topography dependent upon the nature of the particular attribute. For example, our studies indicate that
emotional attributes are processed in rostral frontal and retrosplenial cortex. The findings of Hauk et al. (2004) suggest not only that human action attributes of words are processed in motor and premotor cortex, but also that the part of the body involved in the action determines the dorsoventral location of the activity, in keeping with the motor homunculus. The findings of Chao et al. (1999) indicate that processing of visual attributes is associated with activity in the ventral visual stream, even when the input is words. This conclusion also is consistent with the lesion study of Hart and Gordon (1992). Second, each input and output sensory modality does not always have equal access to all parts of the semantic system. Rather, in some cases, the nature of the interaction of meaning with attention (sensory) and intention (response) systems may determine how some semantic information is processed. From our recent studies it specifically appears that processing EC requires unique activity both in attention systems that select specific sensory input for further processing and in intention systems that select specific actions for output. In particular, the neuroanatomic substrates for EC on the input side of the equation appear to differ from the substrates on the output side. This conclusion suggests that the semantic system, at least in the case of EC, is not neatly separated from input and output systems, as Ellis and Young's model suggests. Rather, attentional and intentional systems play a role in semantic processing of EC. It is not clear whether other semantic attributes share this characteristic with EC. In addition, our findings support a left hemisphere dominance for processing words with EC. This suggests that processing words with EC differs from processing pictures with EC, for which support for the right hemisphere and bivalent hypotheses has been found (e.g., Canli et al., 1998). EC is a fundamental attribute associated with a vast array of semantic information. It is linked to survival. Without knowledge of the emotional salience of incoming information, we are stripped of a major aid to safety, sustenance, and procreation. In the evolution of the species, it has been adaptive to respond quickly and to remain alert in threatening circumstances, to seek means of sustenance, and to
engage in procreation. As humans have developed language, it has become a tool to assist in these activities. Thus, the attentional and intentional systems may be relatively more invoked during processing of words with EC, leading to multiple clusters of brain activity in posterior (sensory) and anterior (action) regions. With respect to the topography for processing EC, activity in the retrosplenial region is consistent with the model implicating posterior cingulate and retrosplenial cortex in the interface between motivation/emotion and attention (Heilman et al., 2003). However, the frontal region involved in generating words with ECs was anterior to the region that might be predicted from this model. Nonetheless, Tranel (2002) suggested that this region of the frontal lobes is involved in using emotions to facilitate decision making, in other words, in selecting which actions to perform (i.e., intention). We suggest that because this area is involved in evaluating emotions during intentional processes, it is well suited to processing the EC of words. This synergy between its intentional role and its role in processing EC is similar to the synergy in the role of motor and premotor cortex in executing actions and in processing action attributes of words (Hauk et al., 2004). It should be noted that the topography for processing EC was similar to the topography in Maddock et al.'s (2003) recent study, although the rostral frontal activity in our studies was somewhat superior to that of Maddock et al. Further, this latter study was not able to distinguish between the roles of rostral frontal and retrosplenial cortex on the basis of its data, whereas the time courses in our most recent study (Cato et al., 2004) suggested an explanation. Also, not all studies of processing the EC of words have produced this topography. For example, a recent study by Kensinger and Corkin (2004) attempted to dissociate valence and arousal by comparing encoding of unpleasant low-arousal words, unpleasant high-arousal words, and neutral low-arousal words. Findings for neither valence nor arousal mapped well onto our most recent findings (Cato et al., 2004), most likely because of significant differences in task. Finally, it is unclear whether other types of information also require processing in both sensory and action planning areas during semantic processing, or
whether this feature is unique to EC. In addition, the series of studies reported here all used the same input modality and did not examine the effects of using visual rather than auditory input. This preliminary evidence does suggest that more distributed models of lexical–semantic processing are needed. Such models should preserve the anatomical precision that is afforded by fMRI investigations (maintaining some aspect of modularity), but should also address the interface between attentional, semantic, and intentional systems to reflect the distributed nature of information flow during semantic processing. It appears that semantic information may be used in different ways by attentional and intentional systems to ensure quick and appropriate responses to incoming, salient information. More work is needed on modeling information flow and on how the human brain filters information online for appropriate response selection and action.
Acknowledgments

This research was supported in part by grant no. DC 03455 from the National Institute on Deafness and Other Communication Disorders (PI: Crosson), a grant from the McKnight Institute at the University of Florida (PI: Crosson), and Research Career Scientist Award no. B3470S to Dr. Crosson from the Department of Veterans Affairs Rehabilitation Research and Development Service. Dr. Cato wishes to thank Nemours Children's Clinic for its support during the preparation of this manuscript.
References

Beauregard, M., Chertkow, H., Bub, D., Murtha, S., Dixon, S. and Evans, A. (1997) The neural substrate for concrete, abstract, and emotional word lexica: a positron emission tomography study. J. Cogn. Neurosci., 9: 441–461.
Blonder, L.X., Bowers, D. and Heilman, K.M. (1991) The role of the right hemisphere in emotional communications. Brain, 114: 1115–1127.
Bradley, M.M., Cuthbert, B.N. and Lang, P.J. (1988) Affective Norms for English Words (ANEW): Technical Manual and Affective Ratings. University of Florida, Gainesville, FL.
Canli, T., Desmond, J.E., Zhao, Z., Glover, G. and Gabrieli, J.D.E. (1998) Hemispheric asymmetry for emotional stimuli detected with fMRI. Neuroreport, 9: 3323–3339.
Cato, M.A., Crosson, B., Gokcay, D., Soltysik, D., Wierenga, C., Gopinath, K., Himes, N., Belanger, H., Bauer, R.M., Fischler, I.S., Gonzalez-Rothi, L. and Briggs, R.W. (2004) Processing words with emotional connotation: an fMRI study of time course and laterality in rostral frontal and retrosplenial cortices. J. Cogn. Neurosci., 16: 167–177.
Chao, L.L., Haxby, J.V. and Martin, A. (1999) Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nat. Neurosci., 2: 913–919.
Crosson, B., Benjamin, M. and Levy, I.F. (in press) Role of the basal ganglia in language and semantics: supporting cast. In: Hart, J., Jr. and Kraut, M. (Eds.), Neural Bases of Semantic Memory. Cambridge University Press, Cambridge, UK.
Crosson, B., Cato, M.A., Sadek, J.R., Gokcay, D., Bauer, R.M., Fischler, I.S., et al. (2002) Semantic monitoring of words with emotional connotation during fMRI: contribution of anterior left frontal cortex. J. Int. Neuropsychol. Soc., 8: 607–622.
Crosson, B., Radonovich, K., Sadek, J.R., Gokcay, D., Bauer, R.M., Fischler, I.S., et al. (1999) Left-hemisphere processing of emotional connotation during word generation. Neuroreport, 10: 2449–2455.
Damasio, H., Grabowski, T.J., Tranel, D., Hichwa, R.D. and Damasio, A.R. (1996) A neural basis for lexical retrieval. Nature, 380: 499–505.
Davidson, R.J. (1995) Cerebral asymmetry, emotion, and affective style. In: Davidson, R.J. and Hugdahl, K. (Eds.), Brain Asymmetry. MIT Press, Cambridge, MA.
Ellis, A.W. and Young, A.W. (1988) Human Cognitive Neuropsychology. Erlbaum, London.
Fuster, J.M. (2003) Cortex and Mind: Unifying Cognition. Oxford University Press, New York.
Hart, J. and Gordon, B. (1992) Neural subsystems for object knowledge. Nature, 359: 60–64.
Hauk, O., Johnsrude, I. and Pulvermüller, F. (2004) Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41: 301–307.
Heilman, K.M., Watson, R.T. and Valenstein, E. (2003) Neglect and related disorders. In: Heilman, K.M. and Valenstein, E. (Eds.), Clinical Neuropsychology (4th ed.). Oxford University Press, New York, pp. 296–346.
Ishai, A., Ungerleider, L.G., Martin, A., Schouten, J.L. and Haxby, J.V. (1999) Distributed representations of objects in the human ventral visual pathway. Proc. Natl. Acad. Sci. USA, 96: 9379–9384.
Kensinger, E.A. and Corkin, S. (2004) Two routes to emotional memory: distinct neural processes for valence and arousal. Proc. Natl. Acad. Sci. USA, 101: 3310–3315.
Lane, R.D., Reiman, E.M., Ahern, G.L., Schwartz, G.E. and Davidson, R.J. (1997a) Neuroanatomical correlates of happiness, sadness, and disgust. Am. J. Psychiatr., 154: 926–933.
Lane, R.D., Reiman, E.M., Bradley, M.M., Lang, P.J., Ahern, G.L., Davidson, R.J. and Schwartz, G.E. (1997b) Neuroanatomical correlates of pleasant and unpleasant emotion. Neuropsychologia, 35: 1437–1444.
Lang, P.J. (1995) The emotion probe: studies of motivation and attention. Am. Psychol., 50: 372–385.
Lang, P.J., Bradley, M.M., Fitzsimmons, J.R., Cuthbert, B.N., Scott, J.D., Moulder, B. and Nangia, V. (1998) Emotional arousal and activation of the visual cortex: an fMRI analysis. Psychophysiology, 35: 199–210.
Luria, A.R. (1973) The Working Brain: An Introduction to Neuropsychology. Basic Books, New York (translated by Basil Haigh).
Maddock, R.J. (1999) The retrosplenial cortex and emotion: new insights from functional neuroimaging of the human brain. Trends Neurosci., 22: 310–316.
Maddock, R.J. and Buonocore, M.H. (1997) Activation of left posterior cingulate gyrus by the auditory presentation of threat-related words: an fMRI study. Psychiatry Res., 75: 1–14.
Maddock, R.J., Garrett, A.S. and Buonocore, M.H. (2003) Posterior cingulate cortex activation by emotional words: fMRI evidence from a valence decision task. Hum. Brain Mapp., 18: 30–41.
Martin, A., Wiggs, C.L., Ungerleider, L.G. and Haxby, J.V. (1996) Neural correlates of category-specific knowledge. Nature, 379: 649–652.
Peper, M. and Irle, E. (1997) Categorical and dimensional decoding of emotional intonations in patients with focal brain lesions. Brain Lang., 58: 233–264.
Pulvermüller, F. (2005) Brain mechanisms linking language and action. Nat. Rev. Neurosci., 6: 576–582.
Tranel, D. (2002) Emotion, decision making, and the ventromedial prefrontal cortex. In: Stuss, D.T. and Knight, R.T. (Eds.), Principles of Frontal Lobe Function. Oxford University Press, New York, pp. 338–353.
Warrington, E.K. and McCarthy, R.A. (1987) Categories of knowledge: further fractionation and an attempted integration. Brain, 110: 1273–1296.
Warrington, E.K. and Shallice, T. (1984) Category-specific semantic impairments. Brain, 107: 829–853.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.)
Progress in Brain Research, Vol. 156
ISSN 0079-6123
Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 11
Macroscopic brain dynamics during verbal and pictorial processing of affective stimuli

Andreas Keil

Department of Psychology, University of Konstanz, PO Box D23, D-78457 Konstanz, Germany
Abstract: Emotions can be viewed as action dispositions, preparing an individual to act efficiently and successfully in situations of behavioral relevance. To initiate optimized behavior, it is essential to accurately process the perceptual elements indicative of emotional relevance. The present chapter discusses effects of affective content on neural and behavioral parameters of perception, across different information channels. Electrocortical data are presented from studies examining affective perception with pictures and words in different task contexts. As a main result, these data suggest that sensory facilitation has an important role in affective processing. Affective pictures appear to facilitate perception as a function of emotional arousal at multiple levels of visual analysis. If the discrimination between affectively arousing vs. nonarousing content relies on fine-grained differences, amplification of the cortical representation may occur as early as 60–90 ms after stimulus onset. Affectively arousing information as conveyed via visual verbal channels was not subject to such very early enhancement. However, electrocortical indices of lexical access and/or activation of semantic networks showed that affectively arousing content may enhance the formation of semantic representations during word encoding. It can be concluded that affective arousal is associated with activation of widespread networks, which act to optimize sensory processing. On the basis of prioritized sensory analysis for affectively relevant stimuli, subsequent steps such as working memory, motor preparation, and action may be adjusted to meet the adaptive requirements of the situation perceived.

Keywords: emotion; attention; oscillatory brain activity; rapid categorization

Corresponding author. Tel.: +49-7531-882696; Fax: +49-7531-882891; E-mail: [email protected]
DOI: 10.1016/S0079-6123(06)56011-X

Introduction

Studies capitalizing on the neural correlates of affective stimulus perception have been motivated by a multitude of research questions and traditions of science. To date, this has led to a broad range of experimental approaches in affective perception research. Some of the most obvious choices researchers must make in designing experiments on affective stimulus processing concern the nature of the stimulus material. Indeed, empirical findings point to a prominent impact of stimulus modality (e.g., visual, auditory), information channel (e.g., verbal, pictorial), and experimental task context (e.g., passive viewing, concurrent task processing). With the body of evidence continuously increasing, these variables have proven crucial for predicting the effects that emotional content exerts at the behavioral and neural levels. In particular, discrepancies between verbal and pictorial information channels in the visual modality have attracted attention, as results appear mixed and available theoretical accounts fail to explain this lack of consistency. The question arises: what is different when emotional responses are evoked by means of verbal vs. pictorial content?
In the current chapter, this question is discussed within a framework of dynamic perception and action regulation in the presence of behaviorally relevant information, which can be conveyed via pictorial and verbal objects. Emotions are viewed as implemented in neuronal networks encompassing aspects of stimulus representations, memories, and actions, among others. Behavioral and electrophysiological evidence is presented supporting the view that macroscopic (neural mass) oscillations are crucial for signal transmission and plastic changes in these networks. This tenet is illustrated with experimental data from studies of emotional perception and emotion–attention interactions. Potential distinctions between verbal and pictorial channels are viewed in the light of (1) differences in the spatiotemporal activation patterns of verbal vs. picture processing and (2) differences in terms of emotional intensity. The chapter concludes with remarks on potential elements of a model of plastic perception–action regulation as mediated by dynamic cortical networks.
Affective words and pictures: behavioral data

Experimental tasks involving the presence of affective stimuli have a long tradition in experimental psychology. A significant part of this literature tested predictions derived from motivational theories of human behavior, examining speed and accuracy as a function of stimulus intensity and hedonic valence (see e.g., Kamin, 1955). In particular, trait characteristics of different populations (e.g., patient groups with anxiety disorders) were at the core of this early literature on reaction time and accuracy in response to emotional stimuli (Castaneda, 1956). Despite this long tradition, data describing the effects of affectively arousing information on behavioral performance are not conclusive. There have been contradictory results, depending on the type of behavioral task as well as on the type of stimuli used (see Kitayama, 1990). Empirical findings are at variance with each other, suggesting either facilitation (Williamson et al., 1991) or inhibition (Watts et al., 1986) of behavioral responses in the presence of aversive or appetitive stimuli. Often, differences were
observed as a function of hedonic valence and response mode (e.g., Neumann and Strack, 2000). Most interestingly, many effects depend upon interindividual differences and are reliably seen in anxious participants, but not necessarily in healthy, low-anxious controls (Bradley et al., 1995). In addition, the channels of information used in this literature (words, faces, scenes) differ in physiological and self-rated affective intensity (Sloan et al., 2002), which contributes to the inconsistency found in the literature. Surprisingly high variability of findings in motor responses to affective stimuli has been observed even in tasks having low complexity. For instance, simple response tasks as well as choice responses to affective stimuli have failed to yield converging evidence (Megibow and De Jamaer, 1979; Wedding and Stalans, 1985). With respect to choice reactions, it has recently been argued that only pleasant target content leads to facilitated response time and accuracy, whereas aversive content is associated with delayed motor responding (Leppänen and Hietanen, 2003, 2004). In addition, pleasure facilitation of choice reactions has been observed with affective pictures depicting more complex scenes (Lehr et al., 1966). This is in contrast to visual search tasks, in which aversive stimuli are typically detected faster than pleasant or neutral targets (Hansen and Hansen, 1988; Öhman et al., 2001a, b). The most widely used choice reaction paradigm involving words is the lexical decision task, which requires subjects to make a choice response indicating whether a stimulus visible on the screen is or is not part of a language's lexicon, i.e., whether it is a word or not (Calvo and Castillo, 2005). Using this design, most researchers have observed advantages in speed and accuracy for affectively arousing words, compared to low-arousing control words (e.g., Altarriba and Bauer, 2004). Thus, facilitation has been observed for pleasant and unpleasant, compared to neutral, targets (Williamson et al., 1991). As a possible explanation for this divergent pattern of findings, one may put forward the differences in physiological arousal related to viewing words, faces, and scenes (cf., Keil et al., 2005a). Pictures are capable of inducing a high amount of physiological arousal in the viewer
(Lang, 1994) and thus can be expected to differentially affect behavioral performance when hedonic valence (i.e., pleasant vs. unpleasant content) is manipulated. Compared to affective pictures, word and face stimuli are associated with arousal to a lesser degree, both physiologically and in terms of subjective ratings (Lang et al., 1998). Besides differences in arousal, the higher frequency of pleasant faces as well as the role of context generated by the stimuli have been put forward as possible explanations for this divergent pattern of findings (Leppänen and Hietanen, 2004). Several studies observing differences between pleasant and unpleasant words (reviewed in Kissler et al., this volume) make the situation more complicated. Results from priming paradigms suggest, however, that congruent affective content facilitates subsequent choice reactions, even when prime and target stimuli are faces vs. words and vice versa (Carroll and Young, 2005). Thus, while differences in the stimulus channel may lead to different experimental outcomes, the activation of semantic information related to a specific emotional disposition may be compatible across information channels. Summarizing these results, it turns out that in choice reaction tasks with affectively arousing pictures, there is a tendency to find facilitation for pleasant, compared to neutral and unpleasant, content, whereas lexical decision is facilitated as a function of the emotional arousal associated with target words. The situation changes considerably when more complex tasks are considered. In particular, tasks have been widely used that require a response to a stimulus that competes for resources with a concurrently visible affective stimulus. As a main finding, interference effects of to-be-ignored affective pictures have repeatedly been reported (Hartikainen et al., 2000). Hence, behavioral performance in a concurrent task declines as the amount of affective arousal present, e.g., in the background of the task array, is increased (Schimmack and Derryberry, 2005). Combining interference from affective picture content with a spatial attention task, Keil et al. (2005a) found response time interference effects of unpleasant relative to neutral picture content present in a background stream. This difference was specific to
the left hemifield and was accompanied by right-hemispheric electrocortical facilitation. The emotional Stroop is one of the most widely used paradigms to show such interference of affective content with a foreground (e.g., color naming) task (Mogg et al., 1989). Generally, these studies have produced evidence for strong interference as a function of affective content, with responses to the task-relevant dimension being delayed by task-irrelevant affective content (Watts et al., 1986). Interestingly, these effects have also been observed with nonwords that gained affective significance by means of classical conditioning (Richards and Blanchette, 2004). This latter finding might be taken as evidence that interference tasks yield a more consistent pattern, showing similar effects across stimulus kinds. Indeed, Stroop designs using picture representations of objects or faces with a strong affective connotation have converged with visual word Stroop designs (Anes and Kruer, 2004), suggesting that affective interference with concurrent task behavior can be predicted more easily. The behavioral literature as briefly reviewed above suggests that both affective faces and pictures reliably create interference across paradigms. Affective words might be capable of automatically withdrawing resources from a concurrent task specifically in paradigms in which attention is readily available to process verbal information. In contrast, interference effects of emotional words may be diminished in tasks that require processing a high number of stimuli in a short amount of time (Harris and Pashler, 2004; Harris et al., 2004), with affective stimuli being outside the spotlight of attention (Calvo and Castillo, 2005). Thus, activation of concepts by lexico-semantic routes may differ from activation on the basis of object recognition. The 'attentional blink' (AB) paradigm (Raymond et al., 1992) represents a potential avenue to examine these aspects. The term AB refers to an impairment in reporting the second target (T2) of two targets presented in rapid succession, if the temporal distance between T1 and T2 is in a range between 200 and 500 ms.
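For concreteness, the sketch below shows one way an RSVP stream for an AB trial might be scheduled. The presentation rate and lag values follow the ranges just described; the word lists, target words, and all other parameters are hypothetical:

```python
# Hedged sketch of scheduling one rapid serial visual presentation (RSVP)
# trial for an attentional blink experiment. Rates and lags follow the
# text above; everything else is illustrative.
import random

RATE_HZ = 10                      # 6-10 items per second (here: 10)
SOA_MS = 1000 / RATE_HZ           # stimulus onset asynchrony

def make_trial(words, t1, t2, lag_items):
    """Build one 15-item RSVP stream with T1 and T2 separated by
    lag_items positions (lag * SOA gives the T1-T2 interval in ms).
    Assumes t1_pos + lag_items stays inside the stream."""
    stream = random.sample(words, 15)
    t1_pos = random.randint(3, 6)
    stream[t1_pos] = t1
    stream[t1_pos + lag_items] = t2
    return stream, lag_items * SOA_MS

stream, interval = make_trial(
    words=[f"filler{i}" for i in range(40)],
    t1="HOTEL", t2="SPIDER", lag_items=3)
# A 3-item lag at 10 Hz puts T2 300 ms after T1, inside the 200-500 ms
# window in which report of T2 is typically impaired.
print(interval, 200 <= interval <= 500)
```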
Thus, if words are shown in a rapid series (i.e., 6–10 per second), the second of two target words identified by a certain feature (e.g., color) is less likely to be reported when appearing in the AB time window. Using visual word stimuli, it has been shown that affectively arousing T2s are less affected by this effect, being associated with more accurate report than affectively neutral T2s (Anderson and Phelps, 2001; Keil and Ihssen, 2004; Anderson, 2005). Interestingly, enhancing the affective arousal of T1 stimuli leads to greater interference with subsequent target processing, as indicated by poor T2 identification accuracy (Ihssen and Keil, submitted). Most recently, interference of arousing first targets with T2 detection has been shown with affective picture T1s (Most et al., 2005), which is consistent with results from the other interference tasks reported above. The AB paradigm allows precise manipulation of time intervals and is well suited to the recording of electrocortical oscillations. This paradigm is, therefore, considered in more detail later in this chapter, when neurobehavioral correlates of affective word and picture processing are discussed. Taken together, these findings raise two questions regarding the effects of affective content on perception and performance. First, what is the role of affective arousal as communicated via different information channels (scenes, faces, words) for perceptual and behavioral facilitation vs. interference? Second, what is the differential role of affective arousal when affective stimuli serve as targets vs. distractors? The remaining paragraphs of the present chapter address these questions, based on a multivariate database that includes behavioral, peripheral physiological, and brain measures.
Neurobehavioral correlates of affective picture processing

The availability of standardized sets of pictures showing affective faces (Lundqvist et al., 1998) or scenes (Lang et al., 1999) has stimulated a plethora of studies addressing the multifaceted aspects of emotional processing. The International Affective Picture System (IAPS; Lang et al., 1999), for instance, provides normative ratings of emotional valence, arousal, and dominance for more than 1000 colored pictures, and is referred to extensively within this volume. In the present section, a
survey of the neurobehavioral correlates of affective picture viewing is given, focusing on scene perception under varying task instructions. Particular attention is paid to oscillatory measures of large-scale brain processes.
Neural facilitation and amplification by affective pictures: the role of emotional arousal

As indicated earlier in the chapter, one way to conceptualize emotions is to view them as action dispositions, facilitating evolutionarily adaptive behaviors (Lang, 1979; Frijda, 1988). On the theoretical level, this concept has been described in terms of affective networks, linking different aspects of affective perception and behavior (e.g., Lang, 1979). Importantly, network models allow for different routes capable of initiating an affective episode, based, for instance, on imagery, visual objects, or physiological processes (see Fig. 1). As a basis of adaptive behavior, the perception of emotionally arousing stimuli has been associated with acting efficiently and successfully in situations indexed as motivationally significant by properties of the visual scene. The prioritized selection of features representing affective arousal may occur in an automatic manner and has been referred to as 'motivated attention' (Lang et al., 1997b). Consistent with the network perspective taken here, this behavioral advantage may arise as a consequence of optimized processing throughout the parts of the emotional network, beginning with prioritized perception of affectively relevant sensory features. Given the massively parallel and bidirectional nature of the human visual system, the sensory part of the network may in turn be subject to top-down regulation exerted by higher parts of the emotion network, coding working memory and action representations (Keil, 2004). As a consequence, the sensory representation of an emotional stimulus, together with its neurophysiological underpinnings, may represent a useful target for research into the spatiotemporal dynamics of affective processing. To study these phenomena, it is necessary to monitor brain dynamics at a high temporal rate.
Fig. 1. Schematic representation of an emotion network, encompassing different aspects of the affective response. Activation of the entire network can be accomplished via different routes such as the stimulus representation, the imagination of emotional episodes in memory, and physiological processes. These elements are organized in a parallel fashion and may undergo changes over time, adapting the emotional response to environmental or internal demands. Such changes may be conceptualized as changes of weights in the functional architecture of the network.
Electrocortical measures provide one avenue to study fast changes in the neuronal mass activity of cortical cells and are reviewed in the subsequent paragraphs.
Evidence from electrocortical measures: time-domain analyses

As a robust empirical finding, passive viewing of affectively arousing, compared to nonarousing, pictures has been associated with amplification (enhancement) of electrocortical parameters (Schupp et al., 2000). Recently, neuroimaging and electrocortical data have converged to show that the extended visual cortex is a key structure demonstrating facilitation effects as a function of affective arousal (Keil et al., 2002; Bradley et al., 2003). Studies investigating modulations of the event-related potential (ERP), derived from stimulus-locked time-domain averaging of electroencephalogram (EEG) epochs recorded during picture viewing, have repeatedly observed such enhancement related to visual cortex activity (Cacioppo and Gardner, 1999; Keil et al., 2002).
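For readers unfamiliar with the technique, the following minimal sketch illustrates stimulus-locked time-domain averaging; the sampling rate, epoch window, and baseline convention are illustrative assumptions, not the settings of the studies cited:

```python
# Minimal sketch of stimulus-locked time-domain averaging, the procedure
# by which ERPs are derived from the EEG. Parameters are assumed.
import numpy as np

FS = 500                              # sampling rate (Hz), assumed
PRE, POST = 0.1, 0.9                  # epoch: 100 ms before to 900 ms after

def erp(eeg: np.ndarray, onsets_s: list[float]) -> np.ndarray:
    """eeg: (n_channels, n_samples). Cut an epoch around every stimulus
    onset, subtract the pre-stimulus baseline, and average over trials.
    Assumes all onsets lie well inside the recording."""
    pre, post = int(PRE * FS), int(POST * FS)
    epochs = []
    for onset in onsets_s:
        i = int(onset * FS)
        ep = eeg[:, i - pre:i + post]
        ep = ep - ep[:, :pre].mean(axis=1, keepdims=True)  # baseline
        epochs.append(ep)
    return np.mean(epochs, axis=0)     # (n_channels, pre+post) average

# Condition effects (e.g., arousing vs. neutral pictures) are then read
# from component amplitudes in the averaged waveforms, e.g.:
# erp_arousing = erp(eeg, onsets_arousing)
# erp_neutral = erp(eeg, onsets_neutral)
```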
ERP differences were particularly robust in late time segments (i.e., on ERP components such as the P3 or the slow wave), following picture onset by approximately 300 ms (Mini et al., 1996). This modulation has been shown to correlate closely with other physiological measures of affective processing, such as skin conductance or heart rate changes, as well as with self-report questionnaire data (Cuthbert et al., 2000). Interestingly, this pattern of results has been stable across visual stimuli depicting affective scenes (Palomba et al., 1997), faces (Schupp et al., 2004), conditioned stimuli (Pizzagalli et al., 2003), and specific content such as dermatological disease (Kayser et al., 1997). Late ERP differences have been associated with activity in the extended visual cortex (Keil et al., 2002) and in consequence have been theoretically related to the concept of motivated attention, in which motivationally relevant stimuli automatically recruit attentional resources that are directed to the relevant locations and features present in the field of view (Lang et al., 1997b). Given this view, it is interesting to consider electrophysiological research on instructed as opposed to 'natural' attention to task-relevant stimulus properties. In this field, authors have demonstrated early in addition to late modulations of ERP data (Hillyard and Anllo-Vento, 1998), suggesting an early locus of several types of selective attention. For instance, attending to a particular location in space may amplify the electrocortical response as early as at the stage of the P1 component of the ERP, typically starting around 80 ms after onset of a stimulus (Martinez
et al., 1999). Instructing a subject to attend to a particular set of features, such as color or shape, is often associated with relative enhancement of the so-called N1 ERP component at a latency of 140–200 ms (Luck and Hillyard, 1995). In line with these findings from selective attention research, affectively arousing stimuli appear to attract attentional resources at early stages of processing, leading to relative amplification that occurs earlier than the late potentials reported above (Junghöfer et al., 2001; Schupp et al., 2003b). This facilitation may be manifest at the level of the P1/N1 component of the ERP, also modulating oscillatory responses in visual cortex around 80–100 ms post-stimulus (Keil, 2004). For instance, using a hemifield design with colored affective pictures, Keil et al. (2001b) reported enhancement of the N1 amplitude (160 ms post-stimulus) for arousing, compared to neutral, stimuli. Other authors have demonstrated similar effects of emotional content on very early ERP components (Pizzagalli et al., 2003; Smith et al., 2003), raising the question as to the mechanisms mediating such fast differential responding. In analogy to findings in the field of selective attention (Hillyard and Anllo-Vento, 1998), a 'sensory gain' mechanism has been hypothesized to amplify sensory processing according to the importance of the stimulus for the organism. For the case of spatial selective attention, these functions are mediated by the joint action of a widespread network of cortical and deep structures, including areas in occipital, parietal, and frontal cortex as well as the striatum and superior colliculi (Mesulam, 1998). Clearly, the ERP work reported so far is consistent with the involvement of such a widespread cortical network in early discrimination of features related to affective content. Confirming evidence comes from analyses of the topographical distribution of dense-array ERP data, pointing to generators in higher order visual as well as right parietal cortices (Junghöfer et al., 2001; Keil et al., 2002). These effects have often been interpreted as manifestations of reentrant tuning of visual cortex, which may be mediated by subcortical structures such as the amygdala, or by temporal, parietal, and prefrontal cortical structures, among others (Keil, 2004; Sabatinelli et al., 2005). Neuroimaging
data as reviewed in the present volume (see, e.g., the chapter by Sabatinelli et al., this volume) have been supportive of this notion and have provided more precise information on the anatomical structures that may mediate such reentrant modulation of visual cortex. The topic of early attention to affectively arousing stimuli raises questions related to the involvement of primary visual (i.e., striate) cortex. For instance, Pourtois et al. (2004) employed a visual hemifield paradigm with covert orienting to emotional faces. For fearful compared to happy faces, these authors reported enhancement of the first detectable visual ERP deflection at 90 ms after stimulus onset, likely originating in the striate cortex. They concluded that the emotional relevance of the stimuli might be associated with increased activation in the primary visual cortex, possibly due to an interaction with subcortical structures. Following up on these findings, Stolarova et al. (2006) presented grating patterns to the hemifields, paired with unpleasant or neutral pictures in a delayed classical conditioning paradigm. This procedure made it possible to control for the physical features of the conditioned stimuli (i.e., grating patterns) and to explore the development of perceptual facilitation across blocks of trials, as a function of learning. As a main finding, this study showed greater amplitude of the visual ERP for threat-related vs. neutral stimuli as early as 65–90 ms following stimulus onset (i.e., the C1 component of the ERP; see Martinez et al., 1999). Interestingly, this difference appeared to increase with continuing acquisition of affective meaning, being more pronounced in the second conditioning block, compared to the first block of trials. This suggests a role of short-term cortical plasticity in early affective perception, tuning sensory cortex to amplify specific visual features that are related to motivationally relevant information in a specific context.
Evidence from electrocortical measures: frequency-domain analyses

While time-domain averaging as typically employed in ERP research provides important insights into the timing of neural mass action on a millisecond scale, there is also complementary information in electrocortical recordings that can be examined using frequency-domain approaches.
These methods take advantage of the fact that EEG readings are oscillatory in nature, containing a wealth of characteristic rhythmic processes in any time window considered. Frequency-domain analysis techniques allow the assessment of two measures of oscillations at a given frequency: (i) the spectral power or amplitude, which is a measure of the magnitude of the oscillatory response, and (ii) the phase, which can be considered to reflect the latency of an oscillation with respect to a reference function. Using amplitude and phase measures, one can obtain estimates of the amount of energy present at a known frequency during a given recording epoch. Moreover, it is possible to examine temporal dynamics across sensors or model dipoles, helping to elucidate the organization of functional networks that connect different areas of the cerebral cortex (Gruber et al., 2002). In particular, measures reflecting the stability of phase differences between recording sites or brain areas across time may indicate epochs of large-scale synchronization occurring during specific cognitive or behavioral processes (Gevins et al., 1999). In the field of affective perception, several experimental strategies have been used to examine electrocortical oscillations in the frequency domain. Many studies have compared the properties of the frequency spectrum of EEG data during viewing of stimuli differing in affective content (for a review, see Keil et al., 2001a). Early work has shown evidence of a general reduction of the alpha and enhancement of the beta frequency range as a function of emotional arousal (Tucker, 1984; Ray and Cole, 1985; Collet and Duclaux, 1987). With the advent of dense electrode arrays and novel time-frequency approaches to data reduction, analyses of the EEG power spectrum could be refined to yield better resolution in both the spatial and the time domains (Keil et al., 2001a). For instance, wavelet analyses of EEG epochs provide an estimate of the evolutionary spectrum, which reflects changes in different frequency bands across time (Bertrand et al., 1994). Thus, epochs of rapid variability in the oscillatory state of the nervous system can be identified and characterized.
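A hedged sketch of such a wavelet analysis is given below: convolution with a complex Morlet wavelet yields amplitude and phase at one frequency, and the across-trial stability of phase differences between two sites gives a phase-locking value in the spirit of Lachaux et al. (1999). The sampling rate and wavelet parameters are assumptions for illustration:

```python
# Sketch of wavelet-based time-frequency analysis plus a phase-locking
# value (PLV) between two recording sites. Parameters are assumed.
import numpy as np

FS = 500  # sampling rate (Hz), assumed

def morlet(freq: float, n_cycles: float = 7.0) -> np.ndarray:
    """Complex Morlet wavelet at `freq` Hz (unnormalized)."""
    sigma_t = n_cycles / (2 * np.pi * freq)
    t = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / FS)
    return np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))

def analytic(signal: np.ndarray, freq: float) -> np.ndarray:
    """Complex time course at one frequency; np.abs() gives amplitude,
    np.angle() gives phase at each time point."""
    return np.convolve(signal, morlet(freq), mode="same")

def plv(trials_a: np.ndarray, trials_b: np.ndarray, freq: float) -> np.ndarray:
    """Phase-locking value between two channels, per time point.
    trials_*: (n_trials, n_samples). 1 means a perfectly stable phase lag
    across trials; values near 0 mean random phase relations."""
    diffs = [np.angle(analytic(a, freq)) - np.angle(analytic(b, freq))
             for a, b in zip(trials_a, trials_b)]
    return np.abs(np.mean(np.exp(1j * np.array(diffs)), axis=0))
```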
Moreover, instances of synchronization across recording sites or model dipoles may be evaluated using wavelet algorithms (Lachaux et al., 1999; Gruber et al., 2002). In particular, the gamma band range of the temporal spectrum has attracted attention (DePascalis et al., 1987). The terms 'gamma-band oscillations' or 'gamma-band activity' (GBA) refer to those oscillations in electrophysiological recordings that lie in the higher frequency range of the temporal spectrum, typically above 20 Hz. Theoretically, these oscillations have been associated with activation of an object representation (Tallon-Baudry and Bertrand, 1999), possibly integrating different features of an object into one percept (Keil et al., 1999). Elevated GBA has been reported during viewing of affective stimuli (Aftanas et al., 2004). Specifically, gamma responses were markedly enhanced in the right hemisphere as a function of affective arousal (Müller et al., 1999). Furthermore, viewing affective pictures in the visual hemifields indicated that right-hemispheric cortical areas are more sensitive to affectively arousing content than left-hemispheric sites in late time ranges between 400 and 600 ms after stimulus onset (Keil et al., 2001b). Early oscillatory responses pointed to involvement of basic visual analysis in affective modulation, showing greater amplitudes for aversive pictures. This difference was increased in the second block of trials, compared to the first (Keil et al., 2001a, b), thus paralleling the time course of early ERP modulation reported earlier (Stolarova et al., 2006). In addition to time-frequency analyses of the intrinsic oscillatory activity considered so far, macroscopic electrocortical oscillations can also be driven by extrinsic stimulation. Work employing steady-state visual evoked potentials (ssVEPs) capitalizes on the fact that the human brain responds to rapid visual stimulation at a fixed rate (i.e., flickering of a stimulus at five cycles per second or faster) with a near-sinusoidal response at the same frequency. This brain response can be extracted at the known frequency and is referred to as the ssVEP.
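Extraction at the known driving frequency can be illustrated with a short sketch; the sampling rate is an assumption, and the 10 Hz value simply mirrors the picture flicker rate of the Keil et al. (2003) study discussed below:

```python
# Sketch of ssVEP extraction: the Fourier component of the averaged
# response at exactly the stimulation frequency gives the ssVEP
# amplitude and phase. Parameter values are illustrative.
import numpy as np

FS = 500          # sampling rate (Hz), assumed
F_DRIVE = 10.0    # stimulation (flicker) frequency (Hz)

def ssvep(avg_response: np.ndarray):
    """avg_response: 1-D time-domain average over flicker epochs.
    Returns (amplitude, phase in radians) at the driving frequency."""
    n = avg_response.size
    freqs = np.fft.rfftfreq(n, d=1 / FS)
    spectrum = np.fft.rfft(avg_response) / n
    k = np.argmin(np.abs(freqs - F_DRIVE))   # bin closest to 10 Hz
    return 2 * np.abs(spectrum[k]), np.angle(spectrum[k])

# Enhanced amplitude (and shifted phase) for emotionally arousing relative
# to calm pictures would appear as differences in these two numbers.
```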
might depend on the stimulation frequency, however (Müller et al., 1997). Importantly, the ssVEP reflects multiple excitations of the visual system by the same stimulus in a short amount of time. As a consequence, it is capable of reflecting changes in neural mass activity related to both initial sensory processing and reentrant, top-down modulation on the basis of stimulus features and higher order processing (Silberstein et al., 1995). It has been demonstrated that the amplitude of ssVEPs is sensitive to several types of visual selective attention, showing an enhancement with increased attentional load (Morgan et al., 1996; Müller et al., 1998, 2003). In addition, amplitude and phase alterations of ssVEPs have been associated with affective stimulus characteristics. It has repeatedly been observed that ssVEP amplitude is enhanced when viewing flickering picture stimuli rated as emotionally arousing. Presenting pleasant, neutral, and unpleasant pictures from the IAPS at a rate of 10 Hz, Keil et al. (2003) found higher ssVEP amplitude and accelerated phase for arousing, compared to calm, pictures (see Fig. 2). These differences were most pronounced at central posterior as well as right parieto-temporal recording sites. In another study, the steady-state visual evoked magnetic field (ssVEF), the magnetocortical counterpart of the ssVEP, also varied as a function of emotional arousal (Moratti et al., 2004). Source estimation procedures pointed to an involvement of parieto-frontal attention networks in arousing and directing attentional resources toward the relevant stimuli. In line with this interpretation, Kemp and collaborators found amplitude
reduction for ssVEPs elicited by flickering full-field stimulation when concurrently presented picture stimuli were emotionally engaging as opposed to having neutral content (Kemp et al., 2002, 2004). This opposite pattern is consistent with the results reported above, as the full-field flicker represents a concurrent stimulus that competes for resources with the affective pictures. Thus, engaging stimuli can be expected to draw attentional resources away from processing of the flicker stimulus, which decreases the flicker-evoked ssVEP. Given their excellent signal-to-noise ratio, ssVEPs are a useful tool for examining changes in neural responses during the acquisition of affective significance across small numbers of trials. This property is relevant for studies of classical fear conditioning, in which initially neutral cues acquire signal function for unpleasant contingencies. In a series of studies, Moratti and colleagues have shown enhancement of ssVEF amplitude to stimuli predicting aversive events after a small number of learning trials. Relating these findings to concurrently measured heart rate changes, they suggested that enhancement of the visual response depends on the development of a fear response to the aversive conditioned stimulus (Moratti and Keil, 2005). Hence, with increasing heart rate acceleration in response to the aversive stimulus, the cortical response amplitude was also enhanced. This pattern of results was replicated in a second study in which all subjects were fully aware of the contingencies, reducing variability due to attention to the fear signal (Moratti et al., 2006).
Fig. 2. Steady-state VEP amplitude differences for affective pictures differing in content. Amplitude was enhanced for both pleasant and unpleasant picture content, compared with neutral content. Topographies represent a grand mean across 19 participants.
In line with observations made in affective picture viewing, processing of conditioned fear stimuli activated frontal, parietal, and occipital cortical areas. Given the oscillatory nature of ssVEPs, this suggests coactivation in a common network, with structures interacting in a massively parallel manner. Ongoing studies have analyzed ssVEPs at the level of single trials, estimating the relevant parameters for epochs reflecting 6 s of single-picture viewing. Results showed surprisingly high reliabilities for ssVEP amplitude and stable topographical distributions of phase values. Interestingly, phase synchrony between brain areas, as estimated by means of a distributed source model, showed alignment of phase values between fronto-temporal and occipito-parietal regions (Keil et al., 2005b). This synchronization of phase values at the ssVEP frequency was enhanced as a function of affective arousal. Furthermore, we found a linear relationship between picture arousal as measured by skin conductance and ssVEP amplitude, which was most pronounced in frontal cortical regions (Smith et al., 2005). To summarize, there is now a large body of evidence in psychophysiology strongly suggesting that emotional picture content guides attentional resources in a rapid, and possibly automatic, fashion. Several authors have addressed this topic by manipulating the affective and attentional load directed to experimental stimuli (e.g., Schupp et al., 2003a). The chapter by Harald Schupp (this volume) provides a comprehensive review of this work. Here, the specific effects of attention–emotion interactions on the ssVEP are briefly considered, as these approaches may help to elucidate the role of bottom-up and top-down processes, which act to organize the integrated functioning of affective networks. In a study crossing the factors of selective spatial attention and affective content, Keil et al. (2005a) investigated ssVEP amplitude and phase as a function of attention–emotion interactions. Participants silently counted random-dot targets embedded in a 10-Hz flicker of colored pictures presented to both hemifields. An increase of ssVEP amplitude was observed as an additive function of spatial attention and emotional content. Mapping the statistical parameters of this difference on the surface of the scalp, the authors found occipito-temporal and parietal activation
contralateral to the attended visual hemifield. Differences were most pronounced during selection of the left visual hemifield, at right temporal electrodes. Consistent with this finding, phase information revealed accelerated processing of aversive compared to affectively neutral pictures. These results suggest that affective stimulus properties modulate the spatio-temporal process of information propagation along the ventral stream, with associated amplitude amplification and timing changes in posterior and fronto-temporal cortex. Importantly, even nonattended affective pictures showed electrocortical facilitation throughout these sites. In line with network perspectives of emotion, this pattern of results supports the notion that emotional content affects both initial sensory processing and stages of more complex processing, depending on visual objects' affective and motivational significance (Lang, 1979; Bower et al., 1994). Analyses of intersite phase synchrony, currently under way, may shed further light on the spatio-temporal relationships between sensory and higher order processing (Keil et al., 2005b).
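As an illustration of such intersite synchrony measures, the sketch below computes a phase-locking statistic between two channels in the spirit of Lachaux et al. (1999). Note that the canonical phase-locking value is computed across trials at each time point; this simplified variant averages over time within a single epoch, and the band limits and filter settings are illustrative assumptions, not those of the studies cited:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def plv(x, y, sfreq, band):
    """Phase-locking value between two channels within a frequency band.

    Values near 1 indicate stable phase differences (large-scale
    synchronization); values near 0 indicate no consistent phase relation.
    """
    nyq = sfreq / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    phase_x = np.angle(hilbert(filtfilt(b, a, x)))   # instantaneous phase, channel 1
    phase_y = np.angle(hilbert(filtfilt(b, a, y)))   # instantaneous phase, channel 2
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Hypothetical usage for two sensors around the ssVEP frequency:
# sync = plv(eeg_frontal, eeg_parietal, sfreq=500.0, band=(9.0, 11.0))
```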
Neurobehavioral correlates of emotional language processing

As outlined in the introduction of the chapter, behavioral studies have pointed to a facilitatory influence of affective content in word identification tasks such as the lexical decision paradigm (Williamson et al., 1991). Thus, paralleling research with affective pictures, an important question in the literature relates to the temporal dynamics of the potential facilitation for affectively arousing verbal information. In contrast to pictures, semantic information conveyed via verbal channels requires distinct processing steps such as sublexical encoding of the language stimulus and subsequent lexical access (Ramus, 2001). Hence, there is no single set of features that can be associated with a specific affective content; rather, affective content may result from — and potentially co-occur with and shape — lexical processing (Kitayama, 1990). In the present framework, this means that activating an emotion network via lexical information is
different from affective object recognition in several respects. (i) Depending on the familiarity of the verbal material, the symbolic operations resulting in semantic access are expected to take longer than the sensory encoding of known objects in the visual scene. For instance, words of high vs. low frequency in a given language are distinguished as early as 120–160 ms post-stimulus (Assadollahi and Pulvermüller, 2001). This suggests that the representation corresponding to verbal material is accessed relatively early, but still later than semantics related to visual objects. (ii) The locus of facilitation by affective picture content should be early in the cascade of visual processes and might extend to initial visual analysis, affecting later stages as well. Alternatively, one might predict that the coding of lexical information is facilitated later, i.e., at the level of lexical analysis, which may in turn act back on visual analysis but can be expected to show different spatio-temporal dynamics. The following paragraphs examine these issues in the context of the AB task with affective words, which is well suited to examining facilitation and interference at different intensities. On the electrophysiological level, the AB is compatible with the recording of ssVEPs, a further advantage for frequency-domain analyses, including measures of synchrony between different cortical regions.
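Because the flicker rate in such designs is known exactly, ssVEP amplitude and phase can be read out from a single spectral bin. A minimal sketch follows (Python/NumPy; the scaling assumes the epoch contains an integer number of stimulation cycles, and all names and values are illustrative):

```python
import numpy as np

def ssvep_amplitude_phase(epoch, sfreq, drive_freq):
    """Amplitude and phase of the ssVEP at the stimulation frequency.

    `epoch` is a 1-D EEG segment recorded during flicker at `drive_freq` Hz.
    """
    n = epoch.size
    spectrum = np.fft.rfft(epoch)
    freqs = np.fft.rfftfreq(n, d=1.0 / sfreq)
    k = np.argmin(np.abs(freqs - drive_freq))   # bin closest to the drive frequency
    amplitude = 2.0 * np.abs(spectrum[k]) / n   # recovers the sinusoid's amplitude
    return amplitude, np.angle(spectrum[k])

# Hypothetical usage for a 10-Hz picture flicker sampled at 500 Hz:
# amp, ph = ssvep_amplitude_phase(epoch, sfreq=500.0, drive_freq=10.0)
```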
Affective facilitation during the AB

As introduced above, the spread of activation and the consolidation of a stimulus representation in working memory can be investigated using the so-called AB design, which requires the selection of two targets in a rapid stream of stimuli. Usually, a first target (T1) is followed by a second target (T2), with processing of T1 being critical for the occurrence of the AB (Chun and Potter, 1995; Potter et al., 1998). It has been repeatedly shown that at high rates of visual presentation (e.g., 6 Hz and higher), T2s presented in an interval between 180 and 500 ms after a given T1 are reported less accurately. However, there is evidence that T2 items are perceived, and semantic information may be processed to a certain degree
(Luck et al., 1996). Specifically, detected T2s are associated with normal electrophysiological markers of conscious identification (Kranczioch et al., 2003), and early ERP components in response to missed T2 stimuli do not differ from those to T2s outside the blink period. In contrast, the P3 component was reduced for missed T2s (Vogel et al., 1998). This pattern of results may be taken as evidence for a postperceptual, working-memory-related interference process, which impairs performance for T2s within the AB period. Interestingly, T2 stimuli carrying "salient" information, such as a participant's own name, appear to be reported more accurately than less-salient stimuli (Shapiro et al., 1997). In a series of studies with affective visual verbs, Ihssen and Keil (2004) set out to investigate postperceptual facilitation for arousing compared to neutral visual verbs during the AB. Verbal stimuli were selected on the basis of subjective ratings of their affective valence and arousal. These ratings for German verbs showed a typical distribution in a two-dimensional affective space spanned by the dimensions of valence and arousal, as has previously been reported for pictorial (Lang et al., 1997a) and auditory stimuli (Bradley and Lang, 2000). As expected, Ihssen and Keil found a strong reduction of report accuracy at short T1–T2 intervals (232 ms) throughout their experiments. While intermediate intervals (i.e., 464 ms) showed less impairment of verb identification than the 232-ms SOA condition, a long T1–T2 distance of 696 ms was associated with almost unimpaired accuracy of report. In terms of affective modulation, results suggested facilitated identification as a function of affective arousal, especially in the short-interval condition, in which performance was most impaired. Thus, motivationally or affectively relevant material was selected preferentially from a temporal stream of verbal information. This difference disappeared when affective valence was manipulated while arousal was held constant at a low level (Keil and Ihssen, 2004). When T1 and T2 affective content was systematically manipulated in AB tasks with and without explicit target definition, differential effects of T1 and T2 emotional intensity emerged (Ihssen and Keil, submitted).
Fig. 3. Response accuracy for the identification of second targets (T2) in an attentional blink task with affective words. Participants (N = 13) were asked to identify two green target words in a rapid stream of words shown at a rate of 8.7 words per second. With a T1–T2 stimulus onset asynchrony of 232 ms, performance was substantially impaired, compared to longer T1–T2 intervals. This impairment was less pronounced for affectively arousing T2 words.
First, emotional arousal of an explicit target (both T1 and T2) defined by a relevant dimension (i.e., color) reliably facilitated conscious identification of that item (see Fig. 3). Second, arousing content of T1 words interfered with identification of the subsequent target when the rate of T2 identification was well below 100%, i.e., when the T2 task was difficult enough. Affective interference effects did not depend on an explicit target definition; for instance, interference effects were established for emotional nontarget words preceding targets in a semantic categorization task (Ihssen and Keil, submitted).

Steady-state visual evoked potentials during the AB

Given their sensitivity to rapidly changing oscillations directly related to experimental manipulations, steady-state potentials have been used to address different aspects of language processing. Lexical access is assumed to be effected by the integrated dynamics of various brain areas, which may be examined by means of time-frequency
and synchrony measures (Pulvermüller, 1996). In a study using steady-state auditory fields generated by an amplitude-modulated word stream at 40 Hz, Härle et al. (2004) showed that during word comprehension there is a specific increase in large-scale synchrony encompassing frontal and parietal, in addition to perisylvian, cortices. The EEG can be recorded during AB tasks, and the data can be evaluated using time-frequency domain techniques. When the affective content of T2 was manipulated, enhancement of the ssVEP amplitude was observed for affectively arousing T2 verbs, beginning at around 150 ms following T2 onset (Ihssen et al., 2004). This difference was most pronounced at parietal sites, where the ssVEP amplitude was largest. In addition, it was restricted to the early lag condition, paralleling the behavioral facilitation effects, which were also specific to short T1–T2 intervals (Keil and Ihssen, 2004). As displayed in Fig. 4, the 8.7-Hz amplitude of the ssVEP was enhanced in a time range between 140 and 200 ms after T2 onset, whereas earlier differences were not detected. Thus, modulation of the ssVEP amplitude during the AB suggests enhanced processing of emotionally arousing T2 verbs at latencies that are consistent with lexical access (Assadollahi and Pulvermüller, 2001). In line with that notion, the topography of the ssVEP showed a left occipito-temporal preponderance, compared to the right hemisphere, in which amplitudes were generally lower (see Fig. 4). When the content of T1 rather than T2 was manipulated, the behavioral effects reported above were replicated (Ihssen et al., 2005). Thus, interference of T1 on T2 performance increased as a function of affective arousal. In contrast to T2 facilitation, T1 interference was present across T1–T2 intervals. Preliminary electrophysiological results in nine participants indicated that the behavioral interference was paralleled by an amplitude reduction in the time range in which facilitation for affectively arousing T2s was observed. Again, topographical distributions suggest that this effect is related to differential lexical access for affectively arousing, compared to calm, words.
Fig. 4. Steady-state VEP amplitude differences for second targets in an attentional blink task, differing in content. The spline-interpolated topographies shown here cover a time range between 140 and 200 ms following onset of the T2 stimulus. Amplitude is enhanced for both pleasant and unpleasant word content, compared with neutral content. Topographies represent a grand mean across nine participants.
This interpretation is corroborated by analyses of phase synchrony, which allow the evaluation of epochs of synchronization between occipito-parietal and fronto-temporal regions. These ongoing analyses have revealed enhanced coupling across the left hemisphere at the steady-state frequency, which might reflect facilitated integration within the semantic network representing the affective content.
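A time-resolved view of the tagged response, of the kind needed to isolate a 140–200 ms post-T2 window, can be approximated by narrow-band filtering around the stimulation frequency followed by a Hilbert envelope. The sketch below is a simplified stand-in for the analyses cited above; the bandwidth, filter order, and the 8.7-Hz example rate are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def tagged_amplitude_timecourse(epoch, sfreq, drive_freq, half_bw=1.0):
    """Time-varying amplitude of the steady-state response at drive_freq."""
    nyq = sfreq / 2.0
    b, a = butter(3, [(drive_freq - half_bw) / nyq,
                      (drive_freq + half_bw) / nyq], btype="band")
    return np.abs(hilbert(filtfilt(b, a, epoch)))  # amplitude envelope

# Hypothetical usage, averaging the envelope 140-200 ms after T2 onset
# (`times` is a vector of sample times relative to T2, in seconds):
# env = tagged_amplitude_timecourse(epoch, sfreq=500.0, drive_freq=8.7)
# window_mean = env[(times >= 0.14) & (times < 0.20)].mean()
```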
Learning to associate language stimuli with affective content

One conclusion from the data presented so far is that the spatial and temporal dynamics of affective stimulus processing depend on the channel employed. Affective picture perception involves early sensory cortical processes as well as higher order stages. Affective information conveyed via lexical channels, in contrast, seems to affect the earliest steps of lexical access, which may be facilitated as a function of emotional arousal. The question arises whether learned significance associated with simple language stimuli may lead to early sensory amplification as well. In fact, supportive evidence has recently been provided by a classical conditioning experiment involving synthesized stop consonant–vowel syllables (Heim and Keil, in press). Here, aversive white noise (unconditioned stimulus, US) was paired with two /ba/ syllables (conditioned stimulus, CS+). Two /da/ syllables served as the CS− and indicated the absence of the noxious US. The EEG was recorded while healthy volunteers passively listened to the
stimuli. ERP data revealed amplitude modulations of a late component, the N2, as a function of stimulus properties (i.e., CS+ vs. CS−). Over right-hemispheric sensors, negativity in the latency range of about 250–310 ms was specifically enhanced for the CS+ during intermittent aversive conditioning. During extinction trials, this conditioning effect was paralleled by gamma-band (18–40 Hz) oscillations occurring as early as 80–120 ms after stimulus onset. Thus, even early stages of syllable processing may be affected by the motivational significance of a given language stimulus if that stimulus gains affective relevance through contingent pairing with external events. This would imply a process different from facilitation related to the semantic content or meaning of a language stimulus.
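A contrast of this kind can be approximated by computing single-trial band power in a fixed post-stimulus window and comparing CS+ with CS− trials. The following sketch (Python/SciPy) outlines such a computation; the array shapes, sampling rate, and window limits are illustrative assumptions, not the published pipeline:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_power(epochs, sfreq, times, band, window):
    """Mean band power per trial within a post-stimulus window.

    epochs: array (n_trials, n_samples); times: sample times in seconds.
    """
    nyq = sfreq / 2.0
    b, a = butter(3, [band[0] / nyq, band[1] / nyq], btype="band")
    env = np.abs(hilbert(filtfilt(b, a, epochs, axis=1), axis=1)) ** 2
    mask = (times >= window[0]) & (times < window[1])
    return env[:, mask].mean(axis=1)      # one power value per trial

# Hypothetical CS+ vs. CS- contrast, 18-40 Hz, 80-120 ms after onset:
# gamma_csp = band_power(csp_epochs, 500.0, times, (18, 40), (0.08, 0.12))
# gamma_csm = band_power(csm_epochs, 500.0, times, (18, 40), (0.08, 0.12))
```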
Conclusions

Taken together, the selective review presented above suggests that both picture and word stimuli are processed in a facilitated manner if their content is emotionally arousing. There are important differences in the spatial and temporal dynamics, however, as indicated by their large-scale oscillatory correlates. In the laboratory, very early sensory modulation appears for stimuli with salient features that have been experienced multiple times. The more immediate relationship of pictures to natural objects allows for a fast activation of
object representations on the basis of characteristic features, even with short presentation times. Affective picture viewing typically shows more similarities with the real-world situation and is related to higher physiological arousal than are linguistic stimuli. Affectively arousing content of words seems to facilitate lexical access and subsequent processing such as consolidation in working memory. When paired with distinct external affective consequences, however, features of words may acquire more direct access to affective networks, acting more rapidly to enable fast visual analysis. Both pictures and words are thus subject to dynamic changes in affective processing, which depend on context, the internal state of the individual, and previous experience. Dynamic network perspectives on affective processing are capable of accounting for these findings and allow the generation of hypotheses for future work investigating the dynamic and plastic properties of affective processing.

References

Aftanas, L.I., Reva, N.V., Varlamov, A.A., Pavlov, S.V. and Makhnev, V.P. (2004) Analysis of evoked EEG synchronization and desynchronization in conditions of emotional activation in humans: temporal and topographic characteristics. Neurosci. Behav. Physiol., 34: 859–867.
Altarriba, J. and Bauer, L.M. (2004) The distinctiveness of emotion concepts: a comparison between emotion, abstract, and concrete words. Am. J. Psychol., 117: 389–410.
Anderson, A.K. (2005) Affective influences on the attentional dynamics supporting awareness. J. Exp. Psychol. Gen., 134: 258–281.
Anderson, A.K. and Phelps, E.A. (2001) Lesions of the human amygdala impair enhanced perception of emotionally salient events. Nature, 411: 305–309.
Anes, M.D. and Kruer, J.L. (2004) Investigating hemispheric specialization in a novel face-word Stroop task. Brain Lang., 89: 136–141.
Assadollahi, R. and Pulvermüller, F. (2001) Neuromagnetic evidence for early access to cognitive representations. Neuroreport, 12: 207–213.
Bertrand, O., Bohorquez, J. and Pernier, J. (1994) Time–frequency digital filtering based on an invertible wavelet transform: an application to evoked potentials. IEEE Trans. Biomed. Eng., 41: 77–88.
Bower, G.H., Lazarus, R., LeDoux, J.E., Panksepp, J., Davidson, R.J. and Ekman, P. (1994) What is the relation between emotion and memory? In: Ekman, P. and Davidson, R.J. (Eds.), The Nature of Emotion: Fundamental Questions. Series in Affective Science. Oxford University Press, New York, NY, USA, pp. 301–318.
Bradley, B.P., Mogg, K., Millar, N. and White, J. (1995) Selective processing of negative information: effects of clinical anxiety, concurrent depression, and awareness. J. Abnorm. Psychol., 104: 532–536.
Bradley, M.M. and Lang, P.J. (2000) Affective reactions to acoustic stimuli. Psychophysiology, 37: 204–215.
Bradley, M.M., Sabatinelli, D., Lang, P.J., Fitzsimmons, J.R., King, W. and Desai, P. (2003) Activation of the visual cortex in motivated attention. Behav. Neurosci., 117: 369–380.
Cacioppo, J.T. and Gardner, W.L. (1999) Emotion. Annu. Rev. Psychol., 50: 191–214.
Calvo, M.G. and Castillo, M.D. (2005) Foveal vs. parafoveal attention-grabbing power of threat-related information. Exp. Psychol., 52: 150–162.
Carroll, N.C. and Young, A.W. (2005) Priming of emotion recognition. Q. J. Exp. Psychol. A, 58: 1173–1197.
Castaneda, A. (1956) Reaction time and response amplitude as a function of anxiety and stimulus intensity. J. Abnorm. Psychol., 53: 225–228.
Chun, M.M. and Potter, M.C. (1995) A two-stage model for multiple target detection in rapid serial visual presentation. J. Exp. Psychol. Hum. Percept. Perform., 21: 109–127.
Collet, L. and Duclaux, R. (1987) Hemispheric lateralization of emotions: absence of electrophysiological arguments. Physiol. Behav., 40: 215–220.
Cuthbert, B.N., Schupp, H.T., Bradley, M.M., Birbaumer, N. and Lang, P.J. (2000) Brain potentials in affective picture processing: covariation with autonomic arousal and affective report. Biol. Psychol., 52: 95–111.
DePascalis, V., Marucci, F.S., Penna, P.M. and Pessa, E. (1987) Hemispheric activity of 40 Hz EEG during recall of emotional events: differences between low and high hypnotizables. Int. J. Psychophysiol., 5: 167–180.
Frijda, N.H. (1988) The laws of emotion. Am. Psychol., 43: 349–358.
Gevins, A., Smith, M.E., McEvoy, L.K., Leong, H. and Le, J. (1999) Electroencephalographic imaging of higher brain function. Philos. Trans. R. Soc. Lond. B. Biol. Sci., 354: 1125–1133.
Gruber, T., Müller, M.M. and Keil, A. (2002) Modulation of induced gamma band responses in a perceptual learning task in the human EEG. J. Cogn. Neurosci., 14: 732–744.
Hansen, C.H. and Hansen, R.D. (1988) Finding the face in the crowd: an anger superiority effect. J. Pers. Soc. Psychol., 54: 917–924.
Harris, C.R. and Pashler, H. (2004) Attention and the processing of emotional words and names: not so special after all. Psychol. Sci., 15: 171–178.
Harris, C.R., Pashler, H.E. and Coburn, N. (2004) Moray revisited: high-priority affective stimuli and visual search. Q. J. Exp. Psychol. A, 57: 1–31.
Hartikainen, K.M., Ogawa, K.H. and Knight, R.T. (2000) Transient interference of right hemispheric function due to automatic emotional processing. Neuropsychologia, 38: 1576–1580.
Heim, S. and Keil, A. (in press) Effects of classical conditioning on identification and cortical processing of speech syllables. Exp. Brain Res., DOI: 10.1007/s00221-006-0560-1.
Hillyard, S.A. and Anllo-Vento, L. (1998) Event-related brain potentials in the study of visual selective attention. Proc. Natl. Acad. Sci. USA, 95: 781–787.
Härle, M., Rockstroh, B.S., Keil, A., Wienbruch, C. and Elbert, T.R. (2004) Mapping the brain's orchestration during speech comprehension: task-specific facilitation of regional synchrony in neural networks. BMC Neurosci., 5: 40.
Ihssen, N., Heim, S. and Keil, A. (2005) Electrocortical correlates of resource allocation during the attentional blink. Psychophysiology, 42: S67.
Ihssen, N. and Keil, A. (submitted) Affective facilitation and inhibition of conscious identification: an investigation with the attentional blink.
Ihssen, N., Heim, S. and Keil, A. (2004) Emotional arousal of verbal stimuli modulates the attentional blink. Psychophysiology, 41: S36.
Junghöfer, M., Bradley, M.M., Elbert, T.R. and Lang, P.J. (2001) Fleeting images: a new look at early emotion discrimination. Psychophysiology, 38: 175–178.
Kamin, L.J. (1955) Relations between discrimination, apparatus stress, and the Taylor scale. J. Abnorm. Psychol., 51: 595–599.
Kayser, J., Tenke, C., Nordby, H., Hammerborg, D., Hugdahl, K. and Erdmann, G. (1997) Event-related potential (ERP) asymmetries to emotional stimuli in a visual half-field paradigm. Psychophysiology, 34: 414–426.
Keil, A. (2004) The role of human prefrontal cortex in motivated perception and behavior: a macroscopic perspective. In: Otani, S. (Ed.), Prefrontal Cortex: From Synaptic Plasticity to Cognition. Kluwer, New York.
Keil, A., Bradley, M.M., Hauk, O., Rockstroh, B., Elbert, T. and Lang, P.J. (2002) Large-scale neural correlates of affective picture processing. Psychophysiology, 39: 641–649.
Keil, A., Gruber, T. and Müller, M.M. (2001a) Functional correlates of macroscopic high-frequency brain activity in the human visual system. Neurosci. Biobehav. Rev., 25: 527–534.
Keil, A., Gruber, T., Müller, M.M., Moratti, S., Stolarova, M., Bradley, M.M. and Lang, P.J. (2003) Early modulation of visual perception by emotional arousal: evidence from steady-state visual evoked brain potentials. Cogn. Affect. Behav. Neurosci., 3: 195–206.
Keil, A. and Ihssen, N. (2004) Identification facilitation for emotionally arousing verbs during the attentional blink. Emotion, 4: 23–35.
Keil, A., Moratti, S., Sabatinelli, D., Bradley, M.M. and Lang, P.J. (2005a) Additive effects of emotional content and spatial selective attention on electrocortical facilitation. Cereb. Cortex, 15: 1187–1197.
Keil, A., Müller, M.M., Gruber, T., Wienbruch, C., Stolarova, M. and Elbert, T. (2001b) Effects of emotional arousal in the cerebral hemispheres: a study of oscillatory brain activity and event-related potentials. Clin. Neurophysiol., 112: 2057–2068.
Keil, A., Müller, M.M., Ray, W.J., Gruber, T. and Elbert, T. (1999) Human gamma band activity and perception of a gestalt. J. Neurosci., 19: 7152–7161.
Keil, A., Smith, J.C., Wangelin, B.C., Sabatinelli, D., Bradley, M.M. and Lang, P.J. (2005b) Source space analysis of single EEG epochs in the frequency domain: an application for steady-state potentials. Psychophysiology, 42: S73.
Kemp, A.H., Gray, M.A., Eide, P., Silberstein, R.B. and Nathan, P.J. (2002) Steady-state visually evoked potential topography during processing of emotional valence in healthy subjects. Neuroimage, 17: 1684–1692.
Kemp, A.H., Gray, M.A., Silberstein, R.B., Armstrong, S.M. and Nathan, P.J. (2004) Augmentation of serotonin enhances pleasant and suppresses unpleasant cortical electrophysiological responses to visual emotional stimuli in humans. Neuroimage, 22: 1084–1096.
Kitayama, S. (1990) Interaction between affect and cognition in word perception. J. Pers. Soc. Psychol., 58: 209–217.
Kranczioch, C., Debener, S. and Engel, A.K. (2003) Event-related potential correlates of the attentional blink phenomenon. Brain Res. Cogn. Brain Res., 17: 177–187.
Lachaux, J.P., Rodriguez, E., Martinerie, J. and Varela, F.J. (1999) Measuring phase synchrony in brain signals. Hum. Brain Mapp., 8: 194–208.
Lang, P.J. (1979) A bioinformational theory of emotional imagery. Psychophysiology, 16: 495–512.
Lang, P.J. (1994) The motivational organization of emotion: affect-reflex connections. In: Van Goozen, S.H.M., Van de Poll, N.E. and Sergeant, J.E. (Eds.), Emotions: Essays on Emotion Theory. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 61–93.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1997a) International Affective Picture System (IAPS): Technical Manual and Affective Ratings. The Center for Research in Psychophysiology, University of Florida, Gainesville.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1997b) Motivated attention: affect, activation, and action. In: Lang, P.J., Simons, R.F. and Balaban, M.T. (Eds.), Attention and Orienting: Sensory and Motivational Processes. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 97–135.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1998) Emotion, motivation, and anxiety: brain mechanisms and psychophysiology. Biol. Psychiatry, 44: 1248–1263.
Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1999) International Affective Picture System (IAPS): Instruction Manual and Affective Ratings, Technical Report. The Center for Research in Psychophysiology, University of Florida, Gainesville, FL.
Lehr, D.J., Bergum, B.O. and Standing, T.E. (1966) Response latency as a function of stimulus affect and presentation order. Percept. Mot. Skills, 23: 1111–1116.
Leppänen, J.M. and Hietanen, J.K. (2003) Affect and face perception: odors modulate the recognition advantage of happy faces. Emotion, 3: 315–326.
Leppänen, J.M. and Hietanen, J.K. (2004) Positive facial expressions are recognized faster than negative facial expressions, but why? Psychol. Res., 69: 22–29.
Luck, S.J. and Hillyard, S.A. (1995) The role of attention in feature detection and conjunction discrimination: an electrophysiological analysis. Int. J. Neurosci., 80: 281–297.
Luck, S.J., Vogel, E.K. and Shapiro, K.L. (1996) Word meanings can be accessed but not reported during the attentional blink. Nature, 383: 616–618.
Lundqvist, D., Flykt, A. and Öhman, A. (1998) Karolinska Directed Emotional Faces. Dept. of Neurosciences, Karolinska Hospital, Stockholm.
Martinez, A., Anllo-Vento, L., Sereno, M.I., Frank, L.R., Buxton, R.B., Dubowitz, D.J., Wong, E.C., Hinrichs, H., Heinze, H.J. and Hillyard, S.A. (1999) Involvement of striate and extrastriate visual cortical areas in spatial attention. Nat. Neurosci., 2: 364–369.
Megibow, M.M. and De Jamaer, K.A. (1979) Hemispheric equality in reaction times to emotional and non-emotional nouns. Percept. Mot. Skills, 49: 643–647.
Mesulam, M.M. (1998) From sensation to cognition. Brain, 121: 1013–1052.
Mini, A., Palomba, D., Angrilli, A. and Bravi, S. (1996) Emotional information processing and visual evoked brain potentials. Percept. Mot. Skills, 83: 143–152.
Mogg, K., Mathews, A. and Weinman, J. (1989) Selective processing of threat cues in anxiety states: a replication. Behav. Res. Ther., 27: 317–323.
Moratti, S. and Keil, A. (2005) Cortical activation during Pavlovian fear conditioning depends on heart rate response patterns: an MEG study. Brain Res. Cogn. Brain Res., 25: 459–471.
Moratti, S., Keil, A. and Miller, G.A. (2006) Fear but not awareness predicts enhanced sensory processing in fear conditioning. Psychophysiology, 43: 216–226.
Moratti, S., Keil, A. and Stolarova, M. (2004) Motivated attention in emotional picture processing is reflected by activity modulation in cortical attention networks. Neuroimage, 21: 954–964.
Morgan, S.T., Hansen, J.C. and Hillyard, S.A. (1996) Selective attention to stimulus location modulates the steady-state visual evoked potential. Proc. Natl. Acad. Sci. USA, 93: 4770–4774.
Most, S.B., Chun, M.M., Widders, D.M. and Zald, D.H. (2005) Attentional rubbernecking: cognitive control and personality in emotion-induced blindness. Psychon. Bull. Rev., 12: 654–661.
Müller, M.M., Keil, A., Gruber, T. and Elbert, T. (1999) Processing of affective pictures modulates right-hemispheric gamma band EEG activity. Clin. Neurophysiol., 110: 1913–1920.
Müller, M.M., Malinowski, P., Gruber, T. and Hillyard, S.A. (2003) Sustained division of the attentional spotlight. Nature, 424: 309–312.
Müller, M.M., Picton, T.W., Valdes-Sosa, P., Riera, J., Teder-Salejarvi, W.A. and Hillyard, S.A. (1998) Effects of spatial selective attention on the steady-state visual evoked potential in the 20–28 Hz range. Brain Res. Cogn. Brain Res., 6: 249–261.
Müller, M.M., Teder, W. and Hillyard, S.A. (1997) Magnetoencephalographic recording of steady-state visual evoked cortical activity. Brain Topogr., 9: 163–168.
Neumann, R. and Strack, F. (2000) Approach and avoidance: the influence of proprioceptive and exteroceptive cues on encoding of affective information. J. Pers. Soc. Psychol., 79: 39–48.
Öhman, A., Flykt, A. and Esteves, F. (2001a) Emotion drives attention: detecting the snake in the grass. J. Exp. Psychol. Gen., 130: 466–478.
Öhman, A., Lundqvist, D. and Esteves, F. (2001b) The face in the crowd revisited: a threat advantage with schematic stimuli. J. Pers. Soc. Psychol., 80: 381–396.
Palomba, D., Angrilli, A. and Mini, A. (1997) Visual evoked potentials, heart rate responses and memory to emotional pictorial stimuli. Int. J. Psychophysiol., 27: 55–67.
Pizzagalli, D.A., Greischar, L.L. and Davidson, R.J. (2003) Spatio-temporal dynamics of brain mechanisms in aversive classical conditioning: high-density event-related potential and brain electrical tomography analyses. Neuropsychologia, 41: 184–194.
Potter, M.C., Chun, M.M., Banks, B.S. and Muckenhoupt, M. (1998) Two attentional deficits in serial target search: the visual attentional blink and an amodal task-switch deficit. J. Exp. Psychol. Learn. Mem. Cogn., 24: 979–992.
Pourtois, G., Grandjean, D., Sander, D. and Vuilleumier, P. (2004) Electrophysiological correlates of rapid spatial orienting towards fearful faces. Cereb. Cortex, 14: 619–633.
Pulvermüller, F. (1996) Hebb's concept of cell assemblies and the psychophysiology of word processing. Psychophysiology, 33: 317–333.
Ramus, F. (2001) Outstanding questions about phonological processing in dyslexia. Dyslexia, 7: 197–216.
Ray, W.J. and Cole, H.W. (1985) EEG alpha activity reflects attentional demands, and beta activity reflects emotional and cognitive processes. Science, 228: 750–752.
Raymond, J.E., Shapiro, K.L. and Arnell, K.M. (1992) Temporary suppression of visual processing in an RSVP task: an attentional blink? J. Exp. Psychol. Hum. Percept. Perform., 18: 849–860.
Richards, A. and Blanchette, I. (2004) Independent manipulation of emotion in an emotional Stroop task using classical conditioning. Emotion, 4: 275–281.
Sabatinelli, D., Bradley, M.M., Fitzsimmons, J.R. and Lang, P.J. (2005) Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance. Neuroimage, 24: 1265–1270.
Schimmack, U. and Derryberry, D. (2005) Attentional interference effects of emotional pictures: threat, negativity, or arousal? Emotion, 5: 55–66.
Schupp, H.T., Cuthbert, B.N., Bradley, M.M., Cacioppo, J.T., Ito, T. and Lang, P.J. (2000) Affective picture processing: the late positive potential is modulated by motivational relevance. Psychophysiology, 37: 257–261.
Schupp, H.T., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2003a) Attention and emotion: an ERP analysis of facilitated emotional stimulus processing. Neuroreport, 14: 1107–1110.
Schupp, H.T., Junghöfer, M., Weike, A.I. and Hamm, A.O. (2003b) Emotional facilitation of sensory processing in the visual cortex. Psychol. Sci., 14: 7–13.
Schupp, H.T., Öhman, A., Junghöfer, M., Weike, A.I., Stockburger, J. and Hamm, A.O. (2004) The facilitated processing of threatening faces: an ERP analysis. Emotion, 4: 189–200.
Shapiro, K.L., Caldwell, J. and Sorensen, R.E. (1997) Personal names and the attentional blink: a visual "cocktail party" effect. J. Exp. Psychol. Hum. Percept. Perform., 23: 504–514.
Silberstein, R.B., Ciorciari, J. and Pipingas, A. (1995) Steady-state visually evoked potential topography during the Wisconsin card sorting test. Electroencephalogr. Clin. Neurophysiol., 96: 24–35.
Sloan, D.M., Bradley, M.M., Dimoulas, E. and Lang, P.J. (2002) Looking at facial expressions: dysphoria and facial EMG. Biol. Psychol., 60: 79–90.
Smith, J.C., Keil, A., Wangelin, B.C., Sabatinelli, D. and Lang, P.J. (2005) Single-trial analyses of steady-state visual potentials: effects of emotional arousal. Psychophysiology, 42: S116.
Smith, N.K., Cacioppo, J.T., Larsen, J.T. and Chartrand, T.L. (2003) May I have your attention, please: electrocortical responses to positive and negative stimuli. Neuropsychologia, 41: 171–183.
Stolarova, M., Keil, A. and Moratti, S. (2006) Modulation of the C1 visual event-related component by conditioned stimuli: evidence for sensory plasticity in early affective perception. Cereb. Cortex, 16: 876–887.
Tallon-Baudry, C. and Bertrand, O. (1999) Oscillatory gamma activity in humans and its role in object representation. Trends Cogn. Sci., 3: 151–162.
Tucker, D.M. (1984) Lateral brain function in normal and disordered emotion: interpreting electroencephalographic evidence. Biol. Psychol., 19: 219–235.
Vogel, E.K., Luck, S.J. and Shapiro, K.L. (1998) Electrophysiological evidence for a postperceptual locus of suppression during the attentional blink. J. Exp. Psychol. Hum. Percept. Perform., 24: 1656–1674.
Watts, F.N., McKenna, F.P., Sharrock, R. and Trezise, L. (1986) Colour naming of phobia-related words. Br. J. Psychol., 77(Pt 1): 97–108.
Wedding, D. and Stalans, L. (1985) Hemispheric differences in the perception of positive and negative faces. Int. J. Neurosci., 27: 277–281.
Williamson, S., Harpur, T.J. and Hare, R.D. (1991) Abnormal processing of affective words by psychopaths. Psychophysiology, 28: 260–273.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 12
Intonation as an interface between language and affect
Didier Grandjean, Tanja Bänziger and Klaus R. Scherer
Swiss Center for Affective Sciences, University of Geneva, 7 rue des Battoirs, 1205 Geneva, Switzerland
Abstract: The vocal expression of human emotions is embedded within language, and the study of intonation has to take into account two interacting levels of information — emotional and semantic meaning. In addition to the discussion of this dual coding system, an extension of Brunswik's lens model is proposed. This model includes the influences of conventions, norms, and display rules (pull effects) and psychobiological mechanisms (push effects) on emotional vocalizations produced by the speaker (encoding) and the reciprocal influences of these two aspects on attributions made by the listener (decoding), allowing the dissociation and systematic study of the production and perception of intonation. Three empirical studies are described as examples of how these different phenomena can be dissociated at the behavioral and neurological levels in the study of intonation.

Keywords: prosody; intonation; emotion; attention; linguistic; affect; brain imagery

Emotions are defined as episodes of massive, synchronous recruitment of mental and somatic resources to adapt to or cope with stimulus events that are subjectively appraised as being highly pertinent for an individual, and they involve strong mobilization of the autonomic and somatic nervous systems (Scherer, 2001; Sander et al., 2005). The patterns of activation created in those systems will be reflected, generally, in expressive behavior and, specifically, will have a powerful impact on the production of vocal expressions. Darwin (1998/1872) observed that in many species the voice is exploited as an iconic affective signalling device. Vocalizations, as well as other emotional expressions, are functionally used by conspecifics and sometimes even members of other species to make inferences about the emotional/motivational state of the sender.
The coding of information in prosody

One of the most interesting issues in vocal affect signalling is the evolutionary continuity between animal vocalizations of motivational/emotional states and human prosody. Morton (1982) proposed universal "motivational-structural rules" in an attempt to understand the relationship between fear and aggression in animal vocalizations. Morton tried to systematize the role of fundamental frequency, energy, and quality (texture) of vocalization in the signalling of aggressive anger and fear. The major differences between the characteristics of anger and fear are the trajectory of the contours of the respective sounds and their roughness or "thickness." For instance, the "fear endpoint" (no aggression) is characterized by continuous high pitch, compared with low pitch and roughness at the "aggressive endpoint" (see Fig. 1). Between these two extremes, Morton described variations corresponding to mixed emotions characterized by different combinations of the basic features.
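The two endpoints of Morton's continuum can be caricatured with synthetic signals: a high, tonal carrier for the fear endpoint versus a low, amplitude-modulated ("rough") carrier for the aggressive endpoint. The toy sketch below (Python/NumPy) is purely illustrative; all frequencies and modulation settings are assumptions chosen for demonstration, not values from Morton (1982):

```python
import numpy as np

def synth_call(f0, roughness, dur=0.5, sfreq=16000):
    """Toy vocalization on a fear-aggression continuum.

    High f0 with no roughness approximates the tonal 'fear endpoint';
    low f0 with strong amplitude modulation approximates the harsh
    'aggressive endpoint'.
    """
    t = np.arange(int(dur * sfreq)) / sfreq
    carrier = np.sin(2 * np.pi * f0 * t)
    # A 40-Hz amplitude modulation stands in for perceived roughness.
    am = 1.0 - roughness * 0.5 * (1 + np.sin(2 * np.pi * 40 * t))
    return carrier * am

fear_call = synth_call(f0=800.0, roughness=0.0)    # high, continuous pitch
anger_call = synth_call(f0=150.0, roughness=0.9)   # low pitch, rough texture
```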
Fig. 1. Motivational-structural rules (Morton, 1982). Modelling the modifications of F0 and energy of vocalizations signalling aggression (horizontal axis) or fear/appeasement (vertical axis).
This kind of iconic affective signalling device, common to most mammals, has changed in the course of human evolution in the sense that it became the carrier signal for language. This communication system is discretely and arbitrarily coded and uses the human voice and the acoustic variability it affords to build the units of the language code. This code has been superimposed on the affect signalling system, which continues to be used in human communication, for example, in affect bursts or interjections (Scherer, 1985). Many aspects of the primitive affect signalling system have been integrated into the language code via prosody. It is not only that the two codes basically coexist, but also that speech prosody has integrated some of the same sound features as described in Morton’s examples. These vocal productions are not pure, but they are shaped by both physiological reactions and cultural conventions. Wundt (1900)
has suggested the notion of a "domestication of affect sounds" constituted by this mixture of natural and cultural aspects. It is particularly important to distinguish "push effects" from "pull effects" (Scherer, 1986). Push effects represent the influence of underlying psychobiological mechanisms; for instance, an increase in arousal heightens muscle tension and thereby raises fundamental frequency (F0). The equivalent of "push" on the attribution side is a kind of schematic recognition, which is probably also largely innate. These psychobiological effects are complemented by "pull effects," that is, conventions, norms, and display rules that pull the voice in certain directions. The result is a "dual code," in which several dimensions are combined at the same time to produce speech. This complex patterning makes it difficult to determine which features in the acoustic signal are specific to particular emotions.
Scherer et al. (1984) have proposed distinguishing coding via covariation (continuous affect signalling), determined mostly by biopsychological factors, from coding via configuration (discrete message types), shaped by linguistic and sociocultural factors. This distinction is very important but unfortunately often neglected in the literature. Given the complex interaction of different determinants in producing intonation patterns, it is of particular importance to clearly distinguish between production or encoding and perception or decoding of intonational messages. Brunswik's "lens model" allows one to combine the study of encoding and decoding of affect signals (Brunswik, 1956). This model (see Fig. 2) presumes that emotion is expressed by distal indicator cues (i.e., acoustic parameters), which can be extracted from the acoustic waveform. These objective distal cues are perceived by human listeners who, on the basis of their proximal percept, make a subjective attribution of what underlying emotion is expressed by the speaker, often influenced by the context. This model allows one to clearly distinguish between the expression (or encoding) of emotion on the sender side, the transmission of the sound, and the impression (or decoding) on the receiver side,
resulting in emotion inference. The model encourages voice researchers to measure the complete communicative process, including (a) the emotional state expressed, (b) the acoustically measured voice cues, (c) the perceptual judgments of voice cues, and (d) the process that integrates all cues into a judgment of the encoded emotion. Figure 2 further illustrates that both psychobiological mechanisms (push effects) and social norms (pull effects) will influence the expression (encoding) of vocal expressions and, conversely, will be transposed into rules used for impression formation (decoding). The patterns of intonation produced during a social interaction are strongly influenced by conventions shaped by different social and cultural rules (Ishii et al., 2003). Moreover, in tone languages, different tones may convey specific semantic meaning (Salzmann, 1993). In these cases, tones with semantic functions interact in a complex manner with emotional influences. It is very important to keep these different aspects apart because they are governed by very different principles (Scherer et al., 2003).
Fig. 2. Adaptation of Brunswik’s lens model, including the influences of conventions, norms, and display rules (pull effects) and psychobiological mechanisms (push effects) on emotional vocalizations produced by the speaker (encoding) and the reciprocal influences of these two aspects on attributions made by the listener (decoding).
Given that the underlying biological processes are likely to depend on both the idiosyncratic nature of the individual and the specific nature of the situation, relatively strong interindividual differences in the expressive patterns will result from push effects. Conversely, pull effects involve a very high degree of symbolization and conventionalization, and thus comparatively few and small individual differences are expected. With respect to cross-cultural comparison, one would expect the opposite: very few differences between cultures for push effects and large differences for pull effects.
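Empirically, the lens model is often operationalized by correlating each distal cue with the encoded state (ecological validity) and with listeners' attributions (cue utilization). A minimal sketch of such an analysis follows (Python/NumPy; the variable names and data layout are illustrative, not taken from a specific published analysis):

```python
import numpy as np

def lens_model_stats(state, cues, judgments):
    """Brunswikian lens-model correlations for a set of acoustic cues.

    state:     encoded emotion intensity per utterance (e.g., arousal), shape (n,)
    cues:      distal cue matrix, shape (n_utterances, n_cues),
               e.g., columns for F0 mean, F0 range, energy
    judgments: listeners' attributed intensity per utterance, shape (n,)
    Returns per-cue ecological validity (state-cue r) and
    cue utilization (cue-judgment r).
    """
    validity = np.array([np.corrcoef(state, cues[:, j])[0, 1]
                         for j in range(cues.shape[1])])
    utilization = np.array([np.corrcoef(judgments, cues[:, j])[0, 1]
                            for j in range(cues.shape[1])])
    return validity, utilization
```

Comparing the two correlation profiles indicates whether listeners actually use the cues that validly index the sender's state.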
What is the role of intonation in vocal affect communication?

First, we propose defining different terms related to intonation in order to clarify the concepts. We suggest that the term "prosody" refers to all suprasegmental changes in the course of a spoken utterance — intonation, amplitude envelope, tempo, rhythm, and voice quality. We use "intonation" to refer to the contour of F0 over the utterance. The amplitude envelope is determined by the contour of acoustic energy variations over the utterance and will most of the time be correlated with F0. The tempo is the number of phonemic segments per time unit. The rhythm corresponds to the structure of F0 accents, amplitude peaks, and pause distribution in the utterance. Finally, voice quality is defined by the distribution of energy in the spectrum as produced by different phonation modes. Modifications of vocal tract characteristics, for example, the overall
muscle tension, influence not only the source (namely, the vocal cords) but also the resonances produced by changes in vocal tract shape. Spectral energy distribution is influenced by the vocal tract of the speaker and contributes to the possibility of identifying an individual by the sound of the voice. Recently, a study by Ghazanfar (2005) has shown that monkeys are able to infer the body size of a conspecific from its vocal productions. Spectral energy distribution is also modified by emotional processes, with the relative amount of energy in high- and low-frequency bands changing for different emotions. For example, voice quality can be modified in terms of roughness or sharpness during an emotional episode, information that can be used by the listener to infer the emotional state of the interlocutor. It is very important to take these aspects into account if we want to understand the organism's ability to attribute emotional states to conspecifics and even to other species. Different features of prosody might be coded differentially in terms of the distinction mentioned above between continuous and discrete types of coding. Thus, Scherer et al. (1984) have shown that F0 is coded continuously, whereas many intonation contour shapes are coded configurationally. On the left panel of Fig. 3, the pitch range of an utterance is continuously varied using copy synthesis (following the covariation principle); with widening pitch range, there is a continuous increase in the amount of emotional content perceived in the utterance.
Fig. 3. Effects of F0 range manipulation (covariation principle, left panel) and different contour shapes (configuration principle) for ‘‘Wh’’ and ‘‘Y/N’’ questions (right panel) on the perception of emotion.
On the right panel of Fig. 3, configuration effects are demonstrated: whether the utterance is perceived as challenging or not depends on whether a final fall or a final rise is used. This also depends on the context: it makes a major difference whether a "Wh" question ("Where are you going?") is pronounced with a final fall, which is not really challenging, or with a final rise. The effect depends on configuration features rather than on continuous covariation. Several distinctions are crucial and should be addressed, or at least taken into account, in future research on vocal emotional communication. The distinction between coding based on an ancient affect signalling system and coding based on language should be systematically taken into account, particularly with respect to the distinction between covariation and configuration principles (see also Scherer, 2003). Because different coding principles may underlie different features of prosody, we also need to distinguish very carefully between intonation as carried by F0 and the other prosodic features — tempo, amplitude, and voice quality; all of these aspects interact, but it is possible to pull them apart. We also need to distinguish between push and pull effects; in other words, to ask to what extent certain vocal effects are produced by physiological push versus produced essentially by the speaker modifying the command structure for the vocalizations on the basis of templates given by linguistic or cultural schemata (pull). Finally, we need to distinguish much more carefully between encoding and decoding mechanisms: how the sender encodes both push and pull effects for a certain affect, and what processes are used to decode a production influenced by emotional processes. The ideal way to understand these two different processes is to study them jointly, as suggested by the Brunswikian lens model (Brunswik, 1956). In the following sections, we present three different approaches to studying and understanding the encoding and decoding processes at different levels, using the conceptual distinctions explained above. The first approach focuses on the question of the specificity of intonation contours for different emotions. The second exemplifies a study addressing the decoding
processes at the level of the central nervous system (CNS) using electroencephalography (EEG). Finally, the relationship between emotion and attention in prosodic decoding is addressed in a study using functional magnetic resonance imaging (fMRI).
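For readers who wish to see how the prosodic features defined above are typically operationalized, the sketch below extracts a frame-wise F0 contour (by simple autocorrelation, without voicing detection) and an RMS amplitude envelope from a speech waveform. It is a crude, illustrative stand-in for the dedicated speech-analysis tools actually used in this line of research; the frame lengths and search ranges are assumptions:

```python
import numpy as np

def f0_autocorr(frame, sfreq, fmin=75.0, fmax=400.0):
    """Crude per-frame F0 estimate by autocorrelation peak picking."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]
    lo, hi = int(sfreq / fmax), int(sfreq / fmin)   # plausible pitch-period lags
    lag = lo + np.argmax(ac[lo:hi])
    return sfreq / lag

def prosody_sketch(signal, sfreq, frame_len=0.04, hop=0.01):
    """Frame-wise F0 contour and RMS energy envelope of an utterance."""
    n, h = int(frame_len * sfreq), int(hop * sfreq)
    f0, energy = [], []
    for start in range(0, signal.size - n, h):
        frame = signal[start:start + n]
        f0.append(f0_autocorr(frame, sfreq))
        energy.append(np.sqrt(np.mean(frame ** 2)))  # amplitude envelope point
    return np.array(f0), np.array(energy)
```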
Are there emotion-specific intonation contours?

The question of how specifically intonation codes emotion has been addressed in a study investigating the contribution of intonation to the vocal communication of emotions (Bänziger and Scherer, 2005). Intonation is defined in this context as pitch (or F0) fluctuations over time. Other prosodic aspects such as rhythm, tempo, or loudness fluctuations were not included in this study. Along the lines of the distinction outlined by Scherer et al. (1984), authors from different research backgrounds independently postulated that (a) specific configurations of pitch patterns (pitch contours) reflect and communicate specific emotional states (e.g., Fonagy and Magdics, 1963) and (b) continuous variations of pitch features (such as pitch level or pitch range) reflect and communicate features of emotional reactions, such as emotional arousal (e.g., Pakosz, 1983). Evidence supporting the first claim (the existence of emotion-specific pitch contours) consists mostly of selected examples rather than empirical examination of emotional speech recordings. On the other hand, efforts to describe and analyze the intonation of actual emotional expressions have been limited by the use of simplified descriptors, such as measures of overall pitch level, pitch range, or overall rise/fall of pitch contours. This line of research established that a number of acoustic features — such as F0 mean or range, intensity mean or range, and speech rate — vary continuously with emotional arousal (for a review, see Scherer, 2003). It is far less clear to what extent specific F0 contours can be associated with different emotions, especially independently of linguistic content. To examine this issue, quantifiable and comparable descriptions of F0 contours are needed. The study we describe here used a simple procedure to stylize the F0 contours of emotional expressions.
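The stylization procedure can be sketched as the reduction of a voiced-frame F0 track to the ten key points described in the following paragraphs (see Fig. 4). The illustrative code below (Python/NumPy) assumes that the accent and final-syllable regions have already been segmented; the index handling and names are hypothetical, not the published implementation:

```python
import numpy as np

def stylize_contour(f0, accent_spans, final_span):
    """Reduce a voiced-frame F0 track (Hz, NaN = unvoiced) to key points.

    Returns 'start' plus a local (min1, max, min2) triplet for each of the
    two accents and for the final syllable, i.e., ten points in total.
    Segment boundaries are assumed given by a prior syllable segmentation.
    """
    points = [f0[np.flatnonzero(~np.isnan(f0))[0]]]       # 'start'
    for lo, hi in list(accent_spans) + [final_span]:
        segment = f0[lo:hi]
        peak = np.nanargmax(segment)                       # local 'max'
        points += [np.nanmin(segment[:peak + 1]),          # 'min1' before the peak
                   segment[peak],
                   np.nanmin(segment[peak:])]              # 'min2' after the peak
    return np.array(points)

# Hypothetical usage with a 100-frame F0 track and hand-set spans:
# pts = stylize_contour(f0_track, accent_spans=[(10, 35), (40, 70)],
#                       final_span=(80, 100))
```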
Fig. 4. Stylization example for an instance of (low-arousal) "happiness" with sequence 1 ("hät san dig prong nju ven tsi").
The stylization (see Fig. 4) was applied to 144 emotional expressions (sampled from a larger set of emotional expressions described in detail by Banse and Scherer, 1996). Expressions produced by nine actors, who pronounced two sequences of seven syllables (1. "hät san dig prong nju ven tsi"; 2. "fi gött laich jean kill gos terr") and expressed eight emotions, were used in this study. Two instances each of "fear," "happiness," "anger," and "sadness" were included, one with "low arousal" (labeled "anxiety," "happiness," "cold anger," and "sadness") and one with "high arousal" (labeled "panic fear," "elation," "hot anger," and "despair"). Ten key points were identified for each F0 contour. The first point ("start") corresponds to the first F0 point detected for the first voiced section in each expression. This point is measured on the syllable "hät" in sequence 1 and on the syllable "fi" in sequence 2. The second ("1min1"), third ("1max"), and fourth points ("1min2")
correspond respectively to the minimum, maximum, and minimum of the F0 excursion for the first operationally defined "accent" of each sequence. These local minima and maxima are measured for the syllables "san dig" in sequence 1 and "gött laich" in sequence 2. Points five ("2min1"), six ("2max"), and seven ("2min2") correspond respectively to the minimum, maximum, and minimum of the F0 excursion for the second operationally defined "accent" of each sequence. They are measured for the syllables "prong nju ven" and "jean kill gos." Points eight ("3min"), nine ("3max"), and ten ("final") correspond to the final "accent" of each sequence: the local minimum, maximum, and minimum for the syllables "tsi" and "ter." Fig. 4 shows an illustration of this stylization for a happy expression (first utterance). The pattern represented in Fig. 4 — two "accents" (sequences of local F0 min1–max–min2) followed by a final fall — was the most frequent
The count of F0 "rises" (local "min1" followed by "max"), "falls" (local "max" followed by "min2"), and "accents" ("min1" followed by "max" followed by "min2") for the first accented part, the second accented part, and the final syllable was not affected by the expressed emotions, but varied across speakers and across the two sequences of syllables that they pronounced. In order to control for differences in F0 level between speakers, a "baseline" value had to be defined for each speaker: an average F0 value was computed from 112 emotional expressions (including the 16 expressions used in this study) produced by each speaker. Fig. 5 shows the differences in hertz (averaged across speakers and sequences of syllables) between the observed F0 points in each expression and the speaker baseline value, separately for each expressed emotion. As the figure shows, F0 level is affected by emotional arousal: the F0 points for emotions with low arousal (such as sadness, happiness, and anxiety) are generally lower than the F0 points for emotions with high arousal (despair, elation, panic fear, and hot anger). The description of the different points in the contour does not appear to add much information to an overall measure of F0,
such as F0 mean. After partialling the F0 mean (computed for each expression) out of the points represented in Fig. 5, only a slight effect of expressed emotion remains, on the points "2max" and "final": the second maximum tends to be higher for recordings expressing elation, hot anger, and cold anger than for recordings expressing other emotions, and the final F0 value tends to be relatively lower for hot anger and cold anger than for other emotions. Fig. 5 further shows that the range of F0 fluctuations is affected by emotional arousal: F0 range (expressed on a linear scale) is on average larger for portrayals with high arousal (and high F0 level) than for portrayals with low arousal (and low F0 level). It is likely that both the level and the range of F0 are enhanced in portrayals with high arousal as a consequence of the increased vocal effort in those portrayals. The results reported above show that in this study only the overall level of the F0 contours was affected by the expressed emotions, and that this level determined the emotion inferences of the judges in a powerful and statistically significant fashion. As could be expected from frequently replicated results in the literature, the height of F0 is likely to be reliably interpreted as indicative of differential activation or arousal.
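The two normalization steps just described (per-speaker baselining and removal of the expression-wise F0 mean) can be sketched as follows. The table layout and column names are hypothetical; the sketch illustrates the logic only, not the original analysis code.

```python
import numpy as np
import pandas as pd

def normalize_points(df):
    """df: one row per stylization point, with (assumed) columns
    'speaker', 'expression', 'emotion', 'point', and 'f0' (Hz)."""
    # speaker baseline: mean F0 over all of a speaker's portrayals
    baseline = df.groupby("speaker")["f0"].transform("mean")
    df = df.assign(delta=df["f0"] - baseline)      # values plotted in Fig. 5

    # partial the expression-wise F0 mean out of each point, so that the
    # residuals retain only contour-shape information beyond overall level
    f0_mean = df.groupby("expression")["f0"].transform("mean")
    slope, intercept = np.polyfit(f0_mean, df["f0"], 1)
    return df.assign(residual=df["f0"] - (slope * f0_mean + intercept))

# df.groupby(["emotion", "point"])["residual"].mean() would then expose the
# residual emotion effects on "2max" and "final" described in the text
```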
Fig. 5. Average F0 values by portrayed emotion. Note: The number of observations varies from 18 (for "start" with hot anger, cold anger, and elation; for "1max" with cold anger and panic fear) to 7 (for "final" with sadness). It should be noted also that there is a sizable amount of variance around the average values shown for all measurement points.
These results do not encourage the notion that there are emotion-specific intonation contours. However, some of the detailed results suggest that aspects of contour shape (such as the height of selected accents and the final F0 movement) may well differentially affect emotion inferences. Even so, it seems unlikely that such features have a discrete, iconic meaning with respect to emotional content. It seems reasonable to assume that, although the communicative value of F0 level may follow a covariation model, the interpretation of various features of F0 contour shape is best described by a configuration model. Concretely, contour shape, or certain central features thereof, may acquire emotional meaning only in specific linguistic and pragmalinguistic contexts (including phonetic, syntactic, and semantic features, as well as normative expectations). Furthermore, the role of the F0 contour may vary depending on the complexity of the respective emotion and its dependence on a sociocultural context. Thus, one would expect covariation effects for simple, universally shared emotions that are closely tied to biological needs, and configuration effects for complex emotions and affective attitudes that are determined by socioculturally variable values and symbolic meaning. To summarize, the results indicate that there are no specific contours or shapes related to the emotions studied (Bänziger and Scherer, 2005). There are strong differences in F0 level between the different kinds of emotions, related to the underlying arousal intensity, which is well known from past research (see Fig. 5). However, this study provides the first evidence that the accent structure is very similar across different emotions, even with meaningless speech, a large corpus, and nine different speakers. The only exception relates to elation, with a disproportionate rise on the second accent (see Fig. 5). These results indicate that accent structure alone shows little emotion specificity beyond slight effects on the height of the secondary accent. Emotion-specific effects might still exist in time-critical aspects of the contour, such as lengthening or shortening of particular segments and the acceleration of upswings and downswings. To study these aspects systematically, we need further research using the synthetic variation of the
relevant parameters. Preliminary work using this technique has shown the importance of voice quality in decoding intonation (Bänziger et al., 2004).
The neural dynamics of intonation perception

The second study, which addresses the time course of decoding emotional prosody, was conducted by Grandjean et al. (in preparation). The main goal of this experiment was to address the timing of the perception of emotional prosody, linguistic pragmatic accents, and phonemic identification. Using EEG and spatio-temporal analyses of the electrical brain signals, Grandjean et al. (2002) showed that different patterns of activation are related to specific decoding processes in emotional prosody identification compared with pragmatic and phonemic identification. Three simple French words were used ("ballon," "talon," and "vallon"), with the F0 contour being systematically manipulated using MBROLA synthesis (Dutoit et al., 1996) to produce happy, sad, and neutral emotional expressions, as well as affirmative and interrogative utterance types (see Fig. 6). During EEG recording, the participants had to identify emotional prosody, linguistic prosody, and phonemic differences within three different counterbalanced blocks. The time slots during which specific topographical brain maps occurred, obtained by cluster analyses (Lehmann, 1987; see also Michel et al., 2004) of the grand-average event-related potentials (ERPs), differed for the three recognition tasks. The results highlight specific processes related to emotional and semantic prosody identification compared with phonemic identification (see Fig. 7). Specifically, the first three ERP electrical brain maps (C1, C2, and C3 maps in Fig. 7) are common to the different experimental conditions. Between about 250–300 and 400 ms, specific processes occurred for emotional prosodic identification and semantic linguistic identification, demonstrating the involvement of different underlying neural networks subserving these different mental processes. In fact, the statistical analyses show specificity of the maps for both the emotional prosody and the linguistic pragmatic conditions,
Fig. 6. Examples of pitch analyses and sonograms for the French utterance "ballon" for the different experimental conditions used in the EEG study.
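As an illustration of how such stimuli can be specified, the sketch below generates MBROLA input (a .pho file: one phoneme per line with its duration in ms, optionally followed by pairs of position-in-percent and F0-in-Hz targets) for "ballon" under different conditions. The phoneme durations and F0 values are invented for illustration and are not the parameters used in the study.

```python
# F0 targets (Hz) at the start and end of the two vowels of "ballon"
# (SAMPA: b a l o~); the values below are illustrative assumptions only
CONTOURS = {
    "neutral":       {"a": (120, 120), "o~": (120, 110)},
    "happy":         {"a": (140, 170), "o~": (180, 140)},
    "sad":           {"a": (105, 100), "o~": (100, 95)},
    "interrogative": {"a": (120, 120), "o~": (120, 170)},  # final rise
}

def ballon_pho(condition):
    """Return MBROLA .pho text for 'ballon' in the given condition."""
    f0 = CONTOURS[condition]
    lines = ["_ 100",                            # leading silence
             "b 60",
             "a 120 0 %d 100 %d" % f0["a"],      # F0 targets at 0% and 100%
             "l 70",
             "o~ 180 0 %d 100 %d" % f0["o~"],
             "_ 100"]
    return "\n".join(lines)

print(ballon_pho("happy"))   # then e.g.: mbrola fr1 ballon.pho ballon.wav
```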
Fig. 7. Occurrence of the different topographical brain maps over time (0–1000 ms after stimulus onset) for the three experimental conditions (semantic prosody, emotional prosody, and phonemic identification). The different colors correspond to the different brain electrical maps obtained from the grand-average ERPs. The maps are represented on the global field power (GFP) curve. Note the specific map related to emotional identification, characterized by a right anterior positivity (E map), and the specific map for the semantic prosody condition, with a large central negativity (S map).
when compared with the two other conditions, respectively (Grandjean et al., 2002). These results indicate that specific neural circuits are involved in the recognition of emotional prosody compared with linguistic and phonemic identification tasks. A right anterior positivity was measured on the scalp; this result is compatible with a previous fMRI study demonstrating anterior activations in right dorsolateral and orbitofrontal regions during emotional identification compared with phonetic identification of the same stimuli (Wildgruber et al., 2005). The involvement of the left frontal region was highlighted in another fMRI study in which the participants had to identify linguistic information compared with emotional prosodic information (Wildgruber et al., 2004). However, the temporal regions of the two hemispheres are differentially involved in different subprocesses that contribute to the recognition of the emotional content of a word or a sentence (see Schirmer and Kotz, 2006). For instance, different brain networks process temporal versus spectral information, in the left and the right temporal cortex, respectively (Zatorre and Belin, 2001). The specific electrical map related to the recognition of emotional prosody in this EEG experiment cannot be explained solely by the fact that the intonation contour was modified, because different F0 contours were also used for the linguistic pragmatic condition (interrogative and affirmative contours). Moreover, the same stimuli were used in the phonemic identification condition, demonstrating that this specific emotional prosody map is not related to differences in basic acoustic features but rather to the type of recognition task the participant performs. This study underlines the possibility of using speech synthesis to systematically modify the acoustic features of emotional prosody, inducing different types of categorization processes depending on the participant's task. In the future, this type of paradigm could allow researchers interested in the perception of emotional prosody to study the integration of the different subprocesses contributing to the subjective perception of intonation in emotional processes. Further studies are needed to systematically manipulate the different acoustical dimensions
involved in different functions at the prosodic level using vocal synthesis. In contrast to fMRI techniques, EEG methods allow the study not only of the interactions of different brain areas during prosodic perception, but also of the timing of these processes, complementing the identification of the brain structures involved in prosody perception.
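The map-clustering logic referred to above (microstate-style segmentation of ERP topographies; cf. Lehmann, 1987; Michel et al., 2004) can be illustrated with a bare-bones sketch: grand-average maps at GFP peaks are clustered, and each time point is then labeled with its best-matching cluster map using a polarity-invariant similarity. This is a simplified, assumption-laden illustration (fixed number of maps, random k-means initialization, at least n_maps GFP peaks available), not the analysis pipeline actually used in the study.

```python
import numpy as np

def segment_erp(erp, n_maps=5, n_iter=50, seed=0):
    """erp: (n_electrodes, n_times) grand-average ERP, average reference."""
    gfp = erp.std(axis=0)                           # global field power
    peaks = np.flatnonzero((gfp[1:-1] > gfp[:-2]) &
                           (gfp[1:-1] > gfp[2:])) + 1
    maps = erp[:, peaks] / gfp[peaks]               # normalized peak maps

    rng = np.random.default_rng(seed)               # modified k-means
    centers = maps[:, rng.choice(maps.shape[1], n_maps, replace=False)]
    for _ in range(n_iter):
        corr = centers.T @ maps                     # spatial similarity
        assign = np.argmax(corr ** 2, axis=0)       # polarity-invariant
        for k in range(n_maps):
            cluster = maps[:, assign == k]
            if cluster.size:
                centers[:, k] = cluster.mean(axis=1)
                centers[:, k] /= np.linalg.norm(centers[:, k])

    # label every time point with its best-fitting cluster map
    fit = centers.T @ (erp / np.maximum(gfp, 1e-12))
    return np.argmax(fit ** 2, axis=0), gfp
```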
Impact of attention on decoding of emotional prosody

Another fundamental issue concerns the ability of humans to rely on emotion to adapt to, or cope with, particularly relevant events or stimuli (Sander et al., 2003, 2005). Emotional prosody often serves the function of social communication, and its behavioral impact is modulated by individual differences; for instance, attention toward angry voices increases neuronal activity in orbitofrontal regions as a function of interindividual sensitivity to punishment (Sander et al., 2005). To detect emotional signals in the environment, which are potentially relevant for survival, the CNS seems to be able to reorient attention via reflexive mechanisms, even when voluntary attention is occupied with another specific task (Vuilleumier, 2005). This reflexive process has been extensively studied in the visual domain (Vuilleumier et al., 2001; Pourtois et al., 2004) but only rarely in the auditory domain (Mitchell et al., 2003; Wambacq et al., 2004). The last research example in this chapter addresses this question through a brain imaging study comparing angry with neutral prosody in a dichotic listening paradigm (see Fig. 8). In two experiments using fMRI, Grandjean et al. (2005) demonstrated an increase of neuronal activity in the bilateral superior temporal sulcus (STS), known to be sensitive to human voices (Belin et al., 2000), on exposure to angry compared with neutral prosody (controlling for signal amplitude level and envelope, as well as F0 level; see Fig. 8). This increase of STS activity occurred even when the angry emotional prosody was not the focus of attention in the dichotic listening paradigm, indicating possible reflexive mechanisms related to
Fig. 8. Attention and emotional prosody were manipulated in two fMRI experiments. (a) Dichotic listening paradigm allowing the presentation of different vocalizations at the left and right ears and the manipulation of spatial attention toward the right or the left side. (b) Cerebral activations of the right hemisphere. An increase of neuronal activity for angry relative to neutral speech prosody was found in the right STS (red, P<0.001). An anterior region of right STS was modulated by spatial attention directed to the left relative to the right ear (green, P<0.005). These modulations of activations by emotion and attention occurred within voice-selective areas (blue line). (c) Right STS activation in Experiment 1. Blood oxygen level-dependent (BOLD) responses were increased for angry compared with neutral speech. (d) The same cluster in the right STS in Experiment 2. Activation occurred only in response to vocal stimuli, not to synthetic sounds.
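A simple way to equate overall amplitude across such stimulus pairs, one of the acoustic controls mentioned above, is RMS matching. The sketch below illustrates the principle only; the actual stimulus preparation in Grandjean et al. (2005) may have differed in detail, and the function names are ours.

```python
import numpy as np

def rms(signal):
    """Root-mean-square energy of a waveform (array of samples)."""
    return np.sqrt(np.mean(signal ** 2))

def match_rms(signal, target):
    """Scale a waveform so that its RMS energy equals target."""
    return signal * (target / rms(signal))

def match_pair(angry, neutral):
    """Equate an angry/neutral pair to their common mean RMS level."""
    target = (rms(angry) + rms(neutral)) / 2.0
    return match_rms(angry, target), match_rms(neutral, target)
```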
anger prosody. Thus, emotional prosody seems able to induce an increase of neuronal activity even when the emotionally relevant event is not the focus of voluntary attention, as previously demonstrated in the visual domain (for a review, see Vuilleumier, 2005). The human auditory system, interacting with other brain areas such as the amygdala, thus appears able to allocate attentional resources to the computation of relevant information, thereby modifying attention allocation (Sander et al., 2005).

Conclusion

The research examples described above highlight the importance of investigating emotional prosody
by taking into account the specificity of acoustic signals not only in perception, but also in production. Moreover, the different subprocesses involved in these two mechanisms, such as the temporal unfolding of decoding, or the differential influence of push and pull factors during encoding, should be manipulated systematically in future research to allow a better differentiation of the determinants of these phenomena. Future research in this field should systematically manipulate the different acoustical parameters (i.e., the acoustic features underlying prosody proper, as well as voice quality in a more general sense). In addition, the current research shows the utility of combining behavioral and neuroscience
research in trying to disentangle the complex system of intonational structure in a dually coded system, subject to psychobiological push and sociocultural pull.

Abbreviations

CNS: central nervous system
EEG: electroencephalography
ERPs: event-related potentials
F0: fundamental frequency
fMRI: functional magnetic resonance imaging
max: maximum
min: minimum
ms: milliseconds
STS: superior temporal sulcus
References

Banse, R. and Scherer, K.R. (1996) Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol., 70: 614–636.
Bänziger, T. and Scherer, K.R. (2005) The role of intonation in emotional expressions. Speech Commun., 46: 252–267.
Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P. and Pike, B. (2000) Voice-selective areas in human auditory cortex. Nature, 403: 309–312.
Brunswik, E. (1956) Perception and the Representative Design of Psychological Experiments (2nd ed.). University of California Press, Berkeley, CA.
Darwin, C. (1998) The Expression of the Emotions in Man and Animals. (Reprinted with introduction, afterword, and commentary by P. Ekman, Ed.) Oxford University Press, New York. (Original work published 1872 by John Murray, London.)
Dutoit, T., Pagel, V., Pierret, N., Bataille, F. and Van Der Vrecken, O. (1996) The MBROLA project: towards a set of high-quality speech synthesizers free of use for non-commercial purposes. Proc. ICSLP'96, 3: 1393–1396.
Fonagy, I. and Magdics, K. (1963) Emotional patterns in intonation and music. Z. Phonet., 16: 293–326.
Ghazanfar, A.A. (2005) The evolution of speech reading. International Conference on Cognitive Neuroscience 9, Cuba.
Grandjean, D., Ducommun, C. and Scherer, K.R. (in preparation) Neural signatures of processing emotional and linguistic-pragmatic prosody compared to phonetic-semantic word identification: an ERP study.
Grandjean, D., Ducommun, C., Bernard, P.-J. and Scherer, K.R. (2002) Comparison of cerebral activation patterns in identifying affective prosody, semantic prosody, and phoneme differences. Poster presented at the International Organization of Psychophysiology (IOP), August 2002, Montreal, Canada.
Grandjean, D., Sander, D., Pourtois, G., Schwartz, S., Seghier, M., Scherer, K.R. and Vuilleumier, P. (2005) The voices of wrath: brain responses to angry prosody in meaningless speech. Nat. Neurosci., 8: 145–146.
Ishii, K., Reyes, J.A. and Kitayama, S. (2003) Spontaneous attention to word content versus emotional tone: differences among three cultures. Psychol. Sci., 14: 39–46.
Lehmann, D. (1987) Principles of spatial analyses. In: Gevins, A.S. and Rémond, A. (Eds.), Handbook of Electroencephalography and Clinical Neurophysiology, Vol. 1. Methods of Analyses of Brain Electrical and Magnetic Signals. Elsevier, Amsterdam, pp. 309–354.
Michel, C.M., Murray, M.M., Lantz, G., Gonzalez, S., Spinelli, L. and Grave de Peralta, R. (2004) EEG source imaging. Clin. Neurophysiol., 115: 2195–2222.
Mitchell, R.L., Elliott, R., Barry, M., Cruttenden, A. and Woodruff, P.W. (2003) The neural response to emotional prosody, as revealed by functional magnetic resonance imaging. Neuropsychologia, 41: 1410–1421.
Morton, E.S. (1982) Grading, discreteness, redundancy, and motivation-structural rules. In: Kroodsma, D.E., Miller, E.H. and Ouellet, H. (Eds.), Acoustic Communication in Birds. Academic Press, New York, pp. 182–212.
Pakosz, M. (1983) Attitudinal judgments in intonation: some evidence for a theory. J. Psycholinguist. Res., 12: 311–326.
Pourtois, G., Grandjean, D., Sander, D. and Vuilleumier, P. (2004) Electrophysiological correlates of rapid spatial orienting towards fearful faces. Cereb. Cortex, 14: 619–633.
Salzmann, Z. (1993) Language, Culture and Society: An Introduction to Linguistic Anthropology (3rd ed.). Westview Press, Boulder, CO.
Sander, D., Grafman, J. and Zalla, T. (2003) The human amygdala: an evolved system for relevance detection. Rev. Neurosci., 14: 303–316.
Sander, D., Grandjean, D., Pourtois, G., Schwartz, S., Seghier, M., Scherer, K.R. and Vuilleumier, P. (2005) Emotion and attention interactions in social cognition: brain regions involved in processing anger prosody. Neuroimage, 28: 848–858.
Sander, D., Grandjean, D. and Scherer, K.R. (2005) A systems approach to appraisal mechanisms in emotion. Neural Networks, 18: 317–352.
Scherer, K.R. (1985) Vocal affect signalling: a comparative approach. In: Rosenblatt, J., Beer, C., Busnel, M.-C. and Slater, P.J.B. (Eds.), Advances in the Study of Behavior, Vol. 15. Academic Press, New York, pp. 189–244.
Scherer, K.R. (1986) Vocal affect expression: a review and a model for future research. Psychol. Bull., 99: 143–165.
Scherer, K.R. (2001) Appraisal considered as a process of multi-level sequential checking. In: Scherer, K.R., Schorr, A. and Johnstone, T. (Eds.), Appraisal Processes in Emotion: Theory, Methods, Research. Oxford University Press, New York and Oxford, pp. 92–120.
Scherer, K.R. (2003) Vocal communication of emotion: a review of research paradigms. Speech Commun., 40: 227–256.
Scherer, K.R., Johnstone, T. and Klasmeyer, G. (2003) Vocal expression of emotion. In: Davidson, R.J., Scherer, K.R. and Goldsmith, H. (Eds.), Handbook of the Affective Sciences. Oxford University Press, New York and Oxford, pp. 433–456.
Scherer, K.R., Ladd, D.R. and Silverman, K.E.A. (1984) Vocal cues to speaker affect: testing two models. J. Acoust. Soc. Am., 76: 1346–1356.
Schirmer, A. and Kotz, S.A. (2006) Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn. Sci., 10: 24–30.
Vuilleumier, P. (2005) How brains beware: neural mechanisms of emotional attention. Trends Cogn. Sci., 9: 585–594.
Vuilleumier, P., Armony, J.L., Driver, J. and Dolan, R.J. (2001) Effects of attention and emotion on face processing in the human brain: an event-related fMRI study. Neuron, 30: 829–841.
Wambacq, I.J., Shea-Miller, K.J. and Abubakr, A. (2004) Nonvoluntary and voluntary processing of emotional prosody: an event-related potentials study. Neuroreport, 15: 555–559.
Wildgruber, D., Hertrich, I., Riecker, A., Erb, M., Anders, S., Grodd, W. and Ackermann, H. (2004) Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation. Cereb. Cortex, 14: 1384–1389.
Wildgruber, D., Riecker, A., Hertrich, I., Erb, M., Grodd, W., Ethofer, T. and Ackermann, H. (2005) Identification of emotional intonation evaluated by fMRI. Neuroimage, 15: 1233–1241.
Wundt, W. (1900) Völkerpsychologie. Eine Untersuchung der Entwicklungsgesetze von Sprache, Mythos und Sitte. Band I. Die Sprache. [Cultural Psychology: A Study of the Developmental Laws of Language, Myth, and Customs. Vol. 1. Language]. Kröner, Leipzig.
Zatorre, R.J. and Belin, P. (2001) Spectral and temporal processing in human auditory cortex. Cereb. Cortex, 11: 946–953.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.), Progress in Brain Research, Vol. 156. ISSN 0079-6123. Copyright © 2006 Elsevier B.V. All rights reserved.
CHAPTER 13
Cerebral processing of linguistic and emotional prosody: fMRI studies

D. Wildgruber1,2, H. Ackermann3, B. Kreifelts1 and T. Ethofer1,2

1 Department of Psychiatry, University of Tübingen, Osianderstr. 24, 72076 Tübingen, Germany
2 Section MR of CNS, Department of Neuroradiology, University of Tübingen, 72076 Tübingen, Germany
3 Department of General Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, Hoppe-Seyler-Str. 3, 72076 Tübingen, Germany

Corresponding author. Tel.: +49-7071-298-6543; Fax: +49-7071-29-4141; E-mail: [email protected]
DOI: 10.1016/S0079-6123(06)56013-3
Abstract: During acoustic communication in humans, information about a speaker's emotional state is predominantly conveyed by modulation of the tone of voice (emotional or affective prosody). Based on lesion data, a right hemisphere superiority for cerebral processing of emotional prosody has been assumed. However, the available clinical studies do not yet provide a coherent picture with respect to interhemispheric lateralization effects of prosody recognition and intrahemispheric localization of the respective brain regions. To further delineate the cerebral network engaged in the perception of emotional tone, a series of experiments was carried out based upon functional magnetic resonance imaging (fMRI). The findings obtained from these investigations allow for the separation of three successive processing stages during recognition of emotional prosody: (1) extraction of suprasegmental acoustic information, predominantly subserved by right-sided primary and higher order acoustic regions; (2) representation of meaningful suprasegmental acoustic sequences within posterior aspects of the right superior temporal sulcus; (3) explicit evaluation of emotional prosody at the level of the bilateral inferior frontal cortex. Moreover, implicit processing of affective intonation seems to be bound to subcortical regions mediating automatic induction of specific emotional reactions, such as activation of the amygdala in response to fearful stimuli. As concerns lower level processing of the underlying suprasegmental acoustic cues, linguistic and emotional prosody seem to share the same right hemisphere neural resources. Explicit judgment of linguistic aspects of speech prosody, however, appears to be linked to left-sided language areas, whereas bilateral orbitofrontal cortex has been found to be involved in explicit evaluation of emotional prosody. These differences in hemispheric lateralization effects might explain why specific impairments in nonverbal emotional communication subsequent to focal brain lesions are relatively rare clinical observations as compared to the more frequent aphasic disorders.

Keywords: affect; communication; emotion; fMRI; intonation; language; lateralization; prosody
Introduction

During social interactions among humans, transfer of information does not depend only upon the words we use. Rather, in numerous situations it seems to be much more important how we utter them (Mehrabian, 1972). Emotional states, attitudes (e.g., sympathy, dominance, politeness), and intentions often are predominantly expressed by the modulation of the tone of voice (emotional or affective prosody). For example, if your head of
department comes around and says with an angry intonation "I have just been reading your report. We have to talk about it right now," you will certainly get a fairly different impression of his intentions than if he produced the same sentences in a friendly and happy manner. As concerns the cerebral correlates of prosody processing, observations in patients suffering from focal brain lesions indicate that the well-established left hemisphere dominance for language comprehension does not extend to the perception of emotional tone (Hughlings-Jackson, 1879; Pell and Baum, 1997a,b; Schmitt, Hartje, & Williams, 1997; Baum and Pell, 1999; Borod et al., 2001, 2002; Adolphs, 2002; Charbonneau, Scherzer, Aspirot, & Cohen, 2003; Wildgruber and Ackermann, 2003; Ackermann, Hertrich, Grodd, & Wildgruber, 2004). According to an early neuroanatomical model proposed by Ross (1981), prosodic information is encoded within distinct right-sided perisylvian regions that are organized in complete analogy to the left-sided language areas. Expression of emotional prosody, thus, is believed to depend upon Broca's homologue within the right inferior frontal cortex, whereas comprehension of intonational information is presumed to be bound to the right superior temporal region (Wernicke's homologue). However, the empirical evidence for this model was based on a few case reports only, and more systematic investigations yielded rather discrepant results. The majority of lesion studies seem to be compatible with the assumption that the right hemisphere posterior perisylvian cortex is highly important for the comprehension of speech melody (Heilman et al., 1975, 1984; Darby, 1993; Starkstein, Federoff, Price, Leiguarda, & Robinson, 1994; Adolphs, Tranel, & Damasio, 2001; Borod et al., 2002). However, various clinical examinations indicate that a widespread network of (partially bilateral) cerebral regions including the frontal cortex (Hornak et al., 1996, 2003; Breitenstein et al., 1998; Rolls, 1999; Adolphs, Damasio, & Tranel, 2002) and the basal ganglia (Cancelliere and Kertesz, 1990; Weddel, 1994; Peper and Irle, 1997; Breitenstein et al., 1998; Breitenstein, Van Lancker, Daum, & Waters, 2001; Pell and Leonard, 2003) contributes to the processing of emotional intonation. In line with these findings,
several neuroimaging studies reported rightward lateralization of hemodynamic activation within temporal regions (Buchanan et al., 2000; Wildgruber et al., 2002, 2005; Kotz et al., 2003; Mitchell, Elliott, Barry, Cruttenden, & Woodruff, 2003; Grandjean et al., 2005) and revealed additional, partially bilateral, responses within the frontal cortex (George et al., 1996; Imaizumi et al., 1997; Buchanan et al., 2000; Wildgruber et al., 2002, 2004, 2005; Kotz et al., 2003), the anterior insula (Imaizumi et al., 1997; Wildgruber et al., 2002, 2004), and the basal ganglia (Kotz et al., 2003) during recognition of emotional intonation. The considerable differences in lateralization and localization of the relevant lesion sites as well as of the hemodynamic activation spots, however, do not yet allow for an indisputable determination of the neural substrates of prosody processing. Presumably, the discrepancies in the available data are due to differences in the methods used, such as stimulus selection and task and control conditions. In order to further clarify to what extent specific neural structures subserve different facets of the comprehension of emotional prosody, our research group conducted a variety of experiments based on functional magnetic resonance imaging (fMRI), a technique that can be used for the noninvasive evaluation of task-related hemodynamic cerebral responses at a high spatial (ca. 0.5 mm; Menon and Goodyear, 1999) and moderate temporal (<1 s; Wildgruber, Erb, Klose, & Grodd, 1997) resolution. Specifically, these studies were designed to delineate the neural substrates underlying distinct facets of prosody processing: (a) extraction of suprasegmental acoustic information, (b) representation of meaningful prosodic sequences, (c) explicit judgment of emotional as compared to linguistic information, (d) connectivity between the neural structures involved, and (e) implicit processing of emotional prosody.
Extraction of suprasegmental acoustic information

At the perceptual level, emotional tone is characterized by the modulation of loudness (acoustic correlate: sound intensity), pitch (fundamental frequency variation), speech rhythm (duration of
syllables and pauses), and voice quality or timbre (distribution of spectral energy) across utterances (Lehiste, 1970; Ackermann et al., 1993; Murray and Arnott, 1993; Banse and Scherer, 1996; Cutler, Dahan, & Donselaar, 1997; Bachorowski and Owren, 2003; Scherer, Johnstone, & Klasmeyer, 2003; Sidtis and Van-Lancker-Sidtis, 2003). These suprasegmental features are imposed upon the sequence of speech sounds (segmental structure) of verbal utterances. According to the acoustic lateralization hypothesis (Fig. 1a), the encoding of suprasegmental parameters of the speech signal (rather slow shifts, >100 ms) is predominantly bound to right hemisphere structures, whereas rapid transitions (<50 ms), contributing to the differentiation of the various speech sounds at the segmental level (i.e., phonemes, syllables), are mainly processed within contralateral areas (Van Lancker and Sidtis, 1992; Belin et al., 1998; Ivry and Robertson, 1998; Zatorre and Belin, 2001; Zatorre, 2001; Zatorre et al., 2002; Meyer, Alter, Friederici, Lohmann, & von Cramon, 2002; Poeppel et al., 2004). These acoustic laterality effects have been supposed to explain the differential hemispheric dominance patterns of language (left hemisphere) and music processing (right hemisphere) (Wildgruber et al., 1996, 1998, 2001, 2003; Belin et al., 1998; Ivry and Robertson, 1998; Zatorre et al., 2002; Hugdahl and Davidson, 2003; Poeppel, 2004; Ackermann et al., 2006). In order to further separate the neural structures subserving the extraction of basic acoustic properties of speech prosody from those which respond to the conveyed emotional "meaning", a series of fMRI experiments was conducted. More specifically, the following hypotheses were explored: (a) Lateralization of hemodynamic responses during passive listening to trains of noise bursts depends upon stimulus frequency. (b) Extraction of specific acoustic parameters (signal duration, fundamental frequency) is associated with different activation patterns at the level of primary and higher order acoustic regions. (c) Expressiveness of emotional prosody enhances the hemodynamic responses of voice-sensitive areas within the right as compared to
corresponding regions within the left hemisphere. The first experiment encompassed a simple passive listening condition. Trains of noise bursts (clicks) were presented at different rates (2.0, 2.5, 3.0, 4.0, 5.0, 6.0 Hz) to eight healthy right-handed subjects (four males and four females, aged 19–32 years) during fMRI measurements. The clicks had originally been produced by striking a pen against a table. Each acoustic sequence of a given click rate had a duration of 6 s. Altogether, 90 trains (6 rates × 15 repetitions) were presented in pseudorandomized order. During passive listening to these stimuli, significant hemodynamic responses across all presentation rates emerged within the superior temporal gyrus of both sides, the right hemisphere putamen, and the tectum. Moreover, parametric analysis revealed lateralized rate-dependent responses within the anterior insular cortex. During presentation of the click trains at slow rates, the right anterior insula showed the highest activation levels. Furthermore, the hemodynamic responses of this region displayed a decline of amplitude in parallel with an increase of stimulation frequency. By contrast, an opposite relationship emerged within the left anterior insular cortex (Ackermann et al., 2001). This double dissociation of rate-response functions between the two hemispheres is in very good accordance with the acoustic lateralization hypothesis (Fig. 1b).
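The stimulus logic of this first experiment is easy to make concrete. The sketch below generates such click trains, with the caveat that the sampling rate and the click waveform (a 1-ms noise burst standing in for the recorded pen strike) are assumptions made for illustration.

```python
import numpy as np

FS = 44100  # sampling rate in Hz; an assumption for this sketch

def click_train(rate_hz, duration_s=6.0, rng=None):
    """A duration_s train of clicks repeating at rate_hz."""
    rng = rng if rng is not None else np.random.default_rng()
    click = rng.uniform(-1, 1, int(0.001 * FS))      # 1-ms noise burst
    train = np.zeros(int(duration_s * FS))
    for onset in np.arange(0.0, duration_s, 1.0 / rate_hz):
        i = int(onset * FS)
        train[i:i + click.size] = click[:train.size - i]
    return train

# 90 trains (6 rates x 15 repetitions) in pseudorandomized order
rng = np.random.default_rng(1)
rates = [2.0, 2.5, 3.0, 4.0, 5.0, 6.0] * 15
rng.shuffle(rates)
stimuli = [click_train(r, rng=rng) for r in rates]
```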
Seventeen healthy volunteers (8 males, 9 females, aged 18–31 years) participated in a second experiment that investigated discrimination of duration and pitch values at different levels of difficulty. Complex sounds characterized by four formant frequencies (500, 1500, 2500, 3500 Hz), manipulated either in duration (100–400 ms) or in fundamental frequency (100–200 Hz, realized by rhythmic intensity fluctuations throughout the signal), served as stimuli. Sequences of two signals were presented binaurally, and subjects had to detect either the longer duration (duration task) or the higher pitch (pitch task). The behavioral data showed comparable hit scores (mean values of about 75%), with accuracy rates increasing with the physical difference between the two acoustic signals for both pitch and duration discrimination (Fig. 1c). As compared to baseline at rest, both tasks yielded bilateral activation of frontal, temporal, and parietal regions including primary and secondary acoustic
cortices as well as the working memory network. A lateralization analysis, i.e., comparison of each hemisphere with the contralateral side on a voxel-by-voxel basis, revealed, however, lateralization
effects toward the left side within insular and temporal cortex during both tasks. Even more noteworthy, a parametric analysis of hemodynamic responses showed an increase of activation within the right temporal cortex in parallel with the differences in sound properties of the stimulus pairs (Fig. 1c). This positive linear relationship emerged during both the duration and the pitch task. Moreover, a comparison with the contralateral hemisphere revealed significant lateralization effects of the parametric responses toward the right superior temporal sulcus during discrimination of stimulus duration (Reiterer et al., 2005). Slowly changing and highly different acoustic stimuli thus seem to be predominantly processed within the right hemisphere, whereas detection of rapid changes or rather slight signal differences might be linked to the left hemisphere. The findings of these first two experiments indicate that differences in basic acoustic properties have a strong impact on brain activation patterns. In a third study, 12 healthy right-handed subjects (7 males, 5 females, aged 19–29 years) were asked to judge, in two separate sessions, the emotional valence of either the word content or the prosody of altogether 162 German adjectives spoken in a happy, angry, or neutral tone. Intonations of these different emotional categories differ in various acoustic properties (Banse and Scherer, 1996). To disambiguate more specific effects of emotional expressiveness from the extraction of low-level acoustic parameters, the mean and variation of sound intensity and fundamental frequency were included in the statistical
models as nuisance variables. During both tasks, a linear correlation between hemodynamic responses and prosodic emotional expressiveness emerged within the middle part of the bilateral superior temporal sulcus (mid-STS). Responses of the right hemisphere mid-STS showed higher amplitudes, larger extension, and a stronger dependency on emotional intensity than those of the contralateral side (Fig. 1d). Similar response patterns were found for both explicit and implicit processing of emotional prosody (Ethofer et al., 2006c). These observations support the assumption that the mid-STS region contributes to the encoding of emotionally salient acoustic stimuli independent of task-related attentional modulation (Grandjean et al., 2005). In summary, these findings, related to the acoustic level of prosody processing, indicate that the extraction of suprasegmental acoustic information is predominantly subserved by right-sided primary and higher order acoustic brain regions, including mid-STS and anterior insula.
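The statistical idea of this third study, estimating the effect of emotional expressiveness while acoustic parameters enter as nuisance covariates, can be sketched with a toy regression. Real fMRI analyses involve HRF convolution, temporal filtering, and multiple-comparison control; all names below are hypothetical.

```python
import numpy as np

def expressiveness_beta(y, intensity, nuisance):
    """y         : (n_trials,) response of one voxel or region
    intensity : (n_trials,) rated emotional expressiveness (of interest)
    nuisance  : (n_trials, k) acoustic covariates, e.g. mean F0, F0
                variation, mean sound intensity, intensity variation"""
    X = np.column_stack([intensity, nuisance, np.ones_like(intensity)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]  # expressiveness effect with acoustics partialled out
```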
Representation of meaningful prosodic sequences

According to the neuroanatomical model proposed by Elliott Ross, Wernicke's homologue, bound to the posterior aspects of the right hemisphere superior temporal gyrus, represents the key area for the comprehension of prosodic sequences (Ross, 1981). An important role of the right posterior perisylvian cortex for comprehension of speech melody has been confirmed in
Fig. 1. (a) According to the acoustic lateralization hypothesis, rapid changes of acoustic parameters (<50 ms) are predominantly processed within the left hemisphere, whereas slow variations (>100 ms) are mainly encoded within the right hemisphere. (b) Parametric responses during passive listening to trains of noise bursts: hemodynamic responses characterized by positive linear (red), negative linear (green), or nonlinear (blue) rate-response functions. Activation clusters are displayed on transverse sections of the averaged anatomical reference images (R = right, L = left). The relationship between signal intensity (in arbitrary units) and rate of acoustic stimulation was determined within the right (green) and left (blue) insular cortex (see Ackermann et al., 2001). (c) Discrimination of sound duration: pairs of complex acoustic signals that varied in duration (100–400 ms) were presented to healthy subjects. Accuracy rates demonstrate increasing deviance in time to be correlated with higher performance scores. Parametric effects: significantly activated areas as a function of linear increase with task performance emerged within the right MTG/STG during duration discrimination. Laterality analysis: voxelwise comparison of the hemispheres revealed a significantly activated cluster within the left STG for the parametric effect of duration discrimination (see Reiterer et al., 2005). (d) Parametric effects of prosodic emotional intensity. Conjunction of regions showing a linear relationship between hemodynamic responses and prosodic emotional intensity during both implicit and explicit processing of emotional prosody. Beta estimates (mean ± standard error) corresponding to distinct intensity steps of emotional intonations have been plotted for the most significant voxel of the cluster in the right and left STS during implicit (red) and explicit (green) processing of emotional prosody (see Ethofer et al., 2006c).
various clinical examinations (Heilman, Scholes, & Watson, 1975, 1984; Darby, 1993; Starkstein et al., 1994; Borod et al., 2002). In some studies on the comprehension of emotional information, however, the valence of emotional expression has been reported to influence lateralization of cerebral responses (Canli et al., 1998; Davidson, Abercrombie, Nitschke, & Putnam, 1999; Murphy, Nimmo-Smith, & Lawrence, 2003). According to the valence hypothesis, rightward lateralization of prosody processing only holds true for negative emotions, whereas comprehension of happy stimuli is ascribed to the left hemisphere (Fig. 2). As concerns speech intonation, several clinical
examinations failed to show any interactions between hemispheric lateralization and emotional valence (Pell, 1998; Baum and Pell, 1999; Borod et al., 2002; Kucharska-Pietura et al., 2003). Considering functional imaging data, however, distinct cerebral activation patterns bound to specific emotional categories such as disgust, anger, fear, or sadness have been observed during perception of facial emotional expressions (Sprengelmeyer, Rausch, Eysel, & Przuntek, 1998; Kesler-West et al., 2001; Phan, Wager, Taylor, & Liberzon, 2002; Murphy et al., 2003). Several studies have corroborated the notion that responses of the amygdalae are specifically related to facial expressions of fear
Fig. 2. (a) According to the valence hypothesis, positive emotional information (i.e., happy expressions) is processed within the left hemisphere, negative emotional information (expressions of fear, anger, disgust, or sadness) within the right hemisphere. (b) Significant hemodynamic responses during identification of emotional intonation as compared to vowel identification are superimposed upon the cortical surface of a template brain and upon an axial slice at the level of the highest activated voxels within the activation clusters. The emotional task yielded specific activation within the right STS (BA 22/42) and the right inferior frontal cortex (BA 45/47). Analysis of valence effects, however, revealed no differences of cerebral responses depending upon valence or specific emotional categories (see Wildgruber et al., 2005).
(Morris et al., 1996, 1998; Adolphs, 2002; Phan et al., 2002), whereas facial expressions of disgust seem to elicit activation of the anterior insula (Phillips et al., 1998; Sprengelmeyer et al., 1998; Calder et al., 2000; Phan et al., 2002; Wicker et al., 2003). Fear-specific responses of the amygdalae have also been reported in association with vocal emotional expressions (Phillips et al., 1998; Morris, Scott, & Dolan, 1999), whereas the predicted disgust-related activation of the anterior insula has not been observed in a prior PET experiment (Phillips et al., 1998). It remains unsettled, thus, to what extent lateralization and exact localization of cerebral activation during comprehension of emotional prosody are linked to specific emotional categories. Based on the aforementioned clinical and neuroimaging studies, presumably, there are cerebral regions, including the right posterior temporal cortex, that contribute to comprehension of emotional prosody independent of any specific emotional content. Other regions, including the amygdala and anterior insula, are selectively linked to comprehension of specific emotional categories. In order to separate these components, 100 short German declarative sentences with emotionally neutral content (such as "Der Gast hat sich für Donnerstag ein Zimmer reserviert" [The visitor reserved a room for Thursday], "Die Anrufe werden automatisch beantwortet" [Phone calls are answered automatically]) were randomly ascribed to one of five different target emotions (happiness, anger, fear, sadness, or disgust). A professional actress and an actor produced these test materials, expressing the respective emotion by modulation of affective intonation. The verbal utterances were presented to 10 healthy subjects (5 males, 5 females, age: 21–33 years) under two different task conditions during fMRI. As an identification task, subjects were asked to name the emotion expressed by the tone of voice, whereas the control condition (phonetic task) required the detection of the vowel following the first /a/ in each sentence. Similarly to the emotion recognition task, vowel identification also included a forced choice selection from five alternatives, i.e., the vowels /a/, /e/, /i/, /o/, /u/. Under both conditions, participants were asked to give a verbal response as quickly as possible
and they were provided with a list of possible response alternatives prior to testing. Since both tasks require evaluation of completely identical acoustic stimuli and involve very similar response mechanisms, comparison of the respective hemodynamic activation patterns should allow for the separation of task-specific cerebral responses independently of stimulus characteristics and unspecific task components. In order to delineate cerebral structures contributing to the recognition of emotional prosody independent of specific emotional categories, responses during the identification of emotional prosody across all emotional categories were compared to the phonetic control condition. To disentangle patterns of cerebral activation related to comprehension of specific emotional categories, each emotional category was compared against the others. The main goal of the study, thus, was to evaluate the following two hypotheses: (a) A network of right-hemisphere areas including the posterior temporal cortex supports identification of affective intonation independent of the specific emotional information conveyed. (b) Perception of different emotional categories is associated with specific brain regions, i.e., response localization varies with emotion type. Specifically, fear-specific responses are linked to the amygdalae and disgust-specific responses to the anterior insula. During the fMRI experiment, subjects correctly identified the emotional tone at a slightly lower rate (mean: 75.2±7.9%) as compared to the vowel detection task (mean: 83.4±7.0%, p<0.05). The accuracy scores for happy (90%), angry (82%), and sad (84%) expressions reached comparable levels, whereas fearful (51%) and disgusted (57%) expressions were identified at significantly lower rates (p<0.05). These differences in performance are in good accordance with prior observations and might be related to differences in the recognizability of the acoustic cues of the various emotions (Banse and Scherer, 1996). Response times for the emotional task (mean: 4.3±0.9 s) showed no significant differences as compared to the phonetic task (mean: 4.1±1.0 s), indicating comparable
levels of task difficulty. Cerebral responses obtained during both tasks, as compared to the rest condition, yielded a bilateral network of hemodynamic activation at the level of cortical and subcortical regions including frontal, temporal, and parietal cortex, thalamus, and cerebellum. To identify brain regions specifically contributing to the encoding of emotional intonation, the respective activation patterns were directly compared to the responses obtained during phonetic processing of the identical acoustic stimuli (Wildgruber et al., 2005). Using this approach, responses within two activation clusters, localized within the right posterior superior temporal sulcus (BA 22/42) and the right inferior frontal cortex (BA 45/47), could be assigned to recognition of emotional prosody (Fig. 2b). No significant impact of emotional valence or specific emotional categories on the distribution of brain activation could be observed. Therefore, the results of the current study do not support, in line with prior functional imaging (Buchanan et al., 2000; Wildgruber et al., 2002; Kotz et al., 2003; Mitchell et al., 2003) and recent lesion studies (Pell, 1998; Baum and Pell, 1999; Borod et al., 2002; Kucharska-Pietura et al., 2003), the hypothesis of valence-specific lateralization effects during processing of emotional intonation. The observed hemodynamic responses, however, indicate a task-dependent and stimulus-independent contribution of the right posterior STS (BA 22/42) and the right inferior frontal cortex (BA 45/47) to the processing of suprasegmental acoustic information irrespective of specific emotional categories. We assume, therefore, that the representation of meaningful suprasegmental acoustic sequences
within these areas should be considered a second step of prosody processing. A further experiment was designed in order to evaluate the contribution of posterior STS and inferior frontal cortex to the processing of emotional prosody as compared to evaluation of linguistic prosody.
Explicit judgment of emotional prosody

As concerns its communicative functions, speech prosody serves a variety of different linguistic as well as emotional purposes (Ackermann et al., 1993, 2004; Baum and Pell, 1999). Among others, it is used to specify linguistic information at the word level (stress contrasts such as "CONtent" vs. "conTENT") and the sentence level (question vs. statement intonation: "It is new?" vs. "It is new!"; location of sentence focus: "HE wrote this letter" vs. "he wrote this LETTER"), and it conveys information about a speaker's personality, attitude (i.e., dominance, submissiveness, politeness, etc.), and emotional state (Fig. 3). Based on lesion studies, the functional lateralization hypothesis proposes linguistic prosody to be processed within the left hemisphere, whereas emotional tone is bound to contralateral cerebral structures (Van Lancker, 1980; Heilman et al., 1984; Behrens, 1985; Emmorey, 1987; Pell and Baum, 1997a; Borod et al., 1998, 2002; Geigenberger and Ziegler, 2001; Schirmer, Alter, Kotz, & Friederici, 2001; Charbonneau et al., 2003). In order to disentangle the functional and the acoustic level of prosody processing, sentences varying in linguistic accentuation (sentence focus) as well as emotional expressiveness were generated by
Fig. 3. (a) According to the functional lateralization hypothesis, linguistic prosody is processed within the left hemisphere, whereas emotional prosody is bound to the right hemisphere. (b) Variation of linguistic (left) and emotional intonation (right). The German sentence "Der Schal ist in der Truhe" (the scarf is in the chest) was digitally resynthesized with various pitch contours. Five different patterns of sentence focus were realized by a stepwise increase of the fundamental frequency on the final word (left). The stress accentuation ranged between an utterance clearly focused on the second word (solid line) and one focused on the final word (dotted line). For each of these synthetic sentences, five variations of emotional expressiveness were generated by manipulation of the pitch range across the whole utterance (right). Sentences with broader pitch ranges are perceived as being more excited. As shown for the middle contour (red), the realization of linguistic accents remains constant during manipulation of emotional expressiveness. The sentences of each stimulus pair differed in relative focus accentuation as well as in emotional intensity. (c) Significantly activated regions, identified by task comparisons, superimposed upon the cortical surface of a template brain and upon an axial slice at the level of the highest activated voxels within each activation cluster: The emotional task (upper row) yielded significant responses within the bilateral orbitobasal frontal cortex (BA 11/47), whereas activation of the left inferior frontal gyrus (BA 44/45) emerged during discrimination of linguistic prosody (lower row) (see Wildgruber et al., 2004).
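The two resynthesis manipulations shown in Fig. 3b can be sketched on an F0 contour stored as an array: scaling around the utterance mean widens or narrows the pitch range (perceived expressiveness), while raising the contour locally over one word shifts the focus accent. The parameter values and the toy contour below are illustrative assumptions, not the values used for the actual stimuli.

```python
import numpy as np

def scale_pitch_range(f0, factor):
    """factor > 1 widens, factor < 1 narrows the F0 range around the mean."""
    return f0.mean() + factor * (f0 - f0.mean())

def shift_focus(f0, word_slice, boost_hz):
    """Raise the contour over one word to place the focus accent there."""
    out = f0.copy()
    out[word_slice] += boost_hz
    return out

# five expressiveness steps for one focus pattern, as in the experiment
contour = 120 + 30 * np.abs(np.sin(np.linspace(0, 3 * np.pi, 200)))
steps = [scale_pitch_range(contour, k) for k in (0.6, 0.8, 1.0, 1.2, 1.4)]
focus_final = shift_focus(contour, slice(160, 200), 25.0)  # focus last word
```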
systematic manipulations of the fundamental frequency contour of the simple declarative German sentence "Der Schal ist in der Truhe" (The scarf is in the chest). With its focus on the second word, this utterance represents an answer to the question "What is in the chest?". Shifting the accent to the final word, the sentence provides information about where the scarf is. This prosodic distinction
is realized by distinct pitch patterns characterized by F0 peaks on the accented syllables (Cutler et al., 1997). As a first step, a series of five F0 contours was generated, extending from a clear-cut focus on the second word to an accent on the final word (Fig. 3b). As a second step, five additional variations were generated on the basis of each of these five focus patterns, differing in pitch range across the whole sentence. These global variations are perceived as modulations of emotional expressiveness: sentences with a broader F0 range clearly sound more excited (Banse and Scherer, 1996; Pihan et al., 1997). Ten healthy right-handed participants (6 males, 4 females, age: 20–35 years) were asked to perform two different discrimination tasks during pairwise presentation of these acoustic stimuli. In two different sessions of the experiment they had to answer one of the following questions: (a) "Which of the two sentences is better suited as a response to the question: Where is the scarf?" (discrimination of linguistic prosody) and (b) "Which of the two sentences sounds more excited?" (discrimination of emotional expressiveness). Since both conditions require the evaluation of completely identical acoustic signals, the comparison of hemodynamic responses obtained during the two different runs allows for the separation of task-specific responses independent of stimulus characteristics. This experiment was primarily designed to explore the following two alternative hypotheses: (a) Lateralization effects during prosody processing are strongly bound to acoustic properties of the relevant speech signal: since comprehension of linguistic as well as emotional prosody relies upon the extraction of suprasegmental features, a rightward lateralization must be expected during both conditions (acoustic lateralization hypothesis). (b) Linguistic prosody is processed within left-sided speech areas, whereas comprehension of emotional prosody must be expected to be bound to the right hemisphere (functional lateralization hypothesis). The obtained behavioral data clearly show that the participants were able to discriminate the
patterns of linguistic accentuation and emotional expressiveness at similar levels of accuracy (linguistic discrimination: 82±14%, emotional discrimination: 78±11%). Therefore, a comparable level of difficulty for both tasks can be assumed. As compared to the baseline at rest, both conditions yielded bilateral hemodynamic responses within the supplementary motor area, anterior cingulate gyrus, superior temporal gyrus, frontal operculum, anterior insula, thalamus, and cerebellum. Responses within the dorsolateral frontal cortex (BA 9/45/46) showed lateralization effects toward the right side during both tasks (Wildgruber et al., 2004). In order to identify brain regions specifically contributing to the processing of linguistic or emotional intonation, the respective activation patterns were directly compared with each other. During the linguistic task, significantly stronger activation was observed within the left inferior frontal gyrus (BA 44/45 = Broca's area). By contrast, the affective condition yielded significant bilateral hemodynamic responses within the orbitofrontal cortex (BA 11/47) as compared to the linguistic task (Fig. 3c). Comprehension of linguistic prosody requires analysis of the lexical, semantic, and syntactic aspects of pitch modulation patterns. Activation of the left inferior frontal cortex (Broca's area) concomitant with the discrimination of linguistic accents indicates that at least some of these operations might be housed within the anterior perisylvian language areas. In line with this assumption, native speakers of Thai, a tone language, showed activation of the left inferior frontal region during discrimination of linguistically relevant pitch patterns in Thai words. This activity was absent in English-speaking subjects listening to identical stimuli (Gandour, Wong, & Hutchins, 1998). Moreover, clinical observations support the assumption of a specific contribution of the left hemisphere to the comprehension of linguistic aspects of intonation. For example, Heilman et al. (1984) found patients suffering from focal left-sided brain lesions to produce significantly more errors in a linguistic prosody identification task as compared to the recognition of affective intonation, whereas damage to the right hemisphere was associated with a similar profile of deficits in both tasks. Furthermore,
Emmorey (1987) observed impaired discrimination of stress contrasts between noun compounds and noun phrases after damage to the left hemisphere, whereas patients with right-sided lesions performed as well as normal control subjects. Predominant disturbance of linguistic prosody comprehension concomitant with relatively preserved processing of emotional intonation in patients with damage to the left hemisphere has also been reported by Pell and Baum (1997a) as well as Geigenberger and Ziegler (2001). Discrimination of emotional expressiveness yielded a significant increase of hemodynamic responses within the bilateral orbitofrontal cortex (BA 11/47) as compared to the linguistic task, indicating, thus, a specific contribution of this region to the evaluation of emotional aspects of verbal utterances conveyed by the tone of speech. On the basis of neuroanatomical considerations, e.g., reciprocal fiber connections to sensory cortices and limbic regions, this region might serve as a substrate for the judgment of emotional stimuli independent of the stimulus modality (Price, 1999). Accordingly, activation of the orbitobasal frontal cortex has been observed in preceding functional imaging studies during perception of emotional intonation (George et al., 1996; Wildgruber et al., 2002), emotional facial expressions (Blair, Morris, Frith, Perret, & Dolan, 1999; Nakamura et al., 1999), and affective gustatory judgments (Small et al., 2001). Moreover, patients suffering from unilateral focal damage to this area displayed impaired identification of emotional face and voice expressions, whereas performance in nonemotional control tasks (i.e., discrimination of unfamiliar voices and recognition of environmental sounds) was found uncompromised (Hornak, Rolls, & Wade, 1996; Hornak et al., 2003; Rolls, 1999). These observations, in line with the results of the present study, support the assumption that orbitofrontal areas contribute to the explicit evaluation of emotional information conveyed by different communication channels. Blair and Cipolatti (2000) supposed this region to be critically involved in building associations between the perceived emotional signals and emotional episodic memory. In patients suffering from lesions of the orbitofrontal cortex, pronounced abnormalities of social
behavior have been observed (Levin, Eisenberg, & Benton, 1991; Blair and Cipolotti, 2000; Wildgruber et al., 2000), resulting, conceivably, from compromised associations between actual environmental stimuli and emotional memory traces. In conclusion, hemispheric specialization for higher level processing of intonation contours has been found to depend, at least partially, upon the functional role of the respective acoustic signals within the communication process: comprehension of linguistic aspects of speech melody relies predominantly upon left-sided perisylvian language areas, whereas the evaluation of emotional signals, independent of modality and emotion type, is bound to bilateral orbitofrontal regions. Thus, as a third step of prosody processing, explicit evaluation of emotional prosody seems to be associated with bilateral inferior aspects of the frontal cortex including the orbitobasal surface (BA 47/11).
Connectivity within the prosody network

So far, three successive steps of prosody processing have been identified: (1) extraction of suprasegmental acoustic information, (2) representation of suprasegmental sequences, and (3) explicit judgment of emotional information. As concerns the respective neuroanatomical correlates, extraction of suprasegmental acoustic information seems to be predominantly bound to the right primary and secondary auditory regions. Presumably, the relevant acoustic information is transferred from these regions via direct fiber connections to an area within the posterior superior temporal sulcus (post-STS) subserving the representation of meaningful intonational sequences. In the case of explicit judgment of emotional prosody, a further temporofrontal passage of information must be assumed, accounting for the observed activation of bilateral inferior frontal cortex during this task. It should be emphasized, furthermore, that converging results from lesion studies (Hornak et al., 1996, 2003; Ross, Thompson, & Yenkosky, 1997) and functional imaging examinations (Imaizumi et al., 1997; Pihan, Altenmüller, Hertrich, & Ackermann,
2000; Wildgruber et al., 2002, 2004) suggest a contribution of these areas to the processing of emotional prosody, and intact transcallosal communication of information has been assumed to be a prerequisite for the comprehension of emotional prosody (Ross et al., 1997). It is unclear, however, whether this cooperation of the two hemispheres is based on a sequence of processing steps or whether both frontal lobes receive the respective information independently via parallel connections from the right posterior temporal cortex. In order to investigate the connectivity architecture of the cerebral network involved in the processing of emotional prosody, a further experiment was carried out. Twenty-four healthy right-handed subjects (11 males, 13 females, mean age 24.4 years) underwent event-related fMRI measurements while rating the emotional valence of either the prosody or the semantics of 162 binaurally presented emotional adjectives (54 with neutral, 54 with positive, and 54 with negative content) spoken in happy, neutral, or angry intonation by six professional actors (3 females, 3 males). The adjectives were selected from a sample of 500 adjectives on the basis of ratings obtained from 45 healthy German native speakers (see Kissler et al., this volume) along the dimensions of valence and arousal on a nine-point self-assessment manikin scale (SAM; Bradley and Lang, 1994). The stimuli comprised 54 highly arousing positive adjectives (mean arousal rating >4, mean valence rating <4, e.g., 'verführerisch' = alluring), 54 highly arousing negative adjectives (mean arousal rating >4, mean valence rating >6, e.g., 'panisch' = panicky), and 54 low-arousing neutral adjectives (mean arousal rating <4, mean valence rating between 4 and 6, e.g., 'breit' = broad). During separate functional imaging sessions, subjects were asked to judge either the valence of the emotional word content or the valence of the emotional prosody on the nine-point SAM scale. Both the order of within-session stimulus presentation and the sequence of sessions were pseudorandomized across subjects.
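As an aside, the category criteria above lend themselves to a simple filtering step. The following is a minimal sketch of such a selection by normative ratings; the word list, data layout, and variable names are illustrative assumptions, not the authors' actual materials or pipeline:

```python
# Minimal sketch: selecting stimulus words by normative SAM ratings
# (9-point scale; on this scale, lower valence ratings denote more
# positive words). The example entries are hypothetical.
words = [
    ("verfuehrerisch", 2.8, 6.1),  # (word, mean valence, mean arousal)
    ("panisch", 7.4, 6.9),
    ("breit", 4.9, 2.3),
]

positive = [w for w, val, aro in words if val < 4 and aro > 4]
negative = [w for w, val, aro in words if val > 6 and aro > 4]
neutral = [w for w, val, aro in words if 4 <= val <= 6 and aro < 4]

print(positive, negative, neutral)
# In the study, 54 words per category were drawn from a rated pool of 500.
```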
of different brain regions, and (3) the modulation of this coupling by experimental factors (for methodological details see Ethofer et al., 2006b).
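For orientation, dynamic causal modeling describes the hidden neuronal dynamics with a bilinear state equation (Friston et al., 2003); the notation below is the standard one from that paper and is not specific to the present study:

$$\dot{z} = \Bigl(A + \sum_{j=1}^{m} u_j\, B^{(j)}\Bigr) z + C u$$

Here, $z$ denotes the neuronal states of the modeled regions (in the present study, the right post-STS and the two inferior frontal regions), $u$ the experimental inputs, $A$ the intrinsic coupling among regions, $B^{(j)}$ the modulation of this coupling by input $j$, and $C$ the direct influence of inputs on regional activity. The competing models compared below differ only in which inter-regional connections, i.e., which entries of $A$, are allowed to be nonzero.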
Using this technique, the following hypotheses were evaluated: (a) within the network of regions characterized by task-dependent activation, the post-STS serves as input region (receiving input from primary and secondary acoustic regions); (b) the frontal lobes, consecutively, receive their input from the post-STS. Moreover, it was assessed whether both frontal lobes subserve two successive processing steps or receive their information independently from the right post-STS via parallel pathways. Conventional analysis of the fMRI data yielded, in very good accordance with prior investigations (Wildgruber et al., 2004, 2005), activation within the right posterior STS and bilateral inferior frontal cortices during evaluation of emotional prosody. Subsequent determination of functional connectivity revealed that the activation cluster within the right post-STS represents the most likely input region of this task-specific network. This finding is in agreement with the assumption that this region subserves the representation of suprasegmental sequences and receives direct input from primary and secondary acoustic regions. To investigate the intrinsic connectivity pattern within the network, dynamic causal models assuming parallel, serial, or fully bidirectional connectivity patterns were compared. The model based upon parallel projections from the posterior STS to the frontal cortical regions turned out to be significantly superior to both serial models as well as to the model with bidirectionally connected regions (Fig. 4a). In a post hoc analysis, an attempt was made to optimize this parallel pathway model by adding either unidirectional or bidirectional connections between the two frontal regions, or by adding unilateral or bidirectional backward projections from the frontal areas to the right posterior STS.
Fig. 4. (a) To evaluate the intrinsic connectivity of regions contributing to the processing of emotional prosody, four different models were compared. (Model 1) Parallel transmission from the right post-STS to both frontal regions. (Model 2) Serial conductance from the post-STS to the right IFG and further on to the left IFG. (Model 3) Serial conductance from the post-STS to the left IFG and further on to the right IFG. (Model 4) Fully connected bidirectional flow of information. Based upon a prior analysis, external inputs were specified to enter the network via the right post-STS in all models. Dynamic causal modeling revealed a statistical superiority of the parallel processing model (Model 1) as compared to all other models (Ethofer et al., 2006b). (b) Based on these findings, it is assumed that explicit judgment of emotional prosody is carried out in at least three successive steps: (1) extraction of suprasegmental information, bound to predominantly right-sided primary and secondary acoustic regions; (2) representation of meaningful suprasegmental sequences within the right post-STS; and (3) explicit emotional judgment of acoustic information within the bilateral inferior frontal cortices.
The original parallel pathway model again was found to be significantly superior to all alternative models. These results provide further empirical support for the hypothesis that processing of emotional prosody is carried out in three successive steps: (1) extraction of suprasegmental acoustic information, bound to predominantly right-sided primary and higher order acoustic regions; (2) representation of meaningful suprasegmental sequences within the right post-STS; and (3) explicit emotional judgment of acoustic information within the bilateral inferior frontal cortices (Fig. 4b).
Implicit processing of emotional prosody

During everyday interactions among humans, as a rule, the emotional connotations of communicative signals are not explicitly evaluated on a quantitative scale. Rather, highly automatized understanding of the emotional information conveyed by facial expressions, speech prosody, gestures, or the propositional content of verbal utterances seems to be much more important. A variety of empirical data indicate that different cerebral pathways are involved in explicit and implicit processing of emotional signals (LeDoux, 1996; Anderson and Phelps, 1998; Adolphs and Tranel, 1999; Critchley et al., 2000; Adolphs et al., 2002). As concerns the hemodynamic responses bound to specific emotional categories, a selective contribution of the amygdala to the recognition of fearful voices has been assumed on the basis of lesion data (Scott, Young, Calder, & Hellawell, 1997) and prior PET studies (Phillips et al., 1998; Morris et al., 1999). Furthermore, a specific contribution of the anterior insula and the basal ganglia to the perception of vocal expressions of disgust has been predicted based on clinical findings (Pell and Leonard, 2003) and functional imaging experiments on the processing of facial expressions (Sprengelmeyer et al., 1998; Phan et al., 2002; Wicker et al., 2003). Responses of the amygdalae have been observed to depend on implicit processing of emotional signals, e.g., during passive listening tasks, whereas explicit judgments of emotional expressions were shown to result in deactivation of this region (Morris et al., 1999; Critchley et al., 2000; Adolphs, 2002). As a
consequence, implicit transmission of emotional information via the induction of physiological emotional reactions, e.g., changes of heart rate and skin conductance, might be linked to emotion-specific subcortical regions, whereas the explicit evaluation of emotional signals, based on the retrieval of information from emotional memory, appears to be processed within bilateral inferior frontal areas, irrespective of emotion type and valence of the stimuli. In order to evaluate the neural basis of implicit processing of emotional prosody, a cross-modal interaction experiment was conducted (for methodological issues of cross-modal interaction experiments see Ethofer et al., this volume). This experiment was designed to test the following two predictions: (a) simultaneous presentation of emotional faces and emotional prosody induces distinct interaction effects: explicit judgment of facial expressions is influenced by implicit processing of unattended emotional prosody; (b) the impact of an unattended fearful tone of speech on explicit judgment of emotional faces is associated with activation of the amygdala. During this experiment, images of facial expressions taken from the Ekman and Friesen battery (Ekman and Friesen, 1976) were presented to 12 healthy right-handed subjects (7 males, 5 females, age: 19–29 years). Using digital morphing techniques, a series of visual stimuli was generated extending in facial expression from 100% fear to 100% happiness in incremental steps of 25% (Perrett et al., 1994). In one run of the experiment, the facial expressions were shown in isolation; in another run, they were combined with acoustic stimuli, i.e., short declarative sentences spoken in a fearful or happy tone by two professional actors (one male, one female). In both of these runs, participants were instructed to rate the emotional valence of the displayed facial expressions. A third run of the experiment required explicit judgment of emotional prosody.
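To give a feel for such a morph continuum, the following is a toy sketch in which the geometry-warping step of real facial morphing (landmark-based, as in Perrett et al., 1994) is replaced by a plain pixel cross-fade between two pre-aligned images; all names and shapes are illustrative:

```python
import numpy as np

def morph_continuum(fear_img: np.ndarray, happy_img: np.ndarray, step: float = 0.25):
    """Linearly blend two pre-aligned face images (same shape, float arrays).

    Yields images running from 100% fear (w = 0) to 100% happiness (w = 1)
    in increments of `step`, here 25% as in the experiment described above.
    True morphing additionally warps facial geometry along landmark
    correspondences; this pixel cross-fade is only an illustration.
    """
    for w in np.arange(0.0, 1.0 + 1e-9, step):
        yield (1.0 - w) * fear_img + w * happy_img

# Example: five morph levels between two dummy 128x128 grayscale images.
fear = np.random.rand(128, 128)
happy = np.random.rand(128, 128)
levels = list(morph_continuum(fear, happy))
print(len(levels))  # -> 5 (0%, 25%, 50%, 75%, 100% happiness)
```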
Fig. 5. (a) Implicit impact of fearful prosody on the judgment of emotional faces: (left) valence ratings of facial expressions (mean ± standard error) presented without acoustic stimuli (white bars) and in combination with fearful prosody (gray bars). Evaluation of facial expressions in the presence of a fearful voice as compared to a happy intonation yielded significant activation in the right fusiform gyrus (upper right). Analysis of the cross-modal impact of fearful voices revealed significant correlations between individual behavioral changes and hemodynamic responses in the left amygdala (see Ethofer et al., 2006a). (b) Cross-modal integration of emotional communicative signals: (1) extraction of different communicative signals (prosody, facial expressions, word content) is subserved by the respective modality-specific primary cortices. (2) More complex features of these signals are processed within modality-specific secondary regions. (3) As a third step, explicit emotional judgments based on evaluation of associations with episodic emotional memory seem to be linked to the bilateral inferior frontal cortex. This region is assumed to be involved in cross-modal integration during explicit evaluation. On the other hand, emotional signals can yield an automatic (implicit) induction of emotional physiological reactions (e.g., variation of heart rate and skin conductance) that is linked to specific subcortical regions. Presumably, both neural pathways are interconnected at various levels.
The behavioral results showed that subjects rated fearful and neutral facial expressions as more fearful when presented together with a fearfully spoken sentence than in the no-voice condition. By contrast, no significant shifts in interpretation occurred during presentation of happy expressions (Fig. 5a). Thus, this experimental paradigm might provide a means for quantitative measurement of the implicit impact of emotional prosody on the judgment of facial expressions (de Gelder and Vroomen, 2000); a schematic example of such a measure is sketched below. A comparison of happy and fearful intonations during explicit judgment of prosody (unimodal auditory session) did not reveal any significant differences in the hemodynamic cerebral responses. As concerns implicit processing of emotional prosody, however, the middle section of the right fusiform gyrus showed significantly stronger activation when facial expressions were displayed in the presence of a fearful voice as compared to a happy intonation. This region has been named the fusiform face area, because it has been found crucial for the processing of faces in clinical and experimental studies (Puce, Allison, Gore, & McCarthy, 1995; Kanwisher et al., 1997; Barton et al., 2002). Moreover, this region shows stronger activation to emotional as compared to neutral faces (Morris et al., 1998) and seems to respond particularly to stimuli signaling danger (Surguladze et al., 2003). The increased hemodynamic responses within the fusiform gyrus in the presence of an auditory expression of threat might reflect enhanced alertness for the detection of the respective visual cues, giving rise to shifts in the interpretation of facial expressions. Moreover, comparison of hemodynamic responses with the individual explicit ratings of emotional facial expressions in the presence of unattended fearful prosody revealed a significant correlation within the basolateral part of the left amygdala extending into the periamygdaloid cortex. This finding indicates that the impact of voice on the processing of faces is mediated via these anterior temporal structures. In line with this assumption, the amygdala has been observed to modulate neuronal activity in brain regions subserving visual processing (Morris et al., 1998; Davis and Whalen, 2001; Vuilleumier, Richardson, Armony, Driver, & Dolan, 2004), and it has been suggested that the left-sided nuclei integrate audiovisual fear-related emotional information into a common percept (Dolan, Morris, & De Gelder, 2001).
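The cross-modal bias mentioned above could, for instance, be quantified as the per-subject shift in valence ratings between the voice and no-voice conditions. The following is a minimal sketch under that assumption, using randomly generated placeholder ratings rather than the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_levels = 12, 5  # 5 morph levels: 0-100% happiness

# Hypothetical SAM valence ratings (1 = fearful ... 9 = happy);
# shapes and values are illustrative, not the study's data.
no_voice = rng.uniform(1, 9, size=(n_subjects, n_levels))
fear_voice = no_voice - rng.uniform(0, 1.5, size=(n_subjects, n_levels))

# Cross-modal bias: per-subject shift toward 'fearful' induced by the
# unattended fearful voice, averaged separately for each morph level.
bias = (no_voice - fear_voice).mean(axis=0)
print(np.round(bias, 2))
```

In the actual analysis, such per-subject behavioral shifts were correlated with hemodynamic responses, revealing the left amygdala effect described above (Ethofer et al., 2006a).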
Cross-modal integration of emotional communicative signals

Emotional information may be conveyed via different communicative channels, e.g., prosodic features of the acoustic speech signal, facial expressions, and the propositional content of verbal utterances. Based on the findings presented here, several successive steps of the cross-modal integration of emotional signals can be separated and assigned to distinct cerebral correlates: (1) extraction of communicative signals is subserved by the respective modality-specific primary cortices; (2) modality-specific higher order regions process emotional information (e.g., prosody = right STS, facial expressions = fusiform face area, propositional meaning = left posterior STG); (3) explicit emotional judgments, presumably involving evaluation of associations with episodic emotional memory, were found to be linked to bilateral orbitofrontal cortex. Implicit processing of emotional signals, however, seems to rely on alternative pathways including emotion-specific subcortical regions involved in automatic physiological reactions (e.g., variation of heart rate and skin conductance). It has been demonstrated that both pathways of emotion processing influence the behavior of the organism and that unattended processing of emotional information may interact with attended evaluation of emotional communicative signals (Fig. 5b). Future research will be required, however, to further clarify the neuroanatomical basis of the interaction effects between implicit and explicit stimulus processing and the integration of emotional signals conveyed by various means of communication.
Abbreviations

BA: Brodmann area
fMRI: functional magnetic resonance imaging
IFC: inferior frontal cortex
IFG: inferior frontal gyrus
mid-STS: middle part of the superior temporal sulcus
MTG: middle temporal gyrus
post-STS: posterior part of the superior temporal sulcus
STG: superior temporal gyrus
STS: superior temporal sulcus
Acknowledgments

The reported studies were supported by the Junior Science Program of the Heidelberg Academy of Sciences and Humanities and the German Research Foundation (DFG WI 2101 and SFB 550 B10).
References

Ackermann, H., Hertrich, I., Grodd, W. and Wildgruber, D. (2004) Das Hören von Gefühlen: funktionell-neuroanatomische Grundlagen der Verarbeitung affektiver Prosodie. Aktuelle Neurol., 31: 449–460.
Ackermann, H., Hertrich, I. and Ziegler, W. (1993) Prosodische Störungen bei neurologischen Erkrankungen: eine Literaturübersicht. Fortschr. Neurol. Psychiatr., 61: 241–253.
Ackermann, H., Riecker, A., Grodd, W. and Wildgruber, D. (2001) Rate-dependent activation of a prefrontal-insular-cerebellar network during passive listening to trains of click stimuli: an fMRI study. NeuroReport, 18: 4087–4092.
Ackermann, H., Riecker, A. and Wildgruber, D. (2006) Cerebral correlates of singing capabilities in humans: clinical observations, experimental-behavioural studies, and functional imaging data. In: Altenmüller, E., Kesselring, J. and Wiesendanger, M. (Eds.), Music, Motor Control, and the Brain. Oxford University Press, Oxford, pp. 205–221.
Adolphs, R. (2002) Neural systems for recognizing emotion. Curr. Opin. Neurobiol., 12: 169–177.
Adolphs, R., Damasio, H. and Tranel, D. (2002) Neural systems for recognition of emotional prosody: a 3-D lesion study. Emotion, 2: 23–51.
Adolphs, R. and Tranel, D. (1999) Intact recognition of emotional prosody following amygdala damage. Neuropsychologia, 37: 1285–1292.
Adolphs, R., Tranel, D. and Damasio, H. (2001) Emotion recognition from faces and prosody following temporal lobectomy. Neuropsychology, 15: 396–404.
Anderson, A.K. and Phelps, E.A. (1998) Intact recognition of vocal expressions of fear following bilateral lesion of the human amygdala. NeuroReport, 9: 3607–3613.
Bachorowski, J.O. and Owren, M.J. (2003) Sounds of emotion: production and perception of affect-related vocal acoustics. Ann. NY Acad. Sci., 1000: 244–265.
Banse, R. and Scherer, K.R. (1996) Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol., 70: 614–636.
Barton, J.J.S., Press, D.Z., Keenan, J.P. and O'Connor, M. (2002) Lesions of the fusiform face area impair perception of facial configuration in prosopagnosia. Neurology, 58: 71–78.
Baum, S.R. and Pell, M.D. (1999) The neural basis of prosody: insights from lesion studies and neuroimaging. Aphasiology, 13: 581–608.
Behrens, S.J. (1985) The perception of stress and lateralization of prosody. Brain Lang., 26: 332–348.
Belin, P., Zilbovicius, M., Crozier, S., Thivard, L., Fontaine, A., Masure, M.C. and Samson, Y. (1998) Lateralization of speech and auditory temporal processing. J. Cogn. Neurosci., 10: 536–540.
Blair, R.J.R. and Cipolotti, L. (2000) Impaired social response reversal. Brain, 123: 1122–1141.
Blair, R.J.R., Morris, J.S., Frith, C.D., Perret, D.I. and Dolan, R.J. (1999) Dissociable neural responses to facial expressions of sadness and anger. Brain, 122: 883–893.
Borod, J.C., Bloom, R.L., Brickman, A.M., Nakhutina, L. and Curko, E.A. (2002) Emotional processing deficits in individuals with unilateral brain damage. Appl. Neuropsychol., 9: 23–36.
Borod, J.C., Obler, L.K., Erhan, H.M., Grunwald, I.S., Cicero, B.A., Welkowitz, J., Santschi, C., Agosti, R.M. and Whalen, J.R. (1998) Right hemisphere emotional perception: evidence across multiple channels. Neuropsychology, 12: 446–458.
Borod, J.C., Zgaljardic, D., Tabert, M.H. and Koff, E. (2001) Asymmetries of emotional perception and expression in normal adults. In: Gainotti, G. (Ed.), Handbook of Neuropsychology, 2nd Edition, Vol. 5. Elsevier Science, Amsterdam, pp. 181–205.
Bradley, M.M. and Lang, P.J. (1994) Measuring emotion: the self-assessment manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry, 25: 49–59.
Breitenstein, C., Daum, I. and Ackermann, H. (1998) Emotional processing following cortical and subcortical brain damage: contribution of the fronto-striatal circuitry. Behav. Neurol., 11: 29–42.
Breitenstein, C., Van Lancker, D., Daum, I. and Waters, C.H. (2001) Impaired perception of vocal emotions in Parkinson's disease: influence of speech time processing and executive functioning. Brain Cogn., 45: 277–314.
Buchanan, T.W., Lutz, K., Mirzazade, S., Specht, K., Shah, N.J., Zilles, K. and Jäncke, L. (2000) Recognition of emotional prosody and verbal components of spoken language: an fMRI study. Cogn. Brain Res., 9: 227–238.
Calder, A.J., Keane, J., Manes, F., Antoun, N. and Young, A.W. (2000) Impaired recognition and experience of disgust following brain injury. Nat. Neurosci., 3: 1077–1078.
Cancelliere, A.E. and Kertesz, A. (1990) Lesion localization in acquired deficits of emotional expression and comprehension. Brain Cogn., 13: 133–147.
Canli, T., Desmond, J.E., Zhao, Z., Glover, G. and Gabrieli, J.D. (1998) Hemispheric asymmetry for emotional stimuli detected with fMRI. NeuroReport, 9: 3233–3239.
Charbonneau, S., Scherzer, B.P., Aspirot, D. and Cohen, H. (2003) Perception and production of facial and prosodic emotions by chronic CVA patients. Neuropsychologia, 41: 605–613.
Critchley, H., Daly, E., Phillips, M., Brammer, M., Bullmore, E., Williams, S., Van Amelsvoort, T., Robertson, D., David, A. and Murphy, D. (2000) Explicit and implicit neural mechanisms for processing of social information from facial expressions: a functional magnetic resonance imaging study. Hum. Brain Mapp., 9: 93–105.
Cutler, A., Dahan, D. and Donselaar, W. (1997) Prosody in the comprehension of spoken language: a literature review. Lang. Speech, 40: 141–201.
Darby, D.G. (1993) Sensory aprosodia: a clinical clue to lesions of the inferior division of the right middle cerebral artery? Neurology, 43: 567–572.
Davidson, R.J., Abercrombie, H., Nitschke, J.B. and Putnam, K. (1999) Regional brain function, emotion and disorders of emotion. Curr. Opin. Neurobiol., 9: 228–234.
Davis, M. and Whalen, P.J. (2001) The amygdala: vigilance and emotion. Mol. Psychiatry, 6: 13–34.
de Gelder, B. and Vroomen, J. (2000) The perception of emotions by ear and eye. Cogn. Emotion, 14: 289–311.
Dolan, R.J., Morris, J.S. and De Gelder, B. (2001) Crossmodal binding of fear in voice and face. Proc. Natl. Acad. Sci. USA, 98: 10006–10010.
Ekman, P. and Friesen, W. (1976) Pictures of Facial Affect. Consulting Psychologists Press, Palo Alto.
Emmorey, K.D. (1987) The neurological substrates for prosodic aspects of speech. Brain Lang., 30: 305–329.
Ethofer, T., Anders, S., Erb, M., Droll, C., Royen, L., Saur, R., Reiterer, S., Grodd, W. and Wildgruber, D. (2006a) Impact of voice on emotional judgement of faces: an event-related fMRI study. Hum. Brain Mapp. (in press).
Ethofer, T., Anders, S., Erb, M., Herbert, C., Wiethoff, S., Kissler, J., Grodd, W. and Wildgruber, D. (2006b) Cerebral pathways in processing of emotional prosody: a dynamic causal modelling study. NeuroImage, 30: 580–587.
Ethofer, T., Erb, M., Anders, S., Wiethoff, S., Herbert, C., Saur, R., Grodd, W. and Wildgruber, D. (2006c) Effects of prosodic emotional intensity on activation of associative auditory cortex. NeuroReport, 17: 249–253.
Friston, K.J., Harrison, L. and Penny, W. (2003) Dynamic causal modeling. NeuroImage, 19: 1273–1302.
Gandour, J., Wong, D. and Hutchins, G. (1998) Pitch processing in the human brain is influenced by language experience. NeuroReport, 9: 2115–2119.
Geigenberger, A. and Ziegler, W. (2001) Receptive prosodic processing in aphasia. Aphasiology, 15: 1169–1188.
George, M.S., Parekh, P.I., Rosinsky, N., Ketter, T.A., Kimbrell, T.A., Heilman, K.M., Herscovitch, P. and Post, R.M. (1996) Understanding emotional prosody activates right hemisphere regions. Arch. Neurol., 53: 665–670.
Grandjean, D., Sander, D., Pourtois, G., Schwartz, S., Seghier, M.L., Scherer, K.R. and Vuilleumier, P. (2005) The voices of wrath: brain responses to angry prosody in meaningless speech. Nat. Neurosci., 8: 145–146.
Heilman, K.M., Bowers, D., Speedie, L. and Coslett, H.B. (1984) Comprehension of affective and nonaffective prosody. Neurology, 34: 917–921.
Heilman, K.M., Scholes, R. and Watson, R.T. (1975) Auditory affective agnosia: disturbed comprehension of affective speech. J. Neurol. Neurosurg. Psychiatry, 38: 69–72.
Hornak, J., Bramham, J., Rolls, E.T., Morris, R.G., O'Doherty, J., Bullock, P.R. and Polkey, C.E. (2003) Changes in emotion after circumscribed surgical lesions of the orbitofrontal and cingulate cortices. Brain, 126: 1691–1712.
Hornak, J., Rolls, E.T. and Wade, D. (1996) Face and voice expression identification in patients with emotional and behavioral changes following ventral frontal lobe damage. Neuropsychologia, 34: 247–261.
Hugdahl, K. and Davidson, R.J. (2003) The Asymmetrical Brain. MIT Press, Cambridge, MA.
Hughlings Jackson, J. (1915) On affections of speech from disease of the brain (reprint from Brain, 1879). Brain, 38: 107–129.
Imaizumi, S., Mori, K., Kiritani, S., Kawashima, R., Sugiura, M., Fukuda, H., Itoh, K., Kato, T., Nakamura, A., Hatano, K., Kojima, S. and Nakamura, K. (1997) Vocal identification of speaker and emotion activates different brain regions. NeuroReport, 8: 2809–2812.
Ivry, R.B. and Robertson, L.C. (1998) The Two Sides of Perception. MIT Press, Cambridge, MA.
Kanwisher, N., McDermott, J. and Chun, M.M. (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci., 17: 4302–4311.
Kesler-West, M.L., Andersen, A.H., Smith, C.D., Avison, M.J., Davis, C.E., Kryscio, R.J. and Blonder, L.X. (2001) Neural substrates of facial emotion processing using fMRI. Cogn. Brain Res., 11: 213–226.
Kotz, S.A., Meyer, M., Alter, K., Besson, M., von Cramon, D.Y. and Friederici, A.D. (2003) On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang., 86: 366–376.
Kucharska-Pietura, K., Phillips, M.L., Gernand, W. and David, A.S. (2003) Perception of emotions from faces and voices following unilateral brain damage. Neuropsychologia, 41: 1082–1090.
LeDoux, J. (1996) The Emotional Brain. Simon & Schuster, New York.
Lehiste, I. (1970) Suprasegmentals. MIT Press, Cambridge, MA.
Levin, H.S., Eisenberg, H.M. and Benton, A.L. (1991) Frontal Lobe Function and Dysfunction. Oxford University Press, New York, pp. 318–338.
Mehrabian, A. (1972) Nonverbal Communication. Aldine-Atherton, Chicago.
Menon, R.S. and Goodyear, B.G. (1999) Submillimeter functional localization in human striate cortex using BOLD contrast at 4 Tesla: implications for the vascular point-spread function. Magn. Reson. Med., 41: 230–235.
Meyer, M., Alter, K., Friederici, A.D., Lohmann, G. and von Cramon, D.Y. (2002) FMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum. Brain Mapp., 17: 73–88.
Mitchell, R.L.C., Elliot, R., Barry, M., Cruttenden, A. and Woodruff, P.W.R. (2003) The neural response to emotional prosody, as revealed by functional magnetic resonance imaging. Neuropsychologia, 41: 1410–1421.
Morris, J.S., Friston, K.J., Büchel, C., Frith, C.D., Young, A.W., Calder, A.J. and Dolan, R.J. (1998) A neuromodulatory role for the human amygdala in processing emotional facial expressions. Brain, 121: 47–57.
Morris, J.S., Frith, C.D., Perrett, D.I., Rowland, D., Young, A.W., Calder, A.J. and Dolan, R.J. (1996) A differential neural response in the human amygdala to fearful and happy facial expressions. Nature, 383: 812–815.
Morris, J.S., Scott, S.K. and Dolan, R.J. (1999) Saying it with feelings: neural responses to emotional vocalizations. Neuropsychologia, 37: 1155–1163.
Murphy, F.C., Nimmo-Smith, I. and Lawrence, A.D. (2003) Functional neuroanatomy of emotions: a meta-analysis. Cogn. Affect. Behav. Neurosci., 3: 207–233.
Murray, I.R. and Arnott, J.L. (1993) Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J. Acoust. Soc. Am., 93: 1097–1108.
Nakamura, K., Kawashima, R., Ito, K., Sugiura, M., Kato, T., Nakamura, A., Hatano, K., Nagumo, S., Kubota, K., Fukuda, H. and Kojima, S. (1999) Activation of the right inferior frontal cortex during assessment of facial emotion. J. Neurophysiol., 82: 1610–1614.
Pell, M.D. (1998) Recognition of prosody following unilateral brain lesions: influence of functional and structural attributes of prosodic contours. Neuropsychologia, 36: 701–715.
Pell, M.D. and Baum, S.R. (1997a) The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain Lang., 57: 80–99.
Pell, M.D. and Baum, S.R. (1997b) Unilateral brain damage, prosodic comprehension deficits, and the acoustic cues to prosody. Brain Lang., 57: 195–214.
Pell, M.D. and Leonard, C.L. (2003) Processing emotional tone from speech in Parkinson's disease: a role for the basal ganglia. Cogn. Affect. Behav. Neurosci., 3: 275–288.
Peper, M. and Irle, E. (1997) Categorical and dimensional decoding of emotional intonations in patients with focal brain lesions. Brain Lang., 58: 233–264.
Perrett, D.I., May, K.A. and Yoshikawa, S. (1994) Facial shape and judgements of female attractiveness. Nature, 368: 239–242.
Phan, K.L., Wager, T., Taylor, S.F. and Liberzon, I. (2002) Functional neuroanatomy of emotion: a meta-analysis of emotion activation studies in PET and fMRI. NeuroImage, 16: 331–348.
Phillips, M.L., Young, A.W., Scott, S.K., Calder, A.J., Andrew, C., Giampietro, V., Williams, S.C.R., Bullmore, E.T., Brammer, M. and Gray, J.A. (1998) Neural responses to facial and vocal expressions of fear and disgust. Proc. R. Soc. Lond. B Biol. Sci., 265: 1809–1817.
Pihan, H., Altenmüller, E. and Ackermann, H. (1997) The cortical processing of perceived emotion: a DC-potential study on affective speech prosody. NeuroReport, 8: 623–627.
Pihan, H., Altenmüller, E., Hertrich, I. and Ackermann, H. (2000) Cortical activation patterns of affective speech processing depend on concurrent demands on the subvocal rehearsal system: a DC-potential study. Brain, 123: 2338–2349.
Poeppel, D., Guillemin, A., Thompson, J., Fritz, J., Bavelier, D. and Braun, A. (2004) Auditory lexical decision, categorical perception, and FM direction discrimination differentially engage left and right auditory cortex. Neuropsychologia, 42: 183–200.
Price, J.L. (1999) Prefrontal cortical network related to visceral function and mood. Ann. NY Acad. Sci., 877: 383–396.
Puce, A., Allison, T., Gore, J. and McCarthy, G. (1995) Face-sensitive regions in human extrastriate cortex studied by functional MRI. J. Neurophysiol., 74: 1192–1199.
Reiterer, S.M., Erb, M., Droll, C.D., Anders, S., Ethofer, T., Grodd, W. and Wildgruber, D. (2005) Impact of task difficulty on lateralization of pitch and duration discrimination. NeuroReport, 16: 239–242.
Rolls, E.T. (1999) The functions of the orbito-frontal cortex. Neurocase, 5: 301–312.
Ross, E.D. (1981) The aprosodias: functional-anatomic organization of the affective components of language in the right hemisphere. Arch. Neurol., 38: 561–569.
Ross, E.D., Thompson, R.D. and Yenkosky, J. (1997) Lateralization of affective prosody in the brain and the callosal integration of hemispheric language functions. Brain Lang., 56: 27–54.
Scherer, K.R., Johnstone, T. and Klasmeyer, G. (2003) Vocal expression of emotion. In: Davidson, R.J., Scherer, K.R. and Goldsmith, H.H. (Eds.), Handbook of Affective Sciences. Oxford University Press, New York, pp. 433–456.
Schirmer, A., Alter, K., Kotz, S. and Friederici, A.D. (2001) Lateralization of prosody during language production: a lesion study. Brain Lang., 76: 1–17.
Schmitt, J.J., Hartje, W. and Williams, K. (1997) Hemispheric asymmetry in the recognition of conditional attitude conveyed by facial expression, prosody and propositional speech. Cortex, 33: 65–81.
Scott, S.K., Young, A.W., Calder, A.J. and Hellawell, D.J. (1997) Impaired auditory recognition of fear and anger following bilateral amygdala lesions. Nature, 385: 254–257.
Sidtis, J.J. and Van Lancker-Sidtis, D. (2003) A neurobehavioral approach to dysprosody. Semin. Speech Lang., 24: 93–105.
Small, D.M., Zatorre, R.J., Dagher, A., Evans, A.C. and Jones-Gotman, M. (2001) Changes in brain activity related to eating chocolate: from pleasure to aversion. Brain, 124: 1720–1733.
Sprengelmeyer, R., Rausch, M., Eysel, U.T. and Przuntek, H. (1998) Neural structures associated with recognition of facial expressions of basic emotions. Proc. R. Soc. Lond. B Biol. Sci., 265: 1927–1931.
Starkstein, S.E., Federoff, J.P., Price, T.R., Leiguarda, R.C. and Robinson, R.G. (1994) Neuropsychological and neuroradiologic correlates of emotional prosody comprehension. Neurology, 44: 515–522.
Surguladze, S.A., Brammer, M.J., Young, A.W., Andrew, C., Travis, M.J., Williams, S.C.R. and Phillips, M.L. (2003) A preferential increase in the extrastriate response to signals of danger. NeuroImage, 19: 1317–1328.
Van Lancker, D. (1980) Cerebral lateralization of pitch cues in the linguistic signal. Int. J. Hum. Commun., 13: 227–277.
Van Lancker, D. and Sidtis, J.J. (1992) The identification of affective-prosodic stimuli by left- and right-hemisphere-damaged subjects: all errors are not created equal. J. Speech Hear. Res., 35: 963–970.
Vuilleumier, P., Richardson, M.P., Armony, J.L., Driver, J. and Dolan, R.J. (2004) Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat. Neurosci., 7: 1271–1278.
Weddell, R. (1994) Effects of subcortical lesion site on human emotional behaviour. Brain Cogn., 25: 161–193.
Wicker, B., Keysers, C., Plailly, J., Royet, J.P., Gallese, V. and Rizzolatti, G. (2003) Both of us disgusted in My insula: the common neural basis of seeing and feeling disgust. Neuron, 40: 655–664.
Wildgruber, D. and Ackermann, H. (2003) Aphasie. In: Brandt, T., Dichgans, J. and Diener, H.C. (Eds.), Therapie und Verlauf neurologischer Erkrankungen. Kohlhammer, Stuttgart, pp. 267–277.
Wildgruber, D., Ackermann, H. and Grodd, W. (2001) Differential contributions of motor cortex, basal ganglia and cerebellum to speech motor control: effects of syllable repetition rate evaluated by fMRI. NeuroImage, 13: 101–109.
Wildgruber, D., Ackermann, H., Klose, U., Kardatzki, B. and Grodd, W. (1996) Functional lateralization of speech production at primary motor cortex: a fMRI study. NeuroReport, 7: 2791–2795.
Wildgruber, D., Ackermann, H., Klose, U., Kardatzki, B. and Grodd, W. (1998) Hemispheric lateralization of speech production and singing at the level of the motor cortex in fMRI. In: Ziegler, W. and Deger, K. (Eds.), Clinical Phonetics and Linguistics. Whurr, London, pp. 238–243.
Wildgruber, D., Erb, M., Klose, U. and Grodd, W. (1997) Sequential activation of supplementary motor area and primary motor cortex during self-paced finger movement in human evaluated by functional MRI. Neurosci. Lett., 127: 161–164.
Wildgruber, D., Hertrich, I., Riecker, A., Erb, M., Anders, S., Grodd, W. and Ackermann, H. (2004) Distinct frontal regions subserve evaluation of linguistic and affective aspects of intonation. Cereb. Cortex, 14: 1384–1389.
Wildgruber, D., Kischka, U., Faßbender, K. and Ettlin, T. (2000) The Frontal Lobe Score: evaluation of its clinical validity. Clin. Rehabil., 14: 272–278.
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M. and Grodd, W. (2002) Dynamic brain activation during processing of emotional intonation: influence of acoustic parameters, emotional valence and sex. NeuroImage, 15: 856–869.
Wildgruber, D., Riecker, A., Hertrich, I., Erb, M., Grodd, W., Ethofer, T. and Ackermann, H. (2005) Identification of emotional intonation evaluated by fMRI. NeuroImage, 24: 1233–1241.
Zatorre, R.J. (2001) Neural specializations for tonal processing. Ann. NY Acad. Sci., 930: 193–210.
Zatorre, R.J. and Belin, P. (2001) Spectral and temporal processing in human auditory cortex. Cereb. Cortex, 11: 946–953.
Zatorre, R.J., Belin, P. and Penhune, V. (2002) Structure and function of auditory cortex: music and speech. Trends Cogn. Sci., 6: 37–46.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 14
Affective and linguistic processing of speech prosody: DC potential studies

Hans Pihan 1,2

1 Department of Neurology, Schulthess Klinik, 8008 Zurich, Switzerland
2 Department of Neurology, Inselspital, University of Bern, 3010 Bern, Switzerland
Abstract: Speech melody or prosody subserves linguistic, emotional, and pragmatic functions in speech communication. Prosodic perception is based on the decoding of acoustic cues, with a predominant function of frequency-related information perceived as the speaker's pitch. Evaluation of prosodic meaning is a cognitive function implemented in cortical and subcortical networks that generate continuously updated affective or linguistic speaker impressions. Various brain-imaging methods allow delineation of the neural structures involved in prosody processing. In contrast to functional magnetic resonance imaging techniques, DC (direct current, slow) components of the EEG directly measure cortical activation without temporal delay. Activation patterns obtained with this method are highly task specific and intraindividually reproducible. The studies presented here investigated the topography of prosodic stimulus processing as a function of acoustic stimulus structure and of linguistic or affective task demands. Data obtained from measuring DC potentials demonstrated that the right hemisphere has a predominant role in processing emotions from the tone of voice, irrespective of emotional valence. However, right hemisphere involvement is modulated by diverse speech- and language-related conditions that are associated with a left hemisphere participation in prosody processing. The degree of left hemisphere involvement depends on several factors, such as (i) articulatory demands on the perceiver of prosody (possibly also the poser), (ii) a relative left hemisphere specialization in processing temporal cues mediating prosodic meaning, and (iii) the propensity of prosody to act on the segment level in order to modulate word or sentence meaning. The specific role of top-down effects, in terms of either linguistically or affectively oriented attention, on the lateralization of stimulus processing is not clear and requires further investigation.

Keywords: DC potentials; slow potentials; prosody; hemisphere specialization; emotion; acoustics; digital resynthesis; nonverbal communication
Corresponding author. Tel.: +41-44-385-7171; Fax: +41-44-385-7538; E-mail: [email protected]
DOI: 10.1016/S0079-6123(06)56014-5

Biological theories of prosody processing

Prosody or speech melody subserves linguistic functions in speech communication, such as question versus statement intonation, and mediates emotional speaker states. Prosodic perception is based on the decoding of acoustic cues mediating (i) frequency-related information such as pitch, (ii) temporal information such as speech rate or speech rhythm, and (iii) loudness. Misclassifications of affective prosodic impressions are based on certain similarities between emotions that result from a similar quantitative expression of acoustic cues. For example, happiness and despair are more likely to be confused than happiness and sadness, because both are characterized by high pitch values. These observations gave rise to
dimensional models that conceptualize emotional perception as the result of a multidimensional evaluation (Frijda, 1969). Typical dimensions are activation (e.g., highly aroused vs. calm) and valence (e.g., pleasant vs. unpleasant). Acoustic correlates of these dimensions have been characterized (Frick, 1985; Ladd et al., 1985; Bergmann et al., 1988; Tischer, 1993; Banse and Scherer, 1996). Fundamental frequency (F0), for example, is the lowest frequency present in sound during phonation. It represents a physical correlate of perceived pitch and indicates speaker activation. Voice dynamics can be characterized by measuring fundamental frequency variation (F0-variability, F0-range). At one extreme, it reflects melodic speech expressing, for example, happiness; at the other, a flat, poorly modulated tone of voice typically mediating the impression of sadness. Temporal parameters such as the variation of vowel length also indicate dynamic voice properties, either typifying happiness (high variation) or amplifying the impression of sadness (low variation). Perception studies on affective prosody further demonstrated that judgments on dimensional categories are based on different time scales. For example, during perception of a sentence, recognition of speaker activity was reliably completed after the first two words, whereas the attribution of valence continued to develop throughout the whole utterance (Tischer, 1993). Thus, evaluation of prosodic meaning can be considered a temporally extended mental process, which integrates different dimensional aspects of intonation into a continuously updated impression.

In this review, three relevant theories of prosody processing are addressed, which posit different bottom-up and top-down mechanisms. The functional lateralization hypothesis proposes left hemisphere (LH) specialization for linguistic intonation and right hemisphere (RH) mediation of melodic aspects of affective speech. This specialization is considered relative rather than absolute and is believed to depend on the potency of intonation to vary the semantic understanding of speech content. Van Lancker suggested that the degree of LH involvement in prosodic processing depends on the extent to which prosodic cues interact with segmental
linguistic information (Van Lancker, 1980). For example, the stress difference between 'ˈhot dog' (compound noun) and 'hot ˈdog' (noun phrase) mediated by prosody would require LH mechanisms for perception, whereas question and statement intonations ('hot dog?' vs. 'hot dog!') would predominantly involve the RH. Support for this hypothesis has been found in lesion studies and in recent imaging experiments (Pell and Baum, 1997b; Wildgruber et al., 2004; Wong et al., 2004). The differential specialization hypothesis suggests preferential RH processing of pitch patterns irrespective of linguistic or emotional function, and LH specialization in the rapid temporal analysis relevant for decoding speech sounds. This cortical asymmetry is assumed to result from the need to optimize processing of the acoustic environment in both the temporal and the frequency domain (Zatorre et al., 2002). Accordingly, right-lateralized activation was observed when pitch patterns were processed (Zatorre et al., 1992). Patients who underwent resection of parts of the primary and secondary acoustic cortex in the right temporal lobe were found to have elevated thresholds when judging the direction of pitch change (Johnsrude et al., 2000). In contrast, predominant left hemisphere involvement was found when rapid frequency changes, such as the fast formant transitions that promote consonant detection, are processed in the auditory cortex (Zatorre et al., 1992; Belin et al., 1998; see review in Tervaniemi and Hugdahl, 2003). A related version of the differential specialization hypothesis, the acoustic cue hypothesis, suggests a functional dichotomy between the right and left hemisphere related to pitch (RH) and temporal processing (LH) of complex auditory stimuli. Patients with RH lesions were found to make no use of F0-variability, relying instead on duration cues to assess affective prosody (Van Lancker and Sidtis, 1992). Robin and co-workers (Robin et al., 1990) observed right temporoparietal lesions to disrupt the discrimination of tones, but not the perception of time patterns, while lesions in the homologous regions of the LH had opposite effects. In accordance, a study using positron emission tomography (PET) found complementary LH and RH auditory cortex activation in
response to temporal and spectral variation of two tones (Zatorre and Belin, 2001). These data support the idea of a pitch module located in the RH, representing a hard-wired right hemisphere specialization for nonverbal acoustic communication. As a consequence, the expected contribution of the RH to the processing of affective speech prosody might reflect the discrimination of pitch contours rather than the evaluation of emotional significance of verbal utterances.
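To make the cue types just discussed concrete, the following is a rough sketch of how F0-based measures such as F0-range and F0-variability might be extracted from an utterance. The autocorrelation tracker, frame length, and voicing threshold are illustrative simplifications (real pitch trackers are considerably more robust), and the test signal is synthetic:

```python
import numpy as np

def f0_track(signal, sr, frame_ms=40, fmin=75, fmax=400):
    """Crude framewise F0 estimate via autocorrelation peak picking.

    Returns F0 in Hz per frame (NaN for frames treated as unvoiced).
    Parameters and the simple voicing heuristic are illustrative only.
    """
    n = int(sr * frame_ms / 1000)
    lags = np.arange(int(sr / fmax), int(sr / fmin))
    f0 = []
    for start in range(0, len(signal) - n, n):
        frame = signal[start:start + n] * np.hanning(n)
        ac = np.correlate(frame, frame, mode="full")[n - 1:]
        peak = lags[np.argmax(ac[lags])]
        # Treat weak periodicity as unvoiced.
        f0.append(sr / peak if ac[peak] > 0.3 * ac[0] else np.nan)
    return np.array(f0)

# F0-range and F0-variability over the voiced frames of an utterance.
sr = 16000
t = np.arange(sr) / sr
utterance = np.sin(2 * np.pi * (120 + 40 * t) * t)  # toy rising-pitch signal
f0 = f0_track(utterance, sr)
voiced = f0[~np.isnan(f0)]
print(voiced.max() - voiced.min(), voiced.std())  # F0-range, F0-variability
```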
DC potentials and prosodic evaluation

In speech communication, mental activity is maintained over seconds when changes of speaker emotion or variations of sentence meaning must be inferred from speech melody. These cognitive processes strongly involve cortical networks. Sustained cortical activation can be directly assessed by measuring DC (direct current, slow) components of the EEG signal. They are presumably generated by sustained excitatory input to the apical dendrites of cortical pyramidal cells. Cortical activation patterns obtained with this method are task specific and intraindividually reproducible (Altenmüller, 1986; Lang et al., 1988; Altenmüller et al., 1993; Pihan et al., 1997, 2000). Microelectrode recordings demonstrated a close relationship between local field potentials, generated in particular by cells geometrically arranged in the cortex such as pyramidal cells, and changes of the blood oxygen level dependent (BOLD) signal (for a review see Logothetis and Pfeuffer, 2004). Results from these combined imaging and intracortical recordings support the assumption that local intracortical processing is reflected by both an extracranially recordable slow potential and a correlating BOLD response.

The DC potential investigations in affective prosody processing presented here were designed in analogy to the 'two stimulus paradigm': two stimuli (S1 and S2) with discriminable acoustic features are presented in succession, separated by a short time interval. Perception of S1 prepares subjects for perception of a deviant acoustic structure of S2, and a verbal response is required after presentation of S2. The original experiments were performed by Walter (Walter et al., 1964), who presented a warning stimulus (S1) that was followed by an imperative stimulus (S2) initiating a motor response. The negative DC potential occurring within S1–S2 intervals of up to 2 s was termed contingent negative variation (CNV). Later investigators, using longer S1–S2 intervals and focusing on cognitive, motor preparatory (e.g., the 'Bereitschaftspotential'), and motivational processes, used the term slow cortical potential or DC potential. For a review of this method see Rockstroh et al. (1989) or Altenmüller and Gerloff (1998).
Processing of affective prosody mediated by pitch and temporal cues

Affective prosodic discrimination: predominant RH involvement

There is clinical and experimental evidence that the RH has an advantage over the left in extracting pitch information from complex auditory stimuli. Patients with RH lesions were found to make no use of F0-variability, relying instead on duration cues to assess affective prosody (Van Lancker and Sidtis, 1992). Furthermore, pitch analysis processors seem to be located within the right temporal lobe: Robin et al. (1990) observed that right temporoparietal lesions disrupted the discrimination of tones but not the perception of time patterns, while lesions in the homologous regions of the LH had opposite effects. Patients who underwent resection of parts of the primary and secondary acoustic cortex in the right temporal lobe were found to have elevated thresholds when judging the direction of pitch change (Johnsrude et al., 2000). An imaging study using PET demonstrated right-lateralized activation when pitch patterns were processed (Zatorre et al., 1992). Conceivably, therefore, the expected contribution of the RH to the processing of affective speech prosody reflects the discrimination of pitch contours rather than the evaluation of the emotional meaning of an utterance. In order to investigate the specific contributions of each hemisphere to prosody processing, control tasks and/or appropriate control stimuli are needed to evaluate bottom-up effects of acoustic cue processing and top-down effects of emotional
or linguistic engagement. This is all the more important as research in this field faces a corpus of experimental data that does not clearly support a differential LH and RH involvement in prosody processing (Schlanger et al., 1976; Pell and Baum, 1997a, b; Stiller et al., 1997; Imaizumi et al., 1998; Pell, 1998, 2006; Kotz et al., 2003; Doherty et al., 2004). The first experiment reported here was designed to test bottom-up effects of physical stimulus characteristics according to the acoustic cue hypothesis. Stimuli were presented as sequences of two successive utterances, each with identical wording. The paired stimuli represented variants of the same prosodic category (happy, sad, or neutral), differing perceptually in emotional intensity or arousal. Effects of varying intensity or arousal were created by digital manipulation of either the pitch range or the duration of stressed vowels. Pitch contours of the original utterances were extended or reduced relative to the sentence-final F0-level, which was kept constant. Fig. 1(a) explicates the procedure and presents the average maximum F0-range and its standard deviation for all resynthesized stimuli. A copy of the original utterances was used in which emotional intensity was altered by varying the vowel duration of stressed syllables. From each original sentence, variants with short, middle, or long duration of stressed vowels were obtained by either cutting out or doubling single pitch periods of the acoustic signal. Fig. 1(b) exemplifies the time manipulation and its effect on total stimulus duration. Stimulus pairs were presented in pseudorandomized order. As compared to the first sentence, the second item of each pair varied either in pitch range or in the duration of stressed vowels; wording and emotional intonation were identical. At perceptual evaluation, the pitch- and time-manipulated variants of happy and sad utterances differed in the degree of perceived intensity of the respective emotion. In contrast, sentences with neutral intonation sounded more or less aroused. Healthy subjects specified the emotional category of each stimulus pair (happy, sad, or neutral) and indicated whether the first or the second stimulus was more intense (emotional intonations) or more highly aroused (neutral items). An answer was considered 'correct' if the sentence with the broader F0-range or shorter vowel duration was recognized as 'more intense'. In sadly intonated pairs, the utterance with longer syllable duration was expected to be labelled as 'sadder'.
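The duration manipulation just described, i.e., lengthening or shortening stressed vowels by doubling or deleting single pitch periods, can be sketched schematically as follows. Pitch-period boundaries are assumed here to be precomputed sample indices; finding them (pitch marking) and smoothing the splice points, which real PSOLA-style resynthesis must handle, are omitted:

```python
import numpy as np

def stretch_by_pitch_periods(signal, period_bounds, factor):
    """Lengthen (factor > 1) or shorten (factor < 1) a voiced segment by
    doubling or deleting whole pitch periods.

    `period_bounds` are sample indices delimiting successive pitch periods
    (assumed precomputed by a pitch-marking step, omitted here).
    """
    periods = [signal[a:b] for a, b in zip(period_bounds[:-1], period_bounds[1:])]
    out = []
    acc = 0.0
    for p in periods:
        acc += factor
        # Emit each period as many times as the accumulated factor dictates:
        # zero emissions deletes it, two emissions doubles it.
        while acc >= 1.0:
            out.append(p)
            acc -= 1.0
    return np.concatenate(out) if out else signal[:0]

# Toy example: a 100-sample 'period' repeated 10 times, lengthened by 30%.
period = np.sin(2 * np.pi * np.arange(100) / 100)
vowel = np.tile(period, 10)
bounds = np.arange(0, 1001, 100)
longer = stretch_by_pitch_periods(vowel, bounds, 1.3)
print(len(vowel), len(longer))  # 1000 -> 1300
```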
intense’. In sadly intonated pairs, the utterance with longer syllable duration was expected to be labelled as ‘sadder’. Sixteen healthy, right-handed subjects participated. During stimulus presentation DC potentials were recorded from 26 nonpolarizable AgCl electrodes. The recordings were averaged and analysed relating the DC amplitudes during the presentation periods to a baseline taken from a 1.5 s prestimulus period. As shown in Fig. 2, activation of the second analysis interval was considerably higher when compared to the first one reflecting increasing cognitive demands on cognition and working memory during stimulus perception. Behavioural data demonstrated that vocally portrayed emotions were well recognized. Only few sentences with happy or sad tone were labelled ‘neutral’ (Fig. 3). Discrimination of intensity based on variations of pitch range or duration of stressed vowels yielded less consistent results. Few subjects rated expressiveness at random or contrary to the predictions made. We suggested that this behavioural variability was driven by the inherent ambiguity of terms like ‘happier’ or ‘sadder’ with respect to dimensional specifications: For example, an utterance can be perceived as sadder, if it expresses increasing passivity with low speaker activation (depression) or, in case one thinks of despair, if high speaker arousal and activity is signalled. Even happier might equally denote a more extrovert, highly excited state (e.g. enthusiasm) as well as a rather relaxed, self-content attitude such as placidity. In order to avoid the directing attention of the subjects towards a pure evaluation of the activity dimension and, possibly, towards a decision strategy based on physical stimulus characteristics, response terms did not further specify the underlying emotion. Processing of pitch- and time-manipulated sentence pairs with either happy or sad intonation yielded a similar pattern of DC potentials. In each condition, cortical activation was significantly lateralized towards the RH. Besides right-frontal areas, high activation was also observed over rightcentral electrode positions (Fig. 4). These areas were described to receive projections from the auditory cortex in the temporal plane (Keidel, 1971). Results of this experiment supported neuropsychological assumptions of an RH dominance in
Fig. 1. (a) Acoustic signal (upper panel) and three synthetic pitch contours (lower panel) of the test sentence 'Sie gab nach, und verflogen war der ganze Unmut' ('She gave in, and all the ill humour had vanished'), produced with neutral intonation. Voiced signal portions are indicated by bars below the acoustic signal; numbers indicate peak and endpoint frequencies in Hz. In the lower panel, the F0-range of the medial pitch contour is slightly increased as compared to the original utterance. All original utterances were subjected to the same F0-range manipulation. Averages of F0-range (highest minus lowest value) over the four test sentences of each emotional and synthetic category are listed beneath (standard deviation in parentheses). (b) Durational variants of the vowel /o/ of the stressed syllable in 'verflogen'. In the upper acoustic signal, vowel duration has been reduced to 143 ms by cutting out single pitch periods; the middle and lower signals were lengthened to 195 and 232 ms, respectively, by doubling single pitch periods. Accented vowels of each test sentence were shortened/lengthened accordingly, constituting a corpus of 4 × 3 durational variants (short, medial, or long vowel duration) within each emotional category. Underneath, averages of total stimulus duration are listed (standard deviation in parentheses). From Pihan et al. (2000).
Fig. 2. Time course of a single trial and averaged DC potential (electrode position FC4). Mean amplitudes within periods 1 and 2 provided the basis for data normalization and statistical evaluation. Evoked potentials within the first second of stimulus presentation might partly reflect unspecific activation and orientation responses and were therefore not analysed. The variable end of each utterance lies within the light part of the grey bar. From Pihan et al. (2000).
Fig. 3. Percentage of correct answers across the subject group with respect to the identification of 'emotional category' (left panel) and the evaluation of 'expressiveness' (right panel). F0-varied stimulus conditions are indicated by the suffix '-P'; duration-manipulated conditions are marked '-T'. From Pihan et al. (1997).
Results of this experiment supported neuropsychological assumptions of an RH dominance in emotion processing (Borod, 1992). As for prosody processing, both acoustic stimulus processing in the temporal plane and cognitive-affective evaluation were lateralized to the RH. No support was found for the acoustic cue hypothesis: a differential effect of pitch- and temporal-based manipulation of
emotional intensity was only observed in neutral stimulus pairs. Since the subjects correctly identified those items as neutral utterances, the observed RH lateralization during processing of pitch-manipulated neutral sentences cannot be explained by an RH activation of stored emotional representations.
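For concreteness, the averaging and baseline procedure described above (DC amplitudes of the presentation periods referenced to a 1.5 s prestimulus baseline) can be rendered as a minimal numpy sketch. The epoch layout and the two analysis windows below are illustrative assumptions, not the exact parameters of the original analysis.

```python
import numpy as np

def dc_period_amplitudes(epochs, fs, baseline_s=1.5,
                         windows=((0.0, 3.0), (3.0, 6.0))):
    """Mean baseline-referenced DC amplitude per channel and analysis window.

    epochs: array (n_trials, n_channels, n_samples), each epoch starting
    baseline_s seconds before stimulus onset; fs: sampling rate in Hz;
    windows: analysis periods in seconds relative to stimulus onset
    (placeholder values standing in for the two presentation periods).
    """
    n_base = int(baseline_s * fs)
    baseline = epochs[:, :, :n_base].mean(axis=2, keepdims=True)
    corrected = epochs - baseline              # reference each trial to its baseline
    avg = corrected.mean(axis=0)               # average across trials
    amps = []
    for start, stop in windows:
        a = n_base + int(start * fs)
        b = n_base + int(stop * fs)
        amps.append(avg[:, a:b].mean(axis=1))  # mean amplitude per channel
    return np.stack(amps)                      # (n_windows, n_channels)
```

Normalization and statistical evaluation (e.g. the contrast analyses reported in Fig. 4) would then operate on the resulting window-by-channel amplitude matrix.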
Fig. 4. (a) Grand average data are plotted into a black-and-white map with grey levels representing mean amplitude values of the second presentation period. Highest amplitude values were recorded at right-frontal and right-central electrode positions. (b) Plot of normalized amplitudes of all six stimulus conditions. Only neutrally intonated sentences differed significantly with respect to the potentials’ amplitudes (p<0.001) and distribution (p<0.001), as revealed by contrast analysis. (c) Average data map from the processing of F0- and duration-varied pairs of sentences (second presentation period). From Pihan et al. (1997).
Psychoacoustic studies demonstrated that mean F0 and F0-range modulate emotional intensity as continuous variables: high values correspond to high arousal, low values indicate low speaker activity (Ladd et al., 1985; Bergmann et al., 1988). In light of these observations, our activation patterns suggest that the evaluation of prosodic information performed by the RH might extend beyond specific emotions such as joy and sadness to a broader range of a speaker’s inner states, including the level of arousal. This interpretation supports earlier suggestions that components of emotional behaviour such as psychophysical activation and autonomic responses are mediated predominantly by RH mechanisms (for further information on this topic see Gainotti et al., 1993; Bradley and Lang, 2000). A corresponding right-lateralized activation pattern was observed for stimuli with pitch and temporal variation of intensity. However, these findings do not exclude a differential LH and RH involvement in pitch and temporal cue processing. Prosodic perception comprises complex acoustic analyses including temporal and
spectral components. An event-related design such as ours, in which subjects are not aware of the relevant acoustic cue, may provoke activation of all mechanisms involved in prosodic evaluation. Thus, possible lateralization effects of temporal and spectral cue processing may have been masked. Similar results were found in a functional magnetic resonance imaging (fMRI) study that used the same experimental design and identical stimuli. As shown in Fig. 5, the temporal pattern of hemodynamic responses revealed increasing lateralization to the right hemisphere during processing of the second sentence. In addition, activation of mesial regions of both hemispheres, including the anterior and posterior cingulate gyri as well as the supplementary motor area, could be demonstrated. Thus, fMRI yielded the same pattern of right-lateralized activation with frontal preponderance that was found in the DC experiment. These results support the suggestion that blood oxygen level dependent (BOLD) signals from the cortex and DC potentials arise from similar electrochemical processes (see Logothetis and Pfeuffer, 2004).
Fig. 5. Temporal sequence of fMRI activation across all subjects and stimulus conditions. Presentation of the stimuli, first sentence indicated in yellow and second sentence in red, is shown at the left of the time axis. The assumed period of corresponding fMRI activation, taking into account the haemodynamic delay of approximately 3–6 s, is shown at the right side of the time axis. Areas showing significant activation during the successive time intervals as compared to the baseline periods are projected on the lateral (left) and mesial (right) surface of the template brain. The pictogram at the very right shows the corresponding contrast function for each temporal interval. Successively analysed images are indicated with an arrow, baseline images as black bars and stimulation periods in the upper row (SPM96, n = 12, p<0.05 corrected, Z > 3.72, k > 8). From Wildgruber et al. (2002).
Inner speech: differential RH and LH involvement in prosody processing

The activation patterns described above, demonstrating a predominant involvement of the RH, were
stable in all subjects but one. This test person showed a bilateral activation with left-frontal preponderance. He indicated that he had repeated the stimuli in parallel with the ongoing presentation using inner speech. The test person assumed that
this manoeuvre had facilitated stimulus discrimination. Although it is known that lateralization effects can be masked if linguistic operations are performed concurrently with nonlinguistic tasks (Tompkins and Flowers, 1985), we were surprised that a comparatively easy and quickly automated task such as inner speech would abolish lateralization effects of affective prosody discrimination. Possibly, the strategy of this exceptional test person and the bilateral activation measured indicated a left hemisphere involvement in prosody processing. In order to evaluate the effect of inner speech on affective prosody perception, a second group of eight healthy right-handed subjects performed inner speech in addition to the discrimination task. They were instructed to repeat the wording of the utterance they were attending to in parallel to the presentation. As only four semantically different sentences were used, this task was immediately learned and completed without effort. Under additional inner speech demands, a balanced RH/LH activation pattern arose, with left-frontal preponderance and reverse responses to
pitch- and time-related information. Under inner speech, a generally higher cortical activation was elicited in response to time-varied as compared to pitch-manipulated sentences, whereas the reverse pattern had been observed during spontaneous discrimination of the same stimuli (Figs. 6(b) and (c), lower panel). Behaviourally, there was no significant change in recognition accuracy as compared to discrimination without inner speech. We suggested that the pronounced left hemisphere activation was not attributable to a repeated low-level linguistic performance such as ‘inner speech’. The following line of argument was developed: the cognitive function of repeating words ‘in our head’ has been conceptualized in terms of a phonological loop (Baddeley, 1995). It involves two components, a memory store retaining phonological information and an articulatory control process. Paulesu and co-workers (Paulesu et al., 1993) localized the phonological store to the left supramarginal gyrus and the subvocal rehearsal mechanism to Broca’s area. In our study, brain potentials during perception of the second sentence were analysed. Besides inner speech, there
Fig. 6. Grand average data of activation amplitudes: (a) spontaneous discrimination; (b) inner speech; (c) plot of normalized amplitudes averaged over all subjects, including values of the second presentation/analysis periods at frontal and temporoparietal electrode positions. Upper panel: mean activation of LH and RH during spontaneous discrimination (white rhombi) and inner speech (black rhombi). Lower panel: mean activation (average over LH and RH electrode values) during discrimination of F0- and duration-varied stimuli during spontaneous discrimination (inner speech: no) and inner speech (inner speech: yes). From Pihan et al. (2000).
was no need for further linguistic stimulus analysis, nor for a phonological store, during perception of the second sentence. Consequently, the differential brain responses towards frequency- and time-related information shown in Fig. 6(c) were probably not associated with phonological processing. As compared to spontaneous discrimination, prosodic evaluation under additional inner speech demands was paralleled by a shift of cortical involvement from the right hemisphere to bilateral processing with left-frontal preponderance. Stimulus pairs carrying temporal cues elicited higher activation when compared to pitch-varied sentences. As the duration of stressed vowels was altered relative to their absolute length, the manipulations did not change sentence rhythm or stress pattern; their only function was to signal differences in prosodic intensity. We suggested that inner speech was associated with bilateral prosodic processing and a differential weighting of the acoustic parameters used for evaluation of emotional content: a relative increase of LH activation mediated by temporal cues and a reduced RH processing of frequency-related information. Results from the inner speech experiment are in line with data from lesion studies which found that RH-damaged patients preferentially relied on duration cues to make affective judgements, whereas LH-damaged subjects appeared to make use of F0 information (Van Lancker and Sidtis, 1992). Left temporo-parietal lesions impaired gap detection and pattern perception for sequences of tones, while frequency perception was completely preserved; lesions of homologous regions of the RH had the opposite effect, with normal processing of temporal information (Robin et al., 1990). The left-frontal activation during inner speech was presumably driven by an inherent functional coupling of acoustic input and verbal output channels. This mechanism provides motor control information in an early phase of speech analysis, enabling subjects to shadow (i.e. to repeat as fast as possible) speech segments with latencies at the lower limit for the execution of motor gestures (Porter and Lubker, 1980). Shadowing studies indicated parallel processes of auditory analysis and generation of an articulatory gesture. Perceived speech is rapidly
represented by neural networks which provide an interpretation of the acoustic structure in terms of a motor programme. Other input channels, such as the visual one, do not seem to have the same coupling strength to verbal output. Shaffer (1975), for example, showed that auditory-vocal shadowing of continuous speech can be successfully combined with visually controlled copy-typing. In contrast, typing from heard continuous speech was not compatible with vocal reproduction of a written text, even for highly skilled audio-typists. The RH hypothesis of emotion processing would suggest an independent activation of the corresponding left- and right-sided neural networks involved in inner speech (which was performed without effort) and prosodic affective evaluation (which was difficult, as shown in Fig. 3). As a consequence, the left-frontal preponderance of brain activation observed in the inner speech experiment does not support the right hemisphere hypothesis. The data presented suggest bilateral processing of prosody, with LH involvement dependent on articulatory demands. They also indicate a possible articulatory representation of suprasegmental speech characteristics in the left hemisphere. If this is corroborated in future experiments, some results of lesion studies may require reinterpretation. For example, LH-damaged patients have been shown to improve in repetition and discrimination of affective prosody under reduced verbal-articulatory load (Ross et al., 1997). This was taken as support for right hemisphere dominance for prosodic processing. However, the improvement may also have occurred as a result of a decoupling of acoustic input and verbal output channels in the LH, facilitating stimulus processing in the nonaffected RH. Patients with LH or RH lesions might both be impaired in prosodic evaluation: the LH group by a disturbed use of temporal information, the RH group by an impaired processing of frequency patterns. However, as shown by Van Lancker and Sidtis (1992), preferential use of either parameter by LH- or RH-damaged patients was not sufficient for a good performance. This indicates that pitch range and duration parameters alone are not fully functional in mediating prosodic impressions or in determining differential RH and LH involvement.
Biological speech perception/production models that incorporate emotional communication face the same problem from different perspectives: the mechanisms by which suprasegmental and segmental aspects of speech are integrated into, or decomposed from, articulatory-phonatory programmes remain unclear. These processes are presumably located close to neural structures in the frontal (and perhaps parietal) lobe which are functionally connected to articulo-phonatory programmes. On the level of speech perception, an acoustic-motor interface system in the left hemisphere was suggested by Hickok and Poeppel (2000). However, besides phoneme processing, speech communication extends to suprasegmental aspects such as emotional intonation. The fact that we are able to speak or to sing a given text, and manage to do so in a happy or sad tone of voice, is presumably accomplished by a differential contribution of each hemisphere. As for speaking and singing, a recent fMRI study demonstrated differential lateralization effects during covert performance and outlined the motor cortex and anterior insula as functionally involved areas (Riecker et al., 2000). Further experiments are needed to investigate the functional significance and the neural substrate of the articulatory representation of suprasegmental aspects of perceived speech.
Processing of linguistic prosody mediated by pitch direction

Speech communication presumably comprises both a phylogenetically young linguistic system for the exchange of propositional information and an older system for the regulation of social behaviour by means of emotional communication (Arndt and Janney, 1991). Affective and linguistic prosodic information is mediated by the tone of voice via different topological features of the acoustic signal: a global pattern of pitch movement related to emotive expressions, and local characteristics corresponding to question versus statement intonation (in general at sentence-final location) or to contrastive stress patterns (for example, ‘hot ˈdog’ vs. ‘ˈhotdog’). These topological features are assumed to determine the lateralization of prosody
processing. As described in the first section of this chapter, the degree of LH involvement in prosodic processing was suggested to depend on the extent to which prosodic cues interact with segmental linguistic information (Van Lancker, 1980). In certain instances, statements and questions are differentiated by pitch contours diverging over large sentence segments. A study investigating the interdependence of sentence focus location and speech intonation demonstrated that in utterances with an initial focus, a clear divergence of question and statement intonations occurred after the initial word (Eady and Cooper, 1986). In statements, the F0 value dropped to a low level for the remainder of the sentence, whereas in questions, F0 showed rather high and increasing values for subsequent words. These findings were replicated by Pell (2001), who demonstrated a stable pattern of fall/rise of the F0 contour for statements and questions in utterances with an initial sentence focus and various affective intonations. These examples demonstrate that an intonational feature comprising a whole utterance (here, F0 contour) may also appear in a linguistic context and in parallel to other general acoustic characteristics indicating speaker affect. Thus, the separation of intonational features into focal versus global, mapped onto linguistic versus affective perception, appears inadequate, as it does not consider the overlapping linguistic and emotive functions of intonation in speech communication. Conceivably, emotional intonations influence the linguistic perception of pitch direction, for example, the discrimination of question versus statement intonation in the tone of voice. Investigation of this topic promises to yield further insight into the neural implementation of prosody processing. Spontaneous discrimination of affective prosody has been shown to provoke a strong right-lateralized activation. Acoustic cue hypotheses as well as differential specialization theory suggest a right hemisphere processing of the pitch contour mediating linguistic information (statement or question intonation). Emotional effects of intonation contour may or may not augment right hemisphere involvement. A bilateral or left-lateralized activation pattern would indicate participation of the left hemisphere and would support the functionalist hypothesis. The following study investigated the
cortical activation during linguistic perception of the pitch direction of sentences that carried a neutral or emotional intonation (happy, fearful). A total of three semantically different utterances were used. They were recorded from two opera singers with intonational focus on the first word, according to Eady and Cooper (1986) and Pell (2001). Stimulus pairs were assembled that differed exclusively with respect to pitch direction. Acoustic differences were created by means of digital resynthesis of the original utterances. In order to create perceptual contrasts of statement versus question intonations, F0 values of successive voiced sentence segments were changed by either a stepwise increase or a stepwise decrease of fundamental frequency values (Fig. 7). The relative rise or fall of the pitch contour as compared to sentence-initial floor values was not significantly different between fearful, neutral and happy items. Sixteen healthy subjects listened to pairs of sentences with happy, fearful or neutral speech melody. Stimulus pairs were presented in pseudorandomized order. As compared to the first sentence, the second item of each pair varied in pitch direction, constituting an up versus down or an up versus horizontal contrast, respectively. Wording and emotional intonation were identical. At perceptual evaluation, the sentence with a continuously rising contour was expected to represent an intonational question. Subjects were asked to identify this item. During stimulus presentation, DC potentials were
recorded from 64 nonpolarizable AgCl electrodes. The recordings were averaged and analysed by relating the DC amplitudes during the presentation periods to a baseline taken from a 1.2 s prestimulus period. Behavioural data showed that subjects were able to perceive intonational question and statement contrasts from pitch direction. The highest discrimination rates were observed for the neutrally intoned sentence pairs. Fearful utterances were less consistently recognized, and items with happy intonation had the lowest number of correct responses. Acoustic stimulus analysis suggested a perceptual interaction between pitch variability, associated with the type of emotional intonation, and pitch direction, indicating intonational question/statement contrasts. Higher pitch variability was paralleled by lower linguistic recognition accuracy: neutral sentences were easier to discriminate than fearful items, and happy stimuli were the most difficult to discriminate (Fig. 8). Analysis of DC potentials during processing of the first utterance revealed a significant main effect of emotion (Fig. 9). The highest cortical activation was observed when stimuli with happy intonation were discriminated. Processing of neutral sentence pairs was associated with the lowest brain activation. No significant difference was observed between the left and right hemispheres. A post-hoc analysis performed on frontal, central and temporoparietal regions yielded significant effects of emotion over
Fig. 7. Digital resynthesis of F0 direction. Acoustic signal (upper panel) and three synthetic pitch contours (lower panel) of the test sentence ‘She did not believe the entire story’, produced with neutral intonation (female voice). Voiced signal portions are indicated by bars under the corresponding acoustic signal. Numbers at the end of the pitch contours indicate sentence-final F0 differences (in Hz).
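The stepwise resynthesis of F0 direction shown in Fig. 7 can be schematized as follows. This is an illustrative sketch under our own assumptions (a precomputed per-frame F0 track, given voiced-segment boundaries, and an invented step size), not the actual resynthesis procedure.

```python
import numpy as np

def shift_pitch_direction(f0, voiced_segments, step_hz=8.0, direction=+1):
    """Impose a rising (+1), falling (-1) or horizontal (0) contour (sketch).

    f0: per-frame F0 track in Hz (0 where unvoiced); voiced_segments:
    list of (start, end) frame indices of successive voiced portions;
    each successive segment is shifted by one more step of step_hz.
    """
    out = f0.copy()
    for k, (a, b) in enumerate(voiced_segments):
        out[a:b] += direction * step_hz * (k + 1)
    return out

# Example with a flat 200 Hz contour at 10 ms frames (invented values):
f0 = np.full(120, 200.0)
segs = [(0, 30), (40, 80), (90, 120)]
question = shift_pitch_direction(f0, segs, direction=+1)
statement = shift_pitch_direction(f0, segs, direction=-1)
```

The shifted contour would then be re-imposed on the recording with a resynthesis technique such as PSOLA, leaving wording, duration, and intensity identical within each stimulus pair.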
Fig. 8. Left and middle: acoustic stimulus analyses. Bars indicate normalized mean values for pitch (mean F0) and pitch variability (mean F0 SD). Utterances with neutral intonation were taken as a reference in order to control for gender-related speaker effects. Both acoustic parameters yielded significant effects of EMOTION in an ANOVA performed on ranked values (both p<0.01). Right: accuracy of linguistic discrimination. Mean value of % correct answers plus standard deviations.
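The normalization described in the Fig. 8 caption amounts to expressing each speaker's emotional measures relative to that speaker's neutral values; a minimal sketch, with invented placeholder numbers rather than the study's data:

```python
# Hypothetical per-speaker acoustic measures (mean F0 and F0 SD in Hz);
# the numbers are made-up placeholders for illustration only.
measures = {
    ("speaker_f", "neutral"): {"mean_f0": 210.0, "f0_sd": 18.0},
    ("speaker_f", "happy"):   {"mean_f0": 260.0, "f0_sd": 42.0},
    ("speaker_m", "neutral"): {"mean_f0": 120.0, "f0_sd": 12.0},
    ("speaker_m", "fearful"): {"mean_f0": 150.0, "f0_sd": 30.0},
}

def normalize_to_neutral(measures):
    """Express each measure relative to the same speaker's neutral value."""
    out = {}
    for (spk, emo), vals in measures.items():
        ref = measures[(spk, "neutral")]
        out[(spk, emo)] = {k: v / ref[k] for k, v in vals.items()}
    return out
```

Using each speaker's own neutral production as the denominator removes level differences between the male and female voice before values are compared across emotions.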
central and temporoparietal regions, as indicated in Fig. 9. Again, no significant difference in activation amplitude was observed when left and right hemisphere regions were compared. Previous research from our lab using DC potentials had corroborated RH processing of affective prosody when subjects evaluated valence and emotional intensity (Pihan et al., 1997). In the current study, stimulus processing lateralized to the right hemisphere was not observed. In line with other studies, our data reflect bihemispheric involvement (Heilman et al., 1975, 1984; Bryan, 1989; Cancelliere and Kertesz, 1990; Pell and Baum, 1997b; Pihan et al., 1997, 2000; Wildgruber et al., 2002; Kotz et al., 2003; Pell, 2006). Although task demands required focusing on linguistic aspects of intonation, a significant activation main effect of emotional intonation was observed, which corresponded to the behavioural results reported above. The data indicate a bihemispheric competence in pitch pattern processing and speak against a strong version of the acoustic cue or differential specialization hypothesis. Post-hoc analysis revealed that the main effect of emotion primarily arose from central and temporoparietal regions. What might be the function of a posterior neural network when concurrent linguistic and emotional pitch effects are processed? In the present study, subjects were required to make
Fig. 9. Cortical activation plotted separately for happy, neutral and fearful intonations (normalized values) at frontal (left), central (middle) and temporoparietal (right) electrode positions, as indicated in the maps. Post-hoc analysis of the emotion main effect in the respective regions: (n.s.) = nonsignificant; (*) = significant.
judgements on linguistic functions of pitch direction. Presumably, interference occurred from emotional pitch effects. It can be assumed that posterior left and right hemisphere networks monitor task-relevant linguistic (or emotive) pitch effects. The main effect of emotion observed in this study may have resulted from differential degrees of task difficulty in performing the discrimination task. Posterior brain regions that become differentially involved in linguistic versus affective intonation processing were recently outlined in an fMRI study (Gandour et al., 2003). However, the specific contribution of left hemisphere regions to linguistic processing of pitch direction needs further clarification.
Conclusion

Studies on the perception of affective prosody using DC potentials corroborated that the RH has a predominant role in processing emotions from the tone of voice, irrespective of emotional valence. However, the right hemisphere holds a merely relative dominance, both for the processing of F0 and for the evaluation of the emotional significance of sensory input. The role of the left hemisphere is at least complementary for the analysis of intonation. The degree of LH involvement seems to be determined by a number of speech- and language-related conditions, such as: the articulatory demands on the perceiver of prosody (possibly also the poser); the power of temporal cues mediating prosodic meaning; and the propensity of prosody to act on the segmental level in order to modulate word or sentence meaning. In addition, posterior neural networks in the left and right hemisphere may monitor task-dependent linguistic (or emotive) pitch effects. The specific contribution of the LH to linguistic processing of pitch direction is not clear and requires further investigation.
Abbreviations

DC	direct current
F0	fundamental frequency
LH	left hemisphere
RH	right hemisphere
References

Altenmüller, E. (1986) Hirnelektrische Korrelate der cerebralen Musikverarbeitung beim Menschen. Eur. Arch. Psychiat. Neurol. Sci., 235: 342–354.
Altenmüller, E. and Gerloff, Ch. (1998) Psychophysiology and the EEG. In: Niedermeyer, E. and Lopes da Silva, F. (Eds.), Electroencephalography. Williams and Wilkins, Baltimore, pp. 637–655.
Altenmüller, E., Kriechbaum, W., Helber, U., Moini, S., Dichgans, J. and Petersen, D. (1993) Cortical DC-potentials in identification of the language-dominant hemisphere: linguistic and clinical aspects. Acta Neurochir., 56: 20–33.
Arndt, H. and Janney, R.W. (1991) Verbal, prosodic, and kinesic emotive contrasts in speech. J. Pragmat., 15: 521–549.
Baddeley, A. (1995) Working memory. In: Gazzaniga, M.S. (Ed.), The Cognitive Neurosciences. MIT Press, Cambridge, MA, pp. 755–764.
Banse, R. and Scherer, K.R. (1996) Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol., 70: 614–636.
Belin, P., Zilbovicius, M., Crozier, S., Thivard, L., Fontaine, A., Masure, M.C. and Samson, Y. (1998) Lateralization of speech and auditory temporal processing. J. Cogn. Neurosci., 10: 536–540.
Bergmann, G., Goldbeck, T. and Scherer, K.R. (1988) Emotionale Eindruckswirkung von prosodischen Sprechmerkmalen. Z. Exp. Angew. Psychol., 35: 167–200.
Borod, J.C. (1992) Interhemispheric and intrahemispheric control of emotion: A focus on unilateral brain damage. J. Consult. Clin. Psychol., 60: 339–348.
Bradley, M.M. and Lang, P.J. (2000) Measuring emotion: Behavior, feeling, and physiology. In: Lane, R.D. and Nadel, L. (Eds.), Cognitive Neuroscience of Emotion. Oxford University Press, New York, pp. 242–276.
Bryan, K.L. (1989) Language prosody and the right hemisphere. Aphasiology, 3: 285–299.
Cancelliere, A.E.B. and Kertesz, A. (1990) Lesion localization in acquired deficits of emotional expression and comprehension. Brain Cogn., 13: 133–147.
Doherty, C.P., West, W.C., Dilley, L.C., Shattuck-Hufnagel, S. and Caplan, D. (2004) Question/statement judgments: an fMRI study of intonation processing. Hum. Brain Mapp., 23: 85–98.
Eady, S.J. and Cooper, W.E. (1986) Speech intonation and focus location in matched statements and questions. J. Acoust. Soc. Am., 80: 402–415.
Frick, R.W. (1985) Communicating emotion: The role of prosodic features. Psychol. Bull., 97: 412–429.
Frijda, N.H. (1969) Recognition of emotion. In: Berkowitz, L. (Ed.), Advances in Experimental Social Psychology. Academic Press, New York, pp. 167–223.
Gainotti, G., Caltagirone, C. and Zoccolotti, P. (1993) Left/right and cortical/subcortical dichotomies in the neuropsychological study of human emotions. Cogn. Emotion, 7: 71–93.
Gandour, J., Wong, D., Dzemidzic, M., Lowe, M., Tong, Y. and Li, X. (2003) A cross-linguistic fMRI study of perception of intonation and emotion in Chinese. Hum. Brain Mapp., 18: 149–157.
Heilman, K.M., Bowers, D., Speedie, L. and Coslett, H.B. (1984) Comprehension of affective and nonaffective prosody. Neurology, 34: 917–921.
Heilman, K.M., Scholes, R. and Watson, R.D. (1975) Auditory affective agnosia. Disturbed comprehension of affective speech. J. Neurol. Neurosurg. Psychiatry, 38: 69–72.
Hickok, G. and Poeppel, D. (2000) Towards a functional neuroanatomy of speech perception. Trends Cogn. Sci., 4: 131–138.
Imaizumi, S., Mori, K., Kiritani, S., Hosoi, H. and Tonoike, M. (1998) Task-dependent laterality for cue decoding during spoken language processing. NeuroReport, 9: 899–903.
Johnsrude, I.S., Penhune, V.B. and Zatorre, R.J. (2000) Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123: 155–163.
Keidel, W.D. (1971) DC-potentials in the auditory evoked response in man. Acta Otolaryngol., 71: 242–248.
Kotz, S.A., Meyer, M., Alter, K., Besson, M., von Cramon, D.Y. and Friederici, A.D. (2003) On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang., 86: 366–376.
Ladd, D.R., Silverman, K.E.A., Tolkmitt, F., Bergmann, G. and Scherer, K.R. (1985) Evidence for the independent function of intonation contour type, voice quality, and F0 range in signaling speaker affect. J. Acoust. Soc. Am., 78: 435–444.
Lang, W., Lang, M., Podreka, I., Steiner, M., Uhl, F., Suess, E., Muller, C. and Deecke, L. (1988) DC-potential shifts and regional cerebral blood flow reveal frontal cortex involvement in human visuomotor learning. Exp. Brain Res., 71: 353–364.
Logothetis, N.K. and Pfeuffer, J. (2004) On the nature of the BOLD fMRI contrast mechanism. Magn. Reson. Imag., 22: 1517–1531.
Paulesu, E., Frith, C.D. and Frackowiak, R.S.J. (1993) The neural correlates of the verbal component of working memory. Nature, 362: 342–345.
Pell, M.D. (1998) Recognition of prosody following unilateral brain lesion: influence of functional and structural attributes of prosodic contours. Neuropsychologia, 36: 701–715.
Pell, M.D. (2001) Influence of emotion and focus location on prosody in matched statements and questions. J. Acoust. Soc. Am., 109: 1668–1680.
Pell, M.D. (2006) Cerebral mechanisms for understanding emotional prosody in speech. Brain Lang., 96: 221–234.
Pell, M.D. and Baum, S.R. (1997a) The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain Lang., 57: 80–99.
Pell, M.D. and Baum, S.R. (1997b) Unilateral brain damage, prosodic comprehension deficits, and the acoustic cues to prosody. Brain Lang., 57: 195–214.
Pihan, H., Ackermann, H. and Altenmüller, E. (1997) The cortical processing of perceived emotion: A DC-potential study on affective speech prosody. NeuroReport, 8: 623–627.
Pihan, H., Altenmüller, E., Hertrich, I. and Ackermann, H. (2000) Cortical activation patterns of affective speech processing depend on concurrent demands on the subvocal rehearsal system: a DC-potential study. Brain, 123: 2338–2349.
Porter, R.J. and Lubker, J.F. (1980) Rapid reproduction of vowel-vowel sequences: evidence for a fast and direct acoustic-motoric linkage in speech. J. Speech Hear. Res., 23: 576–592.
Riecker, A., Ackermann, H., Wildgruber, D., Dogil, G. and Grodd, W. (2000) Opposite hemispheric lateralization effects during speaking and singing at motor cortex, insula and cerebellum. NeuroReport, 11: 1997–2000.
Robin, D.A., Tranel, D. and Damasio, H. (1990) Auditory perception of temporal and spectral events in patients with focal left and right cerebral lesions. Brain Lang., 39: 539–555.
Rockstroh, B., Elbert, T., Canavan, A., Lutzenberger, W. and Birbaumer, N. (1989) Slow Cortical Potentials and Behaviour. Urban & Schwarzenberg, Munich.
Ross, E.D., Thompson, R.D. and Yenkosky, J. (1997) Lateralization of affective prosody in brain and the callosal integration of hemisphere language functions. Brain Lang., 56: 27–54.
Schlanger, B.B., Schlanger, P. and Gerstmann, L.J. (1976) The perception of emotionally toned sentences by right-hemisphere damaged and aphasic subjects. Brain Lang., 3: 396–403.
Shaffer, L.H. (1975) Multiple attention in continuous verbal tasks. In: Rabbit, P.M.A. and Dornic, S. (Eds.), Attention and Performance V. Academic Press, New York.
Stiller, D., Gaschler-Markefski, B., Baumgart, F., Schindler, F., Tempelmann, C., Heinze, H.J. and Scheich, H. (1997) Lateralized processing of speech prosodies in the temporal cortex: a 3-T functional magnetic resonance imaging study. MAGMA, 5: 275–284.
Tervaniemi, M. and Hugdahl, K. (2003) Lateralization of auditory-cortex functions. Brain Res. Brain Res. Rev., 43: 231–246.
Tischer, B. (1993) Äusserungsinterne Änderungen des emotionalen Eindrucks mündlicher Sprache: Dimensionen und akustische Korrelate der Eindruckswirkung. Z. Exp. Angew. Psychol., XL, 4: 644–675.
Tompkins, C.A. and Flowers, C.R. (1985) Perception of emotional intonation by brain-damaged adults: the influence of task processing levels. J. Speech Hear. Res., 28: 527–538.
Van Lancker, D. (1980) Cerebral lateralization of pitch cues in the linguistic signal. Papers in Linguistics: Int. J. Hum. Commun., 13: 200–277.
Van Lancker, D. and Sidtis, J.J. (1992) The identification of affective-prosodic stimuli by left- and right-hemisphere-damaged subjects: all errors are not created equal. J. Speech Hear. Res., 35: 963–970.
Walter, W.G., Cooper, R., Aldridge, V.J., McCallum, W.C. and Winter, A.L. (1964) The contingent negative variation. An electrical sign of significance of association in the human brain. Nature, 203: 380–384.
Wildgruber, D., Hertrich, I., Riecker, A., Erb, M., Anders, S., Grodd, W. and Ackermann, H. (2004) Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation. Cereb. Cortex, 14: 1384–1389.
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M. and Grodd, W. (2002) Dynamic brain activation during processing of emotional intonation: influence of acoustic parameters, emotional valence, and sex. Neuroimage, 15: 856–869.
Wong, P.C., Parsons, L.M., Martinez, M. and Diehl, R.L. (2004) The role of the insular cortex in pitch pattern perception: the effect of linguistic contexts. J. Neurosci., 24: 9153–9160.
Zatorre, R.J. and Belin, P. (2001) Spectral and temporal processing in human auditory cortex. Cereb. Cortex, 11: 946–953.
Zatorre, R.J., Belin, P. and Penhune, V.B. (2002) Structure and function of auditory cortex: music and speech. Trends Cogn. Sci., 6: 37–46.
Zatorre, R.J., Evans, A.C., Meyer, E. and Gjedde, A. (1992) Lateralization of phonetic and pitch discrimination in speech processing. Science, 256: 846–849.
CHAPTER 15
Lateralization of emotional prosody in the brain: an overview and synopsis on the impact of study design

Sonja A. Kotz1,2, Martin Meyer3 and Silke Paulmann1

1 Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, 04103 Leipzig, Germany
2 Day Care Clinic of Cognitive Neurology, University of Leipzig, Leipzig, Germany
3 Department of Neuropsychology, Institute for Psychology, University of Zürich, Zürich, Switzerland
Abstract: Recently, research on the lateralization of linguistic and nonlinguistic (emotional) prosody has experienced a revival. However, neither neuroimaging nor patient evidence draws a coherent picture substantiating right-hemispheric lateralization of prosody, and of emotional prosody in particular. The current overview summarizes positions and data on the lateralization of emotion and emotional prosodic processing in the brain and proposes that: (1) the realization of emotional prosodic processing in the brain is based on differentially lateralized subprocesses and (2) methodological factors can influence the lateralization of emotional prosody in neuroimaging investigations. The latter evidence reveals that emotional valence effects are strongly right lateralized in studies using compact blocked presentation of emotional stimuli. In contrast, data obtained from event-related studies are indicative of bilateral or left-accented lateralization of emotional prosodic valence. These findings suggest a strong interaction between language and emotional prosodic processing.

Keywords: emotion; prosody; lateralization; fMRI; design; patients
Emotion and its supporting brain network

Previous research postulates that emotions are processed in a complex brain network including the orbitofrontal cortex, the amygdala, the anterior cingulate cortex, and temporal and subcortical structures (Davidson, 2000). However, not every study of emotional processing reports activation of all these brain regions. Here, differentiating accounts of emotion processing in the brain may shed some light on such differences. Rolls (1999), in a dimensional account of emotion, classified emotions according to their different reinforcement contingencies and to whether
the reinforcer is positive or negative. This classification model includes emotions associated with the omission or termination of reinforcers, and explains an emotion along a valence (positive vs. negative) and an emotional intensity (high vs. low) dimension, shaping the assumption that negative and positive emotions may be processed in partially different neural systems. Pharmacological investigations in the late sixties initially coined the valence hypothesis (Rosadini and Rossi, 1967). However, it was in particular the studies by Davidson (1992) that shaped a lateralization hypothesis of emotion processing on the basis of the valence dimension of emotion. The valence hypothesis conceptualizes that emotion processing is anchored in both hemispheres, but that each hemisphere is specialized for one valence. Several authors proposed that
the left hemisphere regulates positive emotions, while the right hemisphere drives negative emotions (Robinson and Starkstein, 1989; Davidson, 1992; Gur et al., 1994). It has also been noted in a range of patient investigations that, next to the cortical representation of emotion, additional brain structures such as fronto-striatal areas are involved in emotion processing (Sackheim et al., 1982; Morris et al., 1996; Paradiso et al., 1999). However, lateralization of function within these areas has not yet been fully addressed. In contrast to the valence hypothesis, the right hemisphere hypothesis posits that the right hemisphere dominates the left hemisphere for all emotion processing. For example, early behavioral studies reported enhanced emotional expressiveness of the left side of the face (Sackheim et al., 1978). Strauss and Moscovitch (1981) reported superior identification of emotional facial expression when faces were presented in the left rather than the right visual hemifield. These results were substantiated by evidence from Bowers et al. (1985), who reported that the identification of faces was disrupted after right hemisphere rather than left hemisphere strokes. Lastly, data from split-brain patients showed a response to emotional stimuli but no capacity to verbally describe them (Gazzaniga, 1988). Regarding prosody, another multipurpose means of expression, clinical studies also buttressed the notion that the affective components of language are a dominant function of the right hemisphere. The first clinical study to formally address the neural organization of emotional speech tested the prediction that damage to the right hemisphere seriously disrupts the comprehension of affective aspects of speech (Heilman et al., 1975). Its authors observed that right brain-damaged patients were severely impaired when asked to recognize emotions inserted into linguistically neutral statements. In a series of seminal publications, Ross provided observations supporting the view that the "functional-anatomic organization in the right hemisphere mirrors that of the propositional language in the left hemisphere" (Ross, 1981, p. 561). Examinations of patients suffering from focal right hemisphere lesions indicated deficient receptive and expressive prosody as well as impaired comprehension of emotional gestures.
Predicated on his investigations, Ross coined the notion of "aprosodia", encompassing all sorts of disorders of affective language (equivalent to the term "aphasia" covering various disturbances of propositional language). To further substantiate this view, a subsequent publication by Ross et al. (1981) reported the case study of a patient suffering from right frontoparietal opercular damage who displayed impaired expression of all alterations in prosody that signify emotional speech. On the basis of these clinical studies, Ross concluded that the right hemisphere is dominant for organizing the affective-prosodic components of language and gestural behavior (Ross, 1985). However, all strong lateralization hypotheses have been challenged (Caltagirone et al., 1989; Kowner, 1995). In addition, several variants of the right hemisphere hypothesis have emerged. Some authors put forward that the right hemisphere engages in the perception and expression of emotion, but does not support the experience of an emotion (Adolphs et al., 1999). Murphy et al. (2003) noted a return in current emotion research to the concept of individual neural systems coding distinct dimensions of emotions rather than the concept of an integrated neural emotional system.

Tuning into emotional tone — the lateralization of emotional prosody

Motivated by the proposed lateralization hypotheses for general emotion processing, lateralization hypotheses for emotional prosodic processing were developed. Taking voice and acoustic properties as well as valence into consideration, three main hypotheses have been proposed. Before each of these hypotheses is discussed in turn, a brief discussion of acoustic voice properties will be presented.

Voice and acoustic properties of emotion

The German saying "Der Ton macht die Musik" (literally, "the tone makes the music") means that the tone defines how one understands an expression. Thus, this saying often serves as an excellent introduction to prosody for a nonlinguist. Interestingly, the saying suits the purpose of emotional communication as well, as it points to
the social relevance of appropriately using emotional prosodic cues in daily communication. For example, we use a variety of emotional tones that give the correct meaning to what we are saying. A simple example is the difference between the statement "Tim is crazy" spoken in a positive and in a negative tone of voice. Dependent on the acoustic modulation of the voice articulating this sentence, the interpretation of the utterance changes, in that it could be evaluated as spoken in a happy or a negative tone. Modulating the emotional expression of speech is thus dependent on physical regulation of the voice. According to Scherer (1989), this entails three physiological processes: respiration, which provides the flow of air; phonation, which transforms the airflow from the lungs into sound; and the movement of the articulators (jaw, tongue, and palate), which modulates speech sounds. While not investigating the lateralization of vocalization per se, Scherer (1989; but see Belin et al., 2000 for a more recent opinion on voice processing in general) proposes that the processes regulating emotional vocalization are controlled by the limbic system, a circuitry composed of cortical and subcortical areas which are supposed to support emotional processes. The effects of emotional arousal on speech production are primarily due to tonic activation of the autonomic and somatic nervous systems. Predicated on the assumption that each emotion has its own acoustic profile, Banse and Scherer (1996) analyzed the acoustic parameters of vocal cues to define acoustic profiles of vocal emotional expression, as the modulation of vocal cues is a good indicator of physiological arousal. For example, the vocalization of anger reveals a higher fundamental frequency than the vocalization of sadness. Furthermore, intensity measurements reveal louder vocalizations for happy than for sad utterances. However, the issue of an acoustic description of emotion is not trivial. The particular quality of prosodic emotions cannot solely be defined acoustically. In addition, it has been demonstrated that decoding emotional information does not imperatively involve limbic regions. It rather appears that individuals are capable of properly evaluating the emotional color of a spoken utterance with only frontotemporal regions being recruited (Kotz et al., 2003a).
Lateralization hypotheses — clinical evidence

Going beyond the acoustic characteristics of emotional vocalization and their potential brain basis in the limbic system, the last 30 years of emotional prosodic research reflect a quest for the cortical representation of emotion vocalization. Comparable to the positions put forward in emotion research, three main hypotheses have been formulated. These hypotheses are the result of clinical research and, more recently, neuroimaging research on emotional prosody, and will be discussed in turn. The right hemisphere hypothesis is based on primarily receptive, but also expressive, clinical studies. According to the proposal by Ross (1981), the right hemisphere exclusively processes prosodic cues. He stipulated that the right inferior frontal cortex serves expressive prosody, whereas the right posterior superior temporal cortex mediates receptive prosody. Even though Ross's model was based on sparse empirical data, even to date some authors claim that both linguistic and nonlinguistic (emotional) prosody are processed in the right hemisphere (Bryan, 1989; Dykstra et al., 1995). However, other studies that have investigated both types of prosody show a right hemisphere preference only for emotional prosody (Blonder et al., 1991; Borod, 1993; Starkstein et al., 1994; but see Weintraub et al., 1981; Bradvik et al., 1991 for linguistic prosody). Early patient evidence from identification tasks (Heilman et al., 1975), discrimination tasks (Tucker et al., 1977), and recognition tasks (Bowers et al., 1987) indicates that right temporoparietal lesions result in emotional prosodic deficits. A recent case report using intrasurgical electrocortical stimulation describes a particular sensitivity of the right frontocentral operculum for prosody (Montavont et al., 2005). However, there is ample clinical evidence that challenges the right hemisphere hypothesis for emotional prosodic processing (Van Lancker and Sidtis, 1992; Darby, 1993). Accordingly, the functional hypothesis claims that, dependent on linguistic load, prosodic processing is lateralized either to the left or to the right hemisphere (Van Lancker, 1980). For example, data from sentence-level linguistic prosodic processing show
selective influence of the left hemisphere (Van Lancker, 1980; Emmorey, 1987). Thus, the relation between lateralization and prosodic processing forms a continuum: the more linguistic the emphasis of the task, the more pronounced the left hemisphere involvement. In turn, one could speculate that the smaller the linguistic load, the stronger the right hemisphere involvement. That is, if the task emphasizes emotion, as in emotional categorization, the right hemisphere predominates (Bowers et al., 1987). As a consequence, lateralization of function could be influenced by attention to task over the course of an experiment. Van Lancker and Sidtis (1992) as well as Zatorre et al. (2002) put forward a more detailed functional lateralization hypothesis. This so-called parameter dependence hypothesis states that prosody perception is governed by acoustic parameters such as pitch, duration, and intensity. It was found that pitch is preferentially processed in the right hemisphere, whereas duration, rhythm, and intensity are primarily processed in the left hemisphere (see Van Lancker and Sidtis, 1992; Sidtis and Van Lancker-Sidtis, 2003). Such a hypothesis also calls into question whether lateralization due to acoustic properties is prosody-specific (linguistic or nonlinguistic; see Zatorre, 1988; Ouellette and Baum, 1993) or linguistic at all. Studies using nonlinguistic tasks revealed a left hemisphere superiority for the processing of temporal structure cues (Carmon and Nachshon, 1971; Robinson and Starkstein, 1990), and a right hemisphere preference for pitch processing (Robin et al., 1990; Sidtis and Feldmann, 1990; Zatorre et al., 1994). Lastly, the valence hypothesis of emotional prosodic processing was put forward, stating that the lateralization of emotional prosody can be influenced by the specific emotional intonation. This hypothesis is based on clinical evidence on depression. Davidson et al. (1999) reported that patients with a depressive mood process positive emotions in the left hemisphere, but negative emotions in the right hemisphere. However, Ross et al. (1981, 1997) as well as Pell and Baum (1997) reported data speaking against a valence-dependent lateralization of emotional prosody. According to these authors, there is no straightforward clinical evidence supporting the valence hypothesis.
This short survey of empirical clinical evidence and of the lateralization hypotheses developed for emotional prosodic processing clearly shows that clinical research on emotional and linguistic prosody provides very little convergent evidence that prosody is solely processed in the right hemisphere (see also Baum and Pell, 1999 for similar conclusions). Plausible factors driving such divergence in clinical research include (1) variable lesion locations and lesion sizes, (2) acute or chronic state at testing, (3) secondary deficits such as depression and neglect, (4) stimulus characteristics (Van Lancker, 1980), (5) ill-defined concepts, and (6) task complexity. For example, Tompkins and Flowers (1985) showed that task complexity in an emotional prosodic judgment correlated with lateralization: the more complex a task was, the greater the left-hemispheric involvement during emotional prosodic processing. On the basis of these data, the following questions remain to be solved in neuroimaging investigations of emotional prosodic processing in healthy participants: (1) what is the functional specification of the right hemisphere, potentially relevant for prosodic processing in general (both linguistic and nonlinguistic), and (2) what substantiates a bilateral prosodic network that also includes subcortical and basal temporal structures?
Lateralization hypotheses — neuroimaging evidence

Next to the question of the interhemispheric specialization of emotional prosodic processing, intrahemispheric specialization is a critical factor in our understanding of the brain bases of emotional prosodic processing. As mentioned earlier, the perception of emotional prosody may rely on a distributed brain network. On the basis of patient data, right posterior temporal (parietal) brain regions still seem to be the prime candidate for emotional prosodic processing. However, neuroimaging studies comparing neutral against emotional prosody report bilateral activation of the posterior part of the superior temporal cortex (Phillips et al., 1998; Kotz et al., 2003a; Grandjean et al., 2005; but see Wildgruber et al., 2002, 2004, 2005; Mitchell et al., 2003). As discussed
elsewhere (Schirmer and Kotz, 2006), the latter activation overlaps with areas reported for voice recognition (Belin et al., 2004), suggesting that listeners potentially direct attention toward emotional tone (as realized by the voice) rather than neutral tone. Next to the involvement of posterior temporal areas, some patient evidence links right frontoparietal (Starkstein et al., 1994), right orbitofrontal (Hornak et al., 1996), and right frontal brain regions (Breitenstein et al., 1998) to the recognition of emotional prosody. Significant blood flow changes in the right dorsal and ventral prefrontal and orbitofrontal cortex have been reported during emotional prosodic judgment (George et al., 1996; Morris et al., 1999; Gandour et al., 2003; but see Buchanan et al., 2000; Kotz et al., 2003a; Mitchell et al., 2003; Wildgruber et al., 2004, 2005 for bilateral activation). Furthermore, subcortical structures, such as the basal ganglia, seem to be involved in processing emotional prosody (Breitenstein et al., 1998, 2001; Pell and Leonard, 2003; Sidtis and Van Lancker-Sidtis, 2003). To date, only a few fMRI experiments report enhanced activation of the basal ganglia during the processing of vocal verbal (Kotz et al., 2005) and nonverbal emotions (Morris et al., 1999; Wildgruber et al., 2002; Kotz et al., 2003a).
Factors influencing the lateralization of emotional prosody

Given the short survey of clinical and neuroimaging evidence on the lateralization of emotional prosodic processing, one of the currently unsolved questions in the literature is: what drives the lateralization of emotional prosody? As argued by Davidson and Irwin (1999), divergent results could be the consequence of conceptual and methodological differences. Here, we focus on two possible factors, namely the differentiation of emotional prosodic processing and methodology, and substantiate the methodological factor by presenting some of our recent data that contrast valence effects under two different fMRI designs: event-related vs. blocked presentation (Donaldson and Buckner, 2001).
One function or multiple processing steps?

Recently, we argued that defining emotional prosody as a multistep process rather than a holistic concept would render the lateralization of emotional prosody a less controversial issue (for more elaborate information please refer to Schirmer and Kotz, 2006). Dividing emotional prosodic processing into three subprocesses, that is, (1) acoustic analysis, (2) derivation of emotional meaning based on acoustic cues, and (3) evaluation processes, puts forward two critical aspects influencing lateralization: (1) lateralization may vary as a function of the proposed subprocesses, and (2) lateralization can be substantiated by a brain network supporting these subprocesses beyond the right hemisphere. Taking both neuroimaging evidence, with its high spatial resolution, and event-related brain potential (ERP) evidence, with its high temporal resolution, into consideration, we (Schirmer and Kotz, 2006) proposed the following: (1) acoustic analysis of the temporal and frequency properties of a signal is supported by left and right primary and secondary auditory cortices, respectively (Zatorre and Belin, 2001; Liegeois-Chauvel et al., 2004; Zaehle et al., 2004; but see Boemio et al., 2005 for a more fine-grained analysis of the lateralization of temporal properties of the acoustic signal). As reported in two recent fMRI studies, lateralization of basic acoustic information such as temporal information may occur when the integration of such information is in focus. Increased activation in the left posterior auditory cortex was reported as a consequence of a perceptual switch from nonspeech to speech perception requiring the integration of temporal information (Dehaene-Lambertz et al., 2005; Meyer et al., 2005). When frequency or intensity information is discriminated (i.e., contrasting two tones), a more extensive right-lateralized superior temporal network including the posterior superior temporal gyrus is recruited (Liebenthal et al., 2003). The latter evidence points to the fact that the activation may not result from complex acoustic cue analysis per se, but may occur as a function of the discrimination or contrast realized in an experimental setup. The role of task demands (i.e., the allocation of attention) in this context awaits further
neuroimaging investigations; however, some recent ERP evidence points to such a possibility (Rinne et al., 2005). (2) Attributing emotional significance to a stimulus when comparing emotional and neutral acoustic signals results in activations along the bilateral posterior superior temporal cortex, overlapping with areas identified as voice-specific (Belin et al., 2004). However, the functional specification of the anterior and middle portions of the superior temporal cortex in relation to emotional prosodic processing needs further investigation (but see Grandjean et al., 2005). (3) Dependent on the task utilized to test emotional prosodic processing, the right frontal cortex extending into the orbitofrontal cortex (Hornak et al., 2003; Öngür et al., 2003) is recruited to explicitly evaluate the significance attributed to an emotional signal in the right posterior temporal cortex. Taken together, specifying lateralization as a function of the level of processing makes it possible to consolidate some of the controversial clinical and neuroimaging evidence reported above, even though further evidence is needed to substantiate such a working model of emotional prosodic processing.
Methodological factors — design and task

We have recently argued that methodological factors may influence the lateralization of emotional prosodic processing (Kotz et al., 2003a, b). While it is generally acknowledged that an event-related design is preferable to a blocked design, dependent on the experimental realization, the design per se may critically affect the lateralization of emotional prosodic processing. For example, it is well known that a blocked design provides enhanced statistical power (Friston et al., 1999) when subtle differences between conditions are to be detected in the BOLD signal, as (1) artifacts are more easily detected, and (2) randomization and spacing of critical trials are not necessary (Donaldson and Buckner, 2001). On the other hand, a blocked design suffers from stimulus order predictability and does not allow sorting trial responses by individual response patterns (Carter et al., 1998; Wagner et al., 1998) or as a function of stimulus characteristics (Pu et al., 2001). Scanning the neuroimaging literature, at
least two of seven reports (George et al., 1996; Buchanan et al., 2000) used a blocked design and reported right hemisphere lateralization effects for emotional prosodic processing. Other reports (Wildgruber et al., 2002, 2004, 2005) describe right hemisphere lateralization of frontal/orbitofrontal brain regions for higher-order judgment of emotional prosody, but activation of temporal brain regions is often bilateral, though with a right hemisphere dominance. In a recent study, Schäfer et al. (2005) pose the question of whether stimulus arrangement (blocked vs. event-related designs) has a strong effect in the context of emotion processing, given the significant difference in detection power between the two design types. Their results show common activation across the design types (bilateral amygdalae, left insula, and left orbitofrontal cortex), but distinct right orbitofrontal and insula activation in the event-related design. In addition, the extent of activation in all brain areas was larger in the event-related design. While the authors point out the advantages of an event-related design, it is also apparent that design type can drive lateralization effects. To address this issue, we compared the results of a previous experiment that had utilized a fully event-related design (Kotz et al., 2003a) with the results of a mixed design, that is, stimulus type (intelligible emotional/nonemotional sentences and unintelligible [filtered] emotional/nonemotional sentences) was presented in blocks, but within blocks, valence (emotional vs. nonemotional sentences) was presented and analyzed in an event-related manner (Kotz et al., 2003b). As the same stimulus material and task (emotional prosodic categorization) were used in the two experiments, the effects of blocked-presentation vs. fully event-related presentation designs can be compared. We previously argued (Kotz et al., 2003a) that the interleaving of stimulus type in an event-related design might influence bilateral and left-accented activation of an emotional prosodic brain network. Given that the task, emotional prosodic categorization, engages the participants in verbally labeling prosodic contours, the swift change from intelligible to unintelligible (filtered) emotional sentences may have resulted in an effort to understand and categorize
unintelligible emotional sentences. In essence, as also argued by Vingerhoets et al. (2003), verbally labeling emotional categories may promote semantic processing and, as a result, enhance a left-hemisphere effort over a right-hemisphere analysis of emotional prosodic contours. Thus, the evaluation of unintelligible emotional sentences may have become more effortful as a function of task. For that matter, both design and task can critically affect lateralization of emotional prosodic processing. In a first step, we therefore manipulated the design in a second fMRI investigation. The assumption of the follow-up study was that varying the design (from a fully event-related to a blocked-presentation design), while keeping stimulus type, valence, and task consistent, would critically test the influence of design on the lateralization of function (for procedure and recording specifications please
refer to Kotz et al., 2003a). When visually comparing the valence effects (emotional [positive/negative] vs. neutral) of the blocked (Fig. 1) and fully event-related (Fig. 2) designs, presenting stimulus type in a blocked manner, while keeping presentation within blocks event-related, results in a rightward shift of activation for both stimulus types. In the intelligible sentence condition, valence effects in the blocked design revealed predominantly activation of the right anterior, middle, and posterior superior temporal region (covering both the superior temporal gyrus [STG] and the superior temporal sulcus [STS]), as well as the right middle frontal gyrus, right anterior insula, and right striatum. Valence effects in the unintelligible sentence condition are reflected in bilateral middle STG and right posterior STS activation.
Fig. 1. Displayed in an axial view (nonradiological convention, left = left, right = right) are the activation patterns for intelligible emotional speech (left) and unintelligible emotional speech (right), for positive valence (top) and negative valence (bottom), from the blocked-presentation design. Functional activation was thresholded at Z ≥ 3.09 (uncorrected).
Fig. 2. Displayed in an axial view (nonradiological convention, left = left, right = right) are the activation patterns for intelligible emotional speech (left) and unintelligible emotional speech (right), for positive valence (top) and negative valence (bottom), from the event-related design. Functional activation was thresholded at Z ≥ 3.09 (uncorrected).
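As a side note, the threshold of Z ≥ 3.09 used in both figures corresponds to an uncorrected one-tailed p of about 0.001; this correspondence (a check we add here for illustration, not part of the original analysis) can be verified in two lines:

# Illustrative check: the Z threshold used in Figs. 1 and 2 corresponds
# to an uncorrected one-tailed p of .001.
from scipy.stats import norm

print(norm.isf(0.001))  # -> 3.0902: Z cutoff for one-tailed p = .001
print(norm.sf(3.09))    # -> 0.00100: one-tailed p for Z = 3.09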
In comparison to the results of the fully event-related design (Kotz et al., 2003a), which yielded strongly left-lateralized (though bilateral) emotional valence effects for both stimulus types, the current results of the mixed presentation design demonstrate the powerful effect of design on the respective brain areas involved in emotional prosodic processing. Lastly, Schäfer et al. (2005) discussed the potential effects of task on the visual processing of disgust and fear. Referring to an investigation by Taylor et al. (2003), the authors speculate that an active rather than a passive task can modulate limbic activation, with an active task reducing limbic activation. We have followed up this possibility, and data from a nonverbal (unpublished data) and a verbal investigation (Kotz et al., 2005) on emotion processing indicate that task affects both the activation of critical brain regions during the respective emotional processing and the lateralization of these brain regions. Taken together, both factors evaluated here, namely the differentiation of emotional prosody into subprocesses and the design type applied in fMRI investigations, have a major impact on our understanding of how emotional prosodic processing is functionally anchored in the brain. Suffice it to say, lateralization of function, and of emotional prosody in particular, is not simply a matter of the right hemisphere. Future clinical and neuroimaging investigations will considerably improve our knowledge of this function by keeping the critical factors discussed here in perspective.
References

Adolphs, R., Russell, J. and Tranel, D. (1999) A role for the human amygdala in recognizing emotional arousal from unpleasant stimuli. Psychol. Sci., 10: 167–171.
Banse, R. and Scherer, K. (1996) Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol., 70: 614–636.
Baum, S. and Pell, M. (1999) The neural bases of prosody: insights from lesion studies and neuroimaging. Aphasiology, 13: 581–608.
Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P. and Pike, B. (2000) Voice-selective areas in human auditory cortex. Nature, 403: 309–312.
Belin, P., Fecteau, S. and Bedard, C. (2004) Thinking the voice: neural correlates of voice perception. Trends Cogn. Sci., 8: 129–135.
Blonder, L., Bowers, D. and Heilman, K. (1991) The role of the right hemisphere in emotional communication. Brain, 114: 1115–1127.
Boemio, A., Fromm, S., Braun, A. and Poeppel, D. (2005) Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nat. Neurosci., 8: 389–395.
Borod, J.C. (1993) Cerebral mechanisms underlying facial, prosodic and lexical emotional expression: a review of neuropsychological studies and methodological issues. Neuropsychology, 7: 445–493.
Bowers, D., Bauer, R.M., Coslett, H.M. and Heilman, K.M. (1985) Processing of faces by patients with unilateral hemispheric lesions: dissociations between judgments of facial affect and facial identity. Brain Cogn., 4: 258–272.
Bowers, D., Coslett, H., Bauer, R.M., Speedie, L. and Heilman, K. (1987) Comprehension of emotional prosody following unilateral hemispheric lesions: processing defect versus distraction defect. Neuropsychologia, 25: 317–328.
Bradvik, B., Dravins, C., Holtas, S., Rosen, I., Ryding, E. and Ingvar, D. (1991) Disturbances of speech prosody following right hemisphere infarcts. Acta Neurol. Scand., 84: 114–126.
Breitenstein, C., Daum, I. and Ackermann, H. (1998) Emotional processing following cortical and subcortical brain damage: contribution of the frontostriatal circuitry. Behav. Neurol., 11: 29–42.
Breitenstein, C., Van Lancker, D., Daum, I. and Waters, C.H. (2001) Impaired perception of vocal emotions in Parkinson's disease: influence of speech time processing and executive functioning. Brain Cogn., 45: 277–314.
Bryan, K. (1989) Language prosody and the right hemisphere. Aphasiology, 3: 285–299.
Buchanan, T., Lutz, K., Mirzazade, S., Specht, K., Shah, N., Zilles, K. and Jäncke, L. (2000) Recognition of emotional prosody and verbal components of spoken language: an fMRI study. Cogn. Brain Res., 9: 227–238.
Caltagirone, C., Ekman, P., Friesen, W., Gainotti, G., Mammucari, A., Pizzamiglio, L. and Zoccolotti, P. (1989) Posed emotional expression in unilateral brain damaged patients. Cortex, 25: 653–663.
Carmon, A. and Nachshon, I. (1971) Effect of unilateral brain damage on perception of temporal order. Cortex, 7: 410–418.
Carter, C.S., Braver, T.S., Barch, D.M., Botvinick, M.M., Noll, D.C. and Cohen, J.D. (1998) Anterior cingulate cortex, error detection and the on-line monitoring of performance. Science, 280: 747–749.
Darby, D. (1993) Sensory aprosodia: a clinical clue to lesions of the inferior division of the right middle cerebral artery? Neurology, 43: 567–572.
Davidson, R. (1992) Anterior cerebral asymmetry and the nature of emotion. Brain Cogn., 6: 245–268.
Davidson, R. (2000) Affective style, psychopathology, and resilience: brain mechanisms and plasticity. Am. Psychol., 55: 1196–1214.
Davidson, R., Abercrombie, H., Nitschke, J. and Putnam, K. (1999) Regional brain function, emotion and disorders of emotion. Curr. Opin. Neurobiol., 9: 228–234.
Davidson, R.J. and Irwin, W. (1999) The functional neuroanatomy of emotion and affective style. Trends Cogn. Sci., 3: 11–21.
Dehaene-Lambertz, G., Pallier, C., Serniclaes, W., Sprenger-Charolles, L., Jobert, A. and Dehaene, S. (2005) Neural correlates of switching from auditory to speech perception. Neuroimage, 24: 21–33.
Donaldson, D.I. and Buckner, R.L. (2001) Effective paradigm design. In: Matthews, P.M., Jezzard, P. and Evans, A.C. (Eds.), Functional Magnetic Resonance Imaging of the Brain: Methods for Neuroscience. Oxford University Press, Oxford, pp. 175–195.
Dykstra, K., Gandour, J. and Stark, R. (1995) Disruption of prosody after frontal lobe seizures in the non-dominant hemisphere. Aphasiology, 9: 453–476.
Emmorey, K. (1987) The neurological substrates for prosodic aspects of speech. Brain Lang., 30: 305–320.
Friston, K.J., Zarahn, E., Josephs, O., Henson, R.N.A. and Dale, A. (1999) Stochastic designs in event-related fMRI. Neuroimage, 10: 609–619.
Gandour, J., Wong, D., Dzemidzic, M., Lowe, M., Tong, Y. and Li, X. (2003) A cross-linguistic fMRI study of perception of intonation and emotion in Chinese. Hum. Brain Mapp., 18: 149–157.
Gazzaniga, M. (1988) Brain modularity: towards a philosophy of conscious experience. In: Marcel, A. and Bisiach, E. (Eds.), Consciousness in Contemporary Science. Oxford University Press, Oxford, pp. 218–238.
George, M.S., Parekh, P.I., Rosinsky, N., Ketter, T.A., Kimbrell, T.A., Heilman, K.M., Herscovitch, P. and Post, R.M. (1996) Understanding emotional prosody activates right hemisphere regions. Arch. Neurol., 53: 665–670.
Grandjean, D., Sander, D., Pourtois, G., Schwartz, S., Seghier, M.L., Scherer, K.R. and Vuilleumier, P. (2005) The voices of wrath: brain responses to angry prosody in meaningless speech. Nat. Neurosci., 8: 145–146.
Gur, R., Skolnick, B. and Gur, R. (1994) Effects of emotional discrimination tasks on cerebral blood flow: regional activation and its relation to performance. Brain Cogn., 25: 271–286.
Heilman, K., Scholes, R. and Watson, R. (1975) Auditory affective agnosia: disturbed comprehension of affective speech. J. Neurol. Neurosurg. Psychiatry, 38: 69–72.
Hornak, J., Bramham, J., Rolls, E.T., Morris, R.G., O'Doherty, J., Bullock, P.R. and Polkey, C.E. (2003) Changes in emotion after circumscribed surgical lesions of the orbitofrontal and cingulate cortices. Brain, 126: 1691–1712.
Hornak, J., Rolls, E.T. and Wade, D. (1996) Face and voice expression identification in patients with emotional and behavioural changes following ventral frontal lobe damage. Neuropsychologia, 34: 247–261.
Kotz, S., Meyer, M., Alter, K., Besson, M., von Cramon, D.Y. and Friederici, A.D. (2003a) On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang., 86: 366–376.
Kotz, S., Meyer, M., Besson, M. and Friederici, A.D. (2003b) On the lateralization of emotional prosody: effect of blocked versus event-related designs. J. Cogn. Neurosci. Suppl., 138.
Kotz, S.A., Paulmann, S. and Raettig, T. (2005) Varying task demands during the perception of emotional content: efMRI evidence. J. Cogn. Neurosci., Suppl. S, 63.
Kowner, R. (1995) Laterality in facial expressions and its effect on attributions of emotion and personality: a reconsideration. Neuropsychologia, 33: 539–559.
Liebenthal, E., Binder, J.R., Piorkowski, R.L. and Remez, R.E. (2003) Short-term reorganization of auditory analysis induced by phonetic experience. J. Cogn. Neurosci., 15: 549–558.
Liegeois-Chauvel, C., Lorenzi, C., Trebuchon, A., Regis, J. and Chauvel, P. (2004) Temporal envelope processing in the human left and right auditory cortices. Cereb. Cortex, 14: 731–740.
Meyer, M., Zaehle, T., Gountouna, V.E., Barron, A., Jäncke, L. and Turk, A. (2005) Spectro-temporal processing during speech perception involves left posterior auditory cortex. Neuroreport, 19: 1985–1989.
Mitchell, R.L., Elliott, R., Barry, M., Cruttenden, A. and Woodruff, P.W. (2003) The neural response to emotional prosody, as revealed by functional magnetic resonance imaging. Neuropsychologia, 41: 1410–1421.
Montavont, A., Demarquay, G., Guenot, M., Isnard, J., Mauguiere, F. and Ryvlin, P. (2005) Ictal dysprosody and the role of the non-dominant frontal operculum. Epileptic Disord., 7: 193–197.
Morris, J., Frith, C., Perrett, D., Rowland, D., Young, A.W., Calder, A.J. and Dolan, R.J. (1996) A differential neural response in the human amygdala to fearful and happy facial expressions. Nature, 383: 812–815.
Morris, J., Scott, S. and Dolan, R. (1999) Saying it with feeling: neural responses to emotional vocalizations. Neuropsychologia, 37: 1155–1163.
Murphy, F.C., Nimmo-Smith, I. and Lawrence, A. (2003) Functional neuroanatomy of emotions: a meta-analysis. Cogn. Affect. Behav. Neurosci., 3: 207–233.
Öngür, D., Ferry, A.T. and Price, J.L. (2003) Architectonic subdivision of the human orbital and medial prefrontal cortex. J. Comp. Neurol., 460: 425–449.
Ouellette, G. and Baum, S. (1993) Acoustic analysis of prosodic cues in left- and right-hemisphere-damaged patients. Aphasiology, 8: 257–283.
Paradiso, S., Chemerinski, E., Yazici, K., Tartaro, A. and Robinson, R. (1999) Frontal lobe syndrome reassessed: comparison of patients with lateral or medial frontal brain damage. J. Neurol. Neurosurg. Psychiatry, 67: 664–667.
Pell, M. and Baum, S. (1997) Unilateral brain damage, prosodic comprehension deficits, and the acoustic cues to prosody. Brain Lang., 57: 195–214.
Pell, M. and Leonard, C. (2003) Processing emotional tone from speech in Parkinson's disease: a role for the basal ganglia. Cogn. Affect. Behav. Neurosci., 3: 275–288.
Phillips, M., Young, A., Scott, S., Calder, A., Andrew, C., Giampietro, V., Williams, S.C., Bullmore, E.T., Brammer, M. and Gray, J.A. (1998) Neural responses to facial and vocal expressions of fear and disgust. Proc. R. Soc. Lond. B Biol. Sci., 265: 1809–1817.
Pu, Y., Liu, H.L., Spinks, J.A., Mahankali, S., Xiong, J., Feng, C.M., Tan, L.H., Fox, P.T. and Gao, J.H. (2001) Cerebral hemodynamic response in Chinese (first) and English (second) language processing revealed by event-related functional MRI. Magn. Reson. Imaging, 19: 643–647.
Rinne, T., Pekkola, J., Degerman, A., Autti, T., Jääskeläinen, I.P., Sams, M. and Alho, K. (2005) Modulation of auditory cortex activation by sound presentation rate and attention. Hum. Brain Mapp., 26: 94–99.
Robin, D., Tranel, D. and Damasio, H. (1990) Auditory perception of temporal and spectral events in patients with focal left and right cerebral lesions. Brain Lang., 39: 539–555.
Robinson, R. and Starkstein, S. (1989) Mood disorders following stroke: new findings and future directions. J. Geriatr. Psychiatry Neurol., 22: 1–15.
Robinson, R. and Starkstein, S. (1990) Current research in affective disorders following stroke. J. Neuropsychiatry Clin. Neurosci., 2: 1–14.
Rolls, E.T. (1999) The Brain and Emotion. Oxford University Press, Oxford.
Rosadini, G. and Rossi, G.F. (1967) On the suggested cerebral dominance for consciousness. Brain, 90: 101–112.
Ross, E.D. (1981) The aprosodias: functional-anatomic organization of the affective components of language in the right hemisphere. Arch. Neurol., 38: 561–569.
Ross, E.D., Harney, J.H., de Lacoste-Utamsing, C. and Purdy, P.D. (1981) How the brain integrates affective and propositional language into a unified behavioral function: hypothesis based on clinicoanatomic evidence. Arch. Neurol., 38: 745–748.
Ross, E.D., Thompson, R. and Yenkosky, J. (1997) Lateralization of affective prosody in brain and the callosal integration of hemispheric language functions. Brain Lang., 56: 27–54.
Sackeim, H., Greenberg, M., Weiman, A., Gur, R., Hungerbuhler, J. and Geschwind, N. (1982) Hemispheric asymmetry in the expression of positive and negative emotions: neurologic evidence. Arch. Neurol., 39: 210–218.
Sackeim, H., Gur, R. and Saucy, M. (1978) Emotions are expressed more intensely on the left side of the face. Science, 202: 434–436.
Schäfer, A., Schienle, A. and Vaitl, D. (2005) Stimulus type and design influence hemodynamic responses towards visual disgust and fear elicitors. Int. J. Psychophysiol., 57: 53–59.
Scherer, K. (1989) Vocal correlates of emotional arousal and affective disturbance. In: Wagner, H. and Manstead, A. (Eds.), Handbook of Social Psychophysiology. Wiley, New York, pp. 165–197.
Schirmer, A. and Kotz, S.A. (2006) Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn. Sci., 10: 24–30.
Sidtis, J.J. and Feldmann, E. (1990) Transient ischemic attacks presenting with a loss of pitch perception. Cortex, 26: 469–471.
Sidtis, J.J. and Van Lancker-Sidtis, D. (2003) A neurobehavioral approach to dysprosody. Semin. Speech Lang., 24: 93–105.
Starkstein, S., Federoff, J., Price, T., Leiguarda, R. and Robinson, R. (1994) Neuropsychological and neuroradiologic correlates of emotional prosody comprehension. Neurology, 44: 515–522.
Strauss, E. and Moscovitch, M. (1981) Perception of facial expressions. Brain Lang., 13: 308–332.
Taylor, S.F., Phan, K.L., Decker, L.R. and Liberzon, I. (2003) Subjective rating of emotionally salient stimuli modulates limbic activity. Neuroimage, 18: 650–659.
Tompkins, C. and Flowers, C. (1985) Perception of emotional intonation by brain-damaged adults: the influence of task processing levels. J. Speech Hear. Res., 28: 527–538.
Tucker, D., Watson, R. and Heilman, K. (1977) Discrimination and evocation of affectively intoned speech in patients with right parietal disease. Neurology, 27: 947–950.
Van Lancker, D. (1980) Cerebral lateralization of pitch cues in the linguistic signal. Int. J. Hum. Comm., 13: 101–109.
Van Lancker, D. and Sidtis, J. (1992) The identification of affective-prosodic stimuli by left- and right-hemisphere-damaged subjects: all errors are not created equal. J. Speech Hear. Res., 35: 963–970.
Vingerhoets, G., Berckmoes, C. and Stroobant, N. (2003) Cerebral hemodynamics during discrimination of prosodic and semantic emotions in speech studied by transcranial Doppler ultrasonography. Neuropsychology, 17: 93–99.
Wagner, A.D., Schacter, D.L., Rotte, M., Koutstaal, W., Maril, A., Dale, A.M., Rosen, B.R. and Buckner, R.L. (1998) Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. Science, 281: 1188–1191.
Weintraub, S., Mesulam, M. and Kramer, L. (1981) Disturbances in prosody: a right-hemisphere contribution to language. Arch. Neurol., 38: 742–744.
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M. and Grodd, W. (2002) Dynamic brain activation during processing of emotional intonation: influence of acoustic parameters, emotional valence, and sex. Neuroimage, 15: 856–869.
Wildgruber, D., Hertrich, I., Riecker, A., Erb, M., Anders, S., Grodd, W. and Ackermann, H. (2004) Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation. Cereb. Cortex, 14: 1384–1389.
Wildgruber, D., Riecker, A., Hertrich, I., Erb, M., Grodd, W., Ethofer, T. and Ackermann, H. (2005) Identification of emotional intonation evaluated by fMRI. Neuroimage, 24: 1233–1241.
Zaehle, T., Wüstenberg, T., Meyer, M. and Jäncke, L. (2004) Evidence for rapid auditory perception as the foundation of speech processing: a sparse temporal sampling fMRI study. Eur. J. Neurosci., 20: 2447–2456.
Zatorre, R. (1988) Pitch perception of complex tones and human temporal-lobe function. J. Acoust. Soc. Am., 84: 566–572.
Zatorre, R.J. and Belin, P. (2001) Spectral and temporal processing in human auditory cortex. Cereb. Cortex, 11: 946–953.
Zatorre, R., Belin, P. and Penhune, V. (2002) Structure and function of auditory cortex: music and speech. Trends Cogn. Sci., 6: 37–46.
Zatorre, R., Evans, A. and Meyer, E. (1994) Neural mechanisms underlying melodic perception and memory for pitch. J. Neurosci., 14: 1908–1919.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 16
Psychoacoustic studies on the processing of vocal interjections: how to disentangle lexical and prosodic information?

Susanne Dietrich1, Hermann Ackermann2, Diana P. Szameitat1 and Kai Alter3

1 Max-Planck-Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, 04103 Leipzig, Germany
2 Department of Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
3 School of Neurology, Neurobiology & Psychiatry, Newcastle upon Tyne, UK
Abstract: Both intonation (affective prosody) and lexical meaning of verbal utterances participate in the vocal expression of a speaker's emotional state, an important aspect of human communication. However, it is still a matter of debate how the information of these two 'channels' is integrated during speech perception. In order to further analyze the impact of affective prosody on lexical access, so-called interjections, i.e., short verbal emotional utterances, were investigated. The results of a series of psychoacoustic studies indicate that the processing of emotional interjections is mediated by a divided cognitive mechanism encompassing both lexical access and the encoding of prosodic data. Emotional interjections could be separated into elements with high- or low-lexical content. For the former items, both prosodic and propositional cues have a significant influence upon recognition rates, whereas the processing of the low-lexical cognates depends almost solely upon prosodic information. Incongruencies between lexical and prosodic data structures compromise stimulus identification. Thus, the analysis of utterances characterized by a dissociation of the prosodic and lexical dimensions revealed prosody to exert a stronger impact upon listeners' judgments than lexicality. Taken together, these findings indicate that propositional and prosodic speech components closely interact during speech perception. Keywords: cognition; communication; emotion; language; recognition; semantics
Introduction

Verbal utterances may convey information about a speaker's emotional state by the modulation of intonational cues (affective prosody) as well as by emotional word content (lexicality). Thus, both prosodic and lexical data structures must be expected to closely interact during speech production. On the one hand, the sequential order of phonemes reflects the canonical structure of verbal utterances. As a prerequisite to the recognition of the emotional meaning of a word, propositional labels must thus refer to a knowledge-based semantic concept (lexical emotional meaning). These processes imply the existence of a mental lexicon 'housing' phonetic/phonological patterns linked to emotional semantics. A variety of functional imaging studies indicate the 'mental lexicon' to be bound to specific cerebral areas. It has been assumed that these regions act as an interface between word perception and long-term memory representations of familiar words (Warburton et al., 1996; Wise et al., 2001). Evidence for the participation of specific cerebral areas
in representing sequences of phonemes comes, e.g., from cortical stimulation studies in patients who underwent surgical treatment for epilepsy (Anderson et al., 1999; Quigg and Fountain, 1999). On the other hand, suprasegmental parameters of verbal utterances, such as pitch and loudness contours, syllable durations, and various aspects of voice quality, signal the affective prosody of verbal utterances. These cues are imposed upon the segmental structure of speech utterances (Banse and Scherer, 1996; MacNeilage, 1998; Sidtis and Van Lancker Sidtis, 2003). Similar to lexical features, the recognition of the emotional meaning of an utterance might in these instances be based upon the association of prosodic features with a knowledge-based semantic concept (prosodic emotional meaning). So far, it is unsettled whether the recognition of emotional speech utterances involves a divided cognitive mechanism encompassing both lexical access and the encoding of prosodic information. For example, functional imaging studies revealed different activation patterns in association with the processing of lexical–semantic and prosodic data structures (Wise et al., 2001; Friederici, 2002). These findings raise the question of how affective prosodic information and lexicality influence each other during speech perception. An interaction between the two components would hint at separate processing of these components within different cognitive systems. In order to address this issue, emotional interjections, i.e., short utterances often used during speech communication, were investigated. These elements primarily reflect spontaneous expressions of high emotional arousal (Nübling, 2004), and their spectrum extends from verbal utterances like 'yuck' (in German 'igitt') and vowels of a distinct tone to nonverbal expressions like bursts of laughter. This chapter reviews a series of experiments conducted by our research group to further elucidate the perceptual and cognitive processing of affective vocal interjections. The following questions were addressed: (a) Can the affective prosodic and the propositional components of interjections be separated and, thereby, classified based on lexical emotional meaning? (lexical meaning of verbal emotional interjections); (b) To what extent does the recognition of emotional meaning depend upon the lexical 'load' of these
items? (influence of lexical semantics on recognition rates); and (c) Do incongruencies between lexical and prosodic information, first, compromise recognition rates as compared to congruent items and, second, disclose the relative impact of these two components upon listeners' judgments? (interaction of prosodic and lexical information). These investigations thus aimed at a further analysis of the perceptual and cognitive processes underlying two of the most important aspects of speech communication: prosody and lexicality. Participants were asked to judge the emotional meaning of auditorily presented vocal interjections. The focus was on so-called 'basic emotions' that can be easily displayed by vocal expressions and reliably recognized from verbal utterances (Scherer and Oshinsky, 1977). The following emotion categories were considered for analysis: happiness, anger, disgust, and sadness. Trained native speakers of German had produced these interjections with affective (happy, angry, disgusted, or sad) or neutral prosody.
Lexical meaning of verbal emotional interjections

In order to separate the segmental and suprasegmental levels of vocal affective expression, a series of interjections of the German language was evaluated with respect to their lexical emotional meaning. Table 1 includes the list of items — spoken in a neutral tone — that served as test materials. Under these conditions, affective prosodic cues are not available and thus only lexicality conveys emotional meaning. Participants had to answer the question 'Which emotion was expressed by the utterance?' and were instructed to focus on lexical information, ignoring prosody. Listeners rated the items either as neutral ('neutral button') or as emotionally meaningful (four 'emotional buttons'). Thus, the probability of pressing an emotional rather than the neutral button by chance amounts to 50%. Responses were coded in terms of the frequency (in percent) of selecting any one of the emotional categories (percentage of emotional judgments = 'emotional recognition rate'). Based on this approach, the various vocal interjections spoken in a neutral tone were found to differ in lexical emotional meaning (Fig. 1, Table 2).
Table 1. Items used in the study

Happy intonation
  Congruent: heissa ['haɪsa], hurra [hu'ra:], juhu [ju'hu:], yippie ['jɪpi:], heida ['haɪda], ja [ja:], aja ['a:'ja:]
  Affectively incongruent: a, achje, äh, auwei, bäh, e, herrje, i, igitt, jeminee, o, oje, oweh, pfui, u, uäh

Angry intonation
  Congruent: pah [pha], ahr [a:r], a [a:]
  Affectively incongruent: e, ei, heida, heissa, hurra, i, juhu, o, u, yippie

Disgusted intonation
  Congruent: igitt [i:gɪth], pfui [pfui], i [i:], uäh [uæ:], bäh [bæ:], äh [æ:], e [e:], u [u:h]
  Affectively incongruent: a, o

Sad intonation
  Congruent: auwei [auvaɪ:], oje [o'je:], oweh [o:'ve:], achje [ax'je:], jeminee ['je:mɪne:], herrje [her'je:], ach [a:x], o [o:], u [u:], ei [aɪ:]
  Affectively incongruent: a, ahr, e, i, pah

Neutral intonation
  Neutrally spoken: a, ach, achje, äh, ahr, aja, auwei, bäh, e, ei, heida, heissa, herrje, hurra, i, igitt, ja, jeminee, juhu, oje, oweh, pah, pfui, u, uäh, yippie

The three intonational conditions result in affective congruency, affective incongruency, or neutrality.
The stimuli could be assigned to two major categories, i.e., high-lexical (HLI) and low-lexical (LLI) interjections. Thus, some interjections convey emotional meaning even in the absence of relevant affective prosodic cues, i.e., based exclusively on their segmental structure (HLI). Other items failed to show this effect, and in these instances emotional meaning must be expected to depend primarily upon affective prosody (LLI). A second step of analysis aimed at determining the emotional category, i.e., happiness, anger, sadness, or disgust, displayed by the various interjections. The probability of selecting one of the four categories by chance amounts to 25%. For each interjection, the percentage of the four different responses was calculated, and the category that achieved the highest rank was assumed to represent the lexical emotional meaning of the respective item. If the largest percentage does not reach the significance level, the respective emotion category cannot be considered the lexical emotional meaning of that stimulus. Nevertheless, these data also had to be considered for analysis, since otherwise it would not have been possible to define response correctness in the subsequent experiments. Table 2 and Fig. 1 summarize the emotional meanings of the various interjections as determined by this
approach. Apart from two stimuli, the highest percentage values reached the significance level in all instances. As expected, each HLI stimulus displayed an unambiguous lexical emotional meaning. By contrast, LLI failed to convey an unequivocal affective content. Nevertheless, some preferences in the attribution of lexical emotional meanings could be observed. The findings of a recent behavioral study on interjections (Schröder, 2003) show some analogies to our data. In that report, the term 'transcription variability' was introduced to characterize the relationship between emotional expression and the segmental structure of a stimulus, providing a criterion for the degree of conventionality. Thus, conventionality is considered to be based upon canonical segmental structures, i.e., 'lexicality'. In accordance with our investigation, some of Schröder's stimuli, such as 'yuck' (in German 'igitt') or an affectively intonated vowel ('o'), show high (in the case of 'igitt') or low (in the case of 'o') recognition rates. We found the vowel 'a' to be most frequently associated with anger (44%). However, this observation does not reach significance (t(15) = 1.69, p = 0.118).
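To make the assignment rule concrete, the sketch below (with invented response proportions, not the study's data) picks the modal emotion category for one interjection and tests its selection rate against the 25% chance level, mirroring the one-sample t-test reported above.

# Sketch of the category-assignment rule described above: per participant,
# the proportion of happiness/anger/disgust/sadness responses to one
# neutrally spoken interjection; the modal category is then tested against
# the 25% chance level. All proportions below are invented.
import numpy as np
from scipy.stats import ttest_1samp

categories = ["happiness", "anger", "disgust", "sadness"]
# rows = 16 participants, columns = response proportions per category
props = np.random.default_rng(1).dirichlet([2, 4, 1, 1], size=16)

means = props.mean(axis=0)
best = int(np.argmax(means))                 # highest-ranking category
t, p = ttest_1samp(props[:, best], 0.25)     # two-tailed p from SciPy
p_one_tailed = p / 2 if t > 0 else 1 - p / 2

print(f"modal category: {categories[best]} ({means[best]:.0%})")
print(f"t({props.shape[0] - 1}) = {t:.2f}, one-tailed p = {p_one_tailed:.3f}")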
Fig. 1. Emotional recognition rates for neutrally spoken interjections in response to the task question 'Which emotion is expressed by the utterance?'. The figure illustrates how frequently an utterance was classified as an item with emotional lexical meaning. Gray symbols were not significantly identified as emotional; black symbols were significantly identified as emotional. Percentage values in brackets indicate the frequency of choosing an emotional category.
Therefore, this item might allow for a broader range of emotional interpretations, such as happiness, disgust, and sadness. By contrast, the vowel 'u' is associated with disgust and sadness to the same extent (50%). As a consequence, this stimulus is compatible with both emotional expressions. In summary, these studies indicate that at least a subset of interjections conveys emotional meaning by lexical as well as by prosodic factors (HLI). Other utterances were found to rely predominantly on affective prosody as a means of displaying emotional meaning (LLI) and thus lack a significant contribution of lexical information in this regard. These data argue in favor of a new classification scheme for interjections based on their lexical status. It could be documented, furthermore, that a separation of the affective prosodic and lexical dimensions is possible even in natural speech materials. Thus, verbal stimuli need not be manipulated, e.g., by means of low-pass filtering, in order to remove lexical components of emotional meaning.
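For readers unfamiliar with the low-pass filtering manipulation mentioned above, it is typically implemented along the following lines; the cutoff frequency and file names in this sketch are illustrative assumptions rather than parameters from any of the studies reviewed here.

# Sketch: low-pass filtering as typically used to render speech
# unintelligible while sparing prosody. Cutoff and file names are
# illustrative assumptions.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

rate, audio = wavfile.read("interjection.wav")     # hypothetical input file
audio = audio.astype(np.float64)

# 4th-order Butterworth low-pass at ~400 Hz: removes most segmental
# (phonemic) information but preserves the F0 contour and intensity envelope.
sos = butter(4, 400, btype="lowpass", fs=rate, output="sos")
filtered = sosfiltfilt(sos, audio, axis=0)

wavfile.write("interjection_lowpass.wav", rate,
              (filtered / np.abs(filtered).max() * 32767).astype(np.int16))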
Influence of lexical semantics on recognition rates

A further experiment assessed the influence of lexical meaning (HLI vs. LLI stimuli) on the
recognition of vocally expressed emotions. Table 1 provides the list of interjections spoken with congruent prosodic modulation, e.g., the utterance 'hurra' produced in a happy tone. In these instances, prosodic and lexical emotional meanings converge. During auditory presentation of these items, participants had to answer the question 'Which emotion was expressed?' in the absence of any further instructions. Thus, listeners could base their judgments on prosodic cues, on lexical cues, or on both. Again, participants had to assign one of the four emotion categories considered, i.e., happiness, anger, sadness, or disgust, to the verbal utterances. Responses were coded in terms of the frequency of correct ratings of the transmitted emotion (proportion of correct to incorrect answers = recognition rate in percent). HLI items yielded significantly higher recognition rates than their LLI cognates (Fig. 2; t(40) = 3.81, p < 0.001). Recognition rates were found to decline in the absence of either one of the two speech components, 'prosody' or 'lexicality'. The comparison of HLI and LLI stimuli spoken with congruent affective prosody revealed better recognition rates for the items of high-lexical emotional meaning.
Table 2. Recognition rates and identification of specific emotional categories for neutrally spoken interjections in response to the task question 'Which emotion is expressed by the utterance?'

[The column layout of Table 2 did not survive extraction and cannot be reconstructed reliably. For each stimulus (igitt, pfui, i, auwei, oje, oweh, achje, heissa, hurra, juhu, yippie, uäh, bäh, äh, jeminee, herrje, heida, ach, pah, e, ahr, o, ja, aja, a, u, ei), the table reports the overall rate of emotional identification and the identification rates for the specific emotion categories elation, hot anger, disgust, and sadness.]

Values of percentage as well as statistical parameters are given. Abbreviations: t = t-value of a one-sample t-test; p = significance level; P = recognition rate; SEM = standard error of the mean.
Fig. 2. Recognition rates for interjections with congruent prosody in response to the task question 'Which emotion is expressed by the speaker?'. Values of percentage and error deviation are given for both lexical categories.
In these cases, both 'channels' of information are available to the listeners. By contrast, the evaluation of LLI utterances must rely almost exclusively on affective prosody. Thus, recognition of the emotional meaning of interjections appears to depend upon the range of available cues. In line with these data, a variety of studies of animal communicative behavior suggest that increasing the amount of available information enhances the probability that a receiver makes a correct decision (Bradbury and Vehrencamp, 1998).
Interaction of prosodic and lexical information

In order to assess whether the processing of prosodic and lexical information is bound to different mechanisms, stimuli spoken with congruent and incongruent prosody were compared with each other. Given an impact of prosody on lexicality or vice versa, recognition rates must be expected to differ between these two conditions. Table 1 summarizes the stimuli spoken with incongruent prosodic modulation. In this study, participants had to respond to the question 'Which emotion was expressed?' in the absence of any further instructions. As in the preceding study, listeners could thus base their judgments on prosodic cues, on lexical cues, or on both. Again, participants had to assign one of the four emotion categories considered, i.e., happiness, anger, sadness, or disgust, to the verbal utterances.
Fig. 3. Recognition rates for interjections with congruent (match) versus incongruent (mismatch) prosody. Values of percentage and error deviation are given for both lexical categories.
In the case of stimuli with incongruent prosody, correct identification of either the lexical or the prosodic emotional meaning counted as an adequate response. Recognition performance was coded in terms of the frequency of correct ratings of the transmitted emotion (proportion of correct to incorrect answers = recognition rate in percent). Affectively spoken interjections with a congruent emotional tone yielded a significantly higher percentage of correct judgments than affectively spoken stimuli with incongruent prosody (Fig. 3; for HLI: t(36) = 4.63, p < 0.001; for LLI: t(36) = 3.64, p < 0.001). Comparison of affectively spoken stimuli with congruent and incongruent prosody thus revealed better recognition rates in the former case. HLI items showed a stronger effect than LLI, most presumably due to the reduced lexicality of LLI stimuli. It can be expected, therefore, that the processing of prosodic and lexical cues within the domain of emotional interjections is bound to separate cognitive mechanisms. However, since incongruencies between the prosody and the lexicality of interjections yield lower recognition rates, subsequent higher-order cognitive processes appear to match both sources of information. A further step of analysis addressed the question of whether the listeners' judgments predominantly relied on prosodic or lexical information. Based on the sample of interjections with incongruent prosody, lexical and prosodic decisions were separately analyzed and responses coded in terms of the frequency of correct ratings of the transmitted emotion (proportion of correct prosodic or lexical decisions to incorrect answers = recognition rate in percent).
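A minimal sketch of this coding scheme (the function name and trial labels are ours, for illustration only) makes the either-channel scoring rule explicit:

# Sketch of the response-coding rule for incongruent stimuli: a response is
# adequate if it matches either the lexical or the prosodic emotion.
def score_trial(response, lexical_emotion, prosodic_emotion):
    """Return 1 if the response matches either information channel."""
    return int(response in {lexical_emotion, prosodic_emotion})

# (response, lexical emotion, prosodic emotion)
trials = [
    ("anger", "happiness", "anger"),      # prosodic decision -> correct
    ("happiness", "happiness", "anger"),  # lexical decision  -> correct
    ("disgust", "happiness", "anger"),    # matches neither   -> incorrect
]
correct = [score_trial(*trial) for trial in trials]
print(f"recognition rate: {100 * sum(correct) / len(trials):.0f}%")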
The recognition rates for HLI and LLI items associated with prosodic and lexical decisions were analyzed in order to determine which of the two processes was preferred. Prosodic decisions yielded significantly higher recognition rates than lexical ones (Fig. 4; for HLI: t(36) = 7.35, p < 0.001; for LLI: t(36) = 27.13, p < 0.001). As a consequence, HLI items were more often correctly identified by lexical decisions than their LLI counterparts (Fig. 4; t(36) = 6.42, p < 0.001). By contrast, the reversed pattern was found in association with prosodic decisions (Fig. 4; t(36) = 4.42, p < 0.001). Taken together, prosodic cues more often yielded correct judgments than lexical information. It must be expected, therefore, that prosody plays a dominant role during the recognition of emotional meaning. Other studies have also reported that affective prosodic meaning usually takes precedence when the linguistic message is at odds with the emotional intonation (Bolinger, 1972; Ackerman, 1983; Ross, 1997). In the case of incongruently spoken utterances, listeners have to focus either on the prosody or on the lexicality of verbal utterances. Conceivably, affective prosodic cues are easier to process and, therefore, more salient during speech perception. Nevertheless, lexical content cannot be totally ignored even under these conditions, since LLI showed higher recognition rates than HLI in association with prosodic decisions. Since lexicality seems to have a specific impact on prosody-based judgments, any mismatches must be expected to depend upon the lexical rather than the prosodic informational component. Lexical emotional meaning is strongly associated with the segmental structure of an utterance, a dimension difficult to ignore. By contrast, prosody is bound to the suprasegmental level of speech, which can easily be recognized but does not depend as strictly on the segmental structure of verbal utterances. Thus, a more versatile use of prosodic patterns during speech communication is quite conceivable.

Conclusions

Both affective prosody and lexical meaning participate in the communication of a speaker's emotional states. The data reviewed here suggest that these two dimensions of the emotional meaning of interjections are mediated by different cognitive mechanisms. Among others, the classification of these
Fig. 4. Recognition rates for interjections with incongruent prosody. Decisions guided by prosodic or lexical information are illustrated separately. Values of percentage and error deviation are given for both lexical categories.
utterances into different categories provided evidence for separate processing of prosodic and lexical elements during speech communication. Nevertheless, both dimensions combined allow for a more reliable encoding of the emotional meaning of interjections than prosody on its own. Comparing the two sources of information, prosody seems to have the more salient function in the transmission of emotions. Taken together, these psychoacoustic data suggest that lexical and prosodic components contribute separately to the formulation of emotional semantic information. In order to correctly perceive the affective message of verbal utterances, however, both speech elements have to be matched.

Abbreviations

HLI: high-lexical interjections
LLI: low-lexical interjections
Acknowledgments

This study was supported by the Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, and the Department of General Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, Germany.

Appendix

Participants: All participants were native, right-handed (Edinburgh Inventory; Oldfield, 1971)
German speakers (mean age = 25.0 years, SD = 3.72, range 18–45 years). None of them had a history of hearing disorders or neurological diseases. All were naive with regard to the purposes of the experiments. Sixteen volunteers (8 females) participated in the first and 41 individuals (21 females) in the second experiment. The third study encompassed 37 individuals (18 females). Participants were paid for their participation, and all of them provided informed consent. Recordings: Recordings were performed in a sound-proof booth with a microphone (type Sennheiser) positioned at a distance of 0.8 m from the speaker's mouth. Stimuli were recorded on digital DAT audiotape and digitized at 44.1 kHz with a 16-bit sampling depth. A sound editor (Cool Edit version 2000, Syntrillium Software) was used for further processing of the stimuli. In order to facilitate the production of distinct affective tones, suitable frame stories were presented to the speakers: (a) the speaker has just won a lot of money in a horse race (happiness); (b) the speaker is furious about a person he/she dislikes because this individual has once again broken a rule of proper conduct and shows no sign of regret (anger); (c) the speaker is looking at maggots moving in a wound (disgust); (d) the speaker has just heard about the death of a close relative (sadness). During the recordings, speakers silently read the frame story for a given emotional category and then produced the respective stimuli. Procedure: Participants were seated in a sound-proof booth in front of a computer screen with loudspeakers placed to its left and right; instructions about the experimental procedure and the task to be performed were presented visually prior to each session. Participants had to listen to the interjections presented via the loudspeakers and to identify the respective emotional category. After the presentation of each item, the question to be answered appeared on the screen. Participants then had to respond as quickly and accurately as possible by pressing a button on a response box. The time interval allowed for the reaction was 5 s. Stimuli were presented in pseudorandomized order, balanced over the various blocks, with 50 items per block. Each stimulus was presented twice. Prior to the experiments, a short practice session including 20 stimuli was performed.
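A presentation scheme of this kind can be sketched as follows; the stimulus identifiers and the no-immediate-repetition constraint are our own illustrative assumptions, not a documented feature of the original procedure.

# Sketch of a pseudorandomized, block-balanced presentation list: each
# stimulus occurs twice and the shuffled list is split into blocks of 50.
import random

def build_blocks(stimuli, repetitions=2, block_size=50, seed=0):
    rng = random.Random(seed)
    trials = stimuli * repetitions
    while True:
        rng.shuffle(trials)
        # re-shuffle until no stimulus directly follows itself (assumption)
        if all(a != b for a, b in zip(trials, trials[1:])):
            break
    return [trials[i:i + block_size] for i in range(0, len(trials), block_size)]

stimuli = [f"stim_{i:02d}" for i in range(100)]    # hypothetical stimulus IDs
blocks = build_blocks(stimuli)
print(len(blocks), "blocks of", len(blocks[0]), "trials")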
Between successive blocks, participants could take a short break upon request. The three studies lasted about 45 min each. Data analysis: One-sample t-tests were calculated in order to determine whether the obtained values differed significantly from chance level. Furthermore, t-tests for dependent samples were performed to determine whether the values of two groups differed significantly from each other.
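The dependent-samples comparison mentioned here reduces to a paired t-test; in the sketch below, the per-listener recognition rates are invented for illustration (only the sample size of 41 listeners follows the second experiment).

# Sketch of the dependent-samples comparison, e.g., HLI vs. LLI recognition
# rates within the same 41 listeners; the rates themselves are invented.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(2)
hli = np.clip(rng.normal(0.85, 0.10, size=41), 0, 1)   # per-listener rates
lli = np.clip(rng.normal(0.70, 0.12, size=41), 0, 1)

t, p = ttest_rel(hli, lli)
print(f"t({hli.size - 1}) = {t:.2f}, p = {p:.4f}")      # df = 40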
References

Ackerman, B. (1983) Form and function in children's understanding of ironic utterances. J. Exp. Child Psychol., 35: 487–508.
Anderson, J.M., Gilmore, R., Roper, S., Crosson, B., Bauer, R.M., Nadeau, S., Beversdorf, D.Q., Cibula, J., Rogish III, M., Kortencamp, S., Hughes, J.D., Gonzalez Rothi, L.J. and Heilman, K.M. (1999) Conduction aphasia and the arcuate fasciculus: a reexamination of the Wernicke-Geschwind model. Brain Lang., 70: 1–12.
Banse, R. and Scherer, K.R. (1996) Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol., 70: 614–636.
Bolinger, D. (1972) Intonation. Penguin, Harmondsworth, UK.
Bradbury, J.W. and Vehrencamp, S.L. (1998) Principles of Animal Communication. Sinauer Associates, Sunderland, MA, pp. 387–418.
Friederici, A.D. (2002) Towards a neural basis of auditory sentence processing. Trends Cogn. Sci., 6: 78–84.
MacNeilage, P.F. (1998) The frame/content theory of evolution of speech production. Behav. Brain Sci., 21: 499–546.
Nübling, D. (2004) Die prototypische Interjektion: ein Definitionsvorschlag. Z. Semiotik, 26(1–2): 11–45.
Oldfield, R. (1971) The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia, 9: 97–113.
Quigg, M. and Fountain, N.B. (1999) Conduction aphasia elicited by stimulation of the left posterior superior temporal gyrus. J. Neurol. Neurosurg. Psychiatry, 66: 393–396.
Ross, E.D. (1997) Cortical representation of the emotions. In: Trimble, M.R. and Cummings, J.L. (Eds.), Contemporary Behavioral Neurology. Butterworth-Heinemann, Boston, pp. 107–126.
Scherer, K.R. and Oshinsky, J.S. (1977) Cue utilization in emotion attribution from auditory stimuli. Motiv. Emot., 1: 331–346.
Schröder, M. (2003) Experimental study of affect bursts. Speech Commun., 40: 99–116.
Sidtis, J.J. and Van Lancker Sidtis, D. (2003) A neurobehavioral approach to dysprosody. Semin. Speech Lang., 24(2): 93–105.
Warburton, E., Wise, R.J., Price, C.J., Weiller, C., Hadar, U., Ramsay, S. and Frackowiak, R.S. (1996) Noun and verb retrieval by normal subjects: studies with PET. Brain, 119: 159–179.
Wise, R.J.S., Scott, S.K., Blank, S.C., Mummery, C.J., Murphy, K. and Warburton, E.A. (2001) Separate neural subsystems within 'Wernicke's area'. Brain, 124: 83–95.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 17
Judging emotion and attitudes from prosody following brain damage

Marc D. Pell

School of Communication Sciences and Disorders, McGill University, 1266 Ave. des Pins Ouest, Montréal, QC, H3G 1A8, Canada
Abstract: Research has long indicated a role for the right hemisphere in the decoding of basic emotions from speech prosody, although there are few data on how the right hemisphere is implicated in processes for understanding the emotive "attitudes" of a speaker from prosody. We describe recent clinical studies that compared how well listeners with and without focal right hemisphere damage (RHD) understand speaker attitudes such as "confidence" or "politeness," which are signaled in large part by prosodic features of an utterance. We found that RHD listeners as a group were abnormally sensitive to both the expressed confidence and expressed politeness of speakers, and that these difficulties often correlated with impairments for understanding basic emotions from prosody in many RHD individuals. Our data emphasize a central role for the right hemisphere in the ability to appreciate emotions and speaker attitudes from prosody, although the precise source of these social-pragmatic deficits may arise in different ways in the context of right hemisphere compromise.

Keywords: interpersonal behavior; emotion; attitudes; prosody; right hemisphere; brain-damaged; communication disorders; pragmatic language processing

Isolating prosodic functions in the brain

Researchers interested in the neurocognition of prosody face a number of distinct challenges when studying this communication channel. Speech prosody serves an array of functions that impart meanings as a formal part of language and, simultaneously, that jointly refer to the emotions and/or interpersonal stance ("attitudes") of a speaker within the communicative event (Sidtis and Van Lancker Sidtis, 2003; Grandjean et al., this volume). The needs for linguistic and emotive expression through prosody are achieved by exploiting a minimal and overlapping set of prosodic elements in speech (e.g., changes in pitch, loudness, and temporal patterning), which dynamically interact in the speech signal to convey intended meanings to the hearer over various time intervals (words, phrases) (Pell, 2001). Moreover, operations for decoding prosodic distinctions in speech co-occur and are frequently interdependent with those for processing finer segmental distinctions in the verbal–semantic channel of language. These variables, which emphasize the unique, multifaceted status of prosody in the spectrum of human communication systems, have long defied researchers' attempts to "isolate" effects due to prosody from other sources of information in speech, although significant gains are now being made (for reviews see Baum and Pell, 1999; Sidtis and Van Lancker Sidtis, 2003). For instance, there is growing credence for the idea that understanding speech prosody engages
broadly distributed and bilateral networks in the brain (Mitchell et al., 2003; Pell and Leonard, 2003; Gandour et al., 2004). Asymmetries in network functioning, when detected at specific stages of prosodic processing, are likely to reflect differential sensitivity of the two hemispheres to behavioral, stimulus, and/or task-related variables (Wildgruber et al., 2002; Kotz et al., 2003; Tong et al., 2005; Pell, 2006). For example, it is increasingly apparent that processing prosody as a local cue to linguistic–semantic structure, such as retrieving word meanings defined by tonal contrasts or changes in syllabic stress, favors mechanisms of the left cerebral hemisphere (Pell, 1998; Gandour et al., 2004; Wildgruber et al., 2004; Tong et al., 2005). This literature implies that when prosody acts as a linguistic device with conventionalized expression within the language system, prosodic attributes are treated like other acquired linguistic elements that engage left hemisphere processing mechanisms in a preferential manner (Hsieh et al., 2001). In contrast, the processing of less iconic prosodic representations, which encode vocal expressions of emotion states such as "anger" or "sadness" in speech, shows a distinct right-sided bias in many lesion and neuroimaging studies (Pell, 1998; Wildgruber et al., 2002; Gandour et al., 2003; Pell, 2006). This relative right hemisphere dominance for decoding emotional prosody has been hypothetically traced to early stages for structuring acoustic-perceptual features of the auditory input into an internal prosodic representation over extended time frames (Poeppel, 2003) and/or to adjacent stages for evaluating the affective significance of the event (Wildgruber et al., 2004). However, it must be emphasized that significant interhemispheric interactions are expected when processing emotional prosody in conjunction with language content (Plante et al., 2002; Vingerhoets et al., 2003; Schirmer et al., 2004; Hesling et al., 2005). Accordingly, the existence of comparative mechanisms for resolving the emotive significance of prosody in reference to concurrent semantic features of language has been highlighted in the latest research (Kotz et al., 2003; Pell, 2006). In one of our recent studies, we looked at the effects of task and stimulus parameters on the
recognition of emotional prosody in 9 right-hemisphere-damaged (RHD), 11 left-hemisphere-damaged (LHD), and 12 healthy control (HC) listeners (Pell, 2006). Each participant was required to discriminate, identify, and rate expressions of five basic emotions based on the prosodic features of "pseudo-utterances," which contained no emotionally relevant language cues (e.g., Someone migged the pazing spoken in a "happy" or "sad" tone). Participants were also required to identify emotions from utterances with semantically biasing language content (e.g., I didn't make the team spoken in a congruent "sad" tone). After comparing the group patterns and inspecting individual performance characteristics across our prosody tasks, we found that both RHD and LHD patients exhibited impairments in our "pure prosody" tasks, which provided no semantic cues for identifying the five emotions (Cancelliere and Kertesz, 1990; Starkstein et al., 1994; Ross et al., 1997; Pell, 1998). For individuals in the RHD group, these difficulties appeared to reflect a relatively pervasive insensitivity to the emotional features of prosodic stimuli, whereas for individuals in the LHD group, their problems appeared to stem to a greater extent from difficulties interpreting prosody in the context of concurrent language cues, especially when semantically biasing utterances were presented. These findings served to reiterate that while understanding emotional prosody engages regions of both hemispheres, the right hemisphere is probably critical for retrieving the emotional details represented by prosodic cues in speech prior to integrating this information with the meanings of language (Friederici and Alter, 2004; Pell, 2006). Thus, despite continued developments in the literature on prosody, there is repeated evidence of the left hemisphere's superiority for processing prosody as language and of the right hemisphere's preferential involvement for processing prosody in less-structured, emotive contexts. The prevalence of these findings justifies the opinion that hemispheric sensitivities for prosody are partly directed by the functional significance of prosodic cues in speech (Baum and Pell, 1999; Gandour et al., 2004). This functional hypothesis follows a model proposed by Van Lancker (1980), who underscored the operation of pitch in prosodic communication,
and her idea remains a viable tool for grossly differentiating the respective roles of the two hemispheres in the processing of speech prosody.
On the "attitudinal" functions of prosody

However, as was further characterized by Van Lancker's (1980) functional laterality continuum, prosody fulfills a broader array of functions than is currently being investigated. Of principal interest, prosody is known to assume a key role in communicating the interpersonal stance of the speaker in a variety of contexts, where these cues are routinely understood by listeners as "speaker attitudes" (Uldall, 1960; Brown et al., 1974; Ladd et al., 1985). Accepting the view that functional properties of speech prosody govern patterns of laterality in a critical (although perhaps incomplete) manner, an important question is: what neurocognitive mechanisms are engaged when prosody serves to mark the interpersonal or attitudinal meanings of the speaker as its core function in speech? For example, how are the cerebral hemispheres specialized for structuring an internal representation of prosody when it is used as a strategy for communicating the likely veracity of a statement being uttered, the extent to which a speaker wishes to affiliate with a particular idea or individual being discussed, or the speaker's intended politeness toward the hearer when making a request? Empirical undertakings have largely failed to elucidate how the cerebral hemispheres respond to prosodic events that serve these interpersonal or attitudinal functions in speech. The ability to infer attitudes held by a speaker and to integrate these details with other sources of information during interpersonal events represents a vital area of pragmatic competence, which guides successful communication (Pakosz, 1983). From a social–pragmatic viewpoint, the attitudinal functions of prosody should be considered "emotive" (as opposed to emotional or linguistic) in nature because they encode various relational meanings, which are socially relevant to the speaker–hearer in the interpersonal context in which they appear (Caffi and Janney, 1994). For example, prosodic cues that signal that a speaker is very confident in
what they are saying, if correctly interpreted by the hearer, will shape further aspects of the discourse event in distinctive ways (i.e., the hearer may initiate actions based on the perceived certainty of the information received). Alternatively, using prosody to determine levels of speaker self-identification or self-assertiveness toward the listener is centrally connected to interpreting how polite the speaker intends to be, a form of evaluation that has major consequences on how discourse events unfold. According to certain pragmatic descriptions of emotive communication, prosody and other cues operate as various ‘‘emotive devices’’ in speech, which encode information along such dimensions as evaluation (positive/negative), proximity (near/far), and quantity (more/less), among others. These devices allow humans to communicate a range of emotive meanings and attitudes that correspond in many ways to traditional psychological concepts of ‘‘evaluation,’’ ‘‘potency,’’ and ‘‘activity’’ (Caffi and Janney, 1994). Other pragmatic frameworks emphasize the manner by which communicative strategies, such as changes in prosody, act to ‘‘attenuate’’ or ‘‘boost’’ the illocutionary force of speech acts that may be inherently positive or negative in their impact on the listener, thereby communicating the degree of belief, commitment, or strength of feeling of the speaker’s intentions (Holmes, 1984). These different concepts can be used as a theoretical basis for investigating how ‘‘prosodic attitudes’’ are encoded and understood from speech in adults with and without unilateral brain damage. Given the evidence that right hemisphere regions are highly sensitive to vocal cues of emotion and other meanings of prosody not formally structured in language, one might reasonably predict that the right hemisphere is centrally implicated when drawing conclusions about the emotive significance of prosodic attitudes. However, as noted earlier, this supposition is founded on little empirical research and there are other variables to consider when studying the attitudinal functions of prosody, which may dictate how cerebral mechanisms respond to events in this more heterogeneous ‘‘functional category.’’ Whereas ‘‘linguistic’’ prosody is formally represented within the
language system and ‘‘emotional’’ prosody can at times be divorced from the propositional message while retaining its meaning (e.g., Pell, 2005), the ability to interpret speaker attitudes from prosody is usually informed by comparative relations in the significance of prosody, concurrent language features, and extralinguistic parameters (e.g., situational cues and/or existing knowledge of the listener). In fact, many speaker attitudes are achieved when speakers violate highly conventionalized, cross-channel associations in the use of prosody and linguistic strategies, which acquire emotive meanings through association over time (Burgoon, 1993; Wichmann, 2002). Thus, one must bear in mind that the ability to understand speaker attitudes from prosody is tightly intertwined with functional properties of language (such as speech acts). As well, the prosodic cues that mark particular attitudes tend to coincide with specific linguistic strategies or devices that serve to elaborate the speaker’s intentions. We attempted to control for these factors in a set of new patient studies that focused on the possible contributions of the right hemisphere for processing two emotive attitudes signaled by prosody: speaker confidence and speaker politeness. Specifically, we sought to determine whether the same RHD individuals who displayed impairments in our experiment on emotional prosody (Pell, 2006) would also present with difficulties recognizing speaker attitudes under comparable testing conditions, given the functional adjacency of ‘‘emotive’’ and ‘‘emotional’’ uses of prosody (Van Lancker, 1980). In each of our studies of speaker attitudes, we manipulated the emotive value of prosody as well as linguistic cues to achieve a fine-grained analysis of whether RHD listeners are insensitive to speaker attitudes on the basis of a misuse of prosodic information, linguistic cues, or both.
Understanding prosody as a cue to speaker confidence

Verbal and prosodic elements that convey the relative commitment of a speaker to the propositional content of their utterances, or its probable
‘‘truth value’’ to the listener, promote inferences about the level of speaker confidence in the information conveyed (Caffi and Janney, 1994). Prosodic features play an instrumental if not dominant role in how attributions of speaker confidence are made in the auditory modality (Brennan and Williams, 1995; Blanc and Dominey, 2003). Studies of young, healthy listeners indicate that characteristic alterations in loudness, pitch (rising or falling intonation contour), and the temporal patterning of speech (e.g., pauses, speaking rate) are all important for judging the degree of a speaker’s confidence in an assertion being made; a high level of speaker confidence is identified through increased loudness of voice, rapid rate of speech, short and infrequent pauses, and a terminal fall in the intonation contour (Scherer et al., 1973; Kimble and Seidel, 1991; Barr, 2003). In contrast, low speaker confidence (i.e., doubt) corresponds with longer pre-speech delay, (filled) pauses, and a higher probability of rising intonation or raised pitch (Smith and Clark, 1993; Brennan and Williams, 1995; Boltz, 2005). In addition to prosody, verbal ‘‘hedges’’ (e.g., I think…, Probably…, I’m sure…) play a role in judging speaker confidence by attenuating or boosting the perceived truth value of the utterance content (Holmes, 1984; Caffi and Janney, 1994), although it is unclear how much weight these verbal cues impose on listeners relative to prosody. Our first study of prosodic attitudes sought to delineate whether RHD patients, many of whom demonstrated impairments for recognizing emotional prosody (Pell, 2006), exhibit concurrent difficulties for recognizing speaker confidence in speech, especially prosodic markers of this attitude. Each of our participants was asked to rate the perceived level of speaker confidence when listening to simple assertions indicating the probable location of an object. Given the possible role of both prosody and verbal strategies for conveying confidence, and the possibility that some RHD patients attend largely to verbal content when making complex interpretations based on redundant cues (Bowers et al., 1987; Brownell et al., 1992), we evaluated our subjects in separate conditions where utterances contained both prosodic and lexical choices for understanding speaker
confidence or prosodic cues alone. In the event that RHD patients display impairments for judging speaker confidence as predicted, this design should inform whether failures to understand speaker attitudes such as confidence are linked to prosodic impairments or to a more general misuse of speech-related cues for inferring speaker attitudes in a normal manner.

Study 1: evaluating speaker confidence following RHD

As reported in full by Pell (under review), we tested two groups of right-handed, English-speaking participants for whom we already had detailed information on their ability to discriminate, categorize, and rate basic emotions from prosody (Pell, 2006). Our patient group consisted of nine right-hemisphere-damaged (RHD) participants (four males, five females, mean age = 64.2 years) who had each suffered a single, thromboembolic event with anatomical lesion of the right cerebral hemisphere (see Pell, 2006 for details). All RHD patients were tested during the chronic stage of their stroke (post-onset range = 2.0–11.9 years). Our control group consisted of eleven healthy control (HC) participants without neurological damage (five males, six females, mean age = 63.4 years), all but one of whom took part in our study of emotional prosody. All RHD and HC participants displayed good hearing and completed a battery of neuropsychological tests, which are summarized in
Table 1, together with data on how the groups performed in our study of emotional prosody.

Table 1. Neuropsychological features of the healthy control (HC) and right-hemisphere-damaged (RHD) groups, including data on the ability to identify basic emotions from prosody (Pell, 2006) (mean ± standard deviation, converted to percent correct)

Measure | HC (n = 11) | RHD (n = 9)
Discriminating emotional prosody (/30) | 79.0 ± 9.6 | 68.5 ± 11.8
Identifying emotional prosody from pseudo-utterances (‘‘pure prosody’’ task, /40) | 68.8 ± 24.0 | 43.9 ± 27.1
Identifying emotional prosody with congruent semantic cues (‘‘prosody-semantic’’ task, /40) | 79.5 ± 24.1 | 65.0 ± 25.9
Identifying emotion from faces (/40) | 86.3 ± 9.1 | 68.6 ± 27.3
Identifying emotion from verbal scenarios (/10) | 86.0 ± 12.0 | 64.0 ± 25.5
Mini-mental state exam (/30) | — | 90.7 ± 7.2
Benton phoneme discrimination (/30) | 92.1 ± 6.7 | 82.6 ± 10.1
Benton face discrimination (/54) | 87.2 ± 7.1 | 77.8 ± 6.5
Verbal working memory — words recalled (/42)a | — | 51.4 ± 11.1

a Four of the RHD patients (R3, R7, R8, R9) did not complete this task.

Each participant listened to a series of sentences (6–11 syllables in length), which were constructed to fit two distinct conditions for inferring speaker confidence. In the ‘‘linguistic’’ condition, stimuli were semantically informative statements (e.g., You turn left at the lights) that began with linguistic phrases such as for sure, most likely, or perhaps to convey a relatively high, moderate, or low degree of speaker confidence through the combined linguistic and prosodic cues of these utterances. In the ‘‘prosody’’ condition, comparable pseudo-utterances were constructed to resemble the stimuli entered into the linguistic condition (e.g., You turn left at the lights — You rint mig at the flugs). Each pseudo-utterance was produced to communicate a high, moderate, or low degree of confidence by manipulating only prosodic features of the utterances. Basic acoustic analyses of the utterances entered into each task were undertaken, and these differences are summarized in Table 2. All stimuli were digitally recorded by four male speakers of English and subjected to thorough pilot testing to establish the perceptual validity of each token prior to entering these materials in the patient study.

Table 2. Acoustic measures of stimuli entered into the confidence study by task, according to levels of expressed confidence

Condition | Confidence level | Mean f0 (Hz) | f0 range (Hz) | Speech rate (s/syllable)
Linguistic | High | 110 | 58 | 0.19
Linguistic | Moderate | 130 | 86 | 0.23
Linguistic | Low | 133 | 87 | 0.27
Prosody | High | 116 | 62 | 0.23
Prosody | Moderate | 127 | 56 | 0.24
Prosody | Low | 136 | 103 | 0.33

Independently for the ‘‘linguistic’’ and ‘‘prosody’’ tasks, each participant was instructed to listen to each sentence and then rate the degree of confidence expressed by the speaker on a five-point continuous scale, where ‘‘1’’ signified the speaker was ‘‘not at all confident’’ and ‘‘5’’ signified that the speaker was ‘‘very confident.’’ The numerical ratings assigned were analyzed between groups and across experimental conditions.

When the mean ratings were examined for each group, both the healthy and the RHD participants differentiated high, moderate, and low confidence utterances by assigning progressively lower ratings to items within these categories for both the ‘‘linguistic’’ (HC = 4.57, 3.03, 1.68; RHD = 4.01, 2.92, 2.40) and the ‘‘prosody’’ (HC = 4.37, 3.01, 1.61; RHD = 3.47, 2.47, 1.90) tasks. However, based on the mean conditional ratings, the ability of the RHD group to detect graded differences among the three confidence levels was noticeably reduced in range when compared to the HC group, especially in the prosody task. When we examined the frequency distribution of ratings assigned at each interval of the rating scale, we found that members of the RHD and HC groups responded in a distinct manner to speaker confidence cues, implying that the RHD patients were less sensitive to meaningful distinctions in the degree of speaker confidence encoded by stimuli in both our ‘‘linguistic’’ and ‘‘prosody’’ conditions (see Figs. 1a, b). Closer inspection of these data revealed that the ability to interpret speaker confidence from prosodic cues alone was especially problematic for the RHD group; as illustrated in Fig. 1b, there were marked qualitative differences between the groups, which suggested that the RHD patients frequently did not detect overt prosodic indicators of ‘‘speaking with confidence,’’ such as reductions in pitch and a relatively fast speech rate (Scherer et al., 1973; Kimble and Seidel, 1991; Pell, under review). This claim seems particularly true when RHD patients rated ‘‘high’’ confidence utterances from prosody alone: these cues were highly salient to healthy participants (with frequent ‘‘4’’ and ‘‘5’’ responses) but not to members of the patient group, yielding a highly divergent pattern in this context. Nonetheless, the RHD patients displayed a certain capacity to identify ‘‘high confidence’’ exemplars when lexical cues were present (see Fig. 1a).
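For readers who want to see the shape of this analysis, the sketch below (Python with pandas) computes group-by-condition mean ratings and converts scale-interval frequencies to the proportions plotted in Fig. 1. It is a minimal illustration under assumed data: the data frame, its column names, and the four example trials are placeholders rather than the study’s actual records.

```python
import pandas as pd

# Hypothetical long-format data: one row per trial, with the listener group
# ("HC"/"RHD"), task condition ("linguistic"/"prosody"), the intended
# confidence level of the stimulus, and the 1-5 rating that was assigned.
trials = pd.DataFrame({
    "group":     ["HC", "HC", "RHD", "RHD"],
    "condition": ["prosody", "prosody", "prosody", "prosody"],
    "level":     ["high", "low", "high", "low"],
    "rating":    [5, 2, 3, 2],
})

# Mean rating per group x condition x intended confidence level
# (compare HC = 4.37, 3.01, 1.61 vs. RHD = 3.47, 2.47, 1.90 above).
means = trials.groupby(["group", "condition", "level"])["rating"].mean()
print(means)

# Frequency of responses at each scale interval, converted to proportions
# within each group/condition cell, as in Figs. 1a and 1b.
counts = pd.crosstab([trials["group"], trials["condition"]], trials["rating"])
proportions = counts.div(counts.sum(axis=1), axis=0)
print(proportions)
```

Comparing such distributions, rather than only the means, is what exposes the compressed rating range of the RHD group described above.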
Implications of the results on speaker confidence

In general, we found that there was little difference in how healthy participants rated speaker confidence according to whether utterances contained lexical phrases (‘‘hedges’’) in addition to representative prosodic features of confidence. In contrast, the RHD listeners appeared to rely more strongly on lexical markers for inferring the extent of speaker confidence, with a reduced capacity to use prosodic cues alone for this purpose. For example, the RHD patients tended to assign significantly higher ratings of speaker confidence in the linguistic vs. the prosody condition, and exhibited a selective tendency for judging speakers as less or ‘‘not at all’’ confident on the basis of exposure to prosodic cues alone when the two conditions were compared. These results argue that lexical hedges such as ‘‘I’m sure’’ or ‘‘probably’’ assumed very little weight for healthy listeners when they made attributions about speaker confidence; rather, prosodic features were the decisive factor for rating speaker confidence in both conditions (Kimble and Seidel, 1991; Brennan and Williams, 1995; Barr, 2003). Accordingly, it is likely that the observed impairments for recognizing speaker confidence by our RHD patients revolved significantly around a failure to analyze prosody in both of our experimental conditions.
Fig. 1. The frequency of responses assigned at each interval of the five-point confidence scale by participants with right hemisphere damage (RHD) and healthy controls (HC) in the linguistic and prosody conditions (frequency counts were converted to proportions).
In addition to demonstrating that RHD patients are less sensitive to prosodic attributes of speaker confidence, our initial experiment revealed that RHD individuals tend to be more attuned to lexical features when judging this attitude (in spite of the fact that lexical cues are often less informative to healthy listeners when inferring speaker confidence). The idea that RHD patients frequently accord greater weight to the value of linguistic/verbal cues over prosody — perhaps as a compensatory attempt to understand attitudinal meanings, which are encoded to a large extent through prosody — has been described previously when RHD patients have performed other interpretive language tasks (Tompkins and Mateer, 1985; Bowers et al., 1987; Brownell et al., 1992; Pell and Baum, 1997). For our experiment on speaker confidence, this abnormal tendency to focus on the language channel (to the extent possible) may have been directed by subtle impairments for processing the prosodic underpinnings of confident attitudes following damage to the right hemisphere.

One limitation of our study of speaker confidence was that we were only able to compare confidence judgments based on prosody alone to the added (i.e., congruent) effects of lexical cues, which proved to be relatively minimal in importance for our stimuli. This precluded an analysis of many naturally occurring conditions in which listeners must interpret speaker attitudes when the meaning of prosody and language content sometimes conflict, such as for understanding the politeness of a speaker who is making a request to the listener. This motivated a second experiment that probed how RHD patients interpret speaker politeness under various stimulus conditions in which prosodic strategies interact more extensively with linguistic strategies to indicate particular interpretations of this attitude, furnishing additional clues about the source of difficulties for processing speaker attitudes in our RHD patients.
Understanding prosody as a cue to speaker politeness

Emotive attributes of speech, which reflect the self-assertiveness of the speaker vis-à-vis the listener, are strongly associated with the notion of ‘‘politeness principles,’’ or the operation of ‘‘volitionality devices’’ (Caffi and Janney, 1994). Politeness is conveyed linguistically through word selection (e.g., please) and through choices in sentence structure; in the latter case, perceived politeness tends to increase in the face of ‘‘conventional indirectness,’’ for example, when a command is framed as an optional request posed to the listener (Clark and Schunk, 1980; Blum-Kulka, 1987; Brown and Levinson, 1987). At the level of prosody, politeness is communicated in large part through conventionalized choices in intonational phrasing; utterances with high/rising pitch tend to be perceived as more polite than those with a terminal falling contour (Loveday, 1981; Wichmann, 2002; Culpeper et al., 2003), as are utterances produced with decreased loudness (Trees and Manusov, 1998; Culpeper et al., 2003). The idea that prosody and utterance type necessarily combine to signal speaker attitudes concerning the politeness of requests by each modifying (i.e., attenuating or boosting) the negative, illocutionary force of the request to the listener is well accepted (Trees and Manusov, 1998; Wichmann, 2002; LaPlante and Ambady, 2003). Although research has examined how RHD patients process the meaning of direct vs. indirect requests (Hirst et al., 1984; Weylman et al., 1989; Stemmer et al., 1994), the specific role of prosody in judging the politeness of requests following RHD has not been carefully explored (Foldi, 1987). In our follow-up study, we assessed whether listeners who took part in our investigations of emotional prosody and speaker confidence displayed normal sensitivity to speaker politeness based on the complex interplay of linguistic and prosodic strategies for understanding this attitude (or based on prosodic features alone). We anticipated that RHD listeners would again experience difficulties using prosodic information to recognize speaker attitudes of politeness but that these impairments would be most pronounced in conditions where this attitude is marked by divergent cues in the prosody and language channels.
Study 2: evaluating speaker politeness following RHD

For this study, we recruited six RHD patients and ten HC subjects who originally participated in our studies of speaker confidence and emotional prosody. Our stimuli were English sentences phrased in the form of a command or request for a simple action to be performed by the listener, recorded by two female speakers. Again, distinct items were carefully prepared to enter into a ‘‘linguistic’’ task, for eliciting politeness judgments based on combined prosodic and lexical–semantic cues, and a ‘‘prosody’’ task, for eliciting politeness judgments based on prosodic features alone. The linguistic stimuli were eight ‘‘stem’’ commands (e.g., Do the dishes), which were linguistically modified to change the emotive force of the command in four distinct ways (see Table 3 for a summary). ‘‘Direct’’ utterances contained initial phrases, which linguistically boosted the negative intent of the command, whereas ‘‘indirect,’’ ‘‘very indirect,’’ and ‘‘please’’ utterances employed conventional indirectness or explicit lexical markers to attenuate the negative force of the command. Each linguistic utterance type was then produced by the actors in two prosodic modes: with a high/rising tone that tends to attenuate the imposition of requests (i.e., to be interpreted as polite), and with a falling tone that tends to boost the negativity of the request (i.e., less polite) (Loveday, 1981; Culpeper et al., 2003). As demonstrated in Table 3, some of these cue combinations led to stimuli in which linguistic and prosodic conventions for understanding speaker politeness were pragmatically opposed (e.g., indirect language spoken with a low prosody). Stimuli presented in the prosody task were comparable pseudo-utterances produced by the two actors in a ‘‘rising’’ and a ‘‘falling’’ tone, as in the linguistic condition. Again, basic acoustic differences in our stimuli were explored, as reported in Table 4.

Table 3. Examples of stimuli presented in the experiment on speaker politeness by condition, according to whether language and prosodic features served to ‘‘attenuate’’ or ‘‘boost’’ the negative impact of the request on the listener

Condition | Utterance type | Example | Prosody | Emotive impact (utterance type/prosody)
Linguistic | Stem command | Do the dishes | High | Boost/attenuate
Linguistic | Stem command | Do the dishes | Low | Boost/boost
Linguistic | Direct | You must do the dishes | High | Boost/attenuate
Linguistic | Direct | You must do the dishes | Low | Boost/boost
Linguistic | Indirect | Can you do the dishes | High | Attenuate/attenuate
Linguistic | Indirect | Can you do the dishes | Low | Attenuate/boost
Linguistic | Very indirect | Could I bother you to do the dishes | High | Attenuate/attenuate
Linguistic | Very indirect | Could I bother you to do the dishes | Low | Attenuate/boost
Linguistic | Please | Please do the dishes | High | Attenuate/attenuate
Linguistic | Please | Please do the dishes | Low | Attenuate/boost
Prosody | Pseudo-utterance | Gub the mooshes | High | —/attenuate
Prosody | Pseudo-utterance | Gub the mooshes | Low | —/boost

Table 4. Acoustic measures of stimuli entered into the politeness study by task, according to manipulations in utterance type and prosody

Utterance type | Mean f0 (Hz): falling | Mean f0 (Hz): rising | f0 range (Hz): falling | f0 range (Hz): rising | Speech rate (s/syllable): falling | Speech rate (s/syllable): rising
Stem command | 228 | 277 | 151 | 192 | 3.87 | 4.27
Direct | 228 | 268 | 158 | 184 | 3.79 | 4.68
Indirect | 216 | 258 | 139 | 221 | 4.63 | 4.74
Very indirect | 229 | 270 | 218 | 312 | 4.32 | 4.70
Please | 218 | 257 | 165 | 180 | 2.93 | 3.95
Pseudo-utterance (prosody condition) | 226 | 268 | 175 | 254 | 3.81 | 4.05
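As an aside on how the acoustic measures in Tables 2 and 4 can be obtained, the sketch below summarizes a single utterance given its fundamental-frequency contour and syllable count. It assumes the f0 contour has already been extracted by a pitch tracker (e.g., Praat); the function name, inputs, and example values are illustrative assumptions, not the study’s analysis code.

```python
import numpy as np

def acoustic_summary(f0_hz, duration_s, n_syllables):
    """Mean f0, f0 range, and speech rate for one utterance, as reported
    in Tables 2 and 4. f0_hz holds one estimate per analysis frame, with
    0 (or NaN) marking unvoiced frames."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[np.isfinite(f0) & (f0 > 0)]   # keep voiced frames only
    return {
        "mean_f0_hz": voiced.mean(),
        "f0_range_hz": voiced.max() - voiced.min(),
        # The chapter reports rate as seconds per syllable:
        "speech_rate_s_per_syll": duration_s / n_syllables,
    }

# Example: a short contour with unvoiced gaps (zeros) and raised pitch.
contour = [0, 128, 135, 150, 162, 170, 0, 0, 155, 149]
print(acoustic_summary(contour, duration_s=2.3, n_syllables=7))
```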
Each RHD and HC participant listened to all sentences in the linguistic and prosody conditions and indicated on a five-point (1–5) scale how ‘‘polite the speaker sounds,’’ where ‘‘1’’ represented ‘‘not at all polite’’ and ‘‘5’’ represented ‘‘very polite.’’

On the basis of the mean politeness ratings assigned by members of each group, we found that sentences spoken in a high vs. a low prosody had the largest influence on politeness ratings; for both the HC and RHD listeners, sentences spoken with a high prosody were always judged to be more polite than those spoken in a low prosody, in both the linguistic (HC: high = 3.54, low = 1.60; RHD: high = 3.40, low = 2.06) and the prosody (HC: high = 4.20, low = 1.73; RHD: high = 3.98, low = 2.13) tasks. As expected, the mean ratings assigned by the healthy listeners in the linguistic condition indicated that prosodic factors interacted significantly with utterance type to determine the perceived degree of speaker politeness for
healthy listeners (Trees and Manusov, 1998; LaPlante and Ambady, 2003). For example, when sentences with overt linguistic mitigating strategies of indirectness (Could I bother you to do the dishes) were produced in a corresponding high (polite) tone, HC listeners perceived these stimuli as the most polite form of request, whereas the same sentences produced in a low (impolite) tone were rated as no more polite than listening to stem commands (Do the dishes). However, we found no comparable evidence that the mean politeness ratings assigned by the RHD group were influenced by the combined effects of prosody and utterance type. Thus, although the RHD patients seemed capable of using gross differences in prosody to judge politeness, the subtle interactions of prosody and utterance type, which served to alter the perceived politeness of speakers for healthy listeners,
were recognized less frequently by members of the RHD group. Given the interplay of utterance type and prosody for understanding speaker politeness, we conducted further analyses in our linguistic condition to see whether the groups differed when levels of utterance type and prosody were defined strictly by their emotive function as a device for ‘‘attenuating’’ or ‘‘boosting’’ the negativity of requests. Previous research involving healthy adults implies that unconventional associations of linguistic and prosodic cues, which conflict in their emotive function, are most likely to generate a negative (impolite) evaluation of the utterance (LaPlante and Ambady, 2003). After analyzing the frequency distribution of ratings recorded at each interval of the five-point politeness scale, we noted that the two groups differed significantly only in the two conditions in which prosody and language conflicted in their emotive intent (see Fig. 2). Most notably, the RHD patients were relatively insensitive to the intended politeness of speakers when
linguistic strategies were employed to render the request more polite in conjunction with a disconfirming, low, and impolite prosody (top right graph in Fig. 2). In this context, RHD patients tended to rate these utterances as more polite than the HC listeners.
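The logic of this follow-up analysis can be stated compactly in code. The sketch below recodes each stimulus by the emotive function of its language and prosody cues, following Table 3, and compares the two groups’ rating distributions within one conflict cell. The chapter does not name the statistic that was used, so the chi-square test here is only a plausible stand-in, and all counts are invented for illustration.

```python
from scipy.stats import chi2_contingency

# Emotive function of each cue, per Table 3: utterance types and prosodic
# contours either "attenuate" or "boost" the request's negative force.
UTTERANCE_FUNCTION = {
    "stem": "boost", "direct": "boost",
    "indirect": "attenuate", "very_indirect": "attenuate", "please": "attenuate",
}
PROSODY_FUNCTION = {"high": "attenuate", "low": "boost"}

def emotive_cell(utterance_type, prosody):
    """Map a stimulus to a (language function, prosody function) cell;
    mixed cells such as ('attenuate', 'boost') are the conflict conditions."""
    return (UTTERANCE_FUNCTION[utterance_type], PROSODY_FUNCTION[prosody])

# Hypothetical frequencies of ratings 1-5 in the conflict cell that pairs
# attenuating language with a boosting (low) prosody, for each group:
hc_counts = [34, 41, 15, 8, 2]
rhd_counts = [10, 18, 22, 19, 11]   # shifted toward "more polite"

chi2, p, dof, expected = chi2_contingency([hc_counts, rhd_counts])
print(emotive_cell("very_indirect", "low"))    # ('attenuate', 'boost')
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
```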
Fig. 2. The frequency of responses assigned at each interval of the five-point politeness scale by participants with right hemisphere damage (RHD) and healthy controls (HC) according to the combined function of prosody vs. language cues (frequency counts were converted to proportions).

Implications of the results on speaker politeness

In general, results of the politeness experiment indicate that RHD patients were largely able to interpret whether highly conventionalized utterance types (Stemmer et al., 1994) or differences in intonational phrasing were acting to attenuate or boost the negative attitude of the speaker when the operation of cues in each channel was considered independently. However, the RHD group recognized fewer meaningful distinctions among our stimulus conditions and demonstrated little sensitivity to the interplay of prosodic and linguistic factors, which combine to signal speaker politeness. The fact that
RHD patients were sensitive to broad differences in prosody when making politeness ratings (especially when pseudo-utterances were presented) implies that these listeners were able to judge attitudes that revolve around conventional, categorical choices in contour type (Scherer et al., 1984; Wichmann, 2002). These findings in our politeness experiment contrast with arguments we arrived at when studying speaker confidence, where there was strong evidence that members of our RHD group could not meaningfully process representative prosodic cues to make appropriate decisions about this speaker attitude. What posed the greatest difficulty for RHD listeners was rendering interpretations of speaker politeness when these were signaled by functional discrepancies in the emotive force of linguistic vs. prosodic cues of the stimuli. When conventional linguistic strategies for attenuating the imposition of requests (e.g., Would you mind…) were conjoined with conventional prosodic choices, which helped boost the impoliteness (i.e., perceived dominance) of the speaker, RHD patients were less sensitive to the intended politeness of these utterances. These deficits were most evident in our condition resembling ‘‘mock politeness’’ (i.e., very indirect language + low prosody), where RHD patients failed to detect the negative/impolite attitude of the speaker, a situation that resembles the case of sarcasm, which is commonly encountered in human discourse. Since RHD adults were fundamentally aware of how prosody functions to signal politeness based on these features alone (prosody condition), it is likely that a major source of their errors in the politeness task lay in engaging in a comparative analysis of relevant cues across the two speech channels (Foldi, 1987; Weylman et al., 1989). Possibly, our RHD patients were less efficient at detecting violations in the emotive significance of language and prosody, precluding inferential processes that would highlight the intended (i.e., nonconventional) attitude of the speaker in these situations. This explanation places the locus of their deficit in our politeness experiment closer to the stage of applying and integrating social and affective cues, which are relevant to understanding the intended meaning of
communicative events (Brownell et al., 1992, 1997; Cheang and Pell, 2006). Finally, in our politeness experiment we uncovered further indications that RHD patients assigned greater weight to the functional significance of language over prosodic content (Tompkins and Mateer, 1985; Bowers et al., 1987; Pell and Baum, 1997), although for speaker politeness the apparent ‘‘language focus’’ of our RHD patients did not reflect an incapacity to process prosodic distinctions relevant to this attitude. It is possible that many of the patients’ difficulties emerged at a stage of retrieving learned associations between utterance type and intonation, which are routinely used by listeners to infer speaker politeness, particularly in instances when violations of these expectations occur and generate distinct attitudinal interpretations (Wichmann, 2002). More research will be needed to evaluate these initial hypotheses.
Individual profiles of impairment for understanding emotions versus prosodic attitudes

One of the intended benefits of investigating right hemisphere contributions to speaker attitudes was to compare these findings to data on how the same individuals recognized expressions of basic emotions from prosody. Given the consistent deficits of our RHD group in each of these experiments (Pell, 2006; Pell, under review), our findings establish a platform for predicting that many RHD patients who fail to comprehend emotional prosody also fail to normally appreciate many of the emotive attitudes conveyed by speakers at the level of prosody. However, as emphasized in our previous research (Pell and Baum, 1997; Pell, 2006), not all RHD individuals display receptive impairments for emotional prosody and this was also true of our data on speaker attitudes. As a supplementary measure, we compared individual performance features of the RHD patients in our confidence and politeness studies in relation to their individual scores on tasks for understanding emotional prosody (Pell, 2006). We detected considerable overlap in the ability of individual RHD patients to judge speaker confidence and to judge vocal emotions in our ‘‘pure
prosody’’ identification task (Pell, 2006). Three RHD participants (R3, R5, R6) who were markedly ‘‘impaired’’ in the recognition of speaker confidence, based on the expected range of their mean confidence ratings, were also the most impaired individuals for judging basic emotions from prosodic cues alone (Pell, 2006). Participant R5, who had a large temporoparietal lesion and whose deficits for emotional prosody were most pronounced, was also most severely impaired for judging speaker confidence (assigning mean ratings of 2.1, 2.2, and 2.2 for low, moderate, and high confidence utterances, respectively). Finally, it is noteworthy that two RHD patients who were unimpaired in processing emotional prosody (R1, R7) also performed normally in the confidence experiment, with mean conditional ratings approximating the HC group values. While attrition of our RHD group (including participants R5 and R6) precluded a similar analysis of individual data on speaker politeness with our emotional prosody results, it is nonetheless apparent that deficits for emotional prosody in our RHD sample frequently correlated with deficits for understanding speaker confidence, and perhaps other speaker attitudes, pending more research that explores these issues in greater depth.
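The criterion for calling an individual patient ‘‘impaired’’ was whether his or her mean ratings fell within the expected range. One common way to operationalize such a cutoff, offered here purely as an assumption and not as the published procedure, is to flag scores more than two standard deviations below the control mean:

```python
import numpy as np

def flag_impaired(patient_means, hc_means, z_cutoff=2.0):
    """Flag patients whose score lies more than z_cutoff SDs below the
    healthy-control mean. A simplified stand-in for the 'expected range'
    criterion described in the text."""
    hc = np.asarray(hc_means, dtype=float)
    mu, sd = hc.mean(), hc.std(ddof=1)
    return {pid: (score - mu) / sd < -z_cutoff
            for pid, score in patient_means.items()}

# Hypothetical mean ratings for "high confidence" items (HC mean near 4.4):
hc_scores = [4.6, 4.5, 4.2, 4.4, 4.7, 4.1, 4.3, 4.5, 4.2, 4.6, 4.3]
patients = {"R1": 4.2, "R3": 2.4, "R5": 2.2, "R6": 2.6, "R7": 4.1}
print(flag_impaired(patients, hc_scores))
# {'R1': False, 'R3': True, 'R5': True, 'R6': True, 'R7': False}
```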
On the source of failures to appreciate prosodic attitudes following RHD

It would appear that for many RHD patients, a failure to appreciate speaker attitudes may reflect impairments at slightly different stages of processing depending on the manner in which prosody, language, and other variables that define speaker attitudes are assigned meaning by the interpersonal context. In our experiment on speaker confidence, prosody was always the dominant cue for inferring this attitude, justifying our hypothesis that difficulties relating specifically to prosody governed the RHD patients’ difficulties in that task. Like basic emotions, vocal features that communicate distinctions in confidence/doubt are likely represented by multiple prosodic elements with continuous expression in the speech signal (Scherer et al., 1973, 1984). For this reason, an underlying
prosodic defect in our RHD patients may have contributed to difficulties in both the confidence and emotional prosody tasks (Pell, 2006). In contrast, our data on speaker politeness highlighted the interdependence of prosody and language content for understanding this attitude and established that the same RHD patients could successfully interpret categorical distinctions in intonation contours for judging the polite or impolite attitude of speakers. At the level of prosody alone, this task likely resembles the ability of many RHD patients to recognize highly conventionalized, phonological pitch categories, which is known to be intact in most RHD adults (Baum and Pell, 1999; Hsieh et al., 2001). However, RHD patients performed abnormally when prosody and language choices combined in an unconventional (discordant) manner to communicate politeness. This places the likely source of impairment for understanding certain attitudes such as politeness, not at the level of prosody per se, but in resolving these attitudinal meanings based on the comparative weight of prosody with other socially relevant cues such as utterance type. This idea fits with a well-established literature indicating that RHD patients do not successfully integrate all relevant sources of information when intended meanings are noncanonical or nontransparent in form (for a review, see Brownell et al., 1997; Martin and McDonald, 2003). Along similar lines, our data on speaker attitudes such as politeness imply that RHD patients do not always incorporate their evaluations of prosodic information with the significance of language content, which was often the primary focus of these listeners. Until further data are gathered to inform these issues in a more detailed manner, one can argue that (1) when continuous changes in multiple prosodic elements weigh heavily for assigning speaker attitudes (e.g., confidence), or (2) when prosody must be resolved with associated parameters of the utterance to mark noncanonical interpretations of the expressed attitude (e.g., speaker politeness), RHD patients often fail to generate appropriate inferences about the emotive intentions of the speaker. In either case, one must assume that these specific deficits are part of a wider array of ‘‘pragmatic language deficits’’ exhibited by many RHD patients who show marked difficulties in the generation
of inferences from socially and emotionally relevant information in speech (Happe et al., 1999; see Martin and McDonald, 2003 for a recent overview). It will be highly informative to see these findings extended to a broader array of interpersonal contexts that revolve around prosody usage, replicated in a larger sample of RHD participants, and examined through other investigative approaches. Our data also call on researchers to explore how emotive features of prosody are interpreted by other brain-damaged populations who exhibit deficits for emotional prosody, such as patients with acquired left hemisphere lesions (Cancelliere and Kertesz, 1990; Van Lancker and Sidtis, 1992; Pell, 1998, 2006; Adolphs et al., 2002; Charbonneau et al., 2003) or Parkinson’s disease (Breitenstein et al., 1998; Pell and Leonard, 2003). Although it is our standpoint that emotional and emotive meanings of prosody rely on mandatory right hemisphere contributions during speech comprehension, our findings are inconclusive regarding the probable input of left hemisphere (Kotz et al., 2003) and subcortical (Pell and Leonard, 2003) regions in this processing. Given the increasingly strong evidence of interhemispheric cooperation in how emotional prosody is processed from speech (see Kotz et al., in this volume), coupled with the notion that emotive and interpersonal meanings of prosody bear an important functional relationship to emotional prosody, one would predict that bilateral hemispheric mechanisms are engaged when processing speaker attitudes embedded in spoken language content (Pell, 2006). Through careful inspection of how speaker attitudes are dictated by the physical form of prosody, the degree to which prosodic parameters are conventionalized by the speaker–hearer, and the relationship of the prosodic message to conjoined linguistic, contextual, and other relational variables, future research will undoubtedly culminate in increasingly finer descriptions of how speaker attitudes are processed in the human mind/brain.
Abbreviations

HC    healthy control
LHD   left-hemisphere-damaged
RHD   right-hemisphere-damaged
Acknowledgments

The author thanks Tamara Long and Alana Pearlman for their helpful comments and assistance with data collected for the studies on speaker confidence and speaker politeness, respectively; Elmira Chan and Nicole Hallonda Price for additional help with data collection, data organization, and manuscript preparation; the Canadian Institutes for Health Research (Institute of Aging) and the Natural Sciences and Engineering Research Council of Canada for generous operating support; and McGill University for investigator support (William Dawson Chair).

References

Adolphs, R., Damasio, H. and Tranel, D. (2002) Neural systems for recognition of emotional prosody: a 3-D lesion study. Emotion, 2: 23–51.
Barr, D.J. (2003) Paralinguistic correlates of conceptual structure. Psychonomic Bull. Rev., 10: 462–467.
Baum, S.R. and Pell, M.D. (1999) The neural bases of prosody: insights from lesion studies and neuroimaging. Aphasiology, 13: 581–608.
Blanc, J.-M. and Dominey, P. (2003) Identification of prosodic attitudes by a temporal recurrent network. Cogn. Brain Res., 17: 693–699.
Blum-Kulka, S. (1987) Indirectness and politeness in requests: same or different? J. Prag., 11: 131–146.
Boltz, M.G. (2005) Temporal dimensions of conversational interaction: the role of response latencies and pauses in social impression formation. J. Lang. Soc. Psychol., 24: 103–138.
Bowers, D., Coslett, H.B., Bauer, R.M., Speedie, L.J. and Heilman, K.M. (1987) Comprehension of emotional prosody following unilateral hemispheric lesions: processing defect versus distraction defect. Neuropsychologia, 25: 317–328.
Breitenstein, C., Daum, I. and Ackermann, H. (1998) Emotional processing following cortical and subcortical brain damage: contribution of the fronto-striatal circuitry. Behav. Neurol., 11: 29–42.
Brennan, S.E. and Williams, M. (1995) The feeling of another’s knowing: prosody and filled pauses as cues to listeners about the metacognitive states of speakers. J. Mem. Lang., 34: 383–398.
Brown, P. and Levinson, S. (1987) Politeness: Some Universals in Language Usage. Cambridge University Press, Cambridge.
Brownell, H.H., Carroll, J.J., Rehak, A. and Wingfield, A. (1992) The use of pronoun anaphora and speaker mood in the interpretation of conversational utterances by right hemisphere brain-damaged patients. Brain Lang., 43: 121–147.
Brownell, H., Pincus, D., Blum, A., Rehak, A. and Winner, E. (1997) The effects of right-hemisphere brain damage on
patients’ use of terms of personal reference. Brain Lang., 57: 60–79.
Brown, B.L., Strong, W.J. and Rencher, A.C. (1974) Fifty-four voices from two: the effects of simultaneous manipulations of rate, mean fundamental frequency, and variance of fundamental frequency on ratings of personality from speech. J. Acoust. Soc. Am., 55: 313–318.
Burgoon, J.K. (1993) Interpersonal expectations, expectancy violations, and emotional communication. J. Lang. Soc. Psychol., 12: 30–48.
Caffi, C. and Janney, R.W. (1994) Toward a pragmatics of emotive communication. J. Prag., 22: 325–373.
Cancelliere, A.E.B. and Kertesz, A. (1990) Lesion localization in acquired deficits of emotional expression and comprehension. Brain Cogn., 13: 133–147.
Charbonneau, S., Scherzer, B.P., Aspirot, D. and Cohen, H. (2003) Perception and production of facial and prosodic emotions by chronic CVA patients. Neuropsychologia, 41: 605–613.
Cheang, H.S. and Pell, M.D. (2006) A study of humour and communicative intention following right hemisphere stroke. J. Clin. Linguist. Phonet., 20: 447–462.
Clark, H.H. and Schunk, D. (1980) Polite responses to polite requests. Cognition, 8: 111–143.
Culpeper, J., Bousfield, D. and Wichmann, A. (2003) Impoliteness revisited: with special reference to dynamic and prosodic aspects. J. Prag., 35: 1545–1579.
Foldi, N.S. (1987) Appreciation of pragmatic interpretations of indirect commands: comparison of right and left hemisphere brain-damaged patients. Brain Lang., 31: 88–108.
Friederici, A. and Alter, K. (2004) Lateralization of auditory language functions: a dynamic dual pathway model. Brain Lang., 89: 267–276.
Gandour, J., Tong, Y., Wong, D., Talavage, T., Dzemidzic, M., Xu, Y., Li, X. and Lowe, M. (2004) Hemispheric roles in the perception of speech prosody. NeuroImage, 23: 344–357.
Gandour, J., Wong, D., Dzemidzic, M., Lowe, M., Tong, Y. and Li, X. (2003) A cross-linguistic fMRI study of perception of intonation and emotion in Chinese. Hum. Brain Mapp., 18: 149–157.
Happe, F., Brownell, H. and Winner, E. (1999) Acquired theory of mind impairments following stroke. Cognition, 70: 211–240.
Hesling, I., Clement, S., Bordessoules, M. and Allard, M. (2005) Cerebral mechanisms of prosodic integration: evidence from connected speech. NeuroImage, 24: 937–947.
Hirst, W., LeDoux, J. and Stein, S. (1984) Constraints on the processing of indirect speech acts: evidence from aphasiology. Brain Lang., 23: 26–33.
Holmes, J. (1984) Modifying illocutionary force. J. Prag., 8: 345–365.
Hsieh, L., Gandour, J., Wong, D. and Hutchins, G. (2001) Functional heterogeneity of inferior frontal gyrus is shaped by linguistic experience. Brain Lang., 76: 227–252.
Kimble, C.E. and Seidel, S. (1991) Vocal signs of confidence. J. Nonver. Behav., 15: 99–105.
Kotz, S., Meyer, M., Alter, K., Besson, M., Von Cramon, D.Y. and Friederici, A. (2003) On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang., 86: 366–376.
Ladd, D.R., Silverman, K.E.A., Tolkmitt, F., Bergmann, G. and Scherer, K.R. (1985) Evidence for the independent function of intonation contour type, voice quality, and Fo range in signaling speaker affect. J. Acoust. Soc. Am., 78: 435–444.
LaPlante, D. and Ambady, N. (2003) On how things are said: voice tone, voice intensity, verbal content and perceptions of politeness. J. Lang. Soc. Psychol., 22: 434–441.
Loveday, L. (1981) Pitch, politeness and sexual role: an exploratory investigation into the pitch correlates of English and Japanese politeness formulae. Lang. Speech, 24: 71–89.
Martin, I. and McDonald, S. (2003) Weak coherence, no theory of mind, or executive dysfunction? Solving the puzzle of pragmatic language disorders. Brain Lang., 85: 451–466.
Mitchell, R.L.C., Elliott, R., Barry, M., Cruttenden, A. and Woodruff, P.W.R. (2003) The neural response to emotional prosody, as revealed by functional magnetic resonance imaging. Neuropsychologia, 41: 1410–1421.
Pakosz, M. (1983) Attitudinal judgments in intonation: some evidence for a theory. J. Psycholinguist. Res., 12: 311–326.
Pell, M.D. (1998) Recognition of prosody following unilateral brain lesion: influence of functional and structural attributes of prosodic contours. Neuropsychologia, 36: 701–715.
Pell, M.D. (2001) Influence of emotion and focus location on prosody in matched statements and questions. J. Acoust. Soc. Am., 109: 1668–1680.
Pell, M.D. (2005) Prosody-face interactions in emotional processing as revealed by the facial affect decision task. J. Nonver. Behav., 29: 193–215.
Pell, M.D. (2006) Cerebral mechanisms for understanding emotional prosody in speech. Brain Lang., 96: 221–234.
Pell, M.D. Reduced sensitivity to ‘‘prosodic attitudes’’ in adults with focal right hemisphere brain damage. Brain Lang., in press.
Pell, M.D. and Baum, S.R. (1997) The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain Lang., 57: 80–99.
Pell, M.D. and Leonard, C.L. (2003) Processing emotional tone from speech in Parkinson’s disease: a role for the basal ganglia. Cogn. Affect. Behav. Neurosci., 3: 275–288.
Plante, E., Creusere, M. and Sabin, C. (2002) Dissociating sentential prosody from sentence processing: activation interacts with task demands. NeuroImage, 17: 401–410.
Poeppel, D. (2003) The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun., 41: 245–255.
Ross, E.D., Thompson, R.D. and Yenkosky, J. (1997) Lateralization of affective prosody in brain and the callosal integration of hemispheric language functions. Brain Lang., 56: 27–54.
Scherer, K.R., Ladd, D.R. and Silverman, K. (1984) Vocal cues to speaker affect: testing two models. J. Acoust. Soc. Am., 76: 1346–1356.
Scherer, K.R., London, H. and Wolf, J.J. (1973) The voice of confidence: paralinguistic cues and audience evaluation. J. Res. Person., 7: 31–44.
Schirmer, A., Zysset, S., Kotz, S. and Von Cramon, D.Y. (2004) Gender differences in the activation of inferior frontal cortex during emotional speech perception. NeuroImage, 21: 1114–1123.
Sidtis, J. and Van Lancker Sidtis, D. (2003) A neurobehavioral approach to dysprosody. Semin. Speech Lang., 24: 93–105.
Smith, V.L. and Clark, H.H. (1993) On the course of answering questions. J. Mem. Lang., 32: 25–38.
Starkstein, S.E., Federoff, J.P., Price, T.R., Leiguarda, R.C. and Robinson, R.G. (1994) Neuropsychological and neuroradiologic correlates of emotional prosody comprehension. Neurology, 44: 515–522.
Stemmer, B., Giroux, F. and Joanette, Y. (1994) Production and evaluation of requests by right-hemisphere brain-damaged individuals. Brain Lang., 47: 1–31.
Tompkins, C.A. and Mateer, C.A. (1985) Right hemisphere appreciation of prosodic and linguistic indications of implicit attitude. Brain Lang., 24: 185–203.
Tong, Y., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Xu, Y., Li, X. and Lowe, M. (2005) Neural circuitry underlying sentence-level linguistic prosody. NeuroImage, 28: 417–428.
Trees, A.R. and Manusov, V. (1998) Managing face concerns in criticism: integrating nonverbal behaviors as a dimension of politeness in female friendship dyads. Hum. Commun. Res., 24: 564–583.
Uldall, E. (1960) Attitudinal meanings conveyed by intonation contours. Lang. Speech, 3: 223–234.
Van Lancker, D. (1980) Cerebral lateralization of pitch cues in the linguistic signal. Pap. Linguist., 13: 201–277.
Van Lancker, D. and Sidtis, J.J. (1992) The identification of affective-prosodic stimuli by left- and right-hemisphere-damaged subjects: all errors are not created equal. J. Speech Hearing Res., 35: 963–970.
Vingerhoets, G., Berckmoes, C. and Stroobant, N. (2003) Cerebral hemodynamics during discrimination of prosodic and semantic emotion in speech studied by transcranial Doppler ultrasonography. Neuropsychology, 17: 93–99.
Weylman, S., Brownell, H., Roman, M. and Gardner, H. (1989) Appreciation of indirect requests by left- and right-brain-damaged patients: the effects of verbal context and conventionality of wording. Brain Lang., 36: 580–591.
Wichmann, A. (2002) Attitudinal intonation and the inferential process. In: Bel, B. and Marlien, I. (Eds.), Proceedings of Speech Prosody 2002 Conference, 11–13 April 2002. Laboratoire Parole et Langage, Aix-en-Provence, France.
Wildgruber, D., Hertrich, I., Riecker, A., Erb, M., Anders, S., Grodd, W. and Ackermann, H. (2004) Distinct frontal regions subserve evaluation of linguistic and emotional aspects of speech intonation. Cerebral Cortex, 14: 1384–1389.
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M. and Grodd, W. (2002) Dynamic brain activation during processing of emotional intonation: influence of acoustic parameters, emotional valence, and sex. NeuroImage, 15: 856–869.
CHAPTER 18
Processing of facial identity and expression: a psychophysical, physiological, and computational perspective

Adrian Schwaninger (1, 2), Christian Wallraven (1), Douglas W. Cunningham (1) and Sarah D. Chiller-Glaus (2)

1 Department Bülthoff, Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany
2 Department of Psychology, University of Zurich, Zurich, Switzerland
Abstract: A deeper understanding of how the brain processes visual information can be obtained by comparing results from complementary fields such as psychophysics, physiology, and computer science. In this chapter, empirical findings are reviewed with regard to the proposed mechanisms and representations for processing identity and emotion in faces. Results from psychophysics clearly show that faces are processed by analyzing component information (eyes, nose, mouth, etc.) and their spatial relationship (configural information). Results from neuroscience indicate separate neural systems for recognition of identity and facial expression. Computer science offers a deeper understanding of the required algorithms and representations, and provides computational modeling of psychological and physiological accounts. An interdisciplinary approach taking these different perspectives into account provides a promising basis for better understanding and modeling of how the human brain processes visual information for recognition of identity and emotion in faces.

Keywords: face recognition; facial expression; interdisciplinary approach; psychophysics of face processing; computational modeling of face processing; face processing modules; component and configural processing

Corresponding author. Tel.: +41-76-393-24-46; Fax: +49-7071-601-616; E-mail: [email protected]
DOI: 10.1016/S0079-6123(06)56018-2

Introduction

Everyday object recognition is usually a matter of discriminating between quite heterogeneous object classes that differ with regard to their global shape, parts, and other distinctive features such as color or texture. Face recognition, in contrast, relies on the discrimination of exemplars of a very homogeneous category. According to Bahrick et al. (1975), we are able to recognize familiar faces with an accuracy of 90% or more, even when some of these faces have not been seen for 50 years. Moreover, people identify facial expressions very fast and even without awareness (see Leiberg and Anders, this volume). These abilities seem to be remarkably disrupted if faces are turned upside-down. Consider the pictures in Fig. 1. Although this woman is a well-known celebrity, it is difficult to recognize her from the inverted photographs. One might detect certain differences between the two pictures despite the fact that both seem to have the same facial expression. Interestingly, after rotating this page by 180° so that the two faces are upright, one can now easily identify the person depicted in these pictures and grotesque differences in the facial expression are revealed.

Fig. 1. Thatcher illusion. When the photographs are viewed upside-down (as above) it is more difficult to identify the person belonging to the pictures and the facial expressions seem similar. When the pictures are viewed right side up, it is very easy to identify the person depicted in these pictures and the face on the right appears highly grotesque.

This illusion was discovered by Thompson (1980). He used Margaret Thatcher’s face, which is why the illusion is known as the ‘‘Thatcher illusion.’’ It was already well known by painters and Gestalt psychologists that face processing is highly dependent on orientation (e.g., Köhler, 1940). However, the finding that upside-down faces are disproportionately more difficult to recognize than other inverted objects has been referred to as the face inversion effect and was first reported by Yin (1969). Another interesting effect was discovered by Young et al. (1987). Composite faces were created by combining the top and bottom half of different faces. Figure 2 shows an example. If the two halves were aligned and presented upright, a new face resembling each of the two originals seemed to emerge. This made it difficult to identify the persons shown in either half. If faces were inverted or if the top and bottom halves were misaligned horizontally, then the two halves did not spontaneously fuse to create a new face and the constituent halves remained identifiable. Calder et al. (2000) used the same technique to investigate the processing of facial expressions. They prepared emotional face composites by aligning the top half of one expression (e.g., anger) with the bottom half of another (e.g., happiness) from the same person. When the face composites were aligned, a new facial expression emerged and participants were slower to identify
the expression in either half of these composite images. However, this effect diminished when faces were misaligned or inverted, which parallels the composite effect for facial identity by Young et al. (1987). Interestingly, in an additional experiment Calder et al. found evidence for the view that the composite effects for identity and expression operate independently of one another. These examples illustrate that information about parts and their spatial relations is somehow combined in upright faces. In contrast, when faces are turned upside-down it seems that only the local part-based information is processed. In this chapter, we discuss the representations and processes used in recognition of identity and facial emotion. We follow a cognitive neuroscience approach by discussing the topic from a psychophysical, physiological, and computational perspective. Psychophysics describes the relationship between stimuli in our external world and our internal representations. We first review the psychophysics literature on recognition of faces and facial expressions. Because our goal is to gain a deeper understanding of how our brain produces behavior, we discuss possible neural substrates of the representations and processes identified in neuroscience. Computer science, the third perspective, provides computational algorithms to solve certain recognition problems and the possibility of biologically plausible computer models.
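To make these two manipulations concrete, the sketch below implements a simple ‘‘Thatcherization’’ (flipping eye and mouth regions within an otherwise upright face) and the composite-face construction of Young et al. (1987) on grayscale image arrays. The region coordinates and random ‘‘faces’’ are placeholders; actual experiments use photographs with hand-defined or landmark-detected feature regions.

```python
import numpy as np

def thatcherize(face, regions):
    """Flip the listed feature regions upside-down within the face, the
    local manipulation behind the Thatcher illusion. face is a 2-D
    grayscale array; regions is a list of (row0, row1, col0, col1)."""
    out = face.copy()
    for r0, r1, c0, c1 in regions:
        out[r0:r1, c0:c1] = face[r0:r1, c0:c1][::-1, :]   # vertical flip
    return out

def composite(face_a, face_b, aligned=True, offset=20):
    """Join the top half of face_a to the bottom half of face_b, as in the
    composite paradigm; misaligning the halves blocks the fusion effect."""
    half = face_a.shape[0] // 2
    top, bottom = face_a[:half], face_b[half:]
    if not aligned:
        bottom = np.roll(bottom, offset, axis=1)   # shift bottom sideways
    return np.vstack([top, bottom])

# Placeholder 128 x 128 "faces"; assumed eye and mouth rectangles.
face1 = np.random.rand(128, 128)
face2 = np.random.rand(128, 128)
eyes_and_mouth = [(40, 56, 30, 60), (40, 56, 68, 98), (88, 104, 44, 84)]
grotesque = thatcherize(face1, eyes_and_mouth)
fused = composite(face1, face2, aligned=True)
```

Viewing `grotesque` upright versus upside-down reproduces the dissociation described above: the flipped features are obvious in the upright image but easy to miss in the inverted one.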
Fig. 2. Aligned and misaligned halves of different identities (Margaret Thatcher and Marilyn Monroe). When upright (as above), a new identity seems to emerge from the aligned composites (left), which makes it more difficult to extract the original identities. This does not occur for the misaligned composite face (right). When viewed upside-down, the two identities do not fuse to a new identity.
Psychophysical perspective

Recognition of identity

Two main hypotheses have been proposed to explain the recognition of identity in faces: the holistic hypothesis and the component-configural hypothesis. According to the holistic hypothesis, upright faces are stored as unparsed perceptual wholes in which individual parts are not explicitly represented (Tanaka and Farah, 1991, 1993; Farah et al., 1995b). The main empirical evidence in favor of this view is based on a paradigm by Tanaka and Farah (1993). These authors argued that if face recognition relies on parsed representations, then single parts of a face, such as nose, mouth, or eyes, should be easily recognized even if presented in isolation. However, if faces are represented as unparsed perceptual wholes (i.e., holistically), then recognition of the same isolated parts should be more difficult. In their experiments, participants were shown a previously learned face together with a slightly different version in which one single part (e.g., nose or mouth) had been replaced. The task was to judge which face had been shown in the learning phase. The experiment was conducted in both a whole face condition and an isolated parts condition without facial context. In the isolated condition, face parts proved to be more difficult to
recognize than in the whole face condition. However, when participants were trained to recognize inverted faces, scrambled faces, and houses, no such advantage of context was found. Tanaka and Farah concluded that face recognition relies mainly on holistic representations, in contrast to the recognition of objects. While the encoding and matching of parts are assumed to be relatively orientation invariant (see also Biederman, 1987), holistic processing is thought to be very sensitive to orientation (see also Farah et al., 1995b; Biederman and Kalocsai, 1997). The component-configural hypothesis is based on a qualitative distinction between component and configural information. The term component (or part-based, piecemeal, feature-based, featural) information refers to those elements of a face that are perceived as parts of the whole (e.g., the eyes, mouth, nose, ears, chin, etc.). According to Bruce (1988), the term configural information (or configurational, spatial-relational, second-order relational information) refers to the "spatial interrelationship of facial features" (p. 38). Examples are the eye–mouth or intereye distance. Interestingly, these distances are overestimated by 30–40% (eye–mouth distance) and about 15% (intereye distance) in face perception (Schwaninger et al., 2003b). In practice, configural changes have been induced by altering the distance between
components or by rotating components (e.g., turning the eyes and mouth upside-down within the facial context, as in the Thatcher illusion described above). According to the component-configural hypothesis, the processing of configural information is strongly impaired by inversion or plane rotation, whereas the processing of component information is much less affected. There are now a large number of studies providing converging evidence in favor of this view (e.g., Sergent, 1984; Rhodes et al., 1993; Searcy and Bartlett, 1996; Leder and Bruce, 1998; Schwaninger and Mast, 2005; see Schwaninger et al., 2003a, for a review). These studies changed component information by replacing components (e.g., the eyes of one person were replaced with those of another). Configural changes were induced by altering the distance between components (e.g., a larger or smaller intereye distance). However, one possible caveat is that these types of manipulations often change the holistic aspects of the face and are difficult to carry out selectively. For example, replacing the nose (a component change) might change the distance between the contours of the nose and the mouth, which induces a configural change (Leder and Bruce, 1998, 2000). Moving the eyes apart (a configural change) can lead to an increase in the size of the bridge of the nose, which is a component change (see Leder et al., 2001). Such problems can be avoided by using scrambling and blurring procedures to reduce configural and component information independently (e.g., Sergent, 1985; Davidoff and Donnelly, 1990; Collishaw and Hole, 2000; Boutet et al., 2003). Schwaninger et al. (2002) extended previous research by ensuring that scrambling and blurring effectively eliminate configural and component information, respectively. Furthermore, in contrast to previous studies, Schwaninger et al. (2002) used the same faces in separate experiments on unfamiliar and familiar face recognition to avoid potential confounds with familiarity. In an old–new recognition paradigm it was found that previously learned intact faces could be recognized even when they were scrambled into their constituent parts. This result challenges the assumption of purely holistic processing according to Farah et al. (1995b) and suggests that components are encoded and stored explicitly. In a
second condition, the blur level was determined that made the scrambled versions impossible to recognize. This blur level was then applied to whole faces in order to create configural versions that by definition did not contain local featural information. These configural versions of previously learned intact faces could be recognized reliably. These results suggest that separate representations exist for component and configural information. Familiar face recognition was investigated in a second experiment by running the same conditions with participants who knew the target faces (all distractor faces were unfamiliar to the participants). Component and configural recognition was better when the faces were familiar, but there was no qualitative shift in processing strategy, as indicated by the fact that there was no interaction between familiarity and condition (see Fig. 3). Schwaninger et al. (2002, 2003a) proposed a model that integrates the holistic and component-configural hypotheses. Pictorial aspects of a face are contained in the pictorial metric input representation, which corresponds to activation of primary visual areas. On the basis of years of experience, neural networks are trained to extract specific information in order to activate component and configural representations in the ventral visual stream. The output of these representations converges on the same identification units. These units are holistic in the sense that they integrate component and configural information. Note that this concept of holistic differs from the original definition of Tanaka and Farah (1993) and Farah et al. (1995b), which implies that faces are stored as perceptual wholes without explicit representations of parts (component information). The model by Schwaninger et al. (2002, 2003a) assumes that it is very difficult to mentally rotate a face as a perceptual whole (Rock, 1973, 1974, 1988; Schwaninger and Mast, 2005). When faces are substantially rotated from upright, they have to be processed by matching parts, which explains why information about their spatial relationship (configural information) is hard to recover when faces are inverted (for a similar view see Valentine and Bruce, 1988). Since face recognition depends on detecting subtle differences in configural
Fig. 3. Recognition performance in unfamiliar and familiar face recognition across three different conditions at test. Scr: scrambled; ScrBlr: scrambled and blurred; Blr: blurred. (Adapted with permission from Schwaninger et al., 2002.)
information, a large inversion effect is observed (Yin, 1969). Consistent with this view, Williams et al. (2004) suggested that inverted faces are initially processed by parts-based assessment before second-order relational processing is initiated. Sekuler et al. (2004) used response classification and found that the difference between the processing of upright and inverted faces was quantitative rather than qualitative in nature, i.e., information was extracted more efficiently from upright faces than from inverted faces. This is also consistent with Schwaninger et al.'s model if one assumes that configural processing is not abruptly but gradually impaired by rotation (Murray et al., 2000; Schwaninger and Mast, 2005) and integrated with the output of component processing. Inverting the eyes and mouth within an upright face results in an anomalous pattern of activation across component and configural representations; the face appears grotesque when upright but not when upside-down (Thatcher illusion). This can be explained by the integrative model as follows: in an inverted Thatcher face, the components themselves are in the correct orientation, which results in a relatively
normal activation of component representations. The unnatural spatial relationship (changed configural information) is hard to perceive due to capacity limitations of an orientation normalization mechanism. As a consequence, the anomalous activation pattern of configural representations is reduced and the grotesque perception disappears. The composite face illusion can be explained on the basis of similar reasoning. Aligned upright face composites contain new configural information, resulting in a new perceived identity. Inverting the aligned composites reduces the availability of configural information, and it is easier to access the two different face identification units on the basis of component information alone. Note that the model does not extend to the processing of component and configural information in gaze perception. As shown by Schwaninger et al. (2005), there is also an inversion effect on gaze perception. However, this effect is not due to impaired configural processing but to orientation-sensitive processing of local component information in the eyes. This difference between inversion effects for recognition of identity versus
perceived eye-gaze direction is consistent with separate functional systems for these tasks, in line with physiological evidence discussed below. In short, the model by Schwaninger et al. (2002, 2003a) allows the integration of the component-configural hypothesis and holistic aspects of face processing relevant to the recognition of identity. It can also explain striking perceptual effects such as the Thatcher illusion and the composite face illusion. Most importantly, it provides an integrative basis for understanding special characteristics of face recognition such as the specialization for upright faces and the sensitivity to configural information.

Recognition of expressions

The structure and perception of facial expressions have been subject to scientific examination since at least Duchenne's (1990) and Darwin's (1872) seminal work. The majority of these studies consisted of showing static photographs of expressions to observers and examining the relationship between statically visible deformations of the facial surface and the judgments made by the observers. It is, of course, clear that different facial areas are important for the recognition of different emotions (Hanawalt, 1944; Plutchik, 1962; Nummenmaa, 1964; Bassili, 1979; Cunningham et al., 2005). For example, as mentioned above, Bassili (1979) used point-light faces to show that the upper portions of the face are important for some expressions, while the lower portions of the face are important for others. Facial features also play differentiated roles in other aspects of facial expression processing, such as the perception of sincerity. For example, according to Ekman and Friesen (1982), a true smile of enjoyment, which Ekman refers to as a Duchenne smile, has a characteristic mouth shape as well as specific wrinkles around the eyes. Faked expressions of enjoyment, in contrast, contain just the mouth information. Furthermore, Ekman and Friesen (1982) have shown that deceptive expressions of enjoyment appear to have different temporal characteristics than spontaneous ones.
Given the general preoccupation with the role of featural information in the recognition of facial expressions, it should not be surprising that the vast majority of descriptive systems and models of facial expressions are explicitly part-based (Frois-Wittmann, 1930; Frijda and Philipszoon, 1963; Leventhal and Sharp, 1965; Ekman and Friesen, 1978; Izard, 1979; Tronick et al., 1980; Essa and Pentland, 1994; Ellison and Massaro, 1997). Perhaps the most widely used method for parametrizing the high-dimensional space of facial expressions is the facial action coding system (or FACS; Ekman and Friesen, 1978), which segments the visible effects of facial muscle activity and rigid head motion into "action units." Combinations of these action units can then be used to describe different expressions. It is important to note that FACS was designed as a system for describing the elements of photographs of facial expressions. It is not a model of facial expression processing and makes no claims about which elements go together to produce different expressions (Sayette et al., 2001). Massaro and colleagues proposed a parts-based model of perception (the fuzzy logical model of perception, or FLMP) in which the features are independently processed and subsequently integrated. The model makes specific claims about how featural information is processed and integrated, and thus makes clear predictions about the perception and categorization of facial expressions. In one study, Ellison and Massaro (1997) used computer graphics animation techniques to produce static facial expressions in which either (a) the mouth shape was parametrically varied, (b) the eyebrow shape was parametrically varied, or (c) both were independently parametrically varied. The faces were shown to a number of observers, who were asked whether the expression in the images was that of happiness or anger. Ellison and Massaro found that both features (eyebrow position and mouth position) affected the participants' judgments, and that the influence of one feature was more prominent when the other feature was neutral or ambiguous. Moreover, the FLMP captured patterns in the data better than either holistic models or a straightforward additive model based on recognition rates of the individual features.
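The integration rule at the heart of the FLMP can be stated compactly. In a standard two-alternative formulation of the model (sketched here for the happy-vs.-angry task above), each feature is first evaluated independently, yielding fuzzy truth values $e$ (eyebrows) and $m$ (mouth) that express the degree of support for "happy"; the features are then integrated multiplicatively and normalized by the total support for both response alternatives:

$$P(\text{happy} \mid e, m) = \frac{e\,m}{e\,m + (1 - e)(1 - m)}$$

Note that if one feature is completely ambiguous (say $m = 0.5$), the expression reduces to $P = e$: the ambiguous feature leaves the decision entirely to the other feature, which is precisely the interaction pattern that distinguishes the FLMP from a simple additive combination of recognition rates.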
From these results, Ellison and Massaro claimed that the perceptual system must be using featural information in the recognition process and cannot be employing a purely holistic approach. These results are consistent with the finding that the aligned combination of two different emotions leads to decreased recognition performance (Calder et al., 2000). Just as is true for the processing of facial identity, the separate roles of component-configural and holistic information have been discussed within the context of facial expressions. There are at least two models that integrate holistic information (Izard et al., 1983; White, 2000). White (2000) proposed a "hybrid model," according to which expression recognition is part-based on the one hand and holistic, in the sense of undecomposed wholes, on the other. Several researchers have examined the role of temporal information in the perception and recognition of expressions (Bassili, 1978, 1979; Bruce, 1988; Edwards, 1998; Kamachi et al., 2001). Kamachi et al., for example, manipulated the speed with which a neutral face turned into an emotional one. They found that happiness and surprise were better recognized from fast sequences, sadness was better recognized from slow sequences, and anger was best recognized at medium speed. This indicates that different expressions seem to have a characteristic speed or rate of change. In an innovative study, Edwards (1998) presented participants with photographs of individual frames from a video sequence of a dynamic expression, in a scrambled order, and asked them to place the photographs in the correct order. Participants were remarkably accurate in their reconstructions, showing a particularly strong sensitivity to the temporal characteristics of the early phases of an expression. Interestingly, participants performed better when asked to complete the task under extremely tight time constraints than when given unlimited time, from which Edwards concluded that conscious strategies are detrimental to this task. He further concluded that the results show that humans do encode and represent temporal information about expressions. In sum, it is clear that different expressions require different features to be recognized, that one
can describe expressions in terms of their features and configuration, and that dynamic information is represented in the human visual system and is important for various aspects of facial expression processing. Moreover, simple, purely holistic models do not seem to describe the perception of facial expressions very well.

Dynamic information

The vast majority of research on the perception of faces has tended to focus on the relatively stable aspects of faces. This has consequently led to a strong emphasis on static facial information (i.e., information that is available at any given instant, such as eye color, distance between the eyes, etc.). In general, however, human faces are not static entities. Humans are capable of moving their faces in a wide variety of ways, and they do so for an astonishingly large number of reasons. Recent advances in technology, however, have allowed researchers to begin examining the role of motion in face processing. Before one can determine what types of motion are used in the recognition of faces and facial expressions (i.e., the dynamic features), one must determine if motion plays any role at all. To this end, it has been clearly established that dynamic information can be used to recognize identity (Pike et al., 1997; Bruce et al., 1999, 2001; Lander et al., 1999, 2001; Knappmeyer et al., 2003) and to recognize expressions (Bassili, 1978, 1979; Humphreys et al., 1993; Edwards, 1998; Kamachi et al., 2001; Cunningham et al., 2005; Wallraven et al., 2005a). Overall, the positive influence of dynamic information is most evident when static information is degraded. It is difficult, if not impossible, to present dynamic information without static information. Thus, Pike et al. (1997) and Lander and colleagues performed a series of control experiments to ensure that the apparent advantage moving faces have over static faces is due to information that is solely available over time (i.e., dynamic information). One might, for example, describe dynamic sequences as a series of static snapshots. Under such a description, the advantage of dynamic stimuli would not lie with dynamic information,
but with the fact that a video sequence has more static information (i.e., it has supplemental information provided by the different views of the face). To test this hypothesis, Lander et al. (1999) asked participants to identify a number of famous faces. The faces were presented in three different formats. In one condition, participants saw a nine-frame video sequence. In a second condition, participants saw all nine frames at once, arranged in an ordered array. In the final condition, the participants saw all nine frames at once in a jumbled array. Lander et al. found that the faces were better recognized in the video condition than in either of the two static conditions, and that performance in the two static conditions did not differ. Thus, the reason why video sequences are recognized better is not simply that they have more snapshots. To test whether the advantage is due to motion in general or due to some specific type of motion, Lander and Bruce (2000) and Pike et al. (1997) presented videos in which the images appeared in a random order. Note that such sequences have information about the motion, but this motion is random (and does not occur in nature). It was found that identity is more accurately recognized in normal sequences than in random sequences, implying that it is not just the presence of motion that is important, but the specific, naturally occurring motion that provides the advantage. Further, it was found that reversing the direction of motion (by playing the sequence backwards) decreases recognition performance, suggesting that the temporal direction of the motion trajectories is important (Lander and Bruce, 2000). Finally, by changing the speed of a motion sequence (e.g., by playing parts or all of a video sequence too fast or too slow), the researchers showed that the specific tempo and rhythm of motion are important for face recognition (Lander and Bruce, 2000). In a perhaps more direct examination of the role of motion, Bassili (1978, 1979) used Johansson point-light faces as stimuli (see Johansson, 1973, for more information on point-light stimuli). More specifically, the face and neck of several actors and actresses were painted black and then covered with approximately 100 white spots. These actors and actresses were then recorded under low light
conditions performing either specific expressions (happiness, sadness, surprise, disgust, interest, fear, and anger) or any facial motion the actor/actress desired. They kept their eyes closed during the recording sessions. Thus, in the final video sequence, all that was visible were the 100 white points. Each participant saw a single display, either a single static snapshot or a full video recording of one expression, and was asked to describe what they saw. The collection of points was recognized as being a face more often in the dynamic conditions than in the static conditions (73% vs. 22% of the time, respectively). Additionally, the sequences were recognized as containing a face slightly more often for the free-form motion conditions than for the expression conditions (55% vs. 39%, on average, respectively). In a second experiment, the actors and actresses were again recorded while performing the various emotions. In the first set of recordings, they again wore the black makeup and white spots. The second set of recordings was made without makeup. Participants were asked to identify the expression using a forced-choice task. Overall, expressions were recognized more often in the full-face condition than in the dots-only condition (65% vs. 33% correct responses, respectively). Critically, the percentage of correct responses in the point-light condition (33%) is significantly higher than expected by chance, suggesting that the temporal information is sufficient to recognize expressions. Bassili (1979) went on to examine the role of upper versus lower internal facial motion for the recognition of expressions and found that different facial areas were important for different expressions. It remains unclear, however, exactly what the appropriate dynamic features are. One traditional way of describing motion is to separate it into rigid and nonrigid motions (see, e.g., Gibson, 1957, 1966; Roark et al., 2003). Rigid face motion generally refers to rotations and translations of the entire head (such as occur when someone nods). Nonrigid face motion, in contrast, generally refers to motion of the face itself, which consists mostly of nonlinear surface deformations (e.g., lip motion, eyebrow motion). Most naturally occurring face-related motion contains both rigid and nonrigid motion. Indeed, it is very difficult for humans to produce facial (i.e.,
nonrigid) motion without moving their head (rigid motion), and vice versa. Thus, it should not be surprising that few studies have systematically examined the separate contributions of rigid and nonrigid face motions. Pike et al. (1997) conducted one of the few studies to explicitly focus on the contribution of rigid motion. They presented a 10 s clip of an individual rotating in a chair through a full 360° (representing a simple change in relative viewpoint). They found higher identity recognition performance in dynamic conditions than in static conditions. Christie and Bruce (1998), in contrast, presented five frames of a person moving his/her head up and down (e.g., representing social communication — a nod of agreement) and found no difference between static and dynamic conditions. They suggest that the apparent conflict between the two studies comes from the type of rigid head motion: viewpoint change versus social signal. Munhall et al. (2004) focused explicitly on the role of rigid head motion in communication and showed in an elegant study that the specific pattern of rigid head motion that accompanies speech can be used to disambiguate the speech signal when the audio is degraded. Hill and Johnston (2001) used facial animations to show that rigid head motion is more useful than nonrigid motion for identity recognition and that nonrigid motion is more useful than rigid motion for recognizing the gender of an individual. In sum, it is clear that some form of facial information is available only over time, and that it plays an important role in the recognition of identity, expression, speech, and gender. Moreover, at least several different types of motion seem to exist, they play different roles, and a simple rigid/nonrigid dichotomy is neither sufficient nor appropriate to describe these motions. Additional research is necessary to determine what the dynamic features for face processing are.
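Computationally, the rigid/nonrigid distinction can be made concrete by estimating the best-fitting rigid transformation between tracked facial landmarks in successive frames and treating whatever remains as nonrigid deformation. The sketch below assumes 3D landmark coordinates are available for each frame (e.g., from motion capture); the decomposition via the Kabsch algorithm is a generic technique offered for illustration, not the method of any study cited above.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rigid (rotation + translation) alignment of two
    (N, 3) landmark sets, computed with the Kabsch algorithm."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    S, T = source - mu_s, target - mu_t
    U, _, Vt = np.linalg.svd(S.T @ T)
    d = np.sign(np.linalg.det(U @ Vt))       # guard against reflections
    R = (U @ np.diag([1.0, 1.0, d]) @ Vt).T  # rotation mapping source -> target
    t = mu_t - mu_s @ R.T
    return R, t

def split_motion(frame0, frame1):
    """Decompose frame-to-frame landmark motion into the rigid head
    component and the nonrigid (expression) residual."""
    R, t = rigid_align(frame0, frame1)
    rigid_part = frame0 @ R.T + t            # head rotation/translation only
    nonrigid_residual = frame1 - rigid_part  # surface deformation
    return rigid_part, nonrigid_residual
```

The norm of the residual then provides a crude frame-by-frame index of how much expression-related deformation accompanies a given head movement.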
Physiological perspective

Face-selective areas — evidence from neuroscience

At least since the discovery of the face inversion effect (Yin, 1969) it has been discussed whether a
specific area for the processing of faces exists in the human brain. Neuropsychological evidence for specialization has been derived from prosopagnosia, a deficit in face identification following inferior occipitotemporal lesions (e.g., Damasio et al., 1982; for a review see De Renzi, 1997). There have been a few reports of prosopagnosic patients in whom object recognition seemed to have remained intact (e.g., McNeil and Warrington, 1993; Farah et al., 1995a; Bentin et al., 1999). Prosopagnosia has been regarded as a face-specific deficit that does not necessarily reflect a general disorder in exemplar recognition (e.g., Henke et al., 1998). Consistent with this view, there have been reports of patients with associative object agnosia whose face identification remained unaffected (e.g., Moscovitch et al., 1997). Such a double dissociation between face and object recognition would imply that the two abilities are functionally distinct and anatomically separable. However, on the basis of methodological concerns, some authors have doubted whether face recognition can really be dissociated from object recognition given the current literature on prosopagnosia (e.g., Gauthier et al., 1999a; see also Davidoff and Landis, 1990). Evidence for the uniqueness of face processing has also been derived from event-related potential (ERP) and magnetoencephalographic (MEG) studies. A response component called the N170 (or M170 in MEG), occurring around 170 ms after stimulus onset, is usually about twice as large for face stimuli as for control stimuli such as hands, houses, or animals (e.g., Bentin et al., 1996; Liu et al., 2002). However, the debate on whether such activation is unique to faces or whether it reflects effects of expertise that are not specific to face processing is still ongoing (for recent discussions see, e.g., Rossion et al., 2002; Xu et al., 2005). In functional brain imaging, several areas have been identified as being of special importance for the processing of faces (see Haxby et al., 2000, for a review). These include a region in the lateral fusiform gyrus, the superior temporal sulcus (STS), and the "occipital face area" (OFA; Gauthier et al., 2000a). All areas have been
identified bilaterally, albeit with somewhat stronger activation in the right hemisphere. The face-selective area in the fusiform gyrus has been referred to as the "fusiform face area" (FFA) by Kanwisher et al. (1997). While FFA activation has been related to facial identity, the STS in humans reacts particularly to changing aspects of faces with social value, such as expression, direction of gaze, and lip movement (e.g., Puce et al., 1998; Hoffman and Haxby, 2000). In a recent functional magnetic resonance imaging (fMRI) study using adaptation (the reduction of brain activity due to repetitive stimulus presentation), Andrews and Ewbank (2004) investigated differences in face processing between the FFA and the STS. Activity in the FFA was reduced over time by stimuli having the same identity. Adaptation was dependent on viewpoint but not on size changes. The STS showed no adaptation to identity but an increased response when the same face was shown with a different expression or from different viewpoints. These results suggest a relatively size-invariant neural representation in the FFA for the recognition of facial identity, and a separate face-selective region in the STS involved in processing changeable aspects of a face such as facial expression. The OFA, in the inferior occipital gyrus, seems to be associated with early structural encoding processes; it is primarily sensitive to sensory attributes of faces (Rotshtein et al., 2005). Rossion et al. (2003) obtained results in an fMRI study suggesting that the OFA and FFA might be functionally associated: PS, a patient suffering from severe prosopagnosia due to lesions in the left middle fusiform gyrus and the right inferior occipital cortex, performed poorly in a face-matching task despite normal activation of the intact right FFA. Rossion et al. thus concluded that the FFA alone does not represent a fully functional module for face perception; rather, normal face processing requires intact OFA and FFA in the right hemisphere, together with re-entrant integration between them. Yovel and Kanwisher (2005) came to a different conclusion. They correlated behavioral performance in a face-matching task with upright and inverted faces with the neuronal responses to upright and inverted faces in the three regions: FFA, STS, and OFA. It was
found that only the FFA showed a difference in activity between upright and inverted faces. This can be interpreted as a functional dissociation between the FFA and the other cortical regions involved in face processing. The authors also concluded that the FFA appears to be the main neurological source of the behavioral face inversion effect originally reported by Yin (1969). The latter, however, is not exclusive to faces. In a behavioral study, Diamond and Carey (1986) found comparable inversion effects for faces and side views of dogs when dog experts were tested. Subsequent behavioral and imaging studies using recognition experiments with trained experts and artificial objects ("Greebles"), as well as bird and car experts viewing bird and car images, provided further evidence in favor of a process-specific rather than a domain-specific interpretation (Gauthier et al., 1999b, 2000a). According to this view (the "expertise hypothesis"), FFA activity is related to the identification of different classes of visual stimuli if they share the same basic configuration and if substantial visual expertise has been gained. The question of whether FFA activity is domain specific or process specific has now been debated for several years. It is beyond the scope of this chapter to review this ongoing debate, but for an update on its current status see, for example, Downing et al. (2005), Xu (2005), Bukach et al. (in press), and Kanwisher and Yovel (in press). Nevertheless, it should be noted that activation in face-selective regions of the fusiform area is not exclusive to faces. Significant responses to other categories of objects have been found in normal subjects, for example, for chairs, houses, and tools (Chao et al., 1999; Ishai et al., 1999, 2000; Haxby et al., 2001). Moreover, it has also been shown that face-selective regions in the fusiform area can be modulated by attention, emotion, and visual imagery, in addition to modulation by expertise as mentioned above (e.g., O'Craven et al., 1999; Vuilleumier et al., 2001; Ishai et al., 2002). In recent years, substantial progress has been made regarding models of how different brain areas interact in processing the information contained in faces. Three main accounts are summarized in the following section.
Cognitive neuroscience models of face processing

The model by Bruce and Young (1986) is one of the most influential accounts in the psychological face processing literature. This framework proposes parallel routes for recognizing facial identity, facial expression, and speech-related movements of the mouth. It is a rather functional account, since Bruce and Young did not provide specifics regarding the neural implementation of their model. The more recent physiological framework proposed by Haxby et al. (2000) is consistent with the general conception proposed by Bruce and Young. According to Haxby et al.'s model, the visual system is hierarchically structured into a core and an extended system. The core system comprises three bilateral regions in occipitotemporal visual extrastriate cortex: the inferior occipital gyrus, the lateral fusiform gyrus, and the STS. Their function is the visual analysis of faces. Early perception of facial features and early structural encoding processes are mediated by processing in the inferior occipital gyrus. The lateral fusiform gyrus processes invariant aspects of faces as the basis for the perception of unique identity. Changeable properties such as eye gaze, expression, and lip movement are processed by the STS. The representations of changeable and invariant aspects of faces are proposed to be independent of one another, consistent with the Bruce and Young model. The extended system contains several regions involved in other cognitive functions such as spatially directed attention (intraparietal sulcus), prelexical speech perception (auditory cortex), emotion (amygdala, insula, limbic system), and personal identity, name, and biographical information (anterior temporal region). The model of Haxby et al. has been taken as a framework for extension by O'Toole et al. (2002). By taking into account the importance of dynamic information in social communication, they further elaborate the processing of facial motion. In their system, dynamic information is processed by the dorsal visual stream and static information by the ventral stream. Two different types of information are contained in facial motion: social communication signals such as gaze, expression, and lip movements,
which are forwarded to the STS via the middle temporal (MT) area; and person-specific motion ("dynamic facial signatures"). O'Toole et al. suggest that the latter type of information is also processed by the STS, representing an additional route for familiar face recognition. This model is in accordance with the supplemental information hypothesis, which holds that facial motion provides information over and above the static information. According to O'Toole et al., structure-from-motion may also support face recognition through communication between the ventral and dorsal streams. For instance, the structural representation in the FFA could be enhanced by input from MT. Thus, the model also integrates the representation enhancement hypothesis. In a detailed review of psychological and neural mechanisms, Adolphs (2002) provides a description of the processing of emotional facial expressions as a function of time. The initial stage provides fast, automatic perceptual processing of highly salient stimuli (e.g., facial expressions of anger and fear). This involves the superior colliculus and pulvinar, as well as activation of the amygdala. Cortical structures activated in this stage are V1, V2, and other early visual cortices that receive input from the lateral geniculate nucleus of the thalamus. Then, a more detailed structural representation of the face is constructed, up until about 170 ms. This processing stage involves the fusiform gyrus and the superior temporal gyrus, which is consistent with Haxby et al.'s core system. Dynamic information in the stimulus would additionally engage MT, the middle superior temporal area, and posterior parietal visual cortices. Recognition modules for detailed perception and emotional reaction involve Haxby et al.'s extended system. After 300 ms, conceptual knowledge of the emotion signaled by the face is based on late processing in the fusiform and superior temporal gyri, orbitofrontal and somatosensory cortices, as well as activation of the insula. The assumption of separate processes for facial identity and facial expression is supported by a number of studies. Neuropsychological evidence suggests a double dissociation: some patients show impairment in identity recognition but normal emotion recognition, and other patients show
intact identity recognition but impaired emotion recognition (for reviews see Damasio et al., 1982, 1990; Wacholtz, 1996; Adolphs, 2002). In a recent study, Winston et al. (2004) revealed dissociable neural representations of identity and expression using an fMRI adaptation paradigm. They found evidence for identity processing in fusiform cortex and posterior STS. Coding of emotional expression was related to a more anterior region of the STS. Bobes et al. (2000) showed that emotion matching resulted in a different ERP scalp topography compared to identity matching. In another ERP study, Eimer and Holmes (2002) investigated possible differences in the processing of neutral versus fearful facial stimuli. They found that the N170, which is related to structural encoding of the face in processing identity, occurred in both the neutral and the fearful conditions. This indicates that structural encoding is not affected by the presence of emotional information and is also consistent with independent processing of facial expression and identity. However, results from other studies challenge the assumption of completely independent systems. DeGelder et al. (2003) found that subjects suffering from prosopagnosia performed much better when faces showed emotions than when they depicted a neutral expression, whereas normal subjects showed the opposite pattern. DeGelder et al. assume that the areas associated with expression processing (amygdala, STS, parietal cortex) play a modulatory role in face identification. Their findings challenge the notion that different aspects of faces are processed independently (assumption of dissociation) and only after structural encoding (assumption of hierarchical processing). Calder and Young (2005) share a similar view. They argue that a successful proof of the dissociation of identity and expression would require two types of empirical evidence: first, patients with prosopagnosia but without any impairment in facial expression recognition; and second, patients with intact processing of facial identity but impaired recognition of facial emotion in the absence of impairments of other emotional functions. On the basis of their review, the authors conclude that such clear patterns have not yet been demonstrated. The reported selective disruptions of facial expression recognition may rather reflect an impairment of more
general systems than damage (or impaired access) to visual representations of facial expression. The authors do not completely reject the dissociation of identity and expression, but they suggest that the bifurcation takes place at a much later stage than that proposed by the model of Haxby et al., namely only after a common representational system. This alternative approach is supported by computational modeling studies using principal component analysis (PCA; see next section). A critical problem of these approaches, however, is that they rely on a purely holistic processing strategy for face stimuli, which in light of the previously discussed behavioral evidence seems implausible. As discussed in the previous section, there is a growing number of studies in the psychophysics literature that clearly suggest an important role for both component and configural information in face processing. This is supported by neurophysiological studies. In general, it has been found that cells responsive to facial identity are located in inferior temporal cortex, whereas selectivity for facial expressions, viewing angle, and gaze direction is found in the STS (Hasselmo et al., 1989; Perrett et al., 1992). For some neurons, selectivity for particular features of the head and face, e.g., the eyes and mouth, has been revealed (Perrett et al., 1982, 1987, 1992). Other groups of cells require the simultaneous presentation of multiple parts of a face, which is consistent with a more holistic type of processing (Perrett and Oram, 1993; Wachsmuth et al., 1994). Yamane et al. (1988) discovered neurons that detect combinations of distances between facial parts, such as the eyes, mouth, eyebrows, and hair, which suggests sensitivity to the spatial relations between facial parts (configural information). Although they are derived from different physiological studies, the three models by Haxby et al., O'Toole et al., and Adolphs share many common features. Nevertheless, it seems that some links to behavioral and physiological studies are not taken up in these models. As discussed above, the concept of component and configural processing seems to be a prominent characteristic of face processing. The models, however, do not make this processing step explicit by specifying at which stage this information is extracted. Furthermore,
the distributed network of brain regions involved in the processing of face stimuli has so far not been characterized in terms of the features that are processed in each region — what does a face look like to the amygdala, for example? Some of these questions may be answered by a closer look at the computational properties of face recognition. In the next section, we therefore present a brief overview of computational models of identity and expression recognition.
Computational perspective

Since the advent of the field of computer vision (Marr, 1982), face recognition has been and continues to be one of its best-researched topics, with hundreds of papers published each year in conferences and journals. One reason for this intense interest in face recognition is certainly the growing range of commercial applications for computational face recognition systems — especially in the areas of surveillance and biometrics, but increasingly also in other areas such as human–computer interaction or multimedia applications. Despite these tremendous efforts, however, even today there exists no computational system that is able to match human performance — both in terms of recognition discriminability and in terms of generalization to new viewing conditions, including changes in lighting, pose, viewing distance, etc. It is especially this fact that has led to a growing interest within the computer vision community in understanding and applying the perceptual, cognitive, and neurophysiological principles underlying human performance. Similar statements could be made about the area of automatic recognition of facial expressions — the critical difference being that commercial interest in such systems is lower than in systems that can perform person identification. Nevertheless, expression recognition continues to be a very active topic in computer vision because it deals with the temporal component of visual input: how the face moves, and how computers might map the space of expressions, is of interest to computer vision researchers because of potential applications in, for example, human–computer
interaction, in which the actions of the user have to be analyzed and recognized in a temporal context. As the previous sections have shown, however, apart from their commercial prospects, techniques developed by the computer vision community also have wide uses in cognitive research: by analyzing and parametrizing the high-dimensional space of face appearances, for example, researchers gain access to a high-level, statistical description of the underlying visual data. This description can then be used to design experiments in a well-defined subspace of facial appearance (for a review of face spaces see Valentine, 1995). A well-known example is the PCA of faces, which defines prototypes in a face space (Leopold et al., 2001). The same holds true in the case of facial expressions, where the amount of spatio-temporal data quickly becomes prohibitive; such parametrizations make it possible to conduct controlled experiments at a more abstract level that goes beyond mere pixels. A recent study that used advanced computer vision techniques to manipulate components of facial expressions is that by Cunningham et al. (2005). In the following, we briefly review the main advances and approaches in both the area of identity recognition and that of expression recognition (see Li and Jain, 2004, for further discussion). As a first observation, it is interesting to note that both identity and expression recognition in computer vision follow the same basic structure: in the first step, the image is scanned in order to find a face — this stage is usually called face detection and can also encompass other tasks such as estimating the pose of the face. As a result of space restrictions, we will not deal with face detection explicitly — rather, the reader is referred to Hjelmas and Low (2001). Interestingly, the topic of face detection has received relatively little attention in cognitive research so far (see, e.g., Lewis and Edmonds, 2003) and needs to be further explored. Following face detection, in a second step the image area that comprises the face is further analyzed to extract discriminative features, ranging from holistic approaches using the pixels themselves to more abstract approaches extracting the facial components. Finally, the extracted features are compared to a database of stored identities or
expressions in order to recognize the person or their expression. This comparison can be done by a range of different classification schemes, from simple winner-take-all strategies to highly complex algorithms from machine learning. Research in face recognition can be roughly divided into three areas according to the type of information used in the feature extraction step: (1) holistic approaches use the full image pixel information of the area subtended by the face; (2) feature-based approaches try to extract more abstract information from the face area, ranging from high-contrast features to semantic facial features; and (3) hybrid systems combine these two approaches. The earliest work in face recognition focused almost exclusively on high-level, feature-based approaches. Starting in the 1970s, several systems were proposed which relied on extracting facial features (eyes, mouth, and nose) and in a second step calculated two-dimensional geometric properties of these features (Kanade, 1973). Although it was shown that recognition using only geometric information (such as distances between the eyes, the mouth, etc.) was computationally effective and efficient, the robust, automatic extraction of such high-level facial features has proven to be very difficult under general viewing conditions (Brunelli and Poggio, 1993). One of the most successful face recognition systems based on local image information therefore used much simpler features, supplemented by rich feature descriptors: in the elastic bunch-graph matching approach, a face image is represented as a collection of nodes placed in a regular grid over the face. Each of these nodes carries so-called "jets" of Gabor-filter responses, which are collected over various scales and orientations. This representation is very compact yet has proven to be very discriminative, thus enabling good performance even under natural viewing conditions (Wiskott et al., 1997). It is interesting to note this system's similarity to the human visual system (see Biederman and Kalocsai, 1997), as the Gabor filters used closely resemble the receptive field structures found in early visual cortex. The advantage of such low-level features, also used in later recognition systems, lies in their conceptual simplicity and compactness.
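To give a flavor of such jet-based representations, the following sketch builds a small Gabor filter bank and collects its responses at a single grid node into a normalized vector; corresponding jets from two images can then be compared with a dot product. The parameter values are arbitrary, and the sketch deliberately omits the phase information and the specific similarity functions used by Wiskott et al. (1997).

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor filter: a cosine carrier at orientation
    theta under an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength)

def jet(image, row, col, size=31, wavelengths=(4, 8, 16), n_orient=8):
    """Responses of a Gabor filter bank at one node, collected over
    several scales and orientations and normalized to unit length.
    Assumes the node lies at least size // 2 pixels from the border."""
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    responses = [
        np.sum(patch * gabor_kernel(size, wl, k * np.pi / n_orient, wl / 2.0))
        for wl in wavelengths
        for k in range(n_orient)
    ]
    v = np.asarray(responses)
    return v / (np.linalg.norm(v) + 1e-12)

# Jets from corresponding nodes of two face images are compared by their
# dot product; summing these similarities over all nodes of the grid
# yields a graph-level match score.
```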
In the early 1990s, Turk and Pentland (1991) developed a holistic recognition system called ‘‘eigenfaces,’’ which used the full pixel information to construct an appearance-based low-dimensional representation of faces. This approach proved to be very influential for computer vision in general and inspired many subsequent recognition algorithms. Its success is partially due to the fact that natural images contain many statistical redundancies. These can be exploited by algorithms such as PCA by building lower-dimensional representations that capture the underlying information contained in, for example, the space of identities given a database of faces. The result of applying PCA to such a database of faces is a number of eigenvectors (the ‘‘eigenfaces’’) that encode the main statistical variations in the data. The first eigenvector is simply the average face and corresponds to the prototype face used in psychology. Recognition of a new face image is done by projecting it into the space spanned by the eigenvectors and looking for the closest face in that space. This general idea of a face space is shared by other algorithms such as linear discriminant analysis (LDA; Belhumeur et al., 1997), independent component analysis (ICA; Bartlett et al., 2002), non-negative matrix factorization (NMF; Lee and Seung, 1999), or support vector machines (SVMs; Phillips, 1999). The main difference between these algorithms lies in the statistical description of the data as well as in the metrics used to compare different elements of the face space: PCA and LDA usually result in holistic descriptions of the data where every region of the face contributes to the final result; ICA and NMF can yield more sparse descriptors with spatially localized responses; SVMs describe the space of face identities through difficult-to-recognize face exemplars (the support vectors) rather than through prototypical faces as PCA does. In terms of metrics, there are several possibilities ranging from simple Euclidean distances to weighted distances, which can take the statistical properties of the face space into account. The advantage of PCA (and other holistic approaches) in particular is that it develops a generative model of facial appearance which enables it, for example, to reconstruct the appearance of a noisy or occluded input face. An extreme example
of this is the morphable model by Blanz and Vetter (1999, 2003), which works not on image pixels but on three-dimensional (3D) data from laser scans of faces. Because of their holistic nature, however, all of these approaches require specially prepared training and testing data with very carefully aligned faces in order to work optimally. Given the distinction between local and holistic approaches, it seems natural to combine the two into hybrid recognition architectures. Eigenfaces can of course be extended to "eigenfeatures" by training on facial features instead of whole images. Indeed, such systems have been shown to work much better under severe changes in the appearance of the face, such as those due to occlusion by other objects or by make-up (see Swets and Weng, 1996). Another system uses local information extracted from the face to fit a holistic shape model to it; for recognition, not only holistic information is used, but also local information from the contour of the face (Cootes et al., 2001). Finally, in a system proposed by Heisele et al. (2003), several SVMs are trained to recognize facial features in an image, which are then combined into a configuration of features by a higher-level classification scheme. Again, such a scheme has been shown to outperform other, purely holistic, approaches. Recently, several rigorous testing schemes have been proposed to evaluate the various methods developed by computer vision researchers: the FERET (FacE REcognition Technology) evaluations in 1994–1996 and three face recognition vendor tests in 2000, 2002, and 2005 (see http://www.frvt.org). Although these evaluations have shown that performance has increased steadily over the past years, several key challenges still need to be addressed before face recognition systems can become as good as their human counterparts: Robust extraction of facial features: Although face detection is possible with very high accuracy, this accuracy is usually achieved by analyzing large databases of face and nonface images to extract statistical descriptors. These statistical descriptors usually do not conform to meaningful face components — components that humans rely on to detect faces.
Tolerance to changes in viewing conditions: Existing databases often contain only frontal images or a single illumination. In the real world, however, changes in illumination and pose are rather frequent — and usually several changes occur at the same time. Dealing with large amounts of training data: Connected to the previous point, typical recognition systems need massive amounts of training data to learn facial appearance under various viewing conditions. While this is reminiscent of the extensive experience humans acquire with faces, it represents a challenge for the statistical algorithms, as the relevant discriminative information has to be extracted from the images. Taking into account the context: Faces seldom appear without context — here, context can mean a particular scene, evidence from other modalities, or the fact that faces are part of the human body and therefore co-occur with it. This information could be used not only for more reliable face detection, but also to assist in recognition tasks. Interestingly, context effects are also not well studied in the behavioral literature. Dealing with spatio-temporal data: Even though humans can recognize faces from still images, we generally experience the world dynamically (see Section "Dynamic information" above). Although first steps have been made (Li and Chellappa, 2001), the full exploitation of this fact remains an open problem. Characterizing performance with respect to humans: Although the claim that current systems do not yet reach human performance levels is certainly valid, there has been relatively little research on trying to relate computational and human performance in a systematic manner (examples include the studies by Biederman and Kalocsai, 1997; O'Toole et al., 2000; Wallraven et al., 2002, 2005b; Schwaninger et al., 2004). Such information could be used to fine-tune existing approaches as well as to develop novel approaches to face recognition. In addition to work on face recognition, considerable attention has been devoted to the
automatic recognition of facial expressions. Early work in this area focused mainly on recognition of the six prototypical or universal expressions (anger, disgust, fear, happiness, sadness, and surprise; see Ekman et al., 1969), whereas later work has aimed at more fine-grained recognition and even interpretation of the core components of facial expressions. As mentioned above, all computational systems follow the same basic structure of face detection, feature extraction, and classification. Moreover, one can again divide the proposed systems into different categories on the basis of whether they use holistic information, local feature-based information, or a hybrid approach. An additional aspect is whether the systems estimate the deformation from a neutral face for each image or whether they rely explicitly on motion to detect a facial expression. Systems based on holistic information have employed PCA to recognize static images (Calder et al., 2001), dense optic flow to analyze the deformation of the face in two dimensions (Bartlett et al., 1999), as well as 3D deformable face models (DeCarlo and Metaxas, 1996). In contrast, systems based on local information rely on analyzing regions of the face that are prone to change under facial expressions. Such systems have initially used tracking of 2D contours (Terzopoulos and Waters, 1993) or high-contrast regions (Rosenblum et al., 1996; Black and Yacoob, 1997), elastic bunch-graph matching with Gabor filters (Zhang et al., 1998), as well as higher-level 3D face models (Gokturk et al., 2002). Despite good recognition results on the few existing test databases, however, these systems mostly focused on recognition of the six prototypical expressions. They therefore could not extract important dimensions such as the intensity of the recognized expressions — dimensions that are critical for human–computer interface (HCI) applications, for example. A very influential description of facial expressions is FACS, developed by Ekman and Friesen in the late 1960s and continuously improved in the following years. As mentioned above (see section on recognition of expressions), FACS encodes both anatomical muscle activations and so-called miscellaneous actions in 44 action units. It is
important to stress again that the co-activation of action units into complex expressions is external to FACS, making it a purely descriptive rather than an inferential system. In addition, the mapping from action units to complex expressions is ambiguous. Nevertheless, most recent systems try to recognize action units from still images or image sequences — perhaps because FACS is one of the few parametric, high-level descriptions of facial motion. As a result of the highly localized structure of action units, these systems rely mostly on local information and use a combination of low-level, appearance-based features and geometric facial components for increased robustness (Bartlett et al., 1999; Donato et al., 1999). One of the most advanced recognition systems (Tian et al., 2001) uses a hybrid scheme combining both low- and high-level local information in a neural network to recognize 16 action units from static images with an accuracy of 96% (a toy sketch of such a geometric, feature-based approach is given at the end of this section). Similarly to face recognition systems, there exist several standard databases for benchmarking facial expression algorithms (Ekman and Friesen, 1978; Kanade et al., 2000) — so far, however, no comprehensive benchmark comparing the different systems for facial expression analysis has been developed. Interestingly, several of the systems discussed here have been explicitly benchmarked against human performance — both in the case of prototypical expressions (Calder et al., 2001) and in the case of action unit recognition (Bartlett et al., 1999; Tian et al., 2001). This shows that in the area of facial expression recognition the coupling between psychological and computational research is much closer than in identity recognition — one reason may be that expression analysis has drawn more heavily on results from psychological research (Ekman et al., 1969; Ekman and Friesen, 1978). Finally, the following presents a list of key challenges in the area of automatic recognition of facial expressions that still need to be addressed in order to design and implement robust systems with humanlike performance:
337
extraction of the exact shape of facial components would enable to determine action units very precisely. Dealing with variations in appearance: In contrast to identity recognition, expressions need to be recognized despite changes in gender and identity (as well as of course additional parameters such as pose, lighting, etc.). Current algorithms do not yet perform these generalizations well enough. Going beyond FACS: Although FACS has remained popular, it is still unclear whether it really is a useful basis for describing the space of facial expressions — both from a human perspective as well as from a statistical point of view. Among the alternatives that have been proposed is FACS+, an encoding system for facial actions that seeks to determine well-defined actions as well as the mapping between these actions and complex expressions (Essa and Pentland, 1997). Another alternative is the MPEG4 face-coding scheme, a very generic face animation framework based on movement of keypoints on the face (e.g., Koenen, 2000; see also Section ‘‘Dynamic information’’ above). Full spatio-temporal, high-level models for recognition: Facial expressions are a highly efficient form of communication — the communicative aspect is not yet exploited by current systems, which is partially due to the lack of explicit models of how humans employ facial motions to convey meaning. Such knowledge could, for example, prove useful as a high-level prior for automatic recognition of expressions. More descriptive power: For humans, recognition of the expression is only the first step in a longer pipeline, which involves not only judgments of intensity, but also other interpretative dimensions such as believability, naturalness, etc. Our behaviour may be more determined by these dimensions rather than by the classification itself.
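To make the pipeline mentioned above concrete, the following is a minimal sketch of a holistic (PCA-based) expression classifier. It assumes that face detection and alignment have already been carried out; random arrays stand in for a real expression database such as that of Kanade et al. (2000), so all names, dimensions and parameter values are illustrative rather than a reimplementation of any published system.

```python
# Minimal sketch of a holistic expression-recognition pipeline.
# Face detection is assumed done; inputs are cropped, aligned faces.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_faces, img_size = 240, 32 * 32          # 240 cropped 32x32 face images
X = rng.normal(size=(n_faces, img_size))  # flattened pixel intensities
y = rng.integers(0, 6, size=n_faces)      # six prototypical expressions

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Holistic route: project faces onto principal components
# ("eigenface"-style codes), then classify the low-dimensional codes.
clf = make_pipeline(PCA(n_components=40), SVC(kernel="linear"))
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))  # ~chance on random data
```

A local feature-based or hybrid system would replace the PCA step with, for example, responses of Gabor filters at facial landmarks, while leaving the classification stage essentially unchanged.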
Summary and conclusions

The review of psychophysical studies showed that faces are processed in terms of their components
and their spatial relationship (configural information). The integrative model by Schwaninger et al. (2002, 2003a) provides a good basis for combining the component-configural hypothesis and holistic aspects of face processing. According to the model, component and configural information are first analyzed separately and then integrated for recognition. Rotating faces in the plane results in a strong impairment of configural processing, while component processing is much less, if at all, affected by plane rotation. This could be due to capacity limitations of an orientation normalization mechanism such as mental rotation, which is required in order to extract configural information from plane-rotated or inverted faces. Because adult face recognition relies more on the processing of configural information than does basic-level object recognition, a strong inversion effect is obtained. Different facial areas and facial motions are important for the recognition of different emotions. Most models of facial expression processing have stressed the importance of component information, while some models also integrate configural information (e.g., Izard et al., 1983; White, 2000). As pointed out by Schwaninger et al. (2002, 2003a), a model which assumes separate processing of component and configural information before integrating them can explain the effects of facial expression processing in the Thatcher illusion and the face composite illusion. In upright faces, both component and configural information can be processed. This results in a bizarre facial expression in the Thatcher illusion and in a new identity or facial expression when different face composites are aligned. Turning faces upside-down disrupts configural processing. As a consequence, the bizarre facial expression in the Thatcher illusion vanishes. Similarly, the absence of interference from configural information in inverted composites makes it easier to identify the different identities (Young et al., 1987) or emotions (Calder et al., 2000). The model by Bruce and Young (1986) proposes separate parallel routes for recognizing facial identity, facial expression, and speech. Recent physiological models proposed by Haxby et al. (2000), O'Toole et al. (2002), and Adolphs (2002) are
consistent with this view. Although much is now known about the role and interaction of different brain areas in recognition of identity and expression, the neuronal implementation of the analysis of component and configural information and their integration with motion information is less clear. Interestingly, both identity and expression recognition in computer vision follow the same basic processing steps. Following face detection, in a second step the image area containing a face is processed to extract discriminative features, which are then compared to a database of stored identities or expressions. The different recognition algorithms can be distinguished according to whether they use holistic information, local feature-based information, or a hybrid approach; the last two are usually more successful than the first. One example of a close connection between computational modeling and psychophysical research in this context is the set of studies by Wallraven et al. (2002, 2005b) on the implementation and validation of a
model of component and configural processing in identity recognition. On the basis of the model developed by Schwaninger et al. (2003a, 2004) outlined above, they implemented the two routes of face processing using methods from computer vision. The building blocks of the face representation consisted of local features that were extracted at salient image regions at different spatial frequency scales. The basic idea for the implementation of the two routes was that configural processing should be based on a global, position-sensitive connection of these features, whereas component processing should be local and position-insensitive. Using these relatively simple ingredients, they showed that the model could capture the human performance pattern observed in the psychophysical experiments. As can be seen in Fig. 4, the results of the computational model are very similar to the psychophysical results obtained with humans in the experiment conducted by Schwaninger et al. (2002) (see also Fig. 3).
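The following sketch illustrates the core idea of such a two-route implementation. It is a hedged toy version, not the code of Wallraven et al.: the feature extraction stage (salient regions at several spatial scales) is assumed to have produced, for each face, a set of local descriptors together with their image positions, and both matching functions below are deliberately simple stand-ins.

```python
# Toy version of the two-route idea: a face is a set of local feature
# descriptors plus their image positions. Feature extraction is assumed
# done; the matching functions are illustrative only.
import numpy as np

def component_similarity(f1, f2):
    # Position-insensitive: each local feature matches its best
    # counterpart in the other face, wherever that counterpart is.
    d = np.linalg.norm(f1["desc"][:, None, :] - f2["desc"][None, :, :], axis=2)
    return -d.min(axis=1).mean()

def configural_similarity(f1, f2):
    # Position-sensitive: features are matched in corresponding order,
    # and their spatial layout (pairwise distances) must agree too.
    desc_term = -np.linalg.norm(f1["desc"] - f2["desc"], axis=1).mean()
    layout = lambda p: np.linalg.norm(p[:, None] - p[None, :], axis=2)
    config_term = -np.abs(layout(f1["pos"]) - layout(f2["pos"])).mean()
    return desc_term + config_term

rng = np.random.default_rng(1)
face = lambda: {"desc": rng.normal(size=(10, 16)), "pos": rng.uniform(size=(10, 2))}
a, b = face(), face()
print(component_similarity(a, b), configural_similarity(a, b))
```

In this scheme, scrambling a face leaves the component score largely intact but ruins the configural score, whereas blurring does the opposite, mirroring the scrambled and blurred conditions of the psychophysical experiments.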
Fig. 4. Human and computational (Comp) performance for the face recognition task of Schwaninger et al. (2002) for unfamiliar (Unfam) and familiar (Fam) face recognition (adapted from Wallraven et al., 2005b). Performance is shown as area under the ROC curve (AUC). In order to determine the relative contributions of component and configural processing, participants had to recognize previously learnt faces in either scrambled (Scr), blurred (Blr), or scrambled-blurred (ScrBlr) conditions (also see human data in Fig. 3). The recognition performance of the computational model (Comp) was very similar to human performance. Moreover, the results indicate that the two-route implementation of configural and component processing captures the relative contributions of either route to recognition. The observed increase for familiar face recognition by humans could also be modeled with the computational system.
Moreover, combining the two routes resulted in increased recognition performance, which has implications in a computer vision context for recognition despite changes in viewpoint (see Wallraven et al., 2005b). In general, a closer coupling between psychophysical research, neuroscience, and computer vision would benefit all three research areas by enabling a more advanced statistical analysis of the information necessary to recognize individuals and expressions, as well as the development of better, perceptually motivated recognition algorithms that are able to match human classification performance. This will be necessary in order to better understand the processing of component, configural, and motion information and their integration for recognition of identity and facial expression.

Abbreviations

FACS  facial action coding system
FERET  FacE REcognition Technology
FFA  fusiform face area
FLMP  fuzzy logical model of perception
HCI  human–computer interface
ICA  independent component analysis
LDA  linear discriminant analysis
MT  middle temporal
NMF  non-negative matrix factorization
OFA  occipital face area
STS  superior temporal sulcus
SVM  support vector machine
References

Adolphs, R. (2002) Recognizing emotion from facial expressions: psychological and neurological mechanisms. Behav. Cogn. Neurosci. Rev., 1(1): 21–61.
Andrews, T.J. and Ewbank, M.P. (2004) Distinct representations for facial identity and changeable aspects of faces in human temporal lobe. NeuroImage, 23: 905–913.
Bahrick, H.P., Bahrick, P.O. and Wittlinger, R.P. (1975) Fifty years of memory for names and faces: a cross-sectional approach. J. Exp. Psychol.: Gen., 104: 54–75.
Bartlett, M.S., Hager, J.C., Ekman, P. and Sejnowski, T.J. (1999) Measuring facial expressions by computer image analysis. Psychophysiology, 36: 253–263.
Bartlett, M.S., Movellan, J.R. and Sejnowski, T.J. (2002) Face recognition by independent component analysis. IEEE Trans. Neural Networks, 13(6): 1450–1464.
Bassili, J.N. (1978) Facial motion in the perception of faces and of emotional expression. J. Exp. Psychol.: Hum. Percept. Perform., 4(3): 373–379.
Bassili, J. (1979) Emotion recognition: the role of facial motion and the relative importance of upper and lower areas of the face. J. Pers. Soc. Psychol., 37: 2049–2059.
Belhumeur, P., Hespanha, J. and Kriegman, D. (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19(7): 711–720.
Bentin, S., Allison, T., Puce, A., Perez, E. and McCarthy, G. (1996) Electrophysiological studies of face perception in humans. J. Cogn. Neurosci., 8: 551–565.
Bentin, S., Deouell, L.Y. and Soroker, N. (1999) Selective visual streaming in face recognition: evidence from developmental prosopagnosia. NeuroReport, 10: 823–827.
Biederman, I. (1987) Recognition-by-components: a theory of human image understanding. Psychol. Rev., 94(2): 115–147.
Biederman, I. and Kalocsai, P. (1997) Neurocomputational bases of object and face recognition. Philos. Trans. R. Soc. Lond. Ser. B, 352: 1203–1219.
Black, M. and Yacoob, Y. (1997) Recognizing facial expressions in image sequences using local parameterized models of image motion. Int. J. Comput. Vis., 25(1): 23–48.
Blanz, V. and Vetter, T. (1999) A morphable model for the synthesis of 3D faces. SIGGRAPH'99, Conference Proceedings, Los Angeles, CA, USA, pp. 187–194.
Blanz, V. and Vetter, T. (2003) Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Anal. Mach. Intell., 25: 1063–1074.
Bobes, M.A., Martín, M., Olivares, E. and Valdés-Sosa, M. (2000) Different scalp topography of brain potentials related to expression and identity matching of faces. Cogn. Brain Res., 9: 249–260.
Boutet, I., Collin, C. and Faubert, J. (2003) Configural face encoding and spatial frequency information. Percept. Psychophys., 65(7): 1087–1093.
Bruce, V. (1988) Recognising Faces. Lawrence Erlbaum Associates, Hillsdale, NJ.
Bruce, V., Henderson, Z., Greenwood, K., Hancock, P.J.B., Burton, A.M. and Miller, P. (1999) Verification of face identities from images captured on video. J. Exp. Psychol.: Appl., 5(4): 339–360.
Bruce, V., Henderson, Z., Newman, C. and Burton, M.A. (2001) Matching identities of familiar and unfamiliar faces caught on CCTV images. J. Exp. Psychol.: Appl., 7: 207–218.
Bruce, V. and Young, A. (1986) Understanding face recognition. Br. J. Psychol., 77: 305–327.
Brunelli, R. and Poggio, T. (1993) Face recognition: features versus templates. IEEE Trans. Pattern Anal. Mach. Intell., 15(10): 1042–1052.
Bukach, C.M., Gauthier, I. and Tarr, M.J. (in press) Beyond faces and modularity: the power of an expertise framework. Trends Cogn. Sci., 10(4): 159–166.
Calder, A.J., Burton, A.M., Miller, P., Young, A.W. and Akamatsu, S. (2001) A principal component analysis of facial expressions. Vis. Res., 41: 1179–1208.
Calder, A.J. and Young, A.W. (2005) Understanding the recognition of facial identity and facial expression. Nat. Rev. Neurosci., 6: 641–651.
Calder, A.J., Young, A.W., Keane, J. and Dean, M. (2000) Configural information in facial expression perception. J. Exp. Psychol.: Hum. Percept. Perform., 26(2): 527–551.
Chao, L.L., Haxby, J.V. and Martin, A. (1999) Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nat. Neurosci., 2: 913–919.
Christie, F. and Bruce, V. (1998) The role of dynamic information in the recognition of unfamiliar faces. Mem. Cogn., 26(4): 780–790.
Collishaw, S.M. and Hole, G.J. (2000) Featural and configurational processes in the recognition of faces of different familiarity. Perception, 29: 893–910.
Cootes, T., Edwards, G. and Taylor, C. (2001) Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell., 23: 681–685.
Cunningham, D., Kleiner, M., Wallraven, C. and Bülthoff, H. (2005) Manipulating video sequences to determine the components of conversational facial expressions. ACM Trans. Appl. Percept., 2(3): 251–269.
Damasio, A.R., Damasio, H. and Van Hoesen, G.W. (1982) Prosopagnosia: anatomic bases and behavioral mechanisms. Neurology, 32: 331–341.
Damasio, A.R., Tranel, D. and Damasio, H. (1990) Face agnosia and the neural substrates of memory. Annu. Rev. Neurosci., 13: 89–109.
Darwin, C. (1872) The Expression of the Emotions in Man and Animals. John Murray, London.
Davidoff, J. and Donnelly, N. (1990) Object superiority: a comparison of complete and part probes. Acta Psychol., 73: 225–243.
Davidoff, J. and Landis, T. (1990) Recognition of unfamiliar faces in prosopagnosia. Neuropsychologia, 28: 1143–1161.
DeCarlo, D. and Metaxas, D. (1996) The integration of optical flow and deformable models with applications to human face shape and motion estimation. Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR '96), San Francisco, CA, USA, pp. 231–238.
de Gelder, B., Frissen, I., Barton, J. and Hadjikhani, N. (2003) A modulatory role for facial expressions in prosopagnosia. Proc. Natl. Acad. Sci. USA, 100(22): 13105–13110.
De Renzi, E. (1997) Prosopagnosia. In: Feinberg, T.E. and Farah, M.J. (Eds.), Behavioral Neurology and Neuropsychology. McGraw-Hill, New York, pp. 245–256.
Diamond, R. and Carey, S. (1986) Why faces are and are not special: an effect of expertise. J. Exp. Psychol.: Gen., 115(2): 107–117.
Donato, G., Bartlett, S., Hager, J., Ekman, P. and Sejnowski, T. (1999) Classifying facial actions. IEEE Trans. Pattern Anal. Mach. Intell., 21(10): 974–989.
Downing, P.E., Chan, A.W., Peelen, M.V., Dodds, C.M. and Kanwisher, N. (2005) Domain specificity in visual cortex. Cereb. Cortex (Dec 7, electronic publication, ahead of print).
Duchenne, B. (1990) The Mechanism of Human Facial Expression or an Electro-Physiological Analysis of the Expression of the Emotions. Cambridge University Press, New York.
Edwards, K. (1998) The face of time: temporal cues in facial expressions of emotion. Psychol. Sci., 9: 270–276.
Eimer, M. and Holmes, A. (2002) An ERP study on the time course of emotional face processing. Cogn. Neurosci. Neuropsychol., 13(4): 427–431.
Ekman, P. and Friesen, W.V. (1978) Facial Action Coding System. Consulting Psychologists Press, Palo Alto.
Ekman, P. and Friesen, W.V. (1982) Felt, false, and miserable smiles. J. Nonverb. Behav., 6: 238–252.
Ekman, P., Hager, J., Methvin, C. and Irwin, W. (1969) Ekman-Hager Facial Action Exemplars. Human Interaction Laboratory, University of California, San Francisco.
Ellison, J.W. and Massaro, D.W. (1997) Featural evaluation, integration, and judgment of facial affect. J. Exp. Psychol.: Hum. Percept. Perform., 23(1): 213–226.
Essa, I. and Pentland, A. (1994) A vision system for observing and extracting facial action parameters. Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR'94), Seattle, WA, USA, pp. 76–83.
Essa, I. and Pentland, A. (1997) Coding, analysis, interpretation and recognition of facial expressions. IEEE Trans. Pattern Anal. Mach. Intell., 19: 757–763.
Farah, M.J., Levinson, K.L. and Klein, K.L. (1995a) Face perception and within-category discrimination in prosopagnosia. Neuropsychologia, 33: 661–674.
Farah, M.J., Tanaka, J.W. and Drain, H.M. (1995b) What causes the face inversion effect? J. Exp. Psychol.: Hum. Percept. Perform., 21(3): 628–634.
Frijda, N.H. and Philipszoon, E. (1963) Dimensions of recognition of emotion. J. Abnorm. Soc. Psychol., 66: 45–51.
Frois-Wittmann, J. (1930) The judgment of facial expression. J. Exp. Psychol., 13: 113–151.
Gauthier, I., Behrmann, M. and Tarr, M.J. (1999a) Can face recognition be dissociated from object recognition? J. Cogn. Neurosci., 11: 349–370.
Gauthier, I., Skudlarski, P., Gore, J.C. and Anderson, A.W. (2000a) Expertise for cars and birds recruits brain areas involved in face recognition. Nat. Neurosci., 3: 191–197.
Gauthier, I., Tarr, M.J., Anderson, A.W., Skudlarski, P. and Gore, J.C. (1999b) Activation of the middle fusiform area increases with expertise in recognizing novel objects. Nat. Neurosci., 2: 568–573.
Gauthier, I., Tarr, M.J., Moylan, J., Skudlarski, P., Gore, J.C. and Anderson, A.W. (2000b) The fusiform ‘‘face area’’ is part of a network that processes faces at the individual level. J. Cogn. Neurosci., 12(3): 495–504.
Gibson, J.J. (1957) Optical motions and transformations as stimuli for visual perception. Psychol. Rev., 64: 228–295.
Gibson, J.J. (1966) The Senses Considered as Perceptual Systems. Houghton Mifflin, Boston, MA.
Gokturk, S., Tomasi, C., Girod, B. and Bouguet, J. (2002) Model-based face tracking for view-independent facial expression recognition. In: Fifth IEEE International Conference
on Automatic Face and Gesture Recognition, Washington, D.C., USA, pp. 287–293.
Hanawalt, N. (1944) The role of the upper and lower parts of the face as the basis for judging facial expressions: II. In posed expressions and ‘‘candid camera’’ pictures. J. Gen. Psychol., 31: 23–36.
Hasselmo, M.E., Rolls, E.T. and Baylis, C.G. (1989) The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey. Exp. Brain Res., 32: 203–218.
Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L. and Pietrini, P. (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293: 2425–2430.
Haxby, J.V., Hoffman, E.A. and Gobbini, M.I. (2000) The distributed human neural system for face perception. Trends Cogn. Sci., 4(6): 223–233.
Heisele, B., Ho, P., Wu, J. and Poggio, T. (2003) Face recognition: comparing component-based and global approaches. Comput. Vis. Image Understanding, 91(1–2): 6–21.
Henke, K., Schweinberger, S.R., Grigo, A., Klos, T. and Sommer, W. (1998) Specificity of face recognition: recognition of exemplars of non-face objects in prosopagnosia. Cortex, 34(2): 289–296.
Hill, H. and Johnson, A. (2001) Categorization and identity from the biological motion of faces. Curr. Biol., 11: 880–885.
Hjelmas, E. and Low, B. (2001) Face detection: a survey. Comput. Vis. Image Understanding, 83: 236–274.
Hoffman, E. and Haxby, J. (2000) Distinct representations of eye gaze and identity in the distributed human neural system for face perception. Nat. Neurosci., 3: 80–84.
Humphreys, G., Donnelly, N. and Riddoch, M. (1993) Expression is computed separately from facial identity, and is computed separately for moving and static faces: neuropsychological evidence. Neuropsychologia, 31: 173–181.
Ishai, A., Haxby, J.V. and Ungerleider, L.G. (2002) Visual imagery of famous faces: effects of memory and attention revealed by fMRI. NeuroImage, 17: 1729–1741.
Ishai, A., Ungerleider, L.G., Martin, A. and Haxby, J.V. (2000) The representation of objects in the human occipital and temporal cortex. J. Cogn. Neurosci., 12: 35–51.
Ishai, A., Ungerleider, L.G., Martin, A., Schouten, J.L. and Haxby, J.V. (1999) Distributed representation of objects in the human ventral visual pathway. Proc. Natl. Acad. Sci. USA, 96: 9379–9384.
Izard, C.E. (1979) The maximally discriminative facial movement coding system (MAX). Unpublished manuscript. (Available from Instructional Resource Center, University of Delaware, Newark, DE.)
Izard, C.E., Dougherty, L.M. and Hembree, E.A. (1983) A system for identifying affect expressions by holistic judgments. Unpublished manuscript, University of Delaware.
Johansson, G. (1973) Visual perception of biological motion and a model for its analysis. Percept. Psychophys., 14: 201–211.
Kamachi, M., Bruce, V., Mukaida, S., Gyoba, J., Yoshikawa, S. and Akamatsu, S. (2001) Dynamic properties influence the perception of facial expressions. Perception, 30: 875–887.
Kanade, T. (1973) Computer Recognition of Human Faces. Birkhäuser, Basel and Stuttgart.
Kanade, T., Cohn, J.F. and Tian, Y. (2000) Comprehensive database for facial expression analysis. Proceedings of the 4th International Conference on Automatic Face and Gesture Recognition, Grenoble, France, pp. 46–53.
Kanwisher, N., McDermott, J. and Chun, M.M. (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci., 17: 4302–4311.
Kanwisher, N. and Yovel, G. (in press) The fusiform face area: a cortical region specialized for the perception of faces. Philos. Trans. R. Soc. Lond. Ser. B.
Knappmeyer, B., Thornton, I.M. and Bülthoff, H.H. (2003) The use of facial motion and facial form during the processing of identity. Vis. Res., 43: 1921–1936.
Koenen, R. (2000) MPEG-4 Project Overview. International Organization for Standardization, ISO/IEC JTC1/SC29/WG11.
Köhler, W. (1940) Dynamics in Psychology. Liveright, New York.
Lander, K. and Bruce, V. (2000) Recognizing famous faces: exploring the benefits of facial motion. Ecol. Psychol., 12(4): 259–272.
Lander, K., Bruce, V. and Hill, H. (2001) Evaluating the effectiveness of pixelation and blurring on masking the identity of familiar faces. Appl. Cogn. Psychol., 15: 101–116.
Lander, K., Christie, F. and Bruce, V. (1999) The role of movement in the recognition of famous faces. Mem. Cogn., 27(6): 974–985.
Leder, H. and Bruce, V. (1998) Local and relational aspects of face distinctiveness. Quart. J. Exp. Psychol., 51A(3): 449–473.
Leder, H. and Bruce, V. (2000) When inverted faces are recognized: the role of configural information in face recognition. Quart. J. Exp. Psychol., 53A(2): 513–536.
Leder, H., Candrian, G., Huber, O. and Bruce, V. (2001) Configural features in the context of upright and inverted faces. Perception, 30: 73–83.
Lee, D. and Seung, H. (1999) Learning the parts of objects by non-negative matrix factorization. Nature, 401: 788–791.
Leopold, D.A., O'Toole, A., Vetter, T. and Blanz, V. (2001) Prototype-referenced shape encoding revealed by high-level aftereffects. Nat. Neurosci., 4: 89–94.
Leventhal, H. and Sharp, E. (1965) Facial expression as indicators of distress. In: Tomkins, S.S. and Izard, C.E. (Eds.), Affect, Cognition and Personality: Empirical Studies. Springer, New York, pp. 296–318.
Lewis, M.B. and Edmonds, A.J. (2003) Face detection: mapping human performance. Perception, 32(8): 903–920.
Li, B. and Chellappa, R. (2001) Face verification through tracking facial features. J. Opt. Soc. Am. A, 18(12): 2969–2981.
Li, S. and Jain, A. (Eds.). (2004) Handbook of Face Recognition. Springer, New York.
Liu, J., Harris, A. and Kanwisher, N. (2002) Stages of processing in face perception: an MEG study. Nat. Neurosci., 5: 910–916.
Marr, D. (1982) Vision. Freeman Publishers, San Francisco.
McNeil, J.E. and Warrington, E.K. (1993) Prosopagnosia: a face-specific disorder. Quart. J. Exp. Psychol., 46A: 1–10.
Moscovitch, M., Winocur, G. and Behrmann, M. (1997) What is special about face recognition? Nineteen experiments on a person with visual object agnosia and dyslexia but normal face recognition. J. Cogn. Neurosci., 9: 555–604.
Munhall, K.G., Jones, J.A., Callan, D.E., Kuratate, T. and Vatikiotis-Bateson, E. (2004) Visual prosody and speech intelligibility: head movement improves auditory speech perception. Psychol. Sci., 15(2): 133–137.
Murray, J.E., Yong, E. and Rhodes, G. (2000) Revisiting the perception of upside-down faces. Psychol. Sci., 11: 498–502.
Nummenmaa, T. (1964) The language of the face. In: Jyväskylä Studies in Education, Psychology, and Social Research. Jyväskylä, Finland.
O'Craven, K.M., Downing, P.E. and Kanwisher, N. (1999) fMRI evidence for objects as the units of attentional selection. Nature, 401: 584–587.
O'Toole, A.J., Phillips, P.J., Cheng, Y., Ross, B. and Wild, H.A. (2000) Face recognition algorithms as models of human face processing. Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France.
O'Toole, A.J., Roark, D.A. and Abdi, H. (2002) Recognizing moving faces: a psychological and neural synthesis. Trends Cogn. Sci., 6(6): 261–266.
Perrett, D.I., Hietanen, J.K., Oram, M.W. and Benson, P.J. (1992) Organization and functions of cells in the macaque temporal cortex. Philos. Trans. R. Soc. Lond. Ser. B, 335: 23–50.
Perrett, D.I., Mistlin, A.J. and Chitty, A.J. (1987) Visual neurones responsive to faces. Trends Neurosci., 10: 358–364.
Perrett, D.I. and Oram, M.W. (1993) Image Vis. Comput., 11: 317–333.
Perrett, D.I., Rolls, E.T. and Caan, W. (1982) Visual neurons responsive to faces in the monkey temporal cortex. Exp. Brain Res., 47: 329–342.
Phillips, P.J. (1999) Support vector machines applied to face recognition. Adv. Neural Inform. Process. Systems, 11: 803–809.
Pike, G., Kemp, R., Towell, N. and Phillips, K. (1997) Recognizing moving faces: the relative contribution of motion and perspective view information. Vis. Cogn., 4: 409–437.
Plutchik, R. (1962) The Emotions: Facts, Theories, and A New Model. Random House, New York.
Puce, A., Allison, T., Bentin, S., Gore, J.C. and McCarthy, G. (1998) Temporal cortex activation in humans viewing eye and mouth movements. J. Neurosci., 18: 2188–2199.
Rhodes, G., Brake, S. and Atkinson, A.P. (1993) What's lost in inverted faces? Cognition, 47: 25–57.
Roark, D.A., Barrett, S.E., Spence, M., Abdi, H. and O'Toole, A.J. (2003) Memory for moving faces: psychological and neural perspectives on the role of motion in face recognition. Behav. Cogn. Neurosci. Rev., 2(1): 15–46.
Rock, I. (1973) Orientation and Form. Academic Press, New York.
Rock, I. (1974) The perception of disoriented figures. Sci. Am., 230: 78–85.
Rock, I. (1988) On Thompson's inverted-face phenomenon (Research Note). Perception, 17: 815–817.
Rosenblum, M., Yacoob, Y. and Davis, L. (1996) Human expression recognition from motion using a radial basis function network architecture. IEEE Trans. Neural Networks, 7(5): 1121–1138.
Rossion, B., Caldara, R., Seghier, M., Schuller, A.M., Lazeyras, F. and Mayer, E. (2003) A network of occipito-temporal face-sensitive areas besides the right middle fusiform gyrus is necessary for normal face processing. Brain, 126: 2381–2395.
Rossion, B., Curran, T. and Gauthier, I. (2002) A defense of the subordinate-level expertise account for the N170 component. Cognition, 85: 189–196.
Rotshtein, P., Henson, R.N.A., Treves, A., Driver, J. and Dolan, R.J. (2005) Morphing Marilyn into Maggie dissociates physical and identity face representations in the brain. Nat. Neurosci., 8(1): 107–113.
Sayette, M.A., Cohn, J.F., Wertz, J.M., Perrott, M.A. and Dominic, J. (2001) A psychometric evaluation of the facial action coding system for assessing spontaneous expression. J. Nonverb. Behav., 25: 167–186.
Schwaninger, A., Carbon, C.C. and Leder, H. (2003a) Expert face processing: specialization and constraints. In: Schwarzer, G. and Leder, H. (Eds.), Development of Face Processing. Hogrefe, Göttingen, pp. 81–97.
Schwaninger, A., Lobmaier, J. and Collishaw, S.M. (2002) Component and configural information in face recognition. Lect. Notes Comput. Sci., 2525: 643–650.
Schwaninger, A., Lobmaier, J. and Fischer, M. (2005) The inversion effect on gaze perception is due to component information. Exp. Brain Res., 167: 49–55.
Schwaninger, A. and Mast, F. (2005) The face inversion effect can be explained by capacity limitations of an orientation normalization mechanism. Jpn. Psychol. Res., 47(3): 216–222.
Schwaninger, A., Ryf, S. and Hofer, F. (2003b) Configural information is processed differently in perception and recognition of faces. Vis. Res., 43: 1501–1505.
Schwaninger, A., Wallraven, C. and Bülthoff, H.H. (2004) Computational modeling of face recognition based on psychophysical experiments. Swiss J. Psychol., 63(3): 207–215.
Searcy, J.H. and Bartlett, J.C. (1996) Inversion and processing of component and spatial-relational information in faces. J. Exp. Psychol.: Hum. Percept. Perform., 22(4): 904–915.
Sekuler, A.B., Gaspar, C.M., Gold, J.M. and Bennett, P.J. (2004) Inversion leads to quantitative, not qualitative, changes in face processing. Curr. Biol., 14(5): 391–396.
Sergent, J. (1984) An investigation into component and configural processes underlying face perception. Br. J. Psychol., 75: 221–242.
Sergent, J. (1985) Influence of task and input factors on hemispheric involvement in face processing. J. Exp. Psychol.: Hum. Percept. Perform., 11(6): 846–861.
Swets, D. and Weng, J. (1996) Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell., 18: 831–836.
Tanaka, J.W. and Farah, M.J. (1991) Second-order relational properties and the inversion effect: testing a theory of face perception. Percept. Psychophys., 50(4): 367–372.
Tanaka, J.W. and Farah, M.J. (1993) Parts and wholes in face recognition. Quart. J. Exp. Psychol., 46A(2): 225–245.
Terzopoulos, D. and Waters, K. (1993) Analysis and synthesis of facial image sequences using physical and anatomical models. IEEE Trans. Pattern Anal. Mach. Intell., 15: 569–579.
Thompson, P. (1980) Margaret Thatcher: a new illusion. Perception, 9: 483–484.
Tian, Y., Kanade, T. and Cohn, J. (2001) Recognizing action units for facial expression analysis. IEEE Trans. Pattern Anal. Mach. Intell., 23(2): 97–115.
Tronick, E., Als, H. and Brazelton, T.B. (1980) Monadic phases: a structural descriptive analysis of infant-mother face-to-face interaction. Merrill-Palmer Quart. Behav. Dev., 26: 3–24.
Turk, M. and Pentland, A. (1991) Eigenfaces for recognition. J. Cogn. Neurosci., 3: 72–86.
Valentine, T. (1995) Cognitive and Computational Aspects of Face Recognition: Explorations in Face Space. Routledge, London.
Valentine, T. and Bruce, V. (1988) Mental rotation of faces. Mem. Cogn., 16(6): 556–566.
Vuilleumier, P., Armony, J.L., Driver, J. and Dolan, R.J. (2001) Effects of attention and emotion on face processing in the human brain: an event-related fMRI study. Neuron, 30: 829–841.
Wacholtz, E. (1996) Can we learn from the clinically significant face processing deficits, prosopagnosia and Capgras delusion? Neuropsychol. Rev., 6: 203–258.
Wachsmuth, E., Oram, M.W. and Perrett, D.I. (1994) Recognition of objects and their component parts: responses of single units in the temporal cortex of the macaque. Cereb. Cortex, 4: 509–522.
Wallraven, C., Breidt, M., Cunningham, D.W. and Bülthoff, H.H. (2005a) Psychophysical evaluation of animated facial expressions. Proceedings of the 2nd Symposium on Applied Perception in Graphics and Visualization, A Coruña, Spain, pp. 17–24.
Wallraven, C., Schwaninger, A. and Bülthoff, H.H. (2005b) Learning from humans: computational modeling of face recognition. Network: Comput. Neural Syst., 16(4): 401–418.
Wallraven, C., Schwaninger, A., Schuhmacher, S. and Bülthoff, H.H. (2002) View-based recognition of faces in man and machine: re-visiting inter-extra-ortho. Lect. Notes Comput. Sci., 2525: 651–660.
White, M. (2000) Parts and wholes in expression recognition. Cogn. Emotion, 14(1): 39–60.
Williams, M.A., Moss, S.A. and Bradshaw, J.L. (2004) A unique look at face processing: the impact of masked faces on the processing of facial features. Cognition, 91: 155–172.
Winston, J.S., Henson, R.N.A., Fine-Goulden, M.R. and Dolan, R.J. (2004) fMRI-adaptation reveals dissociable neural representations of identity and expression in face perception. J. Neurophysiol., 92: 1830–1839.
Wiskott, L., Fellous, J. and von der Malsburg, C. (1997) Face recognition by elastic bunch graph matching. IEEE Trans. Pattern Anal. Mach. Intell., 19: 775–779.
Xu, Y. (2005) Revisiting the role of the fusiform face area in visual expertise. Cereb. Cortex, 15(8): 1234–1242.
Xu, Y., Liu, J. and Kanwisher, N. (2005) The M170 is selective for faces, not for expertise. Neuropsychologia, 43: 588–597.
Yamane, S., Kaji, S. and Kawano, K. (1988) What facial features activate face neurons in the inferotemporal cortex of the monkey? Exp. Brain Res., 73: 209–214.
Yin, R. (1969) Looking at upside-down faces. J. Exp. Psychol., 81(1): 141–145.
Young, A.W., Hellawell, D. and Hay, D.C. (1987) Configural information in face perception. Perception, 16: 747–759.
Yovel, G. and Kanwisher, N. (2005) The neural basis of the behavioral face-inversion effect. Curr. Biol., 15: 2256–2262.
Zhang, Z., Lyons, M., Schuster, M. and Akamatsu, S. (1998) Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron. Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, pp. 454–459.
CHAPTER 19
Investigating audiovisual integration of emotional signals in the human brain

Thomas Ethofer1,2, Gilles Pourtois3 and Dirk Wildgruber2

1 Section of Experimental MR of the CNS, Department of Neuroradiology, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany
2 Department of General Psychiatry, University of Tübingen, Tübingen, Germany
3 Laboratory for Neurology and Imaging of Cognition, Departments of Neurology and Neurosciences, Centre Médical Universitaire, University of Geneva, Geneva, Switzerland
Abstract: Humans can communicate their emotional state via facial expression and affective prosody. This chapter reviews behavioural, neuroanatomical, electrophysiological and neuroimaging studies pertaining to audiovisual integration of emotional communicative signals. Particular emphasis will be given to neuroimaging studies using positron emission tomography (PET) or functional magnetic resonance imaging (fMRI). Conjunction analyses, interaction analyses, correlation analyses between haemodynamic responses and behavioural effects, and connectivity analyses have been employed to analyse neuroimaging data. There is no general agreement as to which of these approaches can be considered ‘‘optimal’’ to classify brain regions as multisensory. We argue that these approaches provide complementary information, as they assess different aspects of multisensory integration of emotional information. Assets and drawbacks of the different analysis types are discussed and demonstrated on the basis of one fMRI data set.

Keywords: conjunction analysis; connectivity analysis; correlation analysis; emotion; facial expression; interaction analysis; multisensory; prosody
Behavioural studies

In natural environments, most events generate stimulation via multiple sensory channels. Integration of inputs from different modalities enables a unified representation of the world and can provide information that is unavailable from any single modality in isolation. A compelling example of such merging of information is the McGurk effect (McGurk and MacDonald, 1976), in which a heard syllable /ba/ and a seen syllable /ga/ are perceived as /da/. Moreover, integration of information obtained from different modalities can result in enhanced perceptual sensitivity and shortened response latencies on a behavioural level (Miller, 1982, 1986; Schröger and Widmann, 1998). This is of particular importance for perception of emotionally relevant information, which can be simultaneously perceived via the visual modality (e.g. facial expression, body postures and gestures) and the auditory modality (e.g. affective prosody and propositional content). It has been demonstrated that congruency in information expressed via facial expression and affective prosody facilitates behavioural reactions to such emotion-laden stimuli (Massaro and Egan, 1996; de Gelder and Vroomen, 2000; Dolan et al., 2001). Furthermore, affective information obtained via one sense can alter information processing in another; for example, a facial expression is more likely
to be perceived as fearful if accompanied by a fearful (as opposed to a neutral) voice (Massaro and Egan, 1996; Ethofer et al., 2006). Such crossmodal biases occur even under the explicit instruction to ignore information conveyed in the concurrent channel (de Gelder and Vroomen, 2000; Ethofer et al., 2006) and are unconstrained by the allocation of attentional resources (Vroomen et al., 2001). These findings argue for the mandatory nature of the processes underlying integration of facial and vocal affective information.
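The behavioural facilitation effects described above are typically tested with within-subject comparisons of congruent and incongruent audiovisual trials. The following is a minimal illustrative sketch, with simulated numbers standing in for per-subject mean reaction times; it is not an analysis from any of the cited studies.

```python
# Illustrative congruency analysis: reaction times to audiovisual
# trials with congruent vs. incongruent emotion in face and voice.
# Data are simulated; real analyses would use per-subject mean RTs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects = 20
rt_congruent = rng.normal(620, 40, n_subjects)            # ms, simulated
rt_incongruent = rt_congruent + rng.normal(25, 15, n_subjects)

t, p = stats.ttest_rel(rt_congruent, rt_incongruent)      # paired test
print(f"facilitation = {np.mean(rt_incongruent - rt_congruent):.1f} ms, "
      f"t({n_subjects - 1}) = {t:.2f}, p = {p:.4f}")
```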
Neuroanatomical studies

In animal experiments, several areas with converging projections from visual and auditory cortices have been identified. Such convergence zones (Damasio, 1989) constitute candidate regions for mediation of audiovisual integration and crossmodal effects in humans (for a review, see Mesulam, 1998; Driver and Spence, 2000; Calvert, 2001). These regions include both cortical structures, such as the banks of the superior temporal sulcus (STS; Jones and Powell, 1970; Seltzer and Pandya, 1978), the insula (Mesulam and Mufson, 1982) and the orbitofrontal cortex (Jones and Powell, 1970; Chavis and Pandya, 1976), as well as subcortical structures comprising the superior colliculus (Fries, 1984), claustrum (Pearson et al., 1982) and several nuclei within the amygdala (Turner et al., 1980; Murray and Mishkin, 1985; McDonald, 1998; Pitkänen, 2000) and thalamus (Mufson and Mesulam, 1984). At the single neuron level, the most intensively studied of these convergence zones is the superior colliculus (Gordon, 1973; Meredith and Stein, 1983; Peck, 1987; Wallace et al., 1993, 1996), which plays a fundamental role in attention and orientation behaviour (for review, see Stein and Meredith, 1993). On the basis of their detailed studies on multisensory neurons in deep layers of the superior colliculus, Stein and colleagues (Stein and Meredith, 1993) formulated a series of key ‘‘integration rules’’: First, multimodal stimuli that occur in close temporal and spatial proximity elicit supra-additive responses (i.e. the number of impulses to a bimodal stimulus exceeds the arithmetic sum of impulses
to the respective unimodal stimuli). Second, the stronger these crossmodal interaction effects, the less effective are the unimodal stimuli in generating a response in the multisensory cell (inverse effectiveness rule). Third, spatially disparate crossmodal cues result in pronounced response depression in multisensory cells (i.e. the response to a stimulus can be severely diminished by a spatially incongruent stimulus from another modality). A similar behaviour has been described for multisensory convergence sites in the cortex, such as the banks of the STS (Bruce et al., 1981; Hikosaka et al., 1988; Barraclough et al., 2005) and posterior insula (Loe and Benevento, 1969; Fallon et al., 1978). However, neurons in these cortical regions and the superior colliculus do not project to or receive afferents from each other (Wallace et al., 1993) and show different sensitivity to spatial factors (Stein and Wallace, 1996). Therefore, it is believed that they fulfil different integrative functions (attention/orientating behaviour in the superior colliculus vs. perceptual judgements in the cortex; Stein et al., 1996). Animal experiments provided insight into possible multisensory integration sites in the brain that enable definition of regions of interest for the analysis of human neuroimaging studies. With a typical spatial resolution of 3 × 3 × 3 mm³, each data point acquired in a neuroimaging experiment is attributable to the averaged response of several millions of neurons (Goldman-Rakic, 1995). In view of the fact that only about 25% of the neurons in cerebral multisensory integration sites are sensitive to stimuli from more than one modality (Wallace et al., 1992), the effects elicited by multisensory integration processes are expected to be small. Restricting the search region to small anatomical structures strongly improves the sensitivity to identify such integration sites by reducing the problem of multiple comparisons (Worsley et al., 1996).
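The first two integration rules can be stated numerically. A common formalisation of multisensory enhancement relates the bimodal response to the best unimodal response; the sketch below uses invented spike counts to illustrate both the supra-additivity test and the inverse effectiveness rule.

```python
# The superior-colliculus "integration rules" in numbers: percent
# multisensory enhancement relative to the best unimodal response, and
# the supra-additivity test AV > A + V. Spike counts are invented.

def enhancement(av, a, v):
    # (AV - max(A, V)) / max(A, V) * 100: a common enhancement metric
    best = max(a, v)
    return 100.0 * (av - best) / best

# Weak unimodal stimuli: bimodal response far exceeds the unimodal sum
print(enhancement(av=12, a=3, v=2), 12 > 3 + 2)       # supra-additive
# Strong unimodal stimuli: proportionally smaller multisensory gain
print(enhancement(av=48, a=40, v=35), 48 > 40 + 35)   # inverse effectiveness
```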
Electrophysiological studies

Recording of electric brain responses over the human scalp (event-related potentials or ERPs) has been primarily employed to investigate the time course of crossmodal binding of affective
audiovisual information, given the high temporal resolution of this technique. De Gelder et al. (1999) demonstrated that a facial expression carrying emotional information that conflicts with a simultaneously presented affective voice evokes an early mismatch-negativity response around 180 ms after its onset. These ERP findings indicate that auditory processing is modulated by concurrent visual information. A subsequent ERP study from the same group demonstrated that the auditory N1 component occurring around 110 ms after presentation of an affective voice is significantly enhanced by an emotionally congruent facial expression. This effect occurs for upright but not for inverted faces (Pourtois et al., 2000). Recognition of emotional facial expressions is substantially hindered by face inversion (White, 1999). Thus, the observation that modulation of auditory ERP components is restricted to upright faces suggests that this effect is driven by the expressed facial affect and is not attributable to low-level pictorial features of the visual stimuli. Finally, an analysis focused on the positive deflection following the N1-P1 component around 220 ms poststimulus revealed a shortened latency of this deflection in emotionally congruent as compared to incongruent audiovisual trials (Pourtois et al., 2002). These faster ERP responses parallel behavioural effects showing facilitated responses to affective prosody when presented simultaneously with a congruent versus an incongruent emotional facial expression (de Gelder and Vroomen, 2000). In summary, electrophysiological studies on audiovisual integration of emotional information indicate that multisensory integration occurs at an early stage of cerebral processing (i.e. around 110–220 ms poststimulus). The observation that crosstalk between the modalities takes place during the early perceptual rather than during late decisional stages offers an explanation for the finding that crossmodal biases between the modalities occur irrespective of attentional resources (Vroomen et al., 2001) and instructions to ignore a concurrent stimulus (de Gelder and Vroomen, 2000; Ethofer et al., 2006). Furthermore, these electrophysiological findings point to neuronal structures that conduct early steps in the processing of external information. However, the low spatial resolution
of ERP data does not allow inference on which brain regions are involved in integration of multisensory information.
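The logic of these ERP measurements — epoching the continuous signal around stimulus onsets, averaging, and reading out component latencies — can be sketched in a few lines. The sampling rate, windows and data below are illustrative stand-ins, not the parameters of the cited studies.

```python
# Minimal ERP sketch: epoch one continuous channel around stimulus
# onsets, baseline-correct, average, and read out the N1 around 110 ms.
import numpy as np

fs = 500                                  # sampling rate in Hz (assumed)
rng = np.random.default_rng(3)
eeg = rng.normal(size=60 * fs)            # 60 s of one-channel noise
onsets = np.arange(1 * fs, 55 * fs, fs)   # one stimulus per second

# Epochs from -100 ms to +400 ms around each onset
pre, post = int(0.1 * fs), int(0.4 * fs)
epochs = np.stack([eeg[o - pre:o + post] for o in onsets])
epochs -= epochs[:, :pre].mean(axis=1, keepdims=True)  # baseline correction
erp = epochs.mean(axis=0)                 # average over trials

# N1: most negative deflection between 80 and 150 ms post-stimulus
t = np.arange(-pre, post) / fs * 1000.0
win = (t >= 80) & (t <= 150)
print("N1 latency (ms):", t[win][np.argmin(erp[win])])
```

A congruency effect such as the enhanced N1 reported by Pourtois et al. (2000) would then correspond to comparing such averages between congruent and incongruent trial subsets.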
Neuroimaging studies

Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) have been used to shed light on the functional neuroanatomy of multisensory integration of emotional information. However, definition of the appropriate analysis in multimodal studies is not trivial, and several approaches have been applied to model audiovisual integration or crossmodal effects in the brain. These approaches include conjunction analyses, interaction analyses, correlation analyses with effects observed on the behavioural level and connectivity analyses. We demonstrate the application of these approaches on the basis of one data set acquired in an event-related fMRI study conducted to investigate the neuronal correlates underlying audiovisual integration of emotional information from face and voice (Ethofer et al., 2006). Twelve right-handed subjects participated in this experiment conducted at a 1.5 T scanner (Siemens VISION, Erlangen, Germany), comprising two sessions with 36 visual (V) trials, two sessions with 36 auditory (A) trials and two sessions with 36 audiovisual (AV) trials. In the visual sessions, every trial consisted of the presentation of a facial expression shown for 1 s. These visual stimuli were obtained from the Ekman and Friesen series (1976) and comprised facial expressions ranging from neutral to 100% fear and from neutral to 100% happiness in incremental steps of 25%, created by digital morphing techniques. In the auditory sessions, sentences spoken by professional actors in either a happy or a fearful voice were presented. In the bimodal sessions, auditory and visual stimuli were presented together, with facial expressions being shown during the last second of the spoken sentences. Subjects were instructed to judge the emotional valence of the stimuli on a nine-point self-assessment manikin scale (SAM; Bradley and Lang, 1994) by pressing buttons with their right or left hand. In all sessions, the SAM scale was presented for 4 s, 200 ms after
stimulus offset (see Fig. 1a). In unimodal sessions, subjects rated the emotional valence of the presented stimuli (facial expressions or prosody). In the bimodal sessions, subjects were instructed to judge the emotional valence of the facial expression and ignore the concomitant affective voice. fMRI data were analysed using statistical parametric mapping software (SPM2, Wellcome Department of Imaging Neuroscience, London, UK, http://www.fil.ion.ucl.ac.uk/spm). Coordinates of activation clusters are given in MNI space (Montreal Neurological Institute; Collins et al., 1994). Main effects of presentation of unimodal (A and V) and bimodal (AV) stimuli are shown in Fig. 1b. As expected, unimodal presentation of either emotional facial expressions or affectively spoken sentences engaged bilateral primary and higher-order visual or auditory cortices, respectively, while in bimodal trials both visual and auditory cortices were active. In all three conditions, activations were found in bilateral
motor cortices and the cerebellum, which are most probably attributable to the motor responses made by the subjects to judge the stimuli. Furthermore, dorsolateral prefrontal cortices of both hemispheres, presumably subserving working memory processes, showed responses to stimulus presentation in both unimodal and bimodal sessions.
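Such main effects are estimated voxelwise with a general linear model. The sketch below illustrates that logic for a single voxel, with a crude single-gamma haemodynamic response function and invented timings; it is a schematic stand-in, not the SPM2 implementation used in the study.

```python
# Schematic first-level analysis for one voxel: build regressors for
# the A, V and AV conditions by convolving stimulus onsets with a
# haemodynamic response function, then estimate betas by least squares.
import numpy as np

tr, n_scans = 2.0, 300
t = np.arange(0, 32, tr)
hrf = (t / 6) ** 2 * np.exp(-t / 3)       # crude single-gamma HRF
hrf /= hrf.sum()

def regressor(onsets_s):
    box = np.zeros(n_scans)               # one event per onset (in s)
    box[(np.asarray(onsets_s) / tr).astype(int)] = 1.0
    return np.convolve(box, hrf)[:n_scans]

rng = np.random.default_rng(4)
X = np.column_stack([regressor(rng.choice(np.arange(20, 560, 15), 36,
                                          replace=False))
                     for _ in range(3)])  # A, V, AV regressors
X = np.column_stack([X, np.ones(n_scans)])  # constant term

# Simulated voxel time series with known effects plus noise
y = X @ np.array([1.0, 1.2, 2.5, 100.0]) + rng.normal(0, 0.5, n_scans)
betas = np.linalg.lstsq(X, y, rcond=None)[0]
print("beta_A, beta_V, beta_AV:", betas[:3].round(2))
```

The contrasts discussed in the following sections (conjunctions and interactions) are all linear combinations of such condition betas.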
Conjunction analyses

Conjunction analyses were originally introduced by Price and Friston (1997). The objective of this approach is to test for commonalities in brain activation induced by different stimuli or tasks. Recently, conjunctions employed in the analysis of neuroimaging data have been revised and divided into those that test a global null hypothesis (H0: no effect in any of the components; H1: significant effect in at least one of the components;
Fig. 1. Experimental design (a) and brain activation (b) for auditory (upper panels), visual (middle panels) and audiovisual trials (lower panels) as compared to rest. Brain activations are thresholded at a height threshold of p<0.001 (uncorrected) and corrected for multiple comparisons at cluster level (k>50, p<0.05, corrected).
Friston et al., 1999, 2005) and those that test a conjunction null hypothesis (H0: no effect in at least one of the components; H1: significant effects in all of the components; Nichols et al., 2005). Since only rejection of the conjunction null hypothesis can be taken as evidence for a logical AND, all conjunction analyses reported here were carried out on the basis of a conjunction null hypothesis. An obvious property of multisensory neural structures is their responsiveness to stimuli from more than one modality. Thus, a straightforward approach to locate brain regions containing such structures is a conjunction analysis on responses to unimodal stimuli of both modalities (Unimodal 1 ∩ Unimodal 2). This approach has been applied in a PET study demonstrating multisensory convergence zones in the left intraparietal sulcus for spatial attention to vision and touch (Macaluso et al., 2000) and an
fMRI study identifying audiovisual integration sites of motion processing in lateral parietal cortex (Lewis et al., 2000). In our study, the intersection A ∩ V revealed significant responses in candidate areas for audiovisual convergence, such as the posterior thalamus extending into the superior colliculus and the right posterior temporo-occipito-parietal junction (see Fig. 2). However, areas presumably involved in the nonsensory components of the task, such as the dorsolateral prefrontal cortex (working memory) and motor cortex, supplementary motor area and cerebellum (motor responses), also showed significant responses during both unimodal sessions. These findings illustrate that the results provided by the simple intersection A ∩ V cannot separate multimodal convergence zones from unspecific activations attributable to nonsensory components of the task. Therefore, results produced by such a conjunction do not necessarily reflect multisensory convergence zones.
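For t-maps, testing the conjunction null amounts to thresholding the voxelwise minimum of the component statistics (Nichols et al., 2005). The sketch below illustrates this with random arrays standing in for the two contrast maps; map dimensions and the threshold are illustrative.

```python
# Conjunction-null test on two contrast maps: a voxel survives only if
# it is significant in *both* contrasts, i.e. the voxelwise minimum of
# the t-maps exceeds the threshold (Nichols et al., 2005).
import numpy as np

rng = np.random.default_rng(5)
t_av_minus_v = rng.normal(size=(40, 48, 40))   # stand-in for [AV - V]
t_av_minus_a = rng.normal(size=(40, 48, 40))   # stand-in for [AV - A]

t_crit = 3.1                                   # e.g. p < 0.001 uncorrected
conj = np.minimum(t_av_minus_v, t_av_minus_a)
mask = conj > t_crit                           # logical AND of both effects
print("conjunction voxels:", int(mask.sum()))
```

The same minimum-statistic logic applies whether the components are simple unimodal effects (A ∩ V) or differential contrasts such as [AV – V] ∩ [AV – A].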
Fig. 2. Intersection of brain activation during unimodal auditory and visual stimulation (A ∩ V). Brain activations are thresholded at a height threshold of p<0.001 (uncorrected) and corrected for multiple comparisons at cluster level (k>50, p<0.05, corrected).
Furthermore, brain regions responding exclusively to bimodal stimulation, or achieving suprathreshold activations only through supra-additive responses induced by the simultaneous presence of stimuli of two modalities, will be missed, compromising the sensitivity of this approach. Both the lack of specificity for multimodal integration sites and the impaired sensitivity to detect regions responding exclusively to multimodal stimuli can be overcome by investigating brain areas that show a significantly stronger response to bimodal stimuli than to unimodal stimuli of both modalities. This can be achieved by computing the conjunction [Bimodal – Unimodal 1] ∩ [Bimodal – Unimodal 2]. This analytic strategy has been employed by Grefkes et al. (2002) to investigate brain regions subserving crossmodal transfer of visuotactile information and in a comparable way by Calvert et al. (1999) to detect neural structures involved in processing of audiovisual speech. Recently, a more elaborate form of this approach was used by Pourtois et al. (2005) in a PET study on audiovisual integration of emotional information. In this study, the experimental design included two different AV conditions in which subjects were instructed to judge either the information from the visual (AV(judge V)) or the auditory channel (AV(judge A)). The conjunction [AV(judge A) – A(judge A)] ∩ [AV(judge V) – V(judge V)] was computed, which represents a sound way to remove task-related brain responses. However, it should be noted that any conjunction analysis based on a conjunction null hypothesis (Nichols et al., 2005) is a very conservative strategy which only gives an upper bound for the false positive rate (Friston et al., 2005). While such conjunction analyses remain valid even in statistical worst-case scenarios (Nichols et al., 2005), their conservativeness must be paid for by a loss of sensitivity (Friston et al., 2005). This is especially critical if two differential contrasts that are expected to yield small effects are submitted to such an analysis. Accordingly, in none of the studies that employed this approach (Calvert et al., 1999; Grefkes et al., 2002; Pourtois et al., 2005) were brain activations significant when corrected for multiple comparisons across the whole brain. Therefore, it is crucial to increase the sensitivity of such
conjunctions by restricting the search volume to small anatomical regions (small volume correction (SVC); Worsley et al., 1996). Definition of regions of interest in our analysis relied on knowledge inferred from neuroanatomical studies and previous neuroimaging studies and comprised the cortex adjacent to the posterior STS (Jones and Powell, 1970; Seltzer and Pandya, 1978; Beauchamp et al., 2004a, 2004b; van Atteveldt et al., 2004), orbitofrontal cortex (Jones and Powell, 1970; Chavis and Pandya, 1976), insular cortex (Mesulam and Mufson, 1982), claustrum (Pearson et al., 1982; Olson et al., 2002), superior colliculus (Fries, 1984; Calvert et al., 2000), thalamus (Mufson and Mesulam, 1984) and amygdala (Turner et al., 1980; McDonald, 1998; Pitkänen, 2000; Dolan et al., 2001). The conjunction analysis [AV – V] ∩ [AV – A] revealed activation clusters in bilateral posterior STS, right orbitofrontal cortex, bilateral posterior thalamus and right posterior insula/claustrum (Fig. 3). The activation cluster in the left posterior STS was significant (SVC on a 6 mm radius spherical volume of interest centred at x = –50, y = –54, z = 6, a priori coordinates derived from Beauchamp et al., 2004b). This result is in keeping with reports showing stronger responses in the posterior STS cortices during audiovisual presentation of objects (Beauchamp et al., 2004b), letters (van Atteveldt et al., 2004) and speech (Wright et al., 2003; van Atteveldt et al., 2004) than during unimodal presentation of these stimuli. Thus, there is converging evidence implicating posterior STS cortices in integration of audiovisual stimuli irrespective of the type of information conveyed by these stimuli. Furthermore, a recent PET study (Pourtois et al., 2005) demonstrated increased cerebral blood flow in the left middle temporal gyrus during audiovisual presentation of emotional information as compared to isolated presentation of emotional faces or voices. However, it has to be noted that the activation cluster in our study was localized more posteriorly and superiorly than the cluster described by Pourtois et al. (2005). Differences in the imaging modality and task instructions (gender differentiation in the PET study of Pourtois et al., 2005 as compared to rating of emotional information in the fMRI study by Ethofer et al., 2006) might
constitute an explanation for the different localization of the activation clusters within the left temporal lobe in the two studies.

Fig. 3. Conjunction analysis [AV – A] ∩ [AV – V] showing activations in (a) bilateral posterior superior temporal sulcus (x = –54, y = –51, z = 12, Z = 2.90, k = 51 and x = 51, y = –42, z = 12, Z = 2.79, k = 119 for the left and right STS, respectively) and right orbitofrontal cortex (x = 39, y = 24, z = –12, Z = 2.69, k = 64), (b) bilateral posterior thalamus (x = –30, y = –30, z = 0, Z = 2.82, k = 120 and x = 12, y = –24, z = 9, Z = 2.88, k = 76 for the left and right thalamic cluster, respectively) and (c) right posterior insula/claustrum (x = 36, y = –3, z = 6, Z = 2.45, k = 124). Brain activations are thresholded at a height threshold of p<0.05 (uncorrected). (d) Event-related responses to unimodal auditory (red), unimodal visual (blue) and bimodal (magenta) stimuli in the left posterior STS.

Another promising approach employed by Pourtois et al. (2005) to investigate brain regions subserving integration of audiovisual emotional information is to define separate conjunction analyses for specific emotions, such as [AVhappy – Ahappy] ∩ [AVhappy – Vhappy] or [AVfear – Afear] ∩ [AVfear – Vfear]. The results of such an analysis enable inference on the localization of brain regions showing stronger responses if a certain emotion is signalled via two modalities as compared to unimodal presentation of this emotion in either modality. We submitted the fMRI data set reported here to an analogous conjunction analysis. In this analysis, facial expressions were considered to express happiness or fear if they showed at least 50% of the respective emotion. The conjunction [AVhappy – Ahappy] ∩ [AVhappy – Vhappy] revealed stronger responses for bimodal presentation of happiness as compared to responses to unimodal
presentation of either happy voices or happy facial expressions in the right posterior insula/claustrum (x = 39, y = –6, z = –3, Z = 3.66, k = 91, p<0.05 SVC corrected for the right insula). The only brain structure included in our regions of interest showing enhanced responses to audiovisual fear as compared to unimodally presented fear ([AVfear – Afear] ∩ [AVfear – Vfear]) was the right amygdala (x = 27, y = –9, z = –24, Z = 2.00, k = 29). However, this activation failed to reach significance within the search volume comprising the amygdala. In conclusion, the results from the conjunction analyses of our experiment suggest that neocortical areas in the vicinity of the STS might be more generally concerned with integration of audiovisual signals, while phylogenetically older structures, such as the posterior insula or the amygdala, show additive responses if certain emotions are expressed in a congruent way via different sensory channels. However, one limitation of all analyses relying on conjunctions of [AV – V] ∩ [AV – A] is that they have the potential to detect brain regions in
which responses to information from the auditory and visual channels sum up in a linear way, and might therefore simply reflect areas in which neurons responsive to unimodal auditory and unimodal visual information coexist, without the need for multimodal integration in these areas (Calvert, 2001; Calvert and Thesen, 2004).
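To make the logic of this conjunction concrete, the following sketch implements a voxelwise minimum-statistic conjunction of the two contrasts together with a spherical small-volume mask of the kind used for the left posterior STS cluster above. It is a minimal illustration under invented assumptions: the contrast maps are random placeholder arrays, the height threshold is a simple uncorrected cut-off, and a full SPM analysis would instead derive SVC-corrected p-values from random field theory (Worsley et al., 1996).

```python
import numpy as np

def conjunction_min_stat(z_av_minus_v, z_av_minus_a, z_thresh=1.645):
    """Voxelwise minimum-statistic conjunction [AV - V] AND [AV - A].

    A voxel survives only if BOTH contrasts exceed the height threshold,
    i.e. the minimum of the two Z-maps is tested (cf. Nichols et al., 2005).
    """
    z_min = np.minimum(z_av_minus_v, z_av_minus_a)
    return z_min * (z_min > z_thresh)        # zero out sub-threshold voxels

def spherical_svc_mask(shape, centre_vox, radius_vox):
    """Boolean mask for a small spherical search volume (in voxel units)."""
    grid = np.indices(shape)
    dist2 = sum((g - c) ** 2 for g, c in zip(grid, centre_vox))
    return dist2 <= radius_vox ** 2

# Placeholder contrast maps; a 6 mm sphere corresponds to 2 voxels at
# 3 mm isotropic resolution, centred on an a priori STS coordinate.
z_av_v = np.random.randn(40, 48, 40)
z_av_a = np.random.randn(40, 48, 40)
conj = conjunction_min_stat(z_av_v, z_av_a)
mask = spherical_svc_mask(conj.shape, centre_vox=(10, 14, 22), radius_vox=2)
print("suprathreshold voxels in search volume:", np.count_nonzero(conj * mask))
```

The default threshold of 1.645 corresponds to the one-sided p<0.05 (uncorrected) display threshold used in Fig. 3.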
Interaction analyses

Calvert and Thesen (2004) suggested that activations of multisensory integration sites should differ from the arithmetic sum of the respective activations to unimodal stimuli: if the response to a bimodal stimulus exceeds the sum of the unimodal responses [Bimodal > Unimodal 1 + Unimodal 2], this is defined as a positive interaction, while if the summed unimodal responses are greater than the bimodal response [Bimodal < Unimodal 1 + Unimodal 2], this is defined as a negative interaction effect (Calvert et al., 2000, 2001). Usually, the most efficient way to investigate interactions between two factors is a 2 × 2 factorial design. Theoretically, such a 2 × 2 factorial design for investigating interactions between two sensory modalities would include one bimodal and two unimodal conditions, in which the subject judges some aspect of the presented stimuli, and a control condition which contains all components of the other conditions (e.g. working memory, judgement and motor responses), but no sensory stimulation. However, for all paradigms including a behavioural task, it is practically impossible to implement such a control condition since the subject cannot judge a specific aspect (e.g. gender or conveyed emotion) of a stimulus that is not presented. Therefore, all imaging studies investigating interactions between the modalities omitted this control condition and simply compared haemodynamic responses obtained during bimodal stimulation to the sum of the responses of the unimodal conditions: Bimodal – [Unimodal 1 + Unimodal 2]. However, this omission of the control condition has serious consequences for the interpretation of both positive and negative interactions. For example, brain regions involved in nonsensory components of the task (e.g. working memory and
motor responses) showing similar positive responses in both unimodal and bimodal trials will produce a negative interaction effect. Furthermore, brain areas which deactivate in a similar way in all three conditions will produce a positive interaction effect. Positive and negative interactions, as computed by AV – (A+V), are shown in Fig. 4a in red and green, respectively. At first glance, the finding of a positive interaction in the right inferior parietal cortex and the right orbitofrontal cortex is an interesting result. Inspection of the event-related responses during unimodal auditory, unimodal visual and bimodal trials, however, reveals that in both these regions the haemodynamic response decreases in all three conditions. The finding of unspecific deactivations to stimulus presentation in the right inferior parietal cortex is in agreement with the view that this area belongs to the resting state network and is tonically active in the baseline state (Raichle et al., 2001; Fox et al., 2005). Thus, the positive response in this region as calculated by the interaction AV – (A+V) is caused by the fact that the added deactivations during unimodal trials exceed the deactivation during bimodal trials (see Fig. 4b). A more complex behaviour of the haemodynamic response was found in the orbitofrontal cortex, which showed a decrease of the blood oxygen level dependent (BOLD) response with varying delay followed by a positive response, further complicating the interpretation of the positive interaction computed by AV – (A+V). Negative interactions were found in dorsolateral prefrontal areas, motor cortex and cerebellum. Event-related responses in the right dorsolateral prefrontal cortex (see Fig. 4d), however, show that the negative interaction in this region is due to very similar responses in all three conditions. The vulnerability of the interaction AV – (A+V) to unspecific activations attributable to the behavioural task (resulting in negative interactions) and to unspecific deactivations of resting state network components (producing positive interactions) demands caution in the application of this technique. Therefore, we suggest that interpretation of both positive and negative interactions calculated by this approach should be based on inspection of time series. Calvert et al. (2001) suggested that electrophysiological criteria for investigation of multimodal
Fig. 4. (a) Positive (red) and negative (green) interactions as calculated by AV – (A+V). Brain activations are thresholded at a height threshold of p<0.001 (uncorrected). Event-related responses to unimodal auditory (red), unimodal visual (blue) and bimodal (magenta) stimuli in (b) the right lateral inferior parietal cortex (MNI coordinates: x = 57, y = –66, z = 12, Z = 4.81, k = 36), (c) the right orbitofrontal cortex (MNI coordinates: x = 39, y = 21, z = –21, Z = 5.23, k = 46) and (d) the right dorsolateral prefrontal cortex (MNI coordinates: x = 45, y = 6, z = 21, Z = 4.71, k = 229).
integration should be applied to the BOLD effect. According to these criteria, cells subserving multimodal integration show responses to congruent information obtained via several modalities that exceed the sum of responses to the respective unimodal stimuli (supra-additivity: Bimodal(congruent) > Unimodal 1 + Unimodal 2). Furthermore, conflicting multimodal information results in response depression, in which the response to incongruent bimodal information is smaller than the stronger of the two unimodal responses (response depression: Bimodal(incongruent) < Maximum(Unimodal 1, Unimodal 2)). Calvert et al. (2001) demonstrated that BOLD responses within the superior colliculi fulfil these criteria, showing supra-additive responses to audiovisual nonspeech stimuli if they are presented in temporal synchrony and corresponding response depressions if the audiovisual stimuli are presented in an asynchronous manner. We investigated whether one of our regions of interest shows a comparable behaviour if congruence of audiovisual information is defined
by emotional content conveyed via affective prosody and facial expressions. To this end, we compared responses to audiovisual trials with congruent emotional information (i.e. the facial expression showed at least 50% of the emotion expressed via emotional prosody) to the sum of haemodynamic responses to unimodal visual and auditory trials (AV(congruent) – (A+V), see Fig. 5a). Obviously, this analysis technique is burdened with the same drawbacks as the simple interaction AV – (A+V) and, not surprisingly, positive interactions driven by similar decreases of the BOLD response in all three conditions were found in the right lateral inferior parietal and orbitofrontal cortex. In addition, significant supra-additive responses were also found in the right posterior insula/claustrum (see Fig. 5b). Inspection of event-related responses demonstrates that the interaction found in this region is of a completely different nature than those found in the parietal and orbitofrontal cortex, showing robust activation to congruent audiovisual trials and slightly negative responses to unimodal stimulation in either modality (see Fig. 5b). To investigate whether the responses of this region fulfil the criteria of response depression to conflicting emotional content as conveyed by voice and face, we plotted event-related responses to bimodal trials with incongruent emotional information
(see Fig. 5b). No evidence for a depression of responses below the level of unimodal responses was found. Instead, the results of this analysis suggest that the right posterior insula/claustrum also responds, albeit to a lesser extent, to audiovisual information conveying conflicting emotional information. The observation that this region shows the strongest haemodynamic responses if auditory and visual information are simultaneously available is concordant with results obtained from single-cell recordings of the posterior insular cortex (Loe and Benevento, 1969). The finding that activation in this region is stronger during congruent than during conflicting audiovisual emotional information is in keeping with observations from an fMRI experiment on synchronized and desynchronized audiovisual speech (Olson et al., 2002). In this experiment, audiovisual stimuli presented in temporal synchrony resulted in stronger haemodynamic responses in the left claustrum than asynchronously presented stimuli. Olson et al. (2002) reasoned that audiovisual integration might be better explained by a "communication relay" model, in which subcortical areas, such as the claustrum, receive their information directly from unimodal cortices, than by a "site-specific" model assuming integration in multisensory brain regions, such as the posterior STS. We argue that increased haemodynamic responses as found for temporally congruent information in
Fig. 5. (a) Positive interaction in the right posterior insula/claustrum (MNI coordinates: x = 39, y = –6, z = –3, Z = 4.05, k = 14, p<0.05, SVC corrected) as calculated by AV(congruent) – (A+V). Brain activations are thresholded at a height threshold of p<0.001 (uncorrected). (b) Event-related responses to unimodal auditory (red), unimodal visual (blue) and bimodal trials with congruent emotional information (magenta) and incongruent information (cyan) have been plotted.
the study of Olson et al. (2002) and emotionally congruent information in our study might reflect a more general role of the claustrum in determining whether multisensory information matches or not. A possible problem with the application of neurophysiological criteria to analyses of the BOLD signal arises from the fact that the signal acquired in fMRI is caused by haemodynamic phenomena. While supra-additivity is a well-known electrophysiological characteristic of multimodal cells (Stein and Meredith, 1993), there is reason to doubt that supra-additive neuronal responses must necessarily translate into supra-additive BOLD responses. There is compelling evidence that the BOLD response to two stimuli in temporal proximity is overpredicted by simply adding the responses to the two stimuli presented in isolation (Friston et al., 1998; Mechelli et al., 2001). This phenomenon has been named "haemodynamic refractoriness" and is specific to fMRI, since the major part of this nonlinear behaviour arises from the transformation of changes in regional cerebral blood flow to the BOLD response (Friston et al., 1998; Mechelli et al., 2001). The fact that a preceding or simultaneously presented stimulus can attenuate the BOLD response to a second stimulus might compromise the sensitivity of fMRI data analyses in which responses during multimodal integration are expected to exceed the linear sum of the responses to unimodal stimuli. It might therefore be useful to estimate the neuronal response from the BOLD response via a plausible biophysical model (Friston et al., 2000; Gitelman et al., 2003) and search for regions in which the neuronal response exhibits supra-additive effects. While construction of a full 2 × 2 factorial design for investigation of interactions between sensory modalities is compromised by the lack of an appropriate control condition, investigating interactions between emotions expressed via these modalities is not burdened with this problem. Furthermore, no unimodal trials are required in factorial analyses designed to investigate interactions between emotional information expressed via two different modalities. Instead, such designs include only bimodal trials, with two conditions in which emotional information is presented in a congruent way (e.g. happy face-happy voice (hH) and fearful
face-fearful voice (fF)) and two conditions in which conflicting information is expressed via the two modalities (e.g. happy face-fearful voice (fH) and fearful face-happy voice (hF)). The interaction is then calculated by (hH – fH) – (hF – fF). For interpretation of results calculated by this interaction term, it is worth noting that the interaction of emotional information is mathematically equivalent to contrasting congruent with incongruent conditions: (hH + fF) – (fH + hF). This factorial design was used by Dolan et al. (2001) to investigate interactions between visually and auditorily presented fear and happiness. In this study, participants were instructed to categorize the presented facial expression as either fearful or happy, while ignoring the simultaneously presented emotional voice. Behaviourally, the authors found a facilitation of emotion categorization as indicated by shortened response latencies for congruent as compared to incongruent audiovisual trials. On a cerebral level, a significantly stronger activation of the left basolateral amygdala as calculated by (hH – fH) – (hF – fF) was found. Inspection of parameter estimates of the four conditions revealed that this interaction was mainly driven by an augmentation of haemodynamic responses during the fear congruency condition (fF). On the basis of their results, Dolan et al. (2001) concluded that it is the left amygdala that subserves crossmodal integration in fear processing. This interpretation is in line with observations from neuropsychological studies indicating that lesions of the amygdala can impair recognition of both fearful faces (Adolphs et al., 1994) and fearful voices (Scott et al., 1997), as well as with neuroimaging studies of healthy subjects demonstrating enhanced activation to fear signalled via the face (Breiter et al., 1996; Morris et al., 1996) and the voice (Phillips et al., 1997). So far, little is known about the neural substrates underlying integration of audiovisual emotional information other than fear. The factorial design employed by Dolan et al. (2001) has the potential to clarify whether audiovisual integration of other basic emotions also occurs via the amygdala or is mediated by different neuroanatomical structures. However, we feel that the interpretability of the results provided by such factorial designs could be improved by using
neutral facial expressions (N) and intonations (n) in combination with facial expressions (E) and intonations (e) of one emotion and then calculating the interaction accordingly (e.g. (eE – nE) – (eN – nN)).
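The contrast weights for such an emotional interaction follow directly from the condition coding. The sketch below (plain numpy; the parameter estimates are invented for illustration) spells out the equivalence noted above between the interaction term (hH – fH) – (hF – fF) and the congruent-minus-incongruent contrast (hH + fF) – (fH + hF):

```python
import numpy as np

# Parameter estimates (betas) ordered as [hH, fH, hF, fF]; lowercase
# letters code the voice, uppercase the face (h/H = happy, f/F = fearful).
hH, fH, hF, fF = np.eye(4)

# Interaction of the 2 x 2 factorial design: (hH - fH) - (hF - fF)
c_interaction = (hH - fH) - (hF - fF)

# Congruent minus incongruent conditions: (hH + fF) - (fH + hF)
c_congruency = (hH + fF) - (fH + hF)

# Both formulations yield the identical contrast vector [1, -1, -1, 1].
assert np.array_equal(c_interaction, c_congruency)

betas = np.array([0.8, 0.3, 0.2, 0.9])   # hypothetical estimates at one voxel
print("interaction effect:", c_interaction @ betas)
```

The variant suggested above, (eE – nE) – (eN – nN), uses the same weights applied to the conditions ordered [eE, nE, eN, nN].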
Correlation analyses between brain activation and crossmodal behavioural effects

On a behavioural level, crossmodal integration results in shortened response latencies to congruent bimodal information (Miller, 1982; Schröger and Widmann, 1998; de Gelder and Vroomen, 2000). Furthermore, judgement of sensory information from one modality can be influenced by information obtained from another modality (McGurk and MacDonald, 1976). To investigate which brain regions mediate the McGurk illusion, Jones and Callan (2003) employed the McGurk paradigm in an fMRI experiment and correlated haemodynamic responses with the influence of visual cues on judgement of auditory information. Activity in the occipitotemporal junction was found to correlate with the strength of the McGurk illusion, suggesting that modulation of responses within this region might constitute the neural substrate of the McGurk effect (Jones and Callan, 2003). Crossmodal biases also occur in perception of emotional information (Massaro and Egan,
1996; de Gelder and Vroomen, 2000), and it has been demonstrated that fearful or neutral faces are perceived as more fearful when accompanied by a fearful voice (see Fig. 6a; Ethofer et al., 2006). To examine which brain regions mediate this shift in judgement of facial affect, we correlated the difference in brain responses to facial expressions in the presence and absence of a fearful voice with the difference in the subjects' valence ratings of the facial expressions in both conditions. A significant correlation was found in the left basolateral amygdala extending into the periamygdaloid cortex (see Fig. 6b). This finding indicates that cognitive evaluation of emotional information signalling threat or danger is modulated by amygdalar responses and is in agreement with the view that the amygdala has a key integrative role in processing of emotional content, particularly when fear is expressed across sensory channels (Dolan et al., 2001). Response-related correlation analyses between brain activation and crossmodal behavioural effects represent a useful approach to model systems associated with the behavioural outcome of multisensory integration; a schematic illustration of this approach is given after the caption of Fig. 6 below.
Fig. 6. (a) Valence ratings of facial expressions ranging from 0% to 100% fear in the presence (red) and absence (blue) of a fearfully spoken sentence. (b) Correlation analysis between the crossmodal impact of fearful voices on judgement of faces and the BOLD response revealed a cluster within the left amygdala (MNI coordinates: x = –24, y = –6, z = –24, Z = 3.84, k = 42, p<0.05, SVC). (c) Event-related responses of the left basolateral amygdala for trials in which faces were judged as being more fearful when presented together with a fearful voice (red) and trials where no such shift in interpretation occurred (blue). Modified from Ethofer et al. (2006).
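As a toy stand-in for the voxelwise analysis just described, the following fragment correlates a per-subject difference in BOLD response (faces rated in the presence vs. absence of a fearful voice) with the corresponding shift in fear ratings. All numbers are simulated; in the actual analysis the behavioural difference would enter the statistical model as a covariate on the response difference images rather than being tested with a simple Pearson correlation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated values for 12 subjects: difference in the BOLD response to
# faces presented with vs. without a fearful voice (e.g. extracted from
# an amygdala region of interest) ...
bold_diff = rng.normal(0.5, 0.3, size=12)
# ... and each subject's shift in fear ratings of the same faces.
rating_shift = 0.8 * bold_diff + rng.normal(0.0, 0.2, size=12)

# A region mediating the judgement shift should show a positive
# relationship between the neural and the behavioural difference.
r, p = stats.pearsonr(bold_diff, rating_shift)
print(f"r = {r:.2f}, p = {p:.3f}")
```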
Connectivity analyses

All analysis techniques discussed so far are concerned with the localization of brain areas that mediate a certain cognitive process of interest. Recent developments (Friston et al., 2003; Gitelman et al., 2003) in modeling effective connectivity between brain regions (i.e. the influence one neural system exerts over another; Friston et al., 1997) offer the opportunity to investigate neural interactions between sensory systems. In an fMRI study on the neural substrates of speaker recognition, von Kriegstein et al. (2005) found that familiar voices activate the fusiform face area (FFA; Kanwisher et al., 1997). A psychophysiological interaction (PPI) analysis (Friston et al., 1997) with FFA activity as the physiological and familiarity of the voices as the psychological factor revealed that this crossmodal activation of face-sensitive areas by an auditory stimulus was driven by activity of voice-sensitive areas in the middle part of the STS. On the basis of these results, von Kriegstein et al. (2005) suggested that assessment of person familiarity does not necessarily engage a "supramodal cortical substrate" but can be the result of a "supramodal process" constituting an enhanced functional coupling of face- and voice-sensitive cortices. In our study, significantly stronger activations were found in the right FFA (MNI coordinates: x = 24, y = –69, z = –15, Z = 3.91, k = 46) during judgement of facial expressions in the presence of fearful as compared to happy voices (Ethofer et al., 2006). To elucidate which cortical areas mediate this crossmodal effect of a fearful voice on processing within the fusiform face area, a PPI analysis with activity of the right FFA as the physiological and the emotion expressed via affective prosody (fear or happiness) as the psychological factor was carried out. Increased effective connectivity between the right FFA and the left basolateral amygdala/periamygdaloid cortex (MNI coordinates: x = –18, y = –12, z = –30, Z = 2.68, k = 5, p<0.05 small volume corrected for the amygdalar cluster in which activity was correlated with behavioural responses, see above) was found during rating of facial expressions in the presence of a fearful as compared to a happy voice. Even at low thresholds (p<0.05, uncorrected), no evidence for modulation of responses in the right FFA by activity in STS regions was found, suggesting that the crossmodal effect of emotional auditory information on FFA responses is not mediated by direct coupling
between voice- and face-sensitive cortices, as described previously for speaker identification (von Kriegstein et al., 2005), but rather via supramodal relay areas, such as the amygdala. The amygdala is anatomically well positioned to provide such a supramodal relay function (Murray and Mishkin, 1985) since it is bidirectionally connected with visual and auditory higher-order cortices (Pitkänen, 2000). Furthermore, augmented effective connectivity between amygdala and fusiform cortex in the context of a fearful voice is in agreement with data obtained from lesion (Adolphs et al., 1994; Scott et al., 1997) and neuroimaging studies (Breiter et al., 1996; Morris et al., 1996; Phillips et al., 1997) implicating the amygdala in fear processing, and with results from a PET experiment suggesting that the amygdala exerts top-down control on neural activity in extrastriate cortex (Morris et al., 1998). Although anatomically segregated, voice- and face-processing modules have to interact to form a unified percept of the emotional information expressed via different sensory channels. Analyses of effective connectivity have the potential to investigate the interaction between these modules and could elucidate whether integration is achieved via supramodal nodes or by direct coupling of face- and voice-sensitive cortices.
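The core of such a PPI model can be sketched in a few lines. In the fragment below (plain numpy; the seed time course, block design and dimensions are all invented), the interaction regressor is the product of the mean-centred seed signal and the psychological factor. A rigorous implementation would first deconvolve the seed time course to the neuronal level and convolve the resulting product with the haemodynamic response function (Gitelman et al., 2003).

```python
import numpy as np

def ppi_regressor(seed_ts, psych_factor):
    """Simplified psychophysiological interaction term.

    seed_ts:      BOLD time course of the seed region (here: right FFA)
    psych_factor: +1 during fearful-voice trials, -1 during happy-voice
                  trials; the element-wise product below captures only
                  the conceptual core of a PPI analysis.
    """
    seed = seed_ts - seed_ts.mean()   # mean-centre the physiological factor
    return seed * psych_factor

# Hypothetical design: 200 scans, alternating fear/happy blocks of 20 scans.
n = 200
seed_ts = np.random.randn(n)                          # placeholder FFA signal
psych = np.where((np.arange(n) // 20) % 2 == 0, 1.0, -1.0)

X = np.column_stack([ppi_regressor(seed_ts, psych),   # interaction term
                     seed_ts,                         # physiological main effect
                     psych,                           # psychological main effect
                     np.ones(n)])                     # constant
# Regressing a target voxel's time course on X tests, via the weight on
# the first column, whether its coupling with the seed depends on the
# emotion conveyed by the voice.
```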
Conclusion

Integration of information from different sensory channels is a complex phenomenon and recruits several cerebral structures. Application of different types of analyses aimed at identifying multisensory integration sites to the fMRI data set presented here revealed the cortex in the left posterior STS, the right posterior insula/claustrum and the left amygdala as being implicated in audiovisual integration. The left posterior STS responded significantly more strongly to bimodal stimuli than to isolated presentation of either faces or voices, as determined by the conjunction [AV – V] ∩ [AV – A]. Notably, the left posterior STS did not show supra-additive BOLD responses as determined by the comparison of bimodal to the sum of unimodal responses (AV – (A+V)). This lack of supra-additivity in
posterior STS regions in our study is in agreement with results obtained in previous fMRI experiments on audiovisual integration (Beauchamp et al., 2004a, b; van Atteveldt et al., 2004). On the basis of the observation that the BOLD response in the posterior STS to audiovisual stimuli does not exceed the sum of responses to the respective unimodal stimuli, as would be expected from single-cell recordings (Bruce et al., 1981; Hikosaka et al., 1988; Barraclough et al., 2005), Beauchamp (2005) challenged the concept of supra-additivity as an appropriate criterion for the definition of multisensory regions in neuroimaging. The "haemodynamic refractoriness" (Friston et al., 1998) of BOLD responses to temporally proximate stimuli might constitute a possible explanation for this discrepancy between fMRI and electrophysiological data. The right posterior insula/claustrum responded more strongly to congruent audiovisual emotional information than to unimodal information in either modality or to incongruent audiovisual information. This stronger responsiveness to congruent than to conflicting audiovisual information is in line with previous reports on the claustrum showing stronger responses to synchronized than to desynchronized audiovisual speech (Olson et al., 2002). These converging findings obtained across domains (the temporal domain in the study of Olson et al. (2002) and the emotional domain in our study) suggest a more general role of the claustrum in processes that determine whether information gathered across different channels matches or not. Future studies relying on analyses of effective connectivity should address the question of whether the claustrum receives its information directly from unimodal cortices, as suggested by the "communication relay" model, or via multimodal cortices in the posterior STS. Activation in the left amygdala was found to correlate with changes in the rating of emotional facial expressions induced by a simultaneously presented fearful voice (Ethofer et al., 2006). Correlation of amygdalar activity with behavioural effects suggests that the amygdala modulates cognitive judgements. This finding is consistent with previous suggestions implicating the amygdala in integration of emotional information obtained from different modalities, particularly if this information signals
threat or danger (Dolan et al., 2001). The right fusiform cortex showed stronger responses when facial expressions were rated in the presence of a fearful as compared to a happy voice (Ethofer et al., 2006). A psychophysiological interaction analysis revealed enhanced effective connectivity between the left amygdala and the right fusiform cortex, providing a possible neural basis for the observed behavioural effects. The aim of this chapter was to review the different methodological approaches to model multisensory integration in neuroimaging, including conjunction analyses, interaction analyses, correlation analyses between brain responses and behavioural effects, and connectivity analyses. None of these approaches can be considered the optimal method for clarifying which brain structures participate in multisensory integration. Rather, we would like to emphasize that each of these analyses elucidates different aspects of the interplay of brain regions in integrational processes and thus provides complementary information.

Abbreviations

A      auditory
AV     audiovisual
BOLD   blood oxygen level dependent
eE     bimodal trial with emotional voice and emotional face
eN     bimodal trial with emotional voice and neutral face
ERP    event-related potentials
fF     bimodal trial with fearful voice and fearful face
FFA    fusiform face area
fH     bimodal trial with fearful voice and happy face
fMRI   functional magnetic resonance imaging
H0     null hypothesis
hF     bimodal trial with happy voice and fearful face
hH     bimodal trial with happy voice and happy face
MNI    Montreal Neurological Institute
nE     bimodal trial with neutral voice and emotional face
nN     bimodal trial with neutral voice and neutral face
PET    positron emission tomography
PPI    psychophysiological interaction
SAM    self-assessment manikin
SPM    statistical parametric mapping
STS    superior temporal sulcus
SVC    small volume correction
V      visual
Acknowledgments

This study was supported by the Deutsche Forschungsgemeinschaft (SFB 550) and by the Junior Science Programme of the Heidelberg Academy of Sciences and Humanities.

References

Adolphs, R., Tranel, D., Damasio, H. and Damasio, A. (1994) Impaired recognition of emotion in facial expressions following bilateral damage to the human amygdala. Nature, 372: 669–672.
Barraclough, N.F., Xiao, D., Baker, C.I., Oram, M.W. and Perrett, D.I. (2005) Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J. Cogn. Neurosci., 17: 377–391.
Beauchamp, M.S. (2005) Statistical criteria in fMRI studies of multisensory integration. Neuroinformatics, 3: 93–113.
Beauchamp, M.S., Argall, B.D., Bodurka, J., Duyn, J.H. and Martin, A. (2004a) Unravelling multisensory integration: patchy organization within human STS multisensory cortex. Nat. Neurosci., 7: 1190–1192.
Beauchamp, M.S., Lee, K.E., Argall, B.D. and Martin, A. (2004b) Integration of auditory and visual information about objects in superior temporal sulcus. Neuron, 41: 809–823.
Bradley, M.M. and Lang, P.J. (1994) Measuring emotion: the self-assessment manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry, 25: 49–59.
Breiter, H.C., Etcoff, N.L., Whalen, P.J., Kennedy, W.A., Rauch, S.L., Buckner, R.L., Strauss, M.M., Hyman, S.E. and Rosen, B.R. (1996) Response and habituation of the human amygdala during visual processing of facial expression. Neuron, 17: 875–887.
Bruce, C.J., Desimone, R. and Gross, C.G. (1981) Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J. Neurophysiol., 46: 369–384.
Calvert, G.A. (2001) Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb. Cortex, 11: 1110–1123.
Calvert, G.A., Brammer, M.J., Bullmore, E.T., Campbell, R., Iversen, S.D. and David, A.S. (1999) Response amplification in sensory-specific cortices during crossmodal binding. NeuroReport, 10: 2619–2623.
Calvert, G.A., Campbell, R. and Brammer, M. (2000) Evidence from functional magnetic resonance imaging of crossmodal binding in human heteromodal cortex. Curr. Biol., 10: 649–657.
Calvert, G.A., Hansen, P.C., Iversen, S.D. and Brammer, M.J. (2001) Detection of audio-visual integration in humans by application of electrophysiological criteria to the BOLD effect. NeuroImage, 14: 427–438.
Calvert, G.A. and Thesen, T. (2004) Multisensory integration: methodological approaches and emerging principles in the human brain. J. Physiol. Paris, 98: 191–205.
Chavis, D.A. and Pandya, D.N. (1976) Further observations on corticofrontal connections in the rhesus monkey. Brain Res., 117: 369–386.
Collins, D.L., Neelin, P., Peters, T.M. and Evans, A.C. (1994) Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. J. Comput. Assist. Tomogr., 18: 192–205.
Damasio, A.R. (1989) Time-locked multiregional retroactivation: a systems-level proposal for the neural substrates of recall and recognition. Cognition, 33: 25–62.
de Gelder, B., Böcker, K.B.E., Tuomainen, J., Hensen, M. and Vroomen, J. (1999) The combined perception of emotion from voice and face: early interaction revealed by human electric brain responses. Neurosci. Lett., 260: 133–136.
de Gelder, B. and Vroomen, J. (2000) The perception of emotion by ear and eye. Cognit. Emotion, 14: 289–312.
Dolan, R.J., Morris, J. and de Gelder, B. (2001) Crossmodal binding of fear in voice and face. Proc. Natl. Acad. Sci., 98: 10006–10010.
Driver, J. and Spence, C. (2000) Multisensory perception: beyond modularity and convergence. Curr. Biol., 10: 731–735.
Ekman, P. and Friesen, W.V. (1976) Pictures of Facial Affect. Consulting Psychologists Press, Palo Alto.
Ethofer, T., Anders, S., Erb, M., Droll, C., Royen, L., Saur, R., Reiterer, S., Grodd, W. and Wildgruber, D. (2006) Impact of voice on emotional judgment of faces: an event-related fMRI study. Hum. Brain Mapp., in press. DOI: 10.1002/hbm.20212.
Fallon, J.H., Benevento, L.A. and Loe, P.R. (1978) Frequency-dependent inhibition to tones in neurons of cat insular cortex (AIV). Brain Res., 779: 314–319.
Fox, M.D., Snyder, A.Z., Vincent, J.L., Corbetta, M., Van Essen, D.C. and Raichle, M.E. (2005) The human brain is intrinsically organized into dynamic anticorrelated functional networks. Proc. Natl. Acad. Sci., 102: 9673–9678.
Fries, W. (1984) Cortical projections to the superior colliculus in the macaque monkey: a retrograde study using horseradish peroxidase. J. Comp. Neurol., 230: 55–76.
Friston, K.J., Buechel, C., Fink, G.R., Morris, J., Rolls, E. and Dolan, R.J. (1997) Psychophysiological and modulatory interactions in neuroimaging. NeuroImage, 6: 218–229.
Friston, K.J., Harrison, L. and Penny, W. (2003) Dynamic causal modelling. NeuroImage, 19: 1273–1302.
Friston, K.J., Holmes, A.P., Price, C.J., Buechel, C. and Worsley, K.J. (1999) Multisubject fMRI studies and conjunction analyses. NeuroImage, 10: 385–396.
Friston, K.J., Josephs, O., Rees, G. and Turner, R. (1998) Nonlinear event-related responses in fMRI. Magn. Reson. Med., 39: 41–52.
Friston, K.J., Mechelli, A., Turner, R. and Price, C.J. (2000) Nonlinear responses in fMRI: the balloon model, Volterra kernels, and other hemodynamics. NeuroImage, 12: 466–477.
Friston, K.J., Penny, W. and Glaser, D.E. (2005) Conjunction revisited. NeuroImage, 25: 661–667.
Gitelman, D.R., Penny, W.D., Ashburner, J. and Friston, K.J. (2003) Modelling regional and psychophysiologic interactions in fMRI: the importance of hemodynamic deconvolution. NeuroImage, 19: 200–207.
Goldman-Rakic, P. (1995) Architecture of the prefrontal cortex and the central executive. In: Grafman, J., Holyoak, K. and Boller, F. (Eds.), Structure and Functions of the Human Prefrontal Cortex. The New York Academy of Sciences, NY, USA, pp. 71–83.
Gordon, B.G. (1973) Receptive fields in the deep layers of the cat superior colliculus. J. Neurophysiol., 36: 157–178.
Grefkes, C., Weiss, P.H., Zilles, K. and Fink, G.R. (2002) Crossmodal processing of object features in human anterior intraparietal cortex: an fMRI study implies equivalencies between humans and monkeys. Neuron, 35: 173–184.
Hikosaka, K., Iwai, E., Saito, H. and Tanaka, K. (1988) Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. J. Neurophysiol., 60: 1615–1637.
Jones, E.G. and Powell, T.P.S. (1970) An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain, 93: 793–820.
Jones, J.A. and Callan, D.E. (2003) Brain activity during audiovisual speech perception: an fMRI study of the McGurk effect. NeuroReport, 14: 1129–1133.
Kanwisher, N., McDermott, J. and Chun, M.M. (1997) The fusiform face area: a module in human extrastriate cortex specialized for face processing. J. Neurosci., 17: 4302–4311.
Lewis, J.W., Beauchamp, M.S. and DeYoe, E.A. (2000) A comparison of visual and auditory motion processing in human cerebral cortex. Cereb. Cortex, 10: 873–888.
Loe, P.R. and Benevento, L.A. (1969) Auditory–visual interaction in single units in the orbito-insular cortex of the cat. Electroencephalogr. Clin. Neurophysiol., 26: 395–398.
Macaluso, E., Frith, C.D. and Driver, J. (2000) Selective spatial attention in vision and touch: unimodal and multimodal mechanisms revealed by PET. J. Neurophysiol., 83: 3062–3075.
Massaro, D.W. and Egan, J.W. (1996) Perceptual recognition of facial affect: cross-cultural comparisons. Mem. Cogn., 24: 812–822.
McDonald, A.J. (1998) Cortical pathways to the mammalian amygdala. Prog. Neurobiol., 55: 257–332.
McGurk, H. and MacDonald, J. (1976) Hearing lips and seeing voices. Nature, 264: 746–748.
Mechelli, A., Price, C.J. and Friston, K.J. (2001) Nonlinear coupling between evoked rCBF and BOLD signals: a simulation study on hemodynamic responses. NeuroImage, 14: 862–872.
Meredith, M.A. and Stein, B.E. (1983) Interactions among converging sensory inputs in the superior colliculus. Science, 221: 389–391.
Mesulam, M.M. (1998) From sensation to cognition. Brain, 121: 1013–1052.
Mesulam, M.M. and Mufson, E.J. (1982) Insula of the old world monkey. III: Efferent cortical output and comments on function. J. Comp. Neurol., 212: 38–52.
Miller, J.O. (1982) Divided attention: evidence for coactivation with redundant signals. Cogn. Psychol., 14: 247–279.
Miller, J.O. (1986) Time course of coactivation in bimodal divided attention. Percept. Psychophys., 40: 331–343.
Morris, J.S., Friston, K.J., Buechel, C., Frith, C.D., Young, A.W., Calder, A.J. and Dolan, R.J. (1998) A neuromodulatory role for the human amygdala in processing emotional facial expressions. Brain, 121: 47–57.
Morris, J.S., Frith, C.D., Perrett, D.I., Rowland, D., Young, A.W., Calder, A.J. and Dolan, R.J. (1996) A differential response in the human amygdala to fearful and happy facial expressions. Nature, 383: 812–815.
Mufson, E.J. and Mesulam, M.M. (1984) Thalamic connections of the insula in the rhesus monkey and comments on the paralimbic connectivity of the medial pulvinar nucleus. J. Comp. Neurol., 227: 109–120.
Murray, E.A. and Mishkin, M. (1985) Amygdalectomy impairs crossmodal association in monkeys. Science, 228: 604–606.
Nichols, T., Brett, M., Andersson, J., Wager, T. and Poline, J.B. (2005) Valid conjunction inference with the minimum statistic. NeuroImage, 25: 653–660.
Olson, I.R., Gatenby, J.C. and Gore, J.C. (2002) A comparison of bound and unbound audio–visual information processing in the human cerebral cortex. Brain Res. Cogn. Brain Res., 14: 129–138.
Pearson, R.C., Brodal, P., Gatter, K.C. and Powell, T.P. (1982) The organisation of the connections between the cortex and the claustrum in the monkey. Brain Res., 234: 435–441.
Peck, C.K. (1987) Auditory interactions in cat's superior colliculus: their role in the control of gaze. Brain Res., 420: 162–166.
Phillips, M.L., Young, A.W., Scott, S.K., Calder, A.J., Andrew, C., Giampietro, V., Williams, S.C., Bullmore, E.T., Brammer, M. and Gray, J.A. (1997) Neural responses to facial and vocal expressions of fear and disgust. Proc. R. Soc. Lond. Ser. B, 265: 1809–1817.
Pitkänen, A. (2000) Connectivity of the rat amygdaloid complex. In: The Amygdala: A Functional Analysis. Oxford University Press, New York.
Pourtois, G., Debatisse, D., Despland, P.A. and de Gelder, B. (2002) Facial expressions modulate the time course of long latency auditory potentials. Cogn. Brain Res., 14: 99–105.
Pourtois, G., de Gelder, B., Bol, A. and Crommelinck, M. (2005) Perception of facial expression and voices and their combination in the human brain. Cortex, 41: 49–59.
Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B. and Crommelinck, M. (2000) The time-course of intermodal binding between seeing and hearing affective information. NeuroReport, 11: 1329–1333.
Price, C.J. and Friston, K.J. (1997) Cognitive conjunctions: a new approach to brain activation experiments. NeuroImage, 5: 261–270.
Raichle, M.E., MacLeod, A.M., Snyder, A.Z., Powers, W.J., Gusnard, D.A. and Shulman, G.L. (2001) A default mode of brain function. Proc. Natl. Acad. Sci., 98: 676–682.
Schröger, E. and Widmann, A. (1998) Speeded responses to audiovisual signal changes result from bimodal integration. Psychophysiology, 35: 755–759.
Scott, S.K., Young, A.W., Calder, A.J., Hellawell, D.J., Aggleton, J.P. and Johnson, M. (1997) Impaired auditory recognition of fear and anger following bilateral amygdala lesions. Nature, 385: 254–257.
Seltzer, B. and Pandya, D.N. (1978) Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res., 149: 1–24.
Stein, B.E., London, N., Wilkinson, L.K. and Price, D.D. (1996) Enhancement of perceived visual intensity by auditory stimuli: a psychophysical analysis. J. Cogn. Neurosci., 8: 497–506.
Stein, B.E. and Meredith, M.A. (1993) The Merging of the Senses. MIT Press, Cambridge.
Stein, B.E. and Wallace, M.T. (1996) Comparison of crossmodal integration in midbrain and cortex. Prog. Brain Res., 112: 289–299.
Turner, B.H., Mishkin, M. and Knapp, M. (1980) Organization of the amygdalopetal projections from modality-specific cortical association areas. J. Comp. Neurol., 191: 515–543.
van Atteveldt, N., Formisano, E., Goebel, R. and Blomert, L. (2004) Integration of letters and speech sounds in the human brain. Neuron, 43: 271–282.
von Kriegstein, K., Kleinschmidt, A., Sterzer, P. and Giraud, A.-L. (2005) Interaction of face and voice areas during speaker recognition. J. Cogn. Neurosci., 17: 367–376.
Vroomen, J., Driver, J. and de Gelder, B. (2001) Is cross-modal integration of emotional expressions independent of attentional resources? Cogn. Affect. Behav. Neurosci., 1: 382–387.
Wallace, M.T., Meredith, M.A. and Stein, B.E. (1992) Integration of multiple sensory modalities in cat cortex. Exp. Brain Res., 91: 484–488.
Wallace, M.T., Meredith, M.A. and Stein, B.E. (1993) Converging influences from visual, auditory, and somatosensory cortices onto output neurons of the superior colliculus. J. Neurophysiol., 69: 1797–1809.
Wallace, M.T., Wilkinson, L.K. and Stein, B.E. (1996) Representation and integration of multiple sensory inputs in primate superior colliculus. J. Neurophysiol., 76: 1246–1266.
White, M. (1999) Representation of facial expressions of emotion. Am. J. Psychol., 112: 371–381.
Worsley, K.J., Marrett, S., Neelin, P., Vandal, A.C., Friston, K.J. and Evans, A.C. (1996) A unified statistical approach for determining significant signals in images of cerebral activation. Hum. Brain Mapp., 4: 58–73.
Wright, T.M., Pelphrey, K.A., Allison, T., McKeown, M.J. and McCarthy, G. (2003) Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cereb. Cortex, 13: 34–43.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.)
Progress in Brain Research, Vol. 156
ISSN 0079-6123
Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 20
Role of the amygdala in processing visual social stimuli

Ralph Adolphs and Michael Spezio

Division of the Humanities and Social Sciences, HSS 228-77, California Institute of Technology, Pasadena, CA 91125, USA
Abstract: We review the evidence implicating the amygdala as a critical component of a neural network of social cognition, drawing especially on research involving the processing of faces and other visual social stimuli. We argue that, although it is clear that social behavioral representations are not stored in the amygdala, the most parsimonious interpretation of the data is that the amygdala plays a role in guiding social behaviors on the basis of socioenvironmental context. Thus, it appears to be required for normal social cognition. We propose that the amygdala plays this role by attentionally modulating several areas of visual and somatosensory cortex that have been implicated in social cognition, and in helping to direct overt visuospatial attention in face gaze. We also hypothesize that the amygdala exerts attentional modulation of simulation in somatosensory cortices such as supramarginal gyrus and insula. Finally, we argue that the term emotion be broadened to include increased attention to bodily responses and their representation in cortex.

Keywords: amygdala; face processing; simulation; lesion studies; social cognition; emotion

Corresponding author. Tel.: +1-626-395-4486; Fax: +1-626-793-8580; E-mail: [email protected]

DOI: 10.1016/S0079-6123(06)56020-0

Introduction

The amygdala has long been implicated in primate social cognition and behavior, due primarily to the well-known work by Kluver and Bucy (1939) and the studies by Kling and colleagues (Dicks et al., 1968; Kling, 1968, 1974; Kling et al., 1970, 1979; Brothers et al., 1990). An influential view of the amygdala emerging from early studies of its function was that it acts as a generative locus of social cognition and behavior, required to link the perception of any stimuli to information about their value to an organism (Weiskrantz, 1956). One interpretation of this view is that the amygdala is a primary source of social behavior, and the lack of a functioning amygdala would be expected to severely limit a primate's range of social responses, perhaps going so far as to eliminate some part or all of the social repertoire altogether. More recent findings challenge the view that the amygdala is required for basic social behaviors. Yet the question remains open whether the amygdala is a required component for normal social cognition. For example, is the amygdala necessary for the normal information processing associated with an organism's evaluation of a visual social stimulus, such as a facial expression (on which subsequent behaviors could then be based)? We will see that an answer to this question depends on a new consideration of evidence for the amygdala's role. The view we will present takes into consideration evidence regarding the amygdala's role in modulating autonomic arousal, new evidence regarding the amygdala's potential to affect visuospatial and visual object-based attention, and recent accounts that explain
social cognition in terms of simulation theory. These newer developments have posed something of a puzzle for older theories of the amygdala. We will review some older findings first, and the framework that was based on them. Then we will introduce the new findings and framework, and end by proposing a framework describing the amygdala’s function in recognizing the social value of stimuli.
Shifting views of the amygdala in social cognition

Studies of the primate amygdala began in the 1930s with Kluver and Bucy's well-known experiments in monkeys (Kluver and Bucy, 1937, 1997). Following large bitemporal lesions that encompassed the amygdala, the animals came to exhibit a constellation of impairments in recognizing the emotional and social meaning of stimuli—the so-called "psychic blindness" of Kluver–Bucy syndrome. Notably, the monkeys became exceptionally tame and placid, a behavioral abnormality that has been replicated to some degree in later studies (Emery et al., 2001; Kalin et al., 2001, 2004; Izquierdo et al., 2005). The animals also exhibited a variety of other unusual behaviors, including hypermetamorphosis and hypersexuality, which have not been so reliably replicated. Modern-day studies using selective neurotoxins to lesion the amygdala, sparing surrounding tissues, not surprisingly provide a much more muted and selective picture. Such selective lesions, like the earlier lesions, do result in the lack of a normal "brake" on behavior, and the animals tend to approach objects and situations that normal monkeys would avoid—they are also seldom regarded as dominant by other monkeys (Meunier et al., 1999; Emery et al., 2001). Yet the selective amygdala lesions do not produce monkeys that exhibit the array of unusual behaviors that Kluver and Bucy described. The recent lesion studies in monkeys have also begun to highlight how complex the role of the amygdala in regulating social behavior is likely to be. The consequences of amygdala lesions are quite different depending on the age at which they are made, and infant monkeys with amygdala lesions actually show exaggerated social fear responses
rather than the placidity that tends to be seen in adults (Prather et al., 2001; Bauman et al., 2004b). Furthermore, there are notable differences between (the relatively small number of) different studies of how amygdala lesions in monkeys affect basic social behaviors such as canonical facial expressions, bodily postures (e.g., ‘‘present groom’’), and attachment behaviors. These differences between studies likely reflect effects of additional factors such as lesion methodology and extent, lab- vs. wild rearing, or the exact species used (Bachevalier and Loveland, 2006). For example, there is evidence suggesting that amygdala lesions profoundly impair even basic social behaviors in monkeys (Bachevalier et al., 1999; Meunier et al., 1999; Meunier and Bachevalier, 2002). Amaral and colleagues, however, found no impairment in basic social behaviors following selective neurotoxic amygdala lesions, though they did find impairments in the appropriate deployment of these social behaviors (Bauman et al., 2004a, b). Given this heterogeneity, it has been argued that the amygdala is not required for monkeys to show the full repertoire of social behaviors, because under some circumstances animals with complete bilateral amygdala lesions, nevertheless, can show all the components of emotional and social behaviors that normal monkeys would show (Amaral et al., 2003a), even if they are deployed abnormally (Bauman et al., 2004a). The most parsimonious interpretation of the data thus far is rather that the amygdala plays a role in guiding social behaviors on the basis of the socioenvironmental context with which an animal is faced. It is important to keep in mind that the socioenvironmental context is likely to include not only what is available to the monkey in the immediate circumstance, but also information that has been neurally encoded throughout development as the monkey has learned and adapted to the surrounding world. Here we see a shift in viewpoint from the amygdala as a structure that itself stores and activates patterns of basic social behaviors to one in which the amygdala plays an influential role in the deployment of these behaviors. This view predicts that primates lacking a functional amygdala retain the ability to display the full range of basic social behaviors while being impaired in the appropriate context-dependent
deployment of these behaviors and of more complex social behaviors. The idea is similar to the difference between a novice chess player who knows how each piece moves and even several useful openings and a grand master who can rapidly choose the appropriate move among a myriad of options. This shift in understanding of the amygdala’s role in social cognition to some extent parallels debates regarding the amygdala’s role in certain forms of memory. As with social behavior, declarative memory is not stored in the amygdala as such, but is influenced by the amygdala’s processing and projection into other structures, such as the hippocampus in the case of memory (McGaugh, 2004). As we will see below, this new framework of the amygdala’s role in social cognition is supported by a number of studies showing that the amygdala influences the evaluation of stimuli in contributing to the perception, recognition, and judgment of socially relevant stimuli.
Impaired social cognition in humans following amygdala damage

While the amygdala has been implicated in monkey social behavior for some time, it is only very recently that such a role has been established in humans, and that detailed hypotheses have been investigated regarding the underlying mechanisms. Here, we review evidence that the amygdala has a role in the recognition of emotion from faces, in interpreting eye gaze, and in more complex social judgments in humans. Two early studies showed that bilateral damage confined mainly to the amygdala resulted in a disproportionately severe impairment in the ability to recognize fear from facial expressions (Adolphs et al., 1994; Young et al., 1995). One patient in particular, SM, had damage that was relatively restricted to the amygdala (Fig. 1A–C), and an impairment that was very specific to the recognition of fear (Adolphs et al., 2000). SM's lesion encompassed the entire amygdala bilaterally and extended also into anterior portions of the entorhinal cortex; there was no damage evident to any other structures. When shown standardized emotional facial expressions that depicted the six "basic" emotions
(happiness, surprise, fear, anger, disgust, and sadness), SM was insensitive to the intensity of the emotion shown in fear, but not in other expressions (Fig. 1D) (Adolphs et al., 1994). The specificity to fear was confirmed using morphs (linear blends) between the different emotions: the closer to the fear prototype the emotional expressions were, the more impaired SM's recognition became. The impairment was all the more striking because she was able to recognize other kinds of information from fearful faces normally (such as judging their gender, age, or identity), and because she was able to discriminate the faces normally when presented pairwise (on same/different judgments). When shown an expression of fear and asked in an unconstrained task simply to name the emotion, she typically replied that she did not know what the emotion was. If forced, she would often mistake fear for surprise, anger, or sadness (but never happiness). SM's impaired recognition of fear was followed up in a series of studies that showed that SM does have at least some components of the concept of fear, because she can use the word relatively appropriately in conversation, she believes she knows what fear is, and she can in fact retrieve many facts about fear (such as that being confronted with a bear would make one afraid, etc.) (Adolphs et al., 1995). While the amygdala's role in recognizing fear in other sensory modalities remains unclear, in SM's case she was even able to recognize fear from tone of voice (prosody) normally (Adolphs and Tranel, 1999). But she could not recognize it from the face, nor could she generate an image of the facial expression when given the name (e.g., when asked to draw it) (Adolphs et al., 1995). The impaired recognition of fear thus seemed to be relatively specific to the facial expression—the use of other visual information, such as context and body posture, was less compromised. In fact, adding facial expressions to scenes initially devoid of faces decreased the accuracy of emotion recognition in subjects with bilateral amygdala damage, whereas it increased it in healthy subjects (Adolphs and Tranel, 2003). Several studies have followed up these initial findings. Other lesion studies have found impaired recognition of fear from facial expressions following bilateral amygdala damage (Calder et al., 1996;
Fig. 1. Bilateral amygdala lesions impair recognition of emotion from faces. Two structural MRI slices (A) show the intact hippocampus (B) and the bilateral amygdala lesion (C) in SM. SM's ratings of the degree to which a face expressed a particular emotion are shown in (D) (SM: closed triangles; controls: open circles), and the correlation between SM's ratings and normal ratings is shown in (E) for each facial expression. From Adolphs et al. (1994).
Broks et al., 1998; Adolphs et al., 1999b), and functional imaging studies have found activation of the amygdala when subjects view fearful faces (Breiter et al., 1996; Morris et al., 1996; Whalen et al., 2001). However, the findings are not as specific as in the case of SM. Several lesion subjects with complete bilateral amygdala damage were impaired also on emotions other than fear (always negatively valenced emotions) (Adolphs et al., 1999b), and in several cases their impairment in recognizing anger, disgust, or sadness was more severe than their impairment in recognizing fear. Similarly, functional imaging studies found activation of the amygdala to expressions other than fear, such as happiness (Canli et al., 2002; Williams et al., 2005), surprise (Kim et al., 2004), sadness (Wang et al., 2005), and anger (Whalen et al., 2001). A further function of the amygdala in processing aspects of faces comes from studies of its role in processing the eyes in a face. The eyes and their direction of gaze are key social signals in many species (Emery, 2000), especially apes and humans, whose white sclera makes the pupil more easily visible and permits better discrimination of gaze. Eyes signal important information about emotional states, and there is evidence from functional imaging studies that at least some of this processing recruits the amygdala (Baron-Cohen et al., 1999; Kawashima et al., 1999; Wicker et al., 2003b). The amygdala’s involvement in processing gaze direction in emotional faces has been explored recently. It was found that direct gaze facilitated amygdala activation in response to approach-oriented emotions such as anger, whereas averted gaze facilitated amygdala activation to avoidance-oriented emotions such as fear (Adams and Kleck, 2003). Further, the amygdala has been found to be active during monitoring for direct gaze (Hooker et al., 2003). The amygdala’s role is not limited to making judgments about basic emotions, but includes a role in making social judgments, as well. This function was already suggested by earlier studies in nonhuman primates (Kluver and Bucy, 1937; Rosvold et al., 1954; Brothers et al., 1990; Kling and Brothers, 1992), which demonstrated impaired social behavior following amygdala damage and amygdala responses to complex social stimuli. They have been corroborated in recent times by studies in monkeys
with more selective amygdala lesions, and by using more sophisticated ways of assessing social behavior (Emery and Amaral, 1999; Emery et al., 2001), and consistent findings have been shown now also in humans. We have found that the amygdala is important for judging complex mental states and social emotions from faces (Adolphs et al., 2002), and for judging the trustworthiness of people from viewing their face (Adolphs et al., 1998; Winston et al., 2002). Relatedly, the amygdala shows differential habituation of activation to faces of people of another race (Hart et al., 2000), and amygdala activation has been found to correlate with race stereotypes of which the viewer may be unaware (Phelps et al., 2000). On the basis of these findings, some recent studies suggest a general role for the amygdala in so-called ‘‘theory of mind’’ abilities: the collection of abilities whereby we attribute internal mental states, intentions, desires, and emotions to other people (Baron-Cohen et al., 2000; Fine et al., 2001). Various theories have been put forth to account for some of these findings, some proposing that the amygdala is specialized for recognition of emotions that are high in arousal (Adolphs et al., 1999a), or that relate to withdrawal (Anderson et al., 2000), or that require disambiguation (Whalen, 1999). It is fair to say that, at present, there is no single accepted scheme to explain which emotion categories are affected by amygdala damage. These differences notwithstanding, we can identify a general framework for understanding the mechanisms by which the amygdala normally contributes to emotion judgment and social cognition. The framework is built upon (1) recent work showing (a) the amygdala’s ability to influence visual processing at early stages, and (b) the amygdala’s role in influencing overt attention to the eyes in a face; (2) the amygdala’s role in autonomic arousal; and (3) work implicating the pulvinar and Brodmann area 40 (SII) in the processing of affectively aversive visual stimuli. Each of these elements is supported by evidence from neuroanatomical studies of the internal and external connectivity of the amygdala. Our current neuroanatomical understanding of the amygdala, which consists of a number of separate nuclei in primates (Price, 2003), supports a scheme whereby faces are associated
with their emotional meaning in the lateral and basolateral nuclei, in interaction with additional brain structures such as orbitofrontal and medial prefrontal cortices (Ghashghaei and Barbas, 2002). This evaluation is conveyed to central and basomedial amygdala nuclei whose projections then influence processing in visual cortex, processing that elicits autonomic and motor responses in the body (Price, 2003), and/or processing that involves somatosensory areas putatively involved in simulation-based transformations of the visual percept to an internal bodily representation (Gallese et al., 2004; Rizzolatti and Craighero, 2004; see also Keysers and Gazzola, this volume). We will consider each of these aspects of the amygdala’s function in social cognition.
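Because the routing just described is easy to misread in prose, the following toy sketch restates it as a data structure. This is purely illustrative: the node names and groupings are our informal shorthand for the scheme attributed to Price (2003) and Ghashghaei and Barbas (2002), not an anatomical atlas.

```python
# Illustrative only: a toy routing table restating the proposed scheme.
proposed_routes = {
    "lateral/basolateral nuclei": [
        "orbitofrontal cortex",       # evaluative interaction
        "medial prefrontal cortex",   # evaluative interaction
        "central/basomedial nuclei",  # evaluation handed on for output
    ],
    "central/basomedial nuclei": [
        "visual cortex",              # modulation of visual processing
        "autonomic/motor effectors",  # bodily (autonomic and motor) responses
        "somatosensory areas",        # simulation-based bodily representation
    ],
}

for source, targets in proposed_routes.items():
    print(source, "->", ", ".join(targets))
```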
The amygdala influences early visual processing of faces and affective stimuli

There is abundant data regarding the cortical processing of faces, and such cortical processing presumably can serve to provide highly processed input to the amygdala. To briefly review this, functional magnetic resonance imaging (fMRI) studies have revealed an array of higher order visual cortical regions that are engaged in face processing, including the fusiform face area (FFA) in the fusiform gyrus, the face-sensitive area in the superior temporal sulcus (STS), and the superior and middle temporal gyri (Kanwisher et al., 1997; McCarthy, 1999; Haxby et al., 2000). The STS in particular has been implicated in the detection of gaze direction in humans and nonhuman primates (Campbell et al., 1990; Puce et al., 1998; Wicker et al., 1998, 2003b; Calder et al., 2002; Hooker et al., 2003; Pourtois et al., 2004). A distributed array of visually responsive regions in the temporal lobe appears to encode classes of biologically salient objects, notably faces and bodies, in humans (Downing et al., 2001; Haxby et al., 2001; Spiridon and Kanwisher, 2002) as in monkeys (Pinsk et al., 2005). Regions in the superior temporal lobe appear specialized to process biological motion stimuli, such as point-light displays of people (Haxby et al., 2000; Grossman and Blake, 2002). It has been generally supposed that
higher order cortices in temporal lobe first encode the visual properties of socially relevant stimuli, and that this information is then subsequently passed to neurons within the ventromedial prefrontal cortex and the amygdala that associate the visual percept with its emotional meaning. This standard view of a strong feedforward input to the amygdala, one in which visual cortices in the temporal lobe comprise a series of visual processing stages the later components of which feed into the amygdala, is now being modified. Accumulating evidence strongly supports the notion that the amygdala can directly influence visual processing, even at very early stages. Recent anatomical studies show that the amygdala projects topographically to the ventral visual stream, from rostral temporal cortical area TE to caudal primary visual cortex (V1) (Amaral et al., 2003b). A majority of projections from the basal nucleus to V1 and TE colocalize with synaptophysin, suggesting that the amygdala can exert direct influence on synaptic associations at multiple stages of primary and object-based visual processing (Freese and Amaral, 2005). Such direct influence on cortical visual processing may be a later evolutionary adaptation, as these anatomical projections have not been reported in rats and cats (Price, 2003). Through this architecture in primates, the amygdala can link the perception of stimuli to an emotional response, and then subsequently modulate cognition on the basis of the value of the perceived stimulus (Amaral and Insausti, 1992; Adolphs, 2002). Thus, perception and evaluation of faces are closely intertwined. Functional neuroimaging in humans indicates that these structural pathways from the amygdala to visual areas are put to use in social cognition, specifically in the modulation of attention. Activation in the amygdala has been shown to predict extrastriate cortex activation specific to fearful facial expressions (Morris et al., 1998a). Lesions of the amygdala eliminate facial expression-specific activations in occipital and fusiform cortices (Vuilleumier et al., 2004). Such findings are consistent with the dependence of visual processing on prior amygdala processing of visual information, in a manner specific to the information’s associated value for the organism. Even more striking is evidence from single unit studies of face-selective neurons in TE and
STS of macaque monkeys (Sugase et al., 1999). These neurons discriminate between faces and objects about 50 ms faster than they discriminate between facial expressions, which is enough time for the action of projections from the amygdala. Expression-dependent activity in these neurons occurs within 150 ms following stimulus onset, consistent with the notion that input from the amygdala occurs early in visual processing. Clearly, for the amygdala to exert an expression-dependent influence on the ventral visual system before the expression-dependent activity observed there, it must itself receive rapid visual input. The medial nucleus of the pulvinar complex provides such a pathway, as it forms a strong projection to the lateral and basolateral nuclei of the amygdala in macaque monkeys (Jones and Burton, 1976; Aggleton et al., 1980; Romanski et al., 1997). There is now evidence that these connections exist and are functionally active in humans. In healthy controls, masked facial stimuli activate the amygdala in the absence of awareness (Ohman, 2005), together with activation of the superior colliculus and the pulvinar (Liddell et al., 2005). Functional connectivity of the right amygdala with the right pulvinar and superior colliculus increases, and connectivity with fusiform and orbitofrontal cortices decreases, during subliminal presentation of fear-conditioned faces (Morris et al., 1998b, 1999). The left amygdala shows no masking-dependent changes in connectivity. A patient with blindsight (i.e., residual visual capacity without perceptual awareness) in the right visual hemifield nevertheless showed a preserved ability to guess correctly the identity of facial expressions presented to his blind hemifield (de Gelder et al., 1999). Both fearful and fear-conditioned faces presented to the blind hemifield increased the functional connectivity between the right amygdala, superior colliculus, and posterior thalamus (i.e., the pulvinar) (Morris et al., 2001). A recent study of a patient with total cortical blindness (i.e., destruction of bilateral visual cortices) found that the patient could correctly guess the facial expression of a displayed face, but could not identify other classes of stimuli, whether emotional or not (Pegna et al., 2005). The right but not the left amygdala in this patient showed expression-dependent activation,
consistent with evidence from neuroimaging of subliminal processing of faces in healthy controls (Morris et al., 1998b). Further evidence supporting the involvement of a pulvinar–amygdala–inferotemporal pathway in the rapid visual processing of emotional stimuli comes from a patient who sustained a complete and focal loss of the left pulvinar (Ward et al., 2005). In a paradigm designed to measure how threatening images interfere with a goal-directed task, the patient’s behavior indicated that the threatening images interfered with subsequent color identification of a single letter (‘‘O’’) when the images were presented to the ipsilesional field, but no interference was observed when the threatening images were presented to the contralesional field. Interference by images in the contralesional field returned if they were displayed for a relatively long time (600 ms vs. 300 ms). In light of the evidence presented here, it appears that the pulvinar–amygdala pathway is required for the extremely rapid processing of threat, and is capable of using the results of this processing to influence visual perception in primary and higher visual cortices.
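The connectivity results reviewed above (e.g., Morris et al., 1998b, 2001) rest on analyses of covariation between regional time series; a later section suggests Granger causality or dynamic causal modeling to test the directionality this framework predicts. The following is a minimal sketch of the Granger logic only, run on simulated time series with an assumed one-sample lag; a real fMRI analysis would require hemodynamic modeling, multiple lags, and proper statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated ROI time series: 'pulvinar' drives 'amygdala' at a lag of one
# sample, standing in for the rapid subcortical route discussed above.
n = 200
pulvinar = rng.standard_normal(n)
amygdala = np.zeros(n)
for t in range(1, n):
    amygdala[t] = (0.6 * pulvinar[t - 1] + 0.3 * amygdala[t - 1]
                   + 0.5 * rng.standard_normal())

def granger_gain(x, y, lag=1):
    """Reduction in residual variance of y when past x is added to past y
    (the core of a pairwise Granger test)."""
    Y, past_y, past_x = y[lag:], y[:-lag], x[:-lag]
    # restricted model: y_t ~ y_{t-1}
    b = np.polyfit(past_y, Y, 1)
    var_restricted = np.var(Y - np.polyval(b, past_y))
    # full model: y_t ~ y_{t-1} + x_{t-1}
    A = np.column_stack([past_y, past_x, np.ones_like(Y)])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    var_full = np.var(Y - A @ coef)
    return 1.0 - var_full / var_restricted

print("pulvinar -> amygdala:", round(granger_gain(pulvinar, amygdala), 3))
print("amygdala -> pulvinar:", round(granger_gain(amygdala, pulvinar), 3))
```

In this toy example the gain is large in the pulvinar-to-amygdala direction and near zero in the reverse direction, which is the asymmetry such an analysis would look for between amygdala and SMG or insula.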
The amygdala influences face gaze

In addition to influencing visual processing even at very early stages, recent evidence suggests that the amygdala affects face gaze in a surprisingly direct manner (Adolphs et al., 2005). This is consistent with the amygdala's influence on visual processing and with previous work showing that the amygdala affects visual and visuospatial attention. Lesions of the amygdala, particularly of the left amygdala, severely impair the enhanced perception of aversive words normally observed during an attentional blink paradigm involving rapid stimulus presentation and target detection (Anderson and Phelps, 2001). Emotional facial expressions and aversive nonfacial stimuli overcome attentional deficits in patients showing neglect due to right parietal lesions (Vuilleumier and Schwartz, 2001a, b). It is likely that the latter finding is the result of exogenous attentional modulation by the amygdala in visual cortex and perhaps in visually responsive prefrontal cortex. Recall that the amygdala is required for facial expression-specific activation of early visual cortex (Vuilleumier et al.,
2004), evidence that fits well within an understanding of the amygdala as part of an attentional network responsive to visual stimuli having value for an organism. We have seen that the amygdala influences information processing in visual cortices and that it is strongly implicated in attention to evaluatively salient stimuli. It is possible, then, that the amygdala plays a role, via its projections to visual cortex particularly, in directing overt attention during the exploration of a face in social judgment. Face gaze, that is, might be dependent on evaluative processing within the amygdala. More specifically, in light of the evidence that the amygdala is sensitive to gaze direction in a face, it is likely that an attentional role for the amygdala would include directing gaze to the eyes in a face. Indeed, a recent study of face gaze in a patient with bilateral amygdala damage supports this view (Adolphs et al., 2005). The study tested a patient (SM) with complete and focal bilateral amygdala lesions during emotion judgment, measuring both face gaze and the use of facial information. To understand SM's use of facial information during a simple emotion judgment, the study used the Bubbles technique and compared the results with those obtained from typical, age-matched controls. SM displayed a marked impairment in the ability to use the eyes in a face, compared to controls (Fig. 2A–D). Subsequent investigation of SM's face gaze using eyetracking revealed a near absence of gaze to the eyes during visual exploration of faces (Fig. 2E–G). When SM was instructed to look only at the eyes while making emotion judgments from faces (Fig. 2H), performance in recognizing fear returned to normal (Fig. 2I). Yet this remarkable recovery in emotion judgment was not sustained once SM went back to nondirected, free viewing of faces. These results provide the first evidence of a requirement for the amygdala in directing gaze to the eyes, extending our understanding of the amygdala's influential role in visuospatial attention to faces during social judgment. In keeping with the new view of the amygdala described in the section ''Shifting views of the amygdala in social cognition,'' these findings support the notion that the amygdala is a crucial component of normal social cognition, while not
being required for basic social behaviors. SM clearly directed her gaze to the eyes after being instructed to do so, and was even able to use the information that eye gaze provided to fully recover her recognition of fearful faces. The absence of a functioning amygdala thus does not result in a loss of the ability to engage in the social behavior of direct eye gaze. However, the amygdala is required for the appropriate deployment of this social behavior via its processing of socioenvironmental context and its influence on visual attentional systems involved in social cognition. We will see this theme reappear in relation to the amygdala's role in autonomic arousal to facial expressions and in visuosomatosensory processing of facial expressions of emotion.
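The Bubbles technique used with SM (Adolphs et al., 2005) can be summarized computationally: faces are revealed through random Gaussian apertures, and the apertures are correlated with response accuracy to identify the facial regions that are diagnostic for the judgment. The sketch below simulates this logic with an artificial observer who benefits from seeing a hypothetical eye region; the image size, bubble parameters, and observer model are all invented for illustration and are not the published implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, n_trials = 64, 64, 2000
eye_region = np.zeros((H, W))            # hypothetical eye-region mask
eye_region[20:28, 14:28] = 1.0
eye_region[20:28, 36:50] = 1.0

diagnostic = np.zeros((H, W))
yy, xx = np.ogrid[:H, :W]
for _ in range(n_trials):
    # Random Gaussian "bubbles" revealing parts of the face
    mask = np.zeros((H, W))
    for _ in range(12):
        cy, cx = rng.integers(0, H), rng.integers(0, W)
        mask += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 4.0 ** 2))
    mask = np.clip(mask, 0, 1)
    # Simulated observer: more likely correct when the eyes are revealed
    p_correct = 0.5 + 0.5 * (mask * eye_region).sum() / eye_region.sum()
    correct = rng.random() < min(p_correct, 1.0)
    # Classification image: add masks on correct trials, subtract on errors
    diagnostic += mask if correct else -mask

# High values mark regions whose visibility predicts correct judgments
peak = np.unravel_index(np.argmax(diagnostic), diagnostic.shape)
print("peak diagnostic pixel:", peak)
```

For a control observer the resulting classification image peaks in the eye region; for an observer like SM, who makes little use of the eyes, no such peak would emerge.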
The amygdala mediates autonomic arousal elicited by faces

The human amygdala was originally thought to have a key role in the generation of normal autonomic responses associated with orienting and arousal, on the basis of studies of amygdalectomized monkeys (Bagshaw and Benzies, 1968; Bagshaw and Coppock, 1968; Pribram et al., 1979). Monkeys with bilateral amygdalectomies fail to produce the expected changes in skin conductance response (SCR), heart rate, and respiratory rate in response to irregularly repeated sounds, while ear movements to the sounds are normal (Bagshaw and Benzies, 1968). Further, these animals show no Pavlovian conditioning of SCR when a conditioned stimulus is paired with electrical stimulation of the skin (Bagshaw and Coppock, 1968; Pribram et al., 1979), although they do show normal SCR in response to the unconditioned stimulus (Bagshaw and Coppock, 1968). However, in humans, amygdala lesions appear not to affect orienting SCRs (Tranel and Damasio, 1989), while severely impairing Pavlovian conditioning of SCRs (Bechara et al., 1995). It is therefore the linking of a conditioned stimulus with an unconditioned stimulus and its associated autonomic response that requires the amygdala, and not the generation of the autonomic response itself, whether associated with orienting or otherwise.
Fig. 2. Bilateral amygdala lesions impair the use of the eyes and gaze to the eyes during emotion judgment. Using the Bubbles method (see Adolphs et al., 2005) to identify face areas used during emotion judgment, SM (B) differed from controls (A), such that controls exhibited much greater use of the eyes (C) than SM, while SM did not rely more on any area of the face than did controls (D). While looking at whole faces, SM exhibited abnormal face gaze (E), making far fewer fixations to the eyes than did controls. This was observed across emotions (F) and across tasks (G; free viewing, emotion judgment, gender discrimination). When SM was instructed to look at the eyes (I, ‘‘SM eyes’’) in a whole face, she could do this (H), resulting in a remarkable recovery in ability to recognize the facial expression of fear (I).
Neuroimaging studies of classical conditioning are consistent with the work using the lesion method. In an analysis contrasting SCR+ trials with SCR- trials in an orienting paradigm, activations occurred in the hippocampus, anterior cingulate, and ventromedial prefrontal cortex, but not in the amygdala, whereas increased amygdala activation was found only with conditioned SCRs (Williams et al., 2000; Knight et al., 2005). A study of SCR in several different cognitive/behavioral tasks found that SCR covaries with activation in ventromedial prefrontal cortex (Brodmann 10/32),
supramarginal gyrus (Brodmann 40), cingulomotor cortex (Brodmann 6/24), posterior cingulate cortex (Brodmann 23/30), right cerebellum, and thalamus (Patterson et al., 2002). More recently, in a study of the neural systems underlying arousal and SCR elicited by static images, the brain area whose activity was most closely associated with SCR variability was the ventromedial prefrontal cortex (Anders et al., 2004). These findings are consistent with what was seen using the lesion method and support the notion that the amygdala is not required for SCR. When facial images are conditioned with an aversive unconditioned stimulus, however, fMRI reveals CS+-specific activations in the anterior cingulate, anterior insula, and bilateral amygdala (Buchel et al., 1998), suggesting a role for the amygdala in evaluative associations. Such influence of the amygdala in evaluative assessment is supported by evidence implicating the right amygdala in the generation of SCR in response to emotionally arousing visual stimuli (Glascher and Adolphs, 2003): right and bilateral temporal lobe lesions that included the amygdala impaired normal SCRs to emotionally arousing nonfacial stimuli. Further, when SCR is used to partition recorded amygdala activation in a fearful vs. neutral face contrast, no expression-dependent difference in amygdala activation is seen unless an associated SCR is observed (Williams et al., 2001). Again, this is consistent with an evaluative function of the amygdala, this time directly in relation to facial expressions of emotion. Here is another example of a function, namely SCR, that was once held to depend directly on the amygdala but is instead influenced by the amygdala without actually requiring it. Rather, the amygdala evaluates the socioenvironmental context and influences the deployment of SCR in an appropriate manner, either for learning in a classical conditioning paradigm or for normal evaluation of visual stimuli such as faces. It is likely that this action depends on the central nucleus of the amygdala (Price and Amaral, 1981; Price, 2003), though it is also possible that projections from the basal nucleus of the amygdala to the ventromedial prefrontal cortex and cingulate cortex and from the basomedial nucleus to the insula influence these regions, which appear crucial to the generation of SCRs (Amaral and Price, 1984).
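The partitioning analysis described above (Williams et al., 2001) reduces to a simple contrast: split trials by whether an SCR accompanied the face, then compute the fearful-versus-neutral difference within each partition. A schematic sketch with simulated trial-level amygdala responses (all numbers arbitrary) makes the logic explicit.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120

# Simulated per-trial amygdala responses (arbitrary units): the
# fear > neutral difference is present only when an SCR occurred,
# mirroring the pattern reported by Williams et al. (2001).
condition = rng.choice(["fear", "neutral"], size=n)
scr_present = rng.random(n) < 0.5
amy = rng.standard_normal(n) * 0.5
amy += np.where((condition == "fear") & scr_present, 1.0, 0.0)

def contrast(partition):
    """Fear-minus-neutral mean amygdala response within a trial partition."""
    fear = amy[(condition == "fear") & partition]
    neutral = amy[(condition == "neutral") & partition]
    return fear.mean() - neutral.mean()

print("fear - neutral, SCR present:", round(contrast(scr_present), 2))
print("fear - neutral, SCR absent: ", round(contrast(~scr_present), 2))
```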
The amygdala and simulation: somatosensory attention as a component of emotional response to faces in social judgment

So far, we have seen that the amygdala acts to influence key components of object-based visual processing, visuospatial attention, and autonomic responses during the processing of facial expressions in social judgment. One important component of emotion judgment not yet addressed in this scheme is the set of systems implicated by simulation-theoretic approaches to social cognition (Gallese et al., 2004; Rizzolatti and Craighero, 2004; see also Keysers and Gazzola, this volume). We will briefly outline a proposal for the amygdala's action on this system, which we view as primarily being one of somatosensory attention. Attentional modulation of the somatosensory cortices by the amygdala, we propose, involves several aspects analogous to attentional modulation in other contexts. First, the amygdala's action could increase the sensitivity of somatosensory cortices to the signals received from the body. Second, amygdala inputs could enhance selectivity of inputs to, activity within, and outputs from somatosensory cortices. Finally, past associations established within these cortices may be reactivated, facilitating the neural traces that resulted from previous learning. In sum, we are extending the amygdala's role in emotional response from its important and well-established role in facilitating bodily responses to emotional stimuli (Damasio, 1996) to a role in modulating the cortical processing of those responses via somatosensory attentional mechanisms. This move implies that emotion may be understood as comprising both increased autonomic responses (i.e., the ''body'' loop) and stored cortical representations of those responses (the ''as-if'' loop), as well as increased attention to those responses and their representation in cortex. On this view, emotion, at least in part, is attentional modulation of those neural systems dedicated to processing somatosensory signals, serving to
establish the value of a particular socioenvironmental context for an organism. Several lines of evidence now point to the involvement of somatosensory cortices in the judgment of emotion from faces. Bilateral lesions of the insula completely abolish the ability to judge the emotion of disgust from static and dynamic facial displays (Calder et al., 2000; Adolphs et al., 2003). Such lesions also appear to abolish the ability to be disgusted by nonfood items that are widely recognized as disgusting. Consistent with the lesion data, neuroimaging reveals activation of the insula when observing dynamic displays of facial disgust (Wicker et al., 2003a). Lesions of the right somatosensory areas, particularly including the supramarginal gyrus (SMG; Brodmann 40), seriously impair judging emotion from faces (Adolphs et al., 1996, 2000) and from bodily motion conveyed in point-light displays (Heberlein et al., 2004). Again, neuroimaging data are consistent with the idea that SII is important for judging emotion from faces (Winston et al., 2003). Looking at dynamic displays of smiling activates areas such as SII in the right hemisphere, including regions within the supramarginal gyrus and left anterior insula, and these areas are also activated when one smiles oneself (Hennenlotter et al., 2005). The ventral amygdala is found to be activated only during observation, however. The pivotal role that the supramarginal gyrus is suspected to play in judging facial emotion is intriguing in light of its evolutionary homology to area 7b in macaque monkeys (Rizzolatti and Craighero, 2004). Area 7b is a cortical region with facial haptic neurons whose haptic spatial receptive fields and preferred directions of haptic stimulation overlap considerably with their visuospatial receptive fields and preferred directions of movement in the visual field (Duhamel et al., 1998). Most importantly, neurons in 7b exhibit mirror neuron-like qualities in single unit recordings (Rizzolatti and Craighero, 2004). In the monkey, there are several neuroanatomical pathways that could permit the amygdala to act on somatosensory cortices such as SMG and insula in a way similar to that described at the beginning of this section. The basal and basomedial nuclei of the amygdala project lightly and directly to area 7 of the
monkey parietal cortex (Amaral and Price, 1984). Moreover, the medial division of the pulvinar projects strongly to area 7b along with other areas in the parietal cortex (Mesulam et al., 1977; Romanski et al., 1997), and it is known that the central nucleus of the amygdala projects back to the medial pulvinar, the same nucleus that conveys rapid visual input to the amygdala (Price and Amaral, 1981). It is plausible, then, that the amygdala acts on SMG via the pulvinar, as well as by its direct projections. The amygdala also projects heavily into the insula region (Amaral and Price, 1984), an area strongly implicated in simulation-based processing of facial emotion and in the representation of emotion (Adolphs et al., 2003; Wicker et al., 2003a). It is thus more likely in the case of the insula than in the case of the SMG that the amygdala acts directly via its many projections from basal and basomedial nuclei into this cortical region. The proposal here regarding the amygdala’s possible role in attentionally modulating somatosensory cortices is consistent with what is established by the evidence reviewed in the previous two sections. It is not likely, in other words, that the amygdala itself is a locus of simulation. Rather, it is much more plausible that it interprets the socioenvironmental context and then affects simulation networks such as may inhabit the somatosensory cortices detailed here. Two brief points might be made before moving on to a summary of the current view of the amygdala’s function in judging emotions from faces and other visual social stimuli. One is that evidence from neuroimaging experiments and single unit studies is required in order to test the framework detailed here. Use of dynamic causal modeling or Granger causality, for example, in the analysis of fMRI data would help discern whether amygdala activation precedes and predicts activation in SMG and insula. The second point is a more general one regarding the relation of amygdala activation to emotional experience and pathology. It is simply this: the amygdala likely is not itself a generator of such experience, either in healthy persons or in emotional disturbance, a view consistent with data from some amygdala lesion patients (Anderson and Phelps, 2002). Instead, the amygdala helps to control attention
inward, i.e., toward the body and encoded emotional associations. Malfunction in these inward attention networks could very likely yield the kind of negatively valenced ideation and sensations often accompanying mental illness.

The new model for how the amygdala contributes to the recognition of emotion from visual social stimuli

We are now able to articulate a coherent view of the amygdala's action in judging emotion from a face. The story proceeds like this (Fig. 3): visual input to the amygdala, which can occur very rapidly via the pulvinar, results in initial modulation of subsequent visual inputs from visual cortex. Attentional modulation of somatosensory (i.e., putative simulation) cortex occurs so as to increase sensitivity to and selectivity for bodily responses and encoded emotional associations. Modulation of temporal visual cortex by the amygdala may, via coarse visuospatial coding in these neurons, influence the dorsal ''where'' stream so as to direct visuospatial attention to emotionally salient features (e.g., the eyes in a face). Richer visual input
from object-selective visual cortex soon follows; and this, together with input from other areas, leads to the generation of autonomic responses via action by the central nucleus. Each of these steps casts the amygdala as an important (attentional) modulator of neural systems, and a key aspect of the proposal here is the amygdala's influence on simulation systems. Importantly, each element in this new framework of the amygdala's function is supported by empirical data. Moreover, the connection between amygdala processing and simulation networks is supported by anatomical detail, though the functional relevance of this connectivity has yet to be clearly established. A more complete functional understanding of this relationship is sure to come given the evident energy and productivity of research into these networks of social cognition.

Fig. 3. Schematic of the proposed action of the amygdala in attentionally modulating visual and somatosensory cortical areas either directly or via projections to the pulvinar.

Acknowledgments

The authors thank Fred Gosselin and Dirk Neumann for helpful discussions. This work was supported by grants from the National Institute of Mental Health, the Cure Autism Now Foundation, and the Pfeiffer Research Foundation.
References

Adams Jr., R.B., Gordon, H.L., Baird, A.A., Ambady, N. and Kleck, R.E. (2003) Effects of gaze on amygdala sensitivity to anger and fear faces. Science, 300: 1536. Adams Jr., R.B. and Kleck, R.E. (2003) Perceived gaze direction and the processing of facial displays of emotion. Psychol. Sci., 14: 644–647. Adolphs, R. (2002) Recognizing emotion from facial expressions: psychological and neurological mechanisms. Behav. Cogn. Neurosci. Rev., 1: 21–61. Adolphs, R., Baron-Cohen, S. and Tranel, D. (2002) Impaired recognition of social emotions following amygdala damage. J. Cogn. Neurosci., 14: 1264–1274. Adolphs, R., Damasio, H., Tranel, D., Cooper, G. and Damasio, A.R. (2000) A role for somatosensory cortices in the visual recognition of emotion as revealed by three-dimensional lesion mapping. J. Neurosci., 20: 2683–2690. Adolphs, R., Damasio, H., Tranel, D. and Damasio, A.R. (1996) Cortical systems for the recognition of emotion in facial expressions. J. Neurosci., 16: 7678–7687. Adolphs, R., Gosselin, F., Buchanan, T.W., Tranel, D., Schyns, P. and Damasio, A.R. (2005) A mechanism for impaired fear recognition after amygdala damage. Nature, 433: 68–72. Adolphs, R., Russell, J.A. and Tranel, D. (1999a) A role for the human amygdala in recognizing emotional arousal from unpleasant stimuli. Psychol. Sci., 10: 167–171. Adolphs, R. and Tranel, D. (1999) Intact recognition of emotional prosody following amygdala damage. Neuropsychologia, 37: 1285–1292. Adolphs, R. and Tranel, D. (2003) Amygdala damage impairs emotion recognition from scenes only when they contain facial expressions. Neuropsychologia, 41: 1281–1289. Adolphs, R., Tranel, D. and Damasio, A.R. (1998) The human amygdala in social judgment. Nature, 393: 470–474. Adolphs, R., Tranel, D. and Damasio, A.R. (2003) Dissociable neural systems for recognizing emotions. Brain Cogn., 52: 61–69. Adolphs, R., Tranel, D., Damasio, H. and Damasio, A. (1994) Impaired recognition of emotion in facial expressions following bilateral damage to the human amygdala. Nature, 372: 669–672. Adolphs, R., Tranel, D., Damasio, H. and Damasio, A.R. (1995) Fear and the human amygdala. J. Neurosci., 15: 5879–5891. Adolphs, R., Tranel, D., Hamann, S., Young, A.W., Calder, A.J., Phelps, E.A., Anderson, A., Lee, G.P. and Damasio, A.R. (1999b) Recognition of facial emotion in nine individuals with bilateral amygdala damage. Neuropsychologia, 37: 1111–1117. Aggleton, J.P., Burton, M.J. and Passingham, R.E. (1980) Cortical and subcortical afferents to the amygdala of the rhesus monkey (Macaca mulatta). Brain Res., 190: 347–368.
Amaral, D.G., Bauman, M.D., Capitanio, J.P., Lavenex, P., Mason, W.A., Mauldin-Jourdain, M.L. and Mendoza, S.P. (2003a) The amygdala: is it an essential component of the neural network for social cognition? Neuropsychologia, 41: 517–522. Amaral, D.G., Behniea, H. and Kelly, J.L. (2003b) Topographic organization of projections from the amygdala to the visual cortex in the macaque monkey. Neuroscience, 118: 1099–1120. Amaral, D.G. and Insausti, R. (1992) Retrograde transport of D-[3H]-aspartate injected into the monkey amygdaloid complex. Exp. Brain Res., 88: 375–388. Amaral, D.G. and Price, J.L. (1984) Amygdalo-cortical projections in the monkey (Macaca fascicularis). J. Comp. Neurol., 230: 465–496. Anders, S., Lotze, M., Erb, M., Grodd, W. and Birbaumer, N. (2004) Brain activity underlying emotional valence and arousal: a response-related fMRI study. Hum. Brain Mapp., 23: 200–209. Anderson, A.K. and Phelps, E.A. (2001) Lesions of the human amygdala impair enhanced perception of emotionally salient events. Nature, 411: 305–309. Anderson, A.K. and Phelps, E.A. (2002) Is the human amygdala critical for the subjective experience of emotion? Evidence of intact dispositional affect in patients with amygdala lesions. J. Cogn. Neurosci., 14: 709–720. Anderson, A.K., Spencer, D.D., Fulbright, R.K. and Phelps, E.A. (2000) Contribution of the anteromedial temporal lobes to the evaluation of facial emotion. Neuropsychology, 14: 526–536. Bachevalier, J., Beauregard, M. and Alvarado, M.C. (1999) Long-term effects of neonatal damage to the hippocampal formation and amygdaloid complex on object discrimination and object recognition in rhesus monkeys (Macaca mulatta). Behav. Neurosci., 113: 1127–1151. Bachevalier, J. and Loveland, K.A. (2006) The orbitofrontal-amygdala circuit and self-regulation of social-emotional behavior in autism. Neurosci. Biobehav. Rev., 30: 97–117. Bagshaw, M.H. and Benzies, S. (1968) Multiple measures of the orienting reaction and their dissociation after amygdalectomy in monkeys. Exp. Neurol., 20: 175–187. Bagshaw, M.H. and Coppock, H.W. (1968) Galvanic skin response conditioning deficit in amygdalectomized monkeys. Exp. Neurol., 20: 188–196. Baron-Cohen, S., Ring, H.A., Bullmore, E.T., Wheelwright, S., Ashwin, C. and Williams, S.C. (2000) The amygdala theory of autism. Neurosci. Biobehav. Rev., 24: 355–364. Baron-Cohen, S., Ring, H.A., Wheelwright, S., Bullmore, E.T., Brammer, M.J., Simmons, A. and Williams, S.C. (1999) Social intelligence in the normal and autistic brain: an fMRI study. Eur. J. Neurosci., 11: 1891–1898. Bauman, M.D., Lavenex, P., Mason, W.A., Capitanio, J.P. and Amaral, D.G. (2004a) The development of mother-infant interactions after neonatal amygdala lesions in rhesus monkeys. J. Neurosci., 24: 711–721. Bauman, M.D., Lavenex, P., Mason, W.A., Capitanio, J.P. and Amaral, D.G. (2004b) The development of social behavior
following neonatal amygdala lesions in rhesus monkeys. J. Cogn. Neurosci., 16: 1388–1411. Bechara, A., Tranel, D., Damasio, H., Adolphs, R., Rockland, C. and Damasio, A.R. (1995) Double dissociation of conditioning and declarative knowledge relative to the amygdala and hippocampus in humans. Science, 269: 1115–1118. Breiter, H.C., Etcoff, N.L., Whalen, P.J., Kennedy, W.A., Rauch, S.L., Buckner, R.L., Strauss, M.M., Hyman, S.E. and Rosen, B.R. (1996) Response and habituation of the human amygdala during visual processing of facial expression. Neuron, 17: 875–887. Broks, P., Young, A.W., Maratos, E.J., Coffey, P.J., Calder, A.J., Isaac, C.L., Mayes, A.R., Hodges, J.R., Montaldi, D., Cezayirli, E., Roberts, N. and Hadley, D. (1998) Face processing impairments after encephalitis: amygdala damage and recognition of fear. Neuropsychologia, 36: 59–70. Brothers, L., Ring, B. and Kling, A. (1990) Response of neurons in the macaque amygdala to complex social stimuli. Behav. Brain Res., 41: 199–213. Buchel, C., Morris, J., Dolan, R.J. and Friston, K.J. (1998) Brain systems mediating aversive conditioning: an event-related fMRI study. Neuron, 20: 947–957. Calder, A.J., Keane, J., Manes, F., Antoun, N. and Young, A.W. (2000) Impaired recognition and experience of disgust following brain injury. Nat. Neurosci., 3: 1077–1078. Calder, A.J., Lawrence, A.D., Keane, J., Scott, S.K., Owen, A.M., Christoffels, I. and Young, A.W. (2002) Reading the mind from eye gaze. Neuropsychologia, 40: 1129–1138. Calder, A.J., Young, A.W., Rowland, D., Perrett, D.I., Hodges, J.R. and Etcoff, N.L. (1996) Facial emotion recognition after bilateral amygdala damage: differentially severe impairment of fear. Cogn. Neuropsychol., 13: 699–745. Campbell, R., Heywood, C.A., Cowey, A., Regard, M. and Landis, T. (1990) Sensitivity to eye gaze in prosopagnosic patients and monkeys with superior temporal sulcus ablation. Neuropsychologia, 28: 1123–1142. Canli, T., Sivers, H., Whitfield, S.L., Gotlib, I.H. and Gabrieli, J.D. (2002) Amygdala response to happy faces as a function of extraversion. Science, 296: 2191. Damasio, A.R. (1996) The somatic marker hypothesis and the possible functions of the prefrontal cortex. Philos. Trans. R. Soc. Lond. B Biol. Sci., 351: 1413–1420. de Gelder, B., Vroomen, J., Pourtois, G. and Weiskrantz, L. (1999) Non-conscious recognition of affect in the absence of striate cortex. Neuroreport, 10: 3759–3763. Dicks, D., Myers, R.E. and Kling, A. (1969) Uncus and amygdala lesions: effects on social behavior in the free-ranging rhesus monkey. Science, 165: 69–71. Downing, P.E., Jiang, Y., Shuman, M. and Kanwisher, N. (2001) A cortical area selective for visual processing of the human body. Science, 293: 2470–2473. Duhamel, J.R., Colby, C.L. and Goldberg, M.E. (1998) Ventral intraparietal area of the macaque: congruent visual and somatic response properties. J. Neurophysiol., 79: 126–136. Emery, N.J. (2000) The eyes have it: the neuroethology, function and evolution of social gaze. Neurosci. Biobehav. Rev., 24: 581–604.
Emery, N.J. and Amaral, D.G. (1999) The role of the amygdala in primate social cognition. In: Lane, R.D. and Nadel, L. (Eds.), Cognitive Neuroscience of Emotion. Oxford University Press, Oxford. Emery, N.J., Capitanio, J.P., Mason, W.A., Machado, C.J., Mendoza, S.P. and Amaral, D.G. (2001) The effects of bilateral lesions of the amygdala on dyadic social interactions in rhesus monkeys (Macaca mulatta). Behav. Neurosci., 115: 515–544. Fine, C., Lumsden, J. and Blair, R.J. (2001) Dissociation between 'theory of mind' and executive functions in a patient with early left amygdala damage. Brain, 124: 287–298. Freese, J.L. and Amaral, D.G. (2005) The organization of projections from the amygdala to visual cortical areas TE and V1 in the macaque monkey. J. Comp. Neurol., 486: 295–317. Gallese, V., Keysers, C. and Rizzolatti, G. (2004) A unifying view of the basis of social cognition. Trends Cogn. Sci., 8: 396–403. Ghashghaei, H.T. and Barbas, H. (2002) Pathways for emotion: interactions of prefrontal and anterior temporal pathways in the amygdala of the rhesus monkey. Neuroscience, 115: 1261–1279. Glascher, J. and Adolphs, R. (2003) Processing of the arousal of subliminal and supraliminal emotional stimuli by the human amygdala. J. Neurosci., 23: 10274–10282. Grossman, E.D. and Blake, R. (2002) Brain areas active during visual perception of biological motion. Neuron, 35: 1167–1175. Hart, A.J., Whalen, P.J., Shin, L.M., McInerney, S.C., Fischer, H. and Rauch, S.L. (2000) Differential response in the human amygdala to racial outgroup vs ingroup face stimuli. Neuroreport, 11: 2351–2355. Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L. and Pietrini, P. (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293: 2425–2430. Haxby, J.V., Hoffman, E.A. and Gobbini, M.I. (2000) The distributed human neural system for face perception. Trends Cogn. Sci., 4: 223–233. Heberlein, A.S., Adolphs, R., Tranel, D. and Damasio, H. (2004) Cortical regions for judgments of emotions and personality traits from point-light walkers. J. Cogn. Neurosci., 16: 1143–1158. Hennenlotter, A., Schroeder, U., Erhard, P., Castrop, F., Haslinger, B., Stoecker, D., Lange, K.W. and Ceballos-Baumann, A.O. (2005) A common neural basis for receptive and expressive communication of pleasant facial affect. Neuroimage, 26: 581–591. Hooker, C.I., Paller, K.A., Gitelman, D.R., Parrish, T.B., Mesulam, M.M. and Reber, P.J. (2003) Brain networks for analyzing eye gaze. Brain Res. Cogn. Brain Res., 17: 406–418. Izquierdo, A., Suda, R.K. and Murray, E.A. (2005) Comparison of the effects of bilateral orbital prefrontal cortex lesions and amygdala lesions on emotional responses in rhesus monkeys. J. Neurosci., 25: 8534–8542. Jones, E.G. and Burton, H. (1976) A projection from the medial pulvinar to the amygdala in primates. Brain Res., 104: 142–147.
Kalin, N.H., Shelton, S.E. and Davidson, R.J. (2004) The role of the central nucleus of the amygdala in mediating fear and anxiety in the primate. J. Neurosci., 24: 5506–5515. Kalin, N.H., Shelton, S.E., Davidson, R.J. and Kelley, A.E. (2001) The primate amygdala mediates acute fear but not the behavioral and physiological components of anxious temperament. J. Neurosci., 21: 2067–2074. Kanwisher, N., McDermott, J. and Chun, M.M. (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci., 17: 4302–4311. Kawashima, R., Sugiura, M., Kato, T., Nakamura, A., Hatano, K., Ito, K., Fukuda, H., Kojima, S. and Nakamura, K. (1999) The human amygdala plays an important role in gaze monitoring. A PET study. Brain, 122(Pt 4): 779–783. Kim, H., Somerville, L.H., Johnstone, T., Polis, S., Alexander, A.L., Shin, L.M. and Whalen, P.J. (2004) Contextual modulation of amygdala responsivity to surprised faces. J. Cogn. Neurosci., 16: 1730–1745. Kling, A. (1968) Effects of amygdalectomy and testosterone on sexual behavior of male juvenile macaques. J. Comp. Physiol. Psychol., 65: 466–471. Kling, A. (1974) Differential effects of amygdalectomy in male and female nonhuman primates. Arch. Sex Behav., 3: 129–134. Kling, A.S. and Brothers, L.A. (1992) The amygdala and social behavior. In: Aggleton, J.P. (Ed.), The Amygdala: Neurobiological Aspects of Emotion, Memory, and Mental Dysfunction. Wiley-Liss, New York. Kling, A., Lancaster, J. and Benitone, J. (1970) Amygdalectomy in the free-ranging vervet (Cercopithecus aethiops). J. Psychiatr. Res., 7: 191–199. Kling, A., Steklis, H.D. and Deutsch, S. (1979) Radiotelemetered activity from the amygdala during social interactions in the monkey. Exp. Neurol., 66: 88–96. Kluver, H. and Bucy, P.C. (1937) Psychic blindness and other symptoms following bilateral temporal amygdalectomy in Rhesus monkeys. Am. J. Physiol., 119: 352–353. Kluver, H. and Bucy, P.C. (1939) Preliminary analysis of functions of the temporal lobes in monkeys. Arch. Neurol. Psychiat., 42: 979–1000. Kluver, H. and Bucy, P.C. (1997) Preliminary analysis of functions of the temporal lobes in monkeys. 1939. J. Neuropsych. Clin. Neurosci., 9: 606–620. Knight, D.C., Nguyen, H.T. and Bandettini, P.A. (2005) The role of the human amygdala in the production of conditioned fear responses. Neuroimage, 26: 1193–1200. Liddell, B.J., Brown, K.J., Kemp, A.H., Barton, M.J., Das, P., Peduto, A., Gordon, E. and Williams, L.M. (2005) A direct brainstem-amygdala-cortical 'alarm' system for subliminal signals of fear. Neuroimage, 24: 235–243. McCarthy, G. (1999) Physiological studies of face processing in humans. In: Gazzaniga, M.S. (Ed.), The New Cognitive Neurosciences. MIT Press, Cambridge, MA, pp. 393–410. McGaugh, J.L. (2004) The amygdala modulates the consolidation of memories of emotionally arousing experiences. Annu. Rev. Neurosci., 27: 1–28.
Mesulam, M.M., Van Hoesen, G.W., Pandya, D.N. and Geschwind, N. (1977) Limbic and sensory connections of the inferior parietal lobule (area PG) in the rhesus monkey: a study with a new method for horseradish peroxidase histochemistry. Brain Res., 136: 393–414. Meunier, M. and Bachevalier, J. (2002) Comparison of emotional responses in monkeys with rhinal cortex or amygdala lesions. Emotion, 2: 147–161. Meunier, M., Bachevalier, J., Murray, E.A., Malkova, L. and Mishkin, M. (1999) Effects of aspiration versus neurotoxic lesions of the amygdala on emotional responses in monkeys. Eur. J. Neurosci., 11: 4403–4418. Morris, J.S., DeGelder, B., Weiskrantz, L. and Dolan, R.J. (2001) Differential extrageniculostriate and amygdala responses to presentation of emotional faces in a cortically blind field. Brain, 124: 1241–1252. Morris, J.S., Friston, K.J., Buchel, C., Frith, C.D., Young, A.W., Calder, A.J. and Dolan, R.J. (1998a) A neuromodulatory role for the human amygdala in processing emotional facial expressions. Brain, 121(Pt 1): 47–57. Morris, J.S., Frith, C.D., Perrett, D.I., Rowland, D., Young, A.W., Calder, A.J. and Dolan, R.J. (1996) A differential neural response in the human amygdala to fearful and happy facial expressions. Nature, 383: 812–815. Morris, J.S., Ohman, A. and Dolan, R.J. (1998b) Conscious and unconscious emotional learning in the human amygdala. Nature, 393: 467–470. Morris, J.S., Ohman, A. and Dolan, R.J. (1999) A subcortical pathway to the right amygdala mediating ‘‘unseen’’ fear. Proc. Natl. Acad. Sci. USA, 96: 1680–1685. Ohman, A. (2005) The role of the amygdala in human fear: automatic detection of threat. Psychoneuroendocrinology, 30: 953–958. Patterson II, J.C., Ungerleider, L.G. and Bandettini, P.A. (2002) Task-independent functional brain activity correlation with skin conductance changes: an fMRI study. Neuroimage, 17: 1797–1806. Pegna, A.J., Khateb, A., Lazeyras, F. and Seghier, M.L. (2005) Discriminating emotional faces without primary visual cortices involves the right amygdala. Nat. Neurosci., 8: 24–25. Phelps, E.A., O’Connor, K.J., Cunningham, W.A., Funayama, E.S., Gatenby, J.C., Gore, J.C. and Banaji, M.R. (2000) Performance on indirect measures of race evaluation predicts amygdala activation. J. Cogn. Neurosci., 12: 729–738. Pinsk, M.A., DeSimone, K., Moore, T., Gross, C.G. and Kastner, S. (2005) Representations of faces and body parts in macaque temporal cortex: a functional MRI study. Proc. Natl. Acad. Sci. USA, 102: 6996–7001. Pourtois, G., Sander, D., Andres, M., Grandjean, D., Reveret, L., Olivier, E. and Vuilleumier, P. (2004) Dissociable roles of the human somatosensory and superior temporal cortices for processing social face signals. Eur. J. Neurosci., 20: 3507–3515. Prather, M.D., Lavenex, P., Mauldin-Jourdain, M.L., Mason, W.A., Capitanio, J.P., Mendoza, S.P. and Amaral, D.G. (2001) Increased social fear and decreased fear of objects in monkeys with neonatal amygdala lesions. Neuroscience, 106: 653–658.
Pribram, K.H., Reitz, S., McNeil, M. and Spevack, A.A. (1979) The effect of amygdalectomy on orienting and classical conditioning in monkeys. Pavlov J. Biol. Sci., 14: 203–217. Price, J.L. (2003) Comparative aspects of amygdala connectivity. Ann. NY Acad. Sci., 985: 50–58. Price, J.L. and Amaral, D.G. (1981) An autoradiographic study of the projections of the central nucleus of the monkey amygdala. J. Neurosci., 1: 1242–1259. Puce, A., Allison, T., Bentin, S., Gore, J.C. and McCarthy, G. (1998) Temporal cortex activation in humans viewing eye and mouth movements. J. Neurosci., 18: 2188–2199. Rizzolatti, G. and Craighero, L. (2004) The mirror-neuron system. Annu. Rev. Neurosci., 27: 169–192. Romanski, L.M., Giguere, M., Bates, J.F. and Goldman-Rakic, P.S. (1997) Topographic organization of medial pulvinar connections with the prefrontal cortex in the rhesus monkey. J. Comp. Neurol., 379: 313–332. Rosvold, H.E., Mirsky, A.F. and Pribram, K.H. (1954) Influence of amygdalectomy on social behavior in monkeys. J. Comp. Physiol. Psychol., 47: 173–178. Spiridon, M. and Kanwisher, N. (2002) How distributed is visual category information in human occipito-temporal cortex? An fMRI study. Neuron, 35: 1157–1165. Sugase, Y., Yamane, S., Ueno, S. and Kawano, K. (1999) Global and fine information coded by single neurons in the temporal visual cortex. Nature, 400: 869–873. Tranel, D. and Damasio, H. (1989) Intact electrodermal skin conductance responses after bilateral amygdala damage. Neuropsychologia, 27: 381–390. Vuilleumier, P., Richardson, M.P., Armony, J.L., Driver, J. and Dolan, R.J. (2004) Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat. Neurosci., 7: 1271–1278. Vuilleumier, P. and Schwartz, S. (2001a) Beware and be aware: capture of spatial attention by fear-related stimuli in neglect. Neuroreport, 12: 1119–1122. Vuilleumier, P. and Schwartz, S. (2001b) Emotional facial expressions capture attention. Neurology, 56: 153–158. Wang, L., McCarthy, G., Song, A.W. and Labar, K.S. (2005) Amygdala activation to sad pictures during high-field (4 tesla) functional magnetic resonance imaging. Emotion, 5: 12–22. Ward, R., Danziger, S. and Bamford, S. (2005) Response to visual threat following damage to the pulvinar. Curr. Biol., 15: 571–573.
Weiskrantz, L. (1956) Behavioral changes associated with ablation of the amygdaloid complex in monkeys. J. Comp. Physiol. Psychol., 49: 381–391. Whalen, P.J. (1999) Fear, vigilance, and ambiguity: initial neuroimaging studies of the human amygdala. Curr. Dir. Psychol. Sci., 7: 177–187. Whalen, P.J., Shin, L.M., McInerney, S.C., Fischer, H., Wright, C.I. and Rauch, S.L. (2001) A functional MRI study of human amygdala responses to facial expressions of fear versus anger. Emotion, 1: 70–83. Wicker, B., Keysers, C., Plailly, J., Royet, J.P., Gallese, V. and Rizzolatti, G. (2003a) Both of us disgusted in My insula: the common neural basis of seeing and feeling disgust. Neuron, 40: 655–664. Wicker, B., Michel, F., Henaff, M.A. and Decety, J. (1998) Brain regions involved in the perception of gaze: a PET study. Neuroimage, 8: 221–227. Wicker, B., Perrett, D.I., Baron-Cohen, S. and Decety, J. (2003b) Being the target of another's emotion: a PET study. Neuropsychologia, 41: 139–146. Williams, L.M., Brammer, M.J., Skerrett, D., Lagopoulos, J., Rennie, C., Kozek, K., Olivieri, G., Peduto, T. and Gordon, E. (2000) The neural correlates of orienting: an integration of fMRI and skin conductance orienting. Neuroreport, 11: 3011–3015. Williams, M.A., McGlone, F., Abbott, D.F. and Mattingley, J.B. (2005) Differential amygdala responses to happy and fearful facial expressions depend on selective attention. Neuroimage, 24: 417–425. Williams, L.M., Phillips, M.L., Brammer, M.J., Skerrett, D., Lagopoulos, J., Rennie, C., Bahramali, H., Olivieri, G., David, A.S., Peduto, A. and Gordon, E. (2001) Arousal dissociates amygdala and hippocampal fear responses: evidence from simultaneous fMRI and skin conductance recording. Neuroimage, 14: 1070–1079. Winston, J.S., O'Doherty, J. and Dolan, R.J. (2003) Common and distinct neural responses during direct and incidental processing of multiple facial emotions. Neuroimage, 20: 84–97. Winston, J.S., Strange, B.A., O'Doherty, J. and Dolan, R.J. (2002) Automatic and intentional brain responses during evaluation of trustworthiness of faces. Nat. Neurosci., 5: 277–283. Young, A.W., Aggleton, J.P., Hellawell, D.J., Johnson, M., Broks, P. and Hanley, J.R. (1995) Face processing impairments after amygdalotomy. Brain, 118(Pt 1): 15–24.
CHAPTER 21
Towards a unifying neural theory of social cognition Christian Keysers and Valeria Gazzola BCN Neuro-Imaging-Centre, University Medical Center Groningen, University of Groningen, A. Deusinglaan 2, 9713AW Groningen, The Netherlands
Abstract: Humans can effortlessly understand a lot of what is going on in other people's minds. Understanding the neural basis of this capacity has proven quite difficult. Since the discovery of mirror neurons, a number of successful experiments have approached the question of how we understand the actions of others from the perspective of sharing their actions. Recently we have demonstrated that a similar logic may apply to understanding the emotions and sensations of others. Here, we therefore review evidence that a single mechanism (shared circuits) applies to actions, sensations and emotions: witnessing the actions, sensations and emotions of other individuals activates brain areas normally involved in performing the same actions and feeling the same sensations and emotions. We propose that these circuits, shared between the first (I do, I feel) and third person perspective (seeing her do, seeing her feel), translate the vision and sound of what other people do and feel into the language of the observer's own actions and feelings. This translation could help us understand the actions and feelings of others by providing intuitive insights into their inner life. We propose a mechanism for the development of shared circuits on the basis of Hebbian learning, and underline that shared circuits could integrate with more cognitive functions during social cognition.

Keywords: mirror system; social cognition; emotions; actions; sensations; empathy; theory of mind

Humans are exquisitely social animals. The progress of our species and technology is based on our capacity for social learning. Social learning and skilled social interactions rest upon our capacity to gain insights into the mind of others. Not surprisingly, humans are indeed excellent at understanding the inner life of others. This is exemplified in our inner experience of watching a Hollywood feature film: we relax while effortlessly attributing a vast range of emotions and motivations to the main character simply by witnessing the actions of the character, and the events that occur to him. Not only do we feel that we need very little explicit thought to understand the actors, we actually share their emotions and motivations: our hands sweat and our heart beats faster while we see actors slip off the roof, we shiver if we see an actor cut himself, we grimace in disgust as the character has to eat disgusting food. This sharing experience begs two related questions: How do we manage to slip into the skin of other people so effortlessly? Why do we share the experiences we observe instead of simply understanding them? The goal of this chapter will be to propose that a single principle — shared circuits — could provide a unifying perspective on both of these questions. To foreshadow the main message of our proposal, we claim that a circuit composed of the temporal lobe (area STS (superior temporal sulcus) in monkeys or MTG (middle temporal gyrus) in humans), the rostral inferior parietal lobule (PF/IPL) and the ventral premotor cortex (F5/BA44+6) is involved both in our own actions and those of others, thereby forming a shared circuit for performing and observing actions. We will show that
the somatosensory cortices are involved both in experiencing touch on our own body and in viewing other human beings or objects being touched; that the anterior cingulate and insular cortices are involved in the experience of pain, and the perception of other people’s pain; and finally that the anterior insula is also involved both in the experience of disgust and in the observation of disgust in others (for the case of actions, this model is similar to those put forward by other authors: Gallese et al. (2004), Rizzolatti and Craighero (2004) and Hurley S.L. [http://www.warwick.ac.uk/staff/S.L.Hurley]). Common to all these cases is that some of the brain areas involved in the first person perspective (I do or I feel) are also involved in the third person perspective (she does or she feels). We will argue that this sharing transforms what we see other people do or feel into something very well known to us: what we do and feel ourselves. By doing so it provides an intuitive grasp of the inner life of others. We will review separately key evidence for shared circuits for actions, sensations and emotions. We will then show that these systems appear to generalize beyond the animate world. We will conclude by suggesting how Hebbian learning could account for the emergence of these shared circuits.
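The Hebbian mechanism promised in the abstract, and developed at the end of this chapter, can be previewed with a toy simulation: if we assume a visual unit responding to the sight of a grasp and a motor unit active during grasping are repeatedly co-active whenever the agent watches its own actions, a simple Hebbian rule (Δw = η · pre · post) strengthens the visual-to-motor connection until the sight of a grasp alone can drive the motor program. All units and parameters below are invented for illustration; the actual proposal concerns the STS-PF-F5 circuit rather than two abstract units.

```python
import numpy as np

rng = np.random.default_rng(3)
w = 0.0      # visual -> motor synaptic weight
eta = 0.05   # learning rate (arbitrary)

for _ in range(200):
    executes = rng.random() < 0.5        # the agent performs a grasp...
    sees_self = executes                 # ...and sees itself doing so
    visual = 1.0 if sees_self else 0.0   # visual "sight of grasp" unit
    motor = 1.0 if executes else 0.0     # motor "grasp program" unit
    w += eta * visual * motor            # Hebbian update: fire together, wire together
    w = min(w, 1.0)                      # crude saturation

# After learning, the sight of another's grasp (visual input alone)
# drives the motor program: a mirror-like response.
print("motor drive from observation alone:", round(w * 1.0, 2))
```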
Shared circuits for actions

The first evidence that certain brain areas might be involved in processing both the first and third person perspectives comes from the study of actions in monkeys. Understanding the actions of others is a pragmatic need of social life. Surprisingly, some areas involved in the monkey's own actions are activated by the sight of someone else's actions (di Pellegrino et al., 1992; Gallese et al., 1996). Today, we are starting to understand more about the circuitry that might be responsible for the emergence of this phenomenon (Keysers et al., 2004a; Keysers and Perrett, 2004). Imaging studies suggest that a similar system exists in humans (see Rizzolatti and Craighero, 2004 and Rizzolatti et al., 2001 for a review).
Primates

Three brain areas have been shown to contain neurons that are selectively activated by the sight of the actions of other individuals: the STS (Bruce et al., 1981; Perrett et al., 1985, 1989; Oram and Perrett, 1994, 1996), the anterior inferior parietal lobule (an area sometimes called 7b and sometimes PF, but the two names refer to the same area, and we will use PF in this manuscript; Gallese et al., 2002) and the ventral premotor cortex (area F5; di Pellegrino et al., 1992; Gallese et al., 1996; Rizzolatti et al., 1996; Keysers et al., 2003) (Fig. 1). These three brain areas are anatomically interconnected: STS has reciprocal connections with PF (Seltzer and Pandya, 1978; Selemon and Goldman-Rakic, 1988; Harries and Perrett, 1991; Seltzer and Pandya, 1994; Rizzolatti and Matelli, 2003) and PF is reciprocally connected with F5 (Matelli et al., 1986; Luppino et al., 1999; Rizzolatti and Luppino, 2001; Tanne-Gariepy et al., 2002), while there are no direct connections between F5 and the STS (see Keysers and Perrett, in press, for a recent review).

Fig. 1. (a) Lateral view of the macaque brain with the location of F5, PF and STS together with their anatomical connections (arrows). The following sulci are shown: a = arcuate, c = central, ip = intraparietal, s = sylvian sulcus. (b) Corresponding view of the human brain.

All three areas contain neurons that appear to respond selectively to the sight of hand–object interactions, with particular neurons responding to the sight of particular actions, such as grasping, tearing or manipulating (Perrett et al., 1989; di Pellegrino et al., 1992; Gallese et al., 1996, 2002; Keysers et al., 2003). There is however a fundamental difference among the three areas. Virtually all neurons in F5 that respond when the monkey observes another individual perform a particular action also respond when the monkey performs the same action, whether he is able to see his own actions or not (Gallese et al., 1996). These neurons, called mirror neurons, therefore constitute a link between what the monkey sees other people do and what the monkey does himself. A substantial number of neurons in PF show a similar behaviour (Gallese et al., 2002). While in F5 and PF motor information has an excitatory effect on activity, the situation in the STS is quite different. None of the neurons in the STS responding to the sight of a particular action have been shown to robustly respond when the monkey performs the same action with his eyes closed (Keysers et al., 2004a; Keysers and Perrett, 2004). While some neurons in the STS
While some neurons in the STS respond similarly when the monkey sees itself perform an action and when it sees someone else perform the same action (Perrett et al., 1989, 1990), many actually cease to respond to the sight of their preferred movement if the monkey itself is causing the movement (Hietanen and Perrett, 1993, 1996). For these latter neurons, the motor/proprioceptive signal therefore assumes an inhibitory function, in contrast to the excitatory function observed in F5 and PF. As a result, half of the cells in the STS appear to treat self and other in similar ways, while the other half sharply distinguishes other- from self-caused actions. Considering the STS-PF-F5 circuit as a whole, we therefore have a system that responds to the actions of others. Two of its components (PF and F5) link the actions of others to our own motor programs, and may therefore give us an intuitive insight into the actions of others, because they transform the sight of these actions into something very well known to us: our own actions (Gallese et al., 2004; Keysers and Perrett, in press). An essential property of mirror neurons is their congruent selectivity: if they respond more to a particular action (e.g. a precision grip) than to other actions during execution, they also respond more to that same action during observation (Gallese et al., 1996). Importantly, not all mirror neurons show the same selectivity: some are very precisely tuned to a particular action (e.g. they respond strongly to a precision grip, but not to whole-hand prehension), while others are much more broadly tuned (responding to all
kinds of grasps, but not to actions unrelated to grasping). This combination of precisely and broadly tuned neurons is very important: the precisely tuned neurons can give very detailed insights into the actions of others, but require that these actions be within the motor vocabulary of the observing monkey. The more broadly tuned neurons, on the other hand, will also respond to the sight of novel actions that are not within the monkey's motor vocabulary but resemble actions that are. Examples of the latter are the neurons responding to tool use, which have now been found in F5 (Ferrari et al., 2005): the monkeys used in this experiment had never used tools (e.g. a pincer), and yet the sight of someone using a tool activated some F5 neurons that responded when the monkey performed similar but different actions (grasping with its hands). The STS-PF-F5 circuit also responds in cases where we recognize the actions of others but cannot fully see them. In the STS, some neurons respond strongly to the invisible presence of a human hiding behind an occluding screen in a particular location, while the same human hiding in a different location often causes no response (Baker et al., 2001; Fig. 2a). Although this capacity has been demonstrated for hidden humans, similar responses may exist for hidden objects.
Fig. 2. (a) Response of a neuron in the STS while the monkey observes a human walk towards, hide behind and then reappear from an occluding screen. The top and bottom histograms show its activity when the human hides behind the left and centre occluder, respectively (see cartoon on the left). The different experimental phases are shown on top, coded as a white background when the subject is fully visible, light grey when partially occluded and dark grey when fully occluded by the screen. The discharge is stronger in the top than in the bottom occluded phase, although in both cases only three occluders were visible, without any visible individual (Baker et al., 2001). (b) Response of an F5 neuron while a human demonstrator grasps behind an occluding screen. In the top but not the bottom case, the monkey previously saw an object being placed on a tray before the occluder was slid in front of the tray. The discharge starting as the hand begins to be occluded (light and dark grey background) is much stronger in the top case, although at that moment both visual stimuli (top and bottom) are identical (Umilta et al., 2001). The scales differ between (a) and (b).
In F5, about half of the mirror neurons that respond when the monkey himself grasps an object also respond to the sight of a human reaching behind an occluder, but only when the monkey previously saw an object being placed behind the occluder (Umilta et al., 2001; Fig. 2b). This observation raises the question of where the information necessary for this type of F5 response originates. As shown above, the STS could provide a representation of the reaching and a representation of the hidden object. The STS-PF-F5 circuit may then extrapolate from these two pieces of information towards a complete representation of the action, causing F5 grasping neurons to fire. The circuit is particularly well suited for such extrapolations, because coding movement sequences unfolding in time is an inherent function of the premotor cortex. The same hardware could be used to extrapolate the visible beginning of a grasp into the full action. Many important actions around us are not fully visible: a leopard may be approaching a monkey, intermittently disappearing
behind trees. In such cases, understanding the leopard's action, although it is not fully visible, can make the difference between life and death for the observing monkey. Both the STS and F5 also contain neurons that respond to the sound of actions. Neurons have been found in the STS that respond to the sound and/or the sight of walking, with much smaller responses to other actions such as clapping hands (Barraclough et al., 2005; Fig. 3a). Similar neurons have been found in F5, but responding to seeing and/or hearing a peanut being broken (Fig. 3b; Kohler et al., 2002; Keysers et al., 2003). The latter neurons in F5 also respond when the monkey breaks a rubber peanut out of sight (i.e. without sound or vision of his own action).
Fig. 3. (a) Response of an STS neuron while the monkey heard (S = sound), saw (V = vision) or saw and heard (V+S) an experimenter walk. Note the strong response in all three cases. (b) Response of an F5 neuron in the same three conditions, but for the action of breaking a peanut. This neuron also responded while the monkey broke a rubber peanut out of sight. The curves at the bottom are sonographs (figure adapted from Keysers and Perrett, in press).
It therefore appears that the entire STS-PF-F5 circuit is multimodal: some of its neurons respond in similar ways to an action independently of whether it is seen or heard. Given its connections to both the auditory and the visual cortices, the STS appears to be a likely site for this audiovisual integration (see Ethofer and Wildgruber, this volume). In the STS-PF-F5 circuit, this audiovisual action representation then appears to be integrated with the motor program of the matching action. With such a multimodal system, the mere sound of someone knocking on the door would activate a multimodal, audio-visuo-motor representation of the action, leading to a deep understanding and sharing of the heard action. Indeed, mirror neurons with audiovisual properties can discriminate which of two actions was performed by an actor with >90% accuracy based on either the sound or the vision of the action alone (Keysers et al., 2003).
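To make this discrimination figure concrete, consider the following minimal sketch in Python. It is not the analysis of Keysers et al. (2003); the spike counts and the leave-one-out nearest-centroid rule are illustrative assumptions, showing how one could estimate, from single-neuron responses alone, how well two actions can be told apart.

    import statistics

    def decode_accuracy(trials_a, trials_b):
        # Leave-one-out nearest-centroid decoding of two actions from the
        # trial-by-trial spike counts of one hypothetical audiovisual neuron.
        correct, total = 0, 0
        for label, own, other in (("a", trials_a, trials_b),
                                  ("b", trials_b, trials_a)):
            for i, count in enumerate(own):
                own_mean = statistics.mean(own[:i] + own[i + 1:])
                other_mean = statistics.mean(other)
                if abs(count - own_mean) <= abs(count - other_mean):
                    guess = label
                else:
                    guess = "b" if label == "a" else "a"
                correct += guess == label
                total += 1
        return correct / total

    # Invented spike counts: the neuron fires more to peanut-breaking
    # than to paper-ripping, whether the action is heard or seen.
    peanut = [12, 15, 11, 14, 13]
    paper = [3, 5, 4, 2, 6]
    print(decode_accuracy(peanut, paper))  # 1.0 for these toy data

With real data, overlapping response distributions would pull the accuracy below 100%; a figure above 90% thus indicates that the two actions evoke well-separated responses in such neurons.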
Humans

A mirror system similar to that found in the monkey has now been described in humans. Regarding the observation of actions, a number of imaging studies, including fMRI, PET and MEG experiments, have reported the following three areas as particularly involved in the observation of actions: the caudal inferior frontal gyrus and adjacent premotor cortex (Brodmann areas [BAs] 44 and 6), corresponding to the monkey's area F5; the rostral inferior parietal lobule (IPL), corresponding to the monkey's area PF; and caudal sectors of the temporal lobe, in particular the posterior superior temporal sulcus (pSTS) and the adjacent middle temporal gyrus (MTG), corresponding to the monkey's STS (see Fig. 1; Grafton et al., 1996; Rizzolatti et al., 1996; Decety et al., 1997; Grezes et al., 1998; Iacoboni et al., 1999; Nishitani and Hari, 2000; Buccino et al., 2001; Grezes et al., 2001; Iacoboni et al., 2001; Perani et al., 2001; Decety et al., 2002; Nishitani and Hari, 2002; Grezes et al., 2003; Manthey et al., 2003; Buccino et al., 2004b; Wheaton et al., 2004). Two of these three areas, the IPL and BA44/6, are known to play an important role in motor control. A smaller number of studies have also measured brain activity during the execution of actions in the same individuals, in order to check whether certain parts of the brain are involved both during motor execution and during the observation of similar actions (Grafton et al., 1996; Iacoboni et al., 1999; Buccino et al., 2004b). These studies found sectors of the IPL and BA44/6 to be involved both in the observation and in the execution of actions, representing a human equivalent of the monkey's mirror neurons found in PF and F5. The situation in the pSTS/MTG is less clear: Iacoboni et al. (2001) found the STS to be active both during motor execution and observation, while Grafton et al. (1996) and Buccino et al. (2004b) failed to find robust STS activation during motor execution. Two explanations have been offered for this STS/MTG activation during the
execution of actions. The first holds that an efference copy of executed actions is sent to congruent visual neurons in the STS/MTG to create a forward model of what the action should look like (Iacoboni et al., 2001). The second, based on the fact that in monkeys the execution of actions reduces the spiking activity of STS neurons, holds that an efference copy is sent in order to cancel the visual consequences of our own actions (Keysers and Perrett, in press). Why, though, should a reduction in spiking show up as an increase in blood oxygen level dependent (BOLD) signal? Logothetis (2003) has suggested that the BOLD effect is dominated by synaptic activity, not spiking activity; the metabolic demands of inhibitory synaptic input could thus outweigh the reduction in spiking and be measured as an overall increase in the BOLD signal (but see Waldvogel et al., 2000). Either way, the STS/MTG is an important element of the 'mirror circuitry' involved both in the observation and the execution of actions (Keysers and Perrett, in press). A key property of the mirror system in monkeys is its congruent selectivity: mirror neurons responding, for instance, more to a precision grip than to whole-hand prehension during motor execution also respond more to the observation of a precision grip than of a whole-hand prehension (Gallese et al., 1996). Can the same be demonstrated for the human mirror system? A promising avenue for demonstrating such selectivity stems from studies of somatotopy in premotor activations. Buccino et al. (2001) and Wheaton et al. (2004) showed participants foot, hand and mouth actions, and observed that these actions activated partially distinct cortical sites. They interpreted these activations as reflecting the mapping of observed hand actions onto the execution of hand actions, and likewise for foot and mouth actions. Unfortunately, neither of these studies contained a motor execution task, so neither establishes the congruence of the somatotopical organization during observation and execution. Leslie et al. (2004) asked participants to imitate facial and manual actions, and observed patches of premotor cortex involved in either manual or facial imitation. Unfortunately, they did not separate the vision of faces/hands from the motor
execution, and therefore congruent somatotopy cannot be proven by their study either. It is noteworthy that during motor execution in other studies (e.g. Rijntjes et al., 1999; Hauk et al., 2004), a somatotopy for action execution was observed that apparently resembles the visual somatotopy found in the above-cited studies. Corroborating evidence for the existence of selective mirror neurons in humans stems from a number of transcranial magnetic stimulation (TMS) studies (Fadiga et al., 1995; Gangitano et al., 2001; see Fadiga et al., 2005 for a review), which suggest that observing particular hand/arm movements selectively facilitates the specific muscles involved in executing those same movements. Evidence that BA44 is essential for recognizing the actions of others comes from studies showing that patients with premotor lesions have deficits in pantomime recognition that cannot be accounted for by verbal problems alone (Bell, 1994; Halsband et al., 2001). Also, virtual lesions of BA44 induced by repetitive TMS impair the capacity to imitate actions, even though they do not impair the capacity to perform the same actions when cued by spatial stimuli instead of a demonstrator's actions (Heiser et al., 2003). The mirror system in monkeys was shown to also respond to the sound of actions (Kohler et al., 2002; Keysers et al., 2003). In a recent study, we demonstrated that a similar system also exists in humans (Gazzola et al., 2005). In this study, the same participants were scanned during the execution of hand and mouth actions and while they listened to the sounds of similar actions. The entire circuit composed of MTG-IPL-BA44/6 responded both during the execution and during the sound of hand and mouth actions. Most importantly, the voxels in the premotor cortex that responded more to the sound of hand actions than to mouth actions also responded more during the execution of hand actions than of mouth actions, and vice versa for mouth actions, demonstrating for the first time a somatotopical organization of the mirror system in humans, albeit for sounds. If the observation of other individuals' actions is mapped onto our own motor programs, one may wonder how the perception of actions changes
when we acquire new skills. Seung et al. (2005) and Bangert et al. (2006) showed that pianists demonstrate stronger activations of BA6/44, IPL and MTG than nonpianists while listening to piano pieces, suggesting that acquiring the novel motor skill of piano playing also enhanced the auditory mirror representation of these actions during listening, an observation that might relate to the fact that pianists often find it harder to keep their fingers still while listening to piano pieces. Calvo-Merino et al. (2005) showed male and female dancers dance movements that are specific to one of the two genders. They found that female dancers activated their premotor cortex more when viewing female dance moves, and male dancers more when viewing male dance moves. This finding is particularly important because male and female dancers rehearse together and therefore have similar degrees of visual expertise with both types of movement, but motor expertise only for their own gender-specific movements. The premotor differences observed therefore truly relate to motor expertise. It is interesting that, although in both examples responses were stronger in experts than in nonexperts, mirror activity was not absent in people devoid of firsthand motor expertise with the precise actions they were witnessing. These weaker activations probably reflect the activity of more broadly tuned mirror neurons (Gallese et al., 1996) that may discharge maximally to other, similar actions (e.g. walking, jumping) but also respond slightly to these different actions (e.g. a specific dance move involving steps and jumps). With these more widely tuned neurons, we can gain insight into actions that are novel to us by drawing on analogies with similar actions already within our motor vocabulary.
Conclusions

Both monkeys and humans appear to activate a circuit composed of temporal, parietal and frontal neurons while observing the actions of others. The frontal and parietal nodes of this circuit are active both when subjects perform an action and when they perceive someone else perform a similar action. These nodes are therefore shared between
the observation and execution of actions, and will be termed 'shared circuits for actions'. The implications of having shared circuits for actions are widespread. By transforming the sight of someone's actions into our motor representation of those actions, we achieve a simple yet very powerful understanding of the actions of others (Gallese et al., 1996; Keysers, 2003; Gallese et al., 2004). In addition to providing insights into the actions of others, activating motor programs similar to the ones we have observed or heard is of obvious utility for imitating the actions of others, and shared circuits for actions have indeed been reported to be particularly active during the imitation of actions (Iacoboni et al., 1999; Buccino et al., 2004b). Finally, as will be discussed below in more detail, by associating the execution and the sound of actions, mirror neurons might be essential for the acquisition of spoken language (Kohler et al., 2002; Keysers et al., 2003).
Sensations

Observation and experience of touch

If shared circuits are essential to our understanding of the actions of others, what about the sensations of others? If we see a spider crawling on James Bond's chest in the movie Dr. No, we literally shiver, as if the spider were crawling on our own skin. What brain mechanisms might be responsible for this automatic sharing of the sensations of others? Might shared circuits exist for the sensation of touch? To investigate this possibility, we showed subjects movies of other people being touched on their legs. In control movies, the same legs were approached by an object but never touched. Finally, in separate runs, we touched the legs of the participants themselves. We found that touching the subjects' own legs activated their primary and secondary somatosensory cortices. Most interestingly, we found that large extents of the secondary somatosensory cortex also responded to the sight of someone else's legs being touched. The control movies produced much smaller activations (Fig. 4; Keysers et al., 2004b).
Fig. 4. Brain activity when a human is touched on his leg in the scanner (red), and when he sees another individual being touched on the leg (blue). The white voxels are active in both cases (adapted from Keysers et al., 2004b). The right hemisphere is shown on the right of the figure (neurological convention).
Intrigued by the observation of patient C, who reported that when she sees someone else being touched on the face she literally feels the touch on her own skin, Blakemore et al. (2005) scanned both C and a group of normal controls while touching them on their faces and necks. In a subsequent session, they showed video clips of someone else being touched on the same locations. As in our study, the experience of touch activated primary and secondary somatosensory cortices. During observation, they found SI and SII activation. In C, these activations were significantly stronger, potentially explaining why she literally felt the touch that happened to others. It therefore appears that seeing someone else being touched activates a somatosensory representation of touch in the observer, as if he or she had been touched. This finding is particularly important because it demonstrates that the concept of shared circuits, originally put forward for actions, also applies to a very different system: that of touch.

From touch to pain
Painful stimulation of the skin and the observation of similar stimulation applied to others also appear to share a common circuitry, including the anterior cingulate cortex (ACC) and the anterior insula. First, a neuron was recorded in the ACC that responded both to pinpricking of the patient's hand and to the sight of the surgeon pinpricking himself (Hutchison et al., 1999). Later, this anecdotal finding was corroborated by an elegant fMRI investigation in which, on some trials, the participant received a small electroshock to her hand, while on other trials she saw a signal on a screen indicating that her partner was receiving a similar electroshock. Some voxels in the ACC and the anterior insula were activated in both cases (Singer et al., 2004), and the amount of that activation correlated with how empathic the subjects were according to two paper-and-pencil empathy scales that specifically measure how much an observer shares the emotional distress of others. The presence of activations in the anterior cingulate and anterior insula during the observation of pain occurring to others was corroborated by Jackson et al. (2005). In a TMS study, Avenanti et al. (2005) observed that seeing someone else being pinpricked on the hand selectively modulated TMS-induced motor responses of the hand, suggesting that the sharing of pain influences the motor behaviour of the observer. This observation supports the existence of cross-talk between different shared circuits.
Emotions

The insula and disgust

Do shared circuits also exist for emotions? A series of elegant imaging studies by Phillips and collaborators (Phillips et al., 1997, 1998) suggested that the anterior insula is implicated in the perception of the disgusted facial expressions of others. The same area has been implicated in the experience of disgust (Small et al., 2003). In addition, both Calder et al. (2000) and Adolphs et al. (2003) reported patients with insular lesions who had lost both
the capacity to experience disgust and the capacity to recognize disgust in the faces of others. It therefore appears that the insula may provide a shared circuit for the experience and the perception of disgust. Using fMRI, we measured brain activity while subjects viewed short movie clips of actors sniffing the content of a glass and reacting with a pleased, neutral or disgusted facial expression. Thereafter, we exposed the subjects to pleasant or disgusting odorants through an anaesthesia mask; the latter manipulation induced the experience of disgust in the subjects. We found that the anterior insula was activated both by the experience of disgust and by the observation of the disgusted facial expressions of others (Wicker et al., 2003) (Fig. 5, yellow outlines). These voxels were not significantly activated by the pleasant odorants or by the sight of the pleased facial expressions of others. We then superimposed the location of the voxels involved in the experience of disgust and in the observation of disgust onto an MRI image of a patient with insular damage who reported a reduced experience of disgust and a deficient capacity to recognize disgust in others (Fig. 5, blue outline; Calder et al., 2000). The lesion encompassed our activations. Penfield and Faulk (1955) demonstrated that electrical stimulation of the anterior insula can cause sensations of nausea, supporting the idea that the observation of the disgusted facial expressions of others actually triggers an internal representation of nausea in the observer. It therefore appears that the anterior insula indeed forms a shared circuit for the first and third person perspectives of disgust, a conclusion corroborated by electrophysiological studies (Krolak-Salmon et al., 2003). The lesion data support the idea that this circuit is indeed necessary for our understanding of disgust in others. Interestingly, just as we showed for the shared circuits for actions, the insula also appears to receive auditory information about the disgusted emotional state of others. Adolphs et al. (2003) showed that their patient B, who had extensive insular lesions, was unable to recognize disgust even when it was acted out with distinctive sounds, such as retching, and with vocal prosody. Imaging studies have so far failed to find insular activation to vocal expressions of disgust (Phillips et al., 1998).
Fig. 5. Sagittal T1-weighted anatomical MRI of patient NK (Calder et al., 2000) normalized to MNI space. The blue outline marks the zone of the left insular infarction. The red outline shows the zone we found to be activated during the experience of disgust; the yellow outline indicates those zones found to be common to this experience and the observation of someone else’s facial expression of disgust (Wicker et al., 2003). Adapted from Gallese et al. (2004).
The amygdala and fear

A similar logic has been put forward for the relationship between fear and the amygdala, suggesting that the amygdala responds both to the sight of fearful facial expressions and during the experience of fear. According to this logic, without the amygdala, both the capacity to perceive fear in the faces of others and the capacity to experience fear would be greatly affected. That literature is currently undergoing re-evaluation (Spezio and Adolphs, this volume). Below we describe the arguments first in favour of, and then against, the role of the amygdala as a central site for both the experience and the recognition of fear. For: Anatomically, the amygdala is linked both to face processing and to bodily states. The amygdala is a complex anatomical structure that receives highly processed sensory information from higher sensory cortices (Amaral and Price, 1984), including the temporal lobe, where single neurons respond to the sight of faces and facial expressions (Perrett et al., 1984; Hasselmo et al., 1989).
Table 1. The amygdala and the emotion of fear

Subject   Damage (left)   Damage (right)   Etiology   Fear deficit   Other deficits                     References
SM        +++             +++              UW         Yes            Surprised                          a,b,c,g
JM        +++             +++              E          Yes            Sad, disgusted, angry              c,g
RH        +++             +++              E          No             Angry                              c,g
SE        +++             +++              E          Yes            Surprised                          d,g
DR        ++              +                S          Yes            Sad, disgusted, angry, surprised   e,g
GT        +++             +++              E          No             Angry                              f,g
EP        +++             +++              E          No             Sad, disgusted                     f,g
SP        ++              +++              S          Yes            Sad, disgusted, angry              g
DBB       +++             ++               S          No             Sad                                g
NM        ++              +++              ?          Yes            Angry                              h
SZ        +++             ++               E          No             Angry                              k
JC        ++              +++              E          Yes            –                                  i
YW        ++              +++              E          Yes            –                                  i
RB        +++             –                           Yes            –                                  i
JK        ++              ++               UW         No             –                                  j
MA        +++             +++              UW         No             –                                  j
FC        +++             +++              UW         No             –                                  j
AF        +++             +++              UW         Yes            –                                  j
AW        +++             +++              UW         No             –                                  j
EW        +++             +++              UW         No             –                                  j
WS        +++             +++              UW         No             –                                  j
AvdW      ++              ++               UW         Yes            –                                  j
RL        ++              ++               UW         No             –                                  j
BR        +++             +++              UW         Yes            –                                  j
Note: A number of neuropsychological studies have asked subjects with bilateral amygdala damage to rate how afraid six photographs from the Ekman series of emotional facial expressions looked. The table reviews all these studies, reporting for each patient whether he or she rated these facial expressions as less afraid than healthy control subjects did. This information is taken from the referenced publications, except for patients JK to BR, for whom the original publication (Ref. j) reported only group data. M. Siebert and H. Markowitsch gave us the single-subject ratings of their patients and healthy subjects, and we considered deficient those patients whose ratings fell more than 1.64 standard deviations below those of the healthy controls. In total, 12 of 24 subjects with bilateral amygdala damage rated scared facial expressions as less afraid than normal subjects do. Abbreviations: '–': no damage, or no deficit; '+': minimal damage; '++': partial damage; '+++': extensive or complete damage; UW: Urbach-Wiethe disease, a congenital disease that causes bilateral calcification of the amygdala; E: encephalitis, usually affecting extensive regions of the brain; S: surgical removal, usually for the treatment of epilepsy; ?: unknown. References: a: Adolphs et al. (1994); b: Adolphs et al. (1995); c: Adolphs et al. (1998); d: Calder et al. (1996); e: Young et al. (1995); f: Hamann et al. (1996); g: Adolphs et al. (1999); h: Sprengelmeyer et al. (1999); i: Broks et al. (1998); j: Siebert et al. (2003); k: Adolphs and Tranel (2003).
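The 1.64-standard-deviation criterion applied to patients JK to BR corresponds to a one-tailed cutoff at roughly the 5th percentile of a normal distribution. The following minimal sketch makes the cutoff concrete; the ratings are invented for illustration only.

    import statistics

    def is_deficient(patient_rating, control_ratings, z_cutoff=1.64):
        # Flag a patient whose fear rating falls more than z_cutoff standard
        # deviations below the control mean (~5th percentile if normal).
        mean = statistics.mean(control_ratings)
        sd = statistics.stdev(control_ratings)  # sample SD of the controls
        return patient_rating < mean - z_cutoff * sd

    # Invented example: controls rate Ekman fear faces around 7 on a 9-point scale.
    controls = [6.8, 7.2, 7.0, 6.5, 7.4, 6.9, 7.1]
    print(is_deficient(5.2, controls))  # True: well below the control range
    print(is_deficient(6.8, controls))  # False: within the control range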
These connections would enable the amygdala to process facial expressions. It sends fibres back to subcortical structures such as the hypothalamus, enabling it to induce the kinds of changes in body state that are so typical of fear. It also sends fibres back to the cortex, including the STS, which could enable it to influence the way faces are processed. In humans, bilateral amygdala damage does affect the capacity to recognize fear in the faces of other individuals, but only in about half of the subjects. A review of the literature reveals reports of 24 subjects with bilateral amygdala damage (see Table 1). When asked to rate how afraid, angry, sad, happy, disgusted or surprised the emotional facial photographs of Ekman and Friesen
(1976) appeared, 12 of the 24 subjects rated facial expressions of fear as less afraid than did control subjects without bilateral amygdala lesions (see Table 1). This 'fear-blindness' was not due to general face recognition deficits (the patients never had problems recognizing happy faces as happy), nor was it due to the patients not understanding the concept of fear (all patients specifically tested could provide plausible scenarios of situations in which people are scared). Other negative emotions, such as anger, were often also affected. Imaging studies using static facial expressions corroborate the idea that the amygdala is important for the perception of fear in others: in the
majority of cases, the amygdala was activated preferentially when subjects viewed fearful or angry facial expressions compared to neutral ones (Zald, 2003). Studies using movies provide a different message (see below). Lesions of the amygdala also corroborate, to some extent, the idea of its involvement in generating fear. Monkeys with amygdala lesions appear disinhibited: unlike their unlesioned counterparts, they immediately engage in social contact with total strangers and in play with normally scary objects such as rubber snakes, as if, without an amygdala, the monkeys fail to be scared of other individuals and objects (Amaral et al., 2003). In addition, three of the amygdala patients of Table 1 (SM, NM and YW) were tested with regard to their own emotion of fear. SM appears to have reduced psychophysiological reactions to fear (Adolphs et al., 1996); NM remembers having been scared only once in his life and enjoyed activities that would be terrifying to most of us (e.g. bear hunting in Siberia and hanging from a helicopter; Sprengelmeyer et al., 1999, p. 2455); YW did not even experience fear while being mugged at night. This suggests that, without the amygdala, the subjective experience of fear is different and reduced. Electrical stimulation of the amygdala in humans leads to a variety of experiences, but whenever it evoked an emotion, that emotion was fear (Halgren et al., 1978). Taken together with the neuroimaging data in humans and the lesion data in monkeys, the amygdala thus appears to be important for the normal association of stimuli with our personal, first person perspective of fear. The role of the amygdala in experiencing fear is corroborated by a number of imaging studies. Arachnophobic individuals, when viewing spiders, experience more fear and show stronger BOLD signals in their amygdala than control subjects (Dilger et al., 2003). Cholecystokinin-tetrapeptide (CCK-4) injections induce panic attacks that are accompanied by intense feelings of fear and cause an augmentation of regional cerebral blood flow (rCBF) in the amygdala (Benkelfat et al., 1995). The above evidence therefore suggests a role for the amygdala both in the recognition and in the experience of fear. The idea of shared circuits would
require that parts of the neural representation of the experience of fear be triggered by the observation of other people's fear. This prediction receives support from a study by Williams et al. (2001), who showed subjects Ekman faces of fear while simultaneously recording brain activity and skin conductance. They found that trials in which the fear faces produced increases in skin conductance were accompanied by increased BOLD responses in the amygdala. It therefore appears that the vision of a fearful facial expression activates the amygdala and induces a body state of fear/arousal in the observer, as indicated by augmented skin conductance. This link between the amygdala and body state is also corroborated by Anders et al. (2004). Against: While there is evidence from both lesion and imaging studies supporting the dual role of the amygdala in experiencing and recognizing fear, a number of recent studies shed doubt on this interpretation. First, half of the patients with bilateral amygdalar lesions show no impairment in rating fear in fearful faces, and authors have failed to find etiological or anatomical differences between the patients with and without fear-blindness (Adolphs et al., 1998). Second, a recent study on SM, one of the subjects with bilateral amygdala damage, indicates that her problem in identifying the expression of fear in others is not due to an inability to recognize fear per se, but to an inappropriate exploration of the stimuli (Adolphs et al., 2005): unlike control individuals, she failed to look at the eye region of photographs. When she was encouraged to do so, her fear recognition became entirely normal. In the context of the connections of the amygdala with the STS, the function of the amygdala may thus not be to recognize the facial expression of fear, but to render the eye region of facial expressions a salient stimulus, selectively biasing stimulus processing in the STS towards the eye region (see also Spezio and Adolphs, this volume). If the amygdala is indeed not responsible for the recognition of fear, but only for orienting visual inspection towards the eye region, one would predict equal activation of the amygdala to all facial expressions. While this is often not the case when
static images of facial expressions are used (see Phan et al., 2004 for a review), using short movies of facial expressions we found that the amygdala was activated similarly by all facial expressions, be they emotional or not (van der Gaag et al., 2005). We used movies of happiness, disgust, fear and a neutral expression that contained as much movement as the other facial expressions (blowing up the cheeks). This finding casts doubt on the idea that the amygdala shows direct fear selectivity, and supports the idea that the amygdala participates in the processing of all facial expressions (for instance, by biasing visual processing towards the eyes). Why we found neutral faces to cause as much activation as emotional and fearful expressions using movies, while studies using static stimuli have often reported differences, remains to be fully understood. Ecologically, facial expressions are dynamic stimuli, not still photographs: the task of detecting emotions from photos is evolutionarily rather new. We thus suggest that the lack of amygdalar selectivity found using movies, although in need of replication, may be a more valid picture of amygdalar function than the selectivity often observed using photographs. Doubt must also be cast on the importance of the amygdala in feeling fear. Monkeys with very early amygdala lesions still demonstrate signs of panic, although these occur in contexts that do not normally induce fear (Amaral et al., 2003). In addition, there is no good evidence that patient SM completely lacks fear as an emotion, although it may well be that she does not exhibit fear appropriately in context; this is a difficult issue to measure in humans and remains unresolved (Adolphs, personal communication). Anderson and Phelps (2002) have assessed this question in patients with amygdala damage, and also found no evidence that they lack fear as an emotion. Together, it might thus be speculated that the amygdala has a role both in the normal experience of fear and in the recognition of fear in others, but that this role may be indirect: focusing gaze on the eye region, and linking normally fear-producing stimuli with other brain areas that, in turn, are responsible for fear. The amygdala may thus be part of a circuit that enables us to share the fear of other individuals,
but its role in doing so may be indirect, by biasing attention towards informative sections of facial stimuli and by relaying information towards brain areas responsible for the experience of fear. The other nodes of this circuitry remain to be investigated.
Shared circuits for actions, sensations and emotions

Summarizing the above evidence, it appears that in three domains (actions, sensations and emotions) certain brain areas are involved both in the first person experience (I do, I feel) and in the third person perspective (knowing what he does or what he feels). These areas or circuits, which we call shared circuits, are the premotor cortex and inferior parietal lobule, interconnected with the STS/MTG, for actions; the insula for the emotion of disgust; the ACC and the anterior insula for pain; and the somatosensory cortices for touch. Possibly, the amygdala is part of a shared circuit for fear. In all these cases, observing what other people do or feel is therefore transformed into an inner representation of what we would do or feel in a similar situation, as if we were in the skin of the person we observe. The idea of shared circuits, initially put forward for actions (Gallese and Goldman, 1998), therefore appears to apply much more broadly. In the light of this evidence, it appears that social situations are processed by the STS to a high degree of sophistication, including multimodal, audiovisual representations of complex actions. These representations privilege the third person perspective, with lesser responses if the origin of the stimulus is endogenous. Through the recruitment of shared circuits, the brain then adds specific first person elements to this description. If an action is seen, the inferior parietal and premotor areas add an inner representation of that action to the sensory third person description. If touch is witnessed, the somatosensory cortices add an inner representation of touch. If pain is witnessed, the ACC and the anterior insula add a sense of pain. If disgust is witnessed, the insula adds a sense of disgust. What emerges from the resulting neural activity is a very rich neural description of what has been perceived, adding the richness of our
subjective experience of actions, emotions and sensations to the objective visual and auditory description of what has been seen. We are not normally confused about where the third person ends and the first begins because, although the shared areas react in similar ways to our own experience and to the perception of others, many other areas clearly discriminate between the two cases. Our own actions involve strong M1 activation and weak STS activation, while those of others normally fail to activate M1 but strongly activate the STS. When we are touched, our SI is strongly active, while it is much less active when we witness touch occurring to others. Indeed, patient C, who is literally confused about who is being touched, shows reliable SI activity during the sight of touch (Blakemore et al., 2005). In this context, the distinction between self and other is quite simple, but it remains essential for a social cognition based on shared circuits to work (Gallese and Goldman, 1998; Decety and Sommerville, 2003). Some authors now search for brain areas that explicitly differentiate self from other; both the right inferior parietal lobule and the posterior cingulate gyrus have been implicated in this function (see Decety and Sommerville, 2003 and Vogt, 2005 for reviews). The account based on shared representations that we propose differs from those of other authors in that it does not assume that a particular modality is critical. Damasio and coworkers (Damasio, 2003) emphasize the importance of somatosensory representations, stating that it is only once our brain reaches a somatosensory representation of the body state of the person we observe that we understand the emotion he or she is undergoing. We, on the other hand, believe that somatosensory representations are important for understanding the somatosensory sensations of others, but may not be central to our understanding of other individuals' emotions and actions. The current proposal extends our own previous proposals (e.g. Gallese et al., 2004), in which we emphasized the motor aspect of understanding other people. We believe that motor representations are essential for understanding the actions of others, yet the activity in somatosensory cortices observed during the observation of
someone else being touched is clearly nonmotor. Instead, we think that each modality (actions, sensations and emotions) is understood and shared in our brain using its own specific circuitry. The neural representations of actions, emotions and sensations that result from the recruitment of shared circuits are then the intuitive key to understanding the other person, without these representations necessarily having to pass through a common somatosensory or motor code to be interpreted. Of course, many social situations are complex and involve multiple modalities: witnessing someone hitting his finger with a hammer involves an action, an emotion and a sensation. In most social situations, the different shared circuits mentioned above thus work in concert. Once shared circuits have transformed the actions, sensations and emotions of others into our own representations of actions, sensations and emotions, understanding other people boils down to understanding ourselves: our own actions, sensations and emotions, an aspect to which we will return later in relation to theory of mind.
Demystifying shared circuits through a Hebbian perspective

Neuroscientific evidence for the existence of shared circuits is rapidly accumulating, and the importance of these circuits for social cognition is evident. Yet, for many readers, the existence of single neurons responding to the sight, sound and execution of an action, to take a single example, remains a very odd observation. How can single neurons with such marvellous capacities emerge? The plausibility of a neuroscientific account of social cognition based on shared circuits stands and falls with our capacity to give a plausible explanation of how such neurons can emerge. As outlined in detail elsewhere (Keysers and Perrett, in press), we propose that shared circuits are a simple consequence of observing ourselves and others (please refer to Keysers and Perrett, in press, for citations supporting the claims put forward below). When they are young, monkeys and humans spend a lot of time watching themselves. Each time the child's hand wraps around an object and
brings it towards him, a particular set of neural activities overlaps in time. Neurons in the premotor cortex responsible for the execution of this action will be active at the same time as the audiovisual neurons in the STS responding to the sight and sound of grasping. Given that the STS and F5 are connected through PF, ideal Hebbian learning conditions are met (Hebb, 1949): what fires together wires together. As a result, the synapses going from STS grasping neurons to PF and then to F5 will be strengthened, as the grasping neurons at all three levels are repeatedly coactive. After repeated self-observation, neurons in F5 receiving the enhanced input from the STS will fire at the mere sight of grasping. Given that many neurons in the STS show reasonably viewpoint-invariant responses (Perrett et al., 1989, 1990, 1991; Logothetis et al., 1995), responding in similar ways to views of a hand taken from different perspectives, the sight of someone else grasping in a similar way then suffices to activate F5 mirror neurons. All that is required for the emergence of such mirror responses is the availability of connections between STS, PF and F5 that can show Hebbian learning, and there is evidence that Hebbian learning can occur in many places in the neocortex (Bi and Poo, 2001; Markram et al., 1997). The same Hebbian argument can be applied to sensations and emotions. While we see ourselves being touched, somatosensory activations overlap in time with visual descriptions of an object moving towards and touching our body. After Hebbian association, the sight of someone else being touched can trigger somatosensory activations (Keysers et al., 2004b; Blakemore et al., 2005). Multimodal responses are particularly important for cases where we do not usually see our own actions. Associating the sight of someone's lip movements with our own lip movements, for example, is an important step in language acquisition. How can we link the sight of another individual's mouth producing a particular sound with our own motor programs, given that we cannot usually see our own mouth movements? While we see other individuals producing certain sounds with their mouths, the sound and sight of the action are correlated in time, which can lead to STS multimodal neurons. During our own attempts to produce
sounds with our mouth, the sound and the motor program are correlated in time. As the sound recruits multimodal neurons in the STS, the established link also ties the sight of other people producing similar sounds to our motor program: the visual information rides on the wave of the auditory associations (Keysers and Perrett, in press). The case of emotions might be similar, yet slightly more difficult. How can the sight of a disgusted facial expression trigger our own emotion of disgust, given that we do not usually see our own disgusted facial expression? First, disgust often has a cause that triggers simultaneous disgust in many individuals (e.g. a disgusting smell). In this case, one's own disgust correlates directly with the disgusted facial expressions of others. Second, in parent–child relationships, facial imitation is a prominent observation (e.g. Stern, 2000). In our Hebbian perspective, this imitation means that the parent acts as a mirror for the facial expression of the child, leading again to the required correlation between the child's own emotion and the facial expression of that emotion in others. As described above, the insula indeed receives the required visual input from the STS, where neurons have been shown to respond to facial expressions (Mufson and Mesulam, 1982; Perrett et al., 1987, 1992; Puce and Perrett, 2003; Keysers et al., 2001). The insula also receives highly processed somatosensory information about our own facial expressions. These somatosensory data will be associated, in Hebbian fashion, both with the sight of other individuals' facial expressions and with our own state of disgust. After such Hebbian training, seeing someone else's facial expression may trigger a neuronal representation of the somatosensory components of our own matching facial expression. The debilitating effect of somatosensory lesions on understanding the emotions of others (Adolphs et al., 2000) may indicate that this triggering is indeed important for understanding the emotions of others. To summarize, Hebbian association (a simple and molecularly well-understood process) can therefore predict the emergence of associations between the first and third person perspectives of actions, sensations and emotions.
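The logic of this account can be caricatured in a few lines of code. The sketch below is a deliberate simplification, not a biophysical model: one visual (STS-like) input per action, one motor (F5-like) output unit, and a bounded Hebbian increment applied whenever presynaptic and postsynaptic activity coincide during self-observation.

    eta, w_max = 0.05, 1.0   # learning rate and a saturation bound on each weight

    # Weights from STS-like visual units onto an F5-like motor unit, one per
    # action; only 'grasping' is visually paired with execution below.
    w = {"grasping": 0.0, "tearing": 0.0}

    # Self-observation: while the young monkey grasps, the motor unit (post)
    # and the visual grasping unit (pre) are co-active, so that weight grows
    # ("what fires together wires together"); 'tearing' is never paired here.
    for _ in range(100):
        pre, post = 1.0, 1.0
        w["grasping"] = min(w_max, w["grasping"] + eta * pre * post)

    # Test: the monkey now watches SOMEONE ELSE act, with no motor command.
    for action, weight in w.items():
        print(action, weight * 1.0)  # grasping -> 1.0 (mirror), tearing -> 0.0

After the pairing phase, visual input alone drives the motor unit for the trained action, which is the defining mirror property; the viewpoint-invariant STS responses described above are what allow the same association to generalize from the sight of one's own hand to the sight of someone else's.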
Shared circuits and the inanimate world

The world around us is not inhabited only by other human beings: we often witness events that occur to inanimate objects. Do the shared circuits described above react to the sight of an inanimate object performing actions or being touched? To investigate the first question, we showed subjects movies of an industrial robot interacting with everyday objects (Gazzola et al., 2004, Society for Cognitive Neuroscience Annual Meeting). The robot, for instance, grasped a wine glass or closed a salt box. These actions were contrasted against the sight of a human
performing the same actions. Fig. 6a illustrates a frame from a typical stimulus, as well as the BOLD signal measured in BA44 as defined by the probabilistic maps of Amunts et al. (1999). As described above, this area is activated both during the execution of actions and during the observation of another human performing a similar action. Here, we see that the same area was also activated by the sight of a robot performing an action involving everyday objects. This result contrasts with previous reports in the literature that failed to find premotor activation to the sight of a robot performing actions (Tai et al., 2004).
Fig. 6. (a) Top: location of the right BA44 according to Amunts et al. (1999), defined as the voxels where at least 1 of her 10 subjects satisfied the cytoarchitectonic criteria for BA44. Below: the brain activity in this right BA44 for 14 subjects, expressed as parameter estimates in the GLM while subjects looked at a fixation cross or at a human or a robot opening a bottle. A star indicates a significant difference from fixation. (b) Location of the region of interest (white) defined in Keysers et al. (2004b) and, below, the mean BOLD signal of eight independent subjects while being touched on their legs, seeing another human being touched, and seeing objects being touched. All three cases differ significantly from fixation, but not from one another (adapted from Keysers et al., 2004b). All error bars denote the standard error of the mean.
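For readers unfamiliar with the 'parameter estimates' plotted in Fig. 6, the sketch below illustrates the underlying idea under strong simplifying assumptions (unconvolved boxcar regressors, no motion or drift confounds; all numbers invented): each condition is a column of a design matrix, and ordinary least squares yields one beta per condition, which is what such bar plots display.

    import numpy as np

    # Toy block design: 120 volumes, two conditions plus a constant term;
    # fixation is the implicit baseline modelled by the intercept.
    n = 120
    X = np.zeros((n, 3))            # columns: human action, robot action, constant
    X[:, 2] = 1.0
    for start in range(0, n, 40):   # alternating 10-volume blocks
        X[start:start + 10, 0] = 1.0        # 'human' blocks
        X[start + 20:start + 30, 1] = 1.0   # 'robot' blocks

    # Simulate a BA44-like voxel responding to both human and robot actions.
    rng = np.random.default_rng(0)
    y = X @ np.array([1.2, 1.0, 100.0]) + rng.normal(0.0, 0.5, n)

    # Ordinary least squares: the betas are the 'parameter estimates'.
    betas, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    print(betas[:2])   # both condition betas > 0, i.e. above the fixation baseline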
Our experiment differs in some important respects from these studies: first, we used more complex actions instead of the grasping of a ball; second, our blocks contained a variety of actions rather than the same action repeated over and over again. Both of these factors could account for the observed difference. In the light of our results, it thus appears that the shared circuit for actions responds to complex, meaningful actions regardless of whether they are performed by humans or robots. Halfway along this human–robot continuum, the premotor cortex also responds to the sight of animals of other species performing actions that resemble ours, such as biting (Buccino et al., 2004a). To test whether the secondary somatosensory cortex responds to the sight of objects being touched, we showed subjects movies of inanimate objects, such as ring binders and rolls of paper towels, being touched by a variety of rods. These conditions were compared against the activity when the subject himself was touched and when he saw another human leg being touched in similar ways. The results, shown in Fig. 6b, indicate that the SII/PV complex was at least as activated by the sight of objects being touched as by the sight of humans being touched. Blakemore et al. (2005) also showed movies of objects being touched, and found that seeing a fan being touched induced smaller activations than seeing a human face being touched. Unfortunately, the authors did not indicate whether the activation to seeing an object being touched was itself significant. Why Blakemore et al. found stronger activity for the sight of a human face being touched compared to an object, while we found similar activity for a human leg being touched compared to paper towel rolls, remains to be investigated. Together, the emerging data suggest that the sight of the actions and tactile 'experiences' of the inanimate world may be transformed into our own experience of these actions and sensations, but further investigation of this aspect is important considering the somewhat contradictory findings from different studies.

Shared circuits and communication

We have shown that the brain appears to automatically transform the visual and auditory descriptions of
the actions, sensations and emotions of others into neural representations normally associated with our own execution of similar actions and our own experience of similar sensations and emotions, and that Hebbian learning could explain how these automatic associations arise. Once these associations have been learned, they transform what other people do and feel into our own experience of these actions, sensations and emotions. This transformation represents an intuitive and powerful form of communication: it transmits the experience of doing and feeling from one brain to another. This simple form of communication has obvious adaptive value: being able to peek into someone else's mind and to share his experiences renders constructive social interactions faster and more effective. For instance, sharing the disgust of a conspecific probing a potential source of food will prevent the observer from tasting potentially damaging items. Most forms of communication face a fundamental problem: the sender transforms a content into a transmittable form according to a certain encoding procedure; the receiver then receives the encoded message and has to transform it back into the original content. How does the receiver learn to decode the message? When we learn a spoken language, we spend years of our life guessing this encoding/decoding procedure. For the case of actions, the shared circuits we propose use the correlation in time within the STS-PF-F5 circuit during self-observation to determine the reciprocal relationship between the motor representation of actions and their audiovisual consequences. Similar procedures may apply to sensations and emotions. The acquired reciprocal relationships can then be used to decode the motor, somatosensory and emotional contents contained in the behaviour of other individuals and in the situations they are exposed to (see also Leiberg and Anders, this volume). When dealing with other human beings, this decoding procedure is generally very successful. Our brain appears to use the same procedure to understand members of other species and even inanimate objects and robots. In the case of members of other animal species, the decoded motivations, emotions and feelings are anthropocentric and imperfect: when monkeys, for instance, open their
lips to show their teeth, interpreting this as a smile is a misinterpretation: it is actually a display of threat. Often, however, such interpretations will enable us to predict the forthcoming behaviour of the animal better than if we made no interpretation at all. In the case of inanimate objects, the interpretations are very often wrong (e.g. the ring binders probably 'felt' nothing when being touched, and the robot was not thirsty when it grasped the glass of wine). This overgeneralization may simply be a bug in our brain. Alternatively, it might overall be better to apply the rule of thumb 'everything is probably a bit like myself' than to make no assumption at all. A clear implication of this tendency is that, to make human–machine communication as smooth as possible, robots should be made to behave as similarly to humans as possible.
The limits of simulation — a word of caution

The shared circuits we describe have received considerable interest, and they now tend to be seen as a panacea for explaining all of social cognition. It is important to note that, while we believe shared circuits to be very important for our intuitions about the inner life of others, they cannot explain everything. We can, for instance, try to imagine what it feels like to fly like a bird, although we do not have the motor programs to do so. Such abstract imaginations are detached from our own bodily experience, and should thus not be attributed to shared circuits. We can of course imagine what it feels like to flap our hands, as children do when pretending to fly, but that would still leave us in doubt about what real flying feels like. These limitations are often cruelly clear to us during imitation: we may have watched our mothers knit, feeling that we truly understood the movement, yet when we first tried to knit something ourselves, we realized that our understanding had been quite superficial indeed, as we lacked the true motor programs onto which to mirror the actions we saw. But even with the required motor skills, we do not understand all of the inner life of other human
beings through shared circuits. C. Keysers, E. Kohler and M. Goodale (unpublished observations) have, for instance, examined brain activity while subjects watched the eye movements of other individuals, in the hope of finding evidence that brain areas such as the frontal eye field (FEF), normally responsible for our own eye movements, are critical for our understanding of the eye movements of others. We found very little evidence for such a system: the sight of eye movements activated the FEF no more than the sight of random dots moving by the same amount. Despite the difficulty of interpreting negative results, this finding is not too surprising: if two people face each other and one suddenly stares at the wall behind the other person, the other will tend to look behind himself. The motor programs involved are very different: a very small saccade for the first individual, and a turning of the head and torso for the second. With so little in common in motor terms, it makes no sense to analyse the gaze direction of others through one's own motor programs. An external frame of reference, and an analysis of gaze in this external frame, is needed to understand what the other person is looking at, a task that our motor system, working in egocentric coordinates, is very poorly equipped for. Shared circuits and mirror neurons therefore have little to contribute to this task. It will remain for future research to outline the limits of what shared circuits can explain.
Simulation and theory of mind — a hypothesis

Social cognition is not restricted to the simulations that shared circuits provide. Explicit thoughts exist in humans and clearly supplement these automatic simulations. It is hard, for instance, to imagine how gossip of the type ‘Did you know that Mary still believes that her husband is faithful, while everyone else knows that he is having an affair with another woman every week?’ could be the result of simulation, yet thinking about the (false) beliefs of others is clearly an important part of our social intelligence. The terms ‘theory of mind’ (ToM) and ‘mentalizing’ have often been used to describe the set of cognitive skills involved in thinking about the mind
of others, in particular their beliefs (Frith and Frith, 2003). People are considered to have a ToM if they are able to deal with the fact that other people can have beliefs that differ from reality, a capacity that is tested with so-called false-belief tasks such as the famous ‘Sally and Anne’ test (Baron-Cohen et al., 1985). In that test, an observer sees Sally hide an object in a basket. Sally then goes away for a while, and unbeknown to her, Anne moves the object from the basket into a box. Sally then returns, and the observer is asked: ‘where will Sally [first] look for her object?’ If the observer answers ‘in the basket, because she doesn’t know that Anne moved it’, the observer is thought to have a ToM. If the answer is ‘in the box’, the observer failed. Children from the age of 4 years pass this test, while autistic individuals often fail it even in their teens (Baron-Cohen et al., 1985). Comparing this ToM task with the tasks involved in the neuroimaging of shared circuits, it is quite clear that these two fields of research tap into phenomena that differ dramatically in the amount of explicit thought involved (see also Leiberg and Anders, this volume). In research on shared circuits, subjects simply watch short video clips of actions, emotions and sensations, without being asked to reflect upon the meaning of these stimuli or the beliefs and thoughts of the actors. In ToM tasks, subjects are directly invited to reflect about the states of mind of others. Strangely enough, a number of authors have introduced a dichotomy between simulation processes and more theory-driven processes involved in understanding others (e.g. Gallese and Goldman, 1998; Saxe, 2005), suggesting that either ToM or simulation should explain all of social cognition. Here we will attempt to provide a model proposing that ToM might utilize simulations to reflect on the mind of others. ToM-type tests have now been investigated a number of times using fMRI and PET (see Frith and Frith, 2003 for a comprehensive review), and all tasks have activated the medial prefrontal cortex (mPFC) compared to conditions requiring less mentalizing. What is intriguing is that similar sites of the mPFC are also involved in reflecting about ourselves and our own emotions (Gusnard et al., 2001; Frith and Frith, 2003), which led Uta and Chris Frith to speculate that thinking about other
people’s minds might be a process related to thinking about one’s self. Seen in the context of shared circuits, this leads to a simple framework for associating shared circuits and ToM. The mPFC may originally interpret states of our own mind and body, as evidenced by experiments such as that of Gusnard et al. (2001). In this experiment, subjects saw emotional pictures, and were asked to judge either whether the image evoked pleasant or unpleasant emotions in themselves, or whether the image was taken indoors or outdoors. The mPFC was more activated in the former task, where subjects had to introspect on their own emotions. The mPFC receives indirect input about all aspects of our own body, including motor, somatosensory and visceral representations, which could allow it to create a secondary representation of our bodily state (Frith and Frith, 2003). Considering our first-person perspective, one could thus differentiate a first-level representation, being our actions, emotions and sensations as they occur, and a second-level representation of these states in the mPFC, more related to our conscious understanding and interpretation of ourselves. To illustrate the difference: if you vomit, you will feel disgust, and activate your insula (primary representation). If asked what you feel, you may start reflecting upon what you are feeling in a more conscious way, one that you can formulate in words (‘I felt like having a stone in my stomach, I guess those mussels weren’t that fresh’), and you are likely to activate your mPFC in addition to the primary representations. This is where simulation ties into the concept of theory of mind. Through the shared circuits we have described, the actions, emotions and sensations of others are ‘translated’ into the neural language of our own actions, emotions and sensations. By doing so, they have been transformed into what we called primary representations of these states. This could generate an implicit sharing, and hence understanding, of the states of others. If asked explicitly what went on in the mind of that other person, you would need to generate a secondary, more conscious and cognitive representation of his state. Given that his state has already been translated into the language of our own states, one may hypothesize that this task
would now be no different from reflecting about your own states, and would therefore activate the same mPFC sites. Testing this hypothesis directly will be an exciting issue for future neuroimaging work. In this concept, shared circuits act like a translator, converting the states of others into our own primary state representations. Often, social processing can stop at that: we share some of the states of our partner, her/his sadness or happiness for instance, without thinking any further. These are the cases that simulation proponents have concentrated on. In some cases, we might reflect further upon her/his mind, just as we often reflect about our own states. Such reflections then provide much more elaborate, cognitive and differentiated understandings of other individuals. The latter are the processes that ToM investigators are excited about. With those mentalizing processes on top of simulation, thinking about others can reach levels of sophistication that go far beyond using simulation alone. Using simulation, we inherently assume that we are all equal. This is not the case: actions that may make us happy may make other people sad, reflecting biological and cultural differences, and keeping those differences in mind may be a critical role for higher processes (see also Leiberg and Anders, this volume).

Acknowledgements

C.K. was financed by a VIDI grant from NWO and a Marie Curie Excellence grant. The authors are particularly thankful to Michaela Siebert for sharing her raw data with us to calculate Table 1, to Andy Calder for providing the anatomical scan of patient NK, to Ralph Adolphs for helpful discussions regarding the function of the amygdala and to Vittorio Gallese, Bruno Wicker and Giacomo Rizzolatti for many inspiring discussions that led to the development of these models.

References

Adolphs, R., Damasio, H., Tranel, D., Cooper, G. and Damasio, A.R. (2000) A role for somatosensory cortices in the visual recognition of emotion as revealed by three-dimensional lesion mapping. J. Neurosci., 20: 2683–2690.
Adolphs, R., Damasio, H., Tranel, D. and Damasio, A.R. (1996) Cortical systems for the recognition of emotion in facial expressions. J. Neurosci., 16: 7678–7687. Adolphs, R., Gosselin, F., Buchanan, T.W., Tranel, D., Schyns, P. and Damasio, A.R. (2005) A mechanism for impaired fear recognition after amygdala damage. Nature, 433: 68–72. Adolphs, R. and Tranel, D. (2003) Amygdala damage impairs emotion recognition from scenes only when they contain facial expressions. Neuropsychologia, 41: 1281–1289. Adolphs, R., Tranel, D. and Damasio, A.R. (1998) The human amygdala in social judgment. Nature, 393: 470–474. Adolphs, R., Tranel, D. and Damasio, A.R. (2003) Dissociable neural systems for recognizing emotions. Brain Cogn., 52: 61–69. Adolphs, R., Tranel, D., Damasio, H. and Damasio, A. (1994) Impaired recognition of emotion in facial expressions following bilateral damage to the human amygdala. Nature, 372: 669–672. Adolphs, R., Tranel, D., Damasio, H. and Damasio, A.R. (1995) Fear and the human amygdala. J. Neurosci., 15: 5879–5891. Adolphs, R., Tranel, D., Hamann, S., Young, A.W., Calder, A.J., Phelps, E.A., Anderson, A., Lee, G.P. and Damasio, A.R. (1999) Recognition of facial emotion in nine individuals with bilateral amygdala damage. Neuropsychologia, 37: 1111–1117. Amaral, D.G., Capitanio, J.P., Jourdain, M., Mason, W.A., Mendoza, S.P. and Prather, M. (2003). The amygdala: is it an essential component of the neural network for social cognition? Neuropsychologia, 41: 235–240. Amaral, D.G. and Price, J.L. (1984) Amygdalo-cortical projections in the monkey (Macaca fascicularis). J. Comp. Neurol., 230: 465–496. Amunts, K., Schleicher, A., Burgel, U., Mohlberg, H., Uylings, H.B. and Zilles, K. (1999) Broca’s region revisited: cytoarchitecture and intersubject variability. J. Comp. Neurol., 412: 319–341. Anders, S., Lotze, M., Erb, M., Grodd, W. and Birbaumer, N. (2004) Brain activity underlying emotional valence and arousal: a response-related fMRI study. Hum. Brain Mapp., 23: 200–209. Anderson, A.K. and Phelps, E.A. (2002) Is the human amygdala critical for the subjective experience of emotion? Evidence of intact dispositional affect in patients with amygdala lesions. J. Cogn. Neurosci., 14: 709–720. Avenanti, A., Bueti, D., Galati, G. and Aglioti, S.M. (2005) Transcranial magnetic stimulation highlights the sensorimotor side of empathy for pain. Nat. Neurosci., 8: 955–960. Baker, C.I., Keysers, C., Jellema, T., Wicker, B. and Perrett, D.I. (2001) Neuronal representation of disappearing and hidden objects in temporal cortex of the macaque. Exp. Brain Res., 140: 375–381. Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., Heinze, H.J. and Altenmuller, E. (2006) Shared networks for auditory and motor processing in professional pianists: evidence from fMRI conjunction. Neuroimage, 30: 917–926.
Baron-Cohen, S., Leslie, A.M. and Frith, U. (1985) Does the autistic child have a ‘‘theory of mind’’? Cognition, 21: 37–46. Barraclough, N.E., Xiao, D.K., Baker, C.I., Oram, M.W. and Perrett, D.I. (2005) Integration of visual and auditory information by STS neurons responsive to the sight of actions. J. Cogn. Neurosci., 17(3): 377–391. Bell, B.D. (1994) Pantomime recognition impairment in aphasia: an analysis of error types. Brain Lang., 47: 269–278. Benkelfat, C., Bradwejn, J., Meyer, E., Ellenbogen, M., Milot, S., Gjedde, A. and Evans, A. (1995) Functional neuroanatomy of CCK4-induced anxiety in normal healthy volunteers. Am. J. Psychiatry, 152: 1180–1184. Bi, G. and Poo, M. (2001) Synaptic modification by correlated activity: Hebb’s postulate revisited. Ann. Rev. Neurosci., 24: 139–166. Blakemore, S.J., Bristow, D., Bird, G., Frith, C. and Ward, J. (2005) Somatosensory activations during the observation of touch and a case of vision-touch synaesthesia. Brain, 128: 1571–1583. Broks, P., Young, A.W., Maratos, E.J., Coffey, P.J., Calder, A.J., Isaac, C.L., Mayes, A.R., Hodges, J.R., Montaldi, D., Cezayirli, E., Roberts, N. and Hadley, D. (1998) Face processing impairments after encephalitis: amygdala damage and recognition of fear. Neuropsychologia, 36: 59–70. Bruce, C., Desimone, R. and Gross, C.G. (1981) Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J. Neurophysiol., 46: 369–384. Buccino, G., Binkofski, F., Fink, G.R., Fadiga, L., Fogassi, L., Gallese, V., Seitz, R.J., Zilles, K., Rizzolatti, G. and Freund, H.J. (2001) Action observation activates premotor and parietal areas in a somatotopic manner: an fMRI study. Eur. J. Neurosci., 13: 400–404. Buccino, G., Lui, F., Canessa, N., Patteri, I., Lagravinese, G., Benuzzi, F., Porro, C.A. and Rizzolatti, G. (2004a) Neural circuits involved in the recognition of actions performed by nonconspecifics: an fMRI study. J. Cogn. Neurosci., 16: 114–126. Buccino, G., Vogt, S., Ritzl, A., Fink, G.R., Zilles, K., Freund, H.J. and Rizzolatti, G. (2004b) Neural circuits underlying imitation learning of hand actions: an event-related fMRI study. Neuron, 42: 323–334. Calder, A.J., Keane, J., Manes, F., Antoun, N. and Young, A.W. (2000) Impaired recognition and experience of disgust following brain injury. Nat. Neurosci., 3: 1077–1078. Calder, A.J., Young, A.W., Rowland, D., Perrett, D.I., Hodges, J.R. and Etcoff, N.L. (1996) Facial emotion recognition after bilateral amygdala damage: differentially severe impairment of fear. Cogn. Neuropsychol., 13: 699–745. Calvo-Merino, B., Glaser, D.E., Grezes, J., Passingham, R.E. and Haggard, P. (2005) Action observation and acquired motor skills: an fMRI study with expert dancers. Cerebral Cortex, 15: 1243–1249. Damasio, A.R. (2003) Looking for Spinoza. Harcourt, New York. Decety, J., Chaminade, T., Grezes, J. and Meltzoff, A.N. (2002) A PET exploration of the neural mechanisms involved in reciprocal imitation. Neuroimage, 15: 265–272.
Decety, J., Grezes, J., Costes, N., Perani, D., Jeannerod, M., Procyk, E., Grassi, F. and Fazio, F. (1997) Brain activity during observation of actions. Influence of action content and subject’s strategy. Brain, 120(Pt 10): 1763–1777. Decety, J. and Sommerville, J.A. (2003) Shared representations between self and other: a social cognitive neuroscience view. Trends Cogn. Sci., 7: 527–533. Dilger, S., Straube, T., Mentzel, H.J., Fitzek, C., Reichenbach, J.R., Hecht, H., Krieschel, S., Gutberlet, I. and Miltner, W.H. (2003) Brain activation to phobia-related pictures in spider phobic humans: an event-related functional magnetic resonance imaging study. Neurosci. Lett., 348: 29–32. di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V. and Rizzolatti, G. (1992) Understanding motor events — a neurophysiological study. Exp. Brain Res., 91: 176–180. Ekman, P. and Friesen, W.V. (1976) Pictures of Facial Affect. Consulting Psychologists Press, Palo Alto, CA. Fadiga, L., Craighero, L. and Olivier, E. (2005) Human motor cortex excitability during the perception of others’ action. Curr. Opin. Neurobiol., 15: 213–218. Fadiga, L., Fogassi, L., Pavesi, G. and Rizzolatti, G. (1995) Motor facilitation during action observation: a magnetic stimulation study. J. Neurophysiol., 73: 2608–2611. Ferrari, P.F., Rozzi, S. and Fogassi, L. (2005) Mirror neurons responding to observation of actions made with tools in monkey ventral premotor cortex. J. Cogn. Neurosci., 17: 212–226. Frith, U. and Frith, C.D. (2003) Development and neurophysiology of mentalizing. Philos. Trans. R. Soc. Lond. B Biol. Sci., 358: 459–473. Gallese, V., Fadiga, L., Fogassi, L. and Rizzolatti, G. (1996) Action recognition in the premotor cortex. Brain, 119: 593–609. Gallese, V., Fadiga, L., Fogassi, L. and Rizzolatti, G. (2002) Action representation and the inferior parietal lobule. Common Mech. Percept. Act., 19: 334–355. Gallese, V. and Goldman, A. (1998) Mirror neurons and the simulation theory of mind-reading. Trends Cogn. Sci., 2: 493–501. Gallese, V., Keysers, C. and Rizzolatti, G. (2004) A unifying view of the basis of social cognition. Trends Cogn. Sci., 8: 396–403. Gangitano, M., Mottaghy, F.M. and Pascual-Leone, A. (2001) Phase-specific modulation of cortical motor output during movement observation. Neuroreport, 12: 1489–1492. Gazzola, V., Aziz-Zadeh, L., Formisano, E. and Keysers, C. (2005) Hearing what you are doing — an fMRI study of auditory empathy. J. Cogn. Neurosci., Suppl. S, 82. Grafton, S.T., Arbib, M.A., Fadiga, L. and Rizzolatti, G. (1996) Localization of grasp representations in humans by positron emission tomography. 2. Observation compared with imagination. Exp. Brain Res., 112: 103–111. Grezes, J., Armony, J.L., Rowe, J. and Passingham, R.E. (2003) Activations related to ‘‘mirror’’ and ‘‘canonical’’ neurones in the human brain: an fMRI study. Neuroimage, 18: 928–937. Grezes, J., Costes, N. and Decety, J. (1998) Top-down effect of strategy on the perception of human biological motion: a PET investigation. Cogn. Neuropsychol., 15: 553–582.
Grezes, J., Fonlupt, P., Bertenthal, B., Delon-Martin, C., Segebarth, C. and Decety, J. (2001) Does perception of biological motion rely on specific brain regions? Neuroimage, 13: 775–785. Gusnard, D.A., Akbudak, E., Shulman, G.L. and Raichle, M.E. (2001) Medial prefrontal cortex and self-referential mental activity: relation to a default mode of brain function. Proc. Natl. Acad. Sci. USA, 98: 4259–4264. Halgren, E., Walter, R.D., Cherlow, D.G. and Crandall, P.H. (1978) Mental phenomena evoked by electrical stimulation of the human hippocampal formation and amygdala. Brain, 101: 83–117. Halsband, U., Schmitt, J., Weyers, M., Binkofski, F., Grutzner, G. and Freund, H.J. (2001) Recognition and imitation of pantomimed motor acts after unilateral parietal and premotor lesions: a perspective on apraxia. Neuropsychologia, 39: 200–216. Hamann, S.B., Stefanacci, L., Squire, L.R., Adolphs, R., Tranel, D., Damasio, H. and Damasio, A. (1996) Recognizing facial emotion. Nature, 379: 497. Harries, M.H. and Perrett, D.I. (1991) Visual processing of faces in temporal cortex — physiological evidence for a modular organization and possible anatomical correlates. J. Cogn. Neurosci., 3: 9–24. Hasselmo, M.E., Rolls, E.T. and Baylis, G.C. (1989) The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey. Behav. Brain Res., 32: 203–218. Hauk, O., Johnsrude, I. and Pulvermuller, F. (2004) Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41: 301–307. Hebb, D. (1949) The Organisation of Behavior. Wiley, New York. Heiser, M., Iacoboni, M., Maeda, F., Marcus, J. and Mazziotta, J.C. (2003) The essential role of Broca’s area in imitation. Eur. J. Neurosci., 17: 1123–1128. Hietanen, J.K. and Perrett, D.I. (1993) Motion sensitive cells in the macaque superior temporal polysensory area. I. Lack of response to the sight of the animal’s own limb movement. Exp. Brain Res., 93: 117–128. Hietanen, J.K. and Perrett, D.I. (1996) Motion sensitive cells in the macaque superior temporal polysensory area: response discrimination between self-generated and externally generated pattern motion. Behav. Brain Res., 76: 155–167. Hutchison, W.D., Davis, K.D., Lozano, A.M., Tasker, R.R. and Dostrovsky, J.O. (1999) Pain-related neurons in the human cingulate cortex. Nat. Neurosci., 2: 403–405. Iacoboni, M., Koski, L.M., Brass, M., Bekkering, H., Woods, R.P., Dubeau, M.C., Mazziotta, J.C. and Rizzolatti, G. (2001) Reafferent copies of imitated actions in the right superior temporal cortex. Proc. Natl. Acad. Sci. USA, 98: 13995–13999. Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C. and Rizzolatti, G. (1999) Cortical mechanisms of human imitation. Science, 286: 2526–2528. Jackson, P.L., Meltzoff, A.N. and Decety, J. (2005) How do we perceive the pain of others? A window into the neural processes involved in empathy. Neuroimage, 24: 771–779.
Keysers, C. (2003) Mirror neurons. In: Encyclopedia of Neuroscience, 3rd edn. Elsevier, Amsterdam. Keysers, C., Gallese, V., Tereshenko, L., Nasoyan, A., Sterbizzi, I. and Rizzolatti, G. (2004a) Investigation of auditory and motor properties in the STS (Unpublished). Keysers, C., Kohler, E., Umilta, M.A., Nanetti, L., Fogassi, L. and Gallese, V. (2003) Audiovisual mirror neurons and action recognition. Exp. Brain Res., 153: 628–636. Keysers, C. and Perrett, D.I. (2004) Demystifying social cognitions: a Hebbian perspective. Trends Cogn. Sci., 8: 501–507. Keysers, C., Wicker, B., Gazzola, V., Anton, J.L., Fogassi, L. and Gallese, V. (2004b) A touching sight: SII/PV activation during the observation and experience of touch. Neuron, 42: 335–346. Keysers, C., Xiao, D.K., Foldiak, P. and Perrett, D.I. (2001) The speed of sight. J. Cogn. Neurosci., 13: 90–101. Kohler, E., Keysers, C., Umilta, M.A., Fogassi, L., Gallese, V. and Rizzolatti, G. (2002) Hearing sounds, understanding actions: action representation in mirror neurons. Science, 297: 846–848. Krolak-Salmon, P., Henaff, M.A., Isnard, J., Tallon-Baudry, C., Guenot, M., Vighetto, A., Bertrand, O. and Mauguiere, F. (2003) An attention modulated response to disgust in human ventral anterior insula. Ann. Neurol., 53: 446–453. Leslie, K.R., Johnson-Frey, S.H. and Grafton, S.T. (2004) Functional imaging of face and hand imitation: towards a motor theory of empathy. Neuroimage, 21: 601–607. Logothetis, N.K. (2003) The underpinnings of the BOLD functional magnetic resonance imaging signal. J. Neurosci., 23: 3963–3971. Logothetis, N.K., Pauls, J. and Poggio, T. (1995) Shape representation in the inferior temporal cortex of monkeys. Curr. Biol., 5: 552–563. Luppino, G., Murata, A., Govoni, P. and Matelli, M. (1999) Largely segregated parietofrontal connections linking rostral intraparietal cortex (areas AIP and VIP) and the ventral premotor cortex (areas F5 and F4). Exp. Brain Res., 128: 181–187. Manthey, S., Schubotz, R.I. and von Cramon, D.Y. (2003) Premotor cortex in observing erroneous action: an fMRI study. Brain Res. Cogn. Brain Res., 15: 296–307. Markram, H., Lubke, J., Frotscher, M. and Sakmann, B. (1997) Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275: 213–215. Matelli, M., Camarda, R., Glickstein, M. and Rizzolatti, G. (1986) Afferent and efferent projections of the inferior area 6 in the macaque monkey. J. Comp. Neurol., 251: 281–298. Mufson, E.J. and Mesulam, M.M. (1982) Insula of the old world monkey. II. Afferent cortical input and comments on the claustrum. J. Comp. Neurol., 212: 23–37. Nishitani, N. and Hari, R. (2000) Temporal dynamics of cortical representation for action. Proc. Natl. Acad. Sci. USA, 97: 913–918. Nishitani, N. and Hari, R. (2002) Viewing lip forms: cortical dynamics. Neuron, 36: 1211–1220.
Oram, M.W. and Perrett, D.I. (1994) Responses of anterior superior temporal polysensory (STPa) neurons to biological motion stimuli. J. Cogn. Neurosci., 6: 99–116. Oram, M.W. and Perrett, D.I. (1996) Integration of form and motion in the anterior superior temporal polysensory area (STPa) of the macaque monkey. J. Neurophysiol., 76: 109–129. Penfield, W. and Faulk, M.E. (1955) The insula: further observations on its function. Brain, 78: 445–470. Perani, D., Fazio, F., Borghese, N.A., Tettamanti, M., Ferrari, S., Decety, J. and Gilardi, M.C. (2001) Different brain correlates for watching real and virtual hand actions. Neuroimage, 14: 749–758. Perrett, D.I., Harries, M.H., Bevan, R., Thomas, S., Benson, P.J., Mistlin, A.J., Chitty, A.J., Hietanen, J.K. and Ortega, J.E. (1989) Frameworks of analysis for the neural representation of animate objects and actions. J. Exp. Biol., 146: 87–113. Perrett, D.I., Hietanen, J.K., Oram, M.W. and Benson, P.J. (1992) Organization and functions of cells responsive to faces in the temporal cortex. Philos. Trans. R. Soc. Lond. B Biol. Sci., 335: 23–30. Perrett, D.I., Mistlin, A.J. and Chitty, A.J. (1987) Visual neurons responsive to faces. Trends Neurosci., 10: 358–364. Perrett, D.I., Mistlin, A.J., Harries, M.H. and Chitty, A.J. (1990) Understanding the visual appearance and consequences of hand actions. In: Goodale, M.A. (Ed.), Vision and Action: The Control of Grasping. Ablex Publishing, Norwood, NJ, pp. 163–180. Perrett, D.I., Oram, M.W., Harries, M.H., Bevan, R., Hietanen, J.K., Benson, P.J. and Thomas, S. (1991) Viewer-centered and object-centered coding of heads in the macaque temporal cortex. Exp. Brain Res., 86: 159–173. Perrett, D.I., Smith, P.A.J., Potter, D.D., Mistlin, A.J., Head, A.S., Milner, A.D. and Jeeves, M.A. (1984) Neurones responsive to faces in the temporal cortex: studies of functional organization, sensitivity to identity and relation to perception. Hum. Neurobiol., 3: 197–208. Perrett, D.I., Smith, P.A.J., Potter, D.D., Mistlin, A.J., Head, A.S., Milner, A.D. and Jeeves, M.A. (1985) Visual cells in the temporal cortex sensitive to face view and gaze direction. Proc. R. Soc. Lond. B Biol. Sci., 223: 293–317. Phan, K.L., Wager, T.D., Taylor, S.F. and Liberzon, I. (2004) Functional neuroimaging studies of human emotions. CNS Spectr., 9: 258–266. Phillips, M.L., Young, A.W., Scott, S.K., Calder, A.J., Andrew, C., Giampietro, V., Williams, S.C., Bullmore, E.T., Brammer, M. and Gray, J.A. (1998) Neural responses to facial and vocal expressions of fear and disgust. Proc. R. Soc. Lond. B Biol. Sci., 265: 1809–1817. Phillips, M.L., Young, A.W., Senior, C., Brammer, M., Andrew, C., Calder, A.J., Bullmore, E.T., Perrett, D.I., Rowland, D., Williams, S.C., Gray, J.A. and David, A.S. (1997) A specific neural substrate for perceiving facial expressions of disgust. Nature, 389: 495–498.
Puce, A. and Perrett, D. (2003) Electrophysiology and brain imaging of biological motion. Philos. Trans. R. Soc. Lond. B Biol. Sci., 358: 435–445. Rijntjes, M., Dettmers, C., Buchel, C., Kiebel, S., Frackowiak, R.S. and Weiller, C. (1999) A blueprint for movement: functional and anatomical representations in the human motor system. J. Neurosci., 19: 8043–8048. Rizzolatti, G. and Craighero, L. (2004) The mirror-neuron system. Ann. Rev. Neurosci., 27: 169–192. Rizzolatti, G., Fadiga, L., Gallese, V. and Fogassi, L. (1996) Premotor cortex and the recognition of motor actions. Cogn. Brain Res., 3: 131–141. Rizzolatti, G., Fogassi, L. and Gallese, V. (2001) Neurophysiological mechanisms underlying the understanding and imitation of action. Nat. Rev. Neurosci., 2: 661–670. Rizzolatti, G. and Luppino, G. (2001) The cortical motor system. Neuron, 31: 889–901. Rizzolatti, G. and Matelli, M. (2003) Two different streams form the dorsal visual system: anatomy and functions. Exp. Brain Res., 153: 146–157. Saxe, R. (2005) Against simulation: the argument from error. Trends Cogn. Sci., 9: 174–179. Selemon, L.D. and Goldman-Rakic, P.S. (1988) Common cortical and subcortical targets of the dorsolateral prefrontal and posterior parietal cortices in the rhesus monkey — evidence for a distributed neural network subserving spatially guided behavior. J. Neurosci., 8: 4049–4068. Seltzer, B. and Pandya, D.N. (1978) Afferent cortical connections and architectonics of superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res., 149: 1–24. Seltzer, B. and Pandya, D.N. (1994) Parietal, temporal, and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey — a retrograde tracer study. J. Comp. Neurol., 343: 445–463. Seung, Y., Kyong, J.S., Woo, S.H., Lee, B.T. and Lee, K.M. (2005) Brain activation during music listening in individuals with or without prior music training. Neurosci. Res., 52: 323–329. Siebert, M., Markowitsch, H.J. and Bartel, P. (2003) Amygdala, affect and cognition: evidence from 10 patients with Urbach-Wiethe disease. Brain, 126: 2627–2637. Singer, T., Seymour, B., O’Doherty, J., Kaube, H., Dolan, R.J. and Frith, C.D. (2004) Empathy for pain involves the affective but not sensory components of pain. Science, 303: 1157–1162. Small, D.M., Gregory, M.D., Mak, Y.E., Gitelman, D., Mesulam, M.M. and Parrish, T. (2003) Dissociation of neural representation of intensity and affective valuation in human gustation. Neuron, 39: 701–711. Sprengelmeyer, R., Young, A.W., Schroeder, U., Grossenbacher, P.G., Federlein, J., Buttner, T. and Przuntek, H. (1999) Knowing no fear. Proc. R. Soc. Lond. B Biol. Sci., 266: 2451–2456. Stern, D.N. (2000) The Interpersonal World of the Infant. Basic Books, New York.
Tai, Y.F., Scherfler, C., Brooks, D.J., Sawamoto, N. and Castiello, U. (2004) The human premotor cortex is ‘mirror’ only for biological actions. Curr. Biol., 14: 117–120. Tanne-Gariepy, J., Rouiller, E.M. and Boussaoud, D. (2002) Parietal inputs to dorsal versus ventral premotor areas in the macaque monkey: evidence for largely segregated visuomotor pathways. Exp. Brain Res., 145: 91–103. Umilta, M.A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C. and Rizzolatti, G. (2001) I know what you are doing: a neurophysiological study. Neuron, 31: 155–165. van der Gaag, C., Minderaa, R. and Keysers, C. (2005) Emotion observation, recognition and imitation: towards an understanding of empathy of individual emotions. J. Cogn. Neurosci., Suppl. S, 166. Vogt, B.A. (2005) Pain and emotion interactions in subregions of the cingulate gyrus. Nat. Rev. Neurosci., 6: 533–544. Waldvogel, D., van Gelderen, P., Muellbacher, W., Ziemann, U., Immisch, I. and Hallett, M. (2000) The relative metabolic demand of inhibition and excitation. Nature, 406: 995–998.
Wheaton, K.J., Thompson, J.C., Syngeniotis, A., Abbott, D.F. and Puce, A. (2004) Viewing the motion of human body parts activates different regions of premotor, temporal, and parietal cortex. Neuroimage, 22: 277–288. Wicker, B., Keysers, C., Plailly, J., Royet, J.P., Gallese, V. and Rizzolatti, G. (2003) Both of us disgusted in my insula: the common neural basis of seeing and feeling disgust. Neuron, 40: 655–664. Williams, L.M., Phillips, M.L., Brammer, M.J., Skerrett, D., Lagopoulos, J., Rennie, C., Bahramali, H., Olivieri, G., David, A.S., Peduto, A. and Gordon, E. (2001) Arousal dissociates amygdala and hippocampal fear responses: evidence from simultaneous fMRI and skin conductance recording. Neuroimage, 14: 1070–1079. Young, A.W., Aggleton, J.P., Hellawell, D.J., Johnson, M., Broks, P. and Hanley, J.R. (1995) Face processing impairments after amygdalotomy. Brain, 118: 15–24. Zald, D.H. (2003) The human amygdala and the emotional evaluation of sensory stimuli. Brain Res. Brain Res. Rev., 41: 88–123.
CHAPTER 22
Empathizing: neurocognitive developmental mechanisms and individual differences Bhismadev Chakrabarti and Simon Baron-Cohen Autism Research Centre, University of Cambridge, Psychiatry Department, Douglas House, 18B Trumpington Rd, Cambridge CB2 2AH, UK
Abstract: This chapter reviews the Mindreading System model encompassing four neurocognitive mechanisms (ID, EDD, SAM, and ToMM) before reviewing the revised empathizing model encompassing two new neurocognitive mechanisms (TED and TESS). It is argued that the empathizing model is more comprehensive because it entails perception, interpretation, and affective responses to other agents. Sex differences in empathy (female advantage) are then reviewed, as a clear example of individual differences in empathy. This leads into an illustration of individual differences using the Empathy Quotient (EQ). Finally, the neuroimaging literature in relation to each of the neurocognitive mechanisms is briefly summarized and a new study is described that tests if different brain regions respond to the perception of different facial expressions of emotion, as a function of the observer’s EQ.

Keywords: empathy; theory of mind; mindreading; neuroimaging; sex differences; psychopathology; individual differences; basic emotions
Introduction

In this chapter, we take the concept of empathy, and consider it in terms of neurocognitive developmental mechanisms and in terms of individual differences. The first part of the chapter deals with two conceptual approaches to the development of the empathizing ability. The second part of the chapter presents some empirical evidence on a quantitative trait measure of empathy. A simple definition of ‘empathizing’ is that it is the lens through which we perceive and process emotions. We therefore review the literature from neuroimaging studies, which suggests that perception of discrete basic emotions is processed in different neural regions and networks. Finally, we describe a recent study that reconciles these two approaches to empathy by investigating if an individual’s level of empathy affects how their brain processes discrete emotions.
What is empathizing?

Empathizing is the drive to identify another person’s emotions and thoughts, and to respond to these with an appropriate emotion (Davis, 1994). We use the term ‘drive’ but recognize that it also overlaps with the concept of a skill or an ability. We also focus on the definition of empathy given by Davis while recognizing that other authors may have a slightly different definition. Empathizing does not just entail the cold calculation of what someone else thinks and feels (or what is sometimes called mindreading). Psychopaths can do that much. Empathizing is also about having an appropriate emotional reaction inside you, an emotion triggered by the other person’s emotion.
Empathizing is done in order to understand another person, to predict their behaviour, and to connect or resonate with them emotionally. Imagine you could recognize that ‘‘Jane is in pain,’’ but this left you cold, or detached, or happy, or preoccupied. This would not be empathizing. Now imagine you do not just see Jane’s pain, but you also automatically feel concerned, wincing yourself, and feeling a desire to run across and help alleviate her pain. This is empathizing. And empathizing extends to recognizing and responding to any emotion or state of mind, not just the more obvious ones, like pain. Empathy is a skill (or a set of skills). As with any other skill, such as athleticism or mathematical or musical ability, we all vary in it. In the same way that we can think about why someone is talented or average or even disabled in these other areas, so we can think about individual differences in empathy. Empathy is a defining feature of human relationships. Empathy stops you doing things that would hurt another person’s feelings. Empathy also stops you inflicting physical pain on a person or animal. Empathy allows you to tune into someone else’s world, setting aside your own world — your perception, knowledge, assumptions, or feelings. It allows you to see another side of an argument easily. Empathy drives you to care for, or offer comfort to, another person, even if they are unrelated to you and you stand to gain nothing in return. Empathy also makes real communication possible. Talking ‘‘at’’ a person is not real communication. It is a monologue. Real conversation is sensitive to this listener at this time. Empathy also provides a framework for the development of a moral code. Moral codes are built out of fellow-feeling and compassion.
Fractionating empathy

Philosophical (Stein, 1989) and evolutionary (Brothers, 1990; Levenson, 1996; Preston and de Waal, 2002) accounts have suggested that empathizing is not a unitary construct. Possible constituent ‘fractions’ of empathy include (1) ‘emotional contagion/affective empathy’, (2) ‘cognitive empathy’, and (3) sympathy.
Cognitive empathy is involved in the explicit understanding of another’s feelings and in switching to take their perspective. Piaget referred to empathy as ‘decentering’, or responding nonegocentrically (Piaget and Inhelder, 1956). More recent developmental psychologists refer to this aspect of empathy in terms of using a ‘theory of mind’, or ‘mindreading’ (Astington et al., 1988; Whiten, 1991). Essentially, the cognitive component of empathizing entails setting aside your own current perspective, attributing a mental state (sometimes called an ‘attitude’) to the other person, and then inferring the likely content of their mental state, given their experience. The cognitive element also allows you to predict the other person’s mental state or behaviour. The second aspect of empathy is the ‘affective’ component (Hobson, 1993). A similar component in other accounts has been called ‘emotional contagion’, defined as the tendency to automatically mimic and synchronise facial expressions, vocalizations, postures, and movements with those of another person, so as to converge emotionally (Hatfield et al., 1992). This may be the most primitive component of empathy. For example, when witnessing someone else in a state of fear, if the observer ‘catches’ a similar state of fear, this acts as a ‘quick-and-easy’ route to alerting oneself to environmental dangers without having to face the dangers oneself. A third component involves a ‘concern mechanism’ (Nichols, 2001), often associated with a prosocial/altruistic component, also termed ‘sympathy’. This is distinct from emotional contagion in not necessarily involving matched states between the observer and the person experiencing the emotion, and in being possibly specific to a certain class of emotions (sadness and pain, but not disgust or happiness) in the other person. It represents a case where the observer feels both an emotional response to someone else’s distress and a desire to alleviate their suffering.
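The distinctions between these three fractions can be made explicit in a small sketch. The following Python fragment is our own deliberately crude illustration, not a formal model from the literature; the field names and the restriction of sympathy to distress-type emotions are simplifying assumptions drawn from the description above: contagion requires a matched state, cognitive empathy requires only a correct inference, and sympathy requires a motivation to help.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    target_state: str        # emotion observed in the other person
    observer_state: str      # emotion actually felt by the observer
    state_inferred: bool     # did the observer work out what the other feels?
    wants_to_help: bool      # is the observer motivated to alleviate distress?

def emotional_contagion(e: Episode) -> bool:
    # Contagion: the observer 'catches' a matching state.
    return e.observer_state == e.target_state

def cognitive_empathy(e: Episode) -> bool:
    # Cognitive empathy: the state is correctly inferred; no matching required.
    return e.state_inferred

def sympathy(e: Episode) -> bool:
    # Sympathy: a desire to help, possibly without a matched state, and
    # (on some accounts) only for distress-type emotions such as sadness or pain.
    return e.wants_to_help and e.target_state in {"sadness", "pain"}

# A psychopath-like profile: accurate inference, no matching, no concern.
e = Episode("pain", "indifference", state_inferred=True, wants_to_help=False)
print(cognitive_empathy(e), emotional_contagion(e), sympathy(e))  # True False False
```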
How does empathizing develop? The Mindreading System

In 1994, Baron-Cohen proposed a model to specify the neurocognitive mechanisms that comprise the
‘Mindreading System’ (Baron-Cohen, 1994, 1995). Mindreading is defined as the ability to interpret one’s own or another agent’s actions as driven by mental states. The model was proposed in order to explain (1) the ontogenesis of a theory of mind and (2) the neurocognitive dissociations that are seen in children with or without autism. The model is shown in Fig. 1 and contains four components: the intentionality detector (ID), the eye direction detector (EDD), the shared attention mechanism (SAM), and finally the theory-of-mind mechanism (ToMM). ID and EDD build ‘dyadic’ representations of simple mental states. ID automatically represents or interprets an agent’s self-propelled movement as desire- or goal-directed movement, a sign of its agency, of being an entity with volition (Premack, 1990). For example, ID interprets an animate-like moving shape as ‘‘it wants x’’ or ‘‘it has goal y.’’ EDD automatically interprets or represents eye-like stimuli as ‘‘looking at me’’ or ‘‘looking at something else.’’ That is, EDD picks out that an entity with eyes can perceive.
Fig. 1. Baron-Cohen’s (1994) model of mindreading. Key: ID = Intentionality Detector; EDD = Eye Direction Detector; SAM = Shared Attention Mechanism; ToMM = Theory-of-Mind Mechanism; m = months. (ID and EDD: 0–9 m; SAM: 9–14 m; ToMM: 2–4 years.)
Both ID and EDD are developmentally prior to the other two mechanisms, and are active early in infancy, if not from birth. SAM is developmentally more advanced. SAM automatically represents or interprets whether the self and another agent are (or are not) perceiving the same event. SAM does this by building ‘triadic’ representations. For example, where ID can build the dyadic representation ‘‘Mother wants the cup’’ and where EDD can build the dyadic representation ‘‘Mother sees the cup’’, SAM can build the triadic representation ‘‘Mother sees that I see the cup’’. As is apparent, triadic representations involve embedding, or recursion. (A dyadic representation ‘‘I see a cup’’ is embedded within another dyadic representation ‘‘Mum sees the cup’’ to produce this triadic representation.) SAM takes its input from ID and EDD, and triadic representations are made out of dyadic representations. SAM typically functions from 9 to 14 months of age, and allows ‘joint attention’ behaviours such as protodeclarative pointing and gaze monitoring (Scaife and Bruner, 1975). ToMM is the jewel in the crown of the 1994 model of the Mindreading System. It allows epistemic mental states to be represented (e.g., ‘‘Mother thinks this cup contains water’’ or ‘‘Mother pretends this cup contains water’’), and it integrates the full set of mental-state concepts (including emotions) into a theory. ToMM develops between two and four years of age, and allows pretend play (Leslie, 1987), understanding of false belief (Wimmer and Perner, 1983), and understanding of the relationships between mental states (Wellman, 1990). An example of the latter is the seeing-leads-to-knowing principle (Pratt and Bryant, 1990), whereby a typical 3-year-old can infer that if someone has seen an event, then they will know about it. The model shows the ontogenesis of a theory of mind in the first 4 years of life, and justifies the existence of four components on the basis of developmental competence and neuropsychological dissociation. In terms of developmental competence, joint attention does not appear possible until 9–14 months of age, and joint attention appears to be a necessary but not sufficient condition for understanding epistemic mental
states (Baron-Cohen, 1991; Baron-Cohen and Swettenham, 1996). There appears to be a developmental lag between acquiring SAM and ToMM, suggesting that these two mechanisms are dissociable. In terms of neuropsychological dissociation, congenitally blind children can ultimately develop joint (auditory or tactile) attention (i.e., SAM), using the amodal ID rather than the visual EDD route. They can therefore go on to develop ToMM. Children with autism appear to be able to represent the dyadic mental states of seeing and wanting, but show delays in shared attention (Baron-Cohen, 1989b) and in understanding false belief (Baron-Cohen et al., 1985; Baron-Cohen, 1989a) — that is, in acquiring SAM and ultimately ToMM. It is this specific developmental delay that suggests that SAM and ToMM are dissociable from EDD.
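Because the dyadic-to-triadic step is defined by embedding, it can be captured compactly as a recursive data structure. The sketch below is only an illustrative formalization of the representations described above (the class name and fields are ours, not part of Baron-Cohen's model): a dyadic representation relates an agent to some content via an attitude such as 'sees' or 'wants', and SAM's triadic representations are simply dyadic representations whose embedded content is itself a dyadic representation.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Dyadic:
    agent: str                       # e.g., "Mother", "I"
    attitude: str                    # e.g., "sees", "wants"
    content: Union[str, "Dyadic"]    # a state of the world, or another representation

    def __str__(self):
        inner = f"[{self.content}]" if isinstance(self.content, Dyadic) else self.content
        return f"{self.agent} {self.attitude} {inner}"

# ID and EDD each build simple dyadic representations:
wants = Dyadic("Mother", "wants", "the cup")
sees = Dyadic("Mother", "sees", "the cup")

# SAM embeds one dyadic representation inside another, yielding a
# triadic representation: "Mother sees [I see the cup]".
triadic = Dyadic("Mother", "sees", Dyadic("I", "see", "the cup"))

print(wants)    # Mother wants the cup
print(triadic)  # Mother sees [I see the cup]
```

The recursion is what gives SAM its extra power: nothing in the structure limits the depth of embedding, even though early joint attention uses only one level.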
Shortcomings of the Mindreading System model: the Empathizing SyStem
Fig. 2. Baron-Cohen’s (2005) model of empathizing. Key: As in Fig. 1, with TED = The Emotion Detector and TESS = The Empathizing SyStem. (ID, EDD, and TED: 0–9 m; SAM: 9–14 m; TESS: from 14 m; ToMM: from 48 m.)
The 1994 model of the Mindreading System was revised in 2005 because of certain omissions and too narrow a focus. The key omission is that information about affective states, available to the infant perceptual system, has no dedicated neurocognitive mechanism. In Fig. 2, the revised model (Baron-Cohen, 2005) is shown and now includes a new fifth component: the emotion detector (TED). But the concept of mindreading (or theory of mind) makes no reference to the affective state in the observer triggered by recognition of another’s mental state. This is a particular problem for any account of the distinction between autism and psychopathy. For this reason, the model is no longer of ‘mindreading’ but is of ‘empathizing’, and the revised model also includes a new sixth component: The Empathizing SyStem (TESS). (TESS is spelt as it is to playfully populate the Mindreading Model with apparently anthropomorphic components.) Where the 1994 Mindreading System was a model of a passive observer (because all the components had simple decoding functions), the 2005 Empathizing SyStem is a model of an observer impelled towards action (because an emotion is triggered in the observer which
typically motivates the observer to respond to the other person). Like the other infancy perceptual input mechanisms, ID and EDD, the new component TED can build dyadic representations of a special kind: namely, it can represent affective states. An example would be ‘‘Mother — is unhappy’’ or even ‘‘Mother — is angry with me.’’ Formally, we can describe this as an agent — affective state proposition. We know that infants can represent affective states from as early as three months of age (Walker, 1982). As with ID, TED is amodal, in that affective information can be picked up from facial expression or from vocal intonation, ‘motherese’ being a particularly rich source of the latter (Field, 1979). Another person’s affective state is presumably also detectable from their touch (e.g., tense vs. relaxed), which implies that congenitally blind infants should find affective information accessible through both auditory and tactile modalities. TED allows the detection of the basic emotions (Ekman and Friesen, 1969). The development of TED is probably aided by the simple imitation that is typical of infants (e.g., imitating a caregiver’s expressions),
which in itself would facilitate emotional contagion (Meltzoff and Decety, 2003). When SAM becomes available, at 9–14 months of age, it can receive inputs from any of the three infancy mechanisms, ID, EDD, or TED. Here, we focus on how a dyadic representation of an affective state can be converted into a triadic representation by SAM. An example would be that the dyadic representation ‘‘Mother is unhappy’’ can be converted into a triadic representation ‘‘I am unhappy that Mother is unhappy’’, ‘‘Mother is unhappy that I am unhappy’’, etc. Again, as with perceptual or volitional states, SAM’s triadic representations of affective states have this special embedded or recursive property. TESS in the 2005 model is the real jewel in the crown. This is not to minimize the importance of ToMM, which has been celebrated for the last 20 years in research in developmental psychology (Leslie, 1987; Wimmer et al., 1988; Whiten, 1991). ToMM is of major importance in allowing the child to represent the full range of mental states, including epistemic ones (such as false belief), and is important in allowing the child to pull mentalistic knowledge into a useful theory with which to predict behaviour (Wellman, 1990; Baron-Cohen, 1995). But TESS allows more than behavioural explanation and prediction (itself a powerful achievement). TESS allows an empathic reaction to another’s emotional state. This is, however, not to say that these two modules do not interact. Knowledge of mental states of others made possible by ToMM could certainly influence the way in which an emotion is processed and/or expressed by TESS. TESS also allows for sympathy. It is this element of TESS that gives it the adaptive benefit of ensuring that organisms feel a drive to help each other.
M-representations versus E-representations

To see the difference between TESS and ToMM, consider this example: ‘‘I see you are in pain’’. Here, ToMM is needed to interpret your facial expressions and writhing body movements in terms of your underlying mental state (pain). But now consider this further example: ‘‘I am devastated —
that you are in pain’’. Here, TESS is needed, since an appropriate affective state has been triggered in the observer by the emotional state identified in the other person. And where ToMM employs M-representations (M for mental) (Leslie, 1995) of the form agent — attitude — proposition (e.g., Mother — believes — Johnny took the cookie), TESS employs a new class of representations, which we call E-representations (E for empathy), of the form self — affective state — [agent — affective state — proposition] (e.g., ‘‘I feel sorry that — Mom feels sad about — the news in the letter’’) (Baron-Cohen, 2003). The critical feature of this E-representation is that the self’s affective state is appropriate to and triggered by the other person’s affective state. Thus, TESS can represent [I am horrified — that you are in pain], or [I am concerned — that you are in pain], or [I want to alleviate — that you are in pain], but it cannot represent [I am happy — that you are in pain]. At least, it cannot do so if TESS is functioning normally. One could imagine an abnormality in TESS leading to such inappropriate emotional states being triggered, or one could imagine them arising from other systems (such as a competition system or a sibling-rivalry system), but these would not be evidence of TESS per se.
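The contrast between the two representational formats, including the appropriateness constraint that distinguishes TESS, can likewise be written out as a short self-contained sketch. This is again our own illustrative formalization; in particular, the emotion-matching table is a deliberately crude, hypothetical stand-in for what 'appropriate to' would mean in a full account.

```python
from dataclasses import dataclass

@dataclass
class MRepresentation:          # ToMM: agent - attitude - proposition
    agent: str                  # "Mother"
    attitude: str               # "believes"
    proposition: str            # "Johnny took the cookie"

@dataclass
class ERepresentation:          # TESS: self-state - [agent - state - proposition]
    self_state: str             # affective state triggered in the observer
    agent: str                  # the other person
    agent_state: str            # the other person's affective state
    proposition: str            # what their state is about

# Toy appropriateness rule: distress in the other should trigger congruent,
# concerned-type states in the self, never pleasure.
APPROPRIATE = {"pain": {"horrified", "concerned", "sorry"},
               "sadness": {"sorry", "concerned"}}

def tess_can_represent(e: ERepresentation) -> bool:
    return e.self_state in APPROPRIATE.get(e.agent_state, set())

ok = ERepresentation("concerned", "you", "pain", "your injury")
bad = ERepresentation("happy", "you", "pain", "your injury")
print(tess_can_represent(ok), tess_can_represent(bad))  # True False
```

An M-representation carries no such constraint, which is the formal counterpart of the claim that a psychopath's intact ToMM can represent your pain without any appropriate state being triggered.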
Dissociations between TED, ToMM, and TESS from neuropsychiatry

Before leaving this revision of the model, it is worth discussing why the need for it has arisen. First, emotional states are an important class of mental states to detect in others, and yet the earlier model focused only on volitional, perceptual, informational, and epistemic states. Second, when it comes to pathology, it would appear that in autism TED might function, although this may be delayed (Hobson, 1986; Baron-Cohen et al., 1993, 1997c), at least in terms of detecting basic emotions. Even high-functioning people with autism or Asperger Syndrome have difficulties both in ToMM (when measured with mental-age-appropriate tests) (Happé, 1994; Baron-Cohen et al., 1997b, 2001)
and TESS (Attwood, 1997; Baron-Cohen et al., 1999a, b, 2003, 2004). This suggests that TED and TESS may be fractionated. In contrast, the psychiatric condition of psychopathy may entail an intact TED and ToMM, alongside an impaired TESS. The psychopath (or sociopath) can represent that you are in pain, or that you believe — that he is the gas-man, thereby gaining access to your house or your credit card. The psychopath can go on to hurt you or cheat you without having the appropriate affective reaction to your affective state. In other words, he or she does not care about your affective state (Mealey, 1995; Blair et al., 1997). Lack of guilt or shame or compassion in the presence of another’s distress is diagnostic of psychopathy (Cleckley, 1977; Hare et al., 1990). Separating TESS and ToMM thus allows a functional distinction to be drawn between the neurocognitive causes of autism and psychopathy.
Developmental dissociations

Developmentally, one can also distinguish TED from TESS. We know that at three months of age, infants can discriminate facial and vocal expressions of emotion (Walker, 1982; Trevarthen, 1989), but that it is not until about 14 months that they can respond with appropriate affect (e.g., a facial expression of concern) to another’s apparent pain (Yirmiya et al., 1990) or show ‘‘social referencing’’. Clearly, this account is skeletal in not specifying how many emotions TED is capable of recognizing. Our recent survey of emotions identifies 412 discrete emotion concepts that the adult English language user recognizes (Baron-Cohen et al., submitted). How many of these are recognized in the first year of life is not clear. It is also not clear exactly how empathizing changes during the second year of life. We have assumed that the same mechanism that enables social referencing at 14 months of age also allows sympathy and the growth of empathy across development. This is the most parsimonious model, though it may be that future research will justify further mechanisms that affect the development of empathy.
Sex differences in empathizing

In the introduction to this chapter we promised to consider sex differences in empathizing. Some of the best evidence for individual differences in empathizing comes from the study of sex differences, where many studies converge on the conclusion that there is a female superiority in empathizing. Sex differences are best viewed as summated individual differences, on multiple dimensions that include genetic and epigenetic factors. Some of the observed behavioural differences are reviewed here.

(1) Sharing and turn-taking. On average, girls show more concern for fairness, while boys share less. In one study, boys showed 50 times more competition, while girls showed 20 times more turn-taking (Charlesworth and Dzur, 1987).
(2) Rough-and-tumble play or ‘‘rough housing’’ (wrestling, mock fighting, etc.). Boys show more of this than girls do. Although there is a playful component, it can hurt or be intrusive, so it needs lower empathizing to carry it out (Maccoby, 1999).
(3) Responding empathically to the distress of other people. Girls aged 1 year or more show greater concern through more sad looks, sympathetic vocalizations, and comforting. More women than men also report frequently sharing the emotional distress of their friends. Women also show more comforting, even of strangers, than men do (Hoffman, 1977).
(4) Using a ‘theory of mind’. By three years of age, little girls are already ahead of boys in their ability to infer what people might be thinking or intending (Happé, 1995). This sex difference appears in some but not all studies (Charman et al., 2002).
(5) Sensitivity to facial expressions. Women are better at decoding nonverbal communication, picking up subtle nuances from tone of voice or facial expression, or judging a person’s character (Hall, 1978).
(6) Questionnaires measuring empathy. Many of these find that women score higher than men (Davis, 1994).
(7) Values in relationships. More women value the development of altruistic, reciprocal relationships, which by definition require empathizing. In contrast, more men value power, politics, and competition (Ahlgren et al., 1979). Girls are more likely to endorse co-operative items on a questionnaire and to rate the establishment of intimacy as more important than the establishment of dominance. Boys are more likely than girls to endorse competitive items and to rate social status as more important than intimacy (Knight et al., 1989).
(8) Disorders of empathy (such as psychopathic personality disorder or conduct disorder) are far more common among males (Dodge, 1980; Blair, 1995).
(9) Aggression, even in normal quantities, can only occur with reduced empathizing. Here again, there is a clear sex difference. Males tend to show far more ‘direct’ aggression (pushing, hitting, punching, etc.) while females tend to show more ‘indirect’ (or relational, covert) aggression (gossip, exclusion, bitchy remarks, etc.). Direct aggression may require an even lower level of empathy than indirect aggression. Indirect aggression needs better mindreading skills than does direct aggression, because its impact is strategic (Crick and Grotpeter, 1995).
(10) Murder is the ultimate example of a lack of empathy. Daly and Wilson (1988) analysed homicide records dating back over 700 years, from a range of different societies. They found that ‘male-on-male’ homicide was 30–40 times more frequent than ‘female-on-female’ homicide.
(11) Establishing a ‘dominance hierarchy’. Males are quicker to establish these. This may in part reflect their lower empathizing skills, because often a hierarchy is established by one person pushing others around to become the leader (Strayer, 1980).
(12) Language style. Girls’ speech is more co-operative, reciprocal, and collaborative. In concrete terms, this is also reflected in girls being able to keep a conversational exchange with a partner going for longer. When girls disagree, they are more likely to express their different opinion sensitively, in the form of a question rather than an assertion. Boys’ talk is more ‘single-voiced discourse’ (the speaker presents their own perspective alone). The female speech style is more ‘double-voiced discourse’ (girls spend more time negotiating with the other person, trying to take the other person’s wishes into account) (Smith, 1985).
(13) Talk about emotions. Women’s conversation involves much more talk about feelings, while men’s conversation with each other tends to be more object- or activity-focused (Tannen, 1991).
(14) Parenting style. Fathers are less likely than mothers to hold their infant in a face-to-face position. Mothers are more likely to follow through the child’s choice of topic in play, while fathers are more likely to impose their own topic. And mothers fine-tune their speech more often to match what the child can understand (Power, 1985).
(15) Face preference and eye contact. From birth, females look longer at faces, and particularly at people’s eyes, while males are more likely to look at inanimate objects (Connellan et al., 2001).
(16) Finally, females have also been shown to have better language ability than males. It seems likely that good empathizing would promote language development (Baron-Cohen et al., 1997a) and vice versa, so these may not be independent.
Leaving aside sex differences as one source of evidence for individual differences, one can see that empathy is normally distributed within the population. Figure 3 shows the data from the Empathy Quotient (EQ), a validated 60-item self-report questionnaire (Baron-Cohen and Wheelwright, 2004). It has been factor analysed to suggest the existence of three distinct components, which roughly correspond to the three-component model of empathy (Lawrence et al., 2004). Scores on the EQ show a quasi-normal distribution in several populations, with scores from
410 25
AS/HFA group Controls
15 10 5
76 to 80
71 to 75
66 to 70
61 to 65
56 to 60
51 to 55
46 to 50
41 to 45
36 to 40
31 to 35
26 to 30
21 to 25
16 to 20
11 to 15
0 to 5
0 6 to 10
Number of subjects
20
EQ score
Fig. 3. The distribution of EQ in the general population (dotted line). Also shown is the distribution of empathy scores from people with Asperger Syndrome (AS) or High Functioning Autism (HFA). (From Baron-Cohen and Wheelwright, 2004.)
The search for the neural correlates of empathy has had two traditions of research, one focusing on theory-of-mind studies (involving largely intention attribution or emotion attribution) and another focusing on action understanding. The latter has gained considerable importance in recent years since the discovery of mirror neurons (Gallese et al., 2004). On finding increasing evidence of sex differences in the EQ in the general population, we sought to investigate the neural correlates of this trait measure of empathizing across the population. Since empathizing can be viewed as a lens through which we perceive and process emotions, we attempted to marry the two fields of emotion perception and empathizing. The following section briefly introduces the current state of the literature on the neural bases of basic emotions and the results of a recent study from our lab.
Neuroimaging studies of empathizing and emotion

Neuroimaging studies have implicated the following brain areas in tasks that tap components of the model of empathy proposed above, presented in order of the components' development.

1. Studies of emotional contagion have demonstrated involuntary facial mimicry (Dimberg et al., 2000) as well as activity in regions of the brain where the existence of 'mirror' neurons has been suggested (Carr et al., 2003; Wicker et al., 2003; Jackson et al., 2005).

2. ID has been tested in a PET study using a task involving the attribution of intentions to cartoon characters (Brunet et al., 2000). Reported activation clusters included the right medial prefrontal (BA 9) and inferior frontal (BA 47) cortices, the superior temporal gyrus (BA 42), and the bilateral anterior cingulate cortex. In an elegant set of experiments that required participants to attribute intentions to animations of simple geometric shapes (Castelli et al., 2000), the 'intentionality' score that participants attributed to individual animations was positively correlated with activity in the superior temporal sulcus (STS), the temporo-parietal junction, and the medial prefrontal cortex. A subsequent study (Castelli et al., 2002) demonstrated a group difference in activity in the same set of structures between people with Autism/Asperger Syndrome and neurotypical controls.

3. EDD has been examined in several neuroimaging studies of gaze direction perception (Calder et al., 2002; Pelphrey et al., 2003; see Grosbras et al., 2005 for a review), which have implicated the posterior STS bilaterally. This evidence, taken together with similar findings from the primate literature (Perrett and Emery, 1994), suggests that this area is a strong candidate for the anatomical equivalent of the EDD.

4. A recent imaging study (Williams et al., 2005) investigated the neural correlates of SAM and reported bilateral activation in the anterior cingulate cortex (BA 32/24), the medial prefrontal cortex (BA 9/10), and the body of the caudate nucleus in a joint attention task, when compared to a control task involving nonjoint attention (see Frith and Frith, 2003 for a review).

5. Traditional 'theory-of-mind' (cognitive empathy) tasks have consistently shown activity in the medial prefrontal cortex, the superior temporal gyrus, and the temporo-parietal junctions (Frith and Frith, 2003; Saxe et al., 2004). This could be equated to the brain basis of ToMM.

6. Sympathy has been relatively less investigated, with one study implicating the left inferior frontal gyrus, among a network of other structures (Decety and Chaminade, 2003). Work on 'moral' emotions has suggested the involvement of a network comprising the medial frontal gyrus, the medial orbitofrontal cortex, and the STS (Moll et al., 2002).
Neuroimaging of discrete emotions

An increasing body of evidence from lesion, neuroimaging, and electrophysiological studies suggests that these affect programs might have discrete neural bases (Calder et al., 2001). Fear is possibly the single most thoroughly investigated emotion. Passive viewing of fear expressions, as well as experiencing fear (as induced through recalling a fear memory or seeing fearful stimuli), activates the amygdala, the orbitofrontal cortex, and the anterior cingulate cortex (Morris et al., 1999; Damasio et al., 2000). There is considerable evidence from non-human primates (Kalin et al., 2001; Prather et al., 2001) and rats (LeDoux, 2000) to suggest a crucial role for these regions in processing fear. Visual or auditory perception of disgust expressions, as well as experiencing disgust, is known to activate the anterior insula and the pallidum (Phillips et al., 1997, 1998; Wicker et al., 2003). An increasing consensus on the role of the ventral striatum in processing reward from different sensory domains [receiving food rewards (O'Doherty et al., 2002), viewing funny cartoons (Mobbs et al., 2003), remembering happy events (Damasio et al., 2000)] concurs well with studies that report activation of this region in response to viewing happy faces (Phillips et al., 1998; Lawrence et al., 2004). Perception of angry expressions has been shown to evoke a response in the premotor cortex and the striatum (Grosbras and Paus, in press) as well as the lateral orbitofrontal cortex (Blair et al., 1999). The results of studies on the processing of sad expressions are comparatively less consistent. Perception of sad faces and induction of sad mood are both known to be associated with an increased response in the subgenual cingulate cortex (Mayberg et al., 1999; Liotti et al., 2000), in the hypothalamus in humans (Malhi et al., 2004) and in rats (Shumake et al., 2001), as well as in the middle temporal gyrus (Eugene et al., 2003). There have been very few studies on the passive viewing of surprise. One study (Schroeder et al., 2004) reported bilateral activation in the parahippocampal region, which is known from the animal literature for its role in novelty detection.

While the discrete emotions model holds well for these relatively 'simple' emotions, dimensional models (e.g., Rolls, 2002) become increasingly relevant as we consider the more 'socially complex' emotions, e.g., pride, shame, and guilt, since it would not be very economical to have discrete neural substrates for the whole gamut of emotions. These two models, however, need not be in conflict, since the more complex emotions can be conceptualized as combinations of the basic ones (i.e., with each of the 'basic' emotions representing a dimension in emotion space). Two major meta-analytic studies of the neuroimaging literature on emotions highlight the role of discrete regions in primarily visual processing of different basic emotions (Phan et al., 2002; Murphy et al., 2003). Some studies using stimuli in other sensory modalities [olfactory (Anderson et al., 2003), gustatory (Small et al., 2003), auditory (Lewis et al., in press)] have shown a possibly dissociable role for the amygdala and the orbitofrontal cortex in processing emotions along the two dimensions of valence and arousal. The relative absence of neuroimaging studies on 'complex' emotions could be due to the increased cultural variability of the elicitors, as well as of the display rules, that these expressions entail. Among the few exceptions, guilt and embarrassment have been investigated by Takahashi et al. (2004), who reported activation in the ventromedial prefrontal cortex, the left STS, and higher visual cortices when participants read sentences designed to evoke guilt or embarrassment. This, taken together with the areas underlying the 'ToMM' system, could suggest an increased role of 'theory-of-mind' processing in making sense of these emotions.
'Empathizing' with discrete emotions?

Returning to the concept of individual differences in empathizing, this poses an interesting question for the brain basis of the perception of discrete emotions. Do we use a centralized 'empathy circuit' to make sense of all emotions? If so, can one detect differences in how discrete emotions are processed in individuals who are at different points on the EQ continuum? A direct approach to investigating individual differences in empathizing has been to test for sex differences in the perception of emotions. Using facial electromyography (EMG), one study (Helland, 2005) observed that females tend to show increased facial mimicry to facial expressions of happiness and anger when compared to males. In a meta-review of neuroimaging results on sex differences in emotion perception, Wager et al. (2003) reported that females show increased bilaterality in emotion-relevant activation compared to males. This is not always found, however (Lee et al., 2002; Schienle et al., 2005). One reason for this inconsistency may be that sex differences are, in effect, summated individual differences. Instead of such a broad category-based approach (as in sex-difference studies), an approach based on individual differences in self-report personality scores (Canli et al., 2002) or genetic differences (e.g., Hariri et al., 2002) may be more finely tuned. To test this model of individual variability, we asked whether an individual's score on the EQ predicted his/her response to four basic emotions (happiness, sadness, anger, and disgust). If empathizing were modulated by a unitary circuit, then individual differences in empathizing would correlate with activity in the same structures for all basic emotions. Twenty-five volunteers (13 female and 12 male) selected across the EQ space were scanned in a 3 T functional magnetic resonance imaging (fMRI) scanner during a passive viewing task using dynamic facial expressions as stimuli. It was found that activity in different brain regions correlated with EQ scores for different basic emotions (Chakrabarti et al., in press).
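To make the logic of this analysis concrete, the sketch below shows the kind of trait-correlation test involved: per-subject response estimates from a brain region (e.g., cluster-averaged beta values) are correlated with EQ scores, separately for each emotion. The data and effect sizes are simulated, and this is not the authors' actual analysis pipeline; it only illustrates why opposite correlation signs across emotions in one region argue against a unitary circuit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects = 25  # matching the sample size described above

# Hypothetical trait scores and regional responses (simulated):
eq = rng.uniform(10, 75, n_subjects)
responses = {
    # a region whose response tracks EQ positively for happy faces...
    "happy": 0.04 * eq + rng.normal(0, 0.6, n_subjects),
    # ...and negatively for sad faces: the 'mirroring' pattern in the text
    "sad": -0.04 * eq + rng.normal(0, 0.6, n_subjects),
}

# One correlation per emotion: divergent signs across emotions in the same
# region speak against a single, unitary 'empathy circuit'.
for emotion, y in responses.items():
    r, p = stats.pearsonr(eq, y)
    print(f"{emotion}: r = {r:+.2f}, p = {p:.4f}")
```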
Different regional responses were found to correlate with the EQ for different emotions, suggesting that there is no unitary neural correlate of the personality trait of empathizing across these emotions. Specifically, for the perception of happy faces, the response of a parahippocampal-ventral striatal cluster was positively correlated with the EQ. The role of this region in reward processing is well known (O'Doherty, 2004). This suggests that the more 'empathic' a person is, the higher is his/her reward response to a happy face. Interestingly, the response from the same region correlated negatively with the EQ during perception of sad faces. This fits well with the earlier result: the more empathic a person is, the lower is his/her reward response to a sad face. For happy and sad faces, therefore, empathizing seems to involve mirroring: the higher a person's EQ, the stronger the reward response to happy faces, and vice versa for sad faces. This is in concordance with suggestions from earlier studies on pain perception (Singer et al., 2004) and disgust perception (Wicker et al., 2003), where observation and experience have been shown to be mediated by the same set of structures. One of the issues with these previous studies is a possible confound between 'personal distress' and empathizing. The novel element in our study is that we explicitly tested for the personality trait of empathizing in relation to the perception of specific emotions. However, empathizing does not appear to be purely an index of mirroring. For the perception of angry faces, the EQ correlated positively with activity in clusters centered on the precuneus/inferior parietal lobule, the superior temporal gyrus, and the dorsolateral prefrontal cortex. The posterior cingulate region is known to be involved in self/other distinction (Vogt, 2005), and the superior temporal gyrus is known for its role in ToMM tasks (Saxe et al., 2004). This suggests that higher EQ corresponds to higher activation in areas related to the distinction of self versus other, as well as in those that are recruited to determine another person's intentions. The dorsolateral prefrontal cortex is known for its role in decision making and context evaluation (Rahm et al., 2006).
Higher EQ would therefore predict better evaluation of the threat from an angry expression. Since expressions of anger are usually more urgent social signals than those of either sadness or happiness, it is essential that highly empathic persons do not merely 'mirror' the expression. A high empathizer's perception of an angry face would therefore need to be accompanied by an accurate determination of the intentions of the other person as well as an evaluation of the threat posed. In response to disgust faces, a cluster containing the dorsal anterior cingulate cortex and medial prefrontal cortices was negatively correlated with the EQ, suggesting that the areas involved in the attribution of mental states (primarily required for deciphering the 'complex' emotions) are selectively recruited less by people of high EQ. This is what might be expected, since disgust as an emotion is less interpersonal than anger or sadness, so resources for decoding complex emotional signals need not be utilized. Another cluster, which included the right insula and inferior frontal gyrus (IFG), was also negatively correlated with the EQ. Given the well-established role of this region in processing disgust, this was a surprising result. We had expected that an increased ability to empathize would result in an increased disgust response to facial expressions of disgust. The negative correlation suggests that people with high EQ had a lower insula-IFG response to disgust expressions. A re-examination of the behavioural literature on disgust sensitivity reveals a compatible result: Haidt et al. (1994) suggested that increased socialization leads to lower disgust sensitivity, and individuals with high EQ may socialize more than those with low EQ. Together, these results demonstrate variability among the different basic emotions in how empathy interacts with them. This fits with the core idea that the different basic emotions have relatively independent evolutionary antecedents (Panksepp, 1998) and social communicatory functions (Izard and Ackerman, 2000). While some of the emotions involve more 'mirroring' (the same areas show activation during recognition and experience, e.g., the striatal response to happy faces correlating positively with EQ), others require an increased distinction between one's own emotional state and the other's emotional state (e.g., the superior temporal gyrus (STG) and inferior parietal lobule (IPL)/precuneus response to angry faces correlating with EQ).
This study provides support for the discrete emotions model discussed above, but reveals how empathy at the neural level is subtle and complex: the neural networks activated by the perception of discrete emotions depend on the observer's EQ. Empathy is likely to be determined by other individual differences as well, such as fetal testosterone (Knickmeyer et al., 2005; Knickmeyer and Baron-Cohen, 2006), genetic variation (Skuse et al., 1997; Chakrabarti et al., 2006), and early care or neglect (Fonagy et al., 1997). We conclude that more basic neuroscience research into empathy will enrich our understanding of this most fundamental human quality.

Abbreviations

EDD    Eye Direction Detector
EQ     Empathy Quotient
ID     Intentionality Detector
SAM    Shared Attention Mechanism
TED    The Emotion Detector
TESS   The Empathy SyStem
ToMM   Theory-of-Mind Mechanism
Acknowledgements

S.B-C. was supported by the MRC and the Lurie Marks Family Foundation during the period of this work. B.C. was supported by Trinity College, Cambridge. Parts of this chapter are reprinted from elsewhere (Baron-Cohen, 2005; Goldenfeld et al., 2006; Chakrabarti et al., in press).

References

Ahlgren, A. and Johnson, D.W. (1979) Sex differences in cooperative and competitive attitudes from the 2nd to the 12th grades. Dev. Psychol., 15: 45–49.
Anderson, A., Christoff, K., Stappen, I., Panitz, D., Ghahremani, D., Glover, G., Gabrieli, J. and Sobel, N. (2003) Dissociated neural representations of intensity and valence in human olfaction. Nat. Neurosci., 6: 196–202.
Astington, J., Harris, P. and Olson, D. (1988) Developing Theories of Mind. Cambridge University Press, New York.
Attwood, T. (1997) Asperger's Syndrome. Jessica Kingsley, London, UK.
Baron-Cohen, S. (1989a) The autistic child's theory of mind: a case of specific developmental delay. J. Child Psychol. Psychiat., 30: 285–298.
Baron-Cohen, S. (1989b) Perceptual role taking and protodeclarative pointing in autism. Br. J. Dev. Psychol., 7: 113–127.
Baron-Cohen, S. (1991) Precursors to a theory of mind: understanding attention in others. In: Whiten, A. (Ed.), Natural Theories of Mind. Basil Blackwell, Oxford.
Baron-Cohen, S. (1994) The mindreading system: new directions for research. Curr. Psychol. Cogn., 13: 724–750.
Baron-Cohen, S. (1995) Mindblindness: An Essay on Autism and Theory of Mind. MIT Press/Bradford Books, Boston.
Baron-Cohen, S. (2003) The Essential Difference: Men, Women and the Extreme Male Brain. Penguin, London.
Baron-Cohen, S. (2005) The empathizing system: a revision of the 1994 model of the mindreading system. In: Ellis, B. and Bjorklund, D. (Eds.), Origins of the Social Mind. Guilford, New York, USA.
Baron-Cohen, S., Baldwin, D. and Crowson, M. (1997a) Do children with autism use the Speaker's Direction of Gaze (SDG) strategy to crack the code of language? Child Dev., 68: 48–57.
Baron-Cohen, S., Jolliffe, T., Mortimore, C. and Robertson, M. (1997b) Another advanced test of theory of mind: evidence from very high functioning adults with autism or Asperger Syndrome. J. Child Psychol. Psychiat., 38: 813–822.
Baron-Cohen, S., Leslie, A.M. and Frith, U. (1985) Does the autistic child have a "theory of mind"? Cognition, 21: 37–46.
Baron-Cohen, S., O'Riordan, M., Jones, R., Stone, V. and Plaisted, K. (1999a) A new test of social sensitivity: detection of faux pas in normal children and children with Asperger Syndrome. J. Autism Dev. Disord., 29: 407–418.
Baron-Cohen, S., Richler, J., Bisarya, D., Gurunathan, N. and Wheelwright, S. (2003) The systemising quotient (SQ): an investigation of adults with Asperger Syndrome or high functioning autism and normal sex differences. Philos. Trans. R. Soc., 358: 361–374.
Baron-Cohen, S., Spitz, A. and Cross, P. (1993) Can children with autism recognize surprise? Cogn. Emotion, 7: 507–516.
Baron-Cohen, S. and Swettenham, J. (1996) The relationship between SAM and ToMM: the lock and key hypothesis. In: Carruthers, P. and Smith, P. (Eds.), Theories of Theories of Mind. Cambridge University Press, Cambridge.
Baron-Cohen, S. and Wheelwright, S. (2004) The empathy quotient (EQ): an investigation of adults with Asperger Syndrome or high functioning autism, and normal sex differences. J. Autism Dev. Disord., 34: 163–175.
Baron-Cohen, S., Wheelwright, S., Hill, J. and Golan, O. (submitted) Development of the emotion lexicon.
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y. and Plumb, I. (2001) The "reading the mind in the eyes" test revised version: a study with normal adults, and adults with Asperger Syndrome or high-functioning autism. J. Child Psychol. Psychiat., 42: 241–252.
Baron-Cohen, S., Wheelwright, S. and Jolliffe, T. (1997c) Is there a "language of the eyes"? Evidence from normal adults and adults with autism or Asperger Syndrome. Vis. Cogn., 4: 311–331.
Baron-Cohen, S., Wheelwright, S., Stone, V. and Rutherford, M. (1999b) A mathematician, a physicist, and a computer scientist with Asperger Syndrome: performance on folk psychology and folk physics tests. Neurocase, 5: 475–483.
Blair, R.J. (1995) A cognitive developmental approach to morality: investigating the psychopath. Cognition, 57: 1–29.
Blair, R.J., Jones, L., Clark, F. and Smith, M. (1997) The psychopathic individual: a lack of responsiveness to distress cues? Psychophysiology, 34: 192–198.
Blair, R.J.R., Morris, J., Frith, C., Perrett, D.I. and Dolan, R.J. (1999) Dissociable neural responses to facial expressions of sadness and anger. Brain, 122: 883–893.
Brothers, L. (1990) The neural basis of primate social communication. Motiv. Emotion, 14: 81–91.
Brunet, E., Sarfati, Y., Hardy-Bayle, M.-C. and Decety, J. (2000) A PET investigation of the attribution of intentions with a non-verbal task. NeuroImage, 11: 157–166.
Calder, A.J., Lawrence, A.D., Keane, J., Scott, S.K., Owen, A.M., Christoffels, I. and Young, A.W. (2002) Reading the mind from eye gaze. Neuropsychologia, 40: 1129–1138.
Calder, A.J., Lawrence, A.D. and Young, A.W. (2001) Neuropsychology of fear and loathing. Nat. Rev. Neurosci., 2: 352–363.
Canli, T., Sivers, H., Whitfield, S.L., Gotlib, I. and Gabrieli, J. (2002) Amygdala response to happy faces as a function of extraversion. Science, 296: 2191.
Carr, L.M., Iacoboni, M., Dubeau, M.-C., Mazziotta, J.C. and Lenzi, G.L. (2003) Neural mechanisms of empathy in humans: a relay from neural systems for imitation to limbic areas. Proc. Natl. Acad. Sci. USA, 100: 5497–5502.
Castelli, F., Frith, C., Happe, F. and Frith, U. (2002) Autism, Asperger Syndrome and brain mechanisms for the attribution of mental states to animated shapes. Brain, 125: 1839–1849.
Castelli, F., Happe, F., Frith, U. and Frith, C. (2000) Movement and mind: a functional imaging study of perception and interpretation of complex intentional movement patterns. NeuroImage, 12: 314–325.
Chakrabarti, B., Bullmore, E.T. and Baron-Cohen, S. (in press) Empathizing with basic emotions: common and discrete neural substrates.
Chakrabarti, B., Kent, L., Suckling, J., Bullmore, E.T. and Baron-Cohen, S. (2006) Variations in human cannabinoid receptor (CNR1) gene modulate striatal response to happy faces. Eur. J. Neurosci., 23: 1944–1948.
Charlesworth, W.R. and Dzur, C. (1987) Gender comparisons of preschoolers' behavior and resource utilization in group problem-solving. Child Dev., 58: 191–200.
Charman, T., Ruffman, T. and Clements, W. (2002) Is there a gender difference in false belief development? Soc. Dev., 11: 1–10.
Cleckley, H.M. (1977) The Mask of Sanity: An Attempt to Clarify Some Issues About The So-called Psychopathic Personality. Mosby, St Louis.
Connellan, J., Baron-Cohen, S., Wheelwright, S., Ba'tki, A. and Ahluwalia, J. (2001) Sex differences in human neonatal social perception. Infant Behav. Dev., 23: 113–118.
Crick, N.R. and Grotpeter, J.K. (1995) Relational aggression, gender, and social-psychological adjustment. Child Dev., 66: 710–722.
Daly, M. and Wilson, M. (1988) Homicide. Aldine de Gruyter, New York.
Damasio, A.R., Grabowski, T.J., Bechara, A., Damasio, H., Ponto, L.L.B., Parvizi, J. and Hichwa, R.D. (2000) Subcortical and cortical brain activity during the feeling of self-generated emotions. Nat. Neurosci., 3: 1049–1056.
Davis, M.H. (1994) Empathy: A Social Psychological Approach. Westview Press, CO.
Decety, J. and Chaminade, T. (2003) Neural correlates of feeling sympathy. Neuropsychologia, 41: 127–138.
Dimberg, U., Thunberg, M. and Elmehed, K. (2000) Unconscious facial reactions to emotional facial expressions. Psychol. Sci., 11: 86–89.
Dodge, K. (1980) Social cognition and children's aggressive behaviour. Child Dev., 51: 162–170.
Ekman, P. and Friesen, W. (1969) The repertoire of non-verbal behavior: categories, origins, usage, and coding. Semiotica, 1: 49–98.
Eugene, F., Levesque, J., Mensour, B., Leroux, J.M., Beaudoin, G., Bourgouin, P. and Beauregard, M. (2003) The impact of individual differences on the neural circuitry underlying sadness. NeuroImage, 19: 354–364.
Field, T. (1979) Visual and cardiac responses to animate and inanimate faces by term and preterm infants. Child Dev., 50: 188–194.
Fonagy, P., Steele, H., Steele, M. and Holder, J. (1997) Attachment and theory of mind: overlapping constructs? ACPP Occasional Papers, 14: 31–40.
Frith, U. and Frith, C. (2003) Development and neurophysiology of mentalizing. Philos. Trans. R. Soc., 358: 459–473.
Gallese, V., Keysers, C. and Rizzolatti, G. (2004) A unifying view of the basis of social cognition. Trends Cogn. Sci., 8: 396–403.
Goldenfeld, N., Baron-Cohen, S., Wheelwright, S., Ashwin, C. and Chakrabarti, B. (2005) Empathizing and systemizing in males and females, and autism: a test of neural competition theory. In: Farrow, T. and Woodruff, P. (Eds.), Empathy and Mental Illness. Cambridge University Press, Cambridge.
Grosbras, M.-H. and Paus, T. (in press) Brain networks involved in viewing angry hands or faces. Cereb. Cortex.
Grosbras, M.-H., Laird, A.R. and Paus, T. (2005) Cortical regions involved in eye movements, shifts of attention and gaze perception. Hum. Brain Mapp., 25: 140–154.
Haidt, J., McCauley, C. and Rozin, P. (1994) Individual differences in sensitivity to disgust: a scale sampling seven domains of disgust elicitors. Person. Individ. Diff., 16: 701–713.
Hall, J.A. (1978) Gender effects in decoding nonverbal cues. Psychol. Bull., 85: 845–857.
Happe, F. (1994) An advanced test of theory of mind: understanding of story characters' thoughts and feelings by able autistic, mentally handicapped, and normal children and adults. J. Autism Dev. Disord., 24: 129–154.
Happe, F. (1995) The role of age and verbal ability in the theory of mind task performance of subjects with autism. Child Dev., 66: 843–855.
Hare, R.D., Harpur, T.J., Hakstian, A.R., Forth, A.E., et al. (1990) The revised psychopathy checklist: reliability and factor structure. Psychol. Assess., 2: 338–341.
Hariri, A.R., Mattay, V.S., Tessitore, A., Kolachana, B., Fera, F., Goldman, D., Egan, M. and Weinberger, D.R. (2002) Serotonin transporter genetic variation and the response of the human amygdala. Science, 297: 400–403.
Hatfield, E., Cacioppo, J.T. and Rapson, R.L. (1992) Emotional contagion. In: Clark, M.S. (Ed.), Review of Personality and Social Psychology: Emotion and Behaviour. Sage Publications, Newbury Park, CA.
Helland, S. (2005) Gender differences in facial imitation. Unpublished thesis, University of Lund, available online at http://theses.lub.lu.se/archive/sob/psy/psy03022/PSY03022.pdf
Hobson, R.P. (1986) The autistic child's appraisal of expressions of emotion. J. Child Psychol. Psychiat., 27: 321–342.
Hobson, R.P. (1993) Autism and The Development of Mind. Lawrence Erlbaum Associates, NJ.
Hoffman, M.L. (1977) Sex differences in empathy and related behaviors. Psychol. Bull., 84: 712–722.
Izard, C. and Ackerman, B. (2000) Motivational, organizational, and regulatory functions of discrete emotions. In: Haviland-Jones, J. and Lewis, M. (Eds.), Handbook of Emotions. Guilford Press, New York, pp. 253–264.
Jackson, P., Meltzoff, A.N. and Decety, J. (2005) How do we perceive the pain of others? A window into the neural processes involved in empathy. NeuroImage, 24: 771–779.
Kalin, N.H., Shelton, S.E. and Davidson, R.J. (2001) The primate amygdala mediates acute fear but not the behavioral and physiological components of anxious temperament. J. Neurosci., 21: 2067–2074.
Knickmeyer, R., Baron-Cohen, S., Raggatt, P. and Taylor, K. (2005) Foetal testosterone, social cognition, and restricted interests in children. J. Child Psychol. Psychiat., 45: 1–13.
Knickmeyer, R., Baron-Cohen, S., Raggatt, P. and Taylor, K. (2006) Foetal testosterone and empathy. Horm. Behav., 49: 282–292.
Knight, G.P., Fabes, R.A. and Higgins, D.A. (1989) Gender differences in the cooperative, competitive, and individualistic social values of children. Motiv. Emotion, 13: 125–141.
Lawrence, A.D., Chakrabarti, B., et al. (2004) Looking at happy and sad faces: an fMRI study. Annual meeting of the Cognitive Neuroscience Society, San Diego, USA, Cognitive Neuroscience Society.
Lawrence, E.J., Shaw, P., Baker, D., Baron-Cohen, S. and David, A.S. (2004) Measuring empathy — reliability and validity of the Empathy Quotient. Psychol. Med., 34: 911–919.
LeDoux, J. (2000) Emotion circuits in the brain. Annu. Rev. Neurosci., 23: 155–184.
Lee, T., Liu, H.-L., Hoosain, R., Liao, W.-T., Wu, C.-T., Yuan, K., Chan, C., Fox, P. and Gao, J. (2002) Gender differences in neural correlates of recognition of happy and sad faces in humans assessed by functional magnetic resonance imaging. Neurosci. Lett., 333: 13–16.
Leslie, A.M. (1987) Pretence and representation: the origins of "theory of mind". Psychol. Rev., 94: 412–426.
Leslie, A. (1995) ToMM, ToBy, and agency: core architecture and domain specificity. In: Hirschfeld, L. and Gelman, S. (Eds.), Domain Specificity in Cognition and Culture. Cambridge University Press, New York.
Levenson, R.W. (1996) Biological substrates of empathy and facial modulation of emotion: two facets of the scientific legacy of John Lanzetta. Motiv. Emotion, 20: 185–204.
Lewis, P., Critchley, H., Rotshtein, P. and Dolan, R. (in press) Neural correlates of processing valence and arousal in affective words. Cereb. Cortex.
Liotti, M., Mayberg, H.S., Brannan, S.K., McGinnis, S., Jerabek, P. and Fox, P.T. (2000) Differential limbic–cortical correlates of sadness and anxiety in healthy subjects: implications for affective disorders. Biol. Psychiat., 48: 30–42.
Maccoby, E. (1999) The Two Sexes: Growing Up Apart, Coming Together. Harvard University Press, Cambridge, USA.
Malhi, G., Lagopoulos, J., Ward, P., Kumari, V., Mitchell, D., Parker, G., Ivanovski, B. and Sachdev, P. (2004) Cognitive generation of affect in bipolar depression: an fMRI study. Eur. J. Neurosci., 19: 741–754.
Mayberg, H.S., Liotti, M., Brannan, S.K., McGinnis, S., Mahurin, R.K., Jerabek, P.A., Silva, J.A., Tekell, J.L., Martin, C.C., Lancaster, J.L. and Fox, P.T. (1999) Reciprocal limbic-cortical function and negative mood: converging PET findings in depression and normal sadness. Am. J. Psychiat., 156: 675–682.
Mealey, L. (1995) The sociobiology of sociopathy: an integrated evolutionary model. Behav. Brain Sci., 18: 523–599.
Meltzoff, A.N. and Decety, J. (2003) What imitation tells us about social cognition: a rapprochement between developmental psychology and cognitive neuroscience. Philos. Trans. R. Soc., 358: 491–500.
Mobbs, D., Greicius, M.D., Abdel-Azim, E., Menon, V. and Reiss, A.L. (2003) Humor modulates the mesolimbic reward centres. Neuron, 40: 1041–1048.
Moll, J., de Oliveira-Souza, R., Eslinger, P., Bramati, I., Mourao-Miranda, J., Andreiuolo, P. and Pessoa, L. (2002) The neural correlates of moral sensitivity: a functional magnetic resonance imaging investigation of basic and moral emotions. J. Neurosci., 22: 2730–2736.
Morris, J., Ohman, A. and Dolan, R.J. (1999) A subcortical pathway to the right amygdala mediating 'unseen' fear. Proc. Natl. Acad. Sci. USA, 96: 1680–1685.
Murphy, F.C., Nimmo-Smith, I. and Lawrence, A.D. (2003) Functional neuroanatomy of emotions: a meta-analysis. Cogn. Affect. Behav. Neurosci., 3: 207–233.
Nichols, S. (2001) Mindreading and the cognitive architecture underlying altruistic motivation. Mind Lang., 16: 425–455.
O'Doherty, J. (2004) Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol., 14: 776–796.
O'Doherty, J., Deichmann, R., Critchley, H.D. and Dolan, R.J. (2002) Neural responses during anticipation of a primary taste reward. Neuron, 33: 815–826.
Panksepp, J. (1998) Affective Neuroscience: The Foundations of Human and Animal Emotions. Oxford University Press, New York.
Pelphrey, K.A., Singerman, J.D., Allison, T. and McCarthy, G. (2003) Brain activation evoked by perception of gaze shifts: the influence of context. Neuropsychologia, 41: 156–170.
Perrett, D.I. and Emery, N. (1994) Understanding the intentions of others from visual signals: neurophysiological evidence. Curr. Psychol. Cogn., 13: 683–694.
Phan, K.L., Wager, T., Taylor, S.F. and Liberzon, I. (2002) Functional neuroanatomy of emotion: a meta-analysis of emotion activation studies in PET and fMRI. NeuroImage, 16: 331–348.
Phillips, W., Baron-Cohen, S. and Rutter, M. (1998) Understanding intention in normal development and in autism. Br. J. Dev. Psychol., 16: 337–348.
Phillips, M.L., Young, A.W., Scott, S., Calder, A.J., Andrew, C., Giampietro, V., Williams, S., Bullmore, E.T., Brammer, M.J. and Gray, J. (1998) Neural responses to facial and vocal expressions of fear and disgust. Proc. R. Soc. Lond. B, 265: 1809–1817.
Phillips, M., Young, A., Senior, C., Brammer, M., Andrew, C., Calder, A., Bullmore, E., Perrett, D., Rowland, D., Williams, S., Gray, J. and David, A. (1997) A specific neural substrate for perceiving facial expressions of disgust. Nature, 389: 495–498.
Piaget, J. and Inhelder, B. (1956) The Child's Conception of Space. Routledge and Kegan Paul, London.
Power, T.G. (1985) Mother- and father-infant play: a developmental analysis. Child Dev., 56: 1514–1524.
Prather, M., Lavenex, P., Mauldin-Jourdain, M., Mason, W., Capitanio, J., Mendoza, S. and Amaral, D. (2001) Increased social fear and decreased fear of objects in monkeys with neonatal amygdala lesions. Neuroscience, 106: 653–658.
Pratt, C. and Bryant, P. (1990) Young children understand that looking leads to knowing (so long as they are looking into a single barrel). Child Dev., 61: 973–983.
Premack, D. (1990) The infant's theory of self-propelled objects. Cognition, 36: 1–16.
Preston, S.D. and de Waal, F.B.M. (2002) Empathy: its ultimate and proximate bases. Behav. Brain Sci., 25: 1–72.
Rahm, B., Opwis, K., Kaller, C., Spreer, J., Schwarzwald, R., et al. (2006) Tracking the subprocesses of decision-based action in the human frontal lobes. NeuroImage, 30: 656–667.
Rolls, E.T. (2002) Neural basis of emotions. In: Smelser, N. and Baltes, P. (Eds.), International Encyclopedia of the Social and Behavioral Sciences. Elsevier, Amsterdam.
Saxe, R., Carey, S. and Kanwisher, N. (2004) Understanding other minds: linking developmental psychology and functional neuroimaging. Annu. Rev. Psychol., 55: 87–124.
Scaife, M. and Bruner, J. (1975) The capacity for joint visual attention in the infant. Nature, 253: 265–266.
Schienle, A., Schafer, A., Stark, R., Walter, B. and Vaitl, D. (2005) Gender differences in the processing of disgust- and fear-inducing pictures: an fMRI study. Neuroreport, 16: 277–280.
Schroeder, U., Hennenlotter, A., Erhard, P., Haslinger, B., Stahl, R., Lange, K. and Ceballos-Baumann, A. (2004) Functional neuroanatomy of perceiving surprised faces. Hum. Brain Mapp., 23: 181–187.
Shumake, J., Edwards, E. and Gonzalez-Lima, F. (2001) Hypermetabolism of paraventricular hypothalamus in the congenitally helpless rat. Neurosci. Lett., 311: 45–48.
Singer, T., Seymour, B., O'Doherty, J., Kaube, H., Dolan, R.J. and Frith, C.D. (2004) Empathy for pain involves the affective but not sensory components of pain. Science, 303: 1157–1167.
Skuse, D.H., James, R.S., Bishop, D.V.M., Coppins, B., Dalton, P., Aamodt-Leeper, G., Bacarese-Hamilton, M., Creswell, C., McGurk, R. and Jacobs, P.A. (1997) Evidence from Turner's syndrome of an imprinted X-linked locus affecting cognitive function. Nature, 387: 705–708.
Small, D., Gregory, M., Mak, Y., Gitelman, D., Mesulam, M. and Parrish, T. (2003) Dissociation of neural representation of intensity and affective valuation in human gustation. Neuron, 39: 701–711.
Smith, P.M. (1985) Language, The Sexes and Society. Basil Blackwell, Oxford.
Stein, E. (1989) On The Problem of Empathy. ICS Publications, Washington DC.
Strayer, F.F. (1980) Child ethology and the study of preschool social relations. In: Foot, H.C., Chapman, A.J. and Smith, J.R. (Eds.), Friendship and Social Relations in Children. Wiley, New York.
Takahashi, H., Yahata, N., Koeda, M., Matsuda, T., Asai, K. and Okubo, Y. (2004) Brain activation associated with evaluative process of guilt and embarrassment: an fMRI study. NeuroImage, 23: 967–974.
Tannen, D. (1991) You Just Don't Understand: Women and Men in Conversation. Virago, London.
Trevarthen, C. (1989) The relation of autism to normal sociocultural development: the case for a primary disorder in regulation of cognitive growth by emotions. In: Lelord, G., Muk, J. and Petit, M. (Eds.), Autisme et troubles du développement global de l'enfant. Expansion Scientifique Française, Paris.
Vogt, B.A. (2005) Pain and emotion interactions in subregions of the cingulate gyrus. Nat. Rev. Neurosci., 6: 533–544.
Wager, T., Phan, K.L., Liberzon, I. and Taylor, S. (2003) Valence, gender, and lateralization of functional brain anatomy: a meta-analysis of findings from neuroimaging. NeuroImage, 19: 513–531.
Walker, A.S. (1982) Intermodal perception of expressive behaviours by human infants. J. Exp. Child Psychol., 33: 514–535.
Wellman, H. (1990) Children's Theories of Mind. Bradford/MIT Press, Cambridge, MA.
Whiten, A. (1991) Natural Theories of Mind. Basil Blackwell, Oxford.
Wicker, B., Perrett, D.I., Baron-Cohen, S. and Decety, J. (2003) Being the target of another's emotion: a PET study. Neuropsychologia, 41: 139–146.
Williams, J.H.G., Waiter, G.D., Perra, O., Perrett, D.I. and Whiten, A. (2005) An fMRI study of joint attention experience. NeuroImage, 25: 133–140.
Wimmer, H., Hogrefe, J. and Perner, J. (1988) Children's understanding of informational access as a source of knowledge. Child Dev., 59: 386–396.
Wimmer, H. and Perner, J. (1983) Beliefs about beliefs: representation and constraining function of wrong beliefs in young children's understanding of deception. Cognition, 13: 103–128.
Yirmiya, N., Kasari, C., Sigman, M. and Mundy, P. (1990) Facial expressions of affect in autistic, mentally retarded, and normal children. J. Child Psychol. Psychiat.
CHAPTER 23
The multiple facets of empathy: a survey of theory and evidence

Susanne Leiberg(1,2) and Silke Anders(1,3)

(1) Institute of Medical Psychology and Behavioral Neurobiology, University of Tübingen, Tübingen, Germany
(2) Department of Psychiatry and Psychotherapy, RWTH Aachen University, Aachen, Germany
(3) Section for Experimental MR of the CNS, University of Tübingen, Tübingen, Germany
Abstract: Empathy is the ability to perceive and understand other people's emotions and to react appropriately. This ability is a necessary prerequisite for successful interpersonal interaction. Empathy is a multifaceted construct including low-level mechanisms like emotional contagion as well as high-level processes like perspective-taking. The ability to empathize varies between individuals and is considered a stable personality trait: some people are generally more successful in empathizing than others. In this chapter we will first present different conceptualizations of the construct of empathy, and refer to empathy-regulating processes as well as to the relationship between empathy and social behavior. Then, we will review peripheral physiological and brain imaging studies pertaining to low- and high-level empathic processes, empathy-modulating processes, and the link between empathy and social behavior. Further, we will present evidence regarding interindividual differences in these processes as an important source of information for solving the conundrum of how the comprehension of others' emotions is achieved by our brains.

Keywords: empathy; emotional contagion; simulation theory; perspective-taking; theory of mind; emotion regulation; neuroimaging; psychophysiology

Introduction

As humans, we live in a highly complex social environment. Not only do we interact with our closest kin, but we also maintain a complicated social network with friends, colleagues, acquaintances, and the like. To achieve our goals in daily life, we have to deal with complete strangers or, worse, with people we would rather choose to avoid. Empathy — the capability to understand other people's emotions — is crucial for comprehending intentions and behaviors as well as for adapting our own behavior in order to achieve smooth interpersonal interactions. In most general terms, empathy refers to the ability to accurately perceive and understand another person's emotions and to react appropriately. How this capability of understanding others' emotions can be conceptualized, and how it is influenced by psychological and social factors, has been debated since the beginning of the last century. With the advent of modern neuroimaging methods, interest in empathy has been spurred, producing a considerable number of studies shedding light on the neural bases of empathy.

This review is divided into two sections. The first part pertains to the theoretical background and reviews different conceptualizations of empathy. It will be shown that most researchers view empathy as a multifaceted construct involving low-level affective processes, like emotional contagion, sometimes regarded as automatic, and high-level cognitive processes, like perspective-taking. The second part illustrates how peripheral physiological and neuroimaging studies have contributed to the understanding of empathy. We will review studies related to contagion-like processes and perspective-taking. Additionally, we will dedicate one paragraph to individual differences in these processes, since we believe that the analysis of individual differences is useful for elucidating the neurobiology of empathy. Finally, we will review studies that have investigated processes that may modulate empathic responses, and studies that have addressed the relationship between empathy and social behavior.
Theoretical background

Empathy has been the subject of investigation in many different research areas, including personality psychology (Allport, 1961), psychotherapy research (Rogers, 1975), social psychology (Batson and Coke, 1981), developmental psychology (Eisenberg and Strayer, 1987) and, recently, social neuroscience (Decety and Jackson, 2004; Blair, 2005). Theodor Lipps (1903) used the German word "einfühlen," meaning "to feel into," to explain how a person comes to know about the inner state of others. He believed that we achieve an understanding of another person by internally imitating the other's gestures and actions (an idea that later came to be called "motor mimicry"). Later on, "einfühlen" was translated into "empathy" by Titchener (1909). Since the time of Lipps and Titchener, many definitions of empathy have been put forward (see Table 1), but a concise, agreed-upon definition of empathy is still missing today. The common denominator of the definitions listed in Table 1 is the compassionate responding to another person's emotions (perhaps excluding Ickes' (1997) definition). Disagreement persists regarding the implementation of empathy and whether it occurs in a contagion-like fashion or depends on higher level cognitive processes. "Contagion-like" refers to "the tendency to automatically mimic and synchronize facial expressions, vocalizations, postures, and movements with those of another person and, consequently, to converge emotionally" (Hatfield et al., 1994). In contrast, higher level cognitive processes in empathy refer to actively taking the perspective of another person.
Table 1. Definitions of the term "empathy"

Ax (1964): An autonomic nervous system state, which tends to simulate that of another person.
Batson et al. (1987): Other-oriented feelings of concern, compassion, and tenderness experienced as a result of witnessing another person's suffering.
Eisenberg (2000): Affective response that stems from the apprehension or comprehension of another's emotional state or condition and is similar to what the other person is feeling or would be expected to feel.
Hoffman (1984): Affective response more appropriate to someone else's situation than to one's own.
Ickes (1997): Complex form of psychological inference in which observation, memory, knowledge, and reasoning are combined to yield insights into the thoughts and feelings of others.
Preston and de Waal (2002): Any process where the attended perception of the object's state generates a state in the subject that is more applicable to the object's state or situation than to the subject's own prior state or situation.
Rogers (1959): To perceive the internal frame of reference of another with accuracy and with the emotional components and meanings which pertain thereto as if one were the person, but without ever losing the "as if" condition.
Webster's Dictionary (1971): The capacity for participating in, or a vicarious experiencing of, another's feelings, volitions, or ideas and sometimes another's movements to the point of executing bodily movements resembling his.
Wispé (1986): Attempt by one self-aware self to comprehend unjudgmentally the positive and negative experiences of another self.
Simulation theory and theory–theory

Researchers in line with Lipps stress the importance of contagion-like imitative processes, endorsing a simulationist account of empathy, whereas others emphasize the contribution of higher level cognitive processes, favoring a theory–theory (TT) account of empathy. Simulation theory (ST) posits that we understand other people's mental states, including their emotions, by reproducing or simulating the other's mental state in our own minds. While traditional STs assume different simulation mechanisms varying in their degree of automaticity (Gordon, 1986; Goldman, 2005, 2006), others appear to reserve the term for an automatic process (Gallese et al., 2004). TT, on the other hand, holds that we use a lay theory of psychology to attribute mental states (such as beliefs, intentions, desires, or emotions) to others, thereby understanding their behavior. TT is closely related, if not equivalent, to the concept of theory of mind (ToM). It is argued that an organism possesses a ToM if it (a) imputes mental states to itself and others, (b) recognizes that others have mental states that can differ from its own, and (c) is aware that the behavior of other persons can be explained by their mental states (Premack and Woodruff, 1978; Frith and Frith, 2003). In the earlier literature, the term ToM appears to refer to the knowledge of the fact that others have a mind with mental states possibly differing from one's own (Premack and Woodruff, 1978), whereas in recent years it has come to denote the ability or process of inferring others' mental states, also called "mentalizing" or "mindreading" (Gallagher and Frith, 2003). A dichotomy has been (artificially) established between those researchers favoring ST (Gallese et al., 2004) and those who support TT (Saxe, 2005a). Following this discussion, one gets the impression that both sides talk at cross-purposes, although neither of them holds that all aspects of understanding others can be sufficiently explained by only one of the theories (Gordon, 2005; Goldman and Sebanz, 2005; Mitchell, 2005; Saxe, 2005b, 2005c). Instead, a hybrid model of ST and TT is advocated for explaining our ability to understand others' minds. However, the researchers involved
in this debate still differ in the importance they concede to either approach.
Contagion-like processes

On the basis of ideomotor theory (Prinz, 1987; Hommel et al., 2001), which assumes that an action is stored as a sensory feedback representation in our brains, Preston and de Waal (2002) proposed a perception-action model of empathy, reminiscent of Lipps' internal imitation. The stored representation of an action is activated when we observe somebody perform that action, and will in turn prime the activation of the corresponding motor representation in the observer because of their overlap. Preston and de Waal transferred this idea to the concept of empathy. They surmise that the observation of a person in an emotional state automatically activates a representation of that emotional state, including its associated autonomic, somatic, and motor responses, in the observer. The greater the overlap between the emotional state we observe and our representation of this emotional state, the greater the activation of this emotional state in ourselves. This model is in accordance with other recent theories that emphasize the importance of the activation of representations for the reenactment of associated feelings, like the influential somatic marker hypothesis (Damasio, 1994) and conceptualizations of ST in terms of neural mirror networks (Gallese and Goldman, 1998; Gallese et al., 2004).

STs as outlined so far can account for low-level contagion-like empathic responses to clear displays of emotions of similar beings in unequivocal situations. However, we are not only able to empathize with someone displaying a familiar emotional expression or with someone in a well-known emotional situation, but are capable of understanding the emotions of others in diverse situations. And our empathic reactions to the very same emotional cues vary markedly depending on the situational context and our relationship with the other person. For instance, our empathic reaction to a person expressing fear will be much stronger when this person is actually being attacked than when this person's fear is due to experiencing the thrills of a haunted house.
Furthermore, we can understand others' emotions even if they do not explicitly show them. Knowing your friend, you will probably know that she is sad after she has failed her exam, even though she does not have a sad facial expression, voice, or posture. Empathic reactions can also be triggered by propositional information, like verbal and written accounts of emotions, or completely internally, by thoughts or imagination. Finally, and probably most importantly, we can derive the inner state of a person when our own situation is completely different and even when we have never faced a similar situation — situations likely compromising successful simulation. Imagine, for example, reading a letter from your sister in which she describes how she had to resign from her last horse-jumping competition: you will be more likely to share her feelings if you are an ambitious sportsman who eagerly participates in competitions, but even if you hate sports and fear horses, you might have an idea of what your sister felt. How can low-level simulation models accommodate these instances of empathizing? We argue that a model describing our ability to empathize cannot stop at simple contagion-like simulation mechanisms but has to add elaborate cognitive processes, which can account for empathic responding in the mentioned examples. Baron-Cohen and colleagues (Baron-Cohen, 2005; see Chakrabarti and Baron-Cohen, this volume) have proposed a model of empathy comprising several lower and higher level mechanisms. The components of this model are thought to develop at different time points in a human's life. Lower level mechanisms develop early and constitute the affective component of empathy in that they are emotional contagion-like processes. Higher level mechanisms, like ToM and the so-called Empathizing System, develop later on. Preston and de Waal (2002) recognize the involvement of higher level cognitive processes but do not specify exactly when these processes are employed and how they interact with the more basic simulation processes. These authors state that their development is closely related to the size of the prefrontal cortex (PFC), which accommodates functions like working memory, goal assessment, and inhibitory control. These functions enable us to infer emotional states from more symbolic information, to integrate expressive and other cues, to differentiate between our own emotions and those of another person, and to inhibit prepotent responses.
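The overlap principle at the core of the perception-action account lends itself to a toy formalization. In the sketch below (our illustration, not a model proposed by any of the authors cited), emotional states are coded as feature vectors, and the observer's vicarious activation is taken to grow with the overlap between the observed display and the observer's own stored representation. All vectors and feature labels are invented for illustration.

```python
import numpy as np

def vicarious_activation(observed: np.ndarray, stored: np.ndarray) -> float:
    """Cosine similarity as a stand-in for representational overlap."""
    return float(observed @ stored
                 / (np.linalg.norm(observed) * np.linalg.norm(stored)))

# Hypothetical three-feature codes (e.g., widened eyes, raised brows, smile):
fear_display = np.array([0.9, 0.8, 0.1])  # what the observer sees
own_fear = np.array([0.8, 0.9, 0.2])      # observer's stored fear state
own_joy = np.array([0.1, 0.2, 0.9])       # observer's stored joy state

# High overlap -> strong contagion-like activation; low overlap -> weak.
print(vicarious_activation(fear_display, own_fear))  # ~0.99
print(vicarious_activation(fear_display, own_joy))   # ~0.30
```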
Perspective-taking

A higher level cognitive process by which understanding of others' emotions can be achieved is perspective-taking. Perspective-taking incorporates integrating information from different sources when inferring the other's mental state, and inhibiting one's own perspective if necessary. A role of perspective-taking in empathy (Feshbach, 1978; Davis, 1996; Batson et al., 2003) and in understanding others' minds in general (Gordon, 1986; Goldman, 2006) has been acknowledged. Two different forms of perspective-taking have been distinguished (Stotland, 1969; Batson et al., 1997a; Davis et al., 2004): a self-focused form, in which one imagines how oneself would feel in the target's situation (first-person perspective-taking), and an other-focused form, in which one imagines what the target is thinking and feeling (third-person perspective-taking). While both types of perspective-taking have been shown to generate empathic responses, only the former induces distress in the observer (Stotland, 1969; Batson et al., 1997a). Interestingly, the term perspective-taking is used by both traditional simulation theorists and theory theorists. In traditional ST, perspective-taking is a second simulation process, which differs from contagion-like simulation in automaticity and level of awareness. Whereas in contagion-like simulation the observation of an emotional state in the target leads instantaneously to the activation of the representation of that emotional state in the observer, perspective-taking requires the observer to deliberately activate a representation of the target's state and to integrate context information and knowledge about the other's beliefs, desires, and preferences. Theory theorists have used the term perspective-taking to describe how one makes inferences about another person's inner state using theoretical knowledge about the other's situation.
As perspective-taking incorporates the integration of information about the person and the context, it can improve the accuracy of simulations of, or predictions about, the other's emotional state. By knowing that your friend's goal is to graduate from university, you can understand that she is sad about failing the exam, although her demeanor may not indicate this. In the same line, when you know that your sister loves the feeling of taking one seemingly insurmountable hurdle after the other, you can understand that she was disappointed at having to resign from the race. Nevertheless, perspective-taking does not necessarily lead to a correct representation of the other's mind and may thus, like contagion-like processes, entail mistakes in inferring the other's emotion.
Modulatory processes

To meet situational demands, but also to ensure our personal well-being, it is sometimes indispensable to modulate our empathic responding. In some instances, our intentions or goals require enhancement of empathic responses. As a psychotherapist, for example, one is often confronted with people who appraise and react to situations differently than oneself. In this and other settings where contagion-like processes are likely to be compromised, perspective-taking can substitute for or amplify contagion-like processes and thus help us to better understand the other's emotions and to intensify our empathic response. In other instances, it is vital for our personal well-being to control contagion-like processes (Bandura, 1997). In our daily life we are often confronted with the negative emotions, distress, and pain of other people. When we watch TV or go on the Internet, we are flooded with reports of ongoing wars and other inhumanities. If contagion-like simulation processes were automatic in the sense that they could not be controlled, all these instances would lead to an internal reproduction of stressful emotional states. One simple way to prevent empathic responding is to divert attention away from the aversive situation. Other ways to inhibit unwanted vicarious emotional reactions are emotion-regulation processes, i.e., processes that aim to modify emotional responses in order to
match situational demands (Gross, 1999, 2002). In empathy, two types of regulatory processes can be distinguished: one that controls one’s own emotions (self-focused) and one that controls the level of engagement with the other person (other-focused) (Reik, 1949; Eisenberg, 2000). The former might be similar to regulatory processes employed to modulate our emotional reactions to emotional stimuli in a nonsocial context.
Empathy and social behavior

Empathy is a prerequisite for prosocial behavior (Eisenberg and Fabes, 1991; Trobst et al., 1994; Batson et al., 1997b; Batson, 1998). Only if we understand that our friend is sad about failing the exam will we make an effort to console her. Moreover, it is believed that the ability to regulate one's emotions, or to disengage oneself from the other person's emotional state, also figures prominently in prosocial behavior (Batson, 1991; Eisenberg, 2000). According to Eisenberg (Eisenberg and Strayer, 1987; Eisenberg, 2000), empathy can lead either to an other-oriented response or to personal distress, depending on how well someone can regulate their own emotions and their engagement with the observed person. If we become too distressed by empathizing with another person and are not capable of regulating our empathic response, we will rather try to alleviate our own distress than attend to the other person. Interestingly, the psychoanalyst Reik (1949) subsumed both processes — making the other's experience one's own (incorporation) and moving back from the merged inner relationship to a position of separate identity (detachment) — under the concept of empathy. It should be stressed, though, that the ability to empathize with, and to distance oneself from, a person in distress is not a sufficient condition for prosocial behavior to arise. First, other factors, like the perceived urgency and the observer's perceived efficacy of helping, influence the likelihood of helping behavior. As Preston and de Waal (2002) have suggested, the occurrence of prosocial behavior will be the outcome of "a complex cost/benefit analysis on the perceived effectiveness of helping and the effect of helping on short and long-term goals."
Second, in many real-life social situations, other people will be present, which may either foster or inhibit prosocial behavior (see, e.g., the phenomenon of responsibility diffusion; Darley and Latane, 1968). Finally, understanding another person's emotions can just as well lead to the betrayal or deception of that person. Summarizing these theoretical accounts of empathy, we conclude that although an agreed-upon definition of empathy is still missing and the neural underpinnings of empathy-related processes
are far from being understood, consensual opinions on some aspects of empathy are emerging. Empathy is a multifaceted construct incorporating contagion-like processes subserved by a perception-representation coupling and higher level processes like perspective-taking. Perspective-taking can modulate or substitute for contagion-like processes. Furthermore, empathic responses can be subject to modulation. Empathy and the ability to modulate empathic responses are essential for social behavior. Fig. 1 depicts a schematic overview of
Fig. 1. Schematic overview of how an understanding of others' emotional states and an appropriate response toward the other person may evolve. Numbers on the arrows signify studies that have addressed the process marked by the arrow. The numbers correspond to the following studies: (1) Lundquist and Dimberg (1995), (2) Dimberg et al. (2000), (3) Vaughan and Lanzetta (1980), (4) Dimberg (1982), (5) Wiesenfeld et al. (1984), (6) McHugo et al. (1985), (7) Levenson and Ruef (1992), (8) Sonnby-Borgström (2002), (9) Carr et al. (2003), (10) Wicker et al. (2003), (11) Leslie et al. (2004), (12) Lanzetta and Englis (1989), (13) Singer et al. (2006), (14) Dimberg (1988), (15) Stotland (1969), (16) Hynes et al. (2006), (17) Völlm et al. (2006), (18) Shamay-Tsoory et al. (2005a), (19) Shamay-Tsoory et al. (2005b), (20) Beauregard et al. (2001), (21) Critchley et al. (2002), (22) Ochsner et al. (2002), (23) Levesque et al. (2003), (24) Ochsner et al. (2004b), (25) Phan et al. (2005), (26) Eippert et al. (in press), (27) Eisenberg and Fabes (1991), (28) Trobst et al. (1994), (29) Batson et al. (1997b) and (30) Batson (1998).
how these processes could work together in the understanding of others' emotions and the generation of an other-oriented response.

Evidence from peripheral physiological and neuroimaging studies

Contagion-like processes

As noted in the introduction, emotional contagion refers to the unintentional imitation of an observed person's facial expression, tone of vocalization, posture, and movements, which consequently leads to an emotional state in the observer similar to that of the target (Hatfield et al., 1994). Emotional contagion, in turn, will facilitate the understanding of the other's emotion (Lipps, 1903; Hoffman, 1984). In this section we will present evidence for emotional contagion on different levels of emotional reactions, including motor as well as autonomic nervous system (ANS) responses and brain activity.

Contagion-like motor responses

More than a hundred years ago, Lipps (1903) assumed that we involuntarily imitate others' facial expressions to understand their feelings. Less than 30 years ago, the first evidence for this hypothesis came from electromyographic studies showing that looking at different emotional facial expressions evokes corresponding differential changes in facial muscle activity in the observer (Vaughan and Lanzetta, 1980; Dimberg, 1982; McHugo et al., 1985). When subjects are exposed to pictures of angry and happy facial expressions, they react with increased activity in the corrugator supercilii muscle and the zygomaticus major muscle, respectively (Dimberg, 1982, 1988). Notably, these facial reactions occur as early as 300 ms after stimulus onset and are also observed after subliminal stimulus presentation, ruling out that subjects voluntarily imitate the presented facial expressions (Lundquist and Dimberg, 1995; Dimberg et al., 2000). The existence of motor imitation is well proven, but its relation to emotional experience and emotion understanding is still under debate. While some studies have demonstrated that involuntary
imitation of emotional facial expressions results in corresponding subjective emotional experiences (Dimberg, 1988; Lundquist and Dimberg, 1995), others have failed to find a consistent relationship between motor imitation and induced emotions (Gump and Kulik, 1997; Hess and Blairy, 2001). Evidence favoring a facilitative effect of motor imitation on emotion understanding includes the finding that subjects prevented from imitating by holding a pen between their lips detect emotional changes in facial expressions significantly later than subjects free to imitate (Niedenthal et al., 2001). Furthermore, the anecdotal finding that facial resemblance between long-term spouses, ascribed to repeated facial imitation over the course of the relationship, is related to marital happiness (Zajonc et al., 1987) supports the notion that facial imitation plays a role in understanding others' emotions. On the contrary, in an electromyography (EMG) study by Hess and Blairy (2001), in which subjects viewed video clips and were asked to judge their own and the displayed emotions, no relation was found between facial imitation and induced emotions or judgment accuracy. In another study (McHugo et al., 1985), subjects viewed videotapes of the former U.S. president Ronald Reagan showing emotional facial expressions. Subjects exhibited congruent EMG responses to Reagan's positive and negative facial expressions regardless of whether they approved of him or not. Interestingly, though, self-reported feelings varied with liking: subjects in favor of Reagan reported emotions similar to the displayed facial expressions, while subjects opposed to him reported negative feelings to all expressions. These latter results question the assumed connection between facial imitation, emotional experience, and recognition of emotional facial expressions (Blairy et al., 1999; Hess and Blairy, 2001). Thus, it has been suggested that facial imitation has a more communicative function and serves to signal to others that we comprehend their emotional state (Bavelas et al., 1986; Hess et al., 1999). This hypothesis has been corroborated by the finding that observers' motor imitation increased with the availability of eye contact with the target. When eye contact with the target was not achieved, subjects did not imitate the target's facial expression (Bavelas et al., 1986).
Furthermore, it was shown that subjects whose behavior was imitated perceive an interaction as smoother and the partner as more likable than subjects whose behavior was not imitated (Chartrand and Bargh, 1999). Further dispute concerns the question of whether facial imitation occurs automatically in the sense that it does not depend on attention or intention. While Dimberg et al. (2000) demonstrated that facial imitation occurs rapidly and is elicited by stimuli the subject is not aware of, Lanzetta and Englis (1989) found a striking influence of context on facial imitation in social interactions. Depending on whether subjects were engaged in a cooperative or competitive interaction, the observer's EMG responses were either congruent or incongruent with the target's facial expressions; Lanzetta and colleagues termed this behavior empathy and counterempathy, respectively. Since the EMG was sampled at a very low rate in this study (1 Hz), it is impossible to decide at what point in time these context-dependent responses occurred. Studies investigating whether different EMG response components exist, namely a rapid contagion-like response component and a later component that is influenced by contextual factors, are lacking.
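To make the missing analysis concrete: with densely sampled EMG, the two hypothesized components could in principle be scored separately for each trial and muscle site. The following is a minimal sketch of such a scoring step in Python, assuming rectified EMG sampled at 1000 Hz; the window boundaries, sampling rate, and function name are illustrative assumptions on our part, not the design of any study reviewed here.

    import numpy as np

    def emg_response_components(emg, fs=1000, onset_s=1.0,
                                early=(0.0, 0.5), late=(0.5, 3.0)):
        # emg: 1-D array of rectified EMG for one trial and one muscle
        # site (e.g., corrugator supercilii); the stimulus appears
        # onset_s seconds into the trial.
        baseline = emg[:int(onset_s * fs)].mean()

        def window_change(t0, t1):
            i0 = int((onset_s + t0) * fs)
            i1 = int((onset_s + t1) * fs)
            return emg[i0:i1].mean() - baseline

        # Rapid, putatively contagion-like component vs. a later
        # component that could carry contextual influences.
        return window_change(*early), window_change(*late)

At a sampling rate of 1 Hz, by contrast, the early window would contain at most a single sample, which is why the timing of the context-dependent responses in the Lanzetta and Englis (1989) data cannot be determined.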
Contagion-like autonomic responses

A considerable body of research indicates that emotional contagion is also found in ANS activity. The linkage of peripheral physiological responses between two people during communication was first demonstrated in studies of psychotherapist–client interaction. It was observed that the heart rates of psychotherapists and clients changed synchronously during therapy sessions (DiMascio et al., 1955, 1957; Coleman et al., 1956; Stanek et al., 1973). On the basis of these findings, Kaplan and Bloom (1960) hypothesized a physiological component of empathy. The significance of shared physiology in empathy was later stressed by Levenson (Levenson and Ruef, 1992; Levenson, 2003). His first evidence for contagion-like autonomic responses was the finding that subjects experienced the same physiological changes when participating in a conversation and when later watching this conversation on
videotape (Gottman and Levenson, 1985). Subsequently, Levenson and Ruef (1992) demonstrated that this ‘‘physiological linkage’’ is important for the understanding of others’ emotions. In this study, subjects watching video segments of marital interactions were instructed to rate the feelings of one of the spouses. The greater the synchrony between the subject’s and the target’s changes in heart rate, the greater was the accuracy with which the subject rated the target’s feelings (Levenson and Ruef, 1992). Levenson states that ‘‘synchrony of the autonomic responses of two people is associated with heightened emotional closeness and greater capacity for empathic accuracy’’ (Levenson, 1996). Lanzetta and Englis (1989) showed that ANS activity in response to a target’s facial expression is, like facial activity, modulated by context. Subjects expecting cooperation with the target showed increased autonomic arousal, indicated by elevated skin conductance responses (SCRs), when the target’s facial expression signaled distress, while they evinced decreased arousal in response to displays of pleasure. Subjects expecting competition with the target, in contrast, showed increased arousal to pleasure displays and relaxation to distress displays.
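The linkage analyses in this literature quantify the moment-to-moment covariation of two physiological time series. As an illustration only (the published studies used more elaborate bivariate time-series methods), a simple linkage index over second-by-second heart-rate series could be computed as a mean windowed correlation; the window length and names below are our assumptions.

    import numpy as np

    def physiological_linkage(hr_observer, hr_target, win=15, step=1):
        # hr_observer, hr_target: second-by-second heart-rate series
        # (beats per minute) of equal length. Returns the mean windowed
        # Pearson correlation as a simple linkage index.
        rs = []
        for start in range(0, len(hr_observer) - win + 1, step):
            a = np.asarray(hr_observer[start:start + win], dtype=float)
            b = np.asarray(hr_target[start:start + win], dtype=float)
            if a.std() > 0 and b.std() > 0:  # skip flat windows
                rs.append(np.corrcoef(a, b)[0, 1])
        return float(np.mean(rs)) if rs else float("nan")

In a rating paradigm of the kind described above, such an index computed for each observer-target pair could then be related to the observer's rating accuracy across pairs.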
Contagion-like brain responses

The discovery of mirror neurons (Gallese et al., 1996; Rizzolatti et al., 1996) revealed a possible neural implementation of contagion-like empathic processes. Originally, the term mirror neurons was introduced by Rizzolatti and colleagues (Gallese et al., 1996; Rizzolatti et al., 1996) to describe neurons in area F5 of the monkey premotor cortex that are activated during the execution as well as the observation of goal-directed actions (Iacoboni et al., 1999; Rizzolatti et al., 2001; Keysers and Perrett, 2004). Mirror neurons with similar properties are thought to exist in the inferior frontal gyrus of the human PFC (Brodmann area [BA] 44). This area has been found to be activated in neuroimaging studies during the execution and observation of goal-directed behavior (Iacoboni et al., 1999; Molnar-Szakacs et al., 2005). Notably, this mirror neuron network is also activated during first-person and third-person motor imagery (Fourkas et al.,
2006). However, neural networks with mirror properties seem to be neither restricted to goal-directed behavior nor confined to the PFC. Evidence is accumulating for the existence of mirror networks for facial expressions, sensations, and emotions. Recordings of single-neuron activity from the human anterior cingulate cortex (ACC) revealed increased activity during both the reception and the observation of painful stimulation (Hutchison et al., 1999). This evidence for a mirror neuron network for pain was supported by functional magnetic resonance imaging (fMRI) studies, which found increased activity in the ACC when subjects received a painful stimulus, observed a signal indicating that a loved one would receive a painful stimulus (Singer et al., 2004), directly observed a person receiving a painful stimulus (Morrison et al., 2004; Jackson et al., 2005), or observed facial expressions of pain (Botvinick et al., 2005). Some of these studies identified the anterior insula as another component of this circuit for observing and receiving pain (Singer et al., 2004; Jackson et al., 2005; Botvinick et al., 2005). The insula as well as the ACC has been implicated in mediating the affective component of pain (Peyron et al., 2000). Contagion-like responses to another person's pain do not seem to be restricted to affective networks. A recent transcranial magnetic stimulation (TMS) study (Avenanti et al., 2005) has shown that motor excitability is altered during the observation of painful stimulation: observing pinpricks to another person's hand led to a reduction of motor evoked potentials to TMS over the motor cortex. The existence of mirror systems in emotion has been postulated by many researchers (Adolphs, 1999; Gallese, 2003a, b), who suggested that mirror networks constitute the neural basis of contagion-like processes in empathy. First evidence for the existence of mirror networks for facial expressions emerged from a study by Carr et al. (2003). Viewing and deliberately imitating different facial expressions resulted in increased activation in the pars opercularis of the inferior frontal gyrus, the premotor cortex, the insula, the superior temporal cortex, and the amygdala. In line with other research groups (Buck, 1980; Dimberg, 1988; Preston and de Waal, 2002), Carr et al. (2003) posit that representations of observed facial expressions activate
representations of corresponding emotions and thereby generate empathy. Corroborating evidence for a mirror neuron network for facial expressions comes from a study by Leslie et al. (2004) showing an overlap of activation for deliberately imitating and viewing emotional facial expressions in the right premotor cortex, the fusiform gyrus, and the right superior temporal cortex. Both studies found overlapping activity in the right premotor and the superior temporal cortex, pointing to a likely involvement of these areas in a mirror network for emotional facial expressions. Since these studies did not directly induce emotions in the subjects (although emotions might have been induced indirectly by producing and viewing facial expressions), they cannot establish that the areas found are part of a shared network for affect. Wicker et al. (2003) were the first to investigate overlapping neural activity during the actual experience and the observation of an emotion. They elicited disgust by having their subjects inhale repulsive odorants. In the viewing condition, subjects observed someone inhaling something from a jar and subsequently showing a facial expression of disgust. Overlapping activity was found in the anterior insula and the ACC. While this is the first evidence of common neural activity during the experience and observation of an emotion, it remains to be shown whether the involvement of the insula is restricted to the experience and recognition of disgust, as suggested by lesion studies (Calder et al., 2000; Adolphs et al., 2003). Although these studies suggest that the observation of emotionally aroused people does lead to an activation of emotional mirror networks in a contagion-like manner, it has to be noted that in all neuroimaging studies discussed so far subjects were simply instructed to watch the other person in the viewing condition. None of the studies controlled for higher level cognitive processes. Thus, it remains to be shown to what extent the activations found in these studies depend on higher level processing.
Perspective-taking

While contagion-like simulation processes may be at the core of our capacity to empathize, it is generally believed that higher level processes including
perspective-taking also play a role in empathy (Feshbach, 1978; Batson et al., 1997a). Owing to its complexity, perspective-taking is much more difficult to investigate with peripheral physiological measures than contagion-like processes. Neuroimaging methods, on the other hand, are especially suitable for studying the processes that underlie perspective-taking in that they allow the dissociation of different processes and are sensitive to subtle experimental manipulations. Stotland (1969) conducted the first and, to our knowledge, the only study that tried to elucidate the relationship between perspective-taking, subjective reports of empathy and distress, and a peripheral physiological component of empathic responding: the SCR. He distinguished between two forms of perspective-taking: one in which the subjects imagined how they would feel if they were exposed to the painful heat stimulus that was applied to a person they were observing (first-person perspective-taking) and another in which the subjects had to imagine what the other person was feeling in that situation (third-person perspective-taking). Both types of deliberate perspective-taking led to more empathic feelings and an increase in SCR compared to passively viewing the painful treatment. Additionally, the self-focused perspective-taking resulted in feelings of distress. Of note, in both conditions the SCR did not begin to increase until 30 s after the experimenter had announced that the painful heat was being applied to the victim, suggesting the dominance of more complex cognitive processes in both kinds of perspective-taking. Neuroimaging studies point to a role of the PFC in higher level empathic processes. An early fMRI study by Farrow et al. (2001) found the ventromedial, dorsomedial, and ventrolateral PFC (vmPFC, dmPFC, and vlPFC, respectively) as well as temporal regions activated when subjects were asked to judge the emotional state of a person in a written scenario. Hynes et al. (2006) investigated whether inferring others' emotions (emotional perspective-taking) and inferring others' thoughts (conceptual perspective-taking) rely on distinct neural networks: subjects were presented with written scenarios and required to indicate what the character was feeling (emotional perspective-taking) or thinking
(conceptual perspective-taking). Both conditions resulted in increased activity in the medial PFC and right temporo-parietal regions. Additionally, emotional perspective-taking elicited a stronger hemodynamic response in the vmPFC and the vlPFC than conceptual perspective-taking. The authors interpreted these findings as evidence for the existence of distinct neural systems: one underlying perspective-taking in general and others that are engaged depending on the type of information that has to be inferred. Specifically, the vmPFC is believed to be engaged in inferring another person's emotional state. The finding of selective vmPFC activation during emotional perspective-taking is in accordance with studies demonstrating that patients with vmPFC lesions have striking difficulties with mentalizing tasks involving emotional processing but not with those devoid of emotional processing (Shamay-Tsoory et al., 2005b). However, the vmPFC has also been implicated in passively viewing social as opposed to nonsocial emotional scenes (Moll et al., 2002), as well as in linking a specific situation with its emotional value (Damasio, 1996) and in emotional self-reflection (Lane et al., 1997; Mitchell et al., 2005). Thus, it is not clear whether the vmPFC is particularly involved in adopting the emotional perspective of another person or in other aspects of empathy. Other neuroimaging studies contrasting mentalizing tasks with and without emotional processing demands do not support a role of the vmPFC in emotional perspective-taking (Shamay-Tsoory et al., 2005a; Völlm et al., 2006). In a positron emission tomography study, healthy subjects listened to interviews in which characters described a distressful conflict or a neutral situation, and were asked questions about the characters' feelings and thoughts. The dorsomedial, but not the ventromedial, PFC was more strongly activated during listening to and answering questions about the distressful interview than during listening to and answering questions about the neutral one (Shamay-Tsoory et al., 2005a). Using cartoon mentalizing tasks, Völlm et al. (2006) found increased activation common to both emotional and conceptual perspective-taking in the bilateral dmPFC, the bilateral temporoparietal junction, and the left temporal pole. Emotional perspective-taking was
associated with stronger activity in the lower part of the right dmPFC, the right ACC, the right posterior cingulate cortex, and the left amygdala. All neuroimaging studies reviewed above contrasted mentalizing tasks with and without emotional processing and found ventral and/or dorsal medial prefrontal areas more strongly activated during emotional mentalizing (see Fig. 2 for a summary of these studies' results). Nevertheless, discrepancies in the topography of activation between these studies are evident, particularly concerning the involvement of the vmPFC or dmPFC. While some of these differences might be due to the different stimuli used, interpretation of these findings is further complicated by the fact that one study (Hynes et al., 2006) restricted the analysis to the ventral PFC (z<0 and y>3, MNI coordinates). In addition to neuroimaging studies investigating possible differences between emotional and conceptual perspective-taking, studies focusing on mentalizing in general have also identified the medial PFC, especially the anterior paracingulate cortex (aPCC; the region at the border of BAs 9, 10, and
32) as a key structure in inferring other people's mental states (Fletcher et al., 1995; Brunet et al., 2000; Castelli et al., 2000; McCabe et al., 2001; Vogeley et al., 2001; Berthoz et al., 2002; Gallagher and Frith, 2003; Walter et al., 2004; but see Bird et al., 2004). However, the aPCC might not be crucial for all types of mentalizing. Walter et al. (2004) have shown that the aPCC is more strongly activated during mentalizing tasks that require understanding a person engaged in a social interaction than during tasks that involve understanding a person who is not. The authors propose that the aPCC is primarily involved in understanding other people's intentions in social interactions. While it is possible that the aPCC is exclusively involved in reasoning about social interactions, other areas of the medial PFC could play a role in mentalizing tasks in general, irrespective of the nature of the stimuli. But what exact role do these structures play in mentalizing processes? One idea is that medial PFC regions support the decoupling of one's own and the other's perspective (Leslie, 1994; Gallagher and Frith, 2003). This hypothesis is bolstered by the finding that the medial PFC is activated in
Fig. 2. Activation maxima in the medial and lateral prefrontal cortex found during emotional as opposed to conceptual perspective-taking (filled icons) or in both emotional and conceptual perspective-taking (open icons).
competitive games only when subjects believe they are competing with another participant and not when they think they are playing against a computer (McCabe et al., 2001; Gallagher et al., 2002). Other researchers (Vorauer and Ross, 1999; Ruby and Decety, 2001, 2003, 2004) have proposed a different contribution of the medial PFC to mentalizing. They advocate that while we adopt the perspective of another person, the medial PFC inhibits the self-perspective from which we predominantly view situations (for a review concerning the dominance of the self-perspective see Birch and Bloom, 2004). Several studies have investigated differences in brain activation when subjects had to adopt a first-person or third-person perspective during visual, motor, conceptual, and emotional perspective-taking (Ruby and Decety, 2001, 2003, 2004; Vogeley et al., 2001; Grezes et al., 2004; Vogeley et al., 2004). Taking the perspective of another person resulted in stronger activation of medial prefrontal areas (dorsal and ventral alike), left temporal regions, and the right inferior parietal cortex, regardless of the task, suggesting a role of these brain regions in the inhibition of the self-perspective. Anticipating the emotional reactions of one's mother in an imagined emotional situation, compared to anticipating one's own emotional reactions in the same situation, additionally activated the left temporal pole (Ruby and Decety, 2004), a region that has been associated with autobiographical recall (Fink et al., 1996). This finding fits nicely with the assumption that putting oneself into an emotional situation (first-person perspective-taking, see theoretical background) relies less on the activation of past knowledge than does anticipating the emotional reactions of another person in a specific situation (third-person perspective-taking). While these studies provide valuable information concerning the neural networks mediating third-person perspective-taking, they do not strictly differentiate between the two mechanisms likely involved in this process: inhibition of one's own perspective and inference of the other's perspective. A recent study has tried to disentangle these two mechanisms by modulating the self-perspective inhibition demands in false-belief tasks (Samson et al., 2005). A standard false-belief task was used for the high-inhibition condition, in which subjects
had to say in which of two boxes a person would look for an object, knowing themselves where the object was and knowing that the person did not have this knowledge. This condition would thus require the inhibition of one's own belief and the inference of the other's belief. The task was modified for the low-inhibition condition in that it no longer entailed knowledge discrepant between subject and target: in this condition the subject did not know where the object was placed. The low-inhibition condition thus only necessitates the inference of the other's belief. A patient with a lesion of the right PFC extending into the right superior temporal gyrus (STG) performed normally on the low self-perspective inhibition task but was markedly impaired on the high self-perspective inhibition task. This supports the notion that prefrontal areas are involved in the suppression of the self-perspective when we try to take the perspective of another person, but not in the inference of another person's perspective per se. The fact that impairments in tasks with high self-perspective inhibition demands were found for both conceptual and emotional perspective-taking suggests that the inhibition of the self-perspective might rely on similar networks in both conceptual and emotional tasks. Interestingly, the impairments of three patients with intact PFC and STG but lesions to the left temporoparietal junction were not restricted to perspective-taking that required inhibition of the self-perspective (Apperly et al., 2004; Samson et al., 2004). The authors argue that the deficits of these three patients on false-belief tasks are not due to impaired inhibition of the self-perspective but to a different dysfunctional process, namely the inference of another person's mental states. Thus, inhibition of the self-perspective and inference of the other's perspective are putatively distinct neural processes subserved by prefrontal and temporoparietal regions, respectively.
Interindividual differences in empathic processes

People differ in their ability to understand and share others' feelings. Considering these individual differences might further elucidate some of the mechanisms that underlie empathy (see Chakrabarti and
Baron-Cohen, this volume). Studies investigating individual differences in dispositional empathy have used a wealth of different self-report measures to assess a person's tendency to understand and share another person's feelings. These self-report measures usually assess either "emotional empathy," i.e., the tendency to vicariously experience another person's emotion (similar to emotional contagion), or "cognitive empathy," i.e., the ability to take another person's perspective (Mehrabian, 1997; Baron-Cohen and Wheelwright, 2004). A typical item in questionnaires assessing emotional empathy would be "I am often quite touched by things that I see happen," whereas "When I'm upset at someone, I usually try to 'put myself in his shoes' for a while" would be representative of cognitive empathy (items taken from the interpersonal reactivity index; Davis, 1983). Some items in these questionnaires, though, are not easy to assign to either of these two classes, since they contain aspects of both. In Table 2 we provide an overview of the measures used to assess dispositional empathy, including their subscales, and indicate where a relation with either contagion-like or perspective-taking processes has been found. EMG studies have shown more pronounced mimicking behavior in subjects high rather than low in dispositional empathy when they passively viewed emotional facial expressions (Sonnby-Borgström, 2002; Dimberg et al., 2005). Interestingly, subjects scoring high on dispositional empathy showed not
only more mimicking behavior but also a higher correspondence between facial EMG activity and self-reported feelings than low-empathic subjects. In subjects high in dispositional empathy, activity in the zygomaticus major muscle correlated positively with the valence of the self-reported feeling, whereas in subjects low in dispositional empathy this correlation was negative: low-empathic subjects smiled more when they reported feeling angry than when they reported feeling happy. Subjects scoring high on dispositional empathy also report greater emotional responding and show higher SCRs than low-empathic subjects when viewing an empathy-evoking film (Eisenberg et al., 1991). This confirms earlier findings showing that high-empathic women exhibited larger SCRs and more congruent facial expressions than low-empathic women when watching videotapes of smiling, calm, and crying babies (Wiesenfeld et al., 1984). Hitherto, there have not been many studies explicitly investigating individual differences in higher level empathic processes. First evidence for a relation between interindividual differences in cognitive empathic processes and emotional responding comes from a study demonstrating a positive correlation between subjects' dispositional ability to take the perspective of another person and their self-reported emotional responding to a film when they were instructed to imagine the emotions of the film characters (Eisenberg et al., 1991).
Table 2. Empathy questionnaires and their relation to contagion-like processes and perspective-taking

Measure | Subscales | Studies finding a relation to contagion-like processes | Studies finding a relation to perspective-taking
Truax rating scale (Truax, 1961) | - | - | Shamay-Tsoory et al. (2005a)
Questionnaire measure of emotional empathy (Mehrabian and Epstein, 1972) | - | Sonnby-Borgström (2002); Wiesenfeld et al. (1984) | -
Interpersonal reactivity index (Davis, 1983) | Perspective-taking; Fantasy; Empathic concern; Personal distress | Singer et al. (2004) | -
Balanced emotional empathy scale (Mehrabian, 1997) | - | Singer et al. (2004) | -
Empathy scale (Leibetseder et al., 2001) | Empathic preparedness; Concernment | Leiberg et al. (unpublished data) | -
These results suggest that there might be marked differences between subjects in both contagion-like and higher level empathic processes, which are reflected in questionnaire scores and which should be taken into account in order to resolve some of the discrepancies between the studies, reviewed above, investigating the relation between emotional responding and emotional judgments. To investigate whether interindividual differences are also reflected in brain activity when different subjects are confronted with people in distress, we conducted an fMRI study in which subjects passively viewed photographs of victims of violence before they were asked to regulate their empathic responding (Leiberg et al., unpublished data). Using intersubject correlation analyses, we found that activity in the vmPFC during passive viewing of the disturbing photographs increased with dispositional emotional empathy. Together with the study by Singer et al. (2004), which found that the overlapping activity in the ACC and the anterior insula during the reception of a painful stimulus and the observation of a signal indicating that a loved one would receive a painful stimulus was stronger in subjects who scored high on self-report measures of empathy, our study provides evidence for a relation between dispositional empathy and activation differences in the neural circuits underlying simulation processes in empathy. In the study by Shamay-Tsoory et al. (2005a), where subjects had to listen to and answer questions about empathy-inducing or neutral-theme interviews, the relationship between individual differences in empathic accuracy and brain activity was also investigated. The subjects' answers to the questions about the empathy-inducing interview were rated for their level of empathic accuracy. Empathic accuracy correlated positively with activity in the dmPFC, one of the areas that were more strongly activated during listening to empathy-eliciting interviews than to neutral ones. Thus, subjects who were more accurate in their understanding of the characters' emotions showed stronger dmPFC activity. In our study, after passive viewing, subjects were instructed to actively engage with or disengage from the displayed person in distress. After each trial, subjects rated their success in engagement or disengagement.
Correlation analysis demonstrated stronger activity in the vmPFC and the aPCC during engagement in subjects who rated their attempts to engage with the target as more successful than in subjects who rated their attempts as less successful (Leiberg et al., 2005). As discussed above, both areas have been implicated in active empathic processes: the aPCC has been linked to decoupling one's own from another person's perspective and to self-perspective inhibition, and the vmPFC appears to serve a more general role in emotional processing and might be specific for emotional as opposed to conceptual perspective-taking. Interestingly, activation in the vmPFC correlated not only with the subjects' average success ratings but also with each subject's average increase in startle eye blink responses during empathizing trials, indicating that deliberate perspective-taking might have activated some emotional simulation networks. Both aspects of empathic processing — taking the perspective of another person and activating appropriate emotional responses — might be more developed in some persons than in others. However, in this study, self-reported success in empathizing and differences in vmPFC and aPCC activity did not correlate with dispositional empathy as assessed with self-report questionnaires, suggesting that current psychological instruments might not capture all aspects of interindividual differences in emotional perspective-taking, or that situational empathic reactions are determined by more than the perception of one's own ability to empathize. The relation between individual differences in dispositional empathy and empathic processes, i.e., contagion-like as well as more cognitive ones, is far from being understood. This is partly due to the item heterogeneity of self-report measures of dispositional empathy and the inconclusive assignment of those items to the different aspects of empathy, i.e., emotional and cognitive empathy.
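The intersubject correlation analyses mentioned above reduce, in their final step, to relating one value per subject (e.g., a mean contrast estimate extracted from a vmPFC region of interest) to that subject's questionnaire score. A minimal sketch of this step follows; the numbers are hypothetical and serve only to illustrate the analysis, not to reproduce any reported result.

    import numpy as np
    from scipy import stats

    # Hypothetical per-subject values: mean vmPFC contrast estimates
    # from the passive-viewing condition and dispositional
    # emotional-empathy questionnaire scores (one value per subject).
    vmpfc_estimate = np.array([0.21, 0.35, 0.10, 0.44, 0.28, 0.52, 0.18, 0.39])
    empathy_score = np.array([14.0, 21.0, 9.0, 25.0, 17.0, 28.0, 12.0, 22.0])

    r, p = stats.pearsonr(empathy_score, vmpfc_estimate)
    print(f"Pearson r = {r:.2f}, p = {p:.3f}")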
Modulatory processes

Two types of modulatory processes can be distinguished that serve to attenuate vicarious emotional responding in empathy: one that is focused on
one’s own emotions (self-focused emotion regulation) and another one that is focused on one’s engagement with the other person (other-focused emotion regulation). The first one is also employed to modulate emotional reactions to nonsocial emotional stimuli. Psychophysiological studies have shown that peripheral physiological measures like corrugator supercilii activity, startle eye blink amplitude, and SCR are sensitive to emotion regulation (Jackson et al., 2000; Eippert et al., in press). In recent years, neuroimaging studies (Beauregard et al., 2001; Ochsner et al., 2002, 2004b; Levesque et al., 2003; Phan et al., 2005; Eippert et al., in press) have investigated the neural bases of emotion regulation (for a review see Ochsner and Gross, 2005). It has been demonstrated that voluntarily downregulating electrodermal activity in a biofeedback task (Critchley et al., 2002) or downregulating sexual arousal (Beauregard et al., 2001) as well as reappraising negative visual stimuli in neutral terms (Ochsner et al., 2002, 2004b; Levesque et al., 2003; Phan et al., 2005; Eippert et al., in press) is associated with increased activity in the vmPFC, the dorsolateral PFC (dlPFC), and the ACC and with decreased activity in the amygdala. Deliberate upregulation of negative emotions also produced increased activity in the vmPFC, the dlPFC, and the ACC, as well as increased activity in the amygdala. While the amygdala and possibly also the vmPFC are core structures in emotional responding, the dlPFC and the ACC have been implicated in the cognitive control of behavior (Cabeza and Nyberg, 2000; Miller and Cohen, 2001), conflict monitoring (Botvinick et al., 2004), and the inhibition of predominant responses (Garavan et al., 1999; Braver et al., 2001). The studies on emotion regulation suggest that the dlPFC and the ACC play a role in modulating emotional responses by exerting topdown control on areas involved in the processing of emotional stimuli. However, the stimulus material employed in these studies was rather homogeneous. All but two studies (Beauregard et al., 2001; Levesque et al., 2003) used pictures from the International Affective Picture System (IAPS; Lang et al., 2005) varying largely in content and emotional meaning. The other two studies employed film excerpts.
None of these studies distinguished between potentially empathy-eliciting social-emotional stimuli and stimuli without any social content. In another study, Harenski and Hamann (2006) explored whether downregulating emotional responses to social stimuli with moral or nonmoral content recruits differential neural networks. Notably, both conditions entailed increased activity in the ventrolateral and dorsal PFC, whereby decreasing the emotional response to stimuli with moral content recruited additional regions. However, as presumably not all of these stimuli induced empathic emotional responses, it is not clear how these results relate to emotion regulation in empathy. It is still unclear whether the structures involved in regulating one's own emotions also mediate the modulation of the engagement with another person. Recent brain imaging studies support the view that the networks subserving self-focused and other-focused responses are partly distinct. Evaluation of one's own and another person's emotions elicits common activation in the medial PFC and differential activation in other parts of the medial and the lateral PFC (Ochsner et al., 2004a). Internally and externally focused processing of stimuli engages different prefrontal regions (Christoff et al., 2003; Ochsner et al., 2004b). In our study (Leiberg et al., unpublished data), subjects were asked not only to actively empathize with a depicted person in distress, but also, in a different condition, to disengage from that person by thinking that the photograph was not real but taken from a movie, or that the depicted person was not a real person but an actor or a doll. Disengaging from, versus engaging with, the victim yielded stronger activation in the vlPFC, the dmPFC, and the dlPFC. Thus, it appears that other-focused disengagement engages additional brain regions compared to self-focused emotion regulation (Ochsner et al., 2004b; Eippert et al., in press), supporting the notion of two distinct modulatory processes.
Empathy and social behavior

The capability to understand and share other people's emotions has an impact on our behavior toward these people (see also Chakrabarti and Baron-Cohen, this volume). On the one hand, a
positive relation has been found between empathy and helping behavior (Eisenberg and Fabes, 1991; Trobst et al., 1994; Batson et al., 1997b; Batson, 1998), which is inverted if observers of a person in need of help are overwhelmed by their vicarious emotional experience and are not able to distance themselves from the observed person's emotional state (Batson, 1991; Eisenberg and Fabes, 1998). On the other hand, an inverse relation has been observed between empathy and aggressive behavior (Miller and Eisenberg, 1988; Davis, 1996; Mehrabian, 1997). Supporting the notion that deviant social behavior might be a result of diminished empathic responding, boys with disruptive behavior disorders have been found to show lower scores on an empathy questionnaire and significantly less corrugator supercilii activity to angry expressions than age-matched healthy controls (de Wied et al., 2006). Neuroimaging studies are only beginning to identify the neural substrates underlying the link between empathy and behavior. Mathiak and Weber (in press) investigated the neural activity underlying aggressive behavior in virtual environments. During violent acts, subjects exhibited reduced activity in the medial PFC and the amygdala — areas that have been found active when engaging with, or taking the perspective of, a person in distress (Hynes et al., 2006; Völlm et al., 2006). Reduced activity in these areas suggests that subjects distanced themselves from, and suppressed empathic responding to, the victims-to-be. It would be of great interest to see whether activity in these areas is increased when the subjects' task is not to kill opponents but to help associates. Virtual reality appears to be well suited to studying the relationship between empathy and social behavior. In another study from our laboratory (Lotze et al., submitted), subjects played a competitive reaction time task against an opponent, and whichever player was faster in responding to a cue was allowed to administer an aversive stimulus of his choice to the other. While in fact the second participant was a confederate of the experimenter, subjects believed that their opponent was a fellow subject and saw him receiving the aversive stimulus. Subjects who scored high on a self-report psychopathy scale exhibited significantly weaker
activity in the vmPFC and significantly smaller SCR during adjustment of the stimulus intensity than subjects who scored low on this measure. Diminished autonomic and vmPFC responding in highly callous subjects was associated with stronger self-reported feelings of aggression and more aggressive behavior. Given that the vmPFC has been implicated in emotion processing in general and emotional perspective-taking in particular (Hynes et al., 2006), these findings support a connection between vicarious emotional responses and the inhibition of aggressive behavior.
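The contingency of the competitive reaction-time paradigm used in the Lotze et al. study is simple: both players respond to a cue, and the faster one determines the intensity of an aversive stimulus delivered to the slower one. The sketch below renders that trial logic for illustration only; the function and variable names are ours, the opponent's behavior was in fact under experimental control, and the study's measures of interest (SCR, vmPFC activity) were recorded while subjects adjusted the intensity.

    import random

    def competitive_rt_trial(rt_subject, rt_opponent, subject_intensity):
        # rt_subject, rt_opponent: reaction times in seconds;
        # subject_intensity: aversive-stimulus level (1-8) the subject
        # set for the opponent before the trial.
        if rt_subject < rt_opponent:
            # Subject wins: the opponent receives the subject's setting.
            return "opponent", subject_intensity
        # Subject loses: the (apparent) opponent's setting is applied;
        # drawn at random here for illustration.
        return "subject", random.randint(1, 8)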
Summary and conclusions

The investigation of the nature of empathy, i.e., the ability to perceive, share, and understand others' emotions, has attracted many researchers from different fields. Two main approaches to the construct of empathy can be identified (see Table 1): one that focuses on the contagion-like manner in which the perception of a person in an emotional state can result in a similar affective response in the observer, and one that concentrates on the understanding of another person's emotion by means of perspective-taking. Two processes have been proposed to underlie empathy: one account assumes that the understanding of other people's emotions depends on an internal simulation process, which relies on shared representations between the observer and the target (simulation theory); the other proposes that humans possess a theory of mind, which they use to attribute mental states like emotions to other people and thereby understand their emotions (theory–theory). Most neuroscientists favor the view that empathy can be achieved by both types of processes, which work closely intertwined. In most situations in which we observe someone in an emotional state, some contagion-like processes will be more or less automatically initiated. To what extent contagion-like processes are employed and result in a "correct" representation depends on the current emotional state of the observer and the experience the observer has with the target's situation. When a situation is ambiguous or too complex, other processes, like perspective-taking, which do not
necessarily produce the same emotional state in the observer as in the target, will be used to understand the other's emotions. Peripheral physiological and neuroimaging studies provide converging evidence for the significance of both contagion-like processes and perspective-taking in empathy. They demonstrate automatic imitation of observed motor behavior, synchrony in ANS activity during empathic judgments, and shared neural circuits for executing and viewing facial expressions and for experiencing and observing emotions. It remains to be seen whether the activation of a representation of the target's emotional state in the observer is necessary and sufficient for understanding other people's emotions. The results so far are equivocal concerning the relationship between activated representations, subjective feelings, and the understanding of the other person's emotions: while some studies on contagion-like effects on motor and ANS activity find a connection between these three parameters, others do not. There is tentative evidence suggesting that contagion-like processes do not proceed in a strictly automatic manner but may be influenced by contextual factors (Lanzetta and Englis, 1989). A very recent fMRI study (Singer et al., 2006) employing the Prisoner's Dilemma paradigm has shown that the brain activity of male subjects while observing an opponent in pain was modulated by the perceived fairness of the opponent: activity in pain-related brain regions was significantly reduced when observing an unfair compared to a fair opponent in pain. Because of the rather coarse time resolution of fMRI, it is not possible to decide whether this modulating influence of context is already present in the initial neural response or whether this response is completely stimulus-driven and only later altered by contextual factors. Neuroimaging studies suggest that prefrontal areas, specifically the medial prefrontal cortex, play a role in higher level empathic processes that enable us to infer another person's emotional state in the presence of insufficient or conflicting information, or when the other person's emotional situation is completely alien to us. The aPCC in particular has been implicated in the inhibition of the first-person perspective when the adoption of
another person’s position is required to understand her mental state. On the other hand, more ventromedial prefrontal regions seem to be involved in the processing of emotional information and possibly represent emotional responses elicited when adopting another person’s perspective. If perspective-taking is a secondary process eliciting simulation in a top-down manner, one would assume that in addition to the medial prefrontal areas involved in inhibition of the first-person perspective, similar brain regions as in studies on contagion-like processes would be activated. The reviewed emotional perspective-taking studies do point this way (Vo¨llm et al., 2006), but bearing in mind that the employed paradigms vary, it is likely too early to come to a definite conclusion. Questionnaire and behavioral studies have shown that the ability to empathize with another person does not only vary depending on the situation but also interindividually. Contagion effects on motor, ANS, and brain activity are predicted by dispositional empathy as assessed with questionnaire measures. Subjects scoring high on these questionnaires exhibit stronger mimicking behavior, increases in SCR, and activity in empathy-related brain areas. A relation between interindividual differences in the ability to take the perspective of another person and brain activity during higher level empathic processes has so far only been shown for measures of situational empathy, like empathic accuracy or success of empathizing, rather than dispositional empathy. This could be due to the heterogeneity of the questionnaire items, some assessing affective, some cognitive aspects of empathy. Intentional processes regulate empathic processes and the ensuing responses. Both the ability to empathize with other people and the ability to regulate vicarious emotional responses are prerequisites for prosocial behavior. Diminished empathic responding has been shown to be related to deviant social behavior like enhanced aggression. Medial prefrontal regions are implicated in the suppression of empathic responding likely necessary to perform acts of aggression. Presently, neuroimaging studies are beginning to elucidate the neural substrates underlying the link between empathy and social behavior.
Abbreviations

ACC    anterior cingulate cortex
aPCC   anterior paracingulate cortex
BA     Brodmann area
dlPFC  dorsolateral prefrontal cortex
dmPFC  dorsomedial prefrontal cortex
EMG    electromyography
fMRI   functional magnetic resonance imaging
PFC    prefrontal cortex
SCR    skin conductance response
ST     simulation theory
ToM    theory of mind
TMS    transcranial magnetic stimulation
TT     theory–theory
vlPFC  ventrolateral prefrontal cortex
vmPFC  ventromedial prefrontal cortex
Acknowledgments This work was supported by Deutsche Forschungsgemeinschaft and the Junior Science Program of the Heidelberger Academy of Sciences and Humanities. We thank Robert Langner for valuable discussions and Krystyna Swirszcz for helpful comments.
References

Adolphs, R. (1999) Social cognition and the human brain. Trends Cogn. Sci., 3: 469–479.
Adolphs, R., Tranel, D. and Damasio, A.R. (2003) Dissociable neural systems for recognizing emotions. Brain Cogn., 52: 61–69.
Allport, G. (1961) Pattern and Growth in Personality. Holt, Rinehart & Winston, New York.
Apperly, I.A., Samson, D., Chiavarino, C. and Humphreys, G.W. (2004) Frontal and temporo-parietal lobe contributions to theory of mind: neuropsychological evidence from a false-belief task with reduced language and executive demands. J. Cogn. Neurosci., 16: 1773–1784.
Avenanti, A., Bueti, D., Galati, G. and Aglioti, S.M. (2005) Transcranial magnetic stimulation highlights the sensorimotor side of empathy for pain. Nat. Neurosci., 8: 955–960.
Ax, A.A. (1964) Goals and methods of psychophysiology. Psychophysiology, 1: 8–25.
Bandura, A. (1997) Self-Efficacy: The Exercise of Control. Freeman, New York.
Baron-Cohen, S. (2005) The empathizing system: a revision of the 1994 model of the Mindreading System. In: Ellis, B. and Bjorklund, D. (Eds.), Origins of the Social Mind. Guilford, New York, pp. 468–492.
Baron-Cohen, S. and Wheelwright, S. (2004) The empathy quotient: an investigation of adults with Asperger syndrome or high functioning autism, and normal sex differences. J. Autism Dev. Disord., 34: 163–175.
Batson, C.D. (1991) The Altruism Question: Towards a Social-Psychological Answer. Erlbaum, Mahwah, NJ.
Batson, C.D. (1998) Altruism and prosocial behavior. In: Gilbert, D.T., Fiske, S.T. and Lindzey, G. (Eds.), The Handbook of Social Psychology. McGraw-Hill, Boston, pp. 282–316.
Batson, C.D. and Coke, J. (1981) Empathy: a source of altruistic motivation for helping. In: Rushton, J. and Sorrentino, R. (Eds.), Altruism and Helping Behavior. Erlbaum, Hillsdale, NJ, pp. 167–187.
Batson, C.D., Early, S. and Salvarani, G. (1997a) Perspective taking: imagining how another feels versus imagining how you would feel. Pers. Soc. Psychol. Bull., 23: 751–758.
Batson, C.D., Fultz, J. and Schoenrade, P.A. (1987) Adults' emotional reactions to the distress of others. In: Eisenberg, N. and Strayer, J. (Eds.), Empathy and Its Development. Cambridge University Press, Cambridge, pp. 163–184.
Batson, C.D., Lishner, D.A., Carpenter, A., Dulin, L., Harjusola-Webb, S., Stocks, E.L., Gale, S., Hassan, O. and Sampat, B. (2003) "…As you would have them do unto you": does imagining yourself in the other's place stimulate moral action? Pers. Soc. Psychol. Bull., 29: 1190–1201.
Batson, C.D., Sager, K., Garst, E., Kang, M., Rubchinsky, K. and Dawson, K. (1997b) Is empathy-induced helping due to self-other merging? J. Pers. Soc. Psychol., 73: 495–509.
Bavelas, J.B., Black, A., Chovil, N. and Lemery, C.R. (1986) 'I show how you feel': motor mimicry as a communicative act. J. Pers. Soc. Psychol., 50: 322–329.
Beauregard, M., Levesque, J. and Bourgouin, P. (2001) Neural correlates of conscious self-regulation of emotion. J. Neurosci., 21: RC165.
Berthoz, S., Armony, J.L., Blair, R.J. and Dolan, R.J. (2002) An fMRI study of intentional and unintentional (embarrassing) violations of social norms. Brain, 125: 1696–1708.
Birch, S.A. and Bloom, P. (2004) Understanding children's and adults' limitations in mental state reasoning. Trends Cogn. Sci., 8: 255–260.
Bird, C.M., Castelli, F., Malik, O., Frith, U. and Husain, M. (2004) The impact of extensive medial frontal lobe damage on 'Theory of Mind' and cognition. Brain, 127: 914–928.
Blair, R.J. (2005) Responding to the emotions of others: dissociating forms of empathy through the study of typical and psychiatric populations. Conscious. Cogn., 14: 698–718.
Blairy, S., Herrera, P. and Hess, U. (1999) Mimicry and the judgment of emotional facial expressions. J. Nonverbal Behav., 23: 5–41.
Botvinick, M.M., Cohen, J.D. and Carter, C.S. (2004) Conflict monitoring and anterior cingulate cortex: an update. Trends Cogn. Sci., 8: 539–546.
Botvinick, M., Jha, A.P., Bylsma, L.M., Fabian, S.A., Solomon, P.E. and Prkachin, K.M. (2005) Viewing facial expressions of pain engages cortical areas involved in the direct experience of pain. Neuroimage, 25: 312–319.
Braver, T.S., Barch, D.M., Gray, J., Molfese, D.L. and Snyder, A. (2001) Anterior cingulate cortex and response conflict: effects of frequency, inhibition and errors. Cereb. Cortex, 11: 825–836.
Brunet, E., Sarfati, Y., Hardy-Baylé, M.-C. and Decety, J. (2000) A PET investigation of the attribution of intentions with a nonverbal task. Neuroimage, 11: 157–166.
Buck, R. (1980) Nonverbal behavior and the theory of emotion: the facial feedback hypothesis. J. Pers. Soc. Psychol., 38: 811–824.
Cabeza, R. and Nyberg, L. (2000) Imaging cognition II: an empirical review of 275 PET and fMRI studies. J. Cogn. Neurosci., 12: 1–47.
Calder, A.J., Keane, J., Manes, F., Antoun, N. and Young, A.W. (2000) Impaired recognition and experience of disgust following brain injury. Nat. Neurosci., 3: 1077–1078.
Carr, L., Iacoboni, M., Dubeau, M.C., Mazziotta, J.C. and Lenzi, G.L. (2003) Neural mechanisms of empathy in humans: a relay from neural systems for imitation to limbic areas. Proc. Natl. Acad. Sci. USA, 100: 5497–5502.
Castelli, F., Happe, F., Frith, U. and Frith, C.D. (2000) Movement and mind: a functional imaging study of perception and interpretation of complex intentional movement patterns. Neuroimage, 12: 314–325.
Chartrand, T.L. and Bargh, J.A. (1999) The chameleon effect: the perception-behavior link and social interaction. J. Pers. Soc. Psychol., 76: 893–910.
Christoff, K., Ream, J.M., Geddes, L.P. and Gabrieli, J.D. (2003) Evaluating self-generated information: anterior prefrontal contributions to human cognition. Behav. Neurosci., 117: 1161–1168.
Coleman, R.M., Greenblatt, M. and Solomon, H.C. (1956) Physiological evidence of rapport during psychotherapeutic interviews. Dis. Nerv. Syst., 17: 71–77.
Critchley, H.D., Melmed, R.N., Featherstone, E., Mathias, C.J. and Dolan, R.J. (2002) Volitional control of autonomic arousal: a functional magnetic resonance study. Neuroimage, 16: 909–919.
Damasio, A.R. (1994) Descartes' Error: Emotion, Reason, and the Human Brain. Putnam, New York.
Damasio, A.R. (1996) The somatic marker hypothesis and the possible functions of the prefrontal cortex. Philos. Trans. R. Soc. Lond. B Biol. Sci., 351: 1413–1420.
Darley, J.M. and Latane, B. (1968) Bystander intervention in emergencies: diffusion of responsibility. J. Pers. Soc. Psychol., 8: 377–383.
Davis, M.H. (1983) Measuring individual differences in empathy: evidence for a multidimensional approach. J. Pers. Soc. Psychol., 44: 113–126.
Davis, M.H. (1996) Empathy: A Social Psychological Approach. Westview, Boulder, CO.
Davis, M.H., Soderlund, T., Cole, J., Gadol, E., Kute, M., Myers, M. and Weihing, J. (2004) Cognitions associated with
attempts to empathize: how do we imagine the perspective of another? Pers. Soc. Psychol. Bull., 30: 1625–1635.
Decety, J. and Jackson, P.L. (2004) The functional architecture of human empathy. Behav. Cogn. Neurosci. Rev., 3: 71–100.
de Wied, M., van Boxtel, A., Zaalberg, R., Goudena, P.P. and Matthys, W. (2006) Facial EMG responses to dynamic emotional facial expressions in boys with disruptive behavior disorders. J. Psychiatr. Res., 40: 112–121.
DiMascio, A., Boyd, R.W. and Greenblatt, M. (1957) Physiological correlates of tension and antagonism during psychotherapy: a study of "interpersonal physiology". Psychosom. Med., 19: 99–104.
DiMascio, A., Boyd, R.W., Greenblatt, M. and Solomon, H.C. (1955) The psychiatric interview: a sociophysiologic study. Dis. Nerv. Syst., 16: 4–9.
Dimberg, U. (1982) Facial reactions to facial expressions. Psychophysiology, 19: 643–647.
Dimberg, U. (1988) Facial electromyography and the experience of emotion. J. Psychophysiol., 3: 277–282.
Dimberg, U., Andreasson, P. and Thunberg, M. (2005) Empathy and facial reactions to facial expressions. Psychophysiology, 42(Suppl 1): 50.
Dimberg, U., Thunberg, M. and Elmehed, K. (2000) Unconscious facial reactions to emotional facial expressions. Psychol. Sci., 11: 86–89.
Eippert, F., Veit, R., Weiskopf, N., Erb, M. and Birbaumer, N. (in press) Coping with fear: brain activity during emotion regulation. Hum. Brain Mapp.
Eisenberg, N. (2000) Emotion, regulation, and moral development. Annu. Rev. Psychol., 51: 665–697.
Eisenberg, N. and Fabes, R.A. (1991) Prosocial behavior and empathy: a multimethod, developmental perspective. In: Clark, M. (Ed.), Review of Personality and Social Psychology. Sage, Newbury Park, CA, pp. 34–61.
Eisenberg, N. and Fabes, R.A. (1998) Prosocial development. In: Damon, W. (Ed.), Handbook of Child Psychology: Vol. 3. Social, Emotional, and Personality Development. Wiley, New York, pp. 701–778.
Eisenberg, N., Fabes, R.A., Schaller, M., Miller, P., Carlo, G., Poulin, R., Shea, C. and Shell, R. (1991) Personality and socialization correlates of vicarious emotional responding. J. Pers. Soc. Psychol., 61: 459–470.
Eisenberg, N. and Strayer, J. (1987) Empathy and Its Development. Cambridge University Press, Cambridge.
Farrow, T.F., Zheng, Y., Wilkinson, I.D., Spence, S.A., Deakin, J.F., Tarrier, N., Griffiths, P.D. and Woodruff, P.W. (2001) Investigating the functional anatomy of empathy and forgiveness. Neuroreport, 12: 2433–2438.
Feshbach, N.D. (1978) Studies of empathic behavior in children. In: Maher, B.A. (Ed.), Progress in Experimental Personality Research. Academic Press, New York, pp. 1–47.
Fink, G.R., Markowitsch, H.J., Reinkemeier, M., Bruckbauer, T., Kessler, J. and Heiss, W.D. (1996) Cerebral representations of one's own past: neural networks involved in autobiographical memory. J. Neurosci., 16: 4275–4282.
Fletcher, P.C., Happe, F., Frith, U., Baker, S.C., Dolan, R.J., Frackowiak, R.S. and Frith, C.D. (1995) Other minds in the
brain: a functional imaging study of 'theory of mind' in story comprehension. Cognition, 57: 109–128. Fourkas, A.D., Avenanti, A., Urgesi, C. and Aglioti, S.M. (2006) Corticospinal facilitation during first and third person imagery. Exp. Brain Res., 168: 143–151. Frith, U. and Frith, C.D. (2003) Development and neurophysiology of mentalizing. Philos. Trans. R. Soc. Lond. B Biol. Sci., 358: 459–473. Gallagher, H.L. and Frith, C.D. (2003) Functional imaging of 'theory of mind'. Trends Cogn. Sci., 7: 77–83. Gallagher, H.L., Jack, A.I., Roepstorff, A. and Frith, C.D. (2002) Imagining the intentional stance in a competitive game. Neuroimage, 16: 814–821. Gallese, V. (2003a) The manifold nature of interpersonal relations: the quest for a common mechanism. Philos. Trans. R. Soc. Lond. B Biol. Sci., 358: 517–528. Gallese, V. (2003b) The roots of empathy: the shared manifold hypothesis and the neural basis of intersubjectivity. Psychopathology, 36: 171–180. Gallese, V., Fadiga, L., Fogassi, L. and Rizzolatti, G. (1996) Action recognition in the premotor cortex. Brain, 119(Pt 2): 593–609. Gallese, V. and Goldman, A. (1998) Mirror neurons and the simulation theory of mind-reading. Trends Cogn. Sci., 2: 493–501. Gallese, V., Keysers, C. and Rizzolatti, G. (2004) A unifying view of the basis of social cognition. Trends Cogn. Sci., 8: 396–403. Garavan, H., Ross, T.J. and Stein, E.A. (1999) Right hemispheric dominance of inhibitory control: an event-related functional MRI study. Proc. Natl. Acad. Sci. USA, 96: 8301–8306. Goldman, A. (2005) Mirror systems, social understanding and social cognition. http://www.interdisciplines.org/mirror/papers/3/6/1/. Goldman, A. (2006) Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading. Oxford University Press, New York. Goldman, A.I. and Sebanz, N. (2005) Simulation, mirroring, and a different argument from error. Trends Cogn. Sci., 9: 320. Gordon, R.M. (1986) Folk psychology as simulation. Mind Lang., 1: 158–170. Gordon, R.M. (2005) Simulation and systematic errors in prediction. Trends Cogn. Sci., 9: 361–362. Gottman, J.M. and Levenson, R.W. (1985) A valid measure for obtaining self-report of affect. J. Consult. Clin. Psychol., 53: 151–160. Grezes, J., Frith, C.D. and Passingham, R.E. (2004) Inferring false beliefs from the actions of oneself and others: an fMRI study. Neuroimage, 24: 744–750. Gross, J.J. (1999) Emotion regulation: past, present, future. Cogn. Emotion, 13: 551–573. Gross, J.J. (2002) Emotion regulation: affective, cognitive, and social consequences. Psychophysiology, 39: 281–291. Gump, B.B. and Kulik, J.A. (1997) Stress, affiliation, and emotional contagion. J. Pers. Soc. Psychol., 72: 305–319. Harenski, C.L. and Hamann, S. (2006) Neural correlates of regulating negative emotions related to moral violations. Neuroimage, 30: 313–324.
Hatfield, E., Cacioppo, J.T. and Rapson, R. (1994) Emotional Contagion. Cambridge University Press, New York. Hess, U. and Blairy, S. (2001) Facial mimicry and emotional contagion to dynamic emotional facial expressions and their influence on decoding accuracy. Int. J. Psychophysiol., 40: 129–141. Hess, U., Blairy, S. and Philippot, P. (1999) Facial mimicry. In: Philippot, P., Feldman, R. and Coats, E. (Eds.), The Social Context of Nonverbal Behavior. Cambridge University Press, New York, pp. 213–241. Hoffman, M.L. (1984) Interaction of affect and cognition in empathy. In: Izard, C., Kagan, J. and Zajonc, R. (Eds.), Emotions, Cognition, and Behavior. Cambridge University Press, New York, pp. 103–131. Hommel, B., Müsseler, J., Aschersleben, G. and Prinz, W. (2001) The theory of event coding (TEC): a framework for perception and action planning. Behav. Brain Sci., 24: 849–878. Hutchison, W.D., Davis, K.D., Lozano, A.M., Tasker, R.R. and Dostrovsky, J.O. (1999) Pain-related neurons in the human cingulate cortex. Nat. Neurosci., 2: 403–405. Hynes, C.A., Baird, A.A. and Grafton, S.T. (2006) Differential role of the orbital frontal lobe in emotional versus cognitive perspective-taking. Neuropsychologia, 44: 374–383. Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C. and Rizzolatti, G. (1999) Cortical mechanisms of human imitation. Science, 286: 2526–2528. Ickes, W. (1997) Empathic Accuracy. Guilford, New York. Jackson, D.C., Malmstadt, J.R., Larson, C.L. and Davidson, R.J. (2000) Suppression and enhancement of emotional responses to unpleasant pictures. Psychophysiology, 37: 515–522. Jackson, P.L., Meltzoff, A.N. and Decety, J. (2005) How do we perceive the pain of others? A window into the neural processes involved in empathy. Neuroimage, 24: 771–779. Kaplan, H.B. and Bloom, S.W. (1960) The use of sociological and social-psychological concepts in physiological research: a review of selected experimental studies. J. Nervous Mental Disord., 131: 128–134. Keysers, C. and Perrett, D.I. (2004) Demystifying social cognition: a Hebbian perspective. Trends Cogn. Sci., 8: 501–507. Lane, R.D., Fink, G.R., Chau, P.M. and Dolan, R.J. (1997) Neural activation during selective attention to subjective emotional responses. Neuroreport, 8: 3969–3972. Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (2005) International affective picture system (IAPS): digitized photographs, instruction manual and affective ratings. Technical Report A-6, University of Florida, Gainesville, FL. Lanzetta, J.T. and Englis, B.G. (1989) Expectations of cooperation and competition and their effects on observers' vicarious emotional responses. J. Pers. Soc. Psychol., 56: 543–554. Leiberg, S., Eippert, F., Veit, R., Birbaumer, N. and Anders, S. (2005) Prefrontal networks in empathy [abstract]. Presented at the 11th Conference on Functional Mapping of the Human Brain, June 12–16, 2005, Toronto, Canada. Available on CD-Rom in Neuroimage, Vol. 26, Supplement 1.
Leibetseder, M., Laireiter, A.-R., Riepler, A. and Köller, T. (2001) E-Skala: Fragebogen zur Erfassung von Empathie. Beschreibung und psychometrische Eigenschaften. Z. Differen. Diagn. Psychol., 22: 70–85. Leslie, A.M. (1994) Pretending and believing: issues in the theory of ToMM. Cognition, 50: 211–238. Leslie, K.R., Johnson-Frey, S.H. and Grafton, S.T. (2004) Functional imaging of face and hand imitation: towards a motor theory of empathy. Neuroimage, 21: 601–607. Levenson, R.W. (1996) Biological substrates of empathy and facial modulation of emotion: two facets of the scientific legacy of John Lanzetta. Motiv. Emotion, 20: 185–204. Levenson, R.W. (2003) Blood, sweat, and fears. Ann. N Y Acad. Sci., 1000: 348–366. Levenson, R.W. and Ruef, A.M. (1992) Empathy: a physiological substrate. J. Pers. Soc. Psychol., 63: 234–246. Levesque, J., Eugene, F., Joanette, Y., Paquette, V., Mensour, B., Beaudoin, G., Leroux, J.M., Bourgouin, P. and Beauregard, M. (2003) Neural circuitry underlying voluntary suppression of sadness. Biol. Psychiatry, 53: 502–510. Lipps, T. (1903) Einfühlung, innere Nachahmung, und Organempfindungen. Arch. gesamte Psychol., 1: 185–204. Lotze, M., Veit, R., Anders, S. and Birbaumer, N. The role of medial prefrontal cortex in the control of social-interactive aggression, submitted. Lundquist, L.-O. and Dimberg, U. (1995) Facial expressions are contagious. J. Psychophysiol., 9: 203–211. Mathiak, K. and Weber, D.L. Towards brain correlates of natural behavior: fMRI during violent video games. Hum. Brain Mapp., electronically published. McCabe, K., Houser, D., Ryan, L., Smith, V. and Trouard, T. (2001) A functional imaging study of cooperation in two-person reciprocal exchange. Proc. Natl. Acad. Sci. USA, 98: 11832–11835. McHugo, G.J., Lanzetta, J.T., Sullivan, D.G., Masters, R.D. and Englis, B.G. (1985) Emotional reactions to a political leader's expressive displays. J. Pers. Soc. Psychol., 49: 1513–1529. Mehrabian, A. (1997) Relations among personality scales of aggression, violence, and empathy: validational evidence bearing on the risk of eruptive violence scale. Aggress. Behav., 23: 433–445. Mehrabian, A. and Epstein, N. (1972) A measure of emotional empathy. J. Pers., 40: 525–543. Miller, E.K. and Cohen, J.D. (2001) An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci., 24: 167–202. Miller, P.A. and Eisenberg, N. (1988) The relation of empathy to aggressive and externalizing/antisocial behavior. Psychol. Bull., 103: 324–344. Mitchell, J.P. (2005) The false dichotomy between simulation and theory-theory: the argument's error. Trends Cogn. Sci., 9: 363–364. Mitchell, J.P., Banaji, M.R. and Macrae, C.N. (2005) The link between social cognition and self-referential thought in the medial prefrontal cortex. J. Cogn. Neurosci., 17: 1306–1315.
Molnar-Szakacs, I., Iacoboni, M., Koski, L. and Mazziotta, J.C. (2005) Functional segregation within pars opercularis of the inferior frontal gyrus: evidence from fMRI studies of imitation and action observation. Cereb. Cortex, 15: 986–994. Moll, J., Oliveira-Souza, R., Eslinger, P.J., Bramati, I.E., Mourao-Miranda, J., Andreiuolo, P.A. and Pessoa, L. (2002) The neural correlates of moral sensitivity: a functional magnetic resonance imaging investigation of basic and moral emotions. J. Neurosci., 22: 2730–2736. Morrison, I., Lloyd, D., di Pellegrino, G. and Roberts, N. (2004) Vicarious responses to pain in anterior cingulate cortex: is empathy a multisensory issue? Cogn. Affect. Behav. Neurosci., 4: 270–278. Niedenthal, P.M., Brauer, M., Halberstadt, J.B. and Innes-Ker, A.H. (2001) When did her smile drop? Facial mimicry and the influences of emotional state on the detection of change in emotional expression. Cogn. Emotion, 15: 853–864. Ochsner, K.N. and Gross, J.J. (2005) The cognitive control of emotion. Trends Cogn. Sci., 9: 242–249. Ochsner, K.N., Bunge, S.A., Gross, J.J. and Gabrieli, J.D. (2002) Rethinking feelings: an fMRI study of the cognitive regulation of emotion. J. Cogn. Neurosci., 14: 1215–1229. Ochsner, K.N., Knierim, K., Ludlow, D.H., Hanelin, J., Ramachandran, T., Glover, G. and Mackey, S.C. (2004a) Reflecting upon feelings: an fMRI study of neural systems supporting the attribution of emotion to self and other. J. Cogn. Neurosci., 16: 1746–1772. Ochsner, K.N., Ray, R.D., Cooper, J.C., Robertson, E.R., Chopra, S., Gabrieli, J.D. and Gross, J.J. (2004b) For better or for worse: neural systems supporting the cognitive down- and up-regulation of negative emotion. Neuroimage, 23: 483–499. Peyron, R., Laurent, B. and Garcia-Larrea, L. (2000) Functional imaging of brain responses to pain: a review and meta-analysis. Neurophysiol. Clin., 30: 263–288. Phan, K.L., Fitzgerald, D.A., Nathan, P.J., Moore, G.J., Uhde, T.W. and Tancer, M.E. (2005) Neural substrates for voluntary suppression of negative affect: a functional magnetic resonance imaging study. Biol. Psychiatry, 57: 210–219. Premack, D. and Woodruff, G. (1978) Does the chimpanzee have a theory of mind? Behav. Brain Sci., 1: 515–526. Preston, S.D. and de Waal, F.B. (2002) Empathy: its ultimate and proximate bases. Behav. Brain Sci., 25: 1–20. Prinz, W. (1987) Ideomotor action. In: Heuer, H. and Sanders, A.F. (Eds.), Perspectives on Perception and Action. Erlbaum, Hillsdale, NJ, pp. 47–76. Reik, T. (1949) Character Analysis. Farrar, Strauss, Giroux, New York. Rizzolatti, G., Fadiga, L., Gallese, V. and Fogassi, L. (1996) Premotor cortex and the recognition of motor actions. Brain Res. Cogn. Brain Res., 3: 131–141. Rizzolatti, G., Fogassi, L. and Gallese, V. (2001) Neurophysiological mechanisms underlying the understanding and imitation of action. Nat. Rev. Neurosci., 2: 661–670. Rogers, C. (1959) A theory of therapy, personality, and interpersonal relationships as developed in the client-centered
framework. In: Koch, J.S. (Ed.) Psychology: a Study of a Science: Vol. 3. Formulations of the Person in the Social Context. McGraw-Hill, New York, pp. 184–256. Rogers, C. (1975) Empathic: an unappreciated way of being. The Couns. Psychol., 5: 2–10. Ruby, P. and Decety, J. (2001) Effect of subjective perspective taking during simulation of action: a PET investigation of agency. Nat. Neurosci., 4: 546–550. Ruby, P. and Decety, J. (2003) What you believe versus what you think they believe: a neuroimaging study of conceptual perspective-taking. Eur. J. Neurosci., 17: 2475–2480. Ruby, P. and Decety, J. (2004) How would you feel versus how do you think she would feel? A neuroimaging study of perspective-taking with social emotions. J. Cogn. Neurosci., 16: 988–999. Samson, D., Apperly, I.A., Chiavarino, C. and Humphreys, G.W. (2004) Left temporoparietal junction is necessary for representing someone else's belief. Nat. Neurosci., 7: 499–500. Samson, D., Apperly, I.A., Kathirgamanathan, U. and Humphreys, G.W. (2005) Seeing it my way: a case of a selective deficit in inhibiting self-perspective. Brain, 128: 1102–1111. Saxe, R. (2005a) Against simulation: the argument from error. Trends Cogn. Sci., 9: 174–179. Saxe, R. (2005b) Hybrid vigour: reply to Mitchell. Trends Cogn. Sci., 9: 364. Saxe, R. (2005c) On ignorance and being wrong: reply to Gordon. Trends Cogn. Sci., 9: 362–363. Shamay-Tsoory, S.G., Lester, H., Chisin, R., Israel, O., Bar-Shalom, R., Peretz, A., Tomer, R., Tsitrinbaum, Z. and Aharon-Peretz, J. (2005a) The neural correlates of understanding the other's distress: a positron emission tomography investigation of accurate empathy. Neuroimage, 27: 468–472. Shamay-Tsoory, S.G., Tomer, R., Berger, B.D., Goldsher, D. and Aharon-Peretz, J. (2005b) Impaired ''affective theory of mind'' is associated with right ventromedial prefrontal damage. Cogn. Behav. Neurol., 18: 55–67. Singer, T., Seymour, B., O'Doherty, J., Kaube, H., Dolan, R.J. and Frith, C.D. (2004) Empathy for pain involves the affective but not sensory components of pain. Science, 303: 1157–1162. Singer, T., Seymour, B., O'Doherty, J.P., Stephan, K.E., Dolan, R.J. and Frith, C.D. (2006) Empathic neural responses are modulated by the perceived fairness of others. Nature, 439: 466–469. Sonnby-Borgström, M. (2002) Automatic mimicry reactions as related to differences in emotional empathy. Scand. J. Psychol., 43: 433–443. Stanek, B., Hahn, R. and Mayer, H. (1973) Biometric findings on cardiac neurosis. III. Changes in ECG and heart rate in
cardiophobic patients and their doctor during psychoanalytical initial interviews. Psychother. Psychosom., 22: 289–299. Stotland, E. (1969) Exploratory investigations of empathy. In: Berkowitz, L. (Ed.), Advances in Experimental Social Psychology. Academic Press, New York, pp. 271–314. Titchener, E. (1909) Experimental Psychology of the Thought Processes. Macmillan, New York. Trobst, K.K., Collins, R.L. and Embree, J.M. (1994) The role of emotion in social support provision: gender, empathy and expressions of distress. J. Soc. Pers. Relat., 11: 45–62. Truax, C.B. (1961) A scale for measurement of empathy. Psychiatr. Inst., 1: 12. Vaughan, K.B. and Lanzetta, J.T. (1980) Vicarious instigation and conditioning of facial expressive and autonomic responses to a model's expressive display of pain. J. Pers. Soc. Psychol., 38: 909–923. Vogeley, K., Bussfeld, P., Newen, A., Herrmann, S., Happe, F., Falkai, P., Maier, W., Shah, N.J., Fink, G.R. and Zilles, K. (2001) Mind reading: neural mechanisms of theory of mind and self-perspective. Neuroimage, 14: 170–181. Vogeley, K., May, M., Ritzl, A., Falkai, P., Zilles, K. and Fink, G.R. (2004) Neural correlates of first-person perspective as one constituent of human self-consciousness. J. Cogn. Neurosci., 16: 817–827. Völlm, B.A., Taylor, A.N., Richardson, P., Corcoran, R., Stirling, J., McKie, S., Deakin, J.F. and Elliott, R. (2006) Neuronal correlates of theory of mind and empathy: a functional magnetic resonance imaging study in a nonverbal task. Neuroimage, 29: 90–98. Vorauer, J.D. and Ross, M. (1999) Self-awareness and feeling transparent: failing to suppress one's self. J. Exp. Soc. Psychol., 35: 415–440. Walter, H., Adenzato, M., Ciaramidaro, A., Enrici, I., Pia, L. and Bara, B.G. (2004) Understanding intentions in social interaction: the role of the anterior paracingulate cortex. J. Cogn. Neurosci., 16: 1854–1863. Wicker, B., Keysers, C., Plailly, J., Royet, J.P., Gallese, V. and Rizzolatti, G. (2003) Both of us disgusted in My insula: the common neural basis of seeing and feeling disgust. Neuron, 40: 655–664. Wiesenfeld, A.R., Whitman, P.B. and Malatesta, C.Z. (1984) Individual differences among adult women in sensitivity to infants: evidence in support of an empathy concept. J. Pers. Soc. Psychol., 46: 118–124. Wispé, L. (1986) The distinction between sympathy and empathy: to call forth a concept, a word is needed. J. Pers. Soc. Psychol., 50: 314–321. Zajonc, R.B., Adelmann, P.K., Murphy, S.T. and Niedenthal, P.M. (1987) Convergence in the physical appearance of spouses. Motiv. Emotion (Hist. Arch.), 11: 335–346.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
DOI: 10.1016/S0079-6123(06)56024-8
CHAPTER 24
Partly dissociable neural substrates for recognizing basic emotions: a critical review

Andreas Hennenlotter¹ and Ulrike Schroeder²

¹Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1A, D-04103 Leipzig, Germany
²Klinik Holthausen, Am Hagen 20, D-45527 Hattingen, Germany
Corresponding author. Tel.: +49-341-99-40-266; Fax: +49-341-99-40-260; E-mail: [email protected]
Abstract: Facial expressions are powerful non-verbal displays of emotion which signal valence information to others and constitute an important communicative element in social interaction. Six basic emotional expressions (fear, disgust, anger, surprise, happiness, and sadness) have been shown to be universal in their performance and perception. Recently, a growing number of clinical and functional imaging studies have aimed at identifying partly dissociable neural subsystems for recognizing basic emotions. Convincing results have been obtained for fearful and disgusted facial expressions only. Empirical evidence for a specialized neural representation of anger, surprise, sadness, or happiness is more limited, primarily due to lack of clinical cases with selective impairments in recognizing these emotions. In functional imaging research, the detection of dissociable neural responses requires direct comparisons of signal changes associated with the perception of different emotions, which are often not provided. Only recently has evidence been obtained that the recruitment of emotion-specific neural subsystems may be closely linked to characteristic facial features of single expressions such as the eye region for fearful faces. Investigations into the neural systems underlying the processing of such diagnostic cues for each of the six basic emotions may be helpful to further elucidate their neural representation.
Introduction

The formal evolutionary treatment of human facial expressions began with Charles Darwin's ''The Expression of the Emotions in Man and Animals'' (Darwin, 1872). Darwin found evidence for continuity in bodily movements and facial gestures that humans shared with animals. He used these resemblances across species to argue for common descent. Darwin's view of facial expressions, however, was not ''evolutionary'' at all, because he did not consider them as adaptations but accidents or vestiges of earlier evolutionary stages in which the intellect was of less importance (Fridlund, 1994). While the argument for phylogenetic continuity plays an important role in contemporary explanations of emotions, Darwin's vestigialism has largely been replaced by the view that expressions of emotion are adaptive and were selected for social communication (Schmidt and Cohn, 2001). Emotion that is manifested by facial expression signals occurrences of value, and being able to transfer and receive such information undoubtedly confers a survival advantage. It is generally accepted today that six basic emotional expressions (happiness, surprise, fear, sadness, anger, and disgust) are universal in their performance and in their perception (Ekman et al., 1969, 1987).
Motivated by recent advances in cognitive neuroscience, a growing number of clinical and functional imaging studies have aimed at identifying
common as well as dissociable neural substrates associated with the recognition of basic emotions. Posterior occipito-temporal regions have been associated with the perceptual analysis of facial expressive features (Haxby et al., 2000). The extraction of emotional meaning from faces has been linked to the ventral prefrontal cortex and somatosensory-related regions of the right hemisphere where lesions result in a general impairment in facial emotion recognition (Rapcsak et al., 1993; Adolphs et al., 1996, 2000; Hornak et al., 1996). Consistent with these findings, functional imaging studies have revealed ventral prefrontal and somatosensory-related activations primarily in association with explicit recognition tasks such as labeling expressed emotion and facial expression matching (Nakamura et al., 1999; Narumoto et al., 2000; Gorno-Tempini et al., 2001; Winston et al., 2003). Findings from clinical studies in patients with focal brain lesions or specific neurological disorders have also provided evidence that recognition can be impaired for specific emotions such as fear (e.g., Adolphs et al., 1999) and disgust (e.g., Sprengelmeyer et al., 1996; Gray et al., 1997). Accordingly, functional imaging studies revealed that certain brain regions are primarily implicated in the processing of specific emotions (e.g., Morris et al., 1996; Phillips et al., 1997), suggesting that processing of individual emotions might be subserved by partly separable neural subsystems (Calder et al., 2001). In this review, we will critically examine findings from lesion, neurophysiologic, and functional imaging studies with respect to the question of whether recognition of basic emotions is subserved by partly dissociable neural subsystems. By including each of the six basic emotions, we provide an important extension to previous reviews that either focused on the two most extensively studied emotions, namely fear and disgust (Calder et al., 2001), or examined the neural basis of face (Posamentier and Abdi, 2003) and facial expression processing (Adolphs, 2002a, 2002b; Blair, 2003) more generally.
Fear

Facial expressions of fear have been described as unconditioned stimuli that rapidly convey
information to conspecifics that a novel stimulus is aversive and should be avoided (Mineka and Cook, 1993). Fearful faces thereby enable a quick and defensive response that is critical for survival. Accordingly, fear expressions are recognized significantly faster in visual noise than neutral faces (Reinders et al., 2005) and their processing occurs automatically, largely independently of attention and awareness (Esteves et al., 1994; Ohman et al., 2001). Human lesion studies have found impaired recognition of fearful faces following amygdala damage (Adolphs et al., 1994; Young et al., 1995; Calder et al., 1996; Sprengelmeyer et al., 1999; Anderson and Phelps, 2000; Sato et al., 2002). The first case (SM) was described by Adolphs et al. (1994, 1995). SM suffers from Urbach–Wiethe disease, which has caused a nearly complete bilateral destruction of the amygdala as well as a small portion of the adjacent entorhinal cortex, while sparing other subcortical and cortical regions. In comparison to control subjects (unilateral amygdala damage, other brain damage, and healthy subjects), SM showed abnormal ratings of facial expressions of fear and, to a lesser extent, anger and surprise that could not be explained by deficits in basic cognitive or visual functions. Impairment of fear recognition was also observed in patients with right unilateral anteromedial temporal lobectomies that included the amygdala (Anderson et al., 2000). In a larger group involving nine patients with bilateral amygdala damage, individual performances of fear recognition varied considerably, ranging from severely impaired to completely normal (Adolphs et al., 1999). Preserved fear recognition ability for two of those patients with bilateral amygdala lesions (GT and EP) is consistent with an earlier study that found no significant impairments in GT's and EP's ratings of any emotion (Hamann et al., 1996). However, a re-examination of these two patients and a third patient, GP (Schmolck and Squire, 2001), using a different method of analyzing these patients' ratings and an additional forced-choice labeling task, revealed impairments in all three patients, particularly for recognizing fear and sadness. Besides deficits in fear recognition, patients with amygdala damage are often impaired on recognition of other
emotions (e.g., Young et al., 1995; Calder et al., 1996; Young et al., 1996; Broks et al., 1998; Adolphs et al., 1999; Rapcsak et al., 2000; Schmolck and Squire, 2001; Sato et al., 2002). However, given the heterogeneity of amygdala lesions (bilateral/unilateral, complete/partial) and various etiologies (congenital, encephalitis, and surgical), the finding of impaired fear recognition is surprisingly consistent. Furthermore, varying degrees of additional damage to other brain regions observed in these patients may also contribute to recognition deficits in more than just one emotion (Rapcsak et al., 2000). Using various techniques (e.g., ''Bubbles,'' Gosselin and Schyns, 2001; Schyns et al., 2002) to estimate which aspects of a facial expression are most important when recognizing specific emotions, a recent study elucidated the mechanism by which amygdala damage may compromise fear recognition (Adolphs et al., 2005). The authors showed that the fear recognition deficit of a patient with complete bilateral amygdala lesions (SM) is caused by an inability to use information from the eye region of faces, which is the most important feature for identifying this emotion. Her deficit in recognizing fearful expressions completely disappeared when she was instructed to explicitly look at the eyes. Interestingly, the eye region is also important for recognizing anger and sadness (Smith et al., 2005), recognition of which, apart from fear, has also often been reported to be impaired after amygdala damage (Adolphs et al., 1999). Numerous imaging studies on facial emotion recognition have found amygdala activation in response to fearful faces (Breiter et al., 1996; Morris et al., 1996, 1998; Phillips et al., 1997, 1998b, 2004; Sprengelmeyer et al., 1998; Whalen et al., 1998, 2001; Thomas et al., 2001; Wright et al., 2001; Yang et al., 2002; Fischer et al., 2003). Some of these studies reported a dissociable response to fearful faces, particularly within the left amygdala. In these studies, signal changes associated with the perception of fear were directly compared to signal changes elicited by the perception of a different emotion such as happiness (Morris et al., 1996, 1998; Whalen et al., 1998; Wright et al., 2001), disgust (Phillips et al., 1998b, 2004), or anger (Whalen et al., 2001). The studies
of Morris et al. (1996, 1998) additionally allowed comparing signal changes associated with the perception of different expression intensities of the same emotion. Both studies found a differential response to fearful faces relative to happy faces within the left amygdala, which increased with increasing intensity of fear expressions. The failure to detect activation of the right amygdala may be explained by the finding of more rapid habituation of the right than the left amygdala (Phillips et al., 2001; Wright et al., 2001). It is worth mentioning that dissociable responses within the left amygdala (revealed by direct comparisons of signal changes associated with the perception of different emotions) have not always been found for perception of fearful faces (Winston et al., 2003), and occasionally have been observed for emotions other than fear, such as sadness (Blair et al., 1999) and disgust (Gorno-Tempini et al., 2001). In line with evidence from a recent lesion study (Adolphs et al., 2005), findings derived from functional imaging research have also provided evidence that the amygdala may be specifically involved in the processing of information from the eye region of faces. These studies showed that amygdala responses to fearful faces are modulated by the degree of threat-related ambiguity as a function of gaze direction (Adams et al., 2003), that fearful eyes alone are sufficient to produce amygdala responses (Morris et al., 2002), and that these responses appear to be driven by the size of the white scleral field and not by the outline of the eye (Whalen et al., 2004). Findings from lesion and functional imaging research therefore provide converging evidence concerning both the important role of the amygdala in fear recognition and the specific facial cues that it may be attuned to.
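The ''Bubbles'' procedure mentioned above can be summarized computationally: on each trial the face is visible only through a set of randomly placed Gaussian apertures, and the apertures from correctly recognized trials are accumulated to estimate which regions of the face are diagnostic for a given emotion. The sketch below (plain NumPy; the array sizes, parameter values, and the toy observer are illustrative assumptions, not the original implementation of Gosselin and Schyns, 2001) shows the core logic.

```python
import numpy as np

def bubble_mask(shape, n_bubbles=20, sigma=8.0, rng=None):
    """Sum of randomly centered Gaussian apertures, clipped to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape)
    for cy, cx in zip(rng.integers(0, h, n_bubbles),
                      rng.integers(0, w, n_bubbles)):
        mask += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return np.clip(mask, 0.0, 1.0)

def diagnostic_map(face, classify, n_trials=2000, rng=None):
    """Accumulate apertures from correct and from all trials; their ratio
    estimates how strongly each pixel supports correct recognition."""
    rng = np.random.default_rng(0) if rng is None else rng
    correct_sum = np.zeros(face.shape)
    total_sum = np.zeros(face.shape)
    for _ in range(n_trials):
        mask = bubble_mask(face.shape, rng=rng)
        stimulus = face * mask          # face revealed only through apertures
        total_sum += mask
        if classify(stimulus):          # stand-in for an observer's response
            correct_sum += mask
    return correct_sum / np.maximum(total_sum, 1e-9)

# Toy demonstration: an 'observer' that recognizes fear only when enough of
# the upper (eye) region is revealed yields a map peaking over the eyes.
face = np.random.rand(128, 128)                      # placeholder image
eye_observer = lambda s: s[20:50, :].mean() > 0.05   # hypothetical criterion
dmap = diagnostic_map(face, eye_observer, n_trials=500)
```

In the actual experiments the classification is performed by human participants (or by a patient such as SM), and the resulting maps are what identify the eye region as diagnostic for fear.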
Disgust

The facial expression of disgust signals important information regarding the quality of food, potential physical contamination and disease, and induces avoidance behavior in conspecifics. The adaptive significance of disgust has been related to a specific form of threat response associated with
an internal defense system, as opposed to an external defense system related to fear (Calder et al., 2001). The first evidence that perception of disgusted faces might be associated with a particular brain region came from an investigation of emotion recognition in people with Huntington's disease, a dominantly inherited neurodegenerative disorder. These patients showed a disproportionately severe impairment in recognizing disgusted faces (Sprengelmeyer et al., 1996), which was later replicated in a sample of Chinese patients suffering from Huntington's disease (Wang et al., 2003). Subsequent studies in pre-symptomatic Huntington's disease gene carriers revealed a selective deficit in recognizing disgust, i.e., recognition of other emotions was intact or only mildly impaired (Gray et al., 1997; Hennenlotter et al., 2004; Sprengelmeyer et al., 2005). Only one study reported a more generalized deficit of emotion recognition in people with manifest Huntington's disease and no emotion recognition deficit in people at risk of carrying the Huntington gene (Milders et al., 2003). Since Huntington's disease has been conceptualized primarily as a basal ganglia disorder, Sprengelmeyer et al. (1996) proposed that disgust recognition may be closely associated with the basal ganglia. However, pathological changes in Huntington's disease are not confined to the striatum, but also affect cortical regions (de la Monte et al., 1988; Jernigan et al., 1991), including the insular cortex (Thieben et al., 2002). The latter region has been implicated in impaired disgust recognition by a single case study of a patient who suffered from focal damage to the left insula and putamen (Calder et al., 2000). Moreover, Krolak-Salmon et al. (2003) recorded intracerebral event-related potentials to facial expressions of disgust from insular contacts in patients suffering from drug-refractory temporal lobe epilepsy. Apart from neural degeneration of the basal ganglia, early neural loss in the insula may be considered a possible explanation for impaired disgust recognition in Huntington's disease. Converging evidence for a role of both the insula and basal ganglia in disgust recognition comes from functional imaging studies in healthy subjects (Phillips et al., 1997, 1998b; Sprengelmeyer et al.,
1998; Wicker et al., 2003; Schroeder et al., 2004a). The insula was found to be activated in response to disgusted faces in two imaging studies that involved direct comparisons with fearful (Phillips et al., 1998b) and surprised facial expressions (Schroeder et al., 2004a), while the putamen was found to be activated in one of these studies only (Phillips et al., 1998b). Recently, findings from a combined clinical and functional imaging study elucidated the neural mechanisms that may underlie impaired disgust recognition in pre-symptomatic Huntington's disease (Hennenlotter et al., 2004). In Huntington's disease gene carriers, perception of disgusted relative to neutral faces was associated with significantly decreased activation of the left anterior insulo-opercular region, closely corresponding to the location of the primary taste cortex in humans (Frey and Petrides, 1999; Small et al., 1999, 2003). This finding is consistent with the notion that perception of others' disgust and the perception of taste are closely linked (Rozin and Fallon, 1987) and probably share a similar neural substrate (Phillips et al., 1997). In conclusion, both clinical and functional imaging studies suggest a specific involvement of the insula and basal ganglia in disgust recognition, although the differential functions of these regions in recognizing disgust are still open to question.
Anger

Recognition of angry faces has been proposed to serve several partly overlapping adaptive functions, each associated with specific neuroanatomical circuitries. In the context of a defense system relating to threats to the acquisition of valuable resources, the ventral striatal dopamine system has been suggested to be implicated in the recognition of angry faces as signals of conspecific challenge (Lawrence et al., 2002). Angry expressions have also been proposed to signal discontent in order to discourage socially inappropriate and unexpected behavior (Averill, 1982). They may thus serve as a cue for behavioral extinction and reversal learning, which has been closely associated with the orbitofrontal cortex (Dias et al., 1996; Rolls, 1996). Finally, as threatening stimuli, angry
expressions might engage the amygdala similar to expressions of fear (Blair et al., 1999). In support of the notion that angry expressions are processed as threatening stimuli, similar to expressions of fear, recognition of anger has, in addition to fear, been the emotion most consistently reported to be impaired after amygdala damage (Adolphs et al., 1994; Young et al., 1995; Calder et al., 1996; Broks et al., 1998; Sato et al., 2002). A role for the amygdala in anger recognition is further consistent with the finding that bilateral amygdala damage impairs the use of information from the eye region (Adolphs et al., 2005), which is, apart from fear, also the diagnostic region for recognizing anger expressions (Smith et al., 2005). However, there is also evidence for preserved anger recognition in some patients with amygdala lesions (Adolphs et al., 1994; Calder et al., 1996) and in postencephalitic patients with amygdala damage (Broks et al., 1998). On the basis of observations from comparative research demonstrating altered dopamine activity during aggressive encounters between conspecifics (Redolat et al., 1991; Miczek et al., 2002; Ferrari et al., 2003) and evidence pointing to a contribution of dopamine to human aggressive behavior (Tiihonen et al., 1995; Lawrence et al., 2003), a separate line of investigation has focused on the role of the dopamine system and the ventral striatum in anger recognition (Lawrence et al., 2002; Calder et al., 2004). In a recent study, Lawrence et al. (2002) showed that after acute administration of the dopamine D2-class receptor antagonist sulpiride, subjects were significantly worse at recognizing angry faces, whereas there were no such impairments in recognizing other facial expressions. In a subsequent study, Calder et al. (2004) investigated whether recognition of facial and vocal signals of anger and self-reported experience of anger would be affected by damage to the ventral striatal dopamine system. To address this question, they studied a case series of four human subjects with focal lesions affecting the ventral striatum. All four demonstrated a disproportionately severe impairment in recognizing human signals of aggression, in particular facial expressions, whereas a control group of individuals with damage to more dorsal basal ganglia regions showed no signs of anger
impairments. Given that functional imaging studies have implicated ventrolateral prefrontal regions in the processing of angry faces (Sprengelmeyer et al., 1998; Blair et al., 1999; Phillips et al., 1999), the authors propose that human signals of aggression are processed by a frontostriatal system. A role for frontostriatal circuitries in anger recognition is in line with findings of a recent study (Schroeder et al., 2004b) investigating facial expression processing in Parkinson's disease patients with subthalamic nucleus (STN) deep brain stimulation. STN stimulation is an accepted form of treatment for patients with Parkinson's disease who have medically intractable motor symptoms. Schroeder et al. (2004b) found that STN stimulation selectively reduces recognition of angry faces, but leaves intact recognition of other emotions. Since in animals the STN is also targeted by limbic cortices (Parent and Hazrati, 1995) such as the orbitofrontal and anterior cingulate cortex (Canteras et al., 1990), these results point to a possible role of these regions in anger recognition. In fact, lesions involving the orbitofrontal cortex have been associated with changes in aggressive behavior in humans (Blair, 2001; Brower and Price, 2001) and difficulties in recognizing angry and disgusted faces (Blair and Cipolotti, 2000). Furthermore, transcranial magnetic stimulation (TMS) over the medial-frontal cortex has been found to impair processing of angry, but not happy facial expressions (Harmer et al., 2001b). Findings from functional imaging studies have implicated the orbitofrontal cortex (Blair et al., 1999), anterior cingulate (Sprengelmeyer et al., 1998; Blair et al., 1999; Strauss et al., 2005), as well as the amygdala (Whalen et al., 2001; Yang et al., 2002; Adams et al., 2003) in the processing of angry faces. The finding of orbitofrontal cortex activation in response to angry faces by Blair et al. (1999) might be of particular interest, since this region showed a differential response to angry faces (when compared to sad faces). Additionally, increasing intensity of angry expressions was found to be associated with enhanced activity of the orbitofrontal cortex. Convergent evidence has been provided by two meta-analyses that indicated a particular role for the lateral orbitofrontal cortex in the processing of angry faces (Phan et al., 2002;
Murphy et al., 2003). When activated by angry expressions, the orbitofrontal cortex has been suggested to suppress current behavior either through inhibition or by activation of an alternative behavioral response (Blair et al., 1999). Other studies reported amygdala activation in response to angry faces (Whalen et al., 2001; Yang et al., 2002), in particular when they signal ambiguous threats in the case of averted gaze (Adams et al., 2003). However, none of these imaging studies reported a dissociable response of the amygdala to angry expressions. Notably, a recent study found that angry faces evoke sensitization in several regions including the anterior cingulate and basal ganglia but not within the amygdala (Strauss et al., 2005) where fearful faces have reproducibly evoked habituation (Breiter et al., 1996; Wright et al., 2001; Fischer et al., 2003). Findings from functional imaging research are therefore more consistent with the proposed role of frontal regions such as the orbitofrontal (Blair et al., 1999) and anterior cingulate cortex (Sprengelmeyer et al., 1998; Blair et al., 1999; Strauss et al., 2005) in anger recognition than with a specific involvement of the amygdala.
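The analytic point recurring throughout this section, that emotion-specific claims rest on direct emotion-versus-emotion contrasts rather than on emotion-versus-neutral comparisons alone, can be made concrete in a few lines. The sketch below, using the open-source nilearn library, shows how such contrasts might be specified for a single hypothetical run; the file name, timings, and event table are illustrative assumptions, and the studies reviewed here used their own analysis pipelines rather than this code.

```python
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

# Hypothetical event table: block onsets (in seconds) for faces displaying
# each expression condition of interest.
events = pd.DataFrame({
    "onset":      [0, 30, 60, 90, 120, 150],
    "duration":   [30] * 6,
    "trial_type": ["anger", "sadness", "neutral"] * 2,
})

# Standard HRF-convolved GLM fitted to one (hypothetical) functional run.
model = FirstLevelModel(t_r=2.0, hrf_model="glover", noise_model="ar1")
model = model.fit("sub-01_task-faces_bold.nii.gz", events=events)

# Direct emotion-vs-emotion contrast: voxels responding more to angry than
# to sad faces (cf. the orbitofrontal finding of Blair et al., 1999).
z_anger_vs_sad = model.compute_contrast("anger - sadness",
                                        output_type="z_score")

# An emotion-vs-neutral contrast alone cannot separate emotion-specific
# regions from regions generally engaged by facial expression processing.
z_anger_vs_neutral = model.compute_contrast("anger - neutral",
                                            output_type="z_score")
```

A block design like this is also where the habituation and sensitization effects discussed above become a confound: with repeated presentations of the same expression, a region's apparent emotion specificity can partly reflect differential response decay rather than a stable preference.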
Surprise

The facial expression of surprise was described as early as 1872 by Charles Darwin, who proposed novelty and unexpectedness as its elicitors. Since surprise can predict a positive as well as a negative outcome (Tomkins and McCarter, 1964), it may be considered the most controversial of the six basic emotional expressions. To date, only a single case study has reported impaired surprise recognition following bilateral amygdala damage (Adolphs et al., 1994). However, SM's impairment in recognizing surprise was not selective, since she also showed deficits in recognizing other emotions that were most pronounced for fear. Three imaging studies have investigated the neural basis of surprise perception so far (Kim et al., 2003, 2004; Schroeder et al., 2004a). The study by Kim et al. (2003) aimed at investigating the neural correlates implicated in positive versus negative evaluation of surprised faces. More
negative interpretations of surprised faces were associated with greater signal changes in the right ventral amygdala, while more positive interpretations were associated with greater signal changes in the ventral medial prefrontal cortex. Perception of surprised faces relative to neutral faces resulted in dorsal amygdala activation. Similar results were obtained when the interpretation of surprised faces was determined by contextual experimental stimuli, rather than subjective judgment (Kim et al., 2004). However, since none of these studies included emotional expressions other than surprise, the question of whether a specific neural subsystem is involved in surprise perception remains open. The first study that aimed at identifying a specialized neural system for surprise perception (Schroeder et al., 2004a) was based on a psychological model in which surprise is conceived as an evolutionarily old mechanism for analyzing unexpected events in order to update knowledge for successful individual-environmental transaction (Schützwohl, 1998). Regions implicated in the detection of novel or unexpected events were therefore expected to be specifically involved in the perception of surprised faces. Compared to both neutral and disgusted faces, perception of surprised faces consistently yielded activation of the right posterior parahippocampal gyrus. In fact, the right parahippocampal gyrus was the only region that showed significant activation in the direct comparison with disgusted faces. Since the parahippocampal gyrus has been implicated in the processing of novel stimuli as compared to familiar stimuli (Stern et al., 1996; Gabrieli et al., 1997), perception of surprise in others may be closely related to the recognition or evaluation of novel stimuli in the environment, which is thought of as an initial step in memory formation (Fernandez et al., 1998). Given the small number of functional imaging studies that have focused on surprise perception and the considerable variations in experimental designs and data analyses, the different roles of the amygdala and the parahippocampal gyrus in recognizing this emotion remain unclear. With respect to a possible role of the amygdala in surprise perception, imaging paradigms are needed that also involve the presentation of fearful faces that have
been found to specifically engage the amygdala. Studies on emotion recognition in patients with amygdala lesions, however, generally fail to find impairments in surprise recognition. It is therefore interesting to note that the only patient with concomitant deficits in surprise recognition (SM) suffered from lesions of the amygdala that extended into the entorhinal cortex (Adolphs et al., 1994), a part of the parahippocampal gyrus. Given the robust activation of the posterior parahippocampal gyrus during perception of surprised faces (Schroeder et al., 2004a), one might speculate that this region is specifically involved in the perception of surprise. In order to substantiate this finding in the future, studies are needed that investigate emotion recognition in patients with focal lesions of the posterior medial temporal lobes.
Happiness

The smile has been suggested to be an important signal of cooperative intention and affiliation during social interaction (Schmidt and Cohn, 2001). Human infants smile more when an adult's eye gaze is fixed on them than when the gaze is averted (Haines and Muir, 1969). As a positive fitness consequence, these smiles subsequently elicit responsive and attentive parental behavior (Jones and Raag, 1989). Smiling is also the most easily recognized expression. Following the norms published by Ekman and Friesen (1976), mean accuracy for recognition of facial expressions of happiness reaches 100% (Young et al., 1996). To date, there is no evidence for impaired recognition of happy faces following damage to specific brain regions or in patients suffering from neurological or psychiatric disorders. Only one patient with amygdala damage has been reported to be mildly impaired in her appraisal of happiness (Anderson and Phelps, 2000). The preserved ability to recognize happy faces found in most patients with amygdala lesions may be explained by the finding that damage to the amygdala impairs the use of information from the eyes, whereas the use of information from the mouth region, which
is the diagnostic region for happiness (Smith et al., 2005), remains normal (Adolphs et al., 2005). Findings from neuroimaging studies on the perception of happy faces have revealed no consistent pattern of activation. Various regions have been implicated in the perception of happiness, including the basal ganglia (Morris et al., 1996, 1998), inferior/orbitofrontal cortex (Dolan et al., 1996; Gorno-Tempini et al., 2001), anterior cingulate cortex (Dolan et al., 1996; Kesler-West et al., 2001), and amygdala (Breiter et al., 1996; Pessoa et al., 2002; Yang et al., 2002; Winston et al., 2003; Hennenlotter et al., 2005). In their meta-analysis, Phan et al. (2002) found that nearly 70% of happiness induction studies reported activation in the basal ganglia/ventral striatum, which is consistent with work implicating the dopaminergic system and basal ganglia/ventral striatum in reward processing and incentive reward motivation (Koepp et al., 1998; Knutson et al., 2001). This finding, however, could not be replicated for facial expressions of happiness in a second meta-analysis by Murphy et al. (2003). In a sub-analysis of studies that used facial expressions of emotion as stimuli, the rostral supracallosal anterior cingulate cortex and dorsomedial prefrontal cortex were instead most consistently activated for happiness. Whereas some studies have reported signal increases in the amygdala to positively valenced facial expressions (Breiter et al., 1996; Pessoa et al., 2002; Yang et al., 2002; Winston et al., 2003; Hennenlotter et al., 2005), others have not, or have found signal decreases in the amygdala (Morris et al., 1996; Whalen et al., 1998). In the study of Whalen et al. (1998), signal decreases in response to happy faces were found in the ventral amygdala, whereas both fearful and happy faces activated the sublenticular substantia innominata (SI), suggesting a spatial dissociation of regions that respond to emotional valence (ventral amygdala including the basolateral amygdala) versus salience or arousal value (dorsal amygdala/SI region including the central nucleus of the amygdala) (Whalen et al., 1994, 1998; Somerville et al., 2004). Given that the amygdala's activity is enhanced by faces containing dynamic information (LaBar et al., 2003), inconsistencies in findings concerning its involvement in pleasant facial affect may be related to the
lack of temporal cues in the static facial displays that have mostly been used in these studies. This notion is supported by the findings of a recent study by Hennenlotter et al. (2005), who reported robust activation of the bilateral amygdala during passive observation of dynamic smile expressions (short video sequences) compared to observation of neutral expressions of the same individuals. Beyond its well-known involvement in processing fearful faces, these findings suggest that the amygdala (in particular its dorsal parts) may play a generalized role in facial emotion processing by modulating the vigilance level in response to both positive and negative facial expressions (Yang et al., 2002; Winston et al., 2003). Since recognition of happy faces is usually not impaired in patients with amygdala lesions, it is, however, implausible that the amygdala is part of a neural network that is more specifically involved in the representation of pleasant facial affect.
Sadness

The facial expression of sadness has been linked to the inhibition of aggression and the elicitation of empathic reactions and prosocial behavior (Eisenberg et al., 1989; Harmer et al., 2001a). Accordingly, psychopaths, who are characterized by disregard for others and aggressive behavior, fail to show normal autonomic responses to sad facial expressions (House and Milligan, 1976; Aniskiewicz, 1979; Chaplin et al., 1995; Blair et al., 1997). Psychopathy has been suggested to reflect early amygdala damage (Blair et al., 1999; Blair, 2001), since psychopathic individuals show impairments in fear conditioning and startle reflex augmentation (e.g., Patrick et al., 1993; Lucas et al., 2000) similar to patients with amygdala lesions (Bechara et al., 1995; LaBar et al., 1995; Angrilli et al., 1996). Given the hyporesponsiveness to sad faces and the indications of amygdala dysfunction in these patients, the amygdala has been suggested to be involved in the processing of sad facial expressions (Blair et al., 1999). This suggestion is supported by the finding that some patients with right amygdaloid lesions show deficits in judging the intensity of several negative emotions including sad
expressions (Anderson and Phelps, 2000; Anderson et al., 2000). Most patients with bilateral amygdala lesions, however, have been found to be unimpaired in recognizing sad faces (e.g., Adolphs et al., 1994, 1999; Calder et al., 1996). Rather than being necessary for recognizing sad faces, the amygdala may be involved in the activation of concomitant autonomic responses (Blair et al., 1999). In line with this notion, administration of the beta-adrenoceptor blocker propranolol was found to reduce the speed at which sad faces are recognized (Harmer et al., 2001a). In a functional magnetic resonance imaging (fMRI) study, Blair et al. (1999) reported that the neural response in the left amygdala and right inferior and middle temporal gyri was significantly greater to sad than to angry expressions. Activation within these regions increased as a function of the degree of sadness expressed in the morphed face stimuli. Activity in the anterior cingulate cortex and right temporal pole correlated with increasing expression intensity of both sad and angry faces. Unfortunately, the study did not include fearful faces as control stimuli, which would have allowed testing whether there is a differential response in the amygdala to sad versus fearful facial expressions (Posamentier and Abdi, 2003). Activations of the right inferior and middle temporal gyri may reflect top-down modulatory effects of the amygdala onto the visual processing stream (Morris et al., 1998; Vuilleumier et al., 2001) and have also been reported for perception of other emotions (e.g., Kesler-West et al., 2001; Vuilleumier et al., 2001, 2003; Schroeder et al., 2004a). Involvement of the amygdala during perception of sad faces has been confirmed recently by two fMRI studies (Yang et al., 2002; Winston et al., 2003). In contrast to the findings of Blair et al. (1999), however, Winston et al. (2003) failed to find evidence for a differential response of the amygdala to sad faces. Yang et al. (2002) only compared patterns of activation for emotional faces (including sadness) relative to neutral faces, leaving it open whether the amygdala showed a dissociable response to sad faces. Two further studies (Phillips et al., 1998a; Kesler-West et al., 2001) found no activation of the amygdala for sad faces relative to neutral faces.
In conclusion, evidence for the existence of a specific neural subsystem for recognizing sad faces is still lacking. Amygdala activations during perception of sad faces have been related to the activation of concomitant autonomic responses (Blair et al., 1999) and may reflect modulation of the vigilance level in response to emotionally valenced stimuli in general (Yang et al., 2002; Winston et al., 2003).
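Several of the studies discussed above (Morris et al., 1996, 1998; Blair et al., 1999) varied expression intensity using morphed continua between a neutral and a fully expressive photograph of the same individual. The sketch below illustrates the underlying idea with a simple pixelwise blend; the published stimuli were produced with landmark-based morphing software, so this is only a conceptual stand-in, and the file names are hypothetical.

```python
import numpy as np
from PIL import Image

def morph_continuum(neutral_path, expressive_path,
                    levels=(0.25, 0.5, 0.75, 1.0)):
    """Pixelwise blends between a neutral and an expressive face.
    Intensity 0.0 reproduces the neutral image, 1.0 the full expression.
    Assumes the two photographs are aligned and equally sized."""
    neutral = np.asarray(Image.open(neutral_path), dtype=float)
    expressive = np.asarray(Image.open(expressive_path), dtype=float)
    frames = []
    for a in levels:
        blend = (1.0 - a) * neutral + a * expressive
        frames.append(Image.fromarray(blend.astype(np.uint8)))
    return frames

# Hypothetical file names; real stimuli use landmark-based morphs rather
# than raw pixel averaging, which blurs misaligned features.
# stimuli = morph_continuum("id01_neutral.png", "id01_sad.png")
```

Parametric designs of this kind are what allow activity in a region to be tested for scaling with expression intensity rather than merely differing from a neutral baseline.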
General conclusion

Convincing results for specialized neural representations of basic emotions have been obtained for fearful and disgusted faces only. Clinical and functional imaging studies have provided converging evidence for a double dissociation of fear and disgust recognition. Whereas the amygdala is primarily involved in recognizing facial expressions of fear (Adolphs et al., 1994; Young et al., 1995; Breiter et al., 1996; Calder et al., 1996; Morris et al., 1996, 1998; Phillips et al., 1997, 1998b, 2004; Sprengelmeyer et al., 1998, 1999; Whalen et al., 1998, 2001; Anderson and Phelps, 2000; Thomas et al., 2001; Wright et al., 2001; Sato et al., 2002; Yang et al., 2002; Fischer et al., 2003), the insular cortex and basal ganglia are implicated in recognizing facial expressions of disgust but not fear (Sprengelmeyer et al., 1996, 1998, 2005; Gray et al., 1997; Phillips et al., 1997, 1998b; Calder et al., 2000; Wang et al., 2003; Wicker et al., 2003; Hennenlotter et al., 2004; Schroeder et al., 2004a). Empirical evidence for a specialized neural representation of anger, surprise, sadness, or happiness is more limited, primarily due to a lack of clinical data that allow drawing causal inferences from selective impairments in recognizing these emotions. In functional imaging research, the detection of emotion-specific neural subsystems requires direct comparisons of signal changes associated with the perception of different emotions, i.e., expressions of one emotion are used as a baseline condition for the target emotion. Since direct comparisons allow tracking the blood oxygen level dependent (BOLD) response across different emotion conditions, it becomes possible to distinguish regions associated with the perception of a specific
emotion from regions more generally involved in facial expression processing. Such inferences are not possible when only neutral facial expressions are used as baseline condition. Unfortunately, studies often fail to provide direct comparisons, making it difficult to evaluate the findings with respect to emotion-specific neural subsystems. Some have argued that in functional imaging studies on emotion recognition, differential habituation effects in emotion processing regions might account for the finding of emotion-specific responses of some regions, particularly in the case of repeated presentations of the same expression as in block design paradigms (Winston et al., 2003; Strauss et al., 2005). Converging evidence from clinical research on emotion recognition in patients with amygdala lesions (e.g., Adolphs et al., 1999) and Huntington’s disease (e.g., Sprengelmeyer et al., 1996), however, clearly backs up the findings derived from functional imaging studies on fear and disgust recognition (e.g., Phillips et al., 1997, 1998b; Sprengelmeyer et al., 1998). Moreover, imaging studies on fear recognition consistently yielded activation of the amygdala despite the fact that considerable habituation effects have been reported for this regions (Breiter et al., 1996; Wright et al., 2001; Fischer et al., 2003). It is further important to note that, in functional imaging research mostly implicit tasks (e.g., gender classification) have been used, whereas in clinical studies subjects are usually instructed to explicitly rate or categorize emotional expressions. Imaging studies on explicit emotion recognition (e.g., Gorno-Tempini et al., 2001; Winston et al., 2003) frequently failed to find dissociable responses to specific emotions within those regions where lesions have been associated with impairments in emotion recognition such as the amygdala (e.g., Adolphs et al., 1999) and insula/basal ganglia (Calder et al., 2000). One reason for this finding may be that frontal regions recruited by explicit recognition tasks may attenuate neural responses in limbic regions such as the amygdala (Critchley et al., 2000; Hariri et al., 2000). Only recently has evidence been obtained that the activation of emotion-specific neural subsystems may be closely linked to characteristic facial features of single expressions such as the eye
region for fearful faces (Whalen et al., 2001; Morris et al., 2002; Adolphs et al., 2005). These findings suggest that not all facial expressive features are equally important in recognizing single emotions. Smith et al. (2005) demonstrated that diagnostic cues for recognizing specific emotions could be identified by comparative analyses of the diagnostic filtering functions of human observers. The use of such ‘‘effective faces’’ as stimuli in functional imaging research may therefore be a promising method to further differentiate the neural subsystems implicated in the recognition of specific emotions.
References

Adams Jr., R.B., Gordon, H.L., Baird, A.A., Ambady, N. and Kleck, R.E. (2003) Effects of gaze on amygdala sensitivity to anger and fear faces. Science, 300: 1536. Adolphs, R. (2002a) Neural systems for recognizing emotion. Curr. Opin. Neurobiol., 12: 169–177. Adolphs, R. (2002b) Recognizing emotion from facial expressions: psychological and neurological mechanisms. Behav. Cognit. Neurosci. Rev., 1: 21–61. Adolphs, R., Damasio, H., Tranel, D., Cooper, G. and Damasio, A.R. (2000) A role for somatosensory cortices in the visual recognition of emotion as revealed by three-dimensional lesion mapping. J. Neurosci., 20: 2683–2690. Adolphs, R., Damasio, H., Tranel, D. and Damasio, A.R. (1996) Cortical systems for the recognition of emotion in facial expressions. J. Neurosci., 16: 7678–7687. Adolphs, R., Gosselin, F., Buchanan, T.W., Tranel, D., Schyns, P. and Damasio, A.R. (2005) A mechanism for impaired fear recognition after amygdala damage. Nature, 433: 68–72. Adolphs, R., Tranel, D., Damasio, H. and Damasio, A. (1994) Impaired recognition of emotion in facial expressions following bilateral damage to the human amygdala. Nature, 372: 669–672. Adolphs, R., Tranel, D., Damasio, H. and Damasio, A.R. (1995) Fear and the human amygdala. J. Neurosci., 15: 5879–5891. Adolphs, R., Tranel, D., Hamann, S., Young, A.W., Calder, A.J., Phelps, E.A., Anderson, A., Lee, G.P. and Damasio, A.R. (1999) Recognition of facial emotion in nine individuals with bilateral amygdala damage. Neuropsychologia, 37: 1111–1117. Anderson, A.K. and Phelps, E.A. (2000) Expression without recognition: Contributions of the human amygdala to emotional communication. Psychol. Sci., 11: 106–111. Anderson, A.K., Spencer, D.D., Fulbright, R.K. and Phelps, E.A. (2000) Contribution of the anteromedial temporal lobes to the evaluation of facial emotion. Neuropsychology, 14: 526–536.
Angrilli, A., Mauri, A., Palomba, D., Flor, H., Birbaumer, N., Sartori, G. and di Paola, F. (1996) Startle reflex and emotion modulation impairment after a right amygdala lesion. Brain, 119(Pt 6): 1991–2000. Aniskiewicz, A.S. (1979) Autonomic components of vicarious conditioning and psychopathy. J. Clin. Psychol., 35: 60–67. Averill, J.R. (1982) Anger and Aggression: An Essay on Emotion. Springer, New York. Bechara, A., Tranel, D., Damasio, H., Adolphs, R., Rockland, C. and Damasio, A.R. (1995) Double dissociation of conditioning and declarative knowledge relative to the amygdala and hippocampus in humans. Science, 269: 1115–1118. Blair, R.J. (2001) Neurocognitive models of aggression, the antisocial personality disorders, and psychopathy. J. Neurol. Neurosurg. Psychiatry, 71: 727–731. Blair, R.J. (2003) Facial expressions, their communicatory functions and neuro-cognitive substrates. Philos. Trans. R. Soc. Lond. B Biol. Sci., 358: 561–572. Blair, R.J. and Cipolotti, L. (2000) Impaired social response reversal. A case of ‘acquired sociopathy’. Brain, 123(Pt 6): 1122–1141. Blair, R.J., Jones, L., Clark, F. and Smith, M. (1997) The psychopathic individual: A lack of responsiveness to distress cues? Psychophysiology, 34: 192–198. Blair, R.J.R., Morris, J.S., Frith, C.D., Perrett, D.I. and Dolan, R.J. (1999) Dissociable neural responses to facial expressions of sadness and anger. Brain, 122: 883–893. Breiter, H.C., Etcoff, N.L., Whalen, P.J., Kennedy, W.A., Rauch, S.L., Buckner, R.L., Strauss, M.M., Hyman, S.E. and Rosen, B.R. (1996) Response and habituation of the human amygdala during visual processing of facial expression. Neuron, 17: 875–887. Broks, P., Young, A.W., Maratos, E.J., Coffey, P.J., Calder, A.J., Isaac, C.L., Mayes, A.R., Hodges, J.R., Montaldi, D., Cezayirli, E., Roberts, N. and Hadley, D. (1998) Face processing impairments after encephalitis: amygdala damage and recognition of fear. Neuropsychologia, 36: 59–70. Brower, M.C. and Price, B.H. (2001) Neuropsychiatry of frontal lobe dysfunction in violent and criminal behaviour: a critical review. J. Neurol. Neurosurg. Psychiatry, 71: 720–726. Calder, A.J., Keane, J., Lawrence, A.D. and Manes, F. (2004) Impaired recognition of anger following damage to the ventral striatum. Brain, 127: 1958–1969. Calder, A.J., Keane, J., Manes, F., Antoun, N. and Young, A.W. (2000) Impaired recognition and experience of disgust following brain injury. Nat. Neurosci., 3: 1077–1078. Calder, A.J., Lawrence, A.D. and Young, A.W. (2001) Neuropsychology of fear and loathing. Nat. Rev. Neurosci., 2: 352–363. Calder, A.J., Young, A.W., Rowland, D., Perrett, D.I., Hodges, J.R. and Etcoff, N.L. (1996) Facial emotion recognition after bilateral amygdala damage. Cogn. Neuropsychol., 13: 669–745. Canteras, N.S., Shammah-Lagnado, S.J., Silva, B.A. and Ricardo, J.A. (1990) Afferent connections of the subthalamic
nucleus: a combined retrograde and anterograde horseradish peroxidase study in the rat. Brain Res., 513: 43–59. Chaplin, T.C., Rice, M.E. and Harris, G.T. (1995) Salient victim suffering and the sexual responses of child molesters. J. Consult. Clin. Psychol., 63: 249–255. Critchley, H., Daly, E., Phillips, M., Brammer, M., Bullmore, E., Williams, S., Van Amelsvoort, T., Robertson, D., David, A. and Murphy, D. (2000) Explicit and implicit neural mechanisms for processing of social information from facial expressions: a functional magnetic resonance imaging study. Hum. Brain Mapp., 9: 93–105. Darwin, C. (1872) The Expression of the Emotions in Man and Animals. John Murray, London. de la Monte, S.M., Vonsattel, J.P. and Richardson Jr., E.P. (1988) Morphometric demonstration of atrophic changes in the cerebral cortex, white matter, and neostriatum in Huntington's disease. J. Neuropathol. Exp. Neurol., 47: 516–525. Dias, R., Robbins, T.W. and Roberts, A.C. (1996) Dissociation in prefrontal cortex of affective and attentional shifts. Nature, 380: 69–72. Dolan, R.J., Fletcher, P., Morris, J., Kapur, N., Deakin, J.F. and Frith, C.D. (1996) Neural activation during covert processing of positive emotional facial expressions. Neuroimage, 4: 194–200. Eisenberg, N., Fabes, R.A., Miller, P.A., Fultz, J., Shell, R., Mathy, R.M. and Reno, R.R. (1989) Relation of sympathy and personal distress to prosocial behavior: a multimethod study. J. Pers. Soc. Psychol., 57: 55–66. Ekman, P. and Friesen, W.V. (1976) Pictures of Facial Affect. Consulting Psychologist Press, Palo Alto, CA. Ekman, P., Friesen, W.V., O'Sullivan, M., Chan, A., Diacoyanni-Tarlatzis, I., Heider, K., Krause, R., LeCompte, W.A., Pitcairn, T., Ricci-Bitti, P.E., et al. (1987) Universals and cultural differences in the judgments of facial expressions of emotion. J. Pers. Soc. Psychol., 53: 712–717. Ekman, P., Sorenson, E.R. and Friesen, W.V. (1969) Pan-cultural elements in facial displays of emotion. Science, 164: 86–88. Esteves, F., Parra, C., Dimberg, U. and Ohman, A. (1994) Nonconscious associative learning: Pavlovian conditioning of skin conductance responses to masked fear-relevant facial stimuli. Psychophysiology, 31: 375–385. Fernandez, G., Weyerts, H., Schrader-Bolsche, M., Tendolkar, I., Smid, H.G., Tempelmann, C., Hinrichs, H., Scheich, H., Elger, C.E., Mangun, G.R. and Heinze, H.J. (1998) Successful verbal encoding into episodic memory engages the posterior hippocampus: a parametrically analyzed functional magnetic resonance imaging study. J. Neurosci., 18: 1841–1847. Ferrari, P.F., Gallese, V., Rizzolatti, G. and Fogassi, L. (2003) Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. Eur. J. Neurosci., 17: 1703–1714. Fischer, H., Wright, C.I., Whalen, P.J., McInerney, S.C., Shin, L.M. and Rauch, S.L. (2003) Brain habituation during repeated exposure to fearful and neutral faces: a functional MRI study. Brain Res. Bull., 59: 387–392.
Frey, S. and Petrides, M. (1999) Re-examination of the human taste region: a positron emission tomography study. Eur. J. Neurosci., 11: 2985–2988. Fridlund, A.J. (1994) Human Facial Expression: An Evolutionary View. Academic Press, San Diego. Gabrieli, J.D., Brewer, J.B., Desmond, J.E. and Glover, G.H. (1997) Separate neural bases of two fundamental memory processes in the human medial temporal lobe. Science, 276: 264–266. Gorno-Tempini, M.L., Pradelli, S., Serafini, M., Pagnoni, G., Baraldi, P., Porro, C., Nicoletti, R., Umita, C. and Nichelli, P. (2001) Explicit and incidental facial expression processing: an fMRI study. Neuroimage, 14: 465–473. Gosselin, F. and Schyns, P.G. (2001) Bubbles: a technique to reveal the use of information in recognition tasks. Vision Res., 41: 2261–2271. Gray, J.M., Young, A.W., Barker, W.A., Curtis, A. and Gibson, D. (1997) Impaired recognition of disgust in Huntington's disease gene carriers. Brain, 120: 2029–2038. Haines, S.M. and Muir, D.W. (1996) Infant sensitivity to adult eye direction. Child Dev., 67: 1940–1951. Hamann, S.B., Stefanacci, L., Squire, L.R., Adolphs, R., Tranel, D., Damasio, H. and Damasio, A. (1996) Recognizing facial emotion. Nature, 379: 497. Hariri, A.R., Bookheimer, S.Y. and Mazziotta, J.C. (2000) Modulating emotional responses: effects of a neocortical network on the limbic system. Neuroreport, 11: 43–48. Harmer, C.J., Perrett, D.I., Cowen, P.J. and Goodwin, G.M. (2001a) Administration of the beta-adrenoceptor blocker propranolol impairs the processing of facial expressions of sadness. Psychopharmacology (Berl.), 154: 383–389. Harmer, C.J., Thilo, K.V., Rothwell, J.C. and Goodwin, G.M. (2001b) Transcranial magnetic stimulation of medial-frontal cortex impairs the processing of angry facial expressions. Nat. Neurosci., 4: 17–18. Haxby, J.V., Hoffman, E.A. and Gobbini, M.I. (2000) The distributed human neural system for face perception. Trends Cogn. Sci., 4: 223–233. Hennenlotter, A., Schroeder, U., Erhard, P., Castrop, F., Haslinger, B., Stoecker, D., Lange, K.W. and Ceballos-Baumann, A.O. (2005) A common neural basis for receptive and expressive communication of pleasant facial affect. Neuroimage, 26: 581–591. Hennenlotter, A., Schroeder, U., Erhard, P., Haslinger, B., Stahl, R., Weindl, A., von Einsiedel, H.G., Lange, K.W. and Ceballos-Baumann, A.O. (2004) Neural correlates associated with impaired disgust processing in pre-symptomatic Huntington's disease. Brain, 127: 1446–1453. Hornak, J., Rolls, E.T. and Wade, D. (1996) Face and voice expression identification in patients with emotional and behavioural changes following ventral frontal lobe damage. Neuropsychologia, 34: 247–261. House, T.H. and Milligan, W.L. (1976) Autonomic responses to modeled distress in prison psychopaths. J. Pers. Soc. Psychol., 34: 556–560. Jernigan, T.L., Salmon, D.P., Butters, N. and Hesselink, J.R. (1991) Cerebral structure on MRI, Part II: specific changes in
Alzheimer's and Huntington's diseases. Biol. Psychiatry, 29: 68–81. Jones, S.S. and Raag, T. (1989) Smile production in older infants: the importance of a social recipient for the facial signal. Child Dev., 60: 811–818. Kesler-West, M.L., Andersen, A.H., Smith, C.D., Avison, M.J., Davis, C.E., Kryscio, R.J. and Blonder, L.X. (2001) Neural substrates of facial emotion processing using fMRI. Brain Res. Cogn. Brain Res., 11: 213–226. Kim, H., Somerville, L.H., Johnstone, T., Alexander, A.L. and Whalen, P.J. (2003) Inverse amygdala and medial prefrontal cortex responses to surprised faces. Neuroreport, 14: 2317–2322. Kim, H., Somerville, L.H., Johnstone, T., Polis, S., Alexander, A.L., Shin, L.M. and Whalen, P.J. (2004) Contextual modulation of amygdala responsivity to surprised faces. J. Cogn. Neurosci., 16: 1730–1745. Knutson, B., Adams, C.M., Fong, G.W. and Hommer, D. (2001) Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J. Neurosci., 21: RC159. Koepp, M.J., Gunn, R.N., Lawrence, A.D., Cunningham, V.J., Dagher, A., Jones, T., Brooks, D.J., Bench, C.J. and Grasby, P.M. (1998) Evidence for striatal dopamine release during a video game. Nature, 393: 266–268. Krolak-Salmon, P., Henaff, M.A., Isnard, J., Tallon-Baudry, C., Guenot, M., Vighetto, A., Bertrand, O. and Mauguiere, F. (2003) An attention modulated response to disgust in human ventral anterior insula. Ann. Neurol., 53: 446–453. LaBar, K.S., Crupain, M.J., Voyvodic, J.T. and McCarthy, G. (2003) Dynamic perception of facial affect and identity in the human brain. Cereb. Cortex, 13: 1023–1033. LaBar, K.S., LeDoux, J.E., Spencer, D.D. and Phelps, E.A. (1995) Impaired fear conditioning following unilateral temporal lobectomy in humans. J. Neurosci., 15: 6846–6855. Lawrence, A.D., Calder, A.J., McGowan, S.W. and Grasby, P.M. (2002) Selective disruption of the recognition of facial expressions of anger. Neuroreport, 13: 881–884. Lawrence, A.D., Evans, A.H. and Lees, A.J. (2003) Compulsive use of dopamine replacement therapy in Parkinson's disease: reward systems gone awry? Lancet Neurol., 2: 595–604. Lucas, J.A., Rippeth, J.D., Uitti, R.J., Shuster, E.A. and Wharen, R.E. (2000) Neuropsychological functioning in a patient with essential tremor with and without bilateral VIM stimulation. Brain Cogn., 42: 253–267. Miczek, K.A., Fish, E.W., De Bold, J.F. and De Almeida, R.M. (2002) Social and neural determinants of aggressive behavior: pharmacotherapeutic targets at serotonin, dopamine and gamma-aminobutyric acid systems. Psychopharmacology (Berl.), 163: 434–458. Milders, M., Crawford, J.R., Lamb, A. and Simpson, S.A. (2003) Differential deficits in expression recognition in gene-carriers and patients with Huntington's disease. Neuropsychologia, 41: 1484–1492. Mineka, S. and Cook, M. (1993) Mechanisms involved in the observational conditioning of fear. J. Exp. Psychol. Gen., 122: 23–38.
Morris, J.S., deBonis, M. and Dolan, R.J. (2002) Human amygdala responses to fearful eyes. Neuroimage, 17: 214–222. Morris, J.S., Friston, K.J., Buchel, C., Frith, C.D., Young, A.W., Calder, A.J. and Dolan, R.J. (1998) A neuromodulatory role for the human amygdala in processing emotional facial expressions. Brain, 121: 47–57. Morris, J.S., Frith, C.D., Perrett, D.I., Rowland, D., Young, A.W., Calder, A.J. and Dolan, R.J. (1996) A differential neural response in the human amygdala to fearful and happy facial expressions. Nature, 383: 812–815. Murphy, F.C., Nimmo-Smith, I. and Lawrence, A.D. (2003) Functional neuroanatomy of emotions: a meta-analysis. Cogn. Affect. Behav. Neurosci., 3: 207–233. Nakamura, K., Kawashima, R., Ito, K., Sugiura, M., Kato, T., Nakamura, A., Hatano, K., Nagumo, S., Kubota, K., Fukuda, H. and Kojima, S. (1999) Activation of the right inferior frontal cortex during assessment of facial emotion. J. Neurophysiol., 82: 1610–1614. Narumoto, J., Yamada, H., Iidaka, T., Sadato, N., Fukui, K., Itoh, H. and Yonekura, Y. (2000) Brain regions involved in verbal or non-verbal aspects of facial emotion recognition. Neuroreport, 11: 2571–2576. Ohman, A., Lundqvist, D. and Esteves, F. (2001) The face in the crowd revisited: a threat advantage with schematic stimuli. J. Pers. Soc. Psychol., 80: 381–396. Parent, A. and Hazrati, L.N. (1995) Functional anatomy of the basal ganglia. II. The place of subthalamic nucleus and external pallidum in basal ganglia circuitry. Brain Res. Brain Res. Rev., 20: 128–154. Patrick, C.J., Bradley, M.M. and Lang, P.J. (1993) Emotion in the criminal psychopath: startle reflex modulation. J. Abnorm. Psychol., 102: 82–92. Pessoa, L., McKenna, M., Gutierrez, E. and Ungerleider, L.G. (2002) Neural processing of emotional faces requires attention. Proc. Natl. Acad. Sci. USA, 99: 11458–11463. Phan, K.L., Wager, T., Taylor, S.F. and Liberzon, I. (2002) Functional neuroanatomy of emotion: a meta-analysis of emotion activation studies in PET and fMRI. Neuroimage, 16: 331–348. Phillips, M.L., Bullmore, E.T., Howard, R., Woodruff, P.W., Wright, I.C., Williams, S.C., Simmons, A., Andrew, C., Brammer, M. and David, A.S. (1998a) Investigation of facial recognition memory and happy and sad facial expression perception: an fMRI study. Psychiatry Res., 83: 127–138. Phillips, M.L., Medford, N., Young, A.W., Williams, L., Williams, S.C., Bullmore, E.T., Gray, J.A. and Brammer, M.J. (2001) Time courses of left and right amygdalar responses to fearful facial expressions. Hum. Brain Mapp., 12: 193–202. Phillips, M.L., Williams, L.M., Heining, M., Herba, C.M., Russell, T., Andrew, C., Bullmore, E.T., Brammer, M.J., Williams, S.C., Morgan, M., Young, A.W. and Gray, J.A. (2004) Differential neural responses to overt and covert presentations of facial expressions of fear and disgust. Neuroimage, 21: 1484–1496. Phillips, M.L., Williams, L., Senior, C., Bullmore, E.T., Brammer, M.J., Andrew, C., Williams, S.C. and David, A.S. (1999) A differential neural response to threatening
and non-threatening negative facial expressions in paranoid and non-paranoid schizophrenics. Psychiatry Res., 92: 11–31. Phillips, M.L., Young, A.W., Scott, S.K., Calder, A.J., Andrew, C., Giampietro, V., Williams, S.C., Bullmore, E.T., Brammer, M. and Gray, J.A. (1998b) Neural responses to facial and vocal expressions of fear and disgust. Proc. R. Soc. Lond. B Biol. Sci., 265: 1809–1817. Phillips, M.L., Young, A.W., Senior, C., Brammer, M., Andrew, C., Calder, A.J., Bullmore, E.T., Perrett, D.I., Rowland, D., Williams, S.C., Gray, J.A. and David, A.S. (1997) A specific neural substrate for perceiving facial expressions of disgust. Nature, 389: 495–498. Posamentier, M.T. and Abdi, H. (2003) Processing faces and facial expressions. Neuropsychol. Rev., 13: 113–143. Rapcsak, S.Z., Comer, J.F. and Rubens, A.B. (1993) Anomia for facial expressions: neuropsychological mechanisms and anatomical correlates. Brain Lang., 45: 233–252. Rapcsak, S.Z., Galper, S.R., Comer, J.F., Reminger, S.L., Nielsen, L., Kaszniak, A.W., Verfaellie, M., Laguna, J.F., Labiner, D.M. and Cohen, R.A. (2000) Fear recognition deficits after focal brain damage: a cautionary note. Neurology, 54: 575–581. Redolat, R., Brain, P.F. and Simon, V.M. (1991) Sulpiride has an antiaggressive effect in mice without markedly depressing motor activity. Neuropharmacology, 30: 41–46. Reinders, A.A., den Boer, J.A. and Buchel, C. (2005) The robustness of perception. Eur. J. Neurosci., 22: 524–530. Rolls, E.T. (1996) The orbitofrontal cortex. Philos. Trans. R. Soc. Lond. B Biol. Sci., 351: 1433–1443; discussion 1443–1444. Rozin, P. and Fallon, A.E. (1987) A perspective on disgust. Psychol. Rev., 94: 23–41. Sato, W., Kubota, Y., Okada, T., Murai, T., Yoshikawa, S. and Sengoku, A. (2002) Seeing happy emotion in fearful and angry faces: qualitative analysis of facial expression recognition in a bilateral amygdala-damaged patient. Cortex, 38: 727–742. Schmidt, K.L. and Cohn, J.F. (2001) Human facial expressions as adaptations: Evolutionary questions in facial expression research. Am. J. Phys. Anthropol., Suppl., 33: 3–24. Schmolck, H. and Squire, L.R. (2001) Impaired perception of facial emotions following bilateral damage to the anterior temporal lobe. Neuropsychology, 15: 30–38. Schroeder, U., Hennenlotter, A., Erhard, P., Haslinger, B., Stahl, R., Lange, K.W. and Ceballos-Baumann, A.O. (2004a) Functional neuroanatomy of perceiving surprised faces. Hum. Brain Mapp., 23: 181–187. Schroeder, U., Kuehler, A., Hennenlotter, A., Haslinger, B., Tronnier, V.M., Krause, M., Pfister, R., Sprengelmeyer, R., Lange, K.W. and Ceballos-Baumann, A.O. (2004b) Facial expression recognition and subthalamic nucleus stimulation. J. Neurol. Neurosurg. Psychiatry, 75: 648–650. Schutzwohl, A. (1998) Surprise and schema strength. J. Exp. Psychol. Learn. Mem. Cogn., 24: 1182–1199. Schyns, P.G., Bonnar, L. and Gosselin, F. (2002) Show me the features! Understanding recognition from the use of visual information. Psychol. Sci., 13: 402–409.
Small, D.M., Gregory, M.D., Mak, Y.E., Gitelman, D., Mesulam, M.M. and Parrish, T. (2003) Dissociation of neural representation of intensity and affective valuation in human gustation. Neuron, 39: 701–711. Small, D.M., Zald, D.H., Jones-Gotman, M., Zatorre, R.J., Pardo, J.V., Frey, S. and Petrides, M. (1999) Human cortical gustatory areas: a review of functional neuroimaging data. Neuroreport, 10: 7–14. Smith, M.L., Cottrell, G.W., Gosselin, F. and Schyns, P.G. (2005) Transmitting and decoding facial expressions. Psychol. Sci., 16: 184–189. Somerville, L.H., Kim, H., Johnstone, T., Alexander, A.L. and Whalen, P.J. (2004) Human amygdala responses during presentation of happy and neutral faces: correlations with state anxiety. Biol. Psychiatry, 55: 897–903. Sprengelmeyer, R., Rausch, M., Eysel, U.T. and Przuntek, H. (1998) Neural structures associated with recognition of facial expressions of basic emotions. Proc. R. Soc. Lond. B Biol. Sci., 265: 1927–1931. Sprengelmeyer, R., Schroeder, U., Young, A.W. and Epplen, J.T. (2006) Disgust in pre-clinical Huntington’s disease: a longitudinal study. Neuropsychologia, 44(4): 518–533. Sprengelmeyer, R., Young, A.W., Calder, A.J., Karnat, A., Lange, H., Homberg, V., Perrett, D.I. and Rowland, D. (1996) Loss of disgust. Perception of faces and emotions in Huntington’s disease. Brain, 119: 1647–1665. Sprengelmeyer, R., Young, A.W., Schroeder, U., Grossenbacher, P.G., Federlein, J., Buttner, T. and Przuntek, H. (1999) Knowing no fear. Proc. R. Soc. Lond. B Biol. Sci., 266: 2451–2456. Stern, C.E., Corkin, S., Gonzalez, R.G., Guimaraes, A.R., Baker, J.R., Jennings, P.J., Carr, C.A., Sugiura, R.M., Vedantham, V. and Rosen, B.R. (1996) The hippocampal formation participates in novel picture encoding: evidence from functional magnetic resonance imaging. Proc. Natl. Acad. Sci. USA, 93: 8660–8665. Strauss, M.M., Makris, N., Aharon, I., Vangel, M.G., Goodman, J., Kennedy, D.N., Gasic, G.P. and Breiter, H.C. (2005) fMRI of sensitization to angry faces. Neuroimage, 26: 389–413. Thieben, M.J., Duggins, A.J., Good, C.D., Gomes, L., Mahant, N., Richards, F., McCusker, E. and Frackowiak, R.S. (2002) The distribution of structural neuropathology in pre-clinical Huntington’s disease. Brain, 125: 1815–1828. Thomas, K.M., Drevets, W.C., Whalen, P.J., Eccard, C.H., Dahl, R.E., Ryan, N.D. and Casey, B.J. (2001) Amygdala response to facial expressions in children and adults. Biol. Psychiatry, 49: 309–316. Tiihonen, J., Kuikka, J., Bergstrom, K., Hakola, P., Karhu, J., Ryynanen, O.P. and Fohr, J. (1995) Altered striatal dopamine re-uptake site densities in habitually violent and nonviolent alcoholics. Nat. Med., 1: 654–657. Tomkins, S.S. and McCarter, R. (1964) What and where are the primary affects? Some evidence for a theory. Percept. Mot. Skills, 18: 119–158. Vuilleumier, P., Armony, J.L., Driver, J. and Dolan, R.J. (2001) Effects of attention and emotion on face processing in the
human brain: an event-related fMRI study. Neuron, 30: 829–841. Vuilleumier, P., Armony, J.L., Driver, J. and Dolan, R.J. (2003) Distinct spatial frequency sensitivities for processing faces and emotional expressions. Nat. Neurosci., 6: 624–631. Wang, K., Hoosain, R., Yang, R.M., Meng, Y. and Wang, C.Q. (2003) Impairment of recognition of disgust in Chinese with Huntington's or Wilson's disease. Neuropsychologia, 41: 527–537. Whalen, P.J., Kagan, J., Cook, R.G., Davis, F.C., Kim, H., Polis, S., McLaren, D.G., Somerville, L.H., McLean, A.A., Maxwell, J.S. and Johnstone, T. (2004) Human amygdala responsivity to masked fearful eye whites. Science, 306: 2061. Whalen, P.J., Kapp, B.S. and Pascoe, J.P. (1994) Neuronal activity within the nucleus basalis and conditioned neocortical electroencephalographic activation. J. Neurosci., 14: 1623–1633. Whalen, P.J., Rauch, S.L., Etcoff, N.L., McInerney, S.C., Lee, M.B. and Jenike, M.A. (1998) Masked presentations of emotional facial expressions modulate amygdala activity without explicit knowledge. J. Neurosci., 18: 411–418.
Whalen, P.J., Shin, L.M., McInerney, S.C., Fischer, H., Wright, C.I. and Rauch, S.L. (2001) A functional MRI study of human amygdala responses to facial expressions of fear versus anger. Emotion, 1: 70–83. Wicker, B., Keysers, C., Plailly, J., Royet, J.P., Gallese, V. and Rizzolatti, G. (2003) Both of us disgusted in My insula: the common neural basis of seeing and feeling disgust. Neuron, 40: 655–664. Winston, J.S., O'Doherty, J. and Dolan, R.J. (2003) Common and distinct neural responses during direct and incidental processing of multiple facial emotions. Neuroimage, 20: 84–97. Wright, C.I., Fischer, H., Whalen, P.J., McInerney, S.C., Shin, L.M. and Rauch, S.L. (2001) Differential prefrontal cortex and amygdala habituation to repeatedly presented emotional stimuli. Neuroreport, 12: 379–383. Yang, T.T., Menon, V., Eliez, S., Blasey, C., White, C.D., Reid, A.J., Gotlib, I.H. and Reiss, A.L. (2002) Amygdalar activation associated with positive and negative facial expressions. Neuroreport, 13: 1737–1741. Young, A.W., Aggleton, J.P., Hellawell, D.J., Johnson, M., Broks, P. and Hanley, J.R. (1995) Face processing impairments after amygdalotomy. Brain, 118(Pt 1): 15–24. Young, A.W., Hellawell, D.J., Van De Wal, C. and Johnson, M. (1996) Facial expression processing after amygdalotomy. Neuropsychologia, 34: 31–39.
Anders, Ende, Junghöfer, Kissler & Wildgruber (Eds.) Progress in Brain Research, Vol. 156 ISSN 0079-6123 Copyright © 2006 Elsevier B.V. All rights reserved
CHAPTER 25
Integration of emotion and cognition in patients with psychopathy
Monika Sommer1, Göran Hajak1, Katrin Döhnel1, Johannes Schwerdtner1, Jörg Meinhardt2 and Jürgen L. Müller1
1 Department of Psychiatry, Psychotherapy and Psychosomatics, University of Regensburg, Universitätsstraße 84, D-93053 Regensburg, Germany
2 Ludwig-Maximilian University, Munich, Germany

Corresponding author. Tel.: +49-941-9412050; Fax: +49-941-9412065; E-mail: [email protected]

DOI: 10.1016/S0079-6123(06)56025-X
Abstract: Psychopathy is a personality disorder associated with emotional characteristics like impulsivity, manipulativeness, affective shallowness, and absence of remorse or empathy. The impaired emotional responsiveness is considered to be the hallmark of the disorder. Two theories attempt to explain the emotional dysfunction and the poor socialization in psychopathy: (1) the low-fear model and (2) the inhibition of violence model. Both approaches are supported by several studies. Studies using aversive conditioning or startle modulation underline the severe difficulties psychopaths have in processing negative stimuli. Studies that explore the processing of emotional expressions show that psychopathic individuals have a deficit in processing sad or fearful facial expressions or vocal affect. In the cognitive domain, psychopaths show performance deficits in the interpretation of the motivational significance of stimuli. Studies investigating the impact of emotions on cognitive processes show that in psychopaths, in contrast to healthy controls, negative emotions drain no resources from a cognitive task. It is suggested that dysfunctions in the frontal cortex, especially the orbitofrontal cortex, the cingulate cortex, and the amygdala are associated with the emotional and cognitive impairments.

Keywords: psychopathy; emotional impairment; emotion–cognition interaction

Concepts of psychopathy

The concept of psychopathy was first introduced by Philippe Pinel approximately 200 years ago. Pinel (1809) characterized the "manie sans délire" by emotional instability and social drift (Sass and Herpertz, 1995). More recent conceptualizations are linked to the work of Cleckley (1941) and his book "The Mask of Sanity." In addition to antisocial behavior, Cleckley provided extensive clinical descriptions of specific characterological features of psychopathic patients, e.g., superficial charm, unreliability, inability to accept blame or shame, lack of emotions, egocentricity, failure to learn from experiences, and inability to follow goals. Cleckley's definition of psychopathy was included in the DSM-II. Subsequent revisions of the DSM changed primarily personality-based descriptions of the disorder to primarily behavioral-based descriptions such as antisocial personality disorder, conduct disorder, and oppositional defiant disorder. This conceptual change was associated with the expectation that behavioral characteristics were more reliably assessed than were personality traits (Cloninger, 1978). Lilienfeld (1994) criticized that behavioral classifications alone are too narrow and emphasized the importance of a personality component to the
assessment of psychopathy. Moreover, these conceptualizations ignore the emotional characteristics of individuals with psychopathy. The current state-of-the-art diagnostic instrument for the identification of adult psychopathy is Hare's Psychopathy Checklist-Revised (PCL-R; Hare et al., 2000). Hare's work on psychopathy with adult offenders has shown that two correlated factors are important for the description of psychopathy: Factor 1 includes emotional characteristics like affective shallowness, superficial charm, and absence of remorse or empathy; Factor 2 describes juvenile and adult delinquency and aggression. These two factors contain both personality components and concrete behavioral aspects.

Psychopathy and emotion

Impaired emotional responsiveness is considered to be the hallmark of psychopathy (Cleckley, 1976; Hare, 1991). A growing body of research has focused on the nature and bases of the affective disturbances in psychopaths. It is suggested that the emotional deficits associated with psychopathy interfere with the development of moral reasoning and put the individual at risk for developing high levels of antisocial behavior. Currently, two models attempt to explain the emotional dysfunction and the poor socialization of psychopaths: the low-fear model (e.g., Fowles, 1980; Patrick, 1994) and the empathy-dysfunction or violence inhibition mechanism (VIM) model (e.g., Blair, 1995, 2001).

The low-fear model of psychopathy

The low-fear model suggests that the poor socialization of psychopaths is the result of their reduced ability to experience fear. As a result, they show a lack of anticipation of aversive events and a reduced ability to adjust their behavior in response to the negative consequences their behavior has led to in the past (Birbaumer et al., 2005). Empirical support for the low-fear model is drawn from experiments studying fear conditioning and from startle experiments. Using Pavlovian
conditioning, most studies have revealed weaker aversive conditioning in psychopaths than in healthy controls (Hare and Quinn, 1971). Lykken (1957) reported reduced anxiety levels based on questionnaires and weaker electrodermal responses to conditioned stimuli (CSs) that had previously been associated with shocks. Most, but not all, subsequent studies found a lack of skin conductance responding to aversive stimuli (for a review see Hare, 1998). Results regarding cardiovascular responses have varied. Some studies reported deficient phasic heart rate reactivity to aversive stimuli; other studies found increased anticipatory responses, a result that might indicate successful anticipatory coping with the stimulus (Hare et al., 1978; Hare, 1982). Studies using event-related potentials in response to or in anticipation of aversive stimuli show very inconsistent results. Whereas some studies found no difference in components like the N100, P200, or P300 between psychopaths and healthy controls (for a review see Raine, 1989), other studies found smaller P300 responses in psychopaths (Kiehl et al., 1999). Flor et al. (2002) studied aversive Pavlovian conditioning in noncriminal psychopaths and healthy controls. An unpleasant odor was used as the unconditioned stimulus (US) and neutral faces as conditioned stimuli (CSs). Event-related potentials to the CSs were used as indices. Whereas the healthy controls showed significant differentiation between CS+ and CS−, the psychopaths failed to exhibit a conditioned response, while unconditioned responses were comparable between the two groups. The N100, P200, and P300 to the CSs revealed that psychopaths were not deficient in information processing and even showed better anticipatory responding than the healthy controls, as indicated by the terminal contingent negative variation (tCNV). In an fMRI paradigm, this research group studied the brain circuits underlying aversive conditioning. They used neutral faces as CSs and painful pressure as the US and found that psychopaths, in contrast to healthy controls, showed no activation in the limbic-prefrontal circuit including the amygdala, orbitofrontal cortex, insula, and anterior cingulate. Additionally, they failed to show conditioned skin conductance responses and emotional valence ratings, although contingency and arousal ratings did not differ (Birbaumer et al., 2005). The
authors suggest that a hypoactive frontolimbic circuit may underlie the deficient responses (Flor et al., 2002; Veit et al., 2002; Birbaumer et al., 2005). In contrast to the results of Birbaumer et al. (2005), Schneider et al. (2000) found, in an aversive conditioning paradigm using neutral faces as CSs and an unpleasant odor as the US, that patients with antisocial personality disorder showed more activation in the amygdala and the dorsolateral prefrontal cortex (PFC) than healthy controls. They found no differences between the two groups on the behavioral level. Schneider et al. (2000) suggest that this increase results from an additional effort to form negative emotional associations. In summary, the results indicate that psychopaths are able to detect the association between CS and US, but it seems that they do not process the emotional significance of the association. Another way of studying the processing of aversive affective states is the modulation of the startle response, especially the increase in the reflexive startle response during presentation of an aversive stimulus (Lang et al., 1990, 1998). Patrick et al. (1993) reported that whereas autonomic and self-reported responses to affective pictures did not differ between psychopaths and healthy controls, psychopaths showed no startle potentiation to aversive pictures. Levenston et al. (2000) studied psychopathic and nonpsychopathic prisoners and showed that for psychopaths startle was inhibited during victim scenes and only weakly potentiated during threat. The authors interpreted the absence of normal startle potentiation as a weakness in the capacity of aversive cues to prime defensive actions. In sum, the aversive conditioning and startle experiments underline the severe difficulties in processing negative stimuli in individuals with psychopathy. The neural correlates underlying these impairments, especially the role of limbic areas and the PFC, remain unclear.
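To make the two indices discussed above concrete, the following is a minimal sketch of how CS+/CS− differentiation and startle potentiation are typically quantified, assuming per-trial skin conductance responses and startle-blink magnitudes have already been scored; the data and names are invented for illustration and do not reproduce any study's pipeline.

```python
import numpy as np

def differential_conditioning(scr_cs_plus, scr_cs_minus):
    """CS+/CS- differentiation: mean skin conductance response to the
    reinforced CS minus the response to the unreinforced CS.
    Healthy controls typically show a positive difference; the
    psychopathic groups discussed above show values near zero."""
    return np.mean(scr_cs_plus) - np.mean(scr_cs_minus)

def startle_potentiation(blink_aversive, blink_neutral):
    """Startle potentiation: standardized blink magnitude during
    aversive foregrounds minus blink magnitude during neutral ones."""
    all_blinks = np.concatenate([blink_aversive, blink_neutral])
    z = lambda x: (np.asarray(x) - all_blinks.mean()) / all_blinks.std()
    return z(blink_aversive).mean() - z(blink_neutral).mean()

# Invented data for one control subject (arbitrary units).
print(differential_conditioning([0.42, 0.35, 0.50], [0.12, 0.08, 0.15]))
print(startle_potentiation([110, 130, 125], [95, 100, 90]))
```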
The VIM of psychopathy

The VIM suggests that there is a system that preferentially responds to sad and fearful stimuli
(Blair et al., 2001a, 2002). The model postulates that the functional integrity of this system is crucial for moral socialization. Negative emotions of other people are aversive for healthy subjects; therefore, they learn to avoid initiating behavior that results in sadness or fear in others. The VIM model predicts that psychopaths should show particular difficulty when processing sad and fearful expressions (Blair et al., 2002). Blair (2003) emphasizes that facial expressions have a communicatory function: they rapidly transmit information about the valence of objects or situations. The author suggests that expressions of fearfulness, sadness, and happiness are reinforcers that modulate the probability of future behavior. Blair (1995, 2003) suggests that for psychopaths sad and fearful expressions are not aversive unconditioned stimuli. Therefore, psychopaths do not learn to avoid committing behavior that causes harm to others. Empirical support for this position is drawn from studies with children having psychopathic tendencies and with psychopathic adults showing reduced autonomic arousal to sad but not angry facial expressions (Blair, 1999, 2001, 2003; Stevens et al., 2001; Blair et al., 2001a; McMurran et al., 2002). In a study investigating the ability of adult psychopaths to process vocal affect, Blair et al. (2002) showed that psychopathic inmates were particularly impaired in the recognition of fearful vocal affect. According to Blair (2003), expressions that serve as positive or negative reinforcers preferentially activate the amygdala. He argues that studies showing a reduced amygdaloid volume in psychopathic individuals relative to comparison individuals (Tiihonen et al., 2000), reduced amygdala activation during an emotional memory task (Kiehl et al., 2001), and aversive conditioning tasks (Veit et al., 2002) support his assumptions. Some studies examined amygdala activation in psychopaths (Schneider et al., 2000; Veit et al., 2002; Müller et al., 2003) and found inconsistent results. Whereas Veit et al. (2002) reported reduced activation of the amygdala, the orbitofrontal cortex, the anterior cingulate, and the insula in psychopaths, Schneider et al. (2000), studying patients with antisocial personality disorder, found enhanced amygdala and PFC activation. In an emotion induction paradigm, Müller et al. (2003)
found increased activation of the amygdala and prefrontal regions during the processing of negative affective pictures, whereas pleasant affective pictures induced more activation in the orbitofrontal cortex. In sum, psychopaths processing sad or fearful social stimuli, like facial expressions or vocal affect, make more errors and show different cortical activation patterns than nonpsychopaths. Nevertheless, further investigation is needed to determine the neural correlates of this dysfunction, particularly the role of the orbitofrontal cortex (OFC) and the amygdala. However, we know little about the processing of angry faces in psychopaths. Like sad and fearful expressions, angry faces have a strong communicatory function. They signal discontent with another person's current behavior or with a given situation (Lawrence et al., 2002). Therefore, they may support behavioral extinction and reversal learning (Rolls, 1996). Similar to the processing of sad and fearful expressions, the processing of angry expressions is closely related to the amygdala (Adolphs et al., 1999). Therefore, impairment of the amygdala should influence the processing of anger in psychopathic patients.
Cognition and psychopathy

Although several studies report that there are no differences in neuropsychological functions between psychopaths and nonpsychopaths (Sutker and Allain, 1987; Hart et al., 1990), a possible impairment of the orbitofrontal cortex (OFC) in psychopathy is discussed. LaPierre et al. (1995) found that, in comparison with nonpsychopathic inmates, psychopathic inmates were impaired on all neuropsychological tests considered sensitive to orbitofrontal and ventromedial dysfunctions, including a visual go/no-go task, Porteus Maze Q-scores (i.e., rule-breaking errors), and an odor identification task. LaPierre et al. (1995) observed no performance deficits of psychopathic patients on measures sensitive to dorsolateral prefrontal and posterior rolandic function (i.e., the Wisconsin card sorting test). Considering that the OFC is involved in altering previously acquired stimulus–reward
associations when they become inappropriate (Rolls, 2000), Newman et al. (1987) found that psychopaths show impaired extinction of previously rewarded responses in a single-pack card-playing task. Blair et al. (2001b) found that the performance of boys with psychopathic tendencies on the four-pack gambling task developed by Bechara et al. (1999) was impaired relative to a control group. Two other studies (Schmitt et al., 1999; Blair and Cipolotti, 2000) found no differences between psychopaths and healthy controls in performance of the gambling task. Mitchell et al. (2002) investigated the performance of psychopaths on two neuropsychological tasks believed to be sensitive to dorsolateral prefrontal and orbitofrontal cortex functioning: the intradimensional/extradimensional (ID/ED) shift task and the gambling task. They found that psychopaths were less likely to avoid making risky selections over the course of the gambling task relative to comparison individuals. On the ID/ED task, the performance of psychopaths did not differ from controls on attentional set-shifting, but they showed significant impairments on response reversal. The authors suggest that this performance impairment may be representative of a dysfunction within a neural circuit including the amygdala and the OFC. The amygdala is involved in the formation of stimulus–reinforcement associations (LeDoux, 1998). The OFC is involved in encoding the motivational significance of cues and the incentive value of expected outcomes and is particularly important for appropriate responding when reinforcement contingencies change (Schoenbaum et al., 1998). Together, the amygdala and OFC are considered to play a key role in encoding and implementing associative information about the motivational significance of stimuli (Schoenbaum et al., 2000). There are, however, limitations to considering the OFC as the key cortical structure in psychopathy. Blair et al. (2001a) found differences in OFC impairment between adults and boys with psychopathic tendencies. Considering the evidence for interdependence and functional connectivity of the OFC and the amygdala, Mitchell et al. (2002) argue that a primary deficit within the amygdala could give rise to deficits associated with OFC impairment and that the differences in OFC
impairment between adults and boys may be developmental consequences of the disorder. Thus, it is unclear whether the cognitive deficits of psychopathic patients are independent of the emotional impairment or whether they are secondary consequences of it.
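As an illustration of how performance on such four-deck gambling tasks is commonly scored, the sketch below computes a net score (advantageous minus disadvantageous selections) per block of trials; the deck labels follow the usual A-D convention of Bechara-style tasks, and the selection sequence is invented.

```python
def net_scores(selections, block_size=20):
    """Net score per block: (C + D picks) - (A + B picks).
    Rising scores indicate learning to avoid the risky decks (A, B);
    the psychopathic groups described above tend to stay flat."""
    scores = []
    for i in range(0, len(selections), block_size):
        block = selections[i:i + block_size]
        good = sum(card in "CD" for card in block)
        bad = sum(card in "AB" for card in block)
        scores.append(good - bad)
    return scores

# Invented selection sequence for one subject (100 trials).
example = "AABB" * 5 + "ABCD" * 5 + "ACCD" * 5 + "CCDD" * 5 + "CDCD" * 5
print(net_scores(example))  # [-20, 0, 10, 20, 20]: gradual learning
```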
The influence of emotions on cognitive processes

Cognitive processes, like memory, attention, and the inhibition of prepared responses, and emotions both function as control systems influencing and regulating behavior (Carver and Schreier, 1990; Kosslyn and Koenig, 1992; Braver and Cohen, 2000). But emotion and cognition are not two separately working information-processing domains; both processes are closely intertwined. Cognitive processes are able to regulate emotions (Ochsner and Gross, 2005) and, conversely, emotional and motivational factors can significantly affect cognitive performance (Simpson et al., 2001b; Davidson, 2002; Davidson et al., 2003). In his model of the emotion–cognition interaction, Gray (2001) postulates that approach and withdrawal emotional states can enhance or impair cognitive performance depending on the particular emotion and cognitive process involved, and that different emotional states can have opposite effects. Heller and Nitschke (1998) found that positively valenced emotion tends to facilitate performance on tasks that depend more on the left PFC, whereas negatively valenced emotion tends to facilitate performance on tasks dependent on the right PFC. According to Tomarken and Keener's (1998) hypothesis, approach- and withdrawal-related emotions bias the ability of the PFC to organize behavior. Cognitive resource models predict that all emotional states consume resources and that these resources are no longer available for controlled cognition. Ellis and Ashbrook (1988) introduced an approach for explaining the effects of primary negative emotions (e.g., depressive states) on task performance. On the basis of the concept of limited processing capacity, their resource allocation model (RAM) assumes that emotions, especially
the cognitive consequences of emotions (e.g., affect-evoked intrusive thoughts), may increase the information-processing load and drain attentional resources that otherwise might be devoted to task performance. Results from many behavioral studies support this hypothesis (e.g., Forgas, 1995). Using a dual-task paradigm and the P300 as a direct index of resources, Meinhardt and Pekrun (2003) examined the model's assumptions with event-related EEG potentials (ERPs) and showed that not only negative but also positive emotions compete with task-related processing for resources. These results suggest that the RAM of Ellis and Ashbrook (1988) can be expanded to pleasant and appetitive experiences as well. In recent years, a growing number of studies have tried to characterize the neural basis of the interrelationship between cognition and emotion. For healthy subjects, studies with positron emission tomography (PET) or functional magnetic resonance imaging (fMRI) show a dynamic interplay between cognition and emotion (Mayberg et al., 1999; Bush et al., 2000; Yamasaki et al., 2002). Drevets and Raichle (1998) describe a reciprocal association between emotional and cognitive brain areas. In their review of brain mapping studies, they note that in areas involved in emotional processing, such as the amygdala, the orbitofrontal cortex, and the ventral anterior cingulate cortex (ACC), blood flow increases during emotion-related tasks but decreases during performance of attention-demanding cognitive tasks. Conversely, blood flow increases during cognitive tasks in the dorsal anterior cingulate and the dorsolateral prefrontal cortices, and decreases in these areas during experimentally induced or pathological emotional states. The authors assume that such reciprocal patterns of regional cerebral blood flow may reflect an important cross-modal interaction during mental operations. Two studies demonstrated that the degree of activity reduction in emotion-processing brain areas depends on the combined effect of the attentional demands of the task and accompanying performance anxiety (Simpson et al., 2001a, b). Another way to evaluate the effects of emotion on cognitive processes is to examine how affective stimuli (e.g., words, pictures) modulate the activity
in brain regions known to participate in certain cognitive functions (Whalen et al., 1998; Simpson et al., 2000; Compton et al., 2003). Using fMRI, Whalen et al. (1998) found that an emotional Stroop task activates a ventral subregion of the ACC, whereas a nonemotional counting Stroop task engages a more dorsal region of the ACC. Compton et al. (2003) found that dorsolateral frontal lobe activity was increased by negative and incongruent color words, indicating that these regions are involved in sustaining selective attention in the presence of salient distractors. Activity in the left lateral OFC increased for negative as compared with neutral emotional words. Ignoring negative emotional words increased bilateral occipitotemporal activity and decreased amygdala activity. The authors suggest that emotion and attention are related via a network of regions that monitor for salient information, maintain attention on the task, suppress irrelevant information, and select appropriate responses. Most experiments on the interrelationship between emotion and cognition have used emotional states (e.g., test anxiety) or pathological traits (e.g., depression) to study the impact of emotions on cognitive processes, or they have studied the cognitive processing of emotional stimuli (e.g., the emotional Stroop task). Only a few studies have evaluated the effects of induced emotions on cognitive processes. Gray et al. (2002) examined the conjoint effects of cognitive task and emotional state manipulations on the lateral PFC. The authors suggest that the integration of emotion and cognition allows goal-directed control of behavior dependent on the emotional context, and that this complex function is mediated by the lateral PFC. They used fMRI to assess brain activity during the performance of two demanding cognitive tasks (a three-back memory task with verbal stimuli or affective faces) after participants had watched an emotional video. An emotion–cognition interaction was found in bilateral PFC and was interpreted as evidence that emotion and cognition conjointly and equally contribute to the control of thought and behavior. Erk et al. (2003) investigated the impact of an emotional context on subsequent recall of emotionally neutral material. They found that in a
positive encoding context, recall was predicted by activation of right anterior parahippocampal and extrastriate visual brain areas, whereas in a negative encoding context recall was predicted by activation of the amygdala. In summary, studies investigating the impact of emotions on cognitive processes, while using different paradigms, show that there is a dynamic exchange between cognitive task performance and emotional states. Areas of the PFC, the cingulate cortex, and the limbic cortex play an especially important role. It seems that the degree to which both domains recruit similar or different brain regions depends on the degree to which they involve similar or different processing components (Compton et al., 2003).
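As a behavioral illustration of how such emotion-attention effects are quantified, the sketch below derives an emotional Stroop interference score: slower color naming for negative than for neutral words suggests that emotional content drains attentional resources. The reaction times are invented and the function name is illustrative.

```python
import statistics

def interference(rt_by_condition):
    """Emotional Stroop interference: median color-naming RT for
    negative words minus median RT for neutral words (ms).
    Positive values indicate attentional capture by emotional content."""
    return (statistics.median(rt_by_condition["negative"])
            - statistics.median(rt_by_condition["neutral"]))

# Invented reaction times (ms) for one participant.
rts = {"neutral": [612, 598, 634, 605, 621],
       "negative": [655, 668, 649, 672, 660]}
print(interference(rts))  # 48 ms here: negative words slow naming
```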
The interaction of emotion and cognition in psychopathy

Given the empirical evidence that emotion processing is critically impaired in psychopathy and that emotional and cognitive processes are closely intertwined, it can be assumed that the emotional impairment of psychopaths also influences their cognitive processing. Williamson et al. (1991) studied psychopaths during performance of an emotional lexical decision task. In contrast to healthy controls, psychopaths showed no difference in reaction time and no P200 differentiation between emotional and neutral words. However, they showed better accuracy for emotional than for neutral words. Lorenz and Newman (2002) investigated the influence of emotional cues on a lexical decision task and showed that low-anxious psychopaths did not differ from controls in their appraisal of emotional cues and lexical decisions. However, psychopaths were impaired in using the emotional information. Intrator et al. (1997) used single photon emission computed tomography (SPECT) and found that psychopaths showed greater bilateral activation in temporofrontal areas for affective than for neutral words. They suggest that the activation increase is a functional correlate of the extra effort required to solve the task.
Gordon et al. (2004) studied a nonpsychiatric population using the Psychopathic Personality Inventory (PPI; Lilienfeld and Andrews, 1996) to create subgroups with high and low trait measures of psychopathy. Participants performed a recognition task that required attention to be given to either the affect or the identity of face stimuli. No significant behavioral differences were found, but in the affect recognition task, participants with low scores showed greater activation in the right inferior frontal cortex, right amygdala, and medial PFC. Participants with high scores on trait psychopathy showed significantly greater activation in visual cortex and right dorsolateral PFC. No differences were found between the two groups in the identity recognition task. Gordon et al. (2004) suggest that individuals with high trait measures of psychopathy show a distinct neural signature associated with different task-processing strategies. All these studies used emotional stimuli to evaluate the affective influence on cognitive processes. But how does an affective context influence the processing of a cognitive task in psychopathy? Müller et al. (submitted) studied the impact of pleasant and unpleasant emotions on interference processing in psychopathic patients and healthy controls. They used fMRI to assess brain activity during the performance of a Simon task after emotion induction with positive, negative, and neutral pictures from the International Affective Picture System (IAPS; CSEA-NIMH, 1995) (Sommer et al., 2004). Whereas control subjects made more errors in the negative emotional context than in the positive or neutral context, psychopaths showed no influence of the induced emotion on error rates, although they did not differ from healthy controls in their valence ratings. For healthy controls, the activation data show a network of areas sensitive to the Emotion × Task interaction, including the superior and inferior frontal gyrus, anterior cingulate, putamen, and thalamus. Negative emotions in particular led to less activation during incompatible trials compared with compatible trials. For psychopathic patients, no interaction effect was found. Whereas for healthy controls the negative emotions increase the information-processing load and drain resources that otherwise might be devoted to the Simon task, in psychopathic patients
negative emotions consume no resources. These results suggest that for psychopaths, negative emotions do not drain attention and possibly have less informational content. Therefore, their regulatory influence on behavior is lower than in nonpsychopaths.
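The following sketch illustrates the Simon-task logic behind such an Emotion × Task design (invented data; not the authors' code): the interference effect is the difference in error rate between incompatible trials, where the stimulus side conflicts with the required response side, and compatible trials, computed separately for each emotion-induction condition.

```python
def simon_error_effects(trials):
    """Error-rate interference (incompatible - compatible) per
    emotion-induction condition. Each trial is a dict with keys
    'emotion' ('negative'/'neutral'/'positive'), 'compatible' (bool)
    and 'error' (bool). Assumes every condition contains both
    compatible and incompatible trials."""
    counts = {}
    for t in trials:
        key = (t["emotion"], t["compatible"])
        n, errs = counts.get(key, (0, 0))
        counts[key] = (n + 1, errs + t["error"])
    effects = {}
    for emotion in {e for e, _ in counts}:
        n_inc, e_inc = counts[(emotion, False)]
        n_com, e_com = counts[(emotion, True)]
        effects[emotion] = e_inc / n_inc - e_com / n_com
    return effects

# Invented example: two trials per cell of one condition.
trials = [
    {"emotion": "negative", "compatible": False, "error": True},
    {"emotion": "negative", "compatible": False, "error": False},
    {"emotion": "negative", "compatible": True, "error": False},
    {"emotion": "negative", "compatible": True, "error": False},
]
print(simon_error_effects(trials))  # {'negative': 0.5}
```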
Conclusion

Impaired emotional processing is a key factor in psychopathy. Psychopathic individuals show severe difficulties in aversive conditioning (Schneider et al., 2000; Flor et al., 2002; Birbaumer et al., 2005), in the modulation of the startle response (Patrick et al., 1993; Levenston et al., 2000), and in the processing of sad and fearful social stimuli, like affective facial expressions (Blair, 2001, 2003) or vocal affect (Blair et al., 2002). In psychopaths, negative emotions show no impact on cognitive processing (Müller et al., 2005). In summary, whereas in healthy controls negative emotions affect ongoing behavior, this regulatory influence is absent in psychopaths, which may be a key factor in their socialization problems.
References

Adolphs, R., Tranel, D., Hamann, S., Young, A.W., Calder, A.J., Phelps, E.A., Anderson, A., Lee, G.P. and Damasio, A.R. (1999) Recognition of facial emotion in nine individuals with bilateral amygdala damage. Neuropsychologia, 37: 1111–1117. Bechara, A., Damasio, H., Damasio, A.R. and Lee, G.P. (1999) Different contributions of the human amygdala and ventromedial prefrontal cortex to decision-making. J. Neurosci., 19: 5473–5481. Birbaumer, N., Veit, R., Lotze, M., Erb, M., Hermann, C., Grodd, W. and Flor, H. (2005) Deficient fear conditioning in psychopathy. Arch. Gen. Psychiat., 62: 799–805. Blair, R.J.R. (1995) A cognitive developmental approach to morality: investigating the psychopaths. Cognition, 57: 1–29. Blair, R.J.R. (1999) Psychophysiological responsiveness to the distress of others in children with autism. Pers. Individ. Diff., 26: 477–485. Blair, R.J.R. (2001) Neurocognitive models of aggression, the antisocial personality disorders, and psychopathy. J. Neurol. Neurosurg. Psychiatry, 71: 727–731. Blair, R.J.R. (2003) Facial expressions, their communicatory functions and neuro-cognitive substrates. Philos. Trans. R. Soc. Lond. B, 358: 561–572.
Blair, R.J.R. and Cipolotti, L. (2000) Impaired social response reversal: A case of 'acquired sociopathy'. Brain, 123: 1122–1141. Blair, R.J., Colledge, E., Murray, L. and Mitchell, D.G. (2001a) A selective impairment in the processing of sad and fearful expressions in children with psychopathic tendencies. J. Abnorm. Child Psychol., 29: 491–498. Blair, R.J.R., Colledge, E. and Mitchell, D.G.V. (2001b) Somatic markers and response reversal: is there orbitofrontal cortex dysfunction in boys with psychopathic tendencies? J. Abnorm. Child Psychol., 29: 499–511. Blair, R.J.R., Mitchell, D.G.V., Richell, R.A., Kelly, S., Leonard, A., Newman, C. and Scott, S.K. (2002) Turning a deaf ear to fear: impaired recognition of vocal affect in psychopathic individuals. J. Abnorm. Psychol., 111: 682–686. Braver, T.S. and Cohen, J.D. (2000) On the control of control: the role of dopamine in regulating prefrontal function and working memory. In: Monsell, S. and Driver, J. (Eds.), Control of Cognitive Processes: Attention and Performance XVIII. MIT Press, Cambridge, MA, pp. 713–737. Bush, G., Luu, P. and Posner, M.I. (2000) Cognitive and emotional influence in anterior cingulate cortex. Trends Cogn. Sci., 4: 215–222. Carver, C.S. and Schreier, M.F. (1990) Origins and functions of positive and negative affect: a control process view. Psychol. Rev., 97: 19–35. Center for the Study of Emotion and Attention (CSEA-NIMH) (1995) The International Affective Picture System: Digitized Photographs. The Center for Research in Psychophysiology, University of Florida, Gainesville, FL. Cleckley, H. (1941) The Mask of Sanity. Mosby, St. Louis, MO. Cleckley, H. (1976) The Mask of Sanity (5th edition). Mosby, St. Louis, MO. Cloninger, C.R. (1978) The antisocial personality. Hosp. Pract., 13: 97–106. Compton, R.J., Banich, M.T., Mohanty, A., Milham, M.P., Herrington, J., Miller, G.A., Scalf, P.E., Webb, A. and Heller, W. (2003) Paying attention to emotion: a fMRI investigation of cognitive and emotional Stroop tasks. Cogn. Affect. Behav. Neurosci., 3: 81–96. Davidson, R.J. (2002) Anxiety and affective style: role of prefrontal cortex and amygdala. Biol. Psychiatry, 51: 68–80. Davidson, R.J., Scherer, K.R. and Goldsmith, H.H. (2003) Handbook of Affective Sciences. Oxford University Press, New York. Drevets, W.C. and Raichle, M.E. (1998) Reciprocal suppression of regional cerebral blood flow during emotional versus higher cognitive processes: implications for interactions between emotion and cognition. Cogn. Emotion, 12: 353–385. Ellis, H.C. and Ashbrook, P.W. (1988) Resource allocation model of the effects of depressed mood states on memory. In: Fiedler, K. and Forgas, J. (Eds.), Affect, Cognition and Social Behavior. Hogrefe, Toronto, pp. 25–43. Erk, S., Kiefer, M., Grothe, J., Wunderlich, A.P., Spitzer, M. and Walter, H. (2003) Emotional context modulates subsequent memory effect. Neuroimage, 18: 439–447.
Flor, H., Birbaumer, N., Hermann, C., Ziegler, S. and Patrick, C.J. (2002) Aversive Pavlovian conditioning in psychopaths: peripheral and central correlates. Psychophysiology, 39: 505–518. Forgas, J.P. (1995) Mood and judgement: the affect infusion model (AIM). Psychol. Bull., 117: 39–66. Fowles, D.C. (1980) The three arousal model: implications of Gray’s two-factor learning theory for heart rate, electrodermal activity, and psychopathy. Psychophysiology, 17: 87–104. Gordon, H.L., Baird, A.A. and End, A. (2004) Functional differences among those high and low on a trait measure of psychopathy. Biol. Psychiatry, 56: 516–521. Gray, J.R. (2001) Emotional modulation of cognitive control: approach-withdrawal states double-dissociate spatial from verbal two-back task performance. J. Exp. Psychol. Gen., 130: 436–452. Gray, J.R., Braver, T.S. and Raichle, M.E. (2002) Integration of emotion and cognition in the lateral prefrontal cortex. Proc. Natl. Acad. Sci. USA, 99: 4115–4120. Hare, R.D. (1982) Psychopathy and physiological activity during anticipation of an aversive stimulus in a distraction paradigm. Psychophysiology, 19: 266–271. Hare, R.D. (1991) Manual for the Hare Psychopathy ChecklistRevised. Multi Health Systems, Toronto, Canada. Hare, R.D. (1998) Psychopathy, affect, and behavior. In: Cooke, D., Forth, A. and Hare, R. (Eds.), Psychopathy: Theory, Research, and Implications for Society. Kluwer, Dordrecht, pp. 105–137. Hare, R.D., Clark, D., Grann, M. and Thornton, D. (2000) Psychopathy and the predictive validity of the PCL-R: an international perspective. Behav. Sci. Law, 18: 623–645. Hare, R.D., Frazelle, J. and Cox, D.N. (1978) Psychopathy and physiological responses to threat of an aversive stimulus. Psychophysiology, 15: 165–172. Hare, R.D. and Quinn, M.J. (1971) Psychopathy and autonomic conditioning. J Abnorm. Psychol., 77: 223–235. Hart, S.D., Forth, A.E. and Hare, R.D. (1990) Performance of criminal psychopaths on selected neuropsychological tests. J. Abnorm. Psychol., 99: 374–379. Heller, W. and Nitschke, J.B. (1998) The puzzle of regional brain activity in depression and anxiety: the importance of subtypes and comorbidity. Cogn. Emotion, 12: 421–447. Intrator, J., Hare, R., Stritzke, P., Brichtswein, K., Dorfman, D., Harpur, T., Bernstein, D., Handelsman, L., Schaefer, C., Keilp, J., Rosen, J. and Machac, J. (1997) A brain imaging (single photon emission computerized tomography) study of semantic and affective processing in psychopaths. Biol. Psychiatry, 42: 96–103. Kiehl, K.A., Brink, J., Hare, R.D. and McDonald, J. (1999) Reduced P300 responses in criminal psychopaths during a visual oddball task. Biol. Psychiatry, 45: 1498–1507. Kiehl, K.A., Smith, A.M., Hare, R.D., Mendrek, A., Forster, B.B., Brink, J. and Liddle, P.F. (2001) Limbic abnormalities in affective processing by criminal psychopaths as revealed by functional magnetic resonance imaging. Biol. Psychiatry, 50: 677–684.
465 Kosslyn, S.M. and Koenig, O. (1992) Wet Mind. Free Press, New York. Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1990) Emotion, attention, and the statle reflex. Psychol. Rev., 97: 377–395. Lang, P.J., Bradley, M.M. and Cuthbert, B.N. (1998) Emotion and motivation: measuring affective perception. J. Clin. Neurophysiol., 15: 397–408. LaPierre, D., Braun, C.M.J. and Hodgins, S. (1995) Ventral frontal deficits in psychopathy: neuropsychological test findings. Neuropsychologia, 33: 139–151. Lawrence, A.D., Calder, A.J., McGowan, S.W. and Grasby, P.M. (2002) Selective disruption of the recognition of facial expressions of anger. Neuroreport, 13: 881–884. LeDoux, J. (1998) The Emotional Brain. Weidenfeld & Nicholson, New York. Levenston, G.K., Patrick, C.J., Bradley, M.M. and Lang, P.J. (2000) The psychopath as observer: emotion and attention in picture processing. J. Abnorm. Psychol., 109: 373–385. Lilienfeld, S.O. (1994) Conceptual problems in the assessment of psychopathy. Clin. Psychol. Rev., 14: 17–38. Lilienfeld, S.O. and Andrews, B.P. (1996) Development and preliminary validation of self-report measure of psychopathic personality traits in noncirminal populations. J. Pers. Assess., 66: 488–524. Lorenz, A.R. and Newman, J.P. (2002) Deficient response modulation and emotion processing in low-anxious Caucasian psychopathic offenders: results from a lexical decision task. Emotion, 2: 91–104. Lykken, D.T. (1957) A study of anxiety in the sociopathic personality. J. Abnorm. Soc. Psychol., 55: 6–10. Mayberg, H.S., Liotti, M., Brannan, S.K., McGinnis, S., Mahurin, R.K., Jerabek, P.A., Silva, J.A., Tekell, J.L., Martin, C.C., Lancaster, J.L. and Fox, P.T. (1999) Reciprocal limbiccortical function and negative mood: converging PET findings in depression and normal sadness. Am. J. Psychiatry, 156: 675–682. McMurran, M., Blair, M. and Egan, V. (2002) An investigation of the correlations between aggression, impulsiveness, social problem-solving, and alcohol use. Aggress. Behav., 28: 439–445. Meinhardt, J. and Pekrun, R. (2003) Attentional resource allocation to emotional events: an ERP study. Cogn. Emotion, 17: 477–500. Mitchell, D.G.V., Colledge, E., Leonard, A. and Blair, R.J.R. (2002) Risky decisions and response reversal: is there evidence of orbitofrontal cortex dysfunction in psychopathic individuals? Neuropsychologia, 40: 2013–2022. Mu¨ller, J.L., Sommer, M., Wagner, V., Lange, K., Taschler, H., Ro¨der, C.H., Schuierer, G., Klein, H. and Hajak, G. (2003) Abnormalities in emotion processing within cortical and subcortical regions in criminal psychoaths: evidence from a functional magnetic resonance imaging study using pictures with emotional content. Biol. Psychiatry, 54: 152–162. Mu¨ller, J.L., Weber, T., Sommer, M., Do¨hnel, K., Meinhardt, J. and Hajak, G. Why criminal psychopaths act in cold
blood: evidence from a fMRI study on the interaction of cognition and emotion. Am. J. Psychiatry, submitted. Newman, J.P., Patterson, C.M. and Kosson, D.S. (1987) Response preservation in psychopaths. J. Abnorm. Psychol., 96: 145–148. Ochsner, K.N. and Gross, J.J. (2005) The cognitive control of emotion. Trends Cogn. Sci., 9: 242–249. Patrick, C.J. (1994) Emotion and psychopathy: startling new insights. Psychophysiology, 31: 319–330. Patrick, C.J., Bradley, M.M. and Lang, P.J. (1993) Emotion in the criminal psychopath: startle reflex modulation. J. Abnorm. Psychol., 102: 82–92. Pinel, P. (1809) Traite´ medico-philosophique sur l’alie´nation mentale (2nd edition). Brosson, Paris. Raine, A. (1989) Evoked potentials and psychopathy. Int. J. Psychophysiol., 8: 1–16. Rolls, E.T. (1996) The orbitofrontal cortex. Philos. Trans. R. Soc. Lond. B Biol. Sci., 351: 1433–1443. Rolls, E.T. (2000) The orbitofrontal cortex and reward. Cereb. Cortex, 10: 284–294. Sass, H. and Herpertz, S. (1995) The history of personality disorders. In: Berrios, G. and Porter, R. (Eds.), A History of Clinical Psychiatry. Athlone, London, pp. 633–644. Schmitt, W.A., Brinkley, C.A. and Newman, J.P. (1999) Testing Damasio’s somatic marker hypothesis with psychopathic individuals: risk takers or risk averse? J. Abnorm. Psychol., 108: 538–543. Schneider, F., Habel, U., Kessler, C., Posse, S., Godd, W. and Mu¨ller-Ga¨rtner, H.-W. (2000) Functional imaging of conditioned aversive emotional responses in antisocial personality disorder. Neuropsychobiology, 42: 192–201. Schoenbaum, G., Chiba, A.A. and Gallagher, M. (1998) Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning. Nat. Neurosci., 1: 155–159. Schoenbaum, G., Chiba, A.A. and Gallagher, M. (2000) Changes in functional connectivity in orbitofrontal cortex and basolateral amygdala during learning and reversal training. J. Neurosci., 20: 5179–5189. Simpson, J.R., Drevets, W.C., Snyder, A.Z., Gusnard, D.A. and Raichle, M.E. (2001a) Emotion-induced changes in human medial prefrontal cortex: II. during anticipatory anxiety. Proc. Natl. Acad. Sci. USA, 98: 688–693. Simpson, J.R., O¨ngu¨r, D., Akbudak, E., Conturo, T.E., Ollinger, J.M., Snyder, A.Z., Gusnard, D.A. and Raichle, M.E. (2000) The emotional modulation of cognitive processing: an fMRI study. J. Cogn. Neurosci., 12: 157–170. Simpson, J.R., Snyder, A.Z., Gusnard, D.A. and Raichle, M.E. (2001b) Emotion-induced changes in human medial prefrontal cortex: I. during cognitive task performance. Proc. Natl. Acad. Sci. USA, 98: 683–687. Sommer, M., Mu¨ller, J.L., Weber, T. and Hajak, G. (2004) Die Bedeutung von Affekt und Emotion fu¨r psychiatrische Erkrankungen. Psychiatr. Prax., 31: S64. Stevens, D., Charman, T. and Blair, R.J.R. (2001) Recognition of emotion in facial expressions and vocal tones in children with psychopathic tendencies. J. Genet. Psychol., 162: 201–210.
466 Sutker, P.B. and Allain, A.N. (1987) Cognitive abstraction, shifting and control: clinical sample comparisons on psychopaths and non-psychopaths. J. Abnorm. Psychol., 96: 73–75. Tiihonen, J., Hodgins, S., Vaurio, O., Laasko, M., Repo, E., Soinen, H., Aronen, H.J., Nieminen, P. and Savolainen, L. (2000) Amygdaloid vo lume loss in psychopathy. Soc. Neurosci. Abstr., 2017. Tomarken, A.J. and Keener, A.D. (1998) Frontal brain asymmetry and depression: a self-regulatory perspective. Cogn. Emotion, 12: 387–420. Veit, R., Flor, H., Erb, M., Hermann, C., Lotze, M., Grodd, W. and Birbaumer, N. (2002) Brain circuits involved in
emotional learning in antisocial behavior and social phobia in humans. Neurosci. Lett., 328: 233–236. Whalen, P.J., Bush, G., McNally, R.J., Wilhelm, S., McInerney, S.C., Jenike, M.A. and Rauch, S.L. (1998) The emotional counting stroop paradigm: a functional magnetic resonance imaging probe of the anterior cingulate affective division. Biol. Psychiatry, 44: 1219–1228. Williamson, S., Harpur, T.J. and Hare, R.D. (1991) Abnormal processing of affective words by psychopaths. Psychophyiology, 28: 260–273. Yamasaki, H., LaBar, K.S. and McCarthy, G.M. (2002) Dissociable prefrontal brain systems for attention and emotion. Proc. Natl. Acad. Sci. USA, 99: 11447–11451.
CHAPTER 26
Disordered emotional processing in schizophrenia and one-sided brain damage

Katarzyna Kucharska-Pietura
Whitchurch Hospital, Cardiff and Vale NHS Trust, Cardiff CF14 7XB, UK
Abstract: This work concentrates on the problem of human emotions in healthy and pathologically changed brains, mainly in persons afflicted with schizophrenia or with organic impairments localized in one of the cerebral hemispheres. This chapter presents the state of current knowledge concerning the hemispheric lateralization of emotions among healthy people, psychiatric patients, and patients with one-sided brain lesions, on the basis of clinical observations, the results of experimental work, and the newest neuroimaging techniques. The numerous experiments and scientific methods used to assess the hemispheric lateralization of emotions, and the discrepancies in their results, point toward a lack of a consistent theory of hemispheric specialization in the regulation of emotional processes. Particular scientific interest was taken in the emotions of persons afflicted with schizophrenia, either in its early or late stages. This was inspired by the emotional behavior of schizophrenic patients on a psychiatric ward and their ability to perceive and express emotions during various stages of the schizophrenic process. In order to examine the cerebral manifestations of emotional deficits and the specialization of the cerebral hemispheres for emotional processes, the author has described the emotional behavior of patients with unilateral cerebral stroke, i.e., patients with damage to the right or left cerebral hemisphere. Overall, the inferior performance on emotional tasks by right-hemisphere-damaged patients compared to other groups might support right-hemisphere superiority for affect perception despite variations in the stimuli used.

Keywords: hemispheric specialization; schizophrenia; one-sided brain damage; chimeric faces; perception; emotion expression; affects

Hemispheric specialization for emotional processing

Increasingly, experimental data and clinical studies demonstrate brain asymmetry in the processing of affect (Reuter-Lorenz and Davidson, 1981; Sackeim et al., 1982; Dawson and Fisher, 1994; Altenmüller et al., 2002; Noesselt et al., 2005). Clinical observation of psychiatric and neurological patients with right- or left-hemisphere damage, the findings of amytal trials, and neuropsychological experimental studies support two main hypotheses. The first assumes that structures analyzing both positive and negative emotions are related exclusively to the right hemisphere (RH) (Borod et al., 1983, 1988; David and Cutting, 1990; Kucharska-Pietura et al., 2002). According to the second standpoint, the valence hypothesis, brain laterality regarding emotions depends on emotional valence: the RH controls negatively valenced emotions and the left hemisphere (LH) those that are positively valenced (Reuter-Lorenz and Davidson, 1981; Bryden et al., 1982; Mandal et al., 1998; Adolphs et al., 2001; Simon-Thomas et al., 2005). Heilman (1997) updated this hypothesis by
discussing a model of a modular cortical network influencing the limbic structures. According to this theory, the frontal lobes play a key role for valence, with the left mediating positive emotions and the right negative affect. Furthermore, Heilman (1997) highlighted the crucial role of the RH in activating arousal systems and of the LH in modulating inhibition of these systems. Finally, he concluded that the orbito-frontal regions mediate avoidance behaviors and the parietal lobes mediate approach behaviors.

Emotional processing in the context of brain asymmetry has been studied utilizing the methods of functional neuroimaging (Northoff et al., 2004; Simon-Thomas et al., 2005). Blood et al. (1999), using positron emission tomography (PET), assessed emotional responses to pleasant and unpleasant music in healthy volunteers. The authors showed a significant increase of cerebral blood flow in the right parahippocampal gyrus in response to unpleasant stimuli and a decrease of blood flow in the orbitofrontal cortex and the subcallosal cingulum. Noesselt et al. (2005) used event-related functional magnetic resonance imaging (fMRI) to assess asymmetrical brain activation during the processing of fearful versus neutral faces in healthy subjects. They found right-lateralized brain activations for fearful minus neutral left-hemifield faces in right visual areas, as well as more activity in the right than in the left amygdala. This might suggest right-lateralized emotional processing during bilateral stimulation, involving enhanced coupling of the amygdala and RH extrastriate cortex.

Utilizing electroencephalography (EEG), Schmidt and Trainor (2001) assessed frontal brain electrical activity in healthy controls while they rated the valence and intensity of four orchestral pieces. The authors found greater relative left frontal EEG activity in response to positive stimuli and greater relative right frontal activity for negative stimuli. A similar lateralization pattern was shown by Altenmüller et al. (2002), who found positive emotional attributions accompanied by an increase in left temporal activation and negative emotional attributions by an increase in right frontotemporal activation. Further, event-related potential (ERP) activity was examined during a higher cognitive task (Simon-Thomas et al., 2005). The authors observed that the RH is more influenced by negative emotion during concurrent cognitive processing, which is consistent with the valence hypothesis proposing that the RH is dominant for processing withdrawal-related emotions and that the LH may play a crucial role in approach-related emotions.
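Frontal EEG asymmetries of this kind are conventionally quantified as the difference in log-transformed alpha-band power between homologous frontal electrodes; because alpha power varies inversely with cortical activity, a positive right-minus-left index indicates relatively greater left-frontal activity. The following is a minimal sketch of that convention; the channel names, epoch length, and the simple periodogram are illustrative assumptions, not the procedures of the studies cited above.

```python
import numpy as np

def alpha_power(signal, fs, fmin=8.0, fmax=13.0):
    """Alpha-band power from a simple periodogram (illustration only)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * len(signal))
    return psd[(freqs >= fmin) & (freqs <= fmax)].sum()

def frontal_asymmetry(f3_signal, f4_signal, fs):
    """ln(F4 alpha) - ln(F3 alpha). Alpha power varies inversely with
    cortical activity, so positive values indicate relatively greater
    LEFT frontal activity (the pattern reported for positive stimuli)."""
    return float(np.log(alpha_power(f4_signal, fs)) -
                 np.log(alpha_power(f3_signal, fs)))

# Hypothetical 2-s epochs sampled at 250 Hz from homologous electrodes
fs = 250
rng = np.random.default_rng(seed=1)
f3 = rng.standard_normal(2 * fs)
f4 = rng.standard_normal(2 * fs)
print(frontal_asymmetry(f3, f4, fs))
```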
The perception of emotional chimeric faces in schizophrenia and brain damage: further evidence of right-hemisphere dysfunction

Neuropsychological research has established that the right cerebral hemisphere plays a special role in tasks involving the direction of spatial attention, especially to the contralateral hemispace, as well as in recognition of faces, perception, and possibly generation of affect (Davidson, 1984; Borod and Koff, 1989; Pally, 1998; Lane et al., 1999; Kucharska-Pietura et al., 2002). One of the simplest to administer yet most robust neuropsychological tests eliciting right cerebral hemisphere "dominance" is the chimeric faces test. It was first developed by Levy et al. (1983) using chimeras combining photographs of open-mouthed smiles with neutral facial expressions, joined at the vertical meridian. Normal dextrals perceive those faces with the smile to their left as happier. This "left hemiface bias" is observed regardless of the means of presentation (in free vision or tachistoscopically) (Grega et al., 1988; Phillips and David, 1997) and has been interpreted as evidence of RH dominance in facial processing (Kolb et al., 1983; Levy et al., 1983). Adaptations of this technique have been developed, such as the use of happy–sad chimeric face drawings (David, 1989). The realism of the faces (Hoptman and Levy, 1988), the valence of the expression (Christman and Hackworth, 1993), the relevance of affect to the task (Luh et al., 1991, 1994), and age (Levine and Levy, 1986) have all been found to exert little effect on the perceptual bias, which is remarkably robust. Indeed, a similar perceptual asymmetry has been found with nonfacial stimuli (Luh et al., 1991, 1994), but it is of smaller magnitude. So while stimuli utilizing facial expressions evoke the strongest bias, it is more accurate to
refer to the phenomenon as a left-hemispatial bias (LHB). Further evidence for RH "dominance" on this task comes from the fact that non-right-handers show a reduction or even reversal of the LHB (Levy et al., 1983; Hoptman and Levy, 1988; David, 1989; Luh et al., 1994).

Versions of the chimeric faces test have been used to probe RH functioning in patients with psychiatric disorders (David and Cutting, 1990; Gooding et al., 2001). Such tests have the advantage of being simple and quick to administer, and they require very little effortful cognitive processing. Jaeger et al. (1987) were the first to report a reduction in LHB in patients with unipolar depressive illness. David and Cutting (1990) replicated the reduced LHB in depression using a happy–sad chimeric drawings test (David, 1989). They also showed a nonsignificant increase in LHB in patients with mania and an absence of bias in schizophrenia. These findings were replicated in an independent sample (David, 1993). Apart from non-right-handedness (Levy et al., 1983; David, 1989; Luh et al., 1994; Jessimer and Markham, 1997), the other factors that have been shown to modify LHB using facial expression chimeras are psychiatric disorders and certain
personality types (Jessimer and Markham, 1997). David and Cutting (1990) were the first to demonstrate markedly reduced LHB in patients with schizophrenia, and this has been replicated (Phillips and David, 1997; Lior and Nachson, 1999). Furthermore, reductions in LHB have recently been shown in males with schizotypal traits (Mason and Claridge, 1999) and in young people who are theoretically "psychosis prone" (Luh and Gooding, 1999). Kucharska-Pietura and Klimkowski (2002b) compared 100 remitted schizophrenic inpatients with healthy controls (n = 50) and with patients with RH (n = 30) and LH damage (n = 30). They found that the pattern of performance on the chimeric faces was similar between schizophrenic patients (particularly chronic ones) and the right-brain-damaged group, both of which showed reduced LHB. Further, left-brain damage did not affect LHB, and the pattern was very comparable to the results obtained in the healthy controls (Fig. 1).
Fig. 1. Mean values of left hemiface (LHF) bias in each subject group: S, first-episode schizophrenics; CS, chronic schizophrenics; N, normal controls; R, right-brain-damaged patients; L, left-brain-damaged patients.
The results show that the schizophrenic patients and right-brain-damaged patients had a significantly weaker LHB when judging affect from schematic chimeric faces, compared with healthy subjects and left-brain-damaged patients. This was most marked in the chronic patients and right-brain-damaged patients, who showed a comparable pattern of performance, on average a right-sided perceptual bias. This in turn suggests RH dysfunction or, at the very least, an alteration in the balance between the hemispheres. The most obvious interpretation of these results in relation to schizophrenia is that it is a disorder of RH functioning. Indeed, there is considerable evidence for this from a range of neuropsychological studies, particularly those employing emotional or facial stimuli (David and Cutting, 1990; Kucharska-Pietura et al., 2002). Furthermore, the findings are extended in that they show that the loss of LHB is present early in the course of the disorder but is more pronounced in chronic patients (Wölwer et al., 1996; Bryson et al., 1998; Kucharska-Pietura et al., 2002). Longitudinal studies are required to determine whether lack of perceptual bias precedes or follows a chronic course.

Lior and Nachson (1999) found a weaker than normal left-hemiface bias in their sample of schizophrenic patients and reported that the preponderance of symptoms, negative or positive, influenced the affect label assigned to the stimuli. Kucharska-Pietura et al. (2002) showed that LHB was not influenced by symptom type or medication dose. Further, self-reported mood did not appear to correlate with LHB in controls or patients, replicating previous studies (David and Cutting, 1990; Harris and Snyder, 1992). The effect of motor lateralization was explored in more detail by subdividing each group, all of whom reported right-handedness, in terms of foot and eye preference. The results for the controls and first-episode patients show that, within these strict parameters, atypical motor laterality affects LHB in the predicted way; that is, non-right-sidedness is associated with a reduced left- or indeed a right-perceptual bias. The pattern in the chronic patients is less clear, presumably because they fail to show the normal LHB as a group, but it is consistent with earlier reports of anomalous lateralization (Piran et al., 1982; Robertson and Taylor, 1987; Gorynia and Dudeck, 1996).

The most obvious interpretation of these results is that they reflect RH dysfunction (David and
Cutting, 1992), particularly in systems that direct spatial attention (Phillips and David, 1997). The role of the LH is unclear with respect to this task, performance being unaffected by LH resections (Kolb et al., 1983). However, an imbalance between the two hemispheres, with the right failing to predominate and the left apparently "overactive," would also account for the results. It is more difficult to link this to particular features of the disorder, since there is no obvious link to symptom clusters or self-reported mood.

In summary, the majority of the results replicated the lack of the normal left-perceptual bias in schizophrenic patients, regardless of symptom profile, which is attributable to right-brain dysfunction. The use of chimeric faces as a means of tapping anomalous spatial attentional bias in people with, or at risk of developing, schizophrenia is recommended. Future work using functional neuroimaging techniques or cortical electrophysiology may shed light on the nature and precise anatomical location of the dysfunction.
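For concreteness, the perceptual bias in such forced-choice chimeric-faces tasks is commonly summarized per participant as a signed laterality score: the proportion of trials on which the chimera with the emotional half-face in the left visual field is chosen, minus the proportion for right-sided chimeras. The sketch below illustrates this kind of scoring; the trial encoding is an illustrative assumption, not the exact procedure of the studies reviewed above.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    smile_side: str   # side of the emotional half-face: "left" or "right"
    chosen: bool      # participant judged this chimera the happier of the pair

def lhb_score(trials):
    """Left-hemispatial bias in [-1, 1]; positive = left-hemiface bias."""
    left = [t.chosen for t in trials if t.smile_side == "left"]
    right = [t.chosen for t in trials if t.smile_side == "right"]
    return sum(left) / len(left) - sum(right) / len(right)

# Hypothetical data: a dextral control choosing left-smile chimeras 75% of the time
trials = [Trial("left", i % 4 != 0) for i in range(20)] + \
         [Trial("right", i % 4 == 0) for i in range(20)]
print(lhb_score(trials))   # 0.5 here; ~0 or negative would mirror the patient groups
```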
Emotion perception in schizophrenia and one-sided brain damage

Over the last few decades there have been numerous studies examining the perception of facial affect in healthy populations and in pathology (Pally, 1998; Herbener et al., 2005; Martin et al., 2005; Scholten et al., 2005); however, the neural correlates of emotional and facial processes are still less clear-cut and need further investigation. Perception of facial emotion is thought to be a complex cognitive ability that relies on the integrity of a select set of more basic neurocognitive processes, such as visual scanning, working memory, and vigilance, which may be asymmetrically distributed across the cerebral hemispheres (Kee et al., 1998). According to Bruce and Young (1986), facial affect processing is held to be dependent upon view-centered descriptions created at the structural encoding stage and involves neurological pathways different from those used in unfamiliar face matching and familiar face recognition. This model has been confirmed by Cicone et al. (1980), Etcoff (1984), and Bowers et al. (1985).
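The functional separation just described can be made concrete as a set of processing routes that diverge after structural encoding, so that one route (expression analysis) can be impaired while the others (unfamiliar face matching, familiar face recognition) are spared. The toy sketch below is only a schematic of that separation; all function names and "features" are illustrative, not terminology from Bruce and Young (1986).

```python
# Toy schematic: three routes branch off after structural encoding,
# so one route can fail while the others are spared.

def structural_encoding(face):
    """Produce a view-centred description (placeholder feature dict)."""
    return {"shape_code": len(face["image"]), "mouth": face["mouth"]}

def expression_analysis(desc):
    """The route assumed impaired in the patient groups discussed here."""
    return "happy" if desc["mouth"] == "upturned" else "neutral"

def match_unfamiliar(desc_a, desc_b):
    """Unfamiliar face matching compares view-centred descriptions directly."""
    return desc_a["shape_code"] == desc_b["shape_code"]

def recognize_familiar(desc, stored_codes):
    """Familiar face recognition consults stored face recognition units."""
    return desc["shape_code"] in stored_codes

seen = structural_encoding({"image": "x" * 128, "mouth": "upturned"})
print(expression_analysis(seen),                 # 'happy'
      match_unfamiliar(seen, seen),              # True
      recognize_familiar(seen, {64, 128}))       # True
```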
A deficit in emotional perception in schizophrenia has been reported by numerous authors (Feinberg et al., 1986; Borod et al., 1993; Kerr and Neale, 1993; Edwards et al., 2001; Silver et al., 2002; Herbener et al., 2005). However, as yet the precise implications of this are unclear. Studies carried out in the last decade found a general cognitive deficit in schizophrenia but, in addition, also found a specific emotional deficit, usually in the context of affect processing (Borod et al., 1993; Kerr and Neale, 1993; Salem et al., 1996). In support of this are studies showing that schizophrenia patients have a generalized performance deficit encompassing all facial emotions as well as nonemotional faces, which might suggest right cerebral hemisphere dysfunction (Novic et al., 1984; Feinberg et al., 1986; Archer and Hay, 1992; Heimberg et al., 1992; Kucharska-Pietura and Klimkowski, 2002a). However, others appear to have a specific difficulty with perceiving negative emotions in the visual (Muzekari and Bates, 1977; Mandal and Rai, 1987; Cramer et al., 1989) and nonvisual modalities (Mandal, 1998; Edwards et al., 2001).

The question of the outcome of deficits in emotion perception still remains open. There are findings demonstrating that schizophrenic patients in later stages of illness were significantly impaired, compared to those in early stages of illness, in recognizing all examined expressions (Kucharska-Pietura and Klimkowski, 2002a, b; Silver et al., 2002). This finding suggests a progressive impairment in emotion identification in schizophrenia, which may have resulted from treatment with typical antipsychotics, institutionalization, or the illness itself. However, only a prospective design would allow one to determine whether emotion perception deficits are truly progressive in schizophrenia. To date, the link between the perceptual emotion deficit and psychopathology has not been confirmed in schizophrenia. Moreover, the stability of this deficit and the lack of correlation between performance on emotion recognition tasks and clinical variables have already been demonstrated (Gaebel and Wölwer, 1992; Kucharska-Pietura et al., 2002a, b). This pattern of results might be explained by some recent models advocating progressive, essentially neurodegenerative processes
involving excitotoxicity in addition to neurodevelopmental ones (Garver et al., 1999). Here, illness duration rather than symptom ratings per se would then be predicted to correlate with task performance deficits. Further research examining the background of deficits in the perception of facial emotions is needed.

Neuropsychological research on nonverbal behavior mainly leads to the conclusion that affect relies on specific neural pathways and, more particularly, that the RH plays a dominant role in various emotional processes (Davidson, 1984; Gainotti, 1984). This notion is supported by findings that right-brain-damaged patients are significantly less accurate in decoding both positive and negative affect compared to left-brain-damaged patients and healthy controls (Bowers et al., 1985). However, involvement of the RH in emotion processing might be only a particular instantiation of the holistic processes for which it is assumed to be specialized (Buck, 1985). Right-brain-damaged patients and schizophrenic subjects at early and late stages of illness revealed significantly greater impairment in facial affect performance than healthy controls (Kucharska-Pietura and Klimkowski, 2002a, b). Interestingly, the extent of these deficits among right-brain-damaged patients and chronic schizophrenics was comparable. Furthermore, both patient groups showed significant difficulties, compared to healthy controls, in the recognition of unfamiliar neutral faces, although this deficit was smaller than the deficit in facial affect perception. Bowers et al. (1985) showed that the impairments of facial affect performance among RH-damaged patients remained even when their perceptual identity performance was statistically partialled out. This might suggest that the deficit in facial affect perception does not stem entirely from visuoperceptual impairment but also from different cognitive processes involved in matching views of unfamiliar faces (Bowers et al., 1985).

We found a significant relationship between the affected hemisphere and valence in emotional decoding (Kucharska-Pietura et al., 2003). Right-brain-damaged patients were significantly impaired, compared to healthy volunteers, in perceiving negative emotions only, whereas for perception of
positive ones, the group differences did not reach significance (Kucharska-Pietura and Klimkowski, 2002a). These results lend some support to valence theory (Sackeim et al., 1982). Thus, one would predict that right brain damage leads to impaired perception of negative but not positive emotion. The predominance of right-sided activation in recent neuroimaging studies may also reflect the essential role of the RH in the perception of negative emotion per se (Phillips et al., 1998; Lane et al., 1999). Borod and Koff (1989), examining recognition of facial affect and vocal prosody, reported that schizophrenic patients performed poorly compared to normals but did not differ from patients with RH damage. In another study by Borod et al. (1993), schizophrenic patients, right-brain-damaged patients, and controls did not differ in the identification of positive emotions; schizophrenics and RH-damaged patients were significantly impaired relative to controls in identifying negative emotions, while not differing from each other.

To assess category preferences in emotional decoding, the mean maximal and minimal ratings given to each of nine emotion categories were determined. Overall, the highest percentage of agreement was observed for happiness in each subject group. All subjects were significantly less accurate in perceiving facial emotions such as shame, contempt, and disgust. Interestingly, both patient groups were significantly impaired, compared to healthy volunteers, in recognizing all examined expressions, although the valence comparison showed significantly greater impairment in perceiving negative affect. These results lend some support to the conclusion that affect relies on specific neural pathways and, more particularly, that the RH plays a dominant role in the processing of emotions strongly connected with survival, preparation for action, and vigilance processes (Heilman, 1982). In agreement with our data, schizophrenic patients have previously been found to recognize happiness most accurately and shame and disgust less correctly (Dougherty et al., 1974). Muzekari and Bates (1977) reported that chronic schizophrenic patients, relative to healthy controls, labeled negative emotions poorly but not
positive ones. Moreover, Mandal and Palchoudhury (1985) revealed significantly greater impairment in the recognition of fear and anger by chronic schizophrenics compared to normals. These findings, showing much worse ability of schizophrenic patients to identify negative affect, might also suggest a differential deficit in emotional decoding (Bell et al., 1997). Such a deficit may stem from inefficiency of information processing following RH dysfunction.

Another conclusion reached from the study is the lack of significant correlation between performance on all perceptual measures and the demographic and clinical variables, although there was a main effect of age and of Mini Mental State Examination (MMSE) score (Folstein et al., 1975) on facial task performance. McDowell et al. (1994) evaluated age differences in facial emotional decoding in healthy controls. Their findings suggest that the elderly were more impaired in processing negative affect compared to the younger group, while their ability to process positive affect remained intact. This might support the hypothesis that the RH declines more rapidly than the LH in the aging process. In our study, positive and negative psychopathology as measured by the Positive and Negative Syndrome Scale (PANSS) (Kay et al., 1987) had no influence on task performance in schizophrenia, which is in accord with previous evidence of a lack of relationship between the perceptual emotion deficit and psychopathology (Novic et al., 1984; Lewis and Garver, 1995; Bellack et al., 1996; Kucharska-Pietura et al., 2002). Moreover, there was also no significant correlation with age, years of education, or mean MMSE score, which might suggest stable deficits in the perception of emotions. Current mood also did not appear to alter task performance among schizophrenics. Moreover, the lack of significant correlation between emotion perception performance and neuroleptic dose, in both our study and previous works, may militate against neuroleptic effects being a crucial factor responsible for the deficit in emotion performance (Schneider et al., 1992; Lewis and Garver, 1995; Kucharska-Pietura et al., 2002; Kucharska-Pietura and Klimkowski, 2002a). These data, like our results, seem to suggest stable
properties of the perceptual deficit rather than a state-dependent nature. The question of the stability and durability of deficits in emotional decoding still remains open. Further investigations are needed to clarify the nature of the generalized poor performance in emotional decoding in schizophrenia and unilateral brain damage.
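At the group level, analyses of this kind reduce to per-category hit rates and a valence contrast. The sketch below computes recognition accuracy for each emotion category and a positive-minus-negative accuracy gap for one subject; the category labels and their assignment to valence sets are illustrative assumptions rather than the exact stimulus set used in the studies above.

```python
from collections import defaultdict

NEGATIVE = {"fear", "anger", "sadness", "disgust", "contempt", "shame"}
POSITIVE = {"happiness", "interest", "surprise"}  # illustrative nine-category split

def category_hit_rates(responses):
    """responses: list of (true_emotion, chosen_emotion) pairs for one subject."""
    hits, totals = defaultdict(int), defaultdict(int)
    for true, chosen in responses:
        totals[true] += 1
        hits[true] += (true == chosen)
    return {cat: hits[cat] / totals[cat] for cat in totals}

def valence_gap(rates):
    """Mean positive-category accuracy minus mean negative-category accuracy."""
    pos = [r for c, r in rates.items() if c in POSITIVE]
    neg = [r for c, r in rates.items() if c in NEGATIVE]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

# Hypothetical subject: intact for happiness, impaired for a negative category
resp = [("happiness", "happiness")] * 10 + \
       [("fear", "anger")] * 5 + [("fear", "fear")] * 5
rates = category_hit_rates(resp)
print(rates, valence_gap(rates))   # {'happiness': 1.0, 'fear': 0.5} 0.5
```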
Emotional expression in schizophrenia and brain damage

Despite the fact that emotional disturbance is regarded as one of the most prominent schizophrenic symptoms, it has increasingly been pointed out that in this illness there is nothing as simple as a disappearance of emotion (Mazurkiewicz, 1958). The search for the mechanisms of "removing the feelings from our observation" in schizophrenia, pointed out by Bleuler (1906) and long one of the topical issues, remains a challenging aim. There are still controversies as to the nature and magnitude of these disturbances. Experimental studies, carried out mainly in the last two decades, brought detailed findings confirming the difficulties of schizophrenic patients in the expression of emotions, particularly positive emotions, and also the failure of these patients to identify their own emotional experiences. Measurement of the activity of mimic muscles using electromyography in schizophrenia showed higher activity of the corrugator supercilii muscle and lower activity of the zygomatic muscle compared to healthy controls (Earnst et al., 1996; Kring and Neale, 1996). This pattern of results might suggest that schizophrenic patients show a better ability to express negative emotions than positive ones.

Studies on the consequences of damage to the right or left brain hemisphere stressed diverse profiles of emotional disturbances, and different hypotheses have been advocated in relation to brain asymmetry in affect processing (Jackson, 1879; Gainotti, 1972; Bryden et al., 1982; Kucharska-Pietura et al., 2003). The first hypothesis dates back to Jackson (1879), who observed an inadequate attitude of patients with RH damage toward their own state
and situation, their difficulties in the communication of emotions, and elated mood. Furthermore, these patients have difficulties receiving and evaluating emotional signs (Gainotti, 1972; Herzyk and Rozenkiewicz, 1994; Kucharska-Pietura and Klimkowski, 2002b; Kucharska-Pietura and Hunca-Bednarska, 2002). The impaired emotional expressivity of right-brain-damaged patients is displayed in poor mimic expression (Kaplan et al., 1990), disturbances in expressive prosody, and weakened gestures (Pally, 1998). "Catastrophic" reactions, depression, outbursts of tears, anger and aggression towards the environment, and complaints about health are characteristic of people with left-brain-hemisphere damage (Gainotti, 1972). Left-side brain damage does not result in disturbances of emotional expression (Kucharska-Pietura and Hunca-Bednarska, 2002; Kucharska-Pietura and Klimkowski, 2002b), and difficulty in recognizing and evaluating emotional signs in these patients has not been reported.

The assessment of emotional behavior in patients with paranoid schizophrenia during the early and late stages of the schizophrenic process, and in patients with unilateral brain damage, was carried out with the Observational Scale of Emotional Behavior (Kucharska-Pietura and Klimkowski, 2002b), in which three categories related to the laterality of brain dysfunction were distinguished: (A) mood expression behavior related to attitude processes, (B) emotional reaction type, and (C) the form of emotional communication. Indicatory scales, numbered from 1 to 31, are included in these categories. Attitude processes are expressed in the attitude of the studied individual to the situation of the examination and to the person carrying out the examination, in the self-estimation of the individual's own health state, and in the mood of those examined. The assessment of emotional reaction types included several behavioral aspects, such as the frequency of diminishing the degree of the disorder, inadequate behaviors (e.g., joking in situations which are commonly considered joyless), euphoric reactions, and depressed behaviors (e.g., feelings of resignation).

Part (C) of the Observational Scale of Emotional Behavior aimed to assess the forms of
emotional communication. On one side, communication is the expression and conveying of information; on the other, it is the reception and recognition of information. The scale content involves emotional expression (voluntary and involuntary), the ability to verbalize emotional experiences (speaking about emotional experience), and the ability to recognize emotionally colored communications and to verbalize their understanding. The Observational Scale of Emotional Behavior was enriched by experimental tasks with the purpose of capturing behaviors related to emotional communication. These tasks involved voluntary facial mimicry, the ability to express emotions with voice intonation and gesture when asked by the interviewer, and also understanding of verbal jokes and colloquial linguistic phrases regarding emotions (e.g., "I'm weak in the knees"). The experimental trials were applied to capture emotional reactions difficult to observe in natural settings. The assessment of emotional reactions was based on observation; introspective material was not used. The observations were carried out in a natural social context during individual talks with the studied patients.

The analysis of our data revealed statistically significant differences between the groups of patients, which were shown in their attitude to the examination, self-evaluation, present mood, and the types of emotional reaction. Individuals with left-side and right-side brain damage differed with regard to activity, adequacy of their evaluation, the quality of their mood, and the pattern of emotional reactions. The two studied groups were found to be opposite on the assessed dimensions (e.g., decreased versus elated mood; positive attitude to the situation of the examination versus negativism; inadequate versus adequate evaluation of their own health state). Of those studied, the chronically schizophrenic individuals were characterized by the highest passivity and the highest mood indifference. The assessment of attitude processes and emotional reactions in patients with schizophrenia places them in a position between those with left- and right-side brain damage. The patients with left-side brain damage reveal decreased mood,
anxiety, and complaints about the bad state of their health. As a rule, these symptoms are treated by the environment as a normal reaction to the impairment of everyday functioning due to a loss of motor and language abilities. The patients with right-side brain damage evaluate their health state inadequately, overestimate their abilities, and show elated mood and often a lack of insight into their illness (Herzyk and Rozenkiewicz, 1994; Pally, 1998).

Using the Observational Scale of Emotional Behavior for the assessment of schizophrenic patients' emotionality presents a kind of novelty. However, the obtained findings remain consistent with generally established standpoints. The patients with longer-lasting illness revealed a passive attitude to the situation of examination and avoided emotional involvement significantly more often. This passivity is often inseparably connected with the negative dimension of the schizophrenic process. Behavioral change manifested itself in the loss of interest, feelings of pointlessness, passivity, and an indifferent attitude towards the environment (Kucharska-Pietura and Klimkowski, 2002b). It is worth stressing that the present study investigated those behaviors of the patients that are commonly met in their everyday living situations. The study concentrated on social, although perhaps superficial, aspects of their functioning.

The question to be asked at this stage regards a possible relationship between changes in attitude processes and types of emotional reactions, on the one hand, and unilateral brain damage on the other. While damage in one of the cerebral hemispheres was evidenced in the neurological patients, the alleged laterality of "damage" in schizophrenia remains unclear. Patients with schizophrenia seem to be more different from right-brain-damaged patients; but with regard to the adequacy of their evaluation of their own health state and their positive attitude to the examination, they appear closer to those with left-side brain damage. However, the passiveness growing along with the persistence of the schizophrenic process (chronic patients) and the indifferent mood observed in all schizophrenic patients determine their specific "intermediate" position.
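To make the structure of such an instrument concrete, its three categories and 31 indicatory scales can be encoded as data and aggregated into category subtotals. Everything in the sketch below, including the assignment of scale numbers to categories and the 0-3 rating range, is a hypothetical encoding for illustration, not the published scale.

```python
# Hypothetical encoding of the three-category structure described above:
# A = mood expression behavior related to attitude processes,
# B = emotional reaction type, C = form of emotional communication.
# The split of the 31 indicatory scales across categories is assumed.
SCALE_CATEGORIES = {
    "A": range(1, 11),    # indicatory scales 1-10 (assumed)
    "B": range(11, 21),   # scales 11-20 (assumed)
    "C": range(21, 32),   # scales 21-31 (assumed)
}

def category_subtotals(ratings):
    """ratings: dict mapping indicatory-scale number (1-31) to a 0-3 rating."""
    return {
        cat: sum(ratings.get(i, 0) for i in items)
        for cat, items in SCALE_CATEGORIES.items()
    }

# Hypothetical patient with flattened emotional communication (low C scores)
ratings = {i: 2 for i in range(1, 21)} | {i: 0 for i in range(21, 32)}
print(category_subtotals(ratings))   # {'A': 20, 'B': 20, 'C': 0}
```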
"Decreasing the degree of their own disorders" was most often observed in the right-brain-damaged patients and in the schizophrenic patients. These patients, when asked about their dysfunctions, frequently denied their occurrence or answered that the dysfunctions did not disturb them. Whilst taking up a task, they were frequently uncritical of their mistakes. Feelings of resignation in the group of schizophrenic patients occurred significantly less often than in individuals with left-side brain lesions. These patients less frequently evaluated their present situation in a negative light. They did not see any sense in treatment and rehabilitation, and their attitude to their future was critical. It is interesting that, in the studied population of patients with brain damage, inadequate behaviors occurred more frequently than in schizophrenic patients.

To sum up, a dichotomous division of emotional disturbances based on the localization of hemispheric damage appears to be a simplification of diagnostic understanding. It does not explain to what degree emotional changes are direct consequences of disturbed brain mechanisms and to what degree they are secondary reactions to perceptual, cognitive, and movement deficits (Herzyk and Rozenkiewicz, 1994). The disturbances of attitude processes and emotional reactions in the group of schizophrenic patients might be not so much the result of asymmetry of the brain hemispheres as the result of dysfunction of the frontal lobes, which are responsible for the steering and control of emotional behaviors (Heilman et al., 2000; Kohler et al., 2000).

Data based on part (C) of the Observational Scale of Emotional Behavior showed that schizophrenic patients differed most from the other groups with respect to verbalizing emotions (Kucharska-Pietura and Hunca-Bednarska, 2002; Kucharska-Pietura and Klimkowski, 2002b). They spoke about emotional experiences significantly less frequently and failed to understand verbal humor and wit significantly more often than others. They also had more difficulties in understanding common verbal expressions regarding emotions; this latter difficulty increased with the duration of the disease. Patients with left-side brain damage spoke about emotional experiences more frequently than
those with right-sided damage; the first group was therefore closer to the norm. Understanding of wit and of common phrases regarding emotions did not distinguish left- from right-brain-damaged patients; nevertheless, "left-sided" patients showed a tendency to understand wit better (close to statistical significance). Individuals with left-side brain damage more frequently behaved similarly to healthy individuals with regard to speaking about emotional experiences and understanding humor and verbal wit. Furthermore, schizophrenic patients exhibited deficits in the verbalization of emotions and differed significantly in this respect from all the other studied groups. Interestingly, verbalization of emotion, understood as speaking about emotional experiences and understanding verbal wit, did not worsen with the duration of the illness, while the capacity for understanding common linguistic phrases did deteriorate.

The findings of this study indicate that schizophrenic individuals show similarity to right-brain-damaged patients with regard to expression and verbalization of emotions; however, these functions were significantly more deteriorated in schizophrenic patients. Similarity was greatest in nonverbal emotional expression, particularly in the ability for voluntary facial expression. No particular similarity was seen between these two groups in their ability to communicate emotionally with gestures. Verbalization of emotions (speaking about emotional experiences) significantly differentiated between those with right- and left-sided brain damage; therefore, patients with right-brain-hemisphere injury were closer to those with schizophrenia than all the other groups studied. At the same time, verbalization of emotions considered as humor and verbal wit understanding did not differ significantly between left- and right-brain-damaged patients (only a trend was observed); therefore, it is very difficult to claim a clear similarity between the "right-sided" patients and those with schizophrenia. "Right-sided" patients presented good understanding of common linguistic phrases, and no dysfunctions in this respect were observed. Difficulties in understanding verbal information with emotional content appeared to be specific to
the schizophrenic patients. With respect to the ability to talk about emotional experiences and to understand verbal wit, it may be that there was a deficit of pre-illness mental life in these patients, or a deficit related to becoming ill itself, which then remained at a relatively constant level.

Our findings obtained from the Observational Scale of Emotional Behavior regarding emotional communication (emotional expression and understanding of emotional context) are in agreement with data from previous studies. Research carried out using a similar evaluation showed that patients with right-sided brain damage have difficulties in the perception and assessment of emotional content and fail to express emotions (Kaplan et al., 1990; Kohler et al., 2000). In our studies, impaired mimic expression and weakened gesture were commonly observed (interestingly, weakened gesture was recorded also in individuals with left-sided brain damage; nevertheless, this weakness was less marked). In our study, "right-sided" patients did not differ from the "left-sided" with regard to understanding verbal wit and common linguistic phrases related to emotions. Recent experimental studies confirm the decreased expressiveness of those suffering from schizophrenia (Stuss et al., 2000). No emotional dysfunction was found in patients with left-sided brain damage.

Comparison of the emotional behaviors of individuals with schizophrenia to those with right-side brain damage allows us to formulate the hypothesis that in schizophrenia there is an anatomical irregularity or malfunction localized in the right brain hemisphere. Rotenberg (1994) suggested that schizophrenia results from functional insufficiency of the RH in combination with subtle brain damage. In this model, RH hypoactivity leads to compensatory hyperactivity on the left, which results in enhancement of the dopaminergic system. Individuals with schizophrenia differ from those with right-side brain damage with regard to understanding wit and verbal humor, as well as conveying feelings through common linguistic phrases, an area in which they disclosed a peculiar helplessness. This specificity of schizophrenic patients could be explained by their characteristic cognitive deficits, particularly their tendency toward a concrete, literal attitude (given the metaphoric character
of linguistic phrases describing emotional states). Cognitive deficits in schizophrenic patients are probably related to more general disturbances in brain functioning than is the case in patients with right-side brain injury. Bozikas et al. (2004) showed significant relationships between affect perception, facial and vocal as well as in everyday scenarios, and several cognitive abilities. Their findings might support the notion that deficits in processing emotional stimuli in schizophrenia could be attributed to impairment in more basic neurocognitive domains. Undoubtedly, carefully thought-out experiments complementary to observations carried out in a natural context are necessary in order to resolve these scientific controversies.

Conclusions

The patients with right-side brain damage reveal elated mood, inadequate self-assessment, authorization of behaviors, an active and often negativistic attitude to the environment, and depreciation of the degree of their disorders. The patients with left-side brain damage showed lowered mood, feelings of resignation, an active or rarely passive attitude to the environment, and proper self-assessment. In the patients with schizophrenia, what drew attention was mood indifference, a mostly passive attitude to the environment, and depreciation of the degree of their illness. Deficits revealed in schizophrenic patients in emotional communication (both emotional expression and understanding of emotional situations) were more marked than in patients with unilateral brain damage. With regard to emotional behaviors, those suffering from schizophrenia were generally more similar to those with right-side brain damage than to those with left-side brain damage. This fact can be explained by a similar localization of brain dysfunctions in both groups. The specificity of the schizophrenic patients consisted in difficulties in understanding jokes and language expressions with emotional content, which should be understood not only in connection with disturbances in the emotional sphere but also with cognitive dysfunctions and more general brain dysfunctions (Kucharska-Pietura and Hunca-Bednarska, 2002).
References

Adolphs, R., Jansari, A. and Tranel, D. (2001) Hemispheric perception of emotional valence from facial expressions. Neuropsychology, 15: 516–524.
Altenmüller, E., Schürmann, K., Lim, V.K. and Parlitz, D. (2002) Hits to the left, flops to the right: different emotions during listening to music are reflected in cortical lateralization patterns. Neuropsychologia, 40: 2242–2256.
Archer, J. and Hay, D.C. (1992) Face processing in psychiatric conditions. Br. J. Clin. Psychol., 31: 45–61.
Bell, M., Bryson, G. and Lysaker, P. (1997) Positive and negative affect recognition in schizophrenia: a comparison with substance abuse and normal control subjects. Psychiatr. Res., 73: 73–82.
Bellack, A.S., Blanchard, J.J. and Mueser, K.T. (1996) Cue availability and affect perception in schizophrenia. Schizophr. Bull., 22: 535–544.
Bleuler, E. (1906) Affektivität, Suggestibilität, Paranoia. Verlag, Halle.
Blood, A.J., Zatorre, R.J., Bermudez, P. and Evans, A.C. (1999) Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nat. Neurosci., 2: 382–387.
Borod, J.C., Kent, J., Koff, E., Martin, C. and Alpert, M. (1988) Facial asymmetry while posing positive and negative emotions: support for the right hemisphere hypothesis. Neuropsychologia, 26: 759–764.
Borod, J.C. and Koff, E. (1989) The neuropsychology of emotion: evidence from normal, neurological, and psychiatric populations. In: Perecman, T. (Ed.), Integrating Theory and Practice in Clinical Neuropsychology. Hillsdale, New York, pp. 175–215.
Borod, J., Koff, E. and Caron, H.S. (1983) Right hemispheric specialization for the expression and appreciation of emotion: a focus on the face. In: Perecman, T. (Ed.), Cognitive Processing in the Right Hemisphere. Academic Press, New York, pp. 83–110.
Borod, J.C., Martin, C.C., Alpert, M., Brozgold, A. and Welkowitz, J. (1993) Perception of facial emotion in schizophrenic and right brain-damaged patients. J. Nerv. Ment. Dis., 181: 494–502.
Bowers, D., Bauer, R.M., Coslett, H.B. and Heilman, K.M. (1985) Processing of faces by patients with unilateral hemisphere lesions. Dissociation between judgements of facial affect and facial identity. Brain Cogn., 4: 258–272.
Bozikas, V.P., Kosmidis, M.H., Anezoulaki, D., Giannakou, M. and Karavatos, A. (2004) Relationship of affect recognition with psychopathology and cognitive performance in schizophrenia. J. Int. Neuropsychol. Soc., 10: 549–558.
Bruce, V. and Young, A. (1986) Understanding face recognition. Br. J. Psychol., 77: 305–327.
Bryden, M.P., Ley, R.G. and Sugarman, J.H. (1982) A left-ear advantage for identifying the emotional quality of tonal sequences. Neuropsychologia, 20: 83–87.
Bryson, G., Bell, M., Kaplan, E., Greig, T. and Lysaker, P. (1998) Affect recognition in deficit syndrome schizophrenia. Psychiatr. Res., 77: 113–120.
Buck, R. (1985) Prime theory: an integrated view of motivation and emotion. Psychol. Rev., 92: 389–413.
Christman, S.D. and Hackworth, M.D. (1993) Equivalent perceptual asymmetries for free viewing of positive and negative emotional expressions in chimeric faces. Neuropsychologia, 31: 621–624.
Cicone, M., Wapner, W. and Gardner, H. (1980) Sensitivity to emotional expression and situations in organic patients. Cortex, 16: 145–158.
Cramer, P., Weegmann, M. and O'Neil, M. (1989) Schizophrenia and the perception of emotions: how accurately do schizophrenics judge the emotional states of others? Br. J. Psychiatry, 155: 225–228.
David, A.S. (1989) Perceptual asymmetry for happy–sad chimeric faces: effects of mood. Neuropsychologia, 27: 1289–1300.
David, A.S. (1993) Spatial and selective attention in the cerebral hemispheres in depression, mania and schizophrenia. Brain Cogn., 23: 166–180.
David, A.S. and Cutting, J.C. (1990) Affect, affective disorder and schizophrenia. A neuropsychological investigation of right hemisphere function. Br. J. Psychiatry, 156: 491–495.
David, A.S. and Cutting, J.C. (1992) Categorical-semantic and spatial imaging judgements of non-verbal stimuli in the cerebral hemispheres. Cortex, 28: 39–51.
Davidson, R.J. (1984) Affect, cognition, and hemispheric specialization. In: Izard, C.E., Kagan, J. and Zajonc, R.B. (Eds.), Emotion, Cognition, and Behavior. Cambridge University Press, Cambridge, pp. 320–365.
Dawson, G. and Fisher, K.W. (1994) Human Behavior and the Developing Brain. Guilford, New York.
Dougherty, F.E., Bartlett, E.S. and Izard, C.E. (1974) Response of schizophrenics to expressions of the fundamental emotions. J. Clin. Psychol., 7: 243–246.
Earnst, K.S., Kring, A.M., Kadar, M.A., Salem, J.E., David, A.S. and Loosen, P.T. (1996) Facial expression in schizophrenia. Biol. Psychiatry, 40: 556–558.
Edwards, J., Pattison, P.E., Jackson, H.J. and Wales, R.J. (2001) Facial affect and affective prosody recognition in first-episode schizophrenia. Schizophr. Res., 48: 235–253.
Etcoff, N.L. (1984) Selective attention to facial identity and facial emotion. Neuropsychologia, 22: 281–295.
Feinberg, T.E., Rifkin, A., Schaffer, C. and Walker, E. (1986) Facial discrimination and emotional recognition in schizophrenia and affective disorders. Arch. Gen. Psychiatry, 43: 276–279.
Folstein, M.F., Folstein, S.E. and McHugh, P.R. (1975) Mini-mental state examination: a practical method for grading the cognitive state of patients. J. Psychiatr. Res., 12: 189–198.
Gaebel, W. and Wölwer, W. (1992) Facial expression and emotional face recognition in schizophrenia and depression. Eur. Arch. Psy. Clin. Neurosci., 242: 46–52.
Gainotti, G. (1972) Emotional behavior and hemispheric side of the lesion. Cortex, 8: 41–55.
Gainotti, G. (1984) Some methodological problems in the study of the relationships between emotions and cerebral dominance. J. Clin. Neuropsychol., 6: 111–121.
478 Garver, D.L., Nair, T.R., Christensen, J.D., Holcomb, J., Ramberg, J. and Kingsbury, S. (1999) Atrophic and static (neurodevelopmental) schizophrenic psychoses: premorbid functioning, symptoms, and neuroleptic response. Neuropsychology, 21: 82–92. Gooding, D.C., Luh, K.E. and Tallent, K.A. (2001) Evidence of schizophrenia patients’ reduced perceptual biases in response to emotion chimera. Schizophr. Bull., 27: 709–716. Gorynia, I. and Dudeck, U. (1996) Patterns of lateral preference in psychotic patients. Neuropsychologia, 34: 105–111. Grega, D.M., Sackeim, H.A., Sanchez, E., Cohen, B.H. and Hough, S. (1988) Perceiver bias in the processing of human faces: neuropsychological mechanisms. Cortex, 24: 91–117. Harris, L.J. and Snyder, P.J. (1992) Subjective mood state and perception of emotion in chimeric faces. Cortex, 28: 471–481. Heilman, K.M. (1982) Discussant comments. In: Borod, J.C. and Buck, R. (Eds.), Asymmetries in Facial Expression: Method and Meaning. Symposium conducted at the International Neuropsychological Society, Pittsburgh. Heilman, K.M. (1997) The neurobiology of emotional experience. J. Neuropsychiat. Clin. Neurosci., 9: 439–448. Heilman, K.M., Blonder, L.X., Bowers, D. and Crucian, S.P. (2000) Neurological disorders and emotional dysfunction. In: Borod, J.C. (Ed.), The Neuropsychology of Emotion. Oxford University Press, New York, pp. 367–412. Heimberg, C., Gur, R.E., Erwin, R.J., Shtasel, D.L. and Gur, R.C. (1992) Facial emotion discrimination: III. Behavioral findings in schizophrenia. Psychiatr. Res., 42: 253–265. Herbener, E.S., Hill, S.K., Marvin, R.W. and Sweeney, J.A. (2005) Effects of antipsychotic treatment on emotion perception deficits in first-episode schizophrenia. Am. J. Psychiatry, 62: 1746–1748. Herzyk, A. and Rozenkiewicz, J. (1994) Neuropsychologiczna diagnoza zaburzen´ emocjonalnych. In: Klimkowski, M. and Herzyk, A. (Eds.), Neuropsychologia kliniczna. Wybrane zagadnienia. UMCS, Lublin, pp. 31–73. Hoptman, M.J. and Levy, J. (1988) Perceptual asymmetries in left- and right-handers for cartoon and real faces. Brain Cogn., 8: 178–188. Jackson, J.H. (1879) On affections of speech from diseases of the brain. Brain, 2: 203–222. Jaeger, J., Borod, J.C. and Peselow, E. (1987) Depressed patients have atypical biases in the perception of emotional chimeric faces. J. Abnorm. Psychol., 96: 321–324. Jessimer, M. and Markham, R. (1997) Alexithymia: A right hemisphere dysfunction specific to recognition of certain facial expressions? Brain Cogn., 34: 246–258. Kaplan, J.A., Brownell, H.H., Jacobs, J.R. and Gardner, H. (1990) The effects of right hemisphere damage on the pragmatic interpretation of conversational remarks. Brain Lang, 38: 315–333. Kay, S.R., Opler, L.A. and Fiszbein, A. (1987) Positive and Negative Syndrome Scale (PANSS) Rating Manual. Social and Behavioural Sciences Documents. San Rafael, Canada. Kee, K.S., Kern, R.S. and Green, M.F. (1998) Perception of emotion and neurocognitive functioning in schizophrenia: what’s the link? Psychiatr. Res., 81: 57–65.
Kerr, S.L. and Neale, J.M. (1993) Emotion perception in schizophrenia: specific deficit or further evidence of generalized poor performance? J. Abnorm. Psychol., 102: 312–318. Kohler, Ch.G., Gur, R.C. and Gur, R.E. (2000) Emotional processing in schizophrenia: a focus on affective states. In: Borod, J.C. (Ed.), The Neuropsychology of Emotion. Oxford University Press, New York. Kolb, B., Milner, B. and Taylor, L. (1983) Perception of faces by patients with localized cortical excisions. Can. J. Psychol., 37: 8–18. Kring, A.M. and Neale, J.M. (1996) Do schizophrenic patients show a disjunctive relationship among expressive, experiential, and psychophysiological components of emotion? J. Abnorm. Psychol., 105: 249–257. Kucharska-Pietura, K., David, A.S., Dropko, P. and Klimkowski, M. (2002) The perception of emotional chimeric faces in schizophrenia: further evidence of right hemisphere dysfunction. Neuropsychiatry Neuropsychol. Behav. Neurol., 15: 72–78. Kucharska-Pietura, K. and Hunca-Bednarska, A. (2002) Emotional behavior in schizophrenia and one-sided brain damage. Cerebral hemispheric asymmetry. Part I Psychiatr. Pol., 36: 421–434. Kucharska-Pietura, K. and Klimkowski, M. (2002a) Perception of facial affect in chronic schizophrenia and right brain damage. Acta Neurobiol. Exp., 62: 33–43. Kucharska-Pietura, K. and Klimkowski, M. (2002b) Clinical Aspects of Emotions in Healthy and Disordered Brain. Wydawnictwo Medyczne, Krakow, pp. 67–174. Kucharska-Pietura, K., Phillips, M., Gernand, W. and David, A.S. (2003) Perception of emotions from faces and voices following unilateral brain damage. Neuropsychologia, 41: 1082–1090. Lane, R.D., Chua, P.M. and Dolan, R.J. (1999) Common effects of emotional valence, arousal and attention on neural activation during visual processing of pictures. Neuropsychologia, 37: 989–997. Levine, S.C. and Levy, J. (1986) Perceptual asymmetry for chimeric faces across the life span. Brain Cogn., 5: 291–306. Levy, J., Heller, W. and Banich, M.T. (1983) Asymmetry of perception in free viewing of chimeric faces. Brain Cogn., 2: 404–419. Lewis, S. and Garver, D. (1995) Treatment and diagnostic subtype in facial affect recognition in schizophrenia. J. Psychiatr. Res., 29: 5–11. Lior, R. and Nachson, I. (1999) Impairments in judgement of chimeric faces by schizophrenic and affective patients. Int. J. Neurosci., 97: 185–209. Luh, K.E. and Gooding, D.C. (1999) Perceptual biases in psychosis-prone individuals. J. Abnorm. Psychol., 108: 283–289. Luh, K.E., Redl, J. and Levy, J. (1994) Left- and right-handers see people differently: free-vision perceptual asymmetries for chimeric stimuli. Brain Cogn., 25: 141–160. Luh, K.E., Rueckert, L.M. and Levy, J. (1991) Perceptual asymmetries for free viewing of several types of chimeric stimuli. Brain Cogn., 16: 83–103.
479 Mandal, M.K. and Palchoudhury, S. (1985) Responses to facial expression of emotion in depression. Psychol. Rep., 56: 633–654. Mandal, M.K. and Rai, A. (1987) Responses to facial emotion and psychopathology. Psychiatry Res., 20: 317–323. Mandal, M.K., Pandey, R. and Prasad, A.B. (1998) Facial expressions of emotions and schizophrenia: a review. Schizophr. Bull., 24: 399–412. Martin, F., Baudouin, J.Y., Tiberghien, G. and Franck, N. (2005) Processing emotional expression and facial identity in schizophrenia. Psychiatry Res., 134: 43–53. Mason, O. and Claridge, G. (1999) Individual differences in schizotypy and reduced asymmetry using the chimeric faces task. Cogn. Neuropsychiatry, 4: 289–301. Mazurkiewicz, J. (1958) Wst˛ep do psychofizjologii normalnej. Tom II, PZWL, Warszawa, pp. 10–78. McDowell, C.L., Harrison, D.W. and Demaree, H.A. (1994) Is right hemisphere decline in the perception of emotion a function of aging? Int. J. Neurosci., 79: 1–11. Muzekari, L.H. and Bates, M.E. (1977) Judgment of emotion among chronic schizophrenics. J. Clin. Psychol., 33: 662–666. Noesselt, T., Driver, J., Heinze, H.J. and Dolan, R. (2005) Asymmetrical activation in the human brain during processing of fearful faces. Curr. Biol., 15: 424–429. Northoff, G., Heinzel, A., Bermpohl, F., Niese, R., Pfennig, A., Pascual-Leone, A. and Schlaug, G. (2004) Reciprocal modulation and attenuation in the prefrontal cortex: an fMRI study on emotional–cognitive interaction. Hum. Brain Mapp., 21: 202–212. Novic, J., Luchins, D.L. and Perline, R. (1984) Facial affect recognition in schizophrenia: is there adifferential deficit? Br. J. Psychiatry, 144: 533–537. Pally, R. (1998) Bilaterality: Hemispheric specialization and integration. Int. J. Psycho-Anal., 79: 565–577. Phillips, M.L. and David, A.S. (1997) Viewing strategies for simple and chimeric faces: an investigation of perceptual bias in normals and schizophrenic patients using visual scan paths. Brain Cogn., 35: 225–238. Phillips, M.L., Young, A.W., Scott, S.K., Calder, A.J., Andrew, C. and Giampietro, V. (1998) Neural responses to facial and vocal expressions of fear and disgust. Proc. R. Soc. Lond. B Biol. Sci., 256: 1809–1817. Piran, N., Bigler, E.D. and Cohen, D. (1982) Motoric laterality and eye dominance suggest unique pattern of cerebral
organization in schizophrenia. Arch. Gen. Psychiatry, 39: 1006–1010. Reuter-Lorenz, P.A. and Davidson, R. (1981) Differential contributions of the two cerebral hemispheres to the perception of happy and sad faces. Neuropsychologia, 19: 609–613. Robertson, G. and Taylor, P.J. (1987) Laterality and psychosis: neuropsychological evidence. Br. Med. Bull., 43: 634–650. Rotenberg, V.S. (1994) An integrative psychophysiological approach to brain hemisphere functions in schizophrenia. Neurosci. Behav. Rev., 18: 487–495. Sackeim, H., Greenberg, M., Weiman, A., Gur, R., Hungerbuhler, J. and Geschwind, N. (1982) Functional brain asymmetry in the expression of positive and negative emotions: lateralization of insult in cases of uncontrollable emotional outbursts. Arch. Neurol., 19: 210–218. Salem, J.E., Kring, A.M. and Kerr, S.L. (1996) More evidence for generalized poor performance in facial emotion perception in schizophrenia. J. Abnorm. Psychol., 105: 480–483. Schmidt, L.A. and Trainor, L.J. (2001) Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cogn. Emotion, 5: 482–500. Schneider, F., Ellgring, H., Friedrich, J., Fus, I., Beyer, T., Heimann, H. and Himer, W. (1992) The effects of neuroleptics on facial action in schizophrenic patients. Pharmacop sychiatry, 25: 233–239. Scholten, M.R., Aleman, A., Montagne, B. and Kahn, R.S. (2005) Schizophrenia and processing of facial emotions: sex matters. Schizophr. Res., 78: 61–67. Silver, H., Shlomo, N., Turner, T. and Gur, R.C. (2002) Perception of happy and sad facial expressions in chronic schizophrenia: evidence for two evaluative systems. Schizophr. Res., 55: 171–177. Simon-Thomas, E.R., Role, K.O. and Knight, R.T. (2005) Behavioral and electrophysiological evidence of a right hemisphere bias for the influence of negative emotion on higher cognition. J. Cogn. Neurosci., 17: 518–529. Stuss, D.T., van Reekum, R. and Murphy, K.J. (2000) Differentiation of states and causes of apathy. In: Borod, J.C. (Ed.), The Neuropsychology of Emotion. Oxford University Press, New York. Wo¨lwer, W., Streit, M., Polzer, U. and Gaebel, W. (1996) Facial affect recognition in the ourse of schizophrenia. Eur. Arch. Psychiatry Clin. Neurosci., 246: 165–170.
CHAPTER 27
The biochemistry of dysfunctional emotions: proton MR spectroscopic findings in major depressive disorder

Gabriele Ende1,*, Traute Demirakca1,2 and Heike Tost1

1 NMR Research in Psychiatry, Central Institute of Mental Health, J5, 68159 Mannheim, Germany
2 Heidelberg Academy of Science, Heidelberg, Germany

*Corresponding author. Tel.: +49-621-1703-2971; Fax: +49-621-1703-3005; E-mail: [email protected]

DOI: 10.1016/S0079-6123(06)56027-3
Abstract: Key neural systems involved in the processing and communication of emotions are impaired in patients with major depressive disorder (MDD). Emotional and behavioral symptoms are thought to be caused by damage or dysfunction in specific areas of the brain that are responsible for directing attention, motivating behavior, and learning the significance of environmental stimuli. Functional brain studies with positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) support functional abnormalities in MDD that are predominantly located in areas known to play an important role in the communication and processing of emotions. Disturbances in emotional processing as observed in MDD have, if any, only very subtle morphometric brain correlates. With proton magnetic resonance spectroscopy (1H MRS), brain metabolites can be measured noninvasively in vivo, thus furthering the understanding of the effects of changes in neurotransmitters within the brain. The current literature on 1H MRS studies in MDD is small, with a large diversity of MRS methods applied, brain regions studied, and metabolite changes found. Nevertheless, there is strong evidence that changes in neurometabolite concentrations in MDD occur within brain regions involved in the processing and communication of emotions, and that these changes can be monitored by 1H MRS. This review summarizes the literature on biochemical changes quantified via 1H MRS in MDD patients in brain regions that play an important role in the communication and processing of emotions.

Keywords: proton magnetic resonance spectroscopy (1H MRS); major depressive disorder (MDD); emotions
Introduction

Today, major depressive disorder (MDD) is viewed as a malfunction of particular circuits that connect the limbic system with the prefrontal cortex, the brain stem, and the hypothalamus, which control basic functions such as sleep, appetite, and libido. Major advances in our knowledge about the neurobiology of depression have been made, such as significant breakthroughs in genomics, imaging, and the identification of key neural systems involved in cognition, emotion, and behavior. Many anatomical studies involving imaging of the amygdala, the hippocampus, and the prefrontal cortex give evidence for subtle morphometric changes in MDD. Several brain neurotransmitter systems, such as glutamate, γ-aminobutyric acid (GABA), serotonin, norepinephrine, and dopamine, have been implicated in depression
and mania. These transmitter systems, as well as other neurochemical systems such as membrane-bound signal transduction systems and intracellular signaling systems that modulate gene transcription and protein synthesis, play an important role in the etiology of depression. Several lines of evidence suggest that central cortical inhibitory mechanisms, especially those associated with GABA neurotransmission, may play a role in the pathophysiology of major depression.

Emotional and behavioral biases and their neuronal correlates

An important aspect of affective disorders is the disturbance of emotional processing itself. One element of the symptoms connected with depression is the prolonged involuntary processing of emotional information, in the form of elaboration (Mathews et al., 1996) or rumination (Nolen-Hoeksema, 2000) on negative topics, and a heightened sensitivity to negative events (Nandrino et al., 2004). These processing deficits form the focus of cognitive theories of depression (Willner, 1984; Haaga and Beck, 1995; Beck, 2005). They certainly influence the behavior of the patients and probably the course of the depressive episodes. The facilitated processing of negative information may reinforce depressed mood and contribute to the maintenance of the disorder (Bradley et al., 1996; Watkins et al., 1996; Bradley et al., 1997; Watkins et al., 2000). This volume focuses on the communication of emotions. Several studies have associated depressed mood with specific abnormalities in the communication of emotions, e.g., in the identification of facial expressions, negative cognition regarding the self, and dysfunctional appraisal of social events and situations. These abnormalities may lead to impaired interpersonal functioning (Surguladze et al., 2004).

Behavior

Several studies dealing with the influence of emotions on cognitive processing suggest that
there is a mood-congruent processing bias in MDD patients: positive or ambiguous events tend to be perceived as negative (Beck et al., 1979; Teasdale and Russell, 1983; Segal et al., 1996). In memory tests, MDD patients perform better with negative events or stimuli (Lloyd and Lishman, 1975; Clark and Teasdale, 1982). Moreover, MDD patients do not seem to perceive neutral faces as unambiguous signals of emotional neutrality: they recognized neutral faces less accurately than emotional faces and needed more time to process them (Leppanen et al., 2004). Other studies suggested that clinical depression may affect the processing of emotional information at the level of affect discrimination. Patients are more likely to misinterpret the emotional valence of facial expressions (Gur et al., 1992; Persad and Polivy, 1993). Moreover, their ability to discriminate neutral from sad faces is impaired: they misinterpret neutral faces as sad and happy faces as neutral (Surguladze et al., 2005). When verbal stimulus material is used, medicated and unmedicated depressed patients show mood-congruent biases in affective go/no-go tasks (Murphy et al., 1999; Erickson et al., 2005). While processing happy words, depressed patients show prolonged reaction times and increased error rates, whereas healthy controls make more errors for sad words. Similar results were found with a lexical decision task (Weisbrod et al., 1999), in which patients reacted faster to negative words than to positive or neutral words.
Functional neuroimaging

The results of resting-state studies, mainly positron emission tomography (PET) and single photon emission computed tomography (SPECT), fall into two categories: hypo- and hyperactivation. Hypoarousal, i.e., diminished metabolism or reduced regional cerebral blood flow, was observed in frontal brain regions, e.g., the anterior cingulate gyrus (ACG), the dorsolateral prefrontal cortex (DLPFC), the orbitofrontal cortex (OFC), and the medial prefrontal cortex (Bench et al., 1992; Dolan et al., 1994; Bremner et al., 1997; Vollmert et al., 2004).
Hyperarousal has been found in brain regions that are part of the limbic system, namely the amygdala–hippocampal formation and the basal ganglia (Ho et al., 1996; Mayberg et al., 1999). The resting-state activity of the amygdala–hippocampal formation was not only found to be higher in depressed patients (Ho et al., 1996); it also correlated with the amount of negative affect (Abercrombie et al., 1998) and decreased following successful treatment (Mayberg et al., 1999, 2000). Functional imaging studies investigating self-induced transient sadness in healthy controls (Mayberg et al., 1999; Liotti et al., 2000) detected activation of parts of the limbic system (subgenual ACC, dorsal insula) and a simultaneously reduced activation of cortical regions (parietal cortex, DLPFC). The diminished activation of these cortical regions, which are responsible for the regulation of sustained attention and vigilance, provides an explanation for the cognitive deficits of depressed patients. In summary, both transient sadness and depressive episodes lead to activation and deactivation of very similar parts of the limbic system and the attentional system. The processing of emotional stimuli, particularly communication-relevant stimuli such as faces and words, has been well studied in healthy controls. Damage to the amygdala leads to an inability to identify fear in facial expressions (Adolphs et al., 1994). Amygdala activation has been found during the processing of happy, sad, fearful, and angry faces in healthy controls (Breiter et al., 1996; Morris et al., 1996; Yang et al., 2002). Processing of the emotional expression of faces by depressed patients was investigated with implicit tasks, in which the subjects do not directly process the emotional valence but the gender of the faces. Studies with MDD patients focused on the difference between sad, happy, and neutral faces; emotions such as anger, anxiety, or disgust have been investigated far less. Hyperactivation of the amygdala was found in several studies (Sheline et al., 2001; Fu et al., 2004; Surguladze et al., 2005). This hyperactivation is reduced after treatment (Sheline et al., 2001; Fu et al., 2004). Other regions that are known to show hyperactivity in depressed patients during resting state or mood induction also responded abnormally to emotional faces.
Increased activation in patients has been seen especially in the insula (Fu et al., 2004; Gotlib et al., 2005), the hippocampal and parahippocampal gyrus (Fu et al., 2004), the ACG (Fu et al., 2004; Gotlib et al., 2005), and the inferior frontal and inferior parietal gyrus (Gotlib et al., 2005; Surguladze et al., 2005). Activation of the amygdala was also found in some studies of word/language processing of different emotional valence in healthy controls (Hamann and Mao, 2002; Maddock et al., 2003; Herbert et al., 2005). Furthermore, activation differences for emotional compared to neutral words were found in the putamen (Hamann and Mao, 2002), the medial prefrontal cortex, insula, superior temporal and inferior parietal cortex (Fossati et al., 2004), the anterior and posterior cingulate gyrus, and inferior frontal and orbitofrontal regions (Hamann and Mao, 2002; Maddock et al., 2003; Herbert et al., 2005). Depressed patients show increased activation, especially to sad words, in the lateral OFC, the anterior temporal cortex (Elliott et al., 2002), and the left inferior parietal gyrus (Canli et al., 2004). Increased activation of the amygdala was not found directly, but amygdala activation was prolonged for up to 25 s after stimulus presentation (Siegle et al., 2002), and this activation correlated with the self-reported rumination of the depressed patients. Prolonged activation after negative words was also observed in the frontal cortex, the inferior parietal cortex, and the posterior cingulate gyrus. In summary, both limbic and frontal brain regions activate abnormally in MDD patients when emotional stimuli have to be processed. A summary of the neuroimaging findings and key brain regions of emotional processing biases in MDD is given in Table 1.
Brain structural alterations in MDD

Early empirical studies found a correlation between specific brain injuries and the prevalence of MDD. In 1985, Robinson was the first to conclude that physical impairment is not the major determinant of the emotional response to brain injury, but that damage within specific brain regions may play a role
Table 1. Neuroimaging findings of emotional processing biases in major depressive disorder

Authors | Sample | Method | Main results and key regions

Resting state/no stimulation
Mayberg et al. (1999) | 8 P | PET | After remission: ↑ right DPFC, IP, dorsal ACG, PC, pons, left PFC; ↓ subgenual ACG, ventral/middle/posterior insula, HC, hypothalamus
Abercrombie et al. (1998) | 27 P, 24 C | PET | Metabolism in AMY correlated with negative affect
Ho et al. (1996) | 10 P, 12 C | PET | P ↑ PC, AMY, HC, OC, TC; P ↓ OFC, ACG, BG
Videbech et al. (2002) | 42 P, 47 C | PET | P ↑ HC, ACG, cerebellum, BG; rCBF in HC correlated with HAMD

Mood induction
Pardo et al. (1993) | 11 P, 11 C | PET | Sadness: ↑ infFG, OFC
George et al. (1995) | 8 C | PET | Sadness: ↑ right ACG, bilateral infFG
Mayberg et al. (1999) | | PET | Transient sadness: ↑ subgenual ACG, insula, CV, right premotC; ↓ right DPFC, IP, dorsal ACG, PC, right TC, bilateral infFG
Liotti et al. (2002) | 17 P (10 RP), 8 C | PET | Transient sadness: P & C ↓ PAR, PC, posterior infTG; ↑ insula, CV, motC/premotC. P ↓ medial OFC, anterior TH; RP ↓ pregenual ACG; C ↓ right PFC, ↑ subgenual ACG; RP more similar to acute P than to C

Stimulation with faces
Thomas et al. (2001) | 5 P, 5 C | fMRI | Fearful faces: P children ↓ AMY
Sheline et al. (2001) | 11 P, 11 C | fMRI, ROI | Masked emotional faces: pretreatment P ↑ left AMY (sad faces ↑↑); posttreatment: P Ø, H Ø
Fu et al. (2004) | 19 P, 19 C | fMRI | Gender decision task, sad faces: pretreatment P ↑ left AMY & HC, PHG, HTH, VSTR (PUT, GP), insula, CAU, TH, dorsal CG, IP; posttreatment P ↓ left AMY, VSTR (PUT, GP), CAU, TH, dorsal CG, ACG, IP, right VSTR, TH, IP
Gotlib et al. (2005) | 18 P, 18 C | fMRI | Gender decision task, happy faces: P ↑ left subgenual ACG, left midFG, right supFG; C ↑ right infTG, left insula. Sad faces: P ↑ left infFG & subgenual ACG; C ↑ right midTG, right infFG; AMY Ø
Surguladze et al. (2005) | 16 P, 14 C | fMRI | Gender decision task; increasing happiness: C ↑ and P ↓ bilateral fusiform G, right PUT; increasing sadness: P ↑ and C ↓ right fusiform G, left PHG & AMY, left PUT. P: negative correlation of BDI and BOLD in right fusiform G
Canli et al. (2005) | 17 P | fMRI | Emotional faces: AMY correlated with symptom improvement (high AMY BOLD, low BDI at T2)

Verbal communication/verbal stimuli
Nandrino et al. (2004) | 26 P, 26 C | EEG | First-episode P: positive stimuli ↓ P300; recurrent P: negative stimuli ↑ P300
Shestyuk et al. (2005) | 15 P, 16 C | EEG | P: positive stimuli ↓ brain activity
Elliott et al. (2002) | 10 P, 11 C | fMRI | Emotional go/no-go task: relevant emotional targets: P ↑ right lateral PFC; C ↑ infFG, right ventral CG, right pulvinar, midTG, preCG, postCG. Happy targets: C ↑ rostral right ACG, right medPFC, right anterior TC, midTG, bilateral medFG. Irrelevant emotional distractors: P ↑ bilateral lateral OFC, anterior TC; sad distractors: P ↑ lateral right OFC, bilateral anterior TC
Canli et al. (2004) | 15 P, 15 C | fMRI | Lexical decision task: sad words: P ↑ left IP; C ↑ supTG, cerebellum. Happy words: P Ø; C ↑ AMY, infFG, supTG
Siegle et al. (2002) | 7 P, 11 C | fMRI | Valence identification task, negative words: P → bilateral AMY (25 s), midFG, supFG, IP, PCG; P: negative correlation AMY & DLPFC; P: AMY correlated with self-reported rumination
George et al. (1997) | 11 P (7 MDD), 11 C | PET | MD P (MDD+BP), sad vs. standard Stroop: ↑ visual cortex, cerebellum; H Ø; sad Stroop: H vs. P Ø

Note: → = sustained (prolonged) BOLD response; ↑ = significantly increased BOLD response; ↓ = significantly decreased BOLD response; Ø = no significant differences; ACG = anterior cingulate gyrus; AMY = amygdala; BG = basal ganglia; C = healthy controls; CAU = caudate; CG = cingulate gyrus; CV = cerebellar vermis; DLPFC = dorsolateral prefrontal cortex; DPFC = dorsal prefrontal cortex; FC = frontal cortex; GP = globus pallidus; GR = gyrus rectus (medial OFC); HAMD = Hamilton scale of depression; HC = hippocampus; HTH = hypothalamus; infFG = inferior frontal gyrus; IP = inferior parietal cortex; MD = mood disorder; MDD = major depressive disorder; medPFC = medial prefrontal cortex; midFG = middle frontal gyrus; midTG = middle temporal gyrus; motC = motor cortex; OC = occipital cortex; OFC = orbitofrontal cortex; P = major depressive disorder patients; PAR = parietal cortex; PC = posterior cingulate; PFC = prefrontal cortex; PHG = parahippocampal gyrus; preCG = precentral gyrus; premotC = premotor cortex; PUT = putamen; rCBF = regional cerebral blood flow; ROI = region of interest analysis; RP = remitted patients; supFG = superior frontal gyrus; supTG = superior temporal gyrus; TC = temporal cortex; TH = thalamus; VSTR = ventral striatum.
in the type of emotional response (Robinson and Lipsey, 1985). Another fact that encouraged the search for persistent morphometric impairments is that a substantial number of MDD patients suffer from sustained neuropsychological deficits (O’Brien et al., 2004). Current empirical evidence points toward traceable alterations of brain structure on several levels, ranging from gross macroscopic lesions to subtle volumetric alterations, cellular pathology, and biochemical correlates of neurogenesis disturbance.
White matter lesions (WML)

On the macroscopic level, patchy degeneration of subcortical brain parenchyma, which manifests as so-called WML on MRI, has been identified as a frequent concomitant of mood disturbances.
Although observed frequently in the aging brain (de Leeuw et al., 2001), the number and size of WML are also associated with the development of neuropsychiatric impairments, especially depression (Steffens et al., 1999). While the precise pathogenesis of WML remains to be elucidated, vascular risk factors such as hypertension, diabetes, and hypercholesterolemia clearly promote the development of gliosis and demyelination due to chronic focal ischemia (Taylor et al., 2003a). On the neural network level, the lesions are thought to disrupt neural pathways involved in regular affective and cognitive information processing, especially within the prefrontal lobe. According to an early study by Coffey et al. (1989), only 14% of aged healthy subjects but 55% of patients with late-life depression exhibit large and confluent WML. The severity of vascular risk factors and associated
WML is of clinical significance, as both predict the development of delusional states and persistent cognitive deficits in geriatric depression (O'Brien et al., 1997). A recent longitudinal study by Taylor and co-workers demonstrated a poorer therapeutic outcome for MDD patients with a pronounced progression of WML (Taylor et al., 2003b). The development and prognosis of MDD are thus crucially influenced by the focal destruction of brain parenchyma; however, the clear age dependence and the lack of a correlation with other established brain volumetric alterations suggest that WML constitute an independent vascular risk factor for depressive states in elderly MDD patients (Janssen et al., 2004).
Subtle morphometric changes

In contrast to classical neurological diseases, the brain alterations evidenced in MDD, if any, are very subtle; i.e., the pathophysiological information arises primarily from group averaging and not from the analysis of an individual data set (as is possible, e.g., in the case of WML). All morphometric MRI approaches are based on the acquisition of precise anatomical images of the brain using high-resolution 3D sequences (e.g., MPRAGE, FLASH 3D). Global segmentation algorithms provide only general information about tissue volumes without regional specificity (i.e., total gray matter, white matter, and cerebrospinal fluid). Two recent studies employed this approach; one reported a negative association between illness duration and cerebral gray matter volume in female MDD (Lampe et al., 2003), while the other indicated a poorer clinical outcome for patients with enlarged cerebrospinal fluid spaces (Cardoner et al., 2003). Voxel-based morphometry (VBM), in contrast, is a fully automated technique that allows the unbiased examination of the whole brain on a voxel-by-voxel basis (a minimal sketch of such a voxel-wise group comparison is given at the end of this section). Interestingly, only one VBM study has been published in MDD research so far (Taki et al., 2005), reporting a bilateral gray matter reduction of the medial prefrontal lobe in elderly males with subthreshold depression. Our own preliminary VBM analysis of 10 MDD patients compared to 10 matched healthy controls shows a
reduction of the gray matter in both amygdalae for the MDD patients. The vast majority of morphometric studies in MDD examined preselected regions of interest (ROI) that had been characterized as "dysfunctional" in previous functional neuroimaging experiments. As outlined above, especially the emotional and cognitive processing modules of the medial temporal and prefrontal lobe have been identified as functional key regions. About half of these studies focused on the volumetric analysis of the hippocampus (HC) formation (see Table 2), whose volume is a reliable predictor of memory dysfunction in MDD (O'Brien et al., 2004; Hickie et al., 2005). Significant reductions in HC volume were repeatedly reported for both early- and late-onset MDD (Bremner et al., 2000; Frodl et al., 2004; Janssen et al., 2004; Hickie et al., 2005). The precise impact of these findings is still under debate, e.g., the question whether the whole HC formation is affected or only parts of it (Neumeister et al., 2005), or whether these alterations are best described as volume reductions or shape deformations (Posener et al., 2003). It has been shown that stress, whether environmental or social, actually changes the shape, size, and number of neurons in the hippocampus, and it is hypothesized that the manifold mechanisms of antidepressive action follow a final common pathway, i.e., a final induction of specific alterations of neuroadaptation in specific brain regions. In such a modern disease model, behavioral depressive changes relate in part to alterations in hippocampal function, which are thought to be induced through activation of cyclic AMP response element-binding protein (CREB) and neurotrophins such as brain-derived neurotrophic factor (BDNF) (McEwen, 2000; Duman, 2002a). Other ROI-based analyses focused on the amygdala and the ventral prefrontal cortex. In the human brain, both areas are substantially involved in the emotional evaluation of sensory stimuli and the generation of appropriate behavioral and endocrinological responses. In the case of the amygdala, a substantial number of studies reported significant volume decreases in MDD (Frodl et al., 2004; Hastings et al., 2004; Rosso et al., 2005), a
Table 2. Brain morphometric findings in major depressive disorder

Authors | Sample | Method | Main result | Key region

Prefrontal cortex
Taki et al. (2005) | 34 P, 109 C | OVBM | Aged male P with subthreshold depression: MedPFC GM ↓, precentral gyrus GM ↓ | MedPFC
Coryell et al. (2005) | 10 P, 10 C | ROI | P with psychotic features: left posterior subgenual ACG ↓; subgroup with ACG ↑ at 4-year follow-up | sACG
Lacerda et al. (2004) | 31 P, 34 C | ROI | P: left medial and right lateral OFC GM ↓; gender effect: medial OFC GM ↓ in male P | OFC
Ballmaier et al. (2004) | 24 P, 19 C | SEG, ROI | Aged P: ACG GM ↓ WM ↓ CSF ↑; GR GM ↓ WM ↓ CSF ↑; OFC GM ↓ CSF ↓; DLPFC +, precentral gyrus + | ACG, OFC, GR
Almeida et al. (2003) | 51 P, 37 C | ROI | Right FC ↓ in late-onset P compared to early-onset P and C; lacking correlation with cognitive performance | Total FC
Steffens et al. (2003) | 30 P, 40 C | ROI | Aged P: left OFC volume predicts Benton Visual Retention Test performance | OFC
Taylor et al. (2003c) | 41 P, 40 C | ROI | Aged P: OFC ↓ predicts impairment in activities of daily living | OFC
Steingard et al. (2002) | 19 P, 38 C | SEG, ROI | Adolescent P: whole brain volume ↓, FC WM ↓, FC GM ↑ | Total FC
Bremner et al. (2002) | 15 P, 20 C | ROI | Remitted P: OFC ↓, subgenual ACG +, dorsal ACG + | OFC
Nolan et al. (2002) | 22 P, 22 C | ROI | Left PFC ↑ in pediatric P with nonfamilial MDD compared to familial MDD and C; familial MDD: PFC + compared to C | PFC
Botteron et al. (2002) | 48 P, 17 C | ROI | Female P with early onset: left subgenual ACG ↓ | sACG

Medial temporal lobe
Neumeister et al. (2005) | 31 P, 57 C | ROI | Anterior HC ↓, posterior HC + | HC
Taylor et al. (2005) | 135 P, 83 C | ROI | HC ↓ in late-onset P homozygous for the L allele of the serotonin transporter gene 5-HTTLPR | HC
Hickie et al. (2005) | 66 P, 20 C | ROI | HC ↓ in early- and late-onset P; association with deficient visual and verbal memory performance | HC
Hastings et al. (2004) | 18 P, 18 C | ROI | Female P: AMY ↓ compared to female C; male P: left subgenual ACG ↓ compared to female P; HC +; OFC + | AMY
Rosso et al. (2005) | 20 P, 24 C | ROI | Pediatric P: AMY ↓; HC + | AMY
Janssen et al. (2004) | 28 P, 41 C | ROI | Female early-onset P: right HC ↓; no association with subcortical white matter lesions; PHG +, OFC + | HC
O'Brien et al. (2004) | 61 P, 40 C | ROI | Aged P: hypercortisolemia, right HC ↓, association with persisting mild cognitive impairment | HC
Frodl et al. (2004) | 30 P, 30 C | ROI | Longitudinal study: nonremitted patients at 1-year follow-up exhibited AMY ↓ and HC ↓ at baseline | HC, AMY
Posener et al. (2003) | 27 P, 42 C | ROI | HC +, but highly significant differences in HC shape, especially in the subiculum | HC
MacMillan et al. (2003) | 23 P, 23 C | ROI | Drug-naive pediatric P: significantly increased AMY/HC volume ratio compared to C; association with anxiety severity | HC
Vythilingam et al. (2002) | 32 P, 14 C | ROI | Influence of environmental factors: left HC ↓ in female P with childhood trauma compared to nontraumatized P and C | HC
Bremner et al. (2000) | 16 P, 16 C | ROI | P: left HC ↓; AMY +, CAU +, FC +, TC + | HC, AMY

Basal ganglia
Lacerda et al. (2003) | 25 P, 48 C | ROI | CAU +, PUT +, GP +; significant GP asymmetry decrease; GP/PUT volume: association with clinical parameters | GP
Naismith et al. (2002) | 47 P, 20 C | ROI | Psychomotor slowing in aged P is predicted by CAU ↓ and methylenetetrahydrofolate reductase genotype | CAU

Treatment effects
Lekwauwa et al. (2005) | 25 P | ROI | Pre-post ECT: smaller right HC volumes predict better treatment outcome | HC
Lavretsky et al. (2005) | 41 P, 41 C | ROI | OFC GM ↑ in P treated with antidepressants compared to drug-naive P; OFC GM ↓ in both P groups compared to controls | OFC
Vythilingam et al. (2004) | 38 P, 33 C | ROI | Pre-post SSRI in medication-free P: HC + compared to C at baseline; HC + after successful SSRI treatment | HC
Sheline et al. (2003) | 38 P | ROI | Female P: length of untreated MDD episodes predicts smaller HC GM volume (antidepressants possibly neuroprotective) | HC
Hsieh et al. (2002) | 60 P | ROI | Pre-post antidepressants: smaller total/right HC volumes predict poorer treatment outcome | HC
Vakili et al. (2000) | 38 P, 20 C | ROI | Female fluoxetine responders: right HC ↑ compared to nonresponders; whole P group: HC + compared to C | HC

Notes: P = major depressive disorder patients; C = healthy controls; ROI = region of interest analysis (manual tracing); SEG = global segmentation into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF); OVBM = optimized voxel-based morphometry; ↑ = significant volumetric increase; ↓ = significant volumetric decrease; + = no significant volumetric differences; MDD = major depressive disorder; FC = frontal cortex; PFC = prefrontal cortex; TC = temporal cortex; DLPFC = dorsolateral prefrontal cortex; ACG = anterior cingulate gyrus; sACG = subgenual anterior cingulate gyrus; MedPFC = medial prefrontal cortex; OFC = orbitofrontal cortex; GR = gyrus rectus (medial OFC); AMY = amygdala; HC = hippocampus; PHG = parahippocampal gyrus; PUT = putamen; CAU = caudate; GP = globus pallidus; SSRI = selective serotonin reuptake inhibitor; ECT = electroconvulsive therapy.
finding that could be replicated in our own VBM morphometric study (see Fig. 1). Within the prefrontal cortex, most volumetric analyses targeted the key regions of cortical mood regulation, the OFC and the subgenual anterior cingulate cortex (sACG). For both regions, a significant gray matter decrease was reported in early-onset and aged
MDD patients (Ballmaier et al., 2004; Lacerda et al., 2004; Coryell et al., 2005). On the cognitive-behavioral level, the structural integrity of the OFC proved to be of particular importance, as it predicted deficits in visuospatial memory (Steffens et al., 2003) and impairments in activities of daily living (Taylor et al., 2003c) associated with MDD.
Fig. 1. Results of the voxel-based morphometry analysis. Regions with significantly more gray matter volume in 10 healthy subjects compared to 10 depressed patients.
In summary, especially the fine structural anomalies of the HC, amygdala, and OFC are regarded as stable correlates of the disorder, as they were also evidenced in unmedicated patients (Bremner et al., 2002; Lacerda et al., 2004; Neumeister et al., 2005), populations at risk (Omura et al., 2005; Taki et al., 2005), and pediatric samples (Steingard et al., 2002; Rosso et al., 2005). In line with established diathesis-stress models, the formation of morphometric alterations seems to be facilitated both by genetic vulnerability factors (e.g., familial MDD, Nolan et al., 2002, and the 5-HTTLPR serotonin transporter genotype, Taylor et al., 2005) and by environmental stressors (e.g., childhood trauma, Vythilingam et al., 2002). On the microscopic level, the findings are in good accordance with the known cellular and neurochemical alterations associated with MDD, i.e., anomalies in the density and size of neuronal and glial cells (Cotter et al., 2002; Hamidi et al., 2004), neurotrophin-induced disturbances of cellular plasticity (Duman, 2002b), and neurogenesis (Kempermann and Kronenberg, 2003). In conclusion, consistent functional and structural imaging findings in MDD reported changes predominantly in the emotionally relevant networks, including the prefrontal cortex and the limbic system with the hippocampus and the amygdala.
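To make the voxel-wise group comparison underlying VBM analyses such as the one in Fig. 1 concrete, the following is a minimal illustrative sketch in Python. It is not the pipeline used in the studies reviewed above: it assumes gray matter probability maps that have already been segmented, spatially normalized, and smoothed; the file names, group sizes, and threshold are hypothetical; and a real VBM study would correct for multiple comparisons (e.g., random field theory or permutation tests).

```python
# Minimal sketch of a VBM-style voxel-wise gray matter comparison.
# Assumes preprocessed (segmented, normalized, smoothed) gray matter
# probability maps; file names are hypothetical placeholders.
import numpy as np
import nibabel as nib          # common library for reading NIfTI images
from scipy import stats

patient_maps = [f"patient_{i:02d}_gm.nii.gz" for i in range(1, 11)]
control_maps = [f"control_{i:02d}_gm.nii.gz" for i in range(1, 11)]

def load_group(paths):
    # Stack subjects into one array of shape (n_subjects, x, y, z).
    return np.stack([nib.load(p).get_fdata() for p in paths])

gm_controls = load_group(control_maps)
gm_patients = load_group(patient_maps)

# Two-sample t-test at every voxel (axis 0 indexes subjects).
t_map, p_map = stats.ttest_ind(gm_controls, gm_patients, axis=0)

# Voxels with more gray matter in controls than in patients, at an
# uncorrected threshold chosen purely for illustration.
significant = (p_map < 0.001) & (t_map > 0)
print(f"{int(significant.sum())} voxels exceed the uncorrected threshold")
```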
Magnetic resonance spectroscopy (MRS) and MDD

There appears to be a strong relationship between neurotransmitter levels in the brain and clinical
depression. Antidepressant medications work for many patients, but there is no absolute certainty about the actual relationship between neurotransmitters and depression. The effects of neurotransmitters are extremely difficult to study in vivo. Neurotransmitters are present in very small quantities, they are only available in certain locations within the brain, and they disappear very quickly once they are used. Because they are removed so quickly, they cannot be measured directly. What can be measured are the so-called metabolites, the substances remaining in the brain after neurotransmitters have been used. By measuring these metabolites, an understanding of the effects of changes in neurotransmitters in the brain can be gained. Proton magnetic resonance spectroscopy (1H MRS) allows quantitative and noninvasive access to a number of metabolites in different brain regions in vivo. MRS is a noninvasive technique that exhibits relatively high spatial resolution and requires neither radioactive tracers nor ionizing radiation. Since metabolite concentrations are more than 1000-fold smaller than the tissue water concentration, spatial and temporal resolution is proportionally lower for MRS compared to MRI applications. There are hundreds of metabolites produced by the human brain, but only some can reliably be detected and quantified using 1H MRS. The significant neurometabolites that have been measured in patients with major depression are as follows: glutamate, glutamine, and GABA, N-acetylaspartate (NAA), choline-containing compounds (Cho), creatine and phosphocreatine (tCr),
and myo-inositol (mI). In the past, most investigators have expressed MRS results in terms of peak ratios, with the resulting ambiguity of whether one metabolite is increased or the other decreased. Although quantitative MRS would make intersubject comparisons more rigorous, no generally accepted standard quantitation method for in vivo data exists. This is primarily due to the difficulty of computing absolute metabolite concentrations from MR signal intensities, which includes corrections for coil loading, B1-field inhomogeneity, and other factors. By themselves, however, these corrections do not provide a measurement of concentration (e.g., in mmol/L) and thus cannot be used for direct comparisons of data acquired on different instruments or at other laboratories. Therefore, most studies that report metabolite values rather than ratios report "arbitrary units" (a.u.) or "institutional units" (i.u.) and use the term "semiquantitative" measure. Several studies have shown that metabolite concentrations vary across brain regions. The concentrations differ between gray matter (GM)
and white matter (WM) and are negligible in CSF (Wang and Zimmerman, 1998; Hetherington et al., 2001). Cortical gray matter ranges from about 3 to 6 mm in thickness, and the orientation of this thickness varies (Noworolski et al., 1999). With typical voxel sizes of 1–4 cm3 for 1H MRS, most spectroscopic voxels consist of a mixture of different tissues and CSF, which makes the comparison of measured metabolite concentrations between subjects difficult. Therefore, regardless of the method applied for MRS data acquisition and evaluation, the tissue composition of the voxel from which the spectrum is obtained plays an important role (Schuff et al., 1997), especially for the detection of discrete abnormalities in MR-detectable metabolites in MDD patients; yet, many previous MRS studies did not take this influence into account (a simple correction for the CSF content of a voxel is sketched below). In the following, an introduction to the MR-observable metabolite resonances will be given. Fig. 2 illustrates key brain regions for emotional processing and MDD that have been investigated with 1H MR spectroscopy.
Fig. 2. Key brain regions for emotional processing and MDD that have been investigated with 1H MR spectroscopy.
The results of MRS studies in several key regions for emotional processing in patients suffering from major depressive episodes will be presented and discussed below; a summary is given in Table 3.
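Before turning to the individual metabolites, the voxel tissue composition issue raised above can be illustrated with a simple and commonly used correction: because metabolite concentrations in CSF are negligible, an apparent concentration measured in a mixed voxel can be rescaled to the tissue compartment by dividing by the non-CSF fraction. The sketch below is a minimal version of this idea; the function name and the numbers are hypothetical, and it deliberately ignores refinements such as separate GM/WM water contents and relaxation corrections.

```python
# Minimal sketch of a CSF partial-volume correction for a spectroscopic
# voxel. Assumes the CSF fraction comes from a coregistered, segmented
# anatomical image; metabolite levels in CSF are treated as zero.

def csf_corrected(apparent_conc: float, f_csf: float) -> float:
    """Rescale an apparent metabolite concentration to the tissue part."""
    if not 0.0 <= f_csf < 1.0:
        raise ValueError("CSF fraction must lie in [0, 1)")
    return apparent_conc / (1.0 - f_csf)

# Hypothetical example: a signal of 7.2 i.u. from a voxel that is 20% CSF.
print(csf_corrected(7.2, 0.20))  # -> 9.0 i.u. referred to brain tissue
```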
MR-detectable brain metabolites that have been found altered in MDD
N-acetylaspartate (NAA)

NAA, which is found solely in the brain, is present in both gray and white matter. It was identified as a neuronal marker in early histological studies and serves as a surrogate marker of neuronal and axonal functioning and integrity. The NAA resonance is typically the tallest peak in the normal brain spectrum. Since its decrease or disappearance can be due to either cell death or axonal injury, it is considered a measure of neural tissue viability and/or function.
Glutamate, glutamine, and GABA

Glutamate is the major excitatory, and GABA the major inhibitory, neurotransmitter in the human brain. Both glutamate and GABA are linked to metabolism through a neurotransmitter cycle between neurons and glia. In this cycle, neurotransmitter molecules released by the neurons are taken up by transporters in surrounding glial cells. In the glia they are converted to glutamine, which is released to the neuron, where it is used for the resynthesis of the neurotransmitter. Glutamate is present in the brain at an even higher concentration than N-acetylaspartate, though in practice glutamate, glutamine, and GABA signals are barely detectable using clinical MR scanners. The MR sensitivity for their detection is poor because their signal intensity is spread over a large number of closely spaced multiplet resonances and because overlapping resonances cancel owing to phase differences at longer echo times (TE). In order to detect these resonances at a field strength of 1.5 T, pulse sequences with a short TE under 30 ms have to be employed. As a rule, spectra acquired at short TE have an increased signal-to-noise ratio compared to longer-TE spectra and may allow for the quantification of a variety of overlapping signals. At a field strength of 1.5 T, attempts have been made to quantify the overlapping signals of glutamate, glutamine, and GABA at the 2.3 ppm position, labeled as Glx. The origin of the glutamate signal (e.g., intra- vs. extracellular) in 1H MRS brain spectra cannot be subclassified; all mobile molecules contribute equally to the observed glutamate resonance. An example of a brain 1H MR spectrum acquired at 1.5 T with a (relatively) short TE of 30 ms is shown in Fig. 3. GABA is synthesized primarily from the glutamate precursor in GABAergic neurons. About 15% of the brain's energy consumption is GABA related (Shulman et al., 2004). The in vivo detection of GABA via 1H MRS requires special editing methods, pioneered by Rothman and colleagues at Yale (Rothman et al., 1993).
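Why these resonances overlap at 1.5 T, and why higher fields help, follows from a simple relation: the separation of two resonances in Hz scales linearly with field strength, whereas typical in vivo linewidths (in Hz) grow more slowly. The sketch below illustrates this with approximate chemical-shift positions; it is a deliberate simplification that ignores the J-coupling and multiplet structure that dominate the real glutamate and glutamine spectra.

```python
# Back-of-the-envelope illustration: chemical-shift separation in Hz
# scales with field strength. Shift values are approximate; J-coupling
# and multiplet structure are ignored for simplicity.
GAMMA_MHZ_PER_T = 42.577  # proton gyromagnetic ratio / (2*pi), MHz per tesla

def separation_hz(delta_ppm: float, field_t: float) -> float:
    # A ppm difference times the proton frequency in MHz gives Hz.
    return delta_ppm * GAMMA_MHZ_PER_T * field_t

glu_ppm, gln_ppm = 2.35, 2.45  # approximate positions in the Glx region
for b0 in (1.5, 3.0):
    print(f"{b0} T: Glu-Gln separation ~ "
          f"{separation_hz(gln_ppm - glu_ppm, b0):.1f} Hz")
# ~6.4 Hz at 1.5 T vs. ~12.8 Hz at 3 T, compared with in vivo linewidths
# of roughly the same order of magnitude.
```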
Choline-containing compounds (Cho)

The MR-detectable Cho signal represents the trimethyl ammonium resonance of a variety of choline-containing compounds: acetylcholine (ACh), phosphocholine (PC), glycerophosphocholine (GPC), and free choline. Most of the signal arises from PC and GPC; free choline contributes less than 5%, and the contribution from ACh is negligible (Boulanger et al., 2000). An increased Cho signal most likely reflects an increase in membrane turnover. Phosphatidylcholine, the major choline-containing metabolite of the normal brain, is MR-invisible in myelin, cell membranes, and other brain lipids under normal circumstances. However, under certain conditions, visible choline may be released from this pool (Danielsen and Ross, 1999). In proton MR spectroscopy, the (non-membrane-bound) PC and GPC cannot be distinguished; both are detected within one choline peak.

Creatine and phosphocreatine (tCr)

In normal brain metabolism, phosphocreatine supplies phosphate to adenosine diphosphate (ADP), resulting in the production of an adenosine triphosphate (ATP) molecule and the release of creatine. Thus, total creatine (creatine plus phosphocreatine, tCr) should be a reliable marker of brain metabolism. Creatine, phosphocreatine, and their main precursor, guanidinoacetate, are synthesized in extracerebral tissues (primarily liver and kidney) and then transported to the brain. Therefore, any metabolic defect resulting in decreased production of creatine/phosphocreatine (e.g., hepatic or renal diseases) will lower the tCr peak. The tCr signal has been used for gauging metabolite ratios like NAA/tCr in countless studies. However, tCr represents an important buffer capacity in the energy metabolism of the cell and cannot be considered 100% stable a priori. Recent studies show a brain activity-dependent change in tCr signal intensity (Ke et al., 2002). That tCr is not always useful as a reference compound was already suggested more than a decade ago (Ross and Michaelis, 1994).
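The ambiguity of ratio measures noted earlier, and the caveat about tCr as an internal reference, can be made concrete with a toy calculation: the same apparent drop in NAA/tCr arises whether NAA decreases or tCr increases (as reported, e.g., by Gruber et al., 2003). All numbers below are hypothetical illustration values, not measured concentrations.

```python
# Toy illustration of why metabolite ratios are ambiguous: an identical
# NAA/tCr decrease can reflect lower NAA or higher tCr. Values are
# hypothetical "institutional units", not data from any study.
scenarios = {
    "baseline":      {"NAA": 10.0, "tCr": 8.00},
    "NAA decreased": {"NAA": 9.0,  "tCr": 8.00},
    "tCr increased": {"NAA": 10.0, "tCr": 8.89},
}
for name, m in scenarios.items():
    print(f"{name}: NAA/tCr = {m['NAA'] / m['tCr']:.2f}")
# The two pathological scenarios yield the same ratio (~1.12), although
# the underlying biochemistry would be entirely different.
```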
Table 3. Proton MR spectroscopic findings in major depressive disorder

Authors (by brain region) | Sample | Method | Main result: abnormal metabolite levels | Key region

Prefrontal cortex
Frey et al. (1998) | 22 P, 22 C | SVS, TE = 55 ms, TR = 3.5 s | Decreased m-Ino/tCr | Right frontal lobe
Steingard et al. (2000) | 17 P, 22 C | SVS, TE = 30 ms, TR = 2.5 s | Increased Cho | Left anterior medial frontal lobe
Farchione et al. (2002) | 11 aP, 11 C | 2D MRSI, TE = 272 ms, TR = 2.3 s | Increased Cho/NAA | Left DLPFC
Kumar et al. (2002) | 20 P, 18 C | SVS, TE = 30 ms, TR = 3 s | Increased Cho/tCr, increased m-Ino/tCr | Left DLPFC
Michael et al. (2003a) | 12 P, 12 C | SVS, TE = 20 ms, TR = 2.5 s | Decreased Glx | Left DLPFC
Grachev et al. (2003) | 10 P, 10 C | SVS, TE = 30 ms, TR = 1.5 s | Decreased NAA/tCr | Right DLPFC
Gruber et al. (2003) | 17 P, 17 C | SVS, TE = 20 ms, TR = 6 s | Increased Cr | Left DLPFC
Caetano et al. (2005) | 14 aP, 22 C | SVS, TE = 30 ms, TR = 6 s | Decreased Cho, increased m-Ino | Left DLPFC
Brambilla et al. (2005) | 9 cP, 10 ncP, 19 C | SVS, TE = 20 ms, TR = 3 s | Decreased NAA/Cho and NAA/tCr in cP | Left DLPFC
Coupland et al. (2005) | 13 P, 13 C | SVS, TE = 28 ms, TR = 3 s | Decreased m-Ino/tCr | Pregenual anterior cingulate/MedPFC

Prefrontal cortex – therapy response
Gonul et al. (2006) | 20 P, 18 C | SVS | Increase of NAA/tCr posttreatment | Left medial frontal cortex

Hippocampus/temporal lobe
Blasi et al. (2004) | 17 P, 17 C | 2D MRSI, TE = 272 ms, TR = 2.2 s | Decreased NAA/tCr | Hippocampus
Ende et al. (2000) | 17 P, 24 C | 2D MRSI, TE = 20 ms, TR = 1.8 s | Decreased Cho | Hippocampus
Kusumakar et al. (2001) | 11 aP, 11 C | 2D MRSI, TE = 135 ms, TR = 1.5 s | Decreased Cho/tCr | Left amygdala
Ende et al. (in press) | 8 P, 8 C | 2D MRSI, TE = 135 ms, TR = 1.5 s | Decreased Cho | Hippocampus
Michael et al. (2003b) | 13 P, 28 C | SVS, TE = 20 ms, TR = 2.5 s | Decreased Glx | Hippocampus/amygdala

Basal ganglia
Charles et al. (1994) | 7 P, 7 C | SVS | Increased Cho/tCr | Thalamus, putamen, and white matter
Renshaw et al. (1997) | 41 P, 22 C | SVS | Decreased Cho/tCr | Caudate and putamen
Vythilingam et al. (2003) | 8 P, 12 C | 2D MRSI, TE = 20 ms, TR = 1.5 s | Decreased NAA/tCr | Caudate
Vythilingam et al. (2003) | 17 P, 17 C | 2D MRSI, TE = 20 ms, TR = 1.5 s | Increased Cho/tCr | Putamen
Ende et al. (in press) | 8 P, 6 C | 2D MRSI, TE = 135 ms, TR = 1.5 s | Increased Cho | Putamen

Basal ganglia – therapy response
Charles et al. (1994) | 7 P, 7 C | SVS | Increase of NAA/Cho, decrease of Cho/tCr | Thalamus, putamen, and white matter
Sonawalla et al. (1999) | 15 P: 8 R, 7 NR | SVS | Increase of Cho/tCr in R | Caudate and putamen

Anterior cingulate
Auer et al. (2000) | 19 P, 18 C | SVS, TE = 35 ms, TR = 2 s | Decreased Glx | Anterior cingulate
Pfleiderer et al. (2003) | 17 P, 17 C | SVS, TE = 20 ms, TR = 2.5 s | Decreased Glx | Left anterior cingulate
Rosenberg et al. (2005) | 14 aP, 14 C | SVS, short TE | Decreased Glx | Anterior cingulate

Notes: P = major depressive disorder patients; aP = adolescent major depressive disorder patients; cP = chronic major depressive disorder patients; ncP = nonchronic major depressive disorder patients; C = healthy controls; R = therapy responders; NR = nonresponders; DLPFC = dorsolateral prefrontal cortex; ACG = anterior cingulate gyrus; MedPFC = medial prefrontal cortex; NAA = N-acetylaspartate; Cho = choline-containing compounds; tCr = total creatine (creatine and phosphocreatine); m-Ino = myo-inositol; Glx = glutamate and glutamine; SVS = single voxel spectroscopy; MRSI = MR spectroscopic imaging; TE = echo time; TR = repetition time.
Fig. 3. Example of a brain 1H MR spectrum acquired at 1.5 T with a (relatively) short TE of 30 ms.
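For readers unfamiliar with how a spectrum such as the one in Fig. 3 is turned into the semiquantitative metabolite values discussed above, the following toy fit conveys the principle: model lines at approximate chemical-shift positions are fitted to the spectrum by least squares, and the fitted amplitudes stand in for metabolite signals. The fitting packages used in the cited studies are far more sophisticated (full metabolite basis sets, baseline and lineshape handling); here both the model and the synthetic "data" are invented purely for illustration.

```python
# Toy quantification of a 1H MR spectrum: least-squares fit of
# Lorentzian lines at approximate metabolite positions. Illustrative
# only; the synthetic spectrum and all parameters are invented.
import numpy as np
from scipy.optimize import curve_fit

ppm = np.linspace(0.5, 4.0, 2000)
POS = {"NAA": 2.02, "tCr": 3.03, "Cho": 3.22}   # approximate positions

def lorentzian(x, amp, x0, width):
    return amp * width**2 / ((x - x0) ** 2 + width**2)

def spectrum(x, a_naa, a_tcr, a_cho, width):
    return (lorentzian(x, a_naa, POS["NAA"], width)
            + lorentzian(x, a_tcr, POS["tCr"], width)
            + lorentzian(x, a_cho, POS["Cho"], width))

rng = np.random.default_rng(0)
true_params = (10.0, 8.0, 1.5, 0.03)             # hypothetical amplitudes
data = spectrum(ppm, *true_params) + rng.normal(0.0, 0.05, ppm.size)

fitted, _ = curve_fit(spectrum, ppm, data, p0=(5.0, 5.0, 1.0, 0.05))
for name, amp in zip(POS, fitted[:3]):
    print(f"{name}: fitted amplitude {amp:.2f}")
```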
Myo-inositol (mI)

mI is the major nutritionally active form of inositol and is vital to many biological processes of the body, participating in a diverse range of activities. Myo-inositol is one of nine distinct isomers of inositol. It is essential for the growth of rodents, but not for most other animals, including humans. In humans, mI is made endogenously from glucose. The dietary intake of mI can influence the levels of circulating and bound mI in the body. The specific importance of mI is based on the fact that its lipid conjugates are directly involved in the inositol trisphosphate (IP3) second messenger pathway. The "inositol depletion hypothesis" explains the mechanism of action of lithium in bipolar disorders (Harwood, 2005).
MRS results in key regions for emotional processing and MDD

Prefrontal cortex

The prefrontal cortex is the anterior part of the frontal lobes of the brain. It comprises the dorsolateral, ventrolateral, orbitofrontal, and mesial prefrontal areas, cortical key regions of cognition and mood regulation. These brain regions have been implicated in planning complex cognitive behaviors, personality expression, moderating correct social behavior, and the regulation of emotion. The prefrontal cortex may play an important role in delayed gratification by maintaining emotions over time and organizing human behavior toward specific goals. As outlined above and summarized in Tables 1 and 2, functional as well as morphological alterations have been found in these brain regions in MDD. It is thus not surprising that the majority of MRS studies in MDD targeted the prefrontal lobe. Although the results are not always consistent, there is evidence that the Cho, mI, and tCr signals are abnormal in MDD patients. Some of the seemingly divergent results are possibly explained by methodological differences, e.g., in the voxel sizes, voxel tissue composition, and quantitation methods used. The observation of an increased absolute tCr signal by Gruber et al. (2003) could explain why the studies by Frey et al. (1998) and Coupland et al. (2005) found decreased mI/tCr, whereas an increased mI signal was found by Caetano et al. (2005). Additionally, the finding of an increased tCr by Gruber et al. (2003) gives further reason to be cautious with the use of tCr as an internal reference.
Temporal lobe, hippocampus, and amygdala

Another target region for emotional processing is the temporal lobe. Although the amygdala plays a particularly important role for all kinds of emotional processing, the hippocampus and other parts of the temporal lobe have also been found to be morphometrically abnormal and abnormally activated in the brain's response to emotional stimuli in MDD patients (see Tables 1 and 2). MRS studies have targeted the hippocampus and the amygdala, and a decreased Cho signal is the most frequently reported abnormality in MDD (Ende et al., 2000; Kusumakar et al., 2001; Ende et al., in press). Additionally, a decreased NAA/tCr ratio (Blasi et al., 2004) and a decreased Glx signal (Michael et al., 2003b) have been found.

Basal ganglia

The basal ganglia comprise a number of subcortical structures that are vital for the modulation of behavioral patterns, emotional responses, and executive cognitive functions. Part of the basal ganglia is the striatum, consisting of the caudate nucleus and the putamen. Resting-state metabolism in the basal ganglia measured with PET has been found altered in MDD patients (Ho et al., 1996; Videbech et al., 2002). Functional magnetic resonance imaging (fMRI) studies using emotional stimulation with faces detected significantly different BOLD responses in the putamen and the caudate nuclei of MDD patients as compared to healthy subjects (Fu et al., 2004; Surguladze et al., 2005). The most concordant MRS finding in this brain region in MDD is an increased Cho signal or Cho/tCr ratio, respectively. Nevertheless, one study also found a decrease in the Cho/tCr ratio (Renshaw et al., 1997). There are two MRS studies in the basal ganglia reporting metabolite changes accompanying therapy response: in the early study by Charles et al. (1994), an increase of NAA/Cho and a decrease of Cho/tCr in response to antidepressant therapy were observed, whereas Sonawalla et al. (1999) found an increase of Cho/tCr in therapy responders.
Anterior cingulate gyrus
The ACG is located bilaterally in the medial wall of the frontal lobes below the cingulate sulcus. It is vital to cognitive functions such as reward anticipation, decision making, empathy, and emotion, and may be particularly important with regard to conscious subjective emotional awareness in humans. Functional and morphometrical studies reported alterations of the ACG in
495
MDD (Tables 1 and 2). Three MRS studies have targeted the ACG and have consistently found abnormally low Glx signals in MDD compared to healthy subjects (Auer et al., 2000; Pfleiderer et al., 2003; Rosenberg et al., 2005). Since the Glx signal comprises glutamine and glutamate, the interpretation of the observed changes remains difficult. Spectral editing methods for more reliable separation of these two resonances will improve with higher magnet field strengths (3 T and more).
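Why higher field strength helps here can be illustrated with a short worked example (the 0.1 ppm separation used below is an assumed, illustrative value, not a measured one). Chemical shifts are fixed in ppm, so the absolute frequency separation of two resonances grows linearly with the field strength \(B_0\):

\[
\Delta f = \gamma' B_0 \,\Delta\delta, \qquad \gamma' \approx 42.58\ \text{MHz/T for } {}^{1}\text{H}.
\]

For a 0.1 ppm separation between overlapping glutamate and glutamine multiplets, this gives \(\Delta f \approx 6.4\) Hz at 1.5 T (proton frequency about 63.9 MHz) but \(\Delta f \approx 12.8\) Hz at 3 T (about 127.7 MHz). As long as in vivo linewidths in Hz grow more slowly than linearly with \(B_0\), the two resonances become correspondingly easier to resolve.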
Conclusions

MRS is a noninvasive tool that offers unique insights into the pathophysiology of MDD in vivo, including the ability to observe therapy response or illness progression longitudinally. To date, the number of studies attempting to put these pieces together is still limited. Nevertheless, it is striking that metabolic alterations in MDD patients appear in precisely those brain regions relevant for the communication and processing of emotions. There is evidence that the Cho signal, which reflects membrane turnover and presumably changes in synaptic plasticity, is abnormal in MDD compared to healthy subjects. Furthermore, there are concordant findings of a reduced Glx signal in the ACG in MDD. Further changes have been reported for mI, with less concordant directions of concentration change. NAA, a surrogate marker of neuronal viability and probably of synaptic plasticity, has not yet been reported to change in absolute concentration in any of the investigated brain regions in MDD. A decrease of NAA/tCr has been reported in MDD patients in the DLPFC (Grachev et al., 2003; Brambilla et al., 2005), the hippocampus (Blasi et al., 2004), and the caudate nucleus (Vythilingam et al., 2003). Furthermore, an increase of this ratio following antidepressant therapy was observed by Sonawalla et al. (1999). However, it has yet to be determined whether these changes are solely due to, or even dominated by, a concentration change of NAA, or rather by an altered tCr signal, as reported by Gruber et al. (2003) and as observed to increase in the hippocampus following electroconvulsive therapy (Ende et al., 2000). A decreased tCr signal possibly mirrors a hypofunction of energy metabolism. The observation of the GABA signal with MRS has so far been limited to occipital regions owing to the methodological limitations of GABA editing (Sanacora et al., 2004); improvements are foreseeable as the methodology develops. MRS can also be applied to animal models. This is a great advantage, since MRS is the only tool that monitors metabolic changes noninvasively in patients in vivo: most of our knowledge about the etiopathogenesis of depression comes from animal models, and MRS may be the tool of choice to bridge the gap between clinical and preclinical research in psychiatric disorders. MRS has become a complex and sophisticated neuroimaging technique that enables reliable and reproducible quantification of an increasing number of neurometabolites, but it has not yet reached maturity. With access to higher field strengths (3 T and more), spectral editing and the observation of nuclei other than protons will be facilitated, and MRS will in general gain in spatial resolution and sensitivity. Although further advances are foreseeable in the near future, the quality of MRS studies will continue to depend on the accuracy of the acquisition and quantitation methods applied, and on their application in large and well-defined patient populations, ideally in longitudinal prospective study designs that can confirm either stable or progressive brain deficits. Last but not least, with decreasing measurement times at higher field strengths it will be possible to combine advanced MR methods in new study protocols that noninvasively map brain activation, morphometry, diffusion, and metabolism, and then to correlate these findings with neuropsychological test results, psychiatric ratings, and genetic polymorphisms to further the understanding of emotional processing biases in MDD.
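As a rough sketch of why higher field strength shortens measurement times (assuming the idealized, approximately linear scaling of SNR with \(B_0\)): with signal averaging, SNR grows with the square root of the number of averages \(N\), so for a fixed target SNR the required number of averages, and hence scan time, scales as

\[
\text{SNR} \propto B_0 \sqrt{N} \quad\Longrightarrow\quad N_{\text{required}} \propto \frac{1}{B_0^{2}},
\]

and moving from 1.5 T to 3 T would in this idealization cut the required averaging time by a factor of about four, time that can instead be spent on smaller voxels or additional measurements.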
Acknowledgment

The authors thank the Heidelberg Academy of Science for grant support, Dr. Tim Wokrina for providing the brain 1H MR spectrum (Fig. 3), Sigi Walter for analyzing the MR spectra, and Matthias Ruf for providing Fig. 2.
References

Abercrombie, H.C., Schaefer, S.M., Larson, C.L., Oakes, T.R., Lindgren, K.A., Holden, J.E., Perlman, S.B., Turski, P.A., Krahn, D.D., Benca, R.M. and Davidson, R.J. (1998) Metabolic rate in the right amygdala predicts negative affect in depressed patients. Neuroreport, 9: 3301–3307. Adolphs, R., Tranel, D., Damasio, H. and Damasio, A. (1994) Impaired recognition of emotion in facial expressions following bilateral damage to the human amygdala. Nature, 372: 669–672. Almeida, O.P., Burton, E.J., Ferrier, N., McKeith, I.G. and O’Brien, J.T. (2003) Depression with late onset is associated with right frontal lobe atrophy. Psychol. Med., 33: 675–681. Auer, D.P., Putz, B., Kraft, E., Lipinski, B., Schill, J. and Holsboer, F. (2000) Reduced glutamate in the anterior cingulate cortex in depression: an in vivo proton magnetic resonance spectroscopy study. Biol. Psychiatry, 47: 305–313. Ballmaier, M., Toga, A.W., Blanton, R.E., Sowell, E.R., Lavretsky, H., Peterson, J., Pham, D. and Kumar, A. (2004) Anterior cingulate, gyrus rectus, and orbitofrontal abnormalities in elderly depressed patients: an MRI-based parcellation of the prefrontal cortex. Am. J. Psychiatry, 161: 99–108. Beck, A.T. (2005) The current state of cognitive therapy: a 40-year retrospective. Arch. Gen. Psychiatry, 62: 953–959. Beck, A.T., Rush, A.J. and Shaw, B.G.E. (1979) Cognitive Therapy of Depression. Guilford Press, New York. Bench, C.J., Friston, K.J., Brown, R.G., Scott, L.C., Frackowiak, R.S. and Dolan, R.J. (1992) The anatomy of melancholia–focal abnormalities of cerebral blood flow in major depression. Psychol. Med., 22: 607–615. Blasi, G., Bertolino, A., Brudaglio, F., Sciota, D., Altamura, M., Antonucci, N., Scarabino, T., Weinberger, D.R. and Nardini, M. (2004) Hippocampal neurochemical pathology in patients at first episode of affective psychosis: a proton magnetic resonance spectroscopic imaging study. Psychiatry Res., 131: 95–105. Botteron, K.N., Raichle, M.E., Drevets, W.C., Heath, A.C. and Todd, R.D. (2002) Volumetric reduction in left subgenual prefrontal cortex in early onset depression. Biol. Psychiatry, 51: 342–344. Boulanger, Y., Labelle, M. and Khiat, A. (2000) Role of phospholipase A(2) on the variations of the choline signal intensity observed by 1H magnetic resonance spectroscopy in brain diseases. Brain Res. Brain Res. Rev., 33: 380–389. Bradley, B.P., Mogg, K. and Lee, S.C. (1997) Attentional biases for negative information in induced and naturally occurring dysphoria. Behav. Res. Ther., 35: 911–927.
Bradley, B.P., Mogg, K. and Millar, N. (1996) Implicit memory bias in clinical and non-clinical depression. Behav. Res. Ther., 34: 865–879. Brambilla, P., Stanley, J.A., Nicoletti, M.A., Sassi, R.B., Mallinger, A.G., Frank, E., Kupfer, D.J., Keshavan, M.S. and Soares, J.C. (2005) 1H magnetic resonance spectroscopy study of dorsolateral prefrontal cortex in unipolar mood disorder patients. Psychiatry Res., 138: 131–139. Breiter, H.C., Etcoff, N.L., Whalen, P.J., Kennedy, W.A., Rauch, S.L., Buckner, R.L., Strauss, M.M., Hyman, S.E. and Rosen, B.R. (1996) Response and habituation of the human amygdala during visual processing of facial expression. Neuron, 17: 875–887. Bremner, J.D., Innis, R.B., Salomon, R.M., Staib, L.H., Ng, C.K., Miller, H.L., Bronen, R.A., Krystal, J.H., Duncan, J., Rich, D., Price, L.H., Malison, R., Dey, H., Soufer, R. and Charney, D.S. (1997) Positron emission tomography measurement of cerebral metabolic correlates of tryptophan depletion-induced depressive relapse. Arch. Gen. Psychiatry, 54: 364–374. Bremner, J.D., Narayan, M., Anderson, E.R., Staib, L.H., Miller, H.L. and Charney, D.S. (2000) Hippocampal volume reduction in major depression. Am. J. Psychiatry, 157: 115–118. Bremner, J.D., Vythilingam, M., Vermetten, E., Nazeer, A., Adil, J., Khan, S., Staib, L.H. and Charney, D.S. (2002) Reduced volume of orbitofrontal cortex in major depression. Biol. Psychiatry, 51: 273–279. Caetano, S.C., Fonseca, M., Olvera, R.L., Nicoletti, M., Hatch, J.P., Stanley, J.A., Hunter, K., Lafer, B., Pliszka, S.R. and Soares, J.C. (2005) Proton spectroscopy study of the left dorsolateral prefrontal cortex in pediatric depressed patients. Neurosci. Lett., 384: 321–326. Canli, T., Cooney, R.E., Goldin, P., Shah, M., Sivers, H., Thomason, M.E., Whitfield-Gabrieli, S., Gabrieli, J.D. and Gotlib, I.H. (2005) Amygdala reactivity to emotional faces predicts improvement in major depression. Neuroreport, 16: 1267–1270. Canli, T., Sivers, H., Thomason, M.E., Whitfield-Gabrieli, S., Gabrieli, J.D. and Gotlib, I.H. (2004) Brain activation to emotional words in depressed vs healthy subjects. Neuroreport, 15: 2585–2588. Cardoner, N., Pujol, J., Vallejo, J., Urretavizcaya, M., Deus, J., Lopez-Sala, A., Benlloch, L. and Menchon, J.M. (2003) Enlargement of brain cerebrospinal fluid spaces as a predictor of poor clinical outcome in melancholia. J. Clin. Psychiatry, 64: 691–697. Charles, H.C., Lazeyras, F., Krishnan, K.R., Boyko, O.B., Payne, M. and Moore, D. (1994) Brain choline in depression: in vivo detection of potential pharmacodynamic effects of antidepressant therapy using hydrogen localized spectroscopy. Prog. Neuropsychopharmacol. Biol. Psychiatry, 18: 1121–1127. Clark, D.M. and Teasdale, J.D. (1982) Diurnal variation in clinical depression and accessibility of memories of positive and negative experiences. J. Abnorm. Psychol., 91: 87–95.
Coffey, C.E., Figiel, G.S., Djang, W.T., Saunders, W.B. and Weiner, R.D. (1989) White matter hyperintensity on magnetic resonance imaging: clinical and neuroanatomic correlates in the depressed elderly. J. Neuropsychiatry Clin. Neurosci., 1: 135–144. Coryell, W., Nopoulos, P., Drevets, W., Wilson, T. and Andreasen, N.C. (2005) Subgenual prefrontal cortex volumes in major depressive disorder and schizophrenia: diagnostic specificity and prognostic implications. Am. J. Psychiatry, 162: 1706–1712. Cotter, D., Mackay, D., Chana, G., Beasley, C., Landau, S. and Everall, I.P. (2002) Reduced neuronal size and glial cell density in area 9 of the dorsolateral prefrontal cortex in subjects with major depressive disorder. Cereb. Cortex, 12: 386–394. Coupland, N.J., Ogilvie, C.J., Hegadoren, K.M., Seres, P., Hanstock, C.C. and Allen, P.S. (2005) Decreased prefrontal myo-inositol in major depressive disorder. Biol. Psychiatry, 57: 1526–1534. Danielsen, E.R. and Ross, B. (1999) Magnetic Resonance Spectroscopy Diagnosis of Neurological Diseases. Marcel Dekker, New York. de Leeuw, F.E., de Groot, J.C., Achten, E., Oudkerk, M., Ramos, L.M., Heijboer, R., Hofman, A., Jolles, J., van Gijn, J. and Breteler, M.M. (2001) Prevalence of cerebral white matter lesions in elderly people: a population based magnetic resonance imaging study. The Rotterdam Scan Study. J. Neurol. Neurosurg. Psychiatry, 70: 9–14. Dolan, R.J., Bench, C.J., Brown, R.G., Scott, L.C. and Frackowiak, R.S. (1994) Neuropsychological dysfunction in depression: the relationship to regional cerebral blood flow. Psychol. Med., 24: 849–857. Duman, R.S. (2002a) Pathophysiology of depression: the concept of synaptic plasticity. Eur. Psychiatry, 17(Suppl 3): 306–310. Duman, R.S. (2002b) Synaptic plasticity and mood disorders. Mol. Psychiatry, 7(Suppl 1): S29–S34. Elliott, R., Rubinsztein, J.S., Sahakian, B.J. and Dolan, R.J. (2002) The neural basis of mood-congruent processing biases in depression. Arch. Gen. Psychiatry, 59: 597–604. Ende, G., Braus, D.F., Walter, S., Weber-Fahr, W. and Henn, F.A. (2000) The hippocampus in patients treated with electroconvulsive therapy: a proton magnetic resonance spectroscopic imaging study. Arch. Gen. Psychiatry, 57: 937–943. Ende, G., Demirakca, T., Walter, S., Wokrina, T. and Henn, F.A. (in press) Subcortical and medial temporal MR-detectable metabolite abnormalities in unipolar major depression. Eur. Arch. Psychiatry Clin. Neurosci. Erickson, K., Drevets, W.C., Clark, L., Cannon, D.M., Bain, E.E., Zarate Jr., C.A., Charney, D.S. and Sahakian, B.J. (2005) Mood-congruent bias in affective go/no-go performance of unmedicated patients with major depressive disorder. Am. J. Psychiatry, 162: 2171–2173. Farchione, T.R., Moore, G.J. and Rosenberg, D.R. (2002) Proton magnetic resonance spectroscopic imaging in pediatric major depression. Biol. Psychiatry, 52: 86–92.
Fossati, P., Harvey, P.O., Le Bastard, G., Ergis, A.M., Jouvent, R. and Allilaire, J.F. (2004) Verbal memory performance of patients with a first depressive episode and patients with unipolar and bipolar recurrent depression. J. Psychiatr. Res., 38: 137–144. Frey, R., Metzler, D., Fischer, P., Heiden, A., Scharfetter, J., Moser, E. and Kasper, S. (1998) Myo-inositol in depressive and healthy subjects determined by frontal 1H-magnetic resonance spectroscopy at 1.5 tesla. J. Psychiatr. Res., 32: 411–420. Frodl, T., Meisenzahl, E.M., Zetzsche, T., Hohne, T., Banac, S., Schorr, C., Jager, M., Leinsinger, G., Bottlender, R., Reiser, M. and Moller, H.J. (2004) Hippocampal and amygdala changes in patients with major depressive disorder and healthy controls during a 1-year follow-up. J. Clin. Psychiatry, 65: 492–499. Fu, C.H., Williams, S.C., Cleare, A.J., Brammer, M.J., Walsh, N.D., Kim, J., Andrew, C.M., Pich, E.M., Williams, P.M., Reed, L.J., Mitterschiffthaler, M.T., Suckling, J. and Bullmore, E.T. (2004) Attenuation of the neural response to sad faces in major depression by antidepressant treatment: a prospective, event-related functional magnetic resonance imaging study. Arch. Gen. Psychiatry, 61: 877–889. George, M.S., Ketter, T.A., Parekh, P.I., Horwitz, B., Herscovitch, P. and Post, R.M. (1995) Brain activity during transient sadness and happiness in healthy women. Am. J. Psychiatry, 152: 341–351. George, M.S., Ketter, T.A., Parekh, P.I., Rosinsky, N., Ring, H.A., Pazzaglia, P.J., Marangell, L.B., Callahan, A.M. and Post, R.M. (1997) Blunted left cingulate activation in mood disorder subjects during a response interference task (the Stroop). J. Neuropsychiatry Clin. Neurosci., 9: 55–63. Gonul, A.S., Kitis, O., Ozan, E., Akdeniz, F., Eker, C., Eker, O.D. and Vahip, S. (2006) The effect of antidepressant treatment on N-acetyl aspartate levels of medial frontal cortex in drug-free depressed patients. Prog. Neuropsychopharmacol. Biol. Psychiatry, 30: 120–125. Gotlib, I.H., Sivers, H., Gabrieli, J.D., Whitfield-Gabrieli, S., Goldin, P., Minor, K.L. and Canli, T. (2005) Subgenual anterior cingulate activation to valenced emotional stimuli in major depression. Neuroreport, 16: 1731–1734. Grachev, I.D., Ramachandran, T.S., Thomas, P.S., Szeverenyi, N.M. and Fredrickson, B.E. (2003) Association between dorsolateral prefrontal N-acetyl aspartate and depression in chronic back pain: an in vivo proton magnetic resonance spectroscopy study. J. Neural Transm., 110: 287–312. Gruber, S., Frey, R., Mlynarik, V., Stadlbauer, A., Heiden, A., Kasper, S., Kemp, G.J. and Moser, E. (2003) Quantification of metabolic differences in the frontal brain of depressive patients and controls obtained by 1H-MRS at 3 Tesla. Invest. Radiol., 38: 403–408. Gur, R.C., Erwin, R.J., Gur, R.E., Zwil, A.S., Heimberg, C. and Kraemer, H.C. (1992) Facial emotion discrimination: II. Behavioral findings in depression. Psychiatry Res., 42: 241–251. Haaga, D.A. and Beck, A.T. (1995) Perspectives on depressive realism: implications for cognitive theory of depression. Behav. Res. Ther., 33: 41–48.
Hamann, S. and Mao, H. (2002) Positive and negative emotional verbal stimuli elicit activity in the left amygdala. Neuroreport, 13: 15–19. Hamidi, M., Drevets, W.C. and Price, J.L. (2004) Glial reduction in amygdala in major depressive disorder is due to oligodendrocytes. Biol. Psychiatry, 55: 563–569. Harwood, A.J. (2005) Lithium and bipolar mood disorder: the inositol-depletion hypothesis revisited. Mol. Psychiatry, 10: 117–126. Hastings, R.S., Parsey, R.V., Oquendo, M.A., Arango, V. and Mann, J.J. (2004) Volumetric analysis of the prefrontal cortex, amygdala, and hippocampus in major depression. Neuropsychopharmacology, 29: 952–959. Herbert, C., Kissler, J., Junghöfer, M., Peyk, P., Wildgruber, D., Ethofer, T. and Grodd, W. (2005) Sexy, successful, dynamic: left amygdala activation during reading of pleasant adjectives. HBM 2005 237, Toronto, CA. Hetherington, H.P., Spencer, D.D., Vaughan, J.T. and Pan, J.W. (2001) Quantitative 31P spectroscopic imaging of human brain at 4 Tesla: assessment of gray and white matter differences of phosphocreatine and ATP. Magn. Reson. Med., 45: 46–52. Hickie, I., Naismith, S., Ward, P.B., Turner, K., Scott, E., Mitchell, P., Wilhelm, K. and Parker, G. (2005) Reduced hippocampal volumes and memory loss in patients with early- and late-onset depression. Br. J. Psychiatry, 186: 197–202. Ho, A.P., Gillin, J.C., Buchsbaum, M.S., Wu, J.C., Abel, L. and Bunney Jr., W.E. (1996) Brain glucose metabolism during non-rapid eye movement sleep in major depression. A positron emission tomography study. Arch. Gen. Psychiatry, 53: 645–652. Hsieh, M.H., McQuoid, D.R., Levy, R.M., Payne, M.E., MacFall, J.R. and Steffens, D.C. (2002) Hippocampal volume and antidepressant response in geriatric depression. Int. J. Geriatr. Psychiatry, 17: 519–525. Janssen, J., Hulshoff Pol, H.E., Lampe, I.K., Schnack, H.G., de Leeuw, F.E., Kahn, R.S. and Heeren, T.J. (2004) Hippocampal changes and white matter lesions in early-onset depression. Biol. Psychiatry, 56: 825–831. Ke, Y., Cohen, B.M., Lowen, S., Hirashima, F., Nassar, L. and Renshaw, P.F. (2002) Biexponential transverse relaxation (T2) of the proton MRS creatine resonance in human brain. Magn. Reson. Med., 47: 232–238. Kempermann, G. and Kronenberg, G. (2003) Depressed new neurons–adult hippocampal neurogenesis and a cellular plasticity hypothesis of major depression. Biol. Psychiatry, 54: 499–503. Kumar, A., Thomas, A., Lavretsky, H., Yue, K., Huda, A., Curran, J., Venkatraman, T., Estanol, L., Mintz, J., Mega, M. and Toga, A. (2002) Frontal white matter biochemical abnormalities in late-life major depression detected with proton magnetic resonance spectroscopy. Am. J. Psychiatry, 159: 630–636. Kusumakar, V., MacMaster, F.P., Gates, L., Sparkes, S.J. and Khan, S.C. (2001) Left medial temporal cytosolic choline in early onset depression. Can. J. Psychiatry, 46: 959–964.
Lacerda, A.L., Keshavan, M.S., Hardan, A.Y., Yorbik, O., Brambilla, P., Sassi, R.B., Nicoletti, M., Mallinger, A.G., Frank, E., Kupfer, D.J. and Soares, J.C. (2004) Anatomic evaluation of the orbitofrontal cortex in major depressive disorder. Biol. Psychiatry, 55: 353–358. Lacerda, A.L., Nicoletti, M.A., Brambilla, P., Sassi, R.B., Mallinger, A.G., Frank, E., Kupfer, D.J., Keshavan, M.S. and Soares, J.C. (2003) Anatomical MRI study of basal ganglia in major depressive disorder. Psychiatry Res., 124: 129–140. Lampe, I.K., Hulshoff Pol, H.E., Janssen, J., Schnack, H.G., Kahn, R.S. and Heeren, T.J. (2003) Association of depression duration with reduction of global cerebral gray matter volume in female patients with recurrent major depressive disorder. Am. J. Psychiatry, 160: 2052–2054. Lavretsky, H., Roybal, D.J., Ballmaier, M., Toga, A.W. and Kumar, A. (2005) Antidepressant exposure may protect against decrement in frontal gray matter volumes in geriatric depression. J. Clin. Psychiatry, 66: 964–967. Lekwauwa, R.E., McQuoid, D.R. and Steffens, D.C. (2005) Hippocampal volume as a predictor of short-term ECT outcomes in older patients with depression. Am. J. Geriatr. Psychiatry, 13: 910–913. Leppanen, J.M., Milders, M., Bell, J.S., Terriere, E. and Hietanen, J.K. (2004) Depression biases the recognition of emotionally neutral faces. Psychiatry Res., 128: 123–133. Liotti, M., Mayberg, H.S., Brannan, S.K., McGinnis, S., Jerabek, P. and Fox, P.T. (2000) Differential limbic–cortical correlates of sadness and anxiety in healthy subjects: implications for affective disorders. Biol. Psychiatry, 48: 30–42. Liotti, M., Mayberg, H.S., McGinnis, S., Brannan, S.L. and Jerabek, P. (2002) Unmasking disease-specific cerebral blood flow abnormalities: mood challenge in patients with remitted unipolar depression. Am. J. Psychiatry, 159: 1830–1840. Lloyd, G.G. and Lishman, W.A. (1975) Effect of depression on the speed of recall of pleasant and unpleasant experiences. Psychol. Med., 5: 173–180. MacMillan, S., Szeszko, P.R., Moore, G.J., Madden, R., Lorch, E., Ivey, J., Banerjee, S.P. and Rosenberg, D.R. (2003) Increased amygdala: hippocampal volume ratios associated with severity of anxiety in pediatric major depression. J. Child. Adolesc. Psychopharmacol., 13: 65–73. Maddock, R.J., Garrett, A.S. and Buonocore, M.H. (2003) Posterior cingulate cortex activation by emotional words: fMRI evidence from a valence decision task. Hum. Brain Mapp., 18: 30–41. Mathews, A., Ridgeway, V. and Williamson, D.A. (1996) Evidence for attention to threatening stimuli in depression. Behav. Res. Ther., 34: 695–705. Mayberg, H.S., Brannan, S.K., Tekell, J.L., Silva, J.A., Mahurin, R.K., McGinnis, S. and Jerabek, P.A. (2000) Regional metabolic effects of fluoxetine in major depression: serial changes and relationship to clinical response. Biol. Psychiatry, 48: 830–843. Mayberg, H.S., Liotti, M., Brannan, S.K., McGinnis, S., Mahurin, R.K., Jerabek, P.A., Silva, J.A., Tekell, J.L., Martin, C.C., Lancaster, J.L. and Fox, P.T. (1999) Reciprocal limbic-
cortical function and negative mood: converging PET findings in depression and normal sadness. Am. J. Psychiatry, 156: 675–682. McEwen, B.S. (2000) The neurobiology of stress: from serendipity to clinical relevance. Brain Res., 886: 172–189. Michael, N., Erfurth, A., Ohrmann, P., Arolt, V., Heindel, W. and Pfleiderer, B. (2003a) Metabolic changes within the left dorsolateral prefrontal cortex occurring with electroconvulsive therapy in patients with treatment resistant unipolar depression. Psychol. Med., 33: 1277–1284. Michael, N., Erfurth, A., Ohrmann, P., Arolt, V., Heindel, W. and Pfleiderer, B. (2003b) Neurotrophic effects of electroconvulsive therapy: a proton magnetic resonance study of the left amygdalar region in patients with treatment-resistant depression. Neuropsychopharmacology, 28: 720–725. Morris, J.S., Frith, C.D., Perrett, D.I., Rowland, D., Young, A.W., Calder, A.J. and Dolan, R.J. (1996) A differential neural response in the human amygdala to fearful and happy facial expressions. Nature, 383: 812–815. Murphy, F.C., Sahakian, B.J., Rubinsztein, J.S., Michael, A., Rogers, R.D., Robbins, T.W. and Paykel, E.S. (1999) Emotional bias and inhibitory control processes in mania and depression. Psychol. Med., 29: 1307–1321. Naismith, S., Hickie, I., Ward, P.B., Turner, K., Scott, E., Little, C., Mitchell, P., Wilhelm, K. and Parker, G. (2002) Caudate nucleus volumes and genetic determinants of homocysteine metabolism in the prediction of psychomotor speed in older persons with depression. Am. J. Psychiatry, 159: 2096–2098. Nandrino, J.L., Dodin, V., Martin, P. and Henniaux, M. (2004) Emotional information processing in first and recurrent major depressive episodes. J. Psychiatr. Res., 38: 475–484. Neumeister, A., Wood, S., Bonne, O., Nugent, A.C., Luckenbaugh, D.A., Young, T., Bain, E.E., Charney, D.S. and Drevets, W.C. (2005) Reduced hippocampal volume in unmedicated, remitted patients with major depression versus control subjects. Biol. Psychiatry, 57: 935–937. Nolan, C.L., Moore, G.J., Madden, R., Farchione, T., Bartoi, M., Lorch, E., Stewart, C.M. and Rosenberg, D.R. (2002) Prefrontal cortical volume in childhood-onset major depression: preliminary findings. Arch. Gen. Psychiatry, 59: 173–179. Nolen-Hoeksema, S. (2000) The role of rumination in depressive disorders and mixed anxiety/depressive symptoms. J. Abnorm. Psychol., 109: 504–511. Noworolski, S.M., Nelson, S.J., Henry, R.G., Day, M.R., Wald, L.L., Star-Lack, J. and Vigneron, D.B. (1999) High spatial resolution 1H-MRSI and segmented MRI of cortical gray matter and subcortical white matter in three regions of the human brain. Magn. Reson. Med., 41: 21–29. O’Brien, J.T., Ames, D., Schweitzer, I., Desmond, P., Coleman, P. and Tress, B. (1997) Clinical, magnetic resonance imaging and endocrinological differences between delusional and nondelusional depression in the elderly. Int. J. Geriatr. Psychiatry, 12: 211–218. O’Brien, J.T., Lloyd, A., McKeith, I., Gholkar, A. and Ferrier, N. (2004) A longitudinal study of hippocampal volume, cortisol levels, and cognition in older depressed subjects. Am. J. Psychiatry, 161: 2081–2090.
Omura, K., Todd Constable, R. and Canli, T. (2005) Amygdala gray matter concentration is associated with extraversion and neuroticism. Neuroreport, 16: 1905–1908. Pardo, J.V., Pardo, P.J. and Raichle, M.E. (1993) Neural correlates of self-induced dysphoria. Am. J. Psychiatry, 150: 713–719. Persad, S.M. and Polivy, J. (1993) Differences between depressed and nondepressed individuals in the recognition of and response to facial emotional cues. J. Abnorm. Psychol., 102: 358–368. Pfleiderer, B., Michael, N., Erfurth, A., Ohrmann, P., Hohmann, U., Wolgast, M., Fiebich, M., Arolt, V. and Heindel, W. (2003) Effective electroconvulsive therapy reverses glutamate/glutamine deficit in the left anterior cingulum of unipolar depressed patients. Psychiatry Res., 122: 185–192. Posener, J.A., Wang, L., Price, J.L., Gado, M.H., Province, M.A., Miller, M.I., Babb, C.M. and Csernansky, J.G. (2003) High-dimensional mapping of the hippocampus in depression. Am. J. Psychiatry, 160: 83–89. Renshaw, P.F., Lafer, B., Babb, S.M., Fava, M., Stoll, A.L., Christensen, J.D., Moore, C.M., Yurgelun-Todd, D.A., Bonello, C.M., Pillay, S.S., Rothschild, A.J., Nierenberg, A.A., Rosenbaum, J.F. and Cohen, B.M. (1997) Basal ganglia choline levels in depression and response to fluoxetine treatment: an in vivo proton magnetic resonance spectroscopy study. Biol. Psychiatry, 41: 837–843. Robinson, R.G. and Lipsey, J.R. (1985) Cerebral localization of emotion based on clinical-neuropathological correlations: methodological issues. Psychiatry Dev., 3: 335–347. Rosenberg, D.R., Macmaster, F.P., Mirza, Y., Smith, J.M., Easter, P.C., Banerjee, S.P., Bhandari, R., Boyd, C., Lynch, M., Rose, M., Ivey, J., Villafuerte, R.A., Moore, G.J. and Renshaw, P. (2005) Reduced anterior cingulate glutamate in pediatric major depression: a magnetic resonance spectroscopy study. Biol. Psychiatry, 58: 700–704. Ross, B. and Michaelis, T. (1994) Clinical applications of magnetic resonance spectroscopy. Magn. Reson. Q, 10: 191–247. Rosso, I.M., Cintron, C.M., Steingard, R.J., Renshaw, P.F., Young, A.D. and Yurgelun-Todd, D.A. (2005) Amygdala and hippocampus volumes in pediatric major depression. Biol. Psychiatry, 57: 21–26. Rothman, D.L., Petroff, O.A.C., Behar, K.L. and Mattson, R.H. (1993) Localized 1H NMR measurements of γ-aminobutyric acid in human brain in vivo. Proc. Natl. Acad. Sci. USA, 90: 5662–5666. Sanacora, G., Gueorguieva, R., Epperson, C.N., Wu, Y.T., Appel, M., Rothman, D.L., Krystal, J.H. and Mason, G.F. (2004) Subtype-specific alterations of gamma-aminobutyric acid and glutamate in patients with major depression. Arch. Gen. Psychiatry, 61: 705–713. Schuff, N., Amend, D., Ezekiel, F., Steinman, S.K., Tanabe, J., Norman, D., Jagust, W., Kramer, J.H., Mastrianni, J.A., Fein, G. and Weiner, M.W. (1997) Changes of hippocampal N-acetyl aspartate and volume in Alzheimer’s disease. A proton MR spectroscopic imaging and MRI study. Neurology, 49: 1513–1521.
Segal, Z.V., Williams, J.M., Teasdale, J.D. and Gemar, M. (1996) A cognitive science perspective on kindling and episode sensitization in recurrent affective disorder. Psychol. Med., 26: 371–380. Sheline, Y.I., Barch, D.M., Donnelly, J.M., Ollinger, J.M., Snyder, A.Z. and Mintun, M.A. (2001) Increased amygdala response to masked emotional faces in depressed subjects resolves with antidepressant treatment: an fMRI study. Biol. Psychiatry, 50: 651–658. Sheline, Y.I., Gado, M.H. and Kraemer, H.C. (2003) Untreated depression and hippocampal volume loss. Am. J. Psychiatry, 160: 1516–1518. Shestyuk, A.Y., Deldin, P.J., Brand, J.E. and Deveney, C.M. (2005) Reduced sustained brain activity during processing of positive emotional stimuli in major depression. Biol. Psychiatry, 57: 1089–1096. Shulman, R.G., Rothman, D.L., Behar, K.L. and Hyder, F. (2004) Energetic basis of brain activity: implications for neuroimaging. Trends Neurosci., 27: 489–495. Siegle, G.J., Steinhauer, S.R., Thase, M.E., Stenger, V.A. and Carter, C.S. (2002) Can’t shake that feeling: event-related fMRI assessment of sustained amygdala activity in response to emotional information in depressed individuals. Biol. Psychiatry, 51: 693–707. Sonawalla, S.B., Renshaw, P.F., Moore, C.M., Alpert, J.E., Nierenberg, A.A., Rosenbaum, J.F. and Fava, M. (1999) Compounds containing cytosolic choline in the basal ganglia: a potential biological marker of true drug response to fluoxetine. Am. J. Psychiatry, 156: 1638–1640. Steffens, D.C., Helms, M.J., Krishnan, K.R. and Burke, G.L. (1999) Cerebrovascular disease and depression symptoms in the cardiovascular health study. Stroke, 30: 2159–2166. Steffens, D.C., McQuoid, D.R., Welsh-Bohmer, K.A. and Krishnan, K.R. (2003) Left orbital frontal cortex volume and performance on the Benton Visual Retention Test in older depressives and controls. Neuropsychopharmacology, 28: 2179–2183. Steingard, R.J., Renshaw, P.F., Hennen, J., Lenox, M., Cintron, C.B., Young, A.D., Connor, D.F., Au, T.H. and Yurgelun-Todd, D.A. (2002) Smaller frontal lobe white matter volumes in depressed adolescents. Biol. Psychiatry, 52: 413–417. Steingard, R.J., Yurgelun-Todd, D.A., Hennen, J., Moore, J.C., Moore, C.M., Vakili, K., Young, A.D., Katic, A., Beardslee, W.R. and Renshaw, P.F. (2000) Increased orbitofrontal cortex levels of choline in depressed adolescents as detected by in vivo proton magnetic resonance spectroscopy. Biol. Psychiatry, 48: 1053–1061. Surguladze, S., Brammer, M.J., Keedwell, P., Giampietro, V., Young, A.W., Travis, M.J., Williams, S.C. and Phillips, M.L. (2005) A differential pattern of neural response toward sad versus happy facial expressions in major depressive disorder. Biol. Psychiatry, 57: 201–209. Surguladze, S.A., Young, A.W., Senior, C., Brebion, G., Travis, M.J. and Phillips, M.L. (2004) Recognition accuracy and response bias to happy and sad facial expressions in patients with major depression. Neuropsychology, 18: 212–218.
Taki, Y., Kinomura, S., Awata, S., Inoue, K., Sato, K., Ito, H., Goto, R., Uchida, S., Tsuji, I., Arai, H., Kawashima, R. and Fukuda, H. (2005) Male elderly subthreshold depression patients have smaller volume of medial part of prefrontal cortex and precentral gyrus compared with age-matched normal subjects: a voxel-based morphometry. J. Affect. Disord., 88: 313–320. Taylor, W.D., MacFall, J.R., Provenzale, J.M., Payne, M.E., McQuoid, D.R., Steffens, D.C. and Krishnan, K.R. (2003a) Serial MR imaging of volumes of hyperintense white matter lesions in elderly patients: correlation with vascular risk factors. AJR Am. J. Roentgenol., 181: 571–576. Taylor, W.D., Steffens, D.C., MacFall, J.R., McQuoid, D.R., Payne, M.E., Provenzale, J.M. and Krishnan, K.R. (2003b) White matter hyperintensity progression and late-life depression outcomes. Arch. Gen. Psychiatry, 60: 1090–1096. Taylor, W.D., Steffens, D.C., McQuoid, D.R., Payne, M.E., Lee, S.H., Lai, T.J. and Krishnan, K.R. (2003c) Smaller orbital frontal cortex volumes associated with functional disability in depressed elders. Biol. Psychiatry, 53: 144–149. Taylor, W.D., Steffens, D.C., Payne, M.E., MacFall, J.R., Marchuk, D.A., Svenson, I.K. and Krishnan, K.R. (2005) Influence of serotonin transporter promoter region polymorphisms on hippocampal volumes in late-life depression. Arch. Gen. Psychiatry, 62: 537–544. Teasdale, J.D. and Russell, M.L. (1983) Differential effects of induced mood on the recall of positive, negative and neutral words. Br. J. Clin. Psychol., 22(Pt 3): 163–171. Thomas, K.M., Drevets, W.C., Dahl, R.E., Ryan, N.D., Birmaher, B., Eccard, C.H., Axelson, D., Whalen, P.J. and Casey, B.J. (2001) Amygdala response to fearful faces in anxious and depressed children. Arch. Gen. Psychiatry, 58: 1057–1063. Vakili, K., Pillay, S.S., Lafer, B., Fava, M., Renshaw, P.F., Bonello-Cintron, C.M. and Yurgelun-Todd, D.A. (2000) Hippocampal volume in primary unipolar major depression: a magnetic resonance imaging study. Biol. Psychiatry, 47: 1087–1090. Videbech, P., Ravnkilde, B., Pedersen, T.H., Hartvig, H., Egander, A., Clemmensen, K., Rasmussen, N.A., Andersen, F., Gjedde, A. and Rosenberg, R. (2002) The Danish PET/ depression project: clinical symptoms and cerebral blood flow. A regions-of-interest analysis. Acta Psychiatr. Scand., 106: 35–44. Vollmert, C., Tost, H., Brassen, S., Jatzko, A. and Braus, D.F. (2004) Depression und moderne Bildgebung. Fortschr. Neurol. Psychiatr., 72: 435–445. Vythilingam, M., Charles, H.C., Tupler, L.A., Blitchington, T., Kelly, L. and Krishnan, K.R. (2003) Focal and lateralized subcortical abnormalities in unipolar major depressive disorder: an automated multivoxel proton magnetic resonance spectroscopy study. Biol. Psychiatry, 54: 744–750. Vythilingam, M., Heim, C., Newport, J., Miller, A.H., Anderson, E., Bronen, R., Brummer, M., Staib, L., Vermetten, E., Charney, D.S., Nemeroff, C.B. and Bremner, J.D. (2002) Childhood trauma associated with smaller hippocampal volume in women with major depression. Am. J. Psychiatry, 159: 2072–2080.
Vythilingam, M., Vermetten, E., Anderson, G.M., Luckenbaugh, D., Anderson, E.R., Snow, J., Staib, L.H., Charney, D.S. and Bremner, J.D. (2004) Hippocampal volume, memory, and cortisol status in major depressive disorder: effects of treatment. Biol. Psychiatry, 56: 101–112. Wang, Z.J. and Zimmerman, R.A. (1998) Proton MR spectroscopy of pediatric brain metabolic disorders. Neuroimaging Clin. N. Am., 8: 781–807. Watkins, P.C., Martin, C.K. and Stern, L.D. (2000) Unconscious memory bias in depression: perceptual and conceptual processes. J. Abnorm. Psychol., 109: 282–289.
Watkins, P.C., Vache, K., Verney, S.P., Muller, S. and Mathews, A. (1996) Unconscious mood-congruent memory bias in depression. J. Abnorm. Psychol., 105: 34–41. Weisbrod, M., Trage, J., Hill, H., Sattler, H.-D., Maier, S., Kiefer, M., Grothe, J. and Spitzer, M. (1999) Emotional priming in depressive patients. Ger. J. Psychiatry, 2: 19–47. Willner, P. (1984) Cognitive functioning in depression: a review of theory and research. Psychol. Med., 14: 807–823. Yang, T.T., Menon, V., Eliez, S., Blasey, C., White, C.D., Reid, A.J., Gotlib, I.H. and Reiss, A.L. (2002) Amygdalar activation associated with positive and negative facial expressions. Neuroreport, 13: 1737–1741.
Subject Index

Acoustic cue hypothesis 272, 280 Acoustic lateralization hypothesis 251, 253, 258 Addiction 46 Affective Norms for English Words (ANEW) 5, 148, 180, 186, 187, 189, 196, 210 Affective priming see Priming Affective space 148, 187–189, 226 Aggression 235, 409, 434–435, 447 Amphetamine 12 Amplitude envelope 238 Amygdala 8–17, 20, 22–23, 107, 147, 159–161, 166–168, 208, 222, 255, 261, 263, 264, 444, 458–463, 468, 481, 483, 486 Crossmodal processing 355, 356 Emotion regulation 433 Gaze 365, 369 Social judgements 367 Visuospatial attention 369 Anger 5, 205, 240, 254, 296, 304, 322, 331, 365, 413, 446–448 Anterior cingulate cortex (ACC) see Cingulate cortex Antisocial personality disorder 457, 459 Aphasia 286 Appetitive system see Appetitive and defensive system Appetitive and defensive system 3–24, 67–68 Aprosodia 286 Area 7b 373 Area F5 380, 383, 426 Area PF 380, 383 Arousal 4–8, 10, 14, 16, 17, 19, 20, 23, 32, 34, 35, 37, 56, 57, 60, 62, 71, 85, 94, 96, 98, 134, 137, 148–150, 154–156, 158, 163, 165–168, 186–193, 195, 196, 198, 201, 210, 211, 212, 214, 218–221, 223–228, 236, 239–242, 260, 272, 275, 287, 296, 363, 367, 370, 389, 411, 426, 433, 449, 458, 459, 468, 482 Articulator 287 Asperger Syndrome 407 Attention 5, 8, 9, 10, 22, 32, 37, 40, 43, 53, 70, 80, 85, 107, 157, 165, 188, 191, 208, 214, 217, 235, 244, 330–331, 346, 363, 372, 405, 426, 444, 461, 462, 468, 483 spatial attention 67–87 motivated attention 31–48 Attentional blink 118, 160, 219, 227, 228 Attitude 307, 309 Autism 405–408, 410 Autobiographic recall 430 Autonomic response 56, 57, 61, 275, 370, 372, 374, 426, 450–451 Autoshaping 11 Average reference 123, 124, 127–130, 153, 155, 172–174 Awareness 105–118, 369, 422
Backward masking see Masking Basal ganglia 261, 262, 289, 446, 448–451, 494 see also Putamen, Striatum Basic emotion 411 see also Facial expression Bed nucleus of stria terminalis (BNST) 9, 11, 13 Binocular rivalry 107 Biphasic typology of emotions 5, 32 Blindsight 112, 369 Bottom-up effects 271–272 Brain stem 21, 186, 481 Broca’s area 258, 277 Broca’s homologue 250 Brunswik’s “lens model” 237 Categorization 53–55, 57, 58, 62, 192 Chimeric face 468–470 Cholecystokinin (CCK) 11 Choline 489, 491 Cingulate cortex 468 Anterior cingulate cortex (ACC) 11, 38, 80, 99, 160, 209, 386, 410, 413, 427, 432, 436, 447–450, 461, 482, 487, 494 Classical conditioning see Conditioning Competition model of visual attention 40 Computer vision 333–337 Conditioning 5, 9–10, 13–14, 16–17, 19, 22, 81, 159, 219, 222–224, 228, 372, 450, 458 Conjunction analysis 349 Connectivity Connectivity architecture 260 Functional connectivity 260, 369, 460 Intrinsic connectivity 260, 262 Connotation 147, 161 Consciousness 107 Contrastive stress patterns 279
Corrugator muscle 6, 425, 433–434 Cortical mapping 125 Corticosteroid 10 Corticotrophin releasing hormone (CRH) 11 Counterempathy 426 Creatine 489, 491, 493 Criterion threshold 115–116 Crossmodal processing 350, 355–357 Bias 346 Binding 346 Congruent/incongruent information 356 Convergence zones 346 Hebbian learning 392 Integration 263, 264 Integration rules 347 Interaction 261 Response depression 346 Supra-additivity 353 Current source density (CSD) 55, 125, 126, 129 DC potentials 271 Deep brain stimulation 447 Defense cascade 22–23, 57–58 Defensive system see Appetitive and Defensive System Depression, symptoms, cognitive theory 482 see also Major depression, Unipolar depression Desensitization 94 Dichotic listening 244, 245 Differential specialization hypothesis 270 Disgust 207, 254, 255, 262, 292, 296–300, 302, 328, 336, 365, 367, 373, 380, 386–388, 390, 392, 394, 396, 404, 411–413, 427, 443–446, 451, 472, 483 Disruptive behaviour disorder 434 Dissociation paradigm 106 Dopamine 9, 11, 12, 16, 21, 446, 447, 481 Dopaminergic system 449, 476 Dot-probe paradigm 68–78, 81 Duchenne smile 326 Duration cues 278 Duration discrimination 251 Dynamic causal modeling 260 Early posterior negativity (EPN) 33–35, 38, 40, 42, 44, 46–48, 125 Effective sensor coverage 132 Efferent leakage 92 Electroencephalography (EEG) 7–10, 17, 22, 33, 44, 71, 75, 77, 78, 85, 123–128, 130–133, 136–138, 147, 153, 156, 161–163, 186, 189, 208, 221, 223, 227, 228, 239, 242, 244, 269, 271, 347, 461, 468 Electromyography (EMG), facial 425 Emotion regulation 423, 433
Emotional connotation 205–215 Emotional contagion 404, 420, 422, 425 Emotional experience 373 Emotional expression, facial 32, 473–476 Emotional Stroop 188, 219 Emotion-specific intonation contours 242 Empathy quotient 409, 413 Entorhinal cortex 449 E-Representation 407 Event categorization 55 Event-related field (ERF) 131–139 Event-related potential (ERP) 6, 17, 18, 31–48, 53, 54, 62, 68, 72, 124, 147, 152, 185, 221, 242, 329, 346, 446, 458, 468 Explicit judgment 261 Explicit processing 158 emotional prosody 253 Extinction 14, 228, 446, 460 Extrastriate cortex 67, 77, 80, 83, 87, 331, 357, 368, 468 F0 see Fundamental frequency Face inversion effect 322 Face processing, models of 327–329, 331–333 Face recognition 321, 323, 324, 326, 328, 329, 331, 333–337, 470 Component-configural 323 Composite effect 323 Double dissociation of expression recognition and 331 Dynamic information 327 Expertise hypothesis of 330 Fuzzy logical model of perception (FLMP) 326 Holistic 323 N170 332 Scrambling and blurring, effect of 324 Facial action coding system (FACS) 326, 337 Facial expression 67–87, 482 Prototypical 334 Temporal cues 327 Double dissociation of recognition 331 Facial imitation 392, 425 Facial mimicry see Facial imitation Fairness 435 Fear 4, 7, 9, 11, 18, 20, 57, 67, 78, 81, 84, 159, 160, 205, 208, 224, 235, 240, 255, 328, 336, 347, 364, 365, 387, 388, 404, 411, 421, 443, 444–445, 458–459, 472, 483 Fear bradycardia 5–24 Fear conditioning see Conditioning Freezing 3–23 Frequency Domain analyses 222, 223 Patterns 279
Properties 289 Frequency-related information 278 Frontal cortex 209–215, 433 Dorsolateral prefrontal cortex (DLPFC) 38, 44, 54, 349, 352, 412, 436, 459, 482, 485, 488, 493 Inferior frontal cortex (IFC) 249, 250, 254, 256, 258–260, 263, 264, 287, 463 Inferior frontal gyrus (IFG) 97, 100, 258, 264, 383, 411, 413, 427, 463, 485 Medial prefrontal cortex (MPFC) 54, 87, 96, 99, 396, 410–411, 413, 428–430, 433–435, 444, 447–448, 483, 492 Orbitofrontal cortex (OFC) 13, 15, 67, 84, 99, 134, 258, 259, 264, 285, 289, 290, 346, 350–354, 411, 446–449, 457–461, 468, 482, 485, 488 Ventromedial prefrontal cortex (VMPFC) 44, 84 Ventrolateral prefrontal cortex (VLPFC) 81 Frontal eye field 81 Frontal lobe 208, 210, 211, 214, 222, 260, 462, 468, 475, 494 Frontoparietal network of attention 44 Functional lateralization hypothesis 256, 258 Fundamental frequency (F0) 236, 270 contours 240, 241, 242, 256 direction 280 level 241, 242 range 241, 270, 272, 273 Fusiform face area (FFA) 263, 330, 368 Crossmodal processing 357 Expertise hypothesis 330 Fusiform gyrus 40, 76, 160, 263, 264, 329–331, 368, 427
GABA 11, 481, 482, 489, 491, 495 Gabor patch 75 Gain control mechanism 77, 78, 80, 87 Gambling task 460 Gamma band activity 223 Gastric ulceration 10 Glutamate 481, 489, 491, 495 Glutamine 489, 491, 495 Go/no-go task 55, 72 Habituation 38–40, 53, 60–62 Haemodynamic refractoriness 355, 358 Happiness 189, 205, 240–242, 255, 262, 269, 270, 296–298, 300, 302, 322, 326, 327, 347, 351, 355, 357, 365, 367, 390, 397, 404, 412, 413, 425, 443, 445, 449–450, 459, 472 Hebbian cell assemblies 16 Learning 380, 392, 394 Facial imitation 392 Language 392 Helping behaviour 434 High-lexical (HLI) interjections 299, 303 Hippocampus 8, 12, 38, 99, 137, 365, 366, 371, 481, 486, 489, 492, 494–495 Huntington’s disease 446, 451 Hypothalamus 9, 11, 208, 388, 411, 481 Ideomotor theory 421 Imagery 94, 173, 177, 189, 196, 220 Implicit processing Emotional prosody 253, 263 Incongruent prosody 302, 303 Inner speech 277 Insula 99, 251, 253, 255, 261, 279, 291, 331, 373, 386, 451, 459, 483 Crossmodal processing 346, 354 Disgust 373, 386, 411, 413 Pain 386, 427, 432 Intention 205, 208, 209, 213, 214, 249, 250, 306, 314, 367, 410, 412, 413, 419, 421, 423, 426, 429, 449 Interaction analysis of fMRI data Inspection of time series 352 Negative interaction 352 Unspecific deactivation 352 Interjections 297, 298, 300, 302, 303 International Affective Digitized Sounds (IADS) 5, 220, 224 International Affective Picture System (IAPS) 5, 33 Interpersonal stance 307, 309 Intonation 238 Inverse problem 131 Late Positive Complex (LPC) 154–155, 157, 159, 163–165, 170, 173–175, 177–178 Late Positive Potential (LPP) 17, 35–37, 48, 53, 57, 59, 61–62, 190–193, 197, 199 Lateralization 250–256, 258, 468 Left hemispatial bias (LHB) 469 Left hemisphere 130, 153–154, 163, 188–189, 208–209, 214, 228, 270, 277–279, 282, 467 Lexical access 164, 167, 193, 194, 217, 225, 227–229 Lexical decision 154, 164, 165, 193–195, 218, 219, 225 Lexicon 161, 167, 206, 209, 218, 297 Limbic system 11, 287, 331, 481, 483, 489 Linguistic accentuation 256 Linguistic prosody 242, 256, 258–259, 279, 287–288, 309 Linked mastoid reference 127–130 Locus coeruleus 9, 44 Loudness contours 298 Low fear model 458 Low-lexical (LLI) interjections 299, 303 Low-resolution tomography (LORETA) 126
Magnetic resonance spectroscopy (MRS), proton 489 Magnetoencephalography (MEG) 76, 85, 123, 124, 130–133, 136–139, 329, 383 Major depression 482, 489 Mania 457, 469, 482 Masking 57, 106–107, 369 McGurk effect 345, 356 Memory task 459, 462 Mentalizing see Theory of Mind Mesolimbic dopamine system 11, 12 Microstate 77–78 Mindreading see Theory of Mind Mindreading system 404–406 Mirror neurons 381, 410, 426 Circuit 381 Congruent selectivity 384 Overgeneralization 395 Somatotopy 384 Tuning of 381, 385 Mock politeness 317 Mood congruent processing bias 482 Morphometry 485, 486, 488, 489 Motivated attention see attention Motivation 3–24 Motivational priming hypothesis 19 Motivational-structural rules 235, 236 Motivational systems 3–24, 186 Motor imagery 94 Motor imitation see Facial imitation Motor lateralization 470 Motor learning 12 Motor vocabulary 381, 385 M-representation 407 Multimodal processing see Crossmodal processing Myo-inositol 490, 493 N100 75, 152–157, 188, 222, 347 N170 76, 332 N-acetylaspartate 491 Natural scenes 34, 53–63 Neglect 71 Neurotransmitter 11, 16, 481, 489, 491 N-methyl-D-aspartate (NMDA) 10, 13 Norepinephrine 9, 11, 16, 44, 481 Novelty, of stimuli 38, 61 Occipital cortex 18, 74, 82, 83, 85–87, 134, 330, 485 Occipital face area (OFA) 329, 339 Orbitofrontal cortex see Frontal cortex Oscillatory brain activity 217
P300 55, 74, 153, 157–158, 161–163, 165, 189, 193, 199, 221, 226, 458, 461 Pain 386, 427 Paracingulate cortex 429, 436 Parahippocampal gyrus 96, 448, 449, 468, 483, 485, 488 Parietal cortex 18, 34, 67, 70, 75, 80, 85, 256, 332, 349, 352, 353, 373, 430, 483, 485 Inferior parietal cortex (IPC) 85, 352, 353, 430, 483, 485 Perceptual awareness 106 Perceptual threshold 157–158 Perspective taking 420, 422–423 Phobia 7, 46, 57 Phonation 287 Phonemic identification 242, 244 Phosphocreatine 489, 491, 493 Phrase 185–202 Phrase understanding 475 Pitch 298 Contours 239, 258, 273, 279, 280 Direction 280, 282 Features 239 Patterns 256 Range 256, 258, 272, 279 Variability 280, 281 Positive slow wave 37 Posner orienting task 82 Postencounter stage 4–24 Posttraumatic stress disorder (PTSD) 99 Preencounter stage 4–24 Prefrontal cortex see Frontal cortex Premotor cortex 150, 207, 214, 379–380, 382–384, 390, 426, 427 Priming 19, 42–43, 75–76, 157, 159, 164, 167, 176, 193, 195, 198–199, 201, 219 Proportional global signal scaling (PGSS) 124, 133–136 Propositional labels 297 Prosocial behaviour 423 Prosopagnosia 329 Pseudo-utterances 308, 311, 314 Psychic blindness 364 Psychopathy 158, 403, 408, 457–463 Pull effects 236, 237, 238 Pulvinar 331, 367, 369, 373, 374 Push effects 236, 237, 238 Putamen 12, 251, 446, 463, 483, 485, 488, 492–494 Question intonation 280 Questionnaire 431
Rapid categorization 217 Rapid serial visual presentation (RSVP) 33, 46, 53, 59, 153–157, 160, 172, 174, 188 Recognition potential 54, 160 Respiration 287 Response bias 114–115 Retrosplenial cortex 96, 98, 209, 213, 214 Reward 3, 4, 11, 115, 117, 411, 412, 449, 460, 494 Rhythm 238 Right hemisphere 270, 282, 304–306, 309, 313, 315, 468 Right hemisphere hypothesis 286, 287 SI 386 SII 373, 386 Sadness 4, 205, 240–242, 254, 255, 267, 270, 275, 287, 296–300, 302, 304, 327, 365, 367, 397, 404, 413, 443, 444, 445, 451, 459, 483, 484 Schizophrenia 467–476 Schizophrenic patients, acute, chronic 472 Script imagery 95, 97–99 Segmental information 279, 298 Segmental structure 251 Self-assessment manikin (SAM) 94, 149, 187, 260, 347, 359 Self-other distinction 391, 412 Self-report measures see Questionnaire Semantic categorization 192–193, 227 Semantic differential 147, 150, 153, 158, 166, 170, 172–174, 187–188, 202 Semantics 150, 152, 164–165, 167, 205–206, 260, 295–296, 298 Semantic network 147, 150, 167, 217, 228 Semantic monitoring 209–211 Semantic Priming see Priming Sensor standardization 124, 130–133 Sensory threshold 113–115 Sentence focus 256, 279 Serotonin 9, 11, 481, 489 Sex differences 408–410 Simulation theory 372, 395, 421 Skin conductance response (SCR) 6, 7, 35, 38, 55, 56, 370, 426, 436 Spatial attention see attention Spatial sampling 125–126 Speaker attitude 309–310 Speaker confidence 311–313, 317, 318 Speaker politeness 314, 316 Spectral energy distribution 238 Startle reflex 10–11, 17–23, 46, 56, 61, 155, 157, 182, 450, 459, 463 Statement intonations 256, 279
Steady state visual evoked potential (ssVEP) 223–227 Striatum 11–13, 15, 21, 222, 291, 411, 446–447, 449, 485, 494 Stroop effect 188, 202 Stroop task 219, 462 Subliminal 107–111, 115, 118–120, 147, 152, 154, 159, 160, 188, 369, 425 Subthalamic nucleus (STN) 447 Superior colliculus 369 crossmodal processing 346 response depression 353 supra-additivity 353 Supramarginal gyrus (SMG) 277, 372, 373 Suprasegmental processing 279, 298 Suprasegmental features 251, 258 Suprasegmental sequences 260 Surprise 5, 108, 118, 277, 327, 328, 336, 365, 367, 388, 411, 443, 444, 446, 448, 449, 451 Syllabic stress 308 Syllable durations 298 Sympathy 404 Taming effect 11 Temporal cues 278, 289, 327 Temporal cortex 8, 73, 251, 253, 258 Inferior temporal cortex 17, 46, 54, 94 Mid temporal area (MT) 331 Middle temporal gyrus (MTG) 265, 350, 368, 379, 381, 383–385, 390, 411, 485 Superior temporal gyrus (STG) 331, 410–413, 430 Superior temporal sulcus (STS) 244–245, 253–254, 256, 260–261, 290–292, 346, 368, 379, 410 Temporal lobe 494 Temporal lobectomy 444 Temporal pole 22, 428, 430, 450 Temporo-parietal junction 81, 410, 411, 428, 430 Thalamus 8, 9, 20, 99, 137, 159, 256, 258, 331, 346, 350, 372, 463 see also Pulvinar Thatcher illusion 322 Theory of mind (ToM) 395, 404, 421 Thyroid-releasing hormone (TRH) 11 Time frequency analysis 223 Tone languages 237, 258 Touch 385–386 Top-down effects see bottom-up effects Transient sadness 483 Two-stage model of stimulus perception 47–48, 54 Ultrarapid visual categorization 54 Unattended processing see implicit processing Unintelligible emotional speech 291 Unipolar depression 469
Valence 4–8, 19, 20, 32, 34–37, 42–44, 55, 60, 71, 76, 94, 96, 97, 148, 150, 154, 18, 163, 165–168, 186–190, 193–196, 198–201, 210, 213, 218–220, 226, 253, 254, 256, 260, 262, 270, 281, 285, 286, 288, 289, 291, 347, 348, 411, 431, 449, 458, 459, 463, 467, 468, 471, 472, 482, 483 Valence hypothesis 254, 285, 286, 288, 467 Valence theory 472 Vasopressin 11 Ventral striatum see Striatum Verbal expression 296, 475 Violence inhibition mechanism (VIM) model 458, 459 Virtual reality 434
Visual search task 70, 218 Voice quality 238, 298 Vowel duration 272, 273 Voxel-based morphometry (VBM) see Morphometry Water maze task 12 Wernicke’s homologue 250, 253 White matter lesions 485 Withdrawal see Appetitive and Defensive System Word generation task 209–211 Word processing 127, 147–175, 195 Zygomatic muscle 6, 425, 431