ATTRACTION, DISTRACTION AND ACTION Multiple Perspectives on Attentional Capture
ADVANCES IN PSYCHOLOGY 133 Editor:
G. E. STELMACH
ELSEVIER Amsterdam - London - New York - Oxford - Paris - Shannon - Singapore - Tokyo
Edited by
Charles L. FOLK Department of Psychology Villanova University Villanova,PA, U.S.A.
Bradley S. GIBSON Department of Psychology University of Notre Dame Notre Dame, IN, U.S.A.
2001
ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands
© 2001 Elsevier Science B.V. All rights reserved.
This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use:
Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use.
Permissions may be sought directly from Elsevier Science Global Rights Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also contact Global Rights directly through Elsevier’s home page (http://www.elsevier.nl), by selecting ‘Obtaining Permissions’.
In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 207 631 5555; fax: (+44) 207 631 5500. Other countries may have a local reprographic rights agency for payments.
Derivative works
Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.
Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Global Rights Department, at the mail, fax and e-mail addresses noted above.
Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
First edition 2001
Library of Congress Cataloging in Publication Data
A catalog record from the Library of Congress has been applied for.
ISBN: 0 444 50676 4
ISSN: 0166-4115 (Series)
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
Preface
The notion that certain mental or physical events can capture attention has strong intuitive appeal. Such intuitions are typically based on experiences in which an irrelevant event summons or attracts attention away from the demands of a current task. Although this apparent vulnerability to external distraction can, in some situations, be detrimental to the mental and physical health of the organism (as when the distracting event causes us to have an automobile accident), it may also be beneficial to the organism in situations where adaptation to important environmental change is required (as when the distracting event is itself potentially harmful and should therefore be avoided). Because attentional capture can have profound consequences (both positive and negative) for mental and physical action, it is necessary to go beyond a simple intuitive understanding of this complex behavior. Indeed, scientific interest in attentional capture has grown exponentially over the last 10 years. A good part of this interest stems from the fact that modeling attentional capture has the potential to provide fundamental insights into the nature of cognitive control in general. More specifically, attentional capture provides an important empirical domain for modeling the interaction between "automatic" and "controlled" processing. However, a broad survey of this field suggests that the term "capture" means different things to different people. In some cases, it refers to shifts of spatial attention, in others to involuntary saccades, and in still others to general distraction by irrelevant stimuli. The properties that elicit "capture" can also range from abrupt flashes of light, to unexpected tones, to semantic novelty, to recurring thoughts. There also appear to be a number of different theoretical perspectives on the mechanisms underlying "capture" (both functional and neurophysiological) and the level of cognitive control over capture.
Thus, the study of attentional capture appears to be at somewhat of a crossroad. Although there is growing interest in the phenomenon, and general agreement as to its practical and theoretical importance, there is also a growing diversity of empirical findings, theoretical perspectives, and experimental approaches. We believe that at this crossroad, it is critical to pause and attempt to reach some consensus on the existing state of research on attentional capture, and to chart new directions for future research on this important topic. However, given the diversity of experimental approaches to attentional capture, there is currently no forum for bringing together researchers to accomplish these goals. Existing conferences, such as Psychonomics, ARVO, Neuroscience, SRCD, Cognitive Aging, etc., rarely attract all the relevant researchers. Therefore, the first conference and workshop devoted exclusively to the study of attentional capture was held on June 3-4, 2000 at Villanova University. Over twenty-five researchers from a variety of different theoretical and methodological perspectives participated. The express purpose of the conference
was twofold: The first purpose was to provide a forum for researchers to present their latest empirical findings and theoretical developments; the second purpose was to engage in structured discussions concerning such fundamental issues as the definition of attentional capture, behavioral manifestations of attentional capture, and the measurement of attentional capture. By far, the issue of how to define attentional capture generated the most extensive discussions, with no clear consensus emerging. (Indeed, one of the discussion group leaders described his role as akin to "herding cats.") Nonetheless, although many of the fundamental issues remained unresolved, the interdisciplinary nature of the conference resulted in an exciting exchange of ideas and theory, many of which are represented in the following chapters.
The present volume is organized into six different topic areas, or "perspectives." Each chapter reflects either cutting-edge research or state-of-the-art reviews of specific content areas. The Neuroscience section contains chapters that explore the biological underpinnings of attentional capture. The Visual Cognition section explores the theoretical boundaries of attentional capture within the visual domain, with particular emphasis on the debate regarding the degree of top-down control over attentional capture. The Multiple Modalities section extends the phenomenon of attentional capture to other modalities besides vision, including work on pre-pulse inhibition, auditory attention, and cross-modal interactions. The Developmental section addresses how attentional capture varies across the life span, whereas the Individual Differences section addresses how attentional capture varies across individuals at similar stages of development. Finally, the Dynamical Systems/Evolution section addresses the function of attentional capture from a broad, evolutionary perspective.
We owe a debt of gratitude to the National Science Foundation and Villanova University for providing generous funding for this project. We would especially like to thank Helene Intraub at NSF for her encouragement and support. We would also like to thank all those who participated in the Villanova Capture Conference, including presenters as well as those who participated in the workshop discussions; collectively, you made it an unqualified success. Finally, we thank Clare Gideon for her clerical assistance in preparing the manuscript on which this volume is based.
Charles L. Folk
Bradley S. Gibson
Contents
Preface
List of contributors

Part I. Neuroscience
1. Electrophysiological studies of reflexive attention
   Joseph B. Hopfinger and George R. Mangun
2. Inhibition of return in monkey and man
   Raymond M. Klein, Douglas P. Munoz, Michael C. Dorris, and Tracy L. Taylor

Part II. Visual Cognition
3. Inattentional blindness and attentional capture: Evidence for attention-based theories of visual salience
   Bradley S. Gibson and Mary A. Peterson
4. Involuntary orienting to flashing distractors in delayed search?
   Harold Pashler
5. Attentional capture in the spatial and temporal domains
   Howard E. Egeth, Charles L. Folk, Andrew B. Leber, Takehiko Nakayama, and Sharma K. Hendel
6. Attentional and oculomotor capture
   Jan Theeuwes and Richard Godijn
7. Attention capture, orienting, and awareness
   Steven B. Most and Daniel J. Simons

Part III. Multiple Modalities
8. Using pre-pulse inhibition to study attentional capture: A warning about pre-pulse correlations
   J. Toby Mordkoff and Hilary Barth
9. Temporal expectancies, capture, and timing in auditory sequences
   Mari Riess Jones
10. Crossmodal attentional capture: A controversy resolved?
   Charles Spence

Part IV. Developmental
11. Testing models of attentional capture during early infancy
   James L. Dannemiller
12. Attentional capture, attentional control, and aging
   Arthur F. Kramer, Charles T. Scialfa, Matthew S. Peterson, and David E. Irwin

Part V. Individual Differences
13. A multidisciplinary perspective on attentional control
   Douglas Derryberry and Marjorie A. Reed
14. Capacity, control and conflict: An individual differences perspective on attentional capture
   Andrew R. A. Conway and Michael J. Kane

Part VI. Dynamical Systems/Evolution
15. A dynamic, evolutionary perspective on attention capture
   William A. Johnston and David L. Stayer

Subject index
Contributors
Hilary Barth, Massachusetts Institute of Technology
Andrew R. A. Conway, University of Illinois at Chicago
James L. Dannemiller, University of Wisconsin at Madison
Douglas Derryberry, Oregon State University
Michael C. Dorris, New York University
Howard E. Egeth, Johns Hopkins University
Charles L. Folk, Villanova University
Bradley S. Gibson, University of Notre Dame
Richard Godijn, Vrije Universiteit
Sharma K. Hendel, Johns Hopkins University
Joseph B. Hopfinger, University of North Carolina at Chapel Hill
David E. Irwin, University of Illinois at Urbana-Champaign
William A. Johnston, University of Utah
Michael J. Kane, University of North Carolina at Greensboro
Raymond M. Klein, Dalhousie University
Arthur F. Kramer, University of Illinois at Urbana-Champaign
Andrew B. Leber, Johns Hopkins University
George R. Mangun, Duke University
J. Toby Mordkoff, Pennsylvania State University
Steven B. Most, Harvard University
Douglas P. Munoz, Queens University
Takehiko Nakayama, Johns Hopkins University
Harold Pashler, University of California at San Diego
Mary A. Peterson, University of Arizona
Matthew S. Peterson, University of Illinois at Urbana-Champaign
Marjorie A. Reed, Oregon State University
Mari Riess Jones, Ohio State University
Charles T. Scialfa, University of Calgary
Daniel J. Simons, Harvard University
Charles Spence, University of Oxford
David L. Stayer, University of Utah
Tracy L. Taylor, Dalhousie University
Jan Theeuwes, Vrije Universiteit
Part I Neuroscience
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved.
Electrophysiological Studies of Reflexive Attention
Joseph B. Hopfinger and George R. Mangun
Models of human cognition hold that information processing occurs in a series of stages. Cognitive psychology, in particular, is concerned with the internal mental processes that begin with the appearance of an external stimulus and result in a behavioral response. An enduring question has focused on determining the stage or stages of information processing at which attention might have an influence. Measures of overt behavior have long been used to make inferences about the internal mental mechanisms of attention. Increasingly though, physiological measures of human brain activity have been used to provide direct measures of discrete stages of information processing during attentional performance. In this chapter, we briefly review the event-related potential (ERP) approach to the study of attention, and present recent results utilizing this methodology in the study of reflexive attentional capture. These experiments have revealed that reflexive attention is able to influence multiple stages of information processing beginning at a relatively early stage of visual cortical analysis.
Background
Tracking information processing in the brain: Electrophysiological methods
The development of electrophysiological recording techniques dates back to the early 1930s, when Hans Berger and Herbert Jasper developed techniques that would later be used to directly examine the neural mechanisms of the human brain's attention systems (Jasper, 1935). By recording from electrodes placed on a human subject's scalp, they were able to measure small voltage fluctuations that reflected underlying neural activity. The recording of the ongoing voltage variations measured on the scalp is known as the electroencephalogram (EEG) and is now known to be primarily a measure of the post-synaptic (dendritic) potentials from populations of synchronously active and aligned neurons (see Nunez, 1981 for a more comprehensive discussion). Early electrophysiologists analyzed large rhythmic fluctuations in the EEG (e.g., "alpha" waves) that could index overall states of arousal (e.g., Jasper, 1935). Although the ongoing EEG can provide a measure of the subject's global brain state, it is not as well suited for identifying patterns of brain activity associated with specific types of stimulus processing or specific mental functions. This is due to the fact that the larger rhythmic potentials
of the ongoing EEG may be several times larger in amplitude than the relatively small fluctuations produced by neural activity supporting individual mental events. The EEG reflects processes occurring throughout the brain related to a host of mental activities, as well as voltage fluctuations that are not due to brain activity (e.g., "artifacts" generated by muscles on the head or neck). As a result, the neural activity generated by a specific mental event of interest can be difficult or impossible to observe in the ongoing EEG. The voltage fluctuations produced by particular events of interest can, however, be detected using signal averaging procedures. For example, the neural activity produced by a specific visual stimulus can be measured if the ongoing EEG is averaged over multiple occurrences of that specific visual event (Figure 1). Epochs of time surrounding the visual event of interest can be extracted from the EEG record and averaged together, after aligning the onset of the visual stimulus for each epoch. The voltage amplitude can then be averaged at each time point separately, resulting in a single event-related potential (ERP) waveform. The ERP thus represents the response to a specific event, time-locked to the onset of that event. The averaging process effectively cancels out the electrical activity in the EEG that is not time-locked to the stimulus event of interest. This occurs because, on average over many trials, the uncorrelated activity is just as likely to be of positive as of negative polarity at any post-stimulus time point. Given a sufficient number of trials, the averaging process leaves only the activity evoked by the event of interest.
[Figure 1 graphic. Panels: "Electroencephalogram (EEG)", "Signal Averager", and "Visual Event-Related Potential (ERP)". S = onset of visual stimulus (S1, S2, S3, ..., Sn). The averager computes (1/n) Σ Sk for k = 1 to n. ERP time axis: -100 to 400 msec relative to onset of visual stimulus.]
Figure 1. At left is shown an example of the scalp recorded electroencephalogram (EEG), recorded continuously while the event of interest (in this case a visual stimulus: S) is presented multiple times. Epochs of the EEG surrounding the onset of the visual event are extracted, aligned according to the onset time of the event of interest, and then averaged point by point (middle column). The resulting average is referred to as the Event-Related Potential (ERP; right column). Note that the amplitude of the ERP is much less than that of the EEG, a typical situation that necessitates the averaging procedure.
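The averaging procedure summarized in Figure 1 can be sketched in a few lines of Python. This is only an illustrative simulation, not the recording or analysis pipeline used in the studies reviewed here: the sampling rate, the shape and size of the evoked response, the white-noise model of the ongoing EEG, and the helper names `average_epochs` and `rms` are all assumptions made for the example.

```python
import numpy as np

def average_epochs(eeg, onsets, pre, post):
    """Cut an epoch of pre + post samples around each onset, then average point by point."""
    epochs = np.stack([eeg[s - pre:s + post] for s in onsets])
    return epochs.mean(axis=0)

def rms(x):
    """Root-mean-square amplitude of a waveform."""
    return float(np.sqrt(np.mean(x ** 2)))

# --- Simulated data: every value below is a hypothetical choice ---
rng = np.random.default_rng(0)
fs = 250                                   # sampling rate in Hz
pre, post = 25, 100                        # 100 ms before and 400 ms after onset
n_trials = 1000
onsets = 50 + 200 * np.arange(n_trials)    # one stimulus every 800 ms

# Assumed evoked response: a 2 microvolt deflection peaking 100 ms after onset.
t = np.arange(post) / fs
template = 2.0 * np.exp(-((t - 0.1) ** 2) / (2 * 0.02 ** 2))

# Ongoing "EEG": white noise (10 microvolt standard deviation) with the
# evoked response added at every stimulus onset.
eeg = rng.normal(0.0, 10.0, size=onsets[-1] + 200)
for s in onsets:
    eeg[s:s + post] += template

# Noise-free waveform: flat prestimulus period followed by the evoked response.
expected = np.concatenate([np.zeros(pre), template])

erp_few = average_epochs(eeg, onsets[:10], pre, post)
erp_many = average_epochs(eeg, onsets, pre, post)

print("residual noise, 10 trials:  ", rms(erp_few - expected))
print("residual noise, 1000 trials:", rms(erp_many - expected))
```

Under the assumption that the background activity is uncorrelated across trials, the residual noise in the average shrinks roughly in proportion to the square root of the number of trials, which is why a response far smaller than the ongoing EEG becomes visible. Real EEG noise is not white, so the improvement in practice may be smaller than this sketch suggests.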
A canonical ERP waveform consists of a series of voltage fluctuations, representing positive and negative potentials generated by the event of interest. As shown in Figure 1, the voltage fluctuations are typically labeled according to: (1) polarity ("P"ositive, or "N"egative; note that the convention followed here plots positive voltages downward); and (2) order of occurrence (P1 = 1st major exclusively-positive component) or latency of occurrence (e.g., the peak of the NP80 component, which can be negative or positive, depending on the location of the visual stimulus, occurs at approximately 80 ms latency). The "prestimulus" period represents the activity time-locked to the event of interest that occurs before the stimulus appears. Since the event of interest has not yet occurred, under most circumstances, there should be no systematic activity before the onset of the event of interest. Therefore, this period can be used as a measure of the effectiveness of the averaging procedure in eliminating activity that is not due to the event of interest. Using ERPs, it is possible to measure neural activity from the moment in time a stimulus is presented, through multiple levels of processing, up to and including response execution. ERP components can be related to hypothesized stages of mental processing (indicated schematically in Figure 2). Although much work remains to be done in order to understand the specific mental functions subserved by each particular ERP component, many of these components can at least be classified as underlying simple sensory processes or higher order cognitive processes. The ability to track mental processing in real time has proven very useful in helping to elucidate the stage(s) of processing that attention may act upon to modify mental processing.
Figure 2. Shown at the top are a few of the many hypothesized stages of information processing that intercede between the initial presentation of a physical stimulus and an eventual response to that stimulus. At bottom, an ERP waveform is shown approximately aligned with the hypothesized stages of processing. The ERP waveform shown here is only for illustration purposes - the components shown here are typically observed at different scalp sites; not all would be observed at the same scalp location. In this chapter, we will focus mainly upon the sensory P1 component, and on the P300 component that indexes post-sensory higher-order cognitive processing.
Early versus late selection
A classic debate in psychology has concerned the nature of our ability to filter out unwanted information. Specifically, the debate concerns the level of information processing at which relevant information is selected. One possibility is that this selection process occurs only just before a response must be made. This would be the extreme version of the "late-selection" argument that holds that all information received by the senses is fully processed to the level of semantic meaning (e.g., Deutsch & Deutsch, 1963). Accordingly, all sensory inputs would be completely processed, and selection would involve choosing to respond to one of several completely processed inputs (e.g., Allport et al., 1985). Alternatively, as suggested by "early-selection" theories, information may get filtered out well before it is ever processed to a level of semantic meaning. Broadbent (1958) argued that selective attention acts as a gate that allows only the desired information to proceed to higher-order processing, while keeping out all irrelevant information. Treisman (1960) argued along less extreme lines that attention acts to attenuate, rather than completely filter out, the processing of unattended inputs. Eason, Harter, and White (1969) used the ERP technique to show that alertness and attention could affect pre-decision level neuronal processing. Specifically they showed that attentional alertness could alter neural processing of a visual stimulus as quickly as 200 ms after the presentation of a stimulus. Van Voorhis and Hillyard (1977) showed that covert (in the absence of any overt eye movements) visual selective attention could enhance visual processing starting within about 100 ms after stimulus presentation. 
Further investigations have shown that the P1, a positive deflection in the visual evoked ERP that peaks around 90-110 ms latency and is maximal at posterior occipital scalp sites, is the earliest visually evoked component to be reliably affected by spatial attention (e.g., Clark & Hillyard, 1996; Luck, Hillyard, Mouloua, Woldorff, Clark, & Hawkins, 1994; Mangun & Hillyard, 1988, 1990, 1991). The P1 component is referred to as a visual sensory component, in that it is evoked by visual stimuli and is sensitive to physical features of the stimulus. Scalp current density mapping and dipole modeling of scalp recorded electrical activity in attention studies have suggested that these P1 attention effects, produced by voluntary spatial selective attention, are generated in lateral extrastriate cortex (Gomez Gonzalez, Clark, Fan, Luck, & Hillyard, 1994; Mangun, Hillyard, & Luck, 1993). Combined ERP and functional neuroimaging studies have provided further evidence that the P1 is generated in the fusiform gyrus of extrastriate cortex in humans (Heinze et al., 1994; Mangun, Hopfinger, Kussmaul, Fletcher, & Heinze, 1997; Woldorff et al., 1997). Many investigations, using multiple disciplines, have thereby converged on the conclusion that voluntary attention can affect neural processing at relatively early levels. However, there are components of the visual ERP that occur earlier than the P1 that are not reliably modulated by selective voluntary attention. The NP80 component, thought to be generated by activity in the striate cortex (area V1), has not been found to be reliably affected by voluntary selective spatial attention (e.g., Clark & Hillyard, 1996). Although some neuroimaging and non-human primate studies have provided evidence for attention-related modulations in the striate visual cortex (e.g., Worden & Schneider, 1996; Motter, 1993), a recent combined neuroimaging and ERP study found that the modulation of activity in striate cortex was related to processing that occurred after the NP80 component (Martinez et al., 1999). This result suggests that modulations of striate cortex happen via feedback pathways, after the initial sensory processing in that region (indexed by the NP80) has completed without being influenced by voluntary spatial attention. While previous research has thus been able to identify the precise stages of processing at which voluntary attention can and cannot affect visual processing, much less research has been devoted to understanding the stage(s) of processing affected by reflexive attentional capture.
Finally, recent theories of attention suggest that attentional selection cannot adequately be described as either simply early or late (see Pashler, 1998 for a comprehensive discussion). For example, Lavie and Tsal (1994) provided evidence that task difficulty plays a significant role in determining whether behavioral measures show evidence for early or for late selection. Specifically, under high levels of perceptual load, voluntary attention has been shown to act as an early filter, as all available resources are consumed by the difficult task, and unattended information is not completely processed. Under low levels of perceptual load, however, attentional resources exceed what is needed to perform the easy perceptual task, and attention may act only at a later stage of processing. Handy and Mangun (2000) recently demonstrated an enhancement of the P1 by voluntary attention under conditions of high perceptual load, and no modulation of the P1 under conditions of low perceptual load.
Finally, Lavie (2000) has suggested that in addition to the perceptual difficulty of the task (perceptual load), cognitive load (e.g., working memory resources; task coordination) may also play a significant role in the control of attention.
Reflexive versus voluntary attention
Despite the fact that both voluntary and reflexive attention influence the focus of our "mind's eye," evidence supports a strong distinction between these two attention systems. For instance, compared to voluntary attention, reflexive attention is engaged more rapidly, is more resistant to interference, and dissipates more quickly (e.g., Cheal & Lyon, 1991; Jonides, 1981; Müller & Rabbitt, 1989; Posner, Nissen, & Ogden, 1978). In addition, the effects of reflexive attention change significantly as time passes after an attention-capturing event (i.e., a non-predictive exogenous "cue"), whereas voluntary attention is more stable over time. Unlike voluntary attention, reflexive attention results in a biphasic effect on response times. Specifically, the initial facilitation that follows reflexive attentional capture is followed by a period in which items at the cued location are actually responded to more slowly (i.e., Inhibition of Return, IOR; Posner et al., 1985; Posner & Cohen, 1984). In addition, neuropsychological studies indicate that reflexive attention may
be controlled by partially or wholly separate neural mechanisms from those involved in voluntary attention (see Rafal, 1996, for review). While there has been an abundance of research into the neural mechanisms of voluntary attention (including work in both humans and non-human animals), relatively little is known about the neural consequences of reflexive attention. In part, this may reflect an implicit assumption that the reflexive system is somehow more basic than the voluntary system, and that voluntary attention works through the same mechanisms as reflexive attention, merely adding on higher-order control mechanisms. As Briand (1998) points out, however, reflexive and voluntary attention have been shown to have distinct properties and qualitative differences, which make such assumptions tenuous. For example, evidence suggests that reflexive attention performs a role in feature integration, while voluntary attention alone does not (Briand, 1998; Briand & Klein, 1987). Therefore, reflexive attention mechanisms cannot be completely understood on the basis of inferences drawn from the results of voluntary attention studies. The relative lack of neurophysiological studies of reflexive attention may also be due in part to the difficulty of attributing neural activity to specific events when those events (e.g., reflexive cue and target) occur very closely in time, as is typical in studies of reflexive attention. Specifically, two or more events occurring in a short period of time may produce partially overlapping patterns of neural activity recorded at the scalp (and in neuronal recordings in animals). Certain precautions therefore need to be taken to ensure that electrophysiological recordings will not be contaminated by the overlapping activities. If successive events are separated by only a brief interval, and if the interval is constant across all trials, it is not possible to completely differentiate the event-related activity from the two events.
This is because any activity time-locked to the second event will also be time-locked to the first event, since the second event itself is perfectly time-locked to the first event. If, on the other hand, the interval between events can be varied over a range of time, then it may be possible, via signal analysis methods, to obtain distinct ERPs for both events. This procedure of randomly varying the interstimulus intervals can be quite effective if (1) the range of ISI variation is larger than the period of the slowest component of interest, and (2) there is a sufficiently long interval between the events, such that the processing of one event finishes before the processing of the next begins. However, in studies of reflexive attention, this second requirement often cannot be met, due to the transient nature of reflexive attentional capture, which requires very short interstimulus intervals to be used. Therefore, even with a randomly varying interval, the average ERPs from both events still contain some overlapping activity, because the events generating them (i.e., reflexive cues and subsequent targets) are only a few tens of milliseconds offset from one another. Within the past decade, however, advances in signal processing techniques have provided scientists with the tools (e.g., the adjacent response filter of Woldorff, 1993) to dissociate overlapping patterns of brain activity, allowing the investigation of neural activity related to specific events occurring with short interstimulus intervals. As described previously (Woldorff, 1993), this procedure
estimates overlapping activity by convolving the recorded ERP waveforms with the actual distribution of the interstimulus intervals. For example, the overlap from the cue processing onto the target ERP can be estimated by convolving the cue ERP with the event distribution specifying the interstimulus intervals at which it preceded the targets. This estimate of overlap from the cue may then be subtracted from the recorded ERP to the target, providing a better estimate of the target-related activity. This better estimate of the target activity can then be convolved with the interstimulus interval distribution to provide an estimate of overlap from the target onto the preceding cue ERP. This procedure is then iterated using the new estimates of the cue and target waveforms until a stable solution is arrived at (when successive iterations no longer produce any differences in the estimates).
The following experiments investigated the effects that reflexive attentional capture has on subsequent visual processing: specifically, whether reflexive attention can modulate neural processing within early sensory processing stages, as early as does voluntary attention. These experiments examined processing at both short interstimulus intervals and at longer interstimulus intervals to investigate the early facilitatory effects of reflexive attention as well as the later inhibitory effect (IOR). Finally, across the experiments, we were able to examine the effects of reflexive attention within different tasks in order to examine whether simple task changes would affect the "automatic" effects of reflexive attention.
The Effects of Reflexive Attentional Capture on Visual Processing: ERP Studies
Part 1: Reflexive attention in a difficult discrimination task
Recently, we investigated the effects of reflexively oriented attention on visual processing by measuring neural activity in human subjects using the ERP method (Hopfinger & Mangun, 1998). This study used a paradigm that was known to produce a reflexive shift of attention, in order to investigate the effects that attentional capture has on the processing of subsequent visual events (i.e., events occurring after attention has been captured by a brief visual transient, the reflexive cue). Similar to the early versus late selection debate discussed above, this study was motivated in part by the question of whether reflexive attention would be able to affect visual processing as early as does voluntary attention.
ERPs were recorded from human subjects while they performed a discrimination task in which non-predictive "cues" preceded each target stimulus. Subjects maintained fixation upon a centrally located cross on a computer monitor throughout all trials (see Figure 3). On either side of fixation, four small white dots demarcated the corners of an imaginary rectangle 1.03 degrees wide and 1.37 degrees tall. The center of each imaginary rectangle was located 1.5 degrees above and 6.4 degrees lateral to fixation. Each trial began with the four dots on one side of fixation (equally probable on the left or right of fixation) being extinguished for 34 ms and then reappearing, giving the subjective impression of a blinking of one set of dots. This "cue" was used in order to minimize neuronal refractory effects and overlap from the cue ERP onto the target-evoked ERP, while still producing an effective sensory cue. Subjects were informed that the cue would be completely non-predictive of the location of the subsequent target, as described below.
Figure 3. Discrimination Experiment. Example of stimulus display showing a trial with a target occurring at a cued location (left column) and a trial where this target is occurring at an uncued location (right column). The "cue" was a 34-ms offset and then re-appearance of the 4 dots on one side of fixation. The cue-to-target interstimulus interval (ISI) was randomly varied over a short (34-234 ms) or long (566-766 ms) interval. The target was a vertical bar presented for 50 ms. The participants' task was to judge whether the bar was the "tall" or "short" bar, and press the appropriate button as quickly as possible.
After a variable interval (ranging randomly from either 34-234 or 566-766 ms; rectangular distribution within each range), a vertical target bar was flashed to one side of fixation, centered between the dots on that side. The location of the target bar was equally probable on the right or left of fixation and was equally likely to be at the same versus the opposite hemifield location as the preceding cue. The target remained on the screen for 50 ms, and was either a short (1.8 deg by .69 deg) or tall (2.3 deg by .69 deg) vertical bar. Subjects performed a height discrimination judgment in which they were required to rapidly press one button for short bars or a different button for tall bars. The intertrial interval was varied randomly between 1500 and 2000 ms, and each block consisted of 40 trials wherein short (34-234 ms) and long (566-766 ms) cue-to-target intervals (interstimulus interval, ISI) were randomly intermixed. Each block was 90-100 seconds long and each subject performed 80 blocks, 40 on each of 2 separate testing days. Catch trials, during which no target appeared, accounted for 20% of the trials and were included in order to reduce the likelihood of subjects forming temporal expectancies and to prevent anticipatory responses. Data from 8 healthy, right-handed volunteers (4 female), ages 18-30, with normal or corrected-to-normal vision, were analyzed.
Although the cue was subtle and did not overlap on the retina with the target, the scalp-recorded neural responses to the cue still overlapped with the responses recorded to the target stimulus, especially at the shortest cue-to-target intervals.
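The iterative overlap estimate described earlier (convolve one event's ERP with the ISI distribution, subtract the result from the other event's ERP, and repeat until the estimates stop changing) can be illustrated with a small simulation. The sketch below is a simplified, noise-free idealization written for illustration only; it is not the authors' implementation and not the full Adjar algorithm of Woldorff (1993). The waveform shapes, the 10-ms sampling step, the 30-230 ms ISI grid (an approximation of the 34-234 ms range) and all function names are assumptions.

```python
import math


def overlap_from_previous(prev_erp, isi_probs):
    """Overlap that the earlier event adds to the later event's ERP.

    In the later event's time frame, the earlier response is advanced by
    the ISI, so at sample t we see prev_erp[t + isi], weighted by how
    often that ISI occurred.
    """
    n = len(prev_erp)
    overlap = [0.0] * n
    for isi, prob in isi_probs.items():
        for t in range(n - isi):
            overlap[t] += prob * prev_erp[t + isi]
    return overlap


def overlap_from_next(next_erp, isi_probs):
    """Overlap that the later event adds to the earlier event's ERP."""
    n = len(next_erp)
    overlap = [0.0] * n
    for isi, prob in isi_probs.items():
        for t in range(isi, n):
            overlap[t] += prob * next_erp[t - isi]
    return overlap


def remove_overlap(rec_cue, rec_target, isi_probs, tol=1e-10, max_iter=2000):
    """Alternate between the two overlap estimates until they stabilize."""
    est_cue, est_target = list(rec_cue), list(rec_target)
    for n_iter in range(1, max_iter + 1):
        new_target = [r - o for r, o in
                      zip(rec_target, overlap_from_previous(est_cue, isi_probs))]
        new_cue = [r - o for r, o in
                   zip(rec_cue, overlap_from_next(new_target, isi_probs))]
        change = max(abs(a - b) for a, b in
                     zip(new_cue + new_target, est_cue + est_target))
        est_cue, est_target = new_cue, new_target
        if change < tol:  # successive iterations no longer differ
            break
    return est_cue, est_target, n_iter


def bump(n, center, width, amplitude):
    """Hypothetical smooth deflection standing in for an evoked response."""
    return [amplitude * math.exp(-0.5 * ((t - center) / width) ** 2)
            for t in range(n)]


# Assumed toy setup: 60 samples of 10 ms each, ISIs of 30-230 ms (uniform).
N_SAMPLES = 60
true_cue = bump(N_SAMPLES, 12, 3.0, 1.0)
true_target = bump(N_SAMPLES, 11, 2.5, 2.0)
isi_samples = range(3, 24)
isi_probs = {isi: 1.0 / len(isi_samples) for isi in isi_samples}

# "Recorded" averages contain the true response plus the jittered overlap.
rec_cue = [c + o for c, o in
           zip(true_cue, overlap_from_next(true_target, isi_probs))]
rec_target = [g + o for g, o in
              zip(true_target, overlap_from_previous(true_cue, isi_probs))]

est_cue, est_target, n_iter = remove_overlap(rec_cue, rec_target, isi_probs)
raw_error = max(abs(r - g) for r, g in zip(rec_target, true_target))
corrected_error = max(abs(e - g) for e, g in zip(est_target, true_target))
print(f"iterations: {n_iter}")
print(f"max target error before correction: {raw_error:.4f}")
print(f"max target error after correction:  {corrected_error:.2e}")
```

In this idealized setting the recorded averages are built with the same overlap model that the correction assumes, so the iteration can recover the simulated waveforms almost exactly. With real data, noise and a finite number of trials limit how well the overlap can be removed.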
In order to eliminate the possibility that any differences in early ERP components might be due to overlapping neural activity produced by the cues, the adjacent response (Adjar) filter method (Woldorff, 1993) was employed to remove confounding potentials generated by the lateralized cues. As described briefly earlier, this procedure iteratively estimates and subtracts the overlap from adjacent events (i.e., cue and target) until the estimates of the cue and target overlap do not change over successive iterations, at which point overlap is considered to have been removed from the original waveforms (see Hopfinger & Mangun, 1998, for more details on how this procedure was applied to the present data). Physiological measures were gathered by recording from 64 electrodes distributed over the scalp of each volunteer.
In agreement with prior reaction time (RT) studies using non-predictive peripheral visual transients (Jonides, 1981; Miller, 1989; Müller & Rabbitt, 1989; Theeuwes, 1991), subjects in the present experiment responded reliably faster to targets at the cued location versus the uncued location (517 ms versus 533 ms, respectively) for the short cue-to-target intervals (main effect of cueing F(1,7)=24.41, p<.01; ANOVA factors were Cueing (cued versus uncued location targets), visual field of target (right versus left hemifield), and Subjects (N=8)). At the long ISIs, there were, however, no differences in reaction times between targets at cued and uncued locations (546 versus 544 ms). Hence, classical inhibition of return (IOR), wherein RTs are typically slowed at cued locations at long ISIs
(Posner & Cohen, 1984), was not observed, a result we attribute to the use of a discrimination task, known to reduce the likelihood of RT inhibition (e.g., Pratt, 1995; Terry, Valdes, & Neill, 1994). No other significant main effects or interactions were found in the RT data.
Effects on perceptual level processing
When the cue-to-target ISI was short (34-234 ms), targets at the cued location elicited visual P1 ERP components with significantly enhanced amplitudes compared to targets at an uncued location (F(1,7) = 15.15, p<.01; Figure 4, top left). For the ANOVA analysis of ERP data in the 90-140 ms latency range (corresponding to the P1 component), the following factors were included in addition to those used above: Electrode locations (medial versus lateral scalp locations), and hemisphere of electrodes (right versus left scalp locations). The occipital electrodes included in the analyses were T5, T6, OL, and OR. OL and OR are located midway between T5 and O1, and T6 and O2, respectively, of the International 10-20 system of electrode placement (Jasper, 1958). At the longer cue-to-target ISIs (566-766 ms), however, the effect of the cues on the P1 component was reversed (Figure 4, bottom left): targets at cued locations now elicited significantly smaller responses than targets at uncued locations (F(1,7)=13.68, p<.01). This reduction in P1 amplitude at long ISIs cannot be attributed to a simple neuronal refractory effect between cue and target because at short ISIs, when such an effect would be greatest, the pattern is opposite to that predicted by neuronal refractoriness. By investigating topographic voltage maps during the time period of the P1 component, one can observe that the location of the maximal response at the scalp corresponding to the P1 component was highly similar for cued versus uncued-location targets (Figure 5). This pattern is consistent with the view that the same type of neural process was evoked in both cases, with the primary difference being the strength of the response.
The effect of reflexive spatial attention on the amplitude of the P1 component occurred with little or no change in the waveshape, latency, or scalp distribution of this ERP component, suggesting that reflexive attention modulates the activity level of the sensory P1 generators via a selective sensory gating or "gain control" of information processing through ascending visual pathways (e.g., Eason, 1981). As mentioned above, the extrastriate-generated P1 component represents the earliest stage of visual processing to be reliably modulated by voluntary spatial attention (e.g., Heinze et al., 1994; Mangun, 1995; Mangun & Hillyard, 1991). Our findings thus indicated that reflexive attention leads to modulations at this same stage of visual cortical processing, although presumably, partially or wholly distinct control circuitry is involved in producing these two attention effects (Kustov & Robinson, 1996; Rafal, 1996; Robinson & Kertzman, 1995).
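The P1 comparisons above rest on amplitudes measured within a fixed latency window (90-140 ms). As a minimal sketch of that kind of measurement, the code below computes a window average for a cued and an uncued waveform and takes their difference. It assumes the measure is the mean voltage within the window, which the text does not state explicitly; the toy waveforms, the 10-ms sampling step and the function names are hypothetical and are not the recorded data.

```python
def mean_amplitude(waveform_uv, times_ms, start_ms, end_ms):
    """Mean voltage (in microvolts) of the samples inside a latency window."""
    samples = [v for v, t in zip(waveform_uv, times_ms)
               if start_ms <= t <= end_ms]
    if not samples:
        raise ValueError("no samples fall inside the requested window")
    return sum(samples) / len(samples)


# Hypothetical time axis: one sample every 10 ms from 0 to 190 ms.
times_ms = list(range(0, 200, 10))


def toy_p1(peak_uv):
    """Made-up triangular deflection peaking at 110 ms (illustration only)."""
    return [max(0.0, peak_uv * (1 - abs(t - 110) / 50)) for t in times_ms]


cued_erp = toy_p1(1.2)    # assumed larger response at the cued location
uncued_erp = toy_p1(0.8)  # assumed smaller response at the uncued location

P1_WINDOW_MS = (90, 140)  # latency range given in the text for the P1
p1_cued = mean_amplitude(cued_erp, times_ms, *P1_WINDOW_MS)
p1_uncued = mean_amplitude(uncued_erp, times_ms, *P1_WINDOW_MS)
cueing_effect_uv = p1_cued - p1_uncued  # positive means enhancement when cued

print(f"P1 cued: {p1_cued:.2f} uV, uncued: {p1_uncued:.2f} uV, "
      f"effect: {cueing_effect_uv:.2f} uV")
```

Because the two toy waveforms differ only in scale, the window measure changes while the shape and peak latency stay the same, which mirrors the "gain control" pattern described above.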
Figure 4. Discrimination Experiment. Event-related potentials (ERPs) to target bar stimuli, collapsed over contralateral scalp sites (data from the left hemisphere for right visual field targets combined with data from the right hemisphere for left visual field targets). The scalp locations of electrodes OL/OR (left column) and Pz (right column) are indicated in Figures 5 and 6, respectively. Cued-location target ERPs are indicated by solid lines; uncued-location target ERPs are represented by dashed lines. Top: Shaded gray areas highlight the significant effects that reflexive attention had on the contralateral P1 component (left column) and the P300 component (right column) at the short cue-to-target ISIs. Bottom: Shaded gray area highlights the effect that reflexive attention had on the contralateral P1 component (left column) at the long cue-to-target ISIs. There was no significant difference in the P300 at the long ISIs (right column).
Figure 5. Discrimination Experiment. Scalp topographic voltage maps of the time period corresponding to the peak of the P1 component (110-120 ms), collapsed over contralateral and ipsilateral scalp sites and shown from a back view of the head. The left scalp hemisphere of each map represents the ipsilateral hemisphere (data from the left hemisphere for left visual field targets combined with data from the right hemisphere for right visual field targets), while the right scalp hemisphere of each map represents the contralateral hemisphere (data from the left hemisphere for right visual field targets combined with data from the right hemisphere for left visual field targets). The small black dots on each topographic map indicate the location of the electrodes, and all maps are referenced to the right mastoid. Top: At the short ISIs, the cued-location targets (left map) produced a significantly enhanced P1 component relative to uncued-location targets (right map). The distribution of activity across the scalp for the contralateral P1 is similar for cued versus uncued location targets over contralateral occipital scalp sites, but the amplitude of the P1 component is significantly larger for cued-location targets compared to uncued-location targets at these short ISIs. Bottom: At the long ISIs, the cued-location targets (left map) produced a significantly reduced P1 component relative to uncued-location targets (right map). Again, the distribution of activity was very similar, with the main difference being the strength of the P1.
Effects on cognitive level processing
In order to track the fates of signals for cued and uncued targets, we evaluated longer-latency ERP components known to reflect higher-order aspects of target processing. One such component of the ERP that has been used in conjunction with RTs to examine human information processing is the P300 component (latency 250-500 ms, maximal over central and central-parietal scalp sites) (e.g., Duncan-Johnson & Donchin, 1982). The amplitude of the P300 is not directly tied to RTs, but instead indexes aspects of information processing such as expectancy and perceived stimulus relevance. The P300 is typically larger to infrequent, unexpected stimuli (Donchin, 1981). For the analyses of the ERP data in the 250-500 ms latency range (corresponding to the P300 component), ANOVA factors were those listed above for the analysis of the P1, except that only midline scalp electrodes were analyzed here (Cz and Pz), and thus the ANOVA factor of hemisphere of recording was not included. In the present study, the P300 was enlarged to cued-location targets, but only at the short cue-to-target ISIs (F(1,7)=31.41, p<.001; Figure 4 upper right and Figure 6 top). The P300 to cued and uncued targets did not differ at long ISIs (Figure 4 lower right and Figure 6 bottom).
The P300 component of the ERP has been shown to be an index of a subject's cognitive response to a stimulus (e.g., Donchin, 1981). One factor that affects the amplitude of the P300 component is stimulus frequency (e.g., Duncan-Johnson & Donchin, 1982). Specifically, less frequent stimuli elicit larger amplitude P300s. In the present study, however, stimuli were equally frequent at the cued location and at the uncued location, and yet cued-location target stimuli elicited larger P300s than uncued-location target stimuli at the short ISIs. Therefore, the observed difference is likely due to a different factor that affects the amplitude of the P300.
The P300 is also sensitive to the significance, or perceived importance, of the stimulus (Johnson, 1988). For example, when monetary payoffs are manipulated, stimuli associated with high risk elicit larger P300s than do stimuli with low risk (e.g., Johnston, 1979; Tueting & Sutton, 1973). In addition, in comparison to neutral stimuli, stimuli rated as strongly positive or negative elicit larger P300s (Johnston, Burleson, & Miller, 1987). The enhancement of the P300 found in this experiment, therefore, may be due to a difference in the perceived relevance of the stimuli. Since both cued and uncued location stimuli were known by the subjects to be equally important to the task, it is likely that the difference was generated by automatic mechanisms. Specifically, reflexive attention may act to tag the cued location as being of higher value. There were no differences in the P300 at the long ISIs, however, suggesting that this tagging of the cued location is transient, and affects processing for only a short period of time after the attention-capturing event.
Figure 6. Discrimination Experiment. Scalp topographic voltage maps of the time period corresponding to the peak of the P300 component (250-300 ms), collapsed over contralateral and ipsilateral scalp sites and shown from a back view of the head. The left scalp hemisphere of each map represents the ipsilateral hemisphere (data from the left hemisphere for left visual field targets combined with data from the right hemisphere for right visual field targets), while the right scalp hemisphere of each map represents the contralateral hemisphere (data from the left hemisphere for right visual field targets combined with data from the right hemisphere for left visual field targets). The distribution of activity across the scalp is similar for cued and uncued location targets, but the amplitude of the P300 is significantly larger for cued-location targets compared to uncued-location targets at short ISIs (top). There was no significant difference in the P300 amplitude at the long ISIs (bottom).
An alternate account of these findings is that the P300 effect was simply a result of the enhanced sensory processing (the P1 effect). There is evidence that the P300 may increase as a function of loudness under certain conditions (Johnson & Donchin, 1978), and this may extend to intense stimuli in other modalities as well. However, this view would not account for the pattern of data collected here. If the P300 amplitudes in this experiment were simply a function of earlier sensory processing, then the P1 and P300 effects should have covaried. In this experiment, however, there was a significant reduction of the P1 at the long ISIs for cued-location targets compared to uncued-location targets, but there was no difference in the P300 component at those ISIs. Furthermore, in a review of the properties that affect P300 amplitude, Johnson (1988) suggested that the observed effects of stimulus intensity on the P300 were likely due to the ability of intense stimuli to summon attention to "high-value" (i.e., potentially important) stimuli, rather than being due to a simple linear relation between intensity and P300 amplitude. Together with the fact that the P1 and P300 effects did not covary across the conditions in our study, this suggests that the P300 effects we observed were not simply a result of the P1 effects.
Although more research needs to be done to completely understand the cognitive process(es) underlying the P300, the present results do indicate a key function of reflexive orienting. At short ISIs, reflexive attention not only facilitates processing in sensory cortex (i.e., the P1 modulation), but also leads cued-location stimuli to be treated differently at higher stages of stimulus evaluation. Additionally, because there was no P300 difference observed at long ISIs, it is possible to conclude that subjects in the present task were not invoking voluntary orienting toward the task-irrelevant cue.
If they had done so, a difference in P300 amplitude would have been expected at the long ISIs as well. These findings suggest that very shortly after a sensory stimulus, reflexive orienting results in the stimulated location being briefly tagged as being more relevant than other locations in the environment.
Part 2: Reflexive attention in a simple detection task
As mentioned above, recent studies have shown that task demands can influence the P1 attention effects of voluntary attention (Eimer, 1994; Handy & Mangun, 2000). If the P1 attention effect we observed previously (Hopfinger & Mangun, 1998) was truly due to automatic reflexive attention mechanisms, then it should appear regardless of the task demands. Specifically, our initial experiment utilized a difficult discrimination task. Behavioral studies of reflexive attention have shown that performing a discrimination task can produce different patterns of behavioral effects than when performing simple detection tasks with the identical stimuli (Danziger & Kingstone, 1999; Klein & Taylor, 1994). Such results suggest an interaction between reflexive attention and task parameters. Indeed, inhibition of return has typically been more difficult to obtain in discrimination tasks compared to detection tasks (see Klein, 2000 for review). In order to investigate these issues,
and to determine if the effects we previously obtained were truly automatic reflexive attention effects, we recently performed a study (Hopfinger & Mangun, 2001) in which participants performed a simple detection task with the same stimuli we used previously. All stimuli were identical to our earlier study (Hopfinger & Mangun, 1998), but the task was changed to a simple detection task in which subjects were required to rapidly press a button with their index finger as soon as the bar was detected (the size of the bar was task-irrelevant in the present study). Subjects were informed that the cue would be completely non-predictive of the location of the subsequent target, and were instructed not to attend voluntarily to either location. Subjects performed the same number of trials as in the original experiment. All recording and analysis procedures were identical to those described above (Hopfinger & Mangun, 1998). Data from 8 subjects (2 female; ages 19-28) were individually filtered with the Adjar algorithm. The only difference in the statistical analyses was that in this experiment the P300 occurred earlier than in our previous study, and therefore the latency range of 200-400 ms was used to measure this component.
Subjects in the present experiment were significantly faster in responding to targets at the cued location compared to the uncued location (282 ms vs. 290 ms) at short cue-to-target ISIs (F(1,7)=7.63, p<.05). At short ISIs, the P1 component was significantly enhanced for stimuli occurring at the same location as the previous non-predictive cue compared to stimuli at the uncued location (0.79 µV vs. 0.31 µV; F(1,7)=7.92, p<.05; Figure 7 upper left). The P300 component to targets was also enhanced for cued-location targets compared to uncued-location targets (2.41 µV vs. 1.69 µV, F(1,7)=44.54, p<.001; Figure 7 upper right).
The present results strengthen our claim that these enhancements of processing are due to reflexive attention, not task-related mechanisms. At longer cue-to-target ISIs (566-766 ms), RTs were slower at the cued location than at the uncued location (290 ms vs. 277 ms; F(1,7)=8.37, p<.05). If subjects had voluntarily attended to the cued location, RTs should have been faster at the cued location at the longer ISIs as well. Thus, this demonstration of IOR in the present study supports our contention that voluntary attention was not engaged. The P1 enhancement for cued-location targets, seen at the short ISIs, was no longer observed at the longer cue-to-target intervals of 566-766 ms (Figure 7, bottom left), providing evidence that the ability of reflexive attention to enhance processing at early levels of sensory processing is a transient effect. Cued-location targets tended to have a smaller P1 component compared to uncued-location targets, although no significant difference was found for the amplitude of the P1 component at the long ISIs (0.79 µV vs. 0.92 µV, F(1,7) = 1.36, p>.2). In addition, there was no difference in the amplitude of the P300 component at the long ISIs, although there was a tendency for cued-location targets to have a larger P300 than uncued-location targets (3.02 µV vs. 2.76 µV, F(1,7)=3.16, p>.1; Figure 7, lower right).
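The reaction-time pattern across the two experiments can be summarized as cueing effects (cued minus uncued RT). The short sketch below tabulates the mean RTs reported in the text and labels each cell. The significance flags are copied from the ANOVA results reported above and are not recomputed here; the function and variable names are illustrative assumptions.

```python
# Mean RTs (ms) as reported in the text: (task, ISI range, cued, uncued,
# whether the reported ANOVA found a reliable cueing effect).
reported_rts = [
    ("discrimination", "short", 517, 533, True),
    ("discrimination", "long", 546, 544, False),
    ("detection", "short", 282, 290, True),
    ("detection", "long", 290, 277, True),
]


def cueing_effect_ms(cued_rt, uncued_rt):
    """Cued minus uncued RT: negative is faster at the cued location."""
    return cued_rt - uncued_rt


def describe(effect_ms, reliable):
    """Label a cell using the sign of the effect and the reported reliability."""
    if not reliable:
        return "no reliable difference"
    return "facilitation" if effect_ms < 0 else "inhibition of return"


summary = {}
for task, isi, cued_rt, uncued_rt, reliable in reported_rts:
    effect = cueing_effect_ms(cued_rt, uncued_rt)
    summary[(task, isi)] = (effect, describe(effect, reliable))

for (task, isi), (effect, label) in summary.items():
    print(f"{task:>14} | {isi:>5} ISI | {effect:+4d} ms | {label}")
```

Laid out this way, behavioral facilitation at short ISIs appears in both tasks, whereas behavioral IOR at long ISIs appears only in the detection task.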
Figure 7. Detection Experiment. Event-related potentials (ERPs) to target bar stimuli at the short cue-to-target ISIs, collapsed over contralateral scalp sites (data from the left hemisphere for right visual field targets combined with data from the right hemisphere for left visual field targets). Top: Shaded gray areas highlight the effects that reflexive attention had on the contralateral P1 component (left column) and the P300 component (right column). At the short cue-to-target ISIs, cued-location targets (solid lines) elicited significantly enhanced P1 and P300 components compared to uncued-location targets (dashed lines). Bottom: At the long cue-to-target ISIs, there were no significant differences between cued- and uncued-location targets in either the P1 (left column) or P300 components (right column).
Inhibition of Return: Inhibition of Perceptual Processing or Motor Programming?
Although the facilitatory effects of reflexive orienting at short cue-to-target intervals appear to be stable phenomena across different task parameters, a different story emerges for the effects at longer ISIs when inhibitory processes typically dominate. The slowing of response times at the cued location (IOR) has previously been attributed to an attentional mechanism (e.g., Posner et al., 1985). It has been suggested that such a mechanism may be vital for efficiently scanning the environment, as it would prevent attention from becoming fixed upon or repeatedly returning to the same location. Klein (1988) initially found empirical evidence that IOR reflects a mechanism that inhibits attention from returning to a previously cued location. Later, however, Klein and Taylor (1994) suggested that IOR may instead reflect the inhibition of motor responses, and may not be a strictly attentional phenomenon. The motor-theory account of IOR suggests that the cue normally activates a motor response to the cued location, but that subjects inhibit the execution of that response. When a target stimulus subsequently appears at the cued location, the inhibition generated to the cue is still active, resulting in IOR to the target.
An attentional account of IOR predicts that IOR would occur in both location decision and identity decision tasks, since attention would be inhibited from returning to the cued location regardless of the task. The motor account, on the other hand, predicts that IOR would occur only for a localization task, since that task would require the execution of a similar motor program as that which had just been inhibited in response to the cue (see Pratt, Kingstone, & Khoe, 1997). According to the motor theory, IOR would not develop for an identification task, since that response would in no way be spatially directed.
Pratt, Kingstone, and Khoe (1997) conducted a study in which subjects either responded to the location of a target or to the identity of a target, and they found that IOR was observed in both tasks, providing support for the attentional account. Kingstone and Pratt (1999) replicated the finding that IOR occurs for tasks that do not require a spatially directed response, and additionally they found that the IOR effect increased when subjects were required to localize the target stimulus with an eye movement, relative to when the response was a manual one. This relation between IOR and eye movements is consistent with previous findings that IOR can be produced by the programming of eye movements (Rafal, Calabresi, Brennan, & Sciolto, 1989), even in the absence of any peripheral cue or target (Abrams & Dobkin, 1994). These results suggest that IOR has an oculomotor component, as well as an attentional component. Recently, Handy, Jha, and Mangun (1999) found a reduction in the accuracy of target discrimination at long ISIs in a reflexive cuing task, providing evidence that reflexive attention affects perceptual level processes. Our previous results, along with other recent ERP studies (McDonald, Ward, & Kiehl, 1999), have suggested that the extrastriate visual cortex may be a locus for inhibited
perceptual processing at long ISIs. In our discrimination experiment (Hopfinger & Mangun, 1998), the P1 component was significantly reduced at cued locations relative to uncued locations at the long ISIs. This IOR-like pattern in the P1 was suggestive of an early attentional locus for IOR. However, no difference was observed in RTs in that experiment. Subjects performing the detection task (Hopfinger & Mangun, 2001) showed significantly slower RTs at the cued location at the long cue-to-target intervals (i.e., the prototypical IOR phenomenon). However, there was no significant difference in the P1 component at long ISIs in that study, although there was a tendency for the cued-location targets to have a smaller P1. Regardless of whether that non-significant tendency could be reliable with enough power, the most robust IOR and the most robust P1 "inhibition" did not covary across our tasks. The most robust P1 "inhibition" occurred in the discrimination experiment, while the most robust behavioral IOR occurred in the detection experiment. Therefore, although there is some evidence that an IOR-like effect occurs at the level of early visual processing under some conditions, it appears that an early sensory modulation cannot fully explain IOR. Rather, the IOR phenomenon may consist of a combination of sensory, as well as other motor and response-related factors (see also Kingstone & Pratt, 1999). The results from the present ERP experiments suggest a neural locus at which these perceptual-level effects may be manifest (see also McDonald et al., 1999). Although the results of these experiments support the view that reflexive attention mechanisms may produce IOR-like effects in early sensory processing, our results also suggest that IOR cannot be predicted solely on the basis of these early modulations.
Conclusions
Our studies provide evidence that when an abrupt luminance change in the visual scene attracts attention reflexively to a spatial location, the result is an enhancement in the neural processing for subsequent visual stimuli occurring at the same location in space. These studies suggest that reflexive attention contributes to our ability to quickly evaluate new and potentially interesting stimuli by enhancing sensory processing. Our findings support the idea that visual information processing can be modulated as early as the extrastriate visual cortex in humans by means of reflexive attentional mechanisms. The enhancement of the P1 at the short cue-to-target ISIs was observed in both difficult discrimination and simple detection tasks. Furthermore, in recent studies not reviewed here, we found enhancements of the P1 component for stimuli that were completely task-irrelevant and for which no overt response was required (Hopfinger, Maxwell, & Mangun, 2000), suggesting that the early sensory enhancement is a strongly automatic process. These experiments have also suggested that reflexive attention does more than merely affect sensory processing, because we observed differential ERP responses at longer latencies as a function of reflexive cueing. Specifically, the P300 component, a reflection of higher-order stimulus evaluation processes, was found to be enhanced for items occurring at previously cued locations, but only for a brief period of time following
the non-predictive "cue" stimulus. In summary, the experiments reported here illuminate how bottom-up, visually-triggered reflexive attention mechanisms affect our perceptions of the visual world: They do so by modulating the neural responses to visual events within visual cortex and by altering the way these stimuli are processed at higher-level stages of analysis.
References
Abrams, R. A. & Dobkin, R. S. (1994). Inhibition of return: Effects of attentional cueing on eye movement latencies. Journal of Experimental Psychology: Human Perception and Performance, 20, 467-477.
Allport, D. A., Tipper, S. P., & Chmiel, N. R. J. (1985). Perceptual integration and postcategorical filtering. In M. Posner & O. Marin (Eds.), Attention and Performance XI (pp. 107-132). Hillsdale, N.J.: Erlbaum Associates.
Briand, K. A. (1998). Feature integration and spatial attention: More evidence of a dissociation between endogenous and exogenous orienting. Journal of Experimental Psychology: Human Perception and Performance, 24, 1243-1256.
Briand, K. A. & Klein, R. M. (1987). Is Posner's beam the same as Treisman's "glue"?: On the relationship between visual orienting and feature integration theory. Journal of Experimental Psychology: Human Perception and Performance, 13, 228-241.
Broadbent, D. E. (1958). Perception and communication. London: Pergamon Press.
Cheal, M. L. & Lyon, D. R. (1991). Central and peripheral precuing of forced-choice discrimination. The Quarterly Journal of Experimental Psychology, 43A, 859-880.
Clark, V. P. & Hillyard, S. A. (1996). Spatial selective attention affects early extrastriate but not striate components of the visual evoked potential. Journal of Cognitive Neuroscience, 8, 387-402.
Danziger, S. & Kingstone, A. (1999). Unmasking the inhibition of return phenomenon. Perception & Psychophysics, 61, 1024-1037.
Deutsch, J. A. & Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
Donchin, E. (1981). Surprise!...Surprise? Psychophysiology, 18, 493-513.
Duncan-Johnson, C. & Donchin, E. (1982). The P300 component of the event-related brain potential as an index of information processing. Biological Psychology, 14, 1-52.
Eason, R. G. (1981). Visual evoked potential correlates of early neural filtering during selective attention. Bulletin of the Psychonomic Society, 4, 203-206.
Eason, R. G., Harter, M. R., & White, C. T. (1969). Effects of attention and arousal on visually evoked cortical potentials and reaction time in man. Physiology and Behavior, 4, 283-289.
Eimer, M. (1994). An ERP study on visual spatial priming with peripheral onsets. Psychophysiology, 31, 154-163.
Electrophysiological Studies
23
Gomez Gonzalez, C.M., Clark, V.P., Fan, S., Luck, S.J., Hillyard, S.A. (1994). Sources of attention-sensitive visual event-related potentials. Brain Topography, 7, 41-51. Handy, T. C., Jha, A. P., & Mangun, G. R. (1999). Promoting novelty in vision: Inhibition of return modulates perceptual-level processing. Psychological Science, 1O, 157-161. Handy, T. C. & Mangun, G. R. (2000). Attention and spatial selection: Electrophysiological evidence for modulation by perceptual load. Perception & Psychophysics, 62, 175-186. Heinze, H. J., Mangun, G. R., Burchert, W., Hinrichs, H., Sholz, M., Munte, T. F., Gos, A., Scherg, M., Johannes, S., Hundeshagen, H., Gazzaniga, M. S., & Hillyard, S. A. (1994). Combined spatial and temporal imaging of brain activity during selective attention in humans. Nature, 372, 543-546. Hopfinger, J. B. & Mangun, G. R. (1998). Reflexive attention modulates processing of visual stimuli in human extrastriate cortex. Psychological Science, 9, 441-447. Hopfinger, J. B. & Mangun, G. R. (2001). Tracking the influence of reflexive attention on sensory and cognitive processing. Cognitive, Affective, and Behavioral Neuroscience, 1, 56-65. Hopfinger, J. B., Maxwell, J., & Mangun, G. R. (2000, April). Reflexive attention captured by the irrelevant appearance or disappearance of visual objects modulates early visual processing. Poster presented at the seventh annual meeting of the Cognitive Neuroscience Society, San Francisco, CA. Jasper, H. (1958). The ten twenty electrode system of the International Federation. Electroencephalography and Clinical Neurophysiology, 1O, 371-375. Jasper, H. (1935). Electrical potentials from the intact human brain. Science, 81, 51-53. Johnson, R. & Donchin, E. (1978). On how the P300 amplitude varies with the utility of the eliciting stimuli. Electroencephalography and Clinical Neurophysiology, 44, 424-437. Johnson, R. (1988). The amplitude of the P300 component of the eventrelated potential: Review and synthesis. 
Advances in Psychophysiology, 3, 69-137. Johnston, V. S. (1979). Stimuli with biological significance. In H. Begleiter (Ed.), Evoked brain potentials and behavior (pp. 1-12). New York: Plenum Press. Johnston, V. S., Burleson, M. H., and Miller, D. R. (1987). Emotional value and late positive components of ERPs. In R. Johnson, Jr., J. W. Rohrbaugh, & R. Parasuraman (Eds.), Current trends in event-related potential research. Electroencephalography and Clinical Neurophysiology, Suppl. 40 (pp. 198-203). Amsterdam: Elsevier. Jonides, J. (1981). Voluntary versus automatic control over the mind' s eye movement. In J.B. Long & A.D. Baddeley (Eds.), Attention and performance IX (pp. 187-203). Hillsdale, N.J: Erlbaum. Kingstone, A. & Pratt, J. (1999). Inhibition of return is composed of attentional and oculomotor processes. Perception & Psychophysics, 61, 1046-1054.
24
Hopfingerand Mangun
Klein, R. (1988). Inhibitory tagging system facilitates visual search. Nature, 334, 430-431. Klein, R. M. (2000). Inhibition of return. Trends in Cognitive Science, 4,138-147. Klein, R. M. & Taylor, T. L. (1994). Categories of cognitive inhibition with reference to attention. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 113-150). San Diego: Academic Press. Kustov, A. A. & Robinson, D. L. (1996). Shared neural control of attentional shifts and eye movements. Nature, 384, 74-77. Lavie, N. (2000). Selective attention and cognitive control: Dissociating attentional functions through different types of load. In J. Driver & S. Monsell (Eds.), Attention and Performance XVIII: The control over cognitive processes (pp. 175-194). Oxford: Oxford University Press. Lavie, N. & Tsal, Y. (1994). Perceptual load as a major determinant of the locus of selection in visual attention. Perception & Psychophysics, 56, 183-197. Luck, S. J., Hillyard, S. A., Mouloua, M., Woldorff, M. G., Clark, V. P., & Hawkins, H. L. (1994). Effects of spatial cuing on luminance detectability: Psychophysical and electrophysiological evidence for early selection. Journal of Experimental Psychology: Human Perception and Performance, 4, 887-904. Mangun, G. R. (1995). Neural mechanisms of visual selective attention. Psychophysiology, 32, 4-18. Mangun, G.R., & Hillyard, S.A. (1988). Spatial gradients of visual attention: Behavioral and electrophysiological evidence. Eleetroencephalography and Clinical Neurophysiology, 70, 417-428. Mangun, G.R. & Hillyard, S.A. (1990). Allocation of visual attention to spatial locations: Tradeoff functions for event-related brain potentials and detection performance. Perception & Psychophysics, 47, 532-550. Mangun, G. R., & Hillyard, S. A. (1991). Modulations of sensory-evoked brain potentials indicate changes in perceptual processing during visual-spatial priming. 
Journal of Experimental Psychology: Human Perception and Performance, 17, 1057-1074. Mangun, G.R., S.A. Hillyard and S.J. Luck (1993). Electrocortical substrates of visual selective attention. In D. Meyer & S. Kornblum (Eds.), Attention and Performance XIV (pp. 219-243). MIT Press: Cambridge, MA. Mangun, G. R., Hopfinger, J. B., Kussmaul, C. L., Fletcher, E., & Heinze, H. J. (1997). Covariations in ERP and PET measures of spatial selective attention in human extrastriate visual cortex. Human Brain Mapping, 5, 273-279. Martinez, A., Anllo-Vento, L., Sereno, M. I., Frank, L. R., Buxton, R. B., Dubowitz, D. J., Wong, E. C., Heinze, H. J., & Hillyard, S. A. (1999). Involvement of striate and extrastriate visual cortical areas in spatial selective attention. Nature Neuroscience, 2, 364-369.
Electrophysiological Studies
25
McDonald, J. J., Ward, L. M., & Kiehl, K. A. (1999). An event-related brain potential study of inhibition of return. Perception & Psychophysics 61,14111423. Miller, J. (1989). The control of attention by abrupt visual onsets and offsets. Perception & Psychophysics, 45, 567-571. Motter, B. C. (1993). Focal attention produces spatially selective processing in visual cortical areas V1, V2, and V4 in the presence of competing stimuli. Journal of Neurophysiology, 70, 909-919. Mtiller, H. J. & Rabbitt, P.M. (1989). Reflexive and voluntary orienting of attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315-330. Nunez, P. L. (1981). Electrical fields of the brain: The neurophysics of EEG. New York: Oxford University Press. Pashler, H. (1998). The psychology of attention. Cambridge, MA: MIT Press. Posner, M. I. & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and Performance X (pp. 531-556). Hillsdale, N.J.: Erlbaum Associates. Posner, M. I., Nissen, M. J., & Ogden, W. C. (1978). Attended and unattended processing models: The role of set for spatial locations. In H. L. Pick & F. J. Saltzman (Eds.), Modes of perceiving and processing information (pp. 137157). Hillsdale, N. J.: Erlbaum. Posner, M. I., Rafal, R. D., Choate, L. S., & Vaughan, J. (1985). Inhibition of return: Neural basis and function. Cognitive Neuropsychology, 2, 211-228. Pratt, J. (1995). Inhibition of return in a discrimination task. Psychonomic Bulletin & Review, 2, 117-120. Pratt, J., Kingstone, A., & Khoe, W. (1997). Motor-based versus attentionbased theories of inhibition of return. Perception & Psychophysics, 59, 964-971. Rafal, R. (1996). Visual attention: Converging operations from neurology and psychology. In: A. F. Kramer, M. G. H. Coles, & G. D. Logan (Eds.), Converging operations in the study of visual selective attention (pp. 139-192). 
Washington, DC: American Psychological Association. Rafal, R. D., Calabresi, P. A., Brennan, C. W., & Sciolto, T. K. (1989). Saccade preparation inhibits reorienting to recently attended locations. Journal of Experimental Psychology: Human Perception and Performance, 15, 673-685. Robinson, D. E. & Kertzman, C. (1995). Covert orienting of attention in macaques. III. Contributions of the superior colliculus. Journal of Neurophysiology, 74, 713-721. Terry, K. M., Valdes, L. A., & Neill, W. T. (1994). Does "inhibition of return" occur in discrimination tasks? Perception & Psychophysics, 55, 279-286. Theeuwes, J. (1991). Exogenous and endogenous control of attention: The effect of visual onsets and offsets. Perception & Psychophysics, 49, 83-90. Treisman, A. M. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental Psychology, 12, 242-248.
26
Hopfingerand Mangun
Tueting, P., & Sutton, S. (1973). The relationship between prestimulus negative shifts and post-stimulus components of the average evoked potential. In S. Komblum (Ed.), Attention and performance IV (pp.185-208). New York: Academic Press. Van Voorhis, S. & Hillyard, S. A. (1977). Visual evoked potentials and selective attention to points in space. Perception & Psychophysics, 22, 54-62. Woldorff, M. G. (1993). Distortion of ERP averages due to overlap from temporally adjacent ERPs: Analysis and correction. Psychophysiology, 30, 98-119. Woldorff, M. G., Fox, P., Matzke, M., Lancaster, J., Veeraswamy, S., Zamarripa, F., Seabolt, M., Glass, T., Gao, J., Martin, C., & Jerabeck, P. (1997). Retinotopic organization of the early visual-spatial attention effects as revealed by PET and ERPs. Human Brain Mapping, 5, 280-286. Worden, M. & Schneider, W. (1996). Society for Neuroscience Abstracts, 22, 1856. Authors' Notes
This research was supported by funding from the NSF, the NIMH, the Human Frontier Science Program and the Army Research Office. We would like to thank Dr. M. Woldorff for consultation on portions of the analyses, Jeff Maxwell for assistance in data collection, and Toby Mordkoff and Jennifer Schaaf for helpful suggestions on earlier versions of the manuscript.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © 2001 Elsevier Science B.V. All rights reserved.
Inhibition of Return in Monkey and Man
Raymond M. Klein, Douglas P. Munoz, Michael C. Dorris, and Tracy L. Taylor
Introduction
The capture of attention by the objects of our experience can be explored at a phenomenological, behavioral or neural level of analysis. Although, alone, each of these approaches is limited, some of these limitations can be overcome by collecting data and developing explanatory frameworks that link the different levels of analysis (Teller, 1990). We are excited by the fruitfulness of an interdisciplinary approach in which behavioral/cognitive analysis is combined with the collection of neuroscientific evidence that is aimed at determining how the mechanisms which affect behavior are implemented in the nervous system (cf. Trappenberg et al., 2001). In this chapter we will describe a collaborative effort to explore just one mechanism pertinent to the orienting of visual attention, inhibition of return (IOR; see Klein, 2000, for a review). We hope that our interdisciplinary strategy and initial successes will serve as a model or blueprint for similar collaborations on other topics covered in this volume.
Valuable information for a cognitive-neuroscientific understanding of any mind-brain relation can be provided by studying the breakdowns in human behavior that follow from damage due to insult, stroke or disease. The same could be said for modern neuroimaging methods, which can provide temporally (magnetoencephalography; event-related potentials) or spatially (functional magnetic resonance imaging) precise information about the net activity level of many neurons in a brain region. Valuable information about IOR has been provided using both of these approaches (Hopfinger & Mangun, 1998; McDonald, Ward, & Kiehl, 1999; Posner et al., 1985; Sapir, Soroker, Berger, & Henik, 1999). We believe that for a cognitive analysis aimed at how cognitive computations are implemented in the neural machinery (Marr, 1982), an analysis of the activity level of single neurons is also essential.
Whereas the technique of single unit recording in the alert nonhuman primate is well-matched for this purpose, its implementation and connection with studies of human performance require a task that both humans and monkeys can perform and whose mechanism of interest is manifest and similar in both species.
Model task: IOR in a cue-target paradigm
Although it has since been explored using more complex displays and task requirements, IOR was first observed using a highly simplified, "model" task. In Posner and Cohen's (1984) initial demonstration, the observer was presented with a fixation display consisting of 3 horizontally arranged boxes. One of the peripheral boxes brightened briefly and, at varying intervals after the onset of this "cue", a target was presented. Observers made simple manual detection responses to peripheral targets that appeared within the left or right box with equal probability (i.e., regardless of the location of the cue). As had been observed in previous studies, at short cue-target intervals, targets were detected more rapidly at the cued than at the uncued location. This finding is typically interpreted as a processing benefit due to the reflexive allocation of attention to the luminance increase (the cue) in the periphery. 1 Posner and Cohen used two methods to encourage observers to remove their attention from the cued location. Either there was a brightening of the fixation box (shortly after the onset of the peripheral cue) or central targets were presented with a high probability (60%, compared to 10% on each side). Because peripheral targets were equally likely to occur at the cued and uncued locations, one might naturally expect that after the initial facilitatory effects of the cue had waned, performance at the cued and uncued locations would become similar. Instead, beginning about 250 ms after cue onset, there was a cross-over interaction wherein performance became worse at the cued location for the remaining intervals (up to 500 ms in this experiment). 
This inhibitory aftereffect of the cue was subsequently named "inhibition of return" (Posner et al., 1985) and since Posner's seminal experiments we have learned a great deal about the timecourse, spatial distribution, causes, effects and development of IOR (for reviews, see Hood, Atkinson, & Braddick, 1998; Klein, 2000; Taylor & Klein, 1998). The activation of oculomotor activity, in the form of a saccadic program that is executed or merely planned, seems to be a pre-requisite for IOR. Typically, a peripheral event is used to cause IOR, and such an event will activate a tendency to foveate it (whether or not a saccade is executed). It was Rafal et al.'s (1989) finding of IOR following the cancellation of an endogenously prepared saccade (in response to an arrow cue at fixation) but not following the cancellation of an endogenously generated shift of attention that definitively linked the cause of IOR with oculomotor programming. Once caused, IOR affects both manual and saccadic responses to peripheral targets, delaying responses to targets presented in the vicinity of the previous cue. If the eyes move after a cue but before a target is presented, it is the location in space defined by environmental coordinates -- not by retinal coordinates -- that is inhibited. However, if an object is cued, and then the object moves through space before a target is presented, some if not all of the inhibition moves with the object (Abrams & Dobkin, 1994a; Tipper et al., 1991; Tipper et al., 1997; Tipper et al., 1999). Thus, IOR is also coded in an object-centered frame of reference.
IOR is linked to the concept of attentional capture in two ways. First, under some views, IOR is conceptualized as the aftermath of attentional capture (i.e., as resulting from the disengagement of attention from the captivating stimulus). Second, the presence of IOR modulates the attention-capturing ability of subsequent targets. When the visual system is given a choice of two targets to inspect, for example, there is an increased probability of choosing the target that is not at a previously cued (Posner et al., 1985) or previously fixated (Peterson et al., 2000) location. In this sense, IOR, laid down by the previous overt or covert orienting experience, biases subsequent orienting in new directions (Klein & MacInnes, 1999). Whereas the effect of IOR is linked to output levels of processing (e.g., Ivanoff & Klein, 2001; Taylor & Klein, 2000) associated with spatial orienting, it is not restricted to motoric processes. Strongly supporting a reduction in signal amplitude, Handy et al. (1999) showed that IOR decreases d', and Klein & Dick (in press) found that IOR decreases the probability of correct target identification using a dual-stream RSVP task designed to minimize the contribution of motoric processing. Similarly, several studies of event-related potentials (ERPs; Hopfinger & Mangun, 1998; McDonald, Ward, & Kiehl, 1999) have reported that IOR reduces the sensory response (magnitude of the P1, an early ERP component seen about 100 ms after stimulus onset) generated by stimuli presented at previously cued locations, thereby giving rise to a "salience" differential in favour of items at uncued locations.
Functional significance
Perhaps the most important question to consider before launching an interdisciplinary attack is, "What is the function of IOR?" Suppose you are looking for something (on your desk) or someone (in your classroom). When the item you are looking for does not pop out of the visual array, one strategy you might employ is to inspect each item in turn until the target is found. Such a search might be deliberate and controlled by a pre-planned order (as when you search for typographical errors when proofreading); or it might be more haphazard, beginning with candidate items whose initially detected features indicate a potential match with the target. The latter kind of serial search strategy, which has been supported in many search experiments (though not necessarily to the exclusion of other strategies), is referred to as "guided search." One problem confronting any serial search that is not controlled by a pre-planned sequence of inspections is how to keep track of which items have already been examined and identified as non-targets. Such a search would be increased in efficiency by a mechanism that discouraged attention or gaze from returning to previously inspected objects or regions where the goal object had not been found. As first proposed by Posner and Cohen (1984), inhibition of return (IOR) may be such a mechanism. Supporting this idea, Klein (1988; Klein & MacInnes, 1999; see also Müller & von Mühlenen, 1999; Peterson et al., in press; Takeda & Yagi, 1999) found direct evidence that an inhibitory tagging mechanism operates during serial search, presumably as a "foraging facilitator."
Neuroscientific answers to unsolved problems
At its most fundamental level, the cognitive neuroscientific approach we have begun is aimed at discovering how the behavioral inhibition that has been observed in studies of human performance is implemented in the nervous system. As seen below, we have so far only "looked" in one neural structure known to play a critical role in visual orienting: the superior colliculus (SC). For this reason, and others, it should hardly surprise the reader that we do not have a complete answer. However, we have made progress. Numerous investigators, beginning with Wurtz (e.g., Wurtz & Mohler, 1976), have explored other structures during a cue-target paradigm, but for the most part, these studies of the neural substrate during visual orienting have not used a paradigm that is well-suited for eliciting IOR behaviorally (e.g., Robinson & Kertzman, 1995; see pp. 41-42 below). The SC has been implicated in the manifestation of IOR (see below for a more detailed review) and one question we sought to answer was, "Would we see, in the colliculus, evidence of inhibited processing that corresponds with IOR?" When we did, we then asked, "Is the SC itself inhibited or, if not, is it receiving signals that are inhibited upstream?" When considered together with other data in the neuroscientific literature, our data can also be used to address a fundamental uncertainty concerning the timecourse of the underlying inhibition. When measured behaviorally, the timecourse of IOR has been shown to vary with task demands, emerging later to the extent the task encourages orienting toward the uninformative peripheral cue (Klein, 2000) and earlier when there is incentive to remove attention rapidly from the cued location (e.g., Danziger & Kingstone, 1999). Somewhat reflecting this variability, there are two dramatically different interpretations of when, in relation to the eliciting event, signals begin to be inhibited (see Figure 1).
Let us assume that the eliciting event is an uninformative peripheral onset (cue) that precedes the target by different temporal intervals. According to one view, inhibition begins with the cue but is not apparent until the facilitatory effect of the cue has subsided. According to an alternative view, inhibition is not caused by the cue itself, but is instead elicited when attention leaves the cued region or object. Under both of these interpretations the timecourse of the inhibitory effect measured behaviorally would depend directly on the timecourse of the removal of attention. Whereas Klein's (2000) attentional control setting account of the timecourse of behaviorally measured IOR has met with considerable success in organizing discrepancies in the literature, the question still remains as to whether IOR begins when attention is removed from the cue (as depicted in the middle panel of Figure 1) or whether it begins with the cue itself (as depicted in the bottom panel of Figure 1) but is not seen at early intervals because of the overshadowing facilitation. An answer to this question can be provided by a neuroscientific approach, wherein we examine the time-locked activity of neurons in systems believed to be involved in the manifestation of IOR.
Figure 1. Hypothetical temporal dynamics of facilitation and inhibition. (Adapted from Klein, 2000, with permission.) The data from Posner and Cohen's (1984) study are plotted in the upper panel as a difference score [diamonds = cued minus uncued RT]. In this graph, a negative score reflects a net facilitation at the cued location (faster RTs), whereas a positive score reflects a net inhibition (slower RTs) at the cued location. The black line and grey background represent a hypothetical timecourse function fit to these data. As noted in the text, from performance patterns such as these cognitive psychologists infer hypothetical inhibitory and facilitatory processes.
One construal of the timecourse of two such processes, attentional facilitation at the cued location and inhibition of return, also acting at the cued location, is shown in the middle panel. Here it is assumed that facilitation follows rapidly if not immediately after presentation of the cue and that when attention has been withdrawn from the cued location (which here takes a little over 200 ms) an inhibitory process (IOR) grows in its place. Facilitation and inhibition are assumed to add linearly but this is immaterial until the bottom panel.
[Figure 1 panels. X-axis: Interval (msec), 0 to 500.]
A different construal of the timecourse of similar facilitatory and inhibitory processes is shown in the bottom panel. Here it is assumed that the inhibition starts, like the facilitation, when the cue appears; and that inhibition remains constant, whereas the facilitation decreases over time. At first, there is greater facilitation than inhibition, hence the net effect on performance is faster RT at short cue-target intervals. However, net facilitation switches to net inhibition at the interval where their absolute values are equal.
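The two construals described in this caption can be made concrete with a small numerical sketch. The Python fragment below is only an illustration under stated assumptions: the function names, the piecewise-linear shapes, and every parameter value (magnitudes in ms, the 200 ms withdrawal time, the growth and decay durations) are hypothetical choices made for this example, not quantities estimated from Posner and Cohen's (1984) data. As in the figure, the net effect is expressed as cued minus uncued RT, and facilitation and inhibition are assumed to add linearly.

```python
def clamp01(x):
    """Limit x to the range 0..1."""
    return max(0.0, min(1.0, x))


def sequential_net_effect(t_ms, fac_ms=40.0, inh_ms=40.0,
                          withdraw_ms=200.0, growth_ms=150.0):
    """Middle-panel construal (hypothetical parameters).

    Facilitation shrinks to zero as attention is withdrawn; inhibition
    starts to grow only after attention has left the cued location.
    Returns cued minus uncued RT in ms (negative = net facilitation).
    """
    facilitation = fac_ms * (1.0 - clamp01(t_ms / withdraw_ms))
    inhibition = inh_ms * clamp01((t_ms - withdraw_ms) / growth_ms)
    return inhibition - facilitation


def concurrent_net_effect(t_ms, fac_ms=80.0, inh_ms=40.0, decay_ms=400.0):
    """Bottom-panel construal (hypothetical parameters).

    Inhibition is constant from cue onset; facilitation is larger at
    first and decays over time. Returns cued minus uncued RT in ms.
    """
    facilitation = fac_ms * (1.0 - clamp01(t_ms / decay_ms))
    inhibition = inh_ms
    return inhibition - facilitation


if __name__ == "__main__":
    # Tabulate the net effect under each construal at several intervals.
    for t in range(0, 501, 100):
        print(t, round(sequential_net_effect(t), 1),
              round(concurrent_net_effect(t), 1))
```

With these illustrative parameters both construals yield net facilitation at short intervals and cross over to net inhibition at the same interval, which is one way of seeing why behavioral timecourse data alone do not readily distinguish between them.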
Monkey behavioural data
A first step toward understanding the neural mechanisms of IOR is to develop an animal model of human behaviour. The rhesus monkey may be ideal for these investigations because of its similarity to man in the organization of the visual system and the oculomotor pathways (Wurtz & Goldberg, 1989). Thus, we have devised an oculomotor-IOR paradigm for use with rhesus monkeys (Dorris, Taylor, Klein & Munoz, 1999). Figure 2 illustrates the paradigm that we employed. The monkeys were trained to fixate a central fixation point (FP) on a visual screen in front of them and to maintain fixation until its disappearance. While the FP remained visible, a spatially unpredictive cue was flashed. Later, the FP disappeared at the same time as a saccadic target appeared. The monkeys were rewarded for maintaining steady fixation upon the FP during the cue and then orienting to the target when it appeared. The locations of the cue and target were varied, as was the time between their appearance (stimulus onset asynchrony: SOA).
[Figure 2 panels: Fixation; Cue (50 ms); Interval (200, 600 or 1100 ms); Saccade target (uncued target or cued target).]
Figure 2. Sequence of events in the cue-saccade experiment from Dorris et al. (1999). The cue is presented for 50 ms in one of the locations marked by a peripheral circle. Shown here, in the second panel, as a filled circle is a cue presented to the right of fixation. Unfilled circles in this panel illustrate the other possible locations where a cue could have been presented. After a randomly selected interval of 200, 600 or 1100 ms, a target (shown as a filled circle in each of the bottom panels) was presented to the left or right of fixation. Given the cue on the right, a target presented on the right (shown in the bottom right panel) would be called a cued target, whereas a target on the left (shown in the bottom left panel) would be called an uncued opposite target. When the cue was presented above or below fixation the target would be called an uncued near target.
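The trial structure in Figure 2 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' experimental code: the names (`classify`, `make_trial`, the constants) are invented here, directions are coded as hypothetical angles (0 = right, 90 = up, 180 = left, 270 = down), and the SOA is assumed to equal the 50 ms cue duration plus the interval.

```python
import random

CUE_MS = 50                              # cue duration from the caption
INTERVALS_MS = (200, 600, 1100)          # intervals from the caption
CUE_DIRECTIONS_DEG = (0, 90, 180, 270)   # right, up, left, down (assumed coding)
TARGET_DIRECTIONS_DEG = (0, 180)         # targets appear right or left only


def angular_distance(a_deg, b_deg):
    """Smallest angle, in degrees, between two directions."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


def classify(cue_deg, target_deg):
    """Label a trial by the cue-target relation described in the caption."""
    labels = {0: "cued", 90: "uncued near", 180: "uncued opposite"}
    return labels[angular_distance(cue_deg, target_deg)]


def make_trial(rng):
    """Draw one trial with independently chosen cue, target and interval."""
    cue = rng.choice(CUE_DIRECTIONS_DEG)
    target = rng.choice(TARGET_DIRECTIONS_DEG)
    interval = rng.choice(INTERVALS_MS)
    return {
        "cue_deg": cue,
        "target_deg": target,
        "soa_ms": CUE_MS + interval,     # assumption: SOA = cue duration + interval
        "condition": classify(cue, target),
    }


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        print(make_trial(rng))
```

Because the cue direction is drawn independently of the target direction, the cue carries no information about where the target will appear, which is the sense in which the cue is spatially unpredictive.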
The first experiment was designed to show that rhesus monkeys produced IOR in this cue-saccade task. The cue appeared at 10 degrees eccentricity either left, right, up or down and the target subsequently appeared either 10 degrees left or right. The SOA was selected randomly from 250, 650, or 1150 ms. The average results obtained from 4 different monkeys performing this cue-saccade task are shown in Figure 3. There are several important observations to make. First, saccadic reaction time (SRT) for eye movements to the target was influenced by location of the non-predictive cue. SRTs were slowest when the cue was presented at the same location as the target (cued SRT = 201.6 ms, collapsed across the 3 SOAs), and
[Figure 3 plot. X-axis: Cue-Target SOA (ms); legend: Cued, Uncued Near, Uncued Opposite.]
Figure 3. Average saccadic reaction time, as a function of the interval between the cue and target onsets, from 4 monkeys in the paradigm illustrated in Figure 2. Cued targets were presented in the same location as the cue; uncued near targets were presented 90 degrees away from the cue and uncued opposite targets were presented 180 degrees away from the cue. The data are redrawn from Dorris et al. (1999).
fastest when the cue was presented on the opposite side of the target (uncued opposite SRT = 176.6 ms). This observation is consistent with the previous IOR effect observed in human subjects. Second, presentation of the cue at intermediate locations relative to the target (uncued near SRT = 187.2 ms) produced SRTs that were intermediate (SRTs were significantly greater than SRTs obtained when the cue was presented opposite the target and significantly less than SRTs obtained when the cue was presented on the same side as the target). Third, the difference in
SRT between cued versus opposite (i.e., the IOR effect) was extremely robust at the shortest SOA tested (250 ms) and then diminished gradually across the longer SOAs (650 and 1150 ms). Although this pattern reveals an inhibitory effect that seems similar to what has been referred to as IOR in the literature with human participants, 2 our confidence that this is akin to human IOR can be fortified by comparing the spatial distribution and timecourse of the inhibitory effects seen in the monkey with that from comparable studies using human participants. To determine the spatial extent of the influence of the uninformative cue on SRT, we modified the cue-saccade task by varying independently the direction and eccentricity of the cue (while keeping the targets at fixed locations on the horizontal axis). Because IOR was greatest at the 250 ms SOA, the SOA was held constant at 250 ms in these followup experiments. As seen in Figure 4 (open circles), variations
[Figure 4 plot. Left y-axis: Human Manual RT (ms); right y-axis: Monkey Saccadic RT (ms); x-axis: Angular Distance from Cued Location (deg.).]
Figure 4. Reaction time as a function of the angular difference between a single cue and subsequent target. Data from monkeys making saccadic responses (from Dorris et al., 1999) are shown as open circles. The human data (filled circles) are from Pratt, Spalek and Bradshaw (1999) and Klein et al. (unpublished ms), two studies with human participants making manual detection responses. Not surprisingly, monkeys making saccades respond much faster (these data are referenced to the right-side Y axis) than humans making manual detection responses (these data are referenced to the left-side Y axis). Nevertheless, the gradients are very similar.
in directional alignment of stimuli (in 45-degree steps) systematically influenced SRT. SRT was slowest when the target was presented at the cued location and decreased gradually to a minimum when it was at the opposite location. Data from two studies with human subjects (Klein et al., in preparation; Pratt et al., 1999) making simple, manual detection responses are shown (filled circles) in the same figure. The nearly identical gradients strongly suggest that a similar phenomenon was being elicited by the cues in these two species. When targets and cues were all presented on the horizontal axis, we observed a similar gradient wherein SRT varied inversely with the linear distance between the cue and target (Dorris et al., 1999). We are unaware of comparable data from human observers using this procedure.
In Figure 5A, our monkey timecourse data are compared with timecourse data from studies in which human participants made saccades to targets presented after uninformative peripheral cues. The data from the human literature are shown as small filled circles and the data from Dorris et al. (1999) are shown as filled triangles. The timecourse of effects upon SRT in the human participants reveals a biphasic pattern in which a cue produces an early RT benefit that is replaced by costs somewhere between 100 and 200 ms after the cue's appearance. In the human studies, the cost (which has been attributed to IOR) is maximal between 100 and 300 ms post-cue and gradually decreases. We did not test early enough to see the early facilitation, but the timecourse of the inhibitory effect we observed in the monkey superimposes perfectly on that seen in the human literature. We conclude from the data presented in Figures 4 and 5A that the rhesus monkey produces behaviour qualitatively similar to that seen in human subjects and therefore represents a good animal model for exploring the neural mechanisms underlying the expression of IOR in a cue-saccade paradigm.
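The cueing effect plotted in comparisons of this kind is a simple difference score. As a small worked example, the Python sketch below applies it to the mean SRTs reported earlier for the first experiment (collapsed across the three SOAs). The function name `cueing_effect` and the choice of the uncued opposite condition as the baseline are assumptions made here for illustration.

```python
# Mean SRTs (ms) reported in the text, collapsed across the three SOAs.
MEAN_SRT_MS = {
    "cued": 201.6,
    "uncued near": 187.2,
    "uncued opposite": 176.6,
}


def cueing_effect(srt_ms, baseline="uncued opposite"):
    """Return each condition's SRT minus the baseline SRT, in ms.

    A positive value means slower responses than at baseline, i.e. a
    net inhibitory effect of the cue for that condition.
    """
    base = srt_ms[baseline]
    return {cond: round(value - base, 1) for cond, value in srt_ms.items()}


if __name__ == "__main__":
    for condition, effect in cueing_effect(MEAN_SRT_MS).items():
        print(f"{condition}: {effect:+.1f} ms")
```

On these means, responses to cued targets are 25.0 ms slower than responses to uncued opposite targets, and uncued near targets fall in between at 10.6 ms.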
Superior Colliculus involvement in IOR
The SC, a laminated structure in the dorsal midbrain, is a critical node in the visual orienting network (see Figure 6). Its superficial layers contain neurons that respond to visual stimuli and they are interconnected with other cortical and subcortical visual areas (Robinson & McClurkin, 1989). The superficial layers also receive a direct retinal projection. In contrast, the intermediate layers of the SC are interconnected with several cortical and subcortical visual and oculomotor areas (Sparks & Hartwich-Young, 1989, for review). Perhaps most importantly, the intermediate layers form a critical node for the convergence of cortical inputs, especially from parietal and frontal areas. Output neurons in the intermediate SC carry presaccadic signals to the brainstem premotor circuitry (see Moschovakis et al., 1996, for review). Various lines of evidence suggest that the SC may be intimately involved in the generation of IOR. IOR is reduced or absent in patients with lesions to the SC (Posner & Cohen, 1984; Posner et al., 1985; Sapir et al., 1999). IOR is larger for temporal hemifield stimuli than nasal hemifield stimuli and the retinotectal projection is greater from the temporal hemifield (Rafal et al., 1989; but see Perry
Klein, Munoz, Dorris, and Taylor
[Figure 5. Panel A: x-axis Cue-Target SOA (ms), 0 to 1400; y-axis ticks from -60 to 60; legend: data from the literature with human subjects; studies with monkey subjects (Dorris et al., submitted; Dorris et al., 1999; Robinson & Kerzman, informative cue). Panel B: x-axis Cue-Target SOA (ms), 0 to 1400; y-axis ticks from 0.2 to 1.0; legend: Robinson & Kerzman; Dorris et al.; Cued Target; Uncued (opposite) Target.]
Figure 5A. The effect of a spatial pre-cue upon saccadic latencies (shown as SRT to cued targets minus SRT to uncued targets) is plotted as a function of the SOA between cue and target. Data from the literature with human subjects (small circles, from Klein, 2000, Figure 3) is shown together with the data from our studies with rhesus monkeys (solid squares) in a similar cue-saccade paradigm. Added to these studies all of which used uninformative peripheral cues is the data from a second study with monkeys (Robinson & Kerzman, 1995a) which used informative peripheral cues (solid triangles, dotted line). B. Activity of collicular cells from two studies as a function of cue condition (cued=same; uncued=opposite) and cue-target SOA. Data have been normalized to the overall mean activity in the uncued condition (= 1.0). See text for further explanation.
& Cowey, 1984). IOR occurs in newborns, for whom the SC is more developed than the cortex (Simion, Valenza, Umilta, & Dalla, 1995; Valenza et al., 1994). IOR is observed in the hemianopic field of blindsight patients who have damage to V1 but intact subcortical (i.e., retinotectal) pathways (Danziger et al., 1997). Finally, IOR interacts with the gap effect (Abrams & Dobkin, 1994b), which is mediated by the disinhibition of oculomotor programming in the SC (Dorris & Munoz, 1995; Dorris, Paré, & Munoz, 1997; Munoz & Wurtz, 1993a; 1993b).
[Figure 6 diagram; legible labels: Parietal, Frontal, Recording, Visual, Pons]
Figure 6. The superior colliculus (SC) is shown to be a center of converging inputs from various brain systems whose outputs control the oculomotor machinery via circuitry in the brainstem (pons). (PFC=prefrontal cortex; FEF=frontal eye fields; SEF=supplementary eye fields; LIP=lateral intraparietal cortex; VC=visual cortex; CN=caudate nucleus; SNpr=substantia nigra pars reticulata).
Whereas these findings converge with Rafal et al.'s demonstration of the important role of oculomotor programming in causing IOR, and therefore point to the SC as an important target area for neuroscientific exploration, they by no means rule out the involvement of other brain structures. Indeed, several findings in the IOR literature point to an important role for cortical regions in the generation and maintenance of the inhibition, once the to-be-inhibited regions or objects are identified or tagged by prior orienting behavior. First, IOR has been measured using both oculomotor and manual responses, whereas the SC has been traditionally viewed as an oculomotor orienting structure (Munoz et al., 2000; Sparks & Hartwich-Young, 1989). Recent work, however, has suggested a role for the ventral layers of the SC and the underlying tegmentum in the control of reaching movements (Stuphorn et al., 1999). Second, IOR, at least when measured with manual responses, is coded in environmental coordinates when the eyes move between cue and target (Maylor & Hockey, 1985; Posner & Cohen, 1984) or object coordinates when a previously attended object moves (Abrams & Dobkin, 1994a;
Tipper et al., 1991), whereas the SC uses an oculocentric coding system (Sparks & Hartwich-Young, 1989). Third, IOR can be maintained at several, successively cued locations, possibly up to 5 (Snyder & Kingstone, 2000), whereas the SC is conceived as a vector-averaging "winner-take-all" system. Fourth, an intact corpus callosum is required for object-coded IOR to move across the vertical midline (Tipper et al., 1997).
Monkey neurophysiology
Even though other areas may be potentially involved in IOR, evidence in the literature suggesting that the SC is important for the manifestation of IOR led us to begin our search for the neural mechanisms in that structure of the rhesus monkey. We recorded from single neurons in the superficial and intermediate layers of the SC. Neurons in the superficial layers carry visual signals, while neurons in the intermediate layers carry both visual and motor signals. The visual responses of these SC neurons consist of a robust phasic burst that occurs following the sudden appearance of a visual stimulus in the response field of the neuron. Some visually responsive neurons also have a sustained discharge following stimulus appearance. The methods and typical results are shown in Figure 7. In this experiment, the uninformative visual cue and the saccadic target were presented either in the response field of the neuron or at the mirror location, opposite the horizontal and vertical meridians. The neuron displayed both an initial phasic visual response time-locked to the presentation of the stimulus in its response field and a second motor response time-locked to initiation of a saccade into its response field. During the opposite condition, the cue (shown here presented 200 ms before target appearance) elicited no response from this neuron when its location was opposite to the neuron's response field. In contrast, during the same condition, the appearance of the cue elicited a robust phasic response when presented in the neuron's response field. Most importantly, there was a striking difference between the activity patterns in the same and opposite conditions following the subsequent appearance of the target in the neuron's response field (i.e., difference between peaks during the gray epoch in Figure 7).
Although the identical target stimulus was presented at the identical location in the response field, the magnitude of the stimulus-related response was dependent upon the previous cue location. The visual response to the target appearance was significantly attenuated if the cue had been presented in the neuron's response field. Although the motor burst in the Same condition was slightly delayed, when the neuronal activity was aligned on saccade onset (not shown), there was no significant difference in the magnitude of the saccade-related activity for the same versus opposite condition. Thus, it was the magnitude of the visual response, not that of the motor response that was influenced by the location of the cue. Another difference in the neural activity recorded in these two conditions was seen during the interval between the appearance of the cue and target. During this interval the discharge rate of the neuron was higher in the same compared to the
Figure 7. The sequence of events on prototypical Same (or cued) and Opposite (uncued) trials is shown on the top, where the dashed circle represents the receptive field of the cell being recorded from and the filled black circles represent the cue (S1) and target (T2). The arrow represents the saccadic response to the target. The timing of the events is shown on the bottom. The results from a typical visuomotor cell recorded from the intermediate layers of the SC are shown in the middle, with trial-by-trial cellular activity (rasters) plotted above the average firing rates. Cellular activity (from both the individual trials and the average firing rates) from the Opposite condition is plotted in dark grey and from the Same condition in black (with a dotted line for the average activity).
opposite condition. This suggests that IOR is not caused by active inhibition of SC neurons because inhibited neurons would be at a lower level of excitability when, in fact, recently activated neurons were at a higher level of excitability. An alternative to active inhibition of the SC is that there is a reduction in the magnitude of the target-related input to neurons in the same condition. The observation that the decreased responsiveness of collicular neurons to the presentation of the target stimulus occurs concurrently with increased activity during the SOA period is also of interest because it lends support to the suggestion that facilitatory effects due to
reflexive orienting of attention and the inhibitory effects of IOR may be independent effects that occur simultaneously (Tipper et al., 1997; Ro & Rafal, 1999).
A reduced stimulus-related response like that shown in the Same condition of Figure 7 was observed in the vast majority of neurons (94%, 45/48) recorded in the superficial and intermediate layers of the SC. While recording from these neurons we observed a concomitant increase in SRT in the same condition in 85% (41/48) of the cases. To determine whether the magnitude of the target-related response predicted changes in SRT, we measured peak discharge 70-120 ms after the target's appearance in the neuron's response field and correlated it with SRT on a trial-by-trial basis. We obtained a negative correlation for 98% (47/48) of the neurons, and this correlation was significant for 71% (34/48) of the cases. Therefore, the target-related activity of SC neurons was related to IOR both at a gross level and on a trial-by-trial basis. This finding represents the most direct evidence obtained thus far implicating the SC in the manifestation of IOR.
Is the SC the site of inhibition?
Although the results described above support the hypothesis that the SC lies within the pathway subserving oculomotor IOR, they do not address the issue of whether the SC is the site of the inhibition or whether the inhibition occurs elsewhere and the SC receives reduced target-related inputs. The observation that there was increased activity during the SOA period in the same condition compared to the opposite condition (Figure 7) supports the latter view. To test directly between these two possibilities, a stimulation IOR experiment was devised that was identical to the previous cue-saccade task, except that on 25% of the trials, rather than presenting the target, an eye movement was evoked via microstimulation of the SC. The stimulating electrode was positioned in the intermediate layers of the SC at a site that coded for one of the possible cue/target locations. The time to initiate the electrically evoked saccade should depend upon the level of pre-existing neural excitability (Hikosaka & Wurtz, 1985; Munoz et al., 2000; Stanford et al., 1996). If SC neurons are actively inhibited during the same condition, then more time should be required to reach saccadic threshold, resulting in longer SRTs than those elicited in the opposite condition (i.e., the IOR pattern of SRTs should remain for electrically induced saccades). If, however, these neurons are not actively inhibited during the same condition but are instead at a higher level of excitability before target presentation, the IOR pattern (shorter SRTs in the opposite condition) should reverse. The results of this stimulation experiment are shown in Table 1. First, at the 200 ms SOA there was a significant IOR effect on the non-stimulated trials. However, when saccades were elicited by electrical stimulation of the SC, there was a reversal of the IOR effect, with the same condition being significantly faster than the opposite condition. Neither effect was present at the longer interval.
In conjunction with the observation that SC neurons are at a higher level of excitability
immediately following a cue presented to their receptive field (compare same versus opposite trials in the SOA period, Figure 7), these results are inconsistent with the view that IOR involves the inhibition of the SC circuitry. Rather, the data presented here support the hypothesis that the target-related response is attenuated upstream of the SC.
Table 1. Results from a stimulation experiment (from Dorris et al., submitted). Saccadic reaction times (in ms) to visual targets and to electrical stimulation of the superior colliculus (of the region that would cause a saccade toward the location of the cue (same) or in the opposite direction) are shown as a function of the interval between the onset of the 100 ms cue and the stimulus.
                         SOA (ms)   Cued (Same)   Uncued (Opposite)
Visual Target            200        263           215
Visual Target            1100       213           216
Electrical Stimulation   200        53            69
Electrical Stimulation   1100       55            56
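The pattern reported for Table 1 can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes the flattened table is read column-wise (the two cued values first, then the two uncued values), which is the reading that agrees with the results described in the text, and the names `TABLE_1`, `cueing_effect` and `cueing_effects` are hypothetical. It computes the cueing effect as cued minus uncued SRT and says nothing about statistical significance.

```python
# Saccadic reaction times (ms) from Table 1 as (cued/same, uncued/opposite).
# ASSUMPTION: the flattened table is read column-wise; names are hypothetical.
TABLE_1 = {
    ("visual target", 200): (263, 215),
    ("visual target", 1100): (213, 216),
    ("electrical stimulation", 200): (53, 69),
    ("electrical stimulation", 1100): (55, 56),
}


def cueing_effect(cued_srt, uncued_srt):
    """Cued minus uncued SRT; positive means slower at the cued location (IOR pattern)."""
    return cued_srt - uncued_srt


def cueing_effects(table):
    """Map each (trigger, SOA) cell of the table to its cueing effect in ms."""
    return {key: cueing_effect(*srts) for key, srts in table.items()}


if __name__ == "__main__":
    for (trigger, soa), effect in cueing_effects(TABLE_1).items():
        print(f"{trigger:>22}  SOA {soa:>4} ms: {effect:+d} ms")
```

Under this reading, the 200 ms SOA gives +48 ms for visual targets (an IOR pattern) and -16 ms for electrically evoked saccades (the reversal), while both effects at the 1100 ms SOA are within a few milliseconds of zero.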
When does the inhibition begin?
Robinson and Kerzman (1995) observed similar attenuation of SC stimulus-related activity in a covert attention task. Their study differed in several important ways from ours. First, and most importantly, their cues were informative about the likely location of the target. In the IOR literature, uninformative cues have been used because of the presumption that IOR will not be observed and may not even be generated until attention is removed from the cued location (this was implicit in Posner and Cohen's methods and has been demonstrated by Wright & Richard, 2000). Therefore, this methodological feature was, according to conventional wisdom, not well suited for obtaining IOR behaviorally (this was not a problem for Robinson & Kerzman, who were not looking for IOR). Second, like much of the attention and IOR literature with humans, Robinson and Kerzman's monkeys made a manual detection response to the target onset, whereas ours made saccades to the targets. Finally, whereas the shortest cue-target SOA we used was 200 ms, Robinson and Kerzman used SOAs as short as 100 ms for their behavioral observations and 50 ms in their assessment of neural activity. Robinson and Kerzman's behavioral data are shown in Figure 5A (triangles) along with data from studies using uninformative cues and saccadic responses. Even with uninformative cues, facilitation is observed at cue-target SOAs less than 150 ms, presumably because the cue's appearance triggers a reflexive attentional orienting response. Robinson and Kerzman's monkeys generally showed more facilitation (or less inhibition) at all the intervals tested, probably because the cues were informative. An answer to our question, "when does IOR begin", is strongly suggested by considering the data that are highlighted behind the grey bar in the Figure.
Behaviorally speaking (Figure 5A), in this period which immediately follows a peripheral event, there is both facilitation and inhibition, with the former occurring at the short intervals and the latter present at the longer intervals (at least
with uninformative cues). During this entire period, however, the activity of single SC neurons in response to targets is reduced at a previously cued (as compared to uncued) location (Figure 5B). This pattern is most consistent with the view presented in the bottom panel in Figure 1, according to which inhibition (we assume of the signals reaching the SC) begins with the cue. The advantage for cued targets that is seen in behavior at the shortest SOAs must then be due to a form of signal amplification which overshadows the signal attenuation shown here. However, one caveat must be made. Whereas IOR has been shown to last seconds when human participants make manual responses to targets, the IOR we have observed behaviorally and at the single cell level (in the monkey in a cue-saccade paradigm) did not appear to persist beyond one second.
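The reasoning above, that an inhibitory cost present from cue onset can be hidden at short SOAs by a larger but short-lived benefit, can be made concrete with a toy calculation. The sketch below is not a model from the chapter: the functional forms (a linearly declining cost, an exponentially decaying benefit) and every parameter value are hypothetical, chosen only so that the crossover falls between 100 and 200 ms and the cost is gone by about one second.

```python
import math

# All values are HYPOTHETICAL, chosen only for illustration.
INHIBITION_AT_CUE_MS = 40.0       # assumed cost at cue onset
INHIBITION_DURATION_MS = 1000.0   # assumed time for the cost to decline to zero
FACILITATION_AT_CUE_MS = 80.0     # assumed benefit at cue onset
FACILITATION_TAU_MS = 150.0       # assumed decay constant of the benefit


def inhibition(soa_ms):
    """Cost (ms) that starts with the cue and declines linearly to zero."""
    remaining = 1.0 - soa_ms / INHIBITION_DURATION_MS
    return INHIBITION_AT_CUE_MS * max(0.0, remaining)


def facilitation(soa_ms):
    """Benefit (ms) that starts with the cue and decays exponentially."""
    return FACILITATION_AT_CUE_MS * math.exp(-soa_ms / FACILITATION_TAU_MS)


def net_cueing_effect(soa_ms):
    """Cued minus uncued RT (ms): negative is a net benefit, positive a net cost."""
    return inhibition(soa_ms) - facilitation(soa_ms)


if __name__ == "__main__":
    for soa in (50, 100, 200, 400, 800, 1200):
        print(f"SOA {soa:>4} ms: net effect {net_cueing_effect(soa):+6.1f} ms")
```

With these assumed values the inhibitory term is non-zero at every SOA below one second, yet the net behavioral effect is a benefit at the shortest SOAs and a cost afterwards, which is the qualitative pattern the text describes.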
Summary and Conclusion
All other things being equal, one consequence of visual orienting is a tendency for subsequent orienting to shun the familiar and seek out the new. This tendency is implemented through an attenuation of signals arising in objects and locations from which orienting had recently been disengaged. This signal attenuation process has been called "inhibition of return", and it is clear, now, that IOR has dramatic effects on which, otherwise psychophysically equivalent, stimuli will capture our attention. We elicited and characterized IOR in the monkey, which we showed to be quite similar to IOR in human participants, so that we could begin to explore the neural substrate of IOR. The target-elicited sensory responses of superficial and intermediate layer neurons in the superior colliculus were reduced at a previously cued location (compared to the uncued location on the opposite side), and this reduction was statistically linked to behavioral IOR. Because the post-cue activity was higher, and because the reaction times of saccades elicited via electrical stimulation were shorter at the previously cued location, we concluded that the circuitry in the superior colliculus is not itself inhibited. Rather, signals reaching the colliculus are attenuated upstream. Thus, based on the many findings pointing to the importance of the SC for the manifestation of IOR, as well as the findings suggesting that the SC cannot be the whole story (see "Superior Colliculus involvement in IOR"), we propose that the orienting behavior coded in the SC tags locations and objects as candidates for IOR while other brain systems implement and maintain the inhibition despite changes due to alterations in gaze direction and object motion. The pathways illustrated in Figure 6 point to several candidate structures that may be involved in the implementation and maintenance of the inhibitory tags leading to the attenuated signals we observed in the SC.
As noted by Klein and Taylor (1994): "The potential for symbiosis between cognitive and neural science is particularly evident in the study of inhibition where, on the one hand, neural mechanisms provide a terminology and insight for understanding possible mechanisms of cognitive inhibition and, on the other hand, behavior-based models
of cognitive functioning imply the need for inhibitory circuitry." It is our hope that neuroscientific studies of these structures using a model task like that illustrated here will, by revealing how inhibition of return is implemented by the brain, demonstrate this "potential."
Footnotes
1 Although this early facilitation is not always observed in studies that use a target detection task (e.g., Samuel & Weiner, in press; see Collie et al., 2000, Table 4, for a review), it is ubiquitous in tasks which require some sort of target discrimination, leading to the suggestion that a particular difficulty distinguishing cued targets from catch trials may be responsible for masking the facilitatory effects of attention in some detection tasks (see Lupianez & Weaver, 1998).
2 One apparent exception is the absence of early facilitation. This is likely due to the fact that we did not use SOAs shorter than 250 ms. The human literature (see Figure 5A) reveals that the crossover from facilitation to inhibition takes place between 100 and 200 ms when saccadic responses to targets are used.
References
Abrams, R. A., & Dobkin, R. S. (1994a). Inhibition of return: effects of attentional cueing on eye movement latencies. Journal of Experimental Psychology: Human Perception and Performance, 20, 467-477.
Abrams, R. A., & Dobkin, R. S. (1994b). The gap effect and inhibition of return: interactive effects on eye movement latencies. Experimental Brain Research, 98, 483-487.
Collie, A., Maruff, P., Yucel, M., Danckert, J., & Currie, J. (2000). Spatiotemporal distribution of facilitation and inhibition of return arising from the reflexive orienting of covert attention. Journal of Experimental Psychology: Human Perception and Performance, 26, 1733-1745.
Danziger, S. & Kingstone, A. (1999). Unmasking the inhibition of return phenomenon. Perception & Psychophysics, 61, 1024-1037.
Danziger, S., Fendrich, R., & Rafal, R. D. (1997). Inhibitory tagging of locations in the blind field of hemianopic patients. Consciousness and Cognition, 6, 291-307.
Dorris, M. C. & Munoz, D. P. (1999). Saccadic reaction times are influenced similarly by previous saccadic metrics and exogenous cueing in monkey. Journal of Neurophysiology, 81, 2429-2436.
Dorris, M. C., Taylor, T., Klein, R. M., & Munoz, D. P. (1999). Influence of previous visual stimulus or saccade on saccadic reaction times in monkey. Journal of Neurophysiology, 81, 2429-2436.
Dorris, M. C., Klein, R. M., Everling, S., & Munoz, D. P. (submitted). Contribution of the primate superior colliculus to inhibition of return.
Dorris, M. C. & Munoz, D. P. (1995). A neural correlate for the gap effect on saccadic reaction times in the monkey. Journal of Neurophysiology, 73, 2558-2562.
Dorris, M. C., Paré, M., & Munoz, D. P. (1997). Neuronal activity in monkey superior colliculus related to the initiation of saccadic eye movements. Journal of Neuroscience, 17, 8566-8579.
Handy, T. C., Jha, A. P., & Mangun, G. R. (1999). Promoting novelty in vision: Inhibition of return modulates perceptual-level processing. Psychological Science, 10, 157-161.
Hikosaka, O. & Wurtz, R. H. (1985b). Modification of saccadic eye movements by GABA-related substances. I. Effect of muscimol and bicuculline in monkey superior colliculus. Journal of Neurophysiology, 53, 266-291.
Hood, B. R., Atkinson, J., & Braddick, O. J. (1998). Selection for action and the development of orienting and visual attention. In J. G. Richards (Ed.), Cognitive Neuroscience of Attention: A developmental perspective (pp. 219-250). Erlbaum.
Hopfinger, J. B. & Mangun, G. R. (1998). Reflexive attention modulates processing of visual stimuli in human extrastriate cortex. Psychological Science, 9, 441-447.
Ivanoff, J. & Klein, R. M. (2001). The presence of a nonresponding effector increases inhibition of return. Psychonomic Bulletin and Review, 8, 307-314.
Klein, R. M. (1988). Inhibitory tagging system facilitates visual search. Nature, 334, 430-431.
Klein, R. M. (2000). Inhibition of return. Trends in Cognitive Sciences, 4, 138-146.
Klein, R. M. & Dick, B. (in press). A dual-stream RSVP exploration of the temporal dynamics of reflexive attention shifts. Psychological Science.
Klein, R. M. & MacInnes, W. J. (1999). Inhibition of return is a foraging facilitator in visual search. Psychological Science, 10, 346-352.
Klein, MacGillivary & Morris (in preparation).
Klein, R. M. & Taylor, T. L. (1994). Categories of cognitive inhibition, with reference to attention. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 113-150). San Diego, CA: Academic Press.
Lupianez, J. & Weaver, B. (1998). On the time course of exogenous cueing effects: A commentary on Tassinari et al. (1994). Vision Research, 38, 1621-1623.
McDonald, J. J., Ward, L. M., & Kiehl, K. A. (1999). An event-related brain potential study of inhibition of return. Perception & Psychophysics, 61, 1411-1423.
Marr, D. (1982). Vision: a computational investigation into the human representation and processing of visual information (pp. 3-38). San Francisco, CA: W.H. Freeman.
Maylor, E. & Hockey, R. (1985). Inhibitory component of externally controlled covert orienting in visual space. Journal of Experimental Psychology: Human Perception and Performance, 11, 777-787.
Moschovakis, A. K. (1996). The superior colliculus and eye movement control. Current Opinion in Neurobiology, 6, 811-816.
Müller, H. & von Mühlenen, A. (2000). Probing distractor inhibition in visual search. Journal of Experimental Psychology: Human Perception and Performance, 26, 1591-1605.
Munoz, D. P., Dorris, M. C., Paré, M., & Everling, S. (2000). On your mark, get set: Brainstem circuitry underlying saccadic initiation. Canadian Journal of Physiology and Pharmacology, 78, 934-944.
Munoz, D. P. & Wurtz, R. H. (1993a). Fixation cells in monkey superior colliculus: I. Characteristics of cell discharge. Journal of Neurophysiology, 70, 559-575.
Munoz, D. P. & Wurtz, R. H. (1993b). Fixation cells in monkey superior colliculus: II. Reversible activation and deactivation. Journal of Neurophysiology, 70, 576-589.
Perry, V. H. & Cowey, A. (1984). Retinal ganglion cells that project to the superior colliculus and pretectum in the macaque monkey. Neuroscience, 12, 1125-1137.
Peterson, M. S., McCarley, J. S., Kramer, A. F., Irwin, D. E., & Wang, R. F. (2000, November). Visual search has memory. Paper presented at the annual meeting of the Psychonomics Society, New Orleans, LA.
Peterson, M. S., Kramer, A. F., Wang, R. F., Irwin, D. E., & McCarley, J. S. (in press). Visual search has memory. Psychological Science.
Posner, M. I. & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. Bouwhuis (Eds.), Attention and Performance X (pp. 531-556). London: Academic Press.
Posner, M. I., Rafal, R. D., Choate, L. S., & Vaughan, J. (1985). Inhibition of return: neural basis and function. Cognitive Neuropsychology, 2, 211-228.
Pratt, J., Spalek, T. M., & Bradshaw, F. (1999). The time to detect targets at inhibited and non inhibited locations: Preliminary evidence for attentional momentum. Journal of Experimental Psychology: Human Perception and Performance, 25, 730-746.
Rafal, R. D., Calabresi, P. A., Brennan, C. W., & Sciolto, T. K. (1989). Saccade preparation inhibits reorienting to recently attended locations. Journal of Experimental Psychology: Human Perception and Performance, 15, 673-685.
Ro, T. & Rafal, R. D. (1999). Components of reflexive visual orienting to moving objects. Perception & Psychophysics, 61, 826-836.
Robinson, D. L. & Kerzman, C. (1995). Covert orienting of attention in Macaques. III. Contributions of the superior colliculus. Journal of Neurophysiology, 74, 713-721.
Robinson, D. L. & McClurkin, J. W. (1989). The visual superior colliculus and pulvinar. In The Neurobiology of Saccadic Eye Movements (pp. 337-360). Amsterdam: Elsevier.
Samuel, A. G. & Weiner, S. (in press). Attentional consequences of object appearance and disappearance. Journal of Experimental Psychology: Human Perception and Performance.
Sapir, A., Soroker, N., Berger, A., & Henik, A. (1999). Inhibition of return in spatial attention: direct evidence for collicular generation. Nature Neuroscience, 2, 1053-1054.
Simion, F., Valenza, E., Umilta, C., & Dalla B. (1995). Inhibition of return in newborns is temporo-nasal asymmetrical. Infant Behavior and Development, 8, 189-194.
Snyder, J. J. & Kingstone, A. (2000). Inhibition of return and visual search: How many separate loci are inhibited. Perception & Psychophysics, 62, 452-458.
Sparks, D. L. & Hartwich-Young, R. (1989). The deep layers of the superior colliculus. In R. H. Wurtz & M. E. Goldberg (Eds.), The Neurobiology of Saccadic Eye Movements, Vol. III (pp. 213-255). Amsterdam: Elsevier.
Stanford, T. R., Freedman, E. G., & Sparks, D. L. (1996). Site and parameters of microstimulation: evidence for independent effects on the properties of saccades evoked from the primate superior colliculus. Journal of Neurophysiology, 76, 3360-3381.
Stuphorn, V., Hoffmann, K-P., & Miller, L. E. (1999). Correlation of primate superior colliculus and reticular formation discharge with proximal limb muscle activity. Journal of Neurophysiology, 81, 1978-1982.
Takeda, Y. & Yagi, A. (2000). Inhibitory tagging to continuous visual stimuli. Perception & Psychophysics, 62, 927-934.
Taylor, T. L., & Klein, R. M. (1998). On the causes and effects of inhibition of return. Psychonomic Bulletin & Review, 5, 625-643.
Taylor, T. & Klein, R. M. (2000). Visual and motor effects in inhibition of return. Journal of Experimental Psychology: Human Perception and Performance, 26, 1639-1655.
Teller, D. Y. (1990). The domain of visual science. In L. Spillmann & J. S. Werner (Eds.), Visual perception: The neurophysiological foundations. San Diego, CA: Academic Press.
Tipper, S. P., Driver, J., & Weaver, B. (1991). Object-centred inhibition of return of visual attention. The Quarterly Journal of Experimental Psychology, 43, 289-298.
Tipper, S. P., Rafal, R., Reuter-Lorenz, P. A., Starrveldt, Y., Ro, T., Egly, R., Danziger, S., & Weaver, B. (1997). Object based facilitation and inhibition from visual orienting in the human split brain. Journal of Experimental Psychology: Human Perception and Performance, 23, 1522-1532.
Tipper, S. P., Jordan, H., & Weaver, B. (1999). Scene-based and object-centered inhibition of return: Evidence for dual orienting mechanisms. Perception & Psychophysics, 61, 50-60.
Trappenberg, T. P., Dorris, M. C., Munoz, D. P., & Klein, R. M. (2001). A model of saccade initiation based on the competitive integration of exogenous and endogenous signals in the superior colliculus. Journal of Cognitive Neuroscience, 133, 256-271.
Valenza, E. L., Simion, F. L., & Umilta, C. L. (1994). Inhibition of return in newborn infants. Infant Behavior and Development, 17, 293-302.
Wright, R. D. & Richard, C. M. (2000). Location cue validity affects inhibition of return of visual processing. Vision Research, 40, 2351-2358.
Wurtz, R. H. & Mohler, C. W. (1976). Organization of monkey superior colliculus: enhanced visual response of superficial layer cells. Journal of Neurophysiology, 39, 745-765.
Wurtz, R. H. & Goldberg, M. E. (1989). The neurobiology of saccadic eye movements. New York, NY: Elsevier Science Publishers Co.
Part II
Visual Cognition
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
3
Inattentional Blindness and Attentional Capture: Evidence for Attention-Based Theories of Visual Salience
Bradley S. Gibson and Mary A. Peterson
For the past three decades or more, vision research has considered the possibility that certain early stages of visual processing might occur without attention (Neisser, 1967). Recently, however, there appears to be growing consensus that traditional conceptions of "pre-attentive" visual processing may be misleading in many respects (Nakayama & Joseph, 1998). One fundamental issue that has generated considerable discussion concerns the question of whether pre-attentive processing truly occurs prior to the allocation of attention or whether this stage of processing might also depend on the allocation of attention. In the present chapter, we discuss how the resolution of this issue is intertwined with theories of attentional control.
At the center of this discussion has been the visual pop out effect, which is typically observed in the visual search paradigm when the target can be distinguished from the distractors on the basis of a simple feature discontinuity, as for instance, when a single red target appears in a homogeneous field of blue distractors. In this "singleton search" task, the visual pop out effect is characterized by relatively fast and efficient detection of the red singleton target regardless of the number of blue distractors that are present in the visual field (Treisman, 1988; Treisman & Gelade, 1980; Wolfe, 1998). Feature singletons that lead to such fast, efficient detection within the context of singleton search tasks are often said to be "visually salient" (Yantis & Egeth, 1999), though it is important to note that visual salience is a relative, rather than an absolute, term. In the context of the example in which the target appears as the single red element among a homogeneous field of blue distractors, the target would have the highest salience because it differs from every distractor in color, whereas the distractors would all be equally low in salience because each of the homogeneous distractors differs only from the target in color.
Computationally, relative salience is typically based on an analysis of feature differences and this analysis is thought to be performed by pre-attentive visual processes operating in parallel across the visual field (Cave & Wolfe, 1990). The computation of these stimulus-based, "difference signals," is important, because these signals are thought to guide (along with other, top-down, control processes) the subsequent allocation of focal attention to objects in the environment so that these objects can be consciously detected, identified, and responded to (Cave &
Wolfe, 1990; Wolfe, 1994). Thus, according to this "guided search" account, the detection of visual singletons in singleton search tasks is typically fast and efficient because salience can be computed in parallel across the visual field and focal attention is consistently allocated to the element with highest salience (the singleton target) first. The relative ease with which visual singletons are detected in singleton search tasks such as these can be contrasted with much less efficient forms of visual search that arise when the target appears relatively non-salient, as for instance when the target cannot be distinguished from the distractors on the basis of a simple feature discontinuity (Treisman, 1988; Treisman & Gelade, 1980; Wolfe, 1998). In this situation, all of the display elements are similar and thus feature differences cannot be used to guide focal attention. As a result, focal attention will not be consistently allocated to the target first; rather, focal attention must instead be allocated to the display elements in a random fashion in this situation until the target is found. Consequently, the time to detect the target typically increases as the number of distractors increases in this search environment. Although it has been tempting to interpret the efficient detection of feature singletons as reflecting visual processes that occur pre-attentively, some researchers have recently pointed out that this conclusion may be inappropriate. This is because such detection is typically measured under conditions in which observers are actively attending to the visual search display (Mack & Rock, 1998). Thus, the ability to process salience by simultaneously comparing feature differences across the visual field may not reflect a truly pre-attentive process. This observation in turn has led to a variety of dual-task studies that attempted to limit the amount of attention that would otherwise be available to mediate singleton detection. 
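The contrast drawn above between salience-guided search and unguided search can be sketched in a toy simulation. The sketch assumes, for illustration only, that focal attention visits elements in descending order of salience, that ties are broken at random, and that no element is revisited; the salience values, trial counts, and function names are hypothetical and are not taken from the guided search model itself.

```python
import random
from typing import List


def inspections_until_target(salience: List[float], target: int,
                             rng: random.Random) -> int:
    """Number of elements attended before (and including) the target
    when attention visits elements from highest to lowest salience."""
    order = list(range(len(salience)))
    rng.shuffle(order)                      # random order among equal-salience elements
    order.sort(key=lambda i: -salience[i])  # stable sort keeps the shuffled tie order
    return order.index(target) + 1


def mean_inspections(set_size: int, target_salient: bool,
                     trials: int = 2000, seed: int = 0) -> float:
    """Average number of inspections over simulated trials."""
    rng = random.Random(seed)
    salience = [0.2] * set_size             # hypothetical low, equal salience
    if target_salient:
        salience[0] = 1.0                   # target is the singleton
    total = sum(inspections_until_target(salience, 0, rng) for _ in range(trials))
    return total / trials


for n in (4, 8, 16):
    print(n, mean_inspections(n, True), mean_inspections(n, False))
```

With a salient target the simulated search always finds the target on the first inspection regardless of display size, whereas with a non-salient target the mean number of inspections grows with display size (roughly half the elements under these assumptions), mirroring the efficient and inefficient search patterns described above.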
For instance, Mack and Rock (1998) conducted a dual-task experiment in which observers were initially led to believe that they would be performing a single line-length judgment task. Following several trials, however, an unexpected color singleton appeared in a background display of homogenous distractors along with the expected stimuli. Immediately after their response to the primary task, observers were asked if they noticed anything unusual on that trial. Relatively few observers reported any awareness of the color singleton (see pp. 44-51), and Mack and Rock termed this surprising lack of awareness "inattentional blindness." In another series of experiments, Joseph, Chun, and Nakayama (1997) used the rapid serial visual presentation (RSVP) paradigm to determine whether observers could simultaneously perform both a primary letter identification task and a secondary singleton search task in which observers attempted to detect the presence or absence of a uniquely-oriented target element that appeared among a homogenous display of distractors. On each trial in this experiment, observers initially observed a stream of black letters presented briefly one after the other and were instructed to identify the single white letter in the stream. At various time points following this critical letter stimulus, the visual search display appeared. As expected, performance was highly accurate in the letter identification task; more importantly, however, singleton detection was nearly at chance in the singleton
search task, at least when the search display appeared during the "attentional blink," a period lasting several hundred milliseconds in which the re-allocation of attention is thought to be temporarily suppressed by the primary letter identification task (Raymond, Shapiro, & Arnell, 1992; see Egeth, Folk, Leber, Nakama, & Hendel, this volume, for further discussion of these results). Together, the studies described above have been interpreted to suggest that a visual singleton may appear salient and therefore may be detected regardless of the number of homogenous distractors only when attention is appropriately allocated to the search display (see also, Theeuwes, Kramer, & Atchley, 1999). As a result, these studies have led many vision researchers to reconsider the classic distinction between pre-attentive and post-attentive visual processing, at least as it relates to the processing of visual salience. For instance, Treisman (1993) has suggested that search for salient versus non-salient targets should no longer be construed as reflecting pre-attentive and post-attentive forms of visual processing, respectively, but rather as reflecting different ways of attending to a scene. Nakayama and his colleagues (Bravo & Nakayama, 1992; Nakayama & Joseph, 1998) made a similar argument and suggested that the computation of visual salience requires that attention be divided across the search display so that all the elements can be simultaneously compared. Note that, in this view, a visual singleton may be consciously detected as a singleton only by using divided attention. However, Nakayama and Joseph (1998) have argued that focal attention is automatically drawn to the singleton following its initial processing so that other aspects of the singleton (such as its identity) may be revealed. (The guided search model takes a similar position regarding the automatic allocation of attention to a singleton.)
Before accepting this new conception of singleton detection, however, it is necessary to point out that the mere failure to consciously detect whether a visual singleton is present or not may be consistent with a variety of different conceptions of singleton detection, including those that explicitly construe the underlying computation of visual salience to be pre-attentive. For instance, although the guided search theory contends that the computation of visual salience occurs preattentively, it also contends that many subsequent visual processes, including conscious detection of the singleton, may depend on the allocation of focal attention. Thus, the primary challenge for theories such as guided search is to explain why focal attention was not immediately drawn to the most salient element present in the visual field in the inattentional blindness studies cited above. With respect to the evidence reported by Mack and Rock (1998), it is possible that the salience of the unexpected singleton was processed pre-attentively without the aid of divided attention, but that the normal allocation of focal attention to this element was overridden by the observer's intention to keep attention focused on the primary task (see also, Yantis & Jonides, 1990). Likewise, with respect to the evidence reported by Joseph et al. (1997), it is also possible that the salience of the expected singleton was processed pre-attentively without the aid of divided attention, but that the reallocation of focal (or divided) attention to any element was temporarily impaired by the attentional blink. If this analysis is correct, then, at best, the evidence cited above
serves as weak support for the notion that the computation of visual salience (as opposed to the conscious detection of visual salience) depends on attention. Stronger evidence for the notion that the computation of visual salience depends on attention must therefore come from studies in which focal attention is neither potentially constrained by its temporal limitations (as in the Joseph et al. experiments) nor fixed in any one location by observers' intentions (as may have occurred in Mack and Rock's experiments). In the present chapter, we begin by considering other, potentially stronger, evidence for the notion that the processing of visual salience does depend on attention. One important difference between the new, "attention-based," conception of visual salience and previous, pre-attentive, conceptions concerns the extent to which the processing of visual salience can occur outside the context of singleton search tasks. According to the new conception, the fast, efficient computation of visual salience should be confined to search tasks that induce observers to divide their attention across the display (such as singleton search tasks) and should not extend to search tasks that involve a different (narrower) configuration of the attentional window. In contrast, previous conceptions have contended that the fast, efficient computation of visual salience occurs pre-attentively and therefore should be unaffected by the nature of the search task (Cave & Wolfe, 1990). Recall that, according to the guided search model (Cave & Wolfe, 1990; Wolfe, 1994), the pre-attentive processing of visual salience across the visual field plays a critical role in determining how efficiently the target will be detected, with less efficient forms of search resulting as the relative salience of the display elements decreases (Cave & Wolfe, 1990; Wolfe, 1994; 1998; see also, Duncan & Humphreys, 1989).
Thus, both the attention-based and pre-attentive accounts of visual salience assume that visual salience controls the allocation of focal attention during visual search. The critical difference is that the attention-based account predicts that the presence of a visual singleton should not control focal attention when observers are set to use focused attention to find the target (because the visual salience associated with the visual singleton should not be computed in this search context). In contrast, the pre-attentive account predicts that the presence of a visual singleton should control focal attention across all search contexts (cf. Wolfe, 1996).
Before considering the stronger evidence for the idea that the processing of visual salience depends upon attention, a remark about experiments conducted by Braun and his colleagues is in order. Braun and Julesz (1998; Braun, 1998; Braun & Sagi, 1990) showed that observers can simultaneously search (in separate displays) for both a primary non-salient target, that presumably requires attention to be focused, and a secondary salient target, that presumably requires attention to be divided, with relatively little or no cost. Note, however, that although these findings are important and suggest that salience can be processed within a context in which search for both salient and non-salient targets is performed, they do not directly address whether salience is automatically processed outside the context of singleton search. This is because in these experiments observers fully expected to search for both a salient and a non-salient target; thus, the ability to detect salience may have
simply fallen in line with the demands of the task. Thus, the possibility that visual salience might be processed across a variety of different search environments need not be inconsistent with the notion that the processing of visual salience depends on attention. Indeed, if the new conception of singleton detection is correct, then expectation is likely to be an important mediating factor in determining whether a singleton element will appear salient in a display of otherwise homogenous elements. This is because expectation likely controls the division of attention across the visual field, just as it has been shown to control where focal attention is allocated in the visual field (see Yantis, 1998, for a review). As a result, if the computation of visual salience does depend on the attentional set of the observer, then the expectations and other forms of knowledge that observers hold about a particular search environment may come to exert an important influence upon the ability to extract certain kinds of visual information from that environment. In particular, the expectation that a particular visual environment will require only highly focused, "serial," forms of search in order to find desired objects may come to preclude the processing of more global aspects of the environment that require attention to be divided, such as the relative visual salience associated with objects in the scene. Such findings would be important because they would provide strong evidence that the processing of visual salience is contingent on the configuration of the attentional window. Consequently, this contingency would have to be incorporated into existing theories of visual search which assume that the processing of visual salience occurs pre-attentively (Cave & Wolfe, 1990; Koch & Ullman, 1985; Wolfe, 1994). 
As mentioned above, one way to determine whether the computation of visual salience can occur outside the context of singleton search tasks is to investigate whether the presence of a visual singleton can control the allocation of focal attention when observers are set to use focused attention to search for a target. The methodological challenge posed by this question, then, is to design search tasks that somehow induce observers to use focused attention when a visual singleton is present. In fact, such tasks are regularly used to study attentional capture, though the implications of these studies for theories of visual salience have typically not been clearly spelled out. Recall that existing theories of visual search have interpreted the fast, efficient processing of visual salience during singleton search to represent not only a pre-attentive form of processing, but also a stimulus-driven form of control over focal attention (Cave & Wolfe, 1990; Wolfe, 1994; see also, Bravo & Nakayama, 1992; Koch & Ullman, 1985; and, Nakayama & Joseph, 1998). This is because the allocation of focal attention is thought to be controlled by visual salience in this task, and the computation of visual salience is thought to be based entirely on the physical aspects of the display. However, visual salience may not be the only influence on where focal attention is allocated because observers are also voluntarily searching for the singleton element in this task (Yantis, 1996). Thus, with respect to attentional control, it is unknown whether focal attention is being allocated to the singleton element simply because it is the most salient element in the display or because observers are using top-down control processes to intentionally
search for it. To diminish the effects of these top-down control processes and to isolate the effects of stimulus-driven control processes based on visual salience, a variety of visual search studies have been conducted in which a singleton element is presented, but not as the explicit target of search (see Yantis, 1998; and, Yantis & Egeth, 1999, for reviews). For instance, in many studies of this sort, an irrelevant visual singleton is presented within the context of a search task in which observers have to discriminate the identity of a target letter that appears among several similar letters. The singleton is considered to be irrelevant in these studies because its location in the display is not correlated with the location of the target. Thus, observers should not be using top-down control processes to intentionally search for the singleton in this situation. More importantly, for present purposes, observers should be set to use only focused attention to find the target in this task because there would be nothing gained by considering the salience of the elements. Thus, although originally designed to address the nature of attentional control, these studies can also be used to address the nature of visual salience. In particular, if the processing of visual salience requires attention to be divided, then it should not be computed in this task, and search for the target should be equally efficient regardless of whether the target appears as the singleton or not. In contrast, if the processing of visual salience does occur pre-attentively, then it should be processed regardless of the search task, and focal attention should be consistently guided by the salience of the singleton. Search for the target should therefore be more efficient when the target happens to appear as the singleton than when it happens to appear as one of the non-singletons. 
Overall, there has been very little evidence that focal attention is guided solely by the presence of the singleton in these experiments. For instance, several studies have failed to find evidence that color, brightness, or motion singletons can attract attention when these features are uncorrelated with the target (Folk & Annett, 1994; Gibson & Jiang, 1998; Jonides & Yantis, 1988; Todd & Kramer, 1994; Theeuwes, 1990; Yantis & Egeth, 1999). Note that small benefits in search efficiency have sometimes been observed in studies of this sort (e.g., Todd & Kramer, 1994; Theeuwes & Burger, 1998). These effects can usually be attributed to misguided search strategies, in that observers may occasionally search voluntarily for the singleton. In addition, some studies have reported that the appearance of an irrelevant abrupt-onset element -- a new-object -- may capture attention in a purely stimulus-driven fashion (Jonides & Yantis, 1988; Yantis & Hillstrom, 1994; Yantis & Jonides, 1984). It is not clear that new objects should be classified as singletons in the same manner as targets that differ from distractors by virtue of a single feature; hence these experiments will not be considered further in this review. The major conclusion that has been drawn from attentional capture studies such as these is that the mere presence of a visual singleton does not attract focal attention in an involuntary manner, at least when it is reasonable to assume that observers are set to use focused attention to find the target. On the face of it, such evidence appears to support the notion that the computation of visual salience is contingent on search set. According to this account, the computation of visual
salience can occur when attention is divided across the display (so that all elements can be simultaneously compared), but not when attention is only narrowly focused on individual display elements. It should be noted, however, that this conclusion is not the only conclusion that can be drawn from these attentional capture studies. In fact, there are a variety of ways in which voluntary or goal-directed attentional control processes may operate in visual search (Folk, Remington, & Johnston, 1992; Gibson & Kelsey, 1998; Yantis & Egeth, 1999), and most researchers have proposed that these processes may influence the allocation of focal attention after the (presumably preattentive) computation of visual salience has occurred. For instance, Wolfe (1994; 1996) has suggested that the influence of visual salience on focal attention may come under voluntary control during the course of visual search, especially after observers have learned that a singleton is irrelevant. Consequently, although the visual salience of an irrelevant singleton may be automatically processed even when observers are set to use focused attention during search, the salience of this element may lose its ability to guide focal attention after repeated exposures (as in the attentional capture studies cited above). Thus, in this view, a visual singleton may have the potential to capture attention only when observers are not set against it. Such findings would have important implications for understanding the nature of visual salience because they would weaken the inferential link between the computation of visual salience and changes in search efficiency. As such, the failure to find effects of visual salience on the allocation of focal attention during visual search may or may not have implications for understanding how visual salience is processed. Gibson and Jiang (1998; see also, Yantis & Egeth, 1999) recently addressed the possibility of this form of top-down control by using a surprise paradigm. 
More specifically, observers were led to expect that they would only be searching for a non-salient target. After many trials, however, the target letter unexpectedly switched from being highly similar to the distractors to being highly dissimilar to the distractors in that the target now appeared as the single red element among white distractors. If the processing of visual salience requires attention to be divided, then once again, it should not be computed in this task because observers should be set to use only focused attention to find the target in this experiment. Consequently, search for the color singleton target on the surprise trial should be no more efficient than it had been on previous trials in which it was similar to the distractors. If, however, visual salience is computed pre-attentively, then it should be computed regardless of search task and the color singleton target should be assigned the highest salience on the surprise trial. Consequently, focal attention should be allocated more efficiently to the color singleton target on the surprise trial than it had been on previous trials when it appeared similar to the distractors. Moreover, note that the allocation of focal attention to the color singleton target should not be precluded by knowledge of the singleton's irrelevance under these conditions because the appearance of the singleton was unexpected and thus its relevance was unknown.
The results were consistent with the notion that salience is not processed pre-attentively. The proportion of observers who correctly discriminated the identity of the target on the surprise trial was equal to the proportion that was expected to respond correctly on the basis of the preceding search trials, suggesting that observers continued to search inefficiently for the target letter, at least for the first few trials following the change. These findings are therefore inconsistent with the notion that voluntary control processes influenced the allocation of focal attention after visual salience was computed in this task. The findings reviewed thus far appear to be consistent with the notion that visual salience is not processed when observers are set to use only focused attention to find the target. However, there are still other ways in which top-down control processes might operate in these situations to influence the allocation of focal attention after visual salience has been computed. Perhaps the strongest obstacle to concluding that the processing of visual salience depends on attention concerns the possibility that stimulus-based information may never be sufficient to guide focal attention in the absence of an explicit intention to search for it (see Folk, Remington, & Johnston, 1992, for a clear statement of this account). In other words, the possibility exists that visual salience may be processed pre-attentively, but it may have no effect on the allocation of focal attention, unless observers are intentionally set to search for that particular feature value (as they are in singleton search tasks). Thus, once again, the failure to find effects of visual salience on the allocation of focal attention during visual search may or may not have implications for understanding how visual salience is processed. 
One way this issue can be addressed is by investigating contingencies between the explicit feature set of the observer and the subsequent effects of visual singletons that are incongruent with this set. For instance, in one representative study, Theeuwes (1994; see also, Theeuwes, 1991; 1992; Bacon & Egeth, 1994) required observers to search for a color singleton while trying to ignore an onset singleton. In these studies, it is reasonable to assume that salience is being computed, regardless of one's theoretical perspective, because the task requires it. Indeed, this assumption is supported by evidence that the relevant singleton is typically detected regardless of the number of distractors. However, if a visual singleton is only capable of guiding focal attention when observers are intentionally set to search for that particular feature value, then observers should be able to successfully ignore the irrelevant singleton in this situation. In contrast, if the effect of a visual singleton is not contingent on the explicit feature set of the observer, once visual salience is processed, then observers should not be able to successfully ignore the irrelevant singleton in this situation (assuming it is not less salient than the relevant singleton; see Theeuwes, 1992, for a more detailed discussion of this issue). The results indicated that the irrelevant visual singletons did in fact capture focal attention, which suggests that, once computed, the salience associated with a visual singleton does influence the allocation of focal attention regardless of whether it is relevant or irrelevant to the search task. Thus, we might reasonably conclude that had the singleton been processed in those experiments in which observers were set
to use focused attention (e.g., Gibson & Jiang, 1998), it would have influenced performance. That no such effects were obtained supports the notion that the processing of visual salience requires divided attention. Note, however, that not all researchers agree with the conclusions reported by Theeuwes (1991; 1992; 1994). For instance, Folk and his colleagues designed a paradigm in which an irrelevant singleton cue was presented shortly before the appearance of a singleton search display (Folk, Remington, & Johnston, 1992; Folk, Remington, & Wright, 1994; Folk & Remington, 1998). In one group, observers searched for a color singleton target; and, in another group, observers searched for an onset singleton target. In addition, two kinds of irrelevant singleton cues preceded the appearance of the singleton search display. For each group, the irrelevant singleton cue could be either congruent or incongruent with the defining feature of the singleton search display. Thus, when observers were intentionally searching for a color singleton in the singleton search display, the irrelevant color singleton cue would be considered congruent and the irrelevant onset singleton cue would be considered incongruent. Contrary to Theeuwes' (1994) findings, Folk and his colleagues found that only congruent singleton cues appeared to capture focal attention, even though observers should have been set to use divided attention to search for the singleton target in this situation. As a result, Folk and his colleagues have concluded that focal attention will not be controlled by visual salience unless observers are explicitly set to attend to that feature value. These findings therefore appear to be consistent with the notion that the allocation of focal attention can be controlled after the salience of a visual singleton has been computed, which in turn raises the possibility that salience may be processed but have no effect on visual search performance.
Recently, other studies have been conducted in an attempt to resolve the apparent discrepancy between Folk (Folk, Remington, & Johnston, 1992; Folk, Remington, & Wright, 1994; Folk & Remington, 1998) and Theeuwes (1991; 1992; 1994). Although a full discussion of this issue is beyond the scope of this chapter, suffice it to say that the evidence obtained thus far appears to favor Theeuwes' notion that the appearance of any visual singleton will capture attention when observers are set to perceive visual salience (Gibson & Wenger, 1999; Theeuwes, Kramer, & Atchley, 2000). Thus, the contingent capture findings reported by Folk and his colleagues may not constrain the findings obtained from other visual search studies after all (e.g., Gibson & Jiang, 1998), which can be interpreted to suggest that the processing of visual salience requires attention to be divided across the display. In sum, we have reviewed a wide range of attentional capture studies with the goal of determining whether visual salience can be processed outside the context of singleton search tasks in which observers are putatively set to use divided attention. As noted above, the primary difference between attention-based and preattentive conceptions of visual salience concerns the extent to which the processing of visual salience can occur outside the context of singleton search tasks. By considering these attentional capture studies, we have been able to evaluate the
extent to which focal attention can be controlled by visual salience both when observers are set to use focused attention to find their target and when they are set to use divided attention to find their target. Based on this evidence, we have concluded that the processing of visual salience is task dependent. In particular, the presence of an irrelevant visual singleton appears to control the allocation of focal attention when observers are set to divide their attention across the display, but not when they are set to use focused attention. Such findings suggest that the computation of visual salience does not occur pre-attentively. In the remainder of this chapter, new empirical studies are reported that seek converging support for this hypothesis.
The Present Experiments

Recall that the initial evidence suggesting that the processing of visual salience depends on attention came from studies that explicitly measured observers' awareness of a visual singleton (Mack & Rock, 1998; Joseph et al., 1997). This evidence was criticized earlier on the grounds that the methods may have hampered the allocation of focal attention to the singleton even if it were processed. Generally speaking, then, lack of awareness alone (as in the phenomenon of inattentional blindness) may not adequately reflect the underlying processing of visual salience. One must also consider how this processing relates to the control of focal attention. Accordingly, up to this point in the discussion, we have focused on measures that can reflect the salience-based control of focal attention during visual search (e.g., search efficiency). Consistent with the evidence obtained from inattentional blindness studies (Joseph et al., 1997; Mack & Rock, 1998), this evidence has suggested that the processing of visual salience does not occur pre-attentively. However, we also noted that search efficiency might not represent a pure measure of visual salience because search efficiency may also reflect the contributions from other, top-down, control processes. For this reason, theoretical conclusions about the nature of visual salience should not be based entirely on search efficiency either. Therefore, in the present study, we sought to use both conscious detection and search efficiency as a means of providing potentially converging sources of evidence for understanding the nature of visual salience. Of particular interest in the present chapter is the relation between the processing of visual salience, the conscious detection of visual singletons, and the salience-based control of focal attention.
In particular, as behavioral measures of visual salience, do changes in conscious detection and search efficiency tend to co-occur as a function of visual salience, or might these two measures become dissociated? Unfortunately, there are no well-developed theories of this relation at present. According to the attention-based account of visual salience (Nakayama & Joseph, 1998), there does not appear to be any direct causal relation between the conscious detection of a visual singleton and the salience-based control of focal attention in the sense that one does not necessarily depend upon the other. Thus, functional dissociation appears to be a theoretical possibility in this view. However, there have been very few, if any, studies that have systematically
attempted to measure both search efficiency and conscious awareness across search contexts in which the processing of visual salience would be expected to differ. Consequently, very little is known about the relation between conscious awareness and attentional capture as joint measures of visual salience. Note, however, that there has been speculation about this relation (see e.g., Most & Simons, this volume; Theeuwes & Godijn, this volume). In particular, some researchers have speculated that a visual event may capture attention without any visual awareness of the precipitating event (though some researchers, such as Theeuwes and Godijn, appear to be arguing that observers may lack awareness that their attention was captured rather than lacking awareness of the precipitating event, per se). Such speculation appears to be motivated, at least in part, by the hypothesis that conscious detection may be an overly conservative measure of visual salience; thus, focal attention may be oriented to salient stimuli in the absence of conscious detection. Such findings, should they be obtained, would have important implications for attention-based theories of visual salience because they would suggest that the attention that is required for processing visual salience is not sufficient for conscious detection. Likewise, it is equally important to consider the possibility of the opposite form of dissociation; that is, full awareness of a visual singleton in the absence of attentional capture. Such findings, should they be obtained, would also have important implications for attention-based theories of visual salience because they would suggest that processing of visual salience is not sufficient for attentional capture (perhaps because other, top-down, control processes are also involved). 
In the present chapter, we provide an initial investigation into this issue by considering whether conscious detection of a visual singleton is possible under conditions in which it is not expected to capture attention. Accordingly, Experiment 1 investigated whether the salience of a visual singleton can be processed when observers are set to use focused attention to search for a non-salient target. To accomplish this, we used the surprise paradigm developed by Gibson and Jiang (1998). As in their original experiment, we assessed whether the salience of an unexpected color singleton was processed by comparing observers' discrimination performance on the surprise trial with their performance on the preceding search trials in which targets were non-salient. More importantly, we also attempted to assess whether the salience of an unexpected color singleton was processed by measuring observers' awareness of this singleton on the surprise trial. A surprise paradigm was used to assess awareness in the present experiment so that observers could not simply rely on their general knowledge of the search environment to report about the singleton (as they could do in previous capture studies in which the appearance of the singleton was made known to the observers and appeared repeatedly; see e.g., Folk & Annett, 1994; Jonides & Yantis, 1988; Todd & Kramer, 1994; Theeuwes, 1990; Yantis & Egeth, 1999). As in Gibson and Jiang's (1998) original experiment, we expected that the appearance of the unexpected singleton would not influence search efficiency. However, the question remains whether observers can consciously detect this singleton. If the salience of the color singleton is processed, then this element may
appear visually conspicuous in the present experiment. Consequently, all (or nearly all) of the observers should report awareness of the unexpected color singleton on the surprise trial. Note, however, that this latter finding need not be inconsistent with the notion that the processing of salience depends on attention. In particular, although we have been assuming that observers are set to use only focused attention to search for non-salient targets, observers may routinely attend to both global and local levels of the visual search display, regardless of search context. For instance, observers might first attend to the global target display and then narrow their focus to individual display elements. If this occurs, then the initial division of attention across the display may enable global aspects of the display, such as the relative salience of the display elements, to reach conscious awareness. Alternatively, the salience of the color singleton may be processed even when observers are set to use only a focused attention strategy because the processing of salience may occur preattentively and somehow reach conscious awareness without the aid of focal attention. Further experiments will therefore be necessary to disentangle these two explanations if the salience of the unexpected color singleton is found to be unaffected by the present search context. Note that this pattern of results would be unexpected, given the evidence reviewed above, and would suggest a dissociation between the conscious detection of visual salience and the ability of this visual salience to control focal attention. In particular, such findings would contradict our interpretation of previous attentional capture studies, and would suggest that the processing of visual salience is not sufficient to control focal attention.
In contrast, if the salience of the color singleton is not processed, then we expected to observe a form of inattentional blindness for the color singleton on the surprise trial (Mack & Rock, 1998). More specifically, observers may attend to only those aspects of the display that are required by the task, which in the context of this search task would tend to be more local than global. If this assumption is correct, then only those observers who happened to have allocated focal attention to the target during the course of visual search should report awareness of its color. In the present study, as in other visual search experiments, it was assumed that target accuracy would be higher on trials in which the target was focally attended than on trials in which the target was unattended. Hence, if this account is correct, then color awareness should be related to target accuracy; in particular, the proportion of correct responses among the aware observers should be relatively high (indicating that the target was attended) whereas the proportion of correct responses among the unaware observers should only be at chance (indicating that the target was not attended).
Experiment 1
Method
Participants. The subjects were 120 students from the University of Notre Dame who participated for course credit. All subjects had normal or corrected-to-normal vision.
Stimuli and Apparatus. The stimuli were letters that subtended 1.1° × 0.8° of visual angle when viewed at a distance of 50 cm (the viewing distance used in this experiment). The stimuli appeared in eight equally spaced positions (0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°) around the circumference of an imaginary circle (with a diameter of 9.65°). The letters were created by removing one or more line segments from block figure eights. Letters were shown in uppercase and were chosen from the following set: A, C, E, H, J, L, P, S, and U. All stimuli appeared on a dim background (0.23 cd/m²). The two possible target letters were H and U. The target letter was always present and was equally likely to be the H or U on each trial. The target letter appeared blue (2.57 cd/m²) during the initial non-salient search segment of the experiment and it appeared red (2.89 cd/m²) on the single singleton search trial of the experiment (see below for details). In contrast, the distractor letters always appeared blue, and each distractor letter could appear only once within a given display. These two colors were used (instead of the red and white originally used by Gibson & Jiang, 1998) because they were similar in luminance. The stimuli were presented on a ZEOS 14" color monitor equipped with a standard VGA videocard. Response time was measured from the onset of the target display by a ZEOS 486 microcomputer.
Procedure. The experiment was divided into two segments. The first segment of the experiment was a standard non-salient search task in which a display of eight blue letters was shown for 143 ms and then energy masked for 200 ms by bright white rectangles. There were a total of 64 trials in this initial non-salient search segment of the experiment, and these trials were preceded by an additional 8 practice trials.
This search task was expected to be relatively demanding because the letters were all composed of different configurations of horizontal and vertical line segments, and thus the target letter could not be distinguished from the distractors on the basis of any simple feature discontinuity (Wolfe, Cave, & Franzel, 1989). Observers determined which of two possible target letters (H or U) was present among the distractors, and they were instructed to respond as accurately as possible without worrying about the speed of their response. Accuracy was stressed over speed in the present experiment because unexpected events can affect decision-level processes which in turn can inflate response times (see e.g., Meyer, Niepel, Rudolph, & Schutzwohl, 1991). For this reason, accuracy serves as a better index of visual selective attention in the present experiment. The second segment of the experiment consisted of a single trial in which the target appeared unexpectedly as a red singleton among a homogenous display of blue distractors. Immediately following the surprise trial, observers were asked, "Did you see anything unusual on the last trial of the experiment?" (see also, Mack & Rock, 1998). Note, that we did not attempt to determine whether the salience of the visual singleton was detectable under conditions in which it was expected to appear in the present experiment because Gibson and Jiang (1998) already showed that observers could use the color singleton to efficiently guide focal attention when it was expected to appear.
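The display geometry given under Stimuli and Apparatus can be converted from degrees of visual angle into on-screen distances. The sketch below is illustrative only. The function names are hypothetical, and two conventions are assumptions that the chapter does not state: that the imaginary circle is centred on the line of sight, and that 0° lies to the right with angles increasing counterclockwise.

```python
import math

VIEWING_DISTANCE_CM = 50.0   # viewing distance reported in the Method
CIRCLE_DIAMETER_DEG = 9.65   # diameter of the imaginary circle, in degrees
POSITIONS_DEG = [0, 45, 90, 135, 180, 225, 270, 315]  # letter positions


def visual_angle_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Extent on the screen (cm) that subtends `angle_deg` of visual angle.

    Assumes the extent is centred on the line of sight.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def letter_positions_cm():
    """(x, y) offsets from the circle centre, in cm, for the eight positions.

    Assumed convention: 0 degrees is to the right, angles run counterclockwise.
    """
    radius_cm = visual_angle_to_cm(CIRCLE_DIAMETER_DEG) / 2.0
    return [
        (radius_cm * math.cos(math.radians(a)),
         radius_cm * math.sin(math.radians(a)))
        for a in POSITIONS_DEG
    ]


if __name__ == "__main__":
    print(f"circle diameter: {visual_angle_to_cm(CIRCLE_DIAMETER_DEG):.2f} cm")
    print(f"letter height:   {visual_angle_to_cm(1.1):.2f} cm")
    print(f"letter width:    {visual_angle_to_cm(0.8):.2f} cm")
    for angle, (x, y) in zip(POSITIONS_DEG, letter_positions_cm()):
        print(f"{angle:3d} deg -> x = {x:6.2f} cm, y = {y:6.2f} cm")
```

Under these assumptions the circle is a little under 8.5 cm across on the screen, so each letter sits roughly 4.2 cm from the centre of the display.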
Results and Discussion
[Figure 1 plot omitted; x-axis: Search Trial Number, 1-64 followed by "Surprise".] Figure 1. Proportion of observers in Experiment 1 who responded correctly on each of the non-salient (N = 1-64) and salient (N = 65, "surprise") search trials. The best-fitting regression line is shown in bold for the non-salient search trials, and the error bars shown for the surprise condition represent a 95% confidence interval.
The overall proportion of observers who correctly discriminated the identity of the target on each of the 65 trials in this experiment is shown in Figure 1. We began by evaluating whether the sudden appearance of the unexpected color singleton had any effect on the overall efficiency with which observers correctly discriminated the identity of the target. Following Gibson and Jiang (1998), proportion correct was regressed on trial number in the initial, non-salient, search condition. Predicted performance in the non-salient search condition following the initial 64 trials was estimated to be 0.72. Thus, target accuracy on the surprise trial should be significantly higher than this predicted value if the unexpected color singleton increased the efficiency of visual search. However, actual performance on the surprise trial was found to be 0.71 in this experiment. Although the observed value of 0.71 was slightly less than the predicted value of 0.72, this predicted value was included within the 95% confidence interval (0.78 to 0.62) computed around
actual performance on the surprise singleton trial. The overall results of Experiment 1 are therefore consistent with the main findings reported by Gibson and Jiang (1998), and suggest that observers continued to search inefficiently for the target letter on the surprise trial even though it could now be identified on the basis of its salience. Although the unexpected color singleton did not appear to capture focal attention on the surprise trial, the possibility remains that observers were nevertheless aware of its appearance. To address this question, immediately after observers reported which target was present on the surprise trial we asked whether they had seen anything unusual on the last trial of the experiment, and their responses were recorded by the experimenter. All those who reported an unusual event, reported about the changed color of the display. On the basis of these responses, the subjects were divided into two groups. The "aware" group consisted of those subjects who reported seeing something red on the last trial of the experiment; whereas, the "unaware" group consisted of those subjects who did not report the appearance of red in the display. These results suggested that observers did not automatically detect the singleton target, as significantly less than 100% of the observers reported awareness of the unexpected singleton on the surprise trial (p < .05). More specifically, only 63% of the observers (N = 76) reported seeing the unexpected color of the target on the surprise trial; the remaining 37% of the subjects (N = 44) failed to report the unexpected color of the target. These findings constitute inattentional blindness for the unexpected singleton and are consistent with the findings reported by Mack and Rock (1998). 
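The regression-based prediction and the confidence-interval comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The per-trial proportions are not given in the chapter, so the fitting function is shown without data. The chapter also does not say how its interval was computed, so the normal-approximation (Wald) interval used here is an assumption, and it only approximately matches the reported bounds of 0.62 and 0.78.

```python
import math


def fit_line(proportions):
    """Least-squares line through (1, p1), (2, p2), ...; returns (intercept, slope)."""
    n = len(proportions)
    trials = range(1, n + 1)
    mean_x = sum(trials) / n
    mean_y = sum(proportions) / n
    sxx = sum((x - mean_x) ** 2 for x in trials)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(trials, proportions))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(proportions, trial):
    """Extrapolate the fitted line to a later trial number (e.g. trial 65)."""
    intercept, slope = fit_line(proportions)
    return intercept + slope * trial


def wald_interval(p, n, z=1.96):
    """Normal-approximation 95% interval for a proportion (assumed method)."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# Values reported in the chapter for Experiment 1.
PREDICTED = 0.72      # extrapolated from the 64 non-salient trials
OBSERVED = 0.71       # accuracy on the surprise trial
N_OBSERVERS = 120

low, high = wald_interval(OBSERVED, N_OBSERVERS)
# True means the prediction lies inside the interval, i.e. no evidence of capture.
prediction_inside = low <= PREDICTED <= high

if __name__ == "__main__":
    print(f"95% interval around {OBSERVED}: {low:.2f} to {high:.2f}")
    print("prediction inside interval:", prediction_inside)
```

The logic is the one used in the text: if the singleton had captured attention, the predicted value would fall below the lower bound of the interval around surprise-trial accuracy.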
Although the relatively low level of awareness achieved in the present study suggests that the processing of salience does not automatically occur when observers are set to use focused attention to find the target, there was still a substantial number of observers who were able to detect the appearance of the unexpected color singleton. How was this awareness achieved? The results suggested that observers became aware of the redness of the target during the course of visual search only because they happened to focus attention on this element. This interpretation is supported by the finding that target accuracy was higher in the aware group (0.78) than it was in the unaware group (0.59). Performance in the aware condition was significantly above chance (which was 0.50); whereas, performance in the unaware condition did not differ significantly from chance. In addition, the level of accuracy attained in the aware condition (0.78) was above the 95% confidence interval computed around actual performance in the unaware condition (0.72 to 0.44). Likewise, the level of accuracy attained in the unaware condition (0.59) was below the 95% confidence interval computed around actual performance in the aware condition (0.86 to 0.67). There was no evidence that the observers classified as aware and unaware used different search strategies on the trials preceding the surprise trial. Overall target accuracy averaged across the first 64 trials was 0.67 in the aware group and 0.69 in the unaware group, t(118) = 0.10, SE = 0.16, p > .80. Suppose observers in the aware group achieved awareness because they consistently divided their
attention across the display, even before the singleton appeared, whereas observers in the unaware group used focused attention. It is reasonable to expect that this difference in search strategy would have produced differences in visual search performance across these two groups of observers prior to the appearance of the unexpected color singleton. The fact that no such differences were obtained is consistent with the hypothesis that awareness of the target's new color on the surprise trial was mediated by focal attention during the course of visual search.
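The comparison of each group's surprise-trial accuracy with chance (0.50) in Experiment 1 can be sketched with an exact binomial test. Two things here are assumptions rather than facts from the chapter: the choice of test, which the chapter does not name, and the counts of correct responses (59 of 76 aware observers, 26 of 44 unaware observers), which are reconstructed from the rounded proportions 0.78 and 0.59.

```python
from math import comb


def binomial_two_sided_p(successes, n, p_null=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely under the
    null hypothesis than the observed number of successes.
    """
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[successes] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(prob for prob in pmf if prob <= cutoff))


# Counts reconstructed from the reported proportions (assumed, not reported).
AWARE_CORRECT, AWARE_N = 59, 76       # 59 / 76 is about 0.78
UNAWARE_CORRECT, UNAWARE_N = 26, 44   # 26 / 44 is about 0.59

p_aware = binomial_two_sided_p(AWARE_CORRECT, AWARE_N)
p_unaware = binomial_two_sided_p(UNAWARE_CORRECT, UNAWARE_N)

if __name__ == "__main__":
    print(f"aware group vs. chance:   p = {p_aware:.2g}")
    print(f"unaware group vs. chance: p = {p_unaware:.2g}")
```

With these reconstructed counts the aware group differs reliably from chance and the unaware group does not, which agrees with the pattern reported above. The reconstructed counts also sum to 85 correct responses out of 120, consistent with the overall surprise-trial accuracy of 0.71.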
Experiment 2
The results of Experiment 1 suggest that salience is not automatically processed when observers are set to use focused attention to search for a target and are not expecting salient stimuli to appear. Rather, awareness of the target's color in this situation appeared to be mediated primarily by the allocation of focal attention to the target letter during the course of visual search. Thus, the intention to search for a non-salient target appears to have important functional consequences for the processing of visual salience both when search efficiency and conscious detection are measured. One question that arises from Experiment 1 is whether these functional consequences depend on extended experience with the particular search task, or whether these functional consequences can occur more immediately. In particular, observers may only commit to using a focused attention strategy after they have learned that there is nothing to be gained by dividing their attention in order to process global aspects of the display. Therefore, the processing of visual salience may be altered only after observers have firmly committed to using a focused attention strategy, which may take some time to develop. Experiment 2 attempted to address this issue by presenting the unexpected color singleton on the very first trial of the experiment (following three instructional trials in which the non-salient search stimuli were introduced). Following this single trial, observers were once again asked to report which of the two possible target letters was present in the display. In addition, we assessed observers' awareness of the color singleton by using two different procedures in the present experiment. For one group of observers, the procedure was exactly the same as in Experiment 1.
In contrast, the other group of observers was given a two-alternative forced choice in which they were asked to decide whether all the letters appeared blue or whether the color of one of the letters appeared different than the rest (in which case they were also asked to specify the nature of this difference). We decided to use this alternative procedure in case observers were reluctant to report the occurrence of the unexpected event in response to the rather open-ended questioning used in Experiment 1; we hoped that this procedure would allay any reluctance by providing the appropriate response as one of the two response alternatives. The predictions remained the same as in Experiment 1. If the processing of visual salience occurs in this situation, then all (or nearly all) of the observers should report seeing the unexpected color singleton; however, if awareness of the color of the target depends on focal attention, then only
those observers who happen to attend to the target should become aware of its new color.
Method
Participants. The participants were 48 students from the University of Notre Dame who participated for course credit. All subjects had normal or corrected-to-normal vision. None of these observers participated in the previous experiment. Thirty-two of the observers were asked open-ended questions about what they saw in the display, and an additional 16 observers were given a two-alternative forced choice.
Stimuli and Apparatus. The stimuli and apparatus were the same as those used in Experiment 1.
Procedure. After reading the instructions, observers were shown an example of the non-salient search display and the ensuing mask display for an unlimited period of time. A similar search display was then shown again for 143 ms, the exposure duration used in the present experiment, to provide an example of how quickly the display would be presented in the actual experiment. Finally, a third example was shown in which a non-salient search display first appeared for 143 ms followed immediately by the mask display for 200 ms. After addressing any questions, the experimenter then initiated the first (and only) trial of the experiment. In contrast to the introductory trials, in which all the letters (including the target) appeared blue, the target always appeared unexpectedly as a single red element among blue distractors on the sole experimental trial. Each of the two target letters appeared equally often at each of the eight possible locations across observers. Following the disappearance of this display, the experimenter immediately asked observers which target letter was present in the display. Next, she either asked observers whether they noticed anything unusual in the display (the open-ended procedure), or told observers that she was interested in knowing what color the display appeared to them. Observers were offered two possible descriptions: All the letters appeared blue or one letter appeared to be different from the rest (the two-alternative forced choice procedure).
If observers chose the latter response, they were also asked to describe the nature of this difference.
Results and Discussion
The primary question of interest in this experiment was whether the salience of the color singleton would be more likely to be processed in this experiment given that observers had virtually no experience searching for the non-salient target, and thus, perhaps might have been less likely to commit to a strategy in which attention was narrowly focused on individual display elements. As expected, overall target accuracy on the surprise trial was much lower in this experiment (0.56), where the surprise trial occurred on the very first trial of the experiment, than it was in Experiment 1 (0.71), where the surprise trial occurred on
the sixty-fifth trial of the experiment. Nevertheless, the overall level of target accuracy achieved on the first (surprise) trial in the present experiment was very similar to the overall level of target accuracy achieved on the first trial of Experiment 1 (0.58), suggesting once again that the unexpected appearance of the color singleton did not capture focal attention. Had it done so, accuracy would have been higher. Such evidence suggests that the salience of the unexpected color was not processed. Likewise, as in Experiment 1, the overall percentage of observers (averaged over both interview procedures) who reported awareness of the color singleton was significantly below 100% (p < .05). In fact, only 50% of the observers now reported awareness of the color singleton. This finding therefore provides converging evidence for the notion that the visual salience of the unexpected color singleton was not processed in this situation. There was no indication that observers were more reluctant to report awareness of the color singleton in response to the open-ended procedure than they were in response to the two-alternative forced choice procedure. In fact, fewer observers actually reported awareness of the color singleton in response to the two-alternative forced choice procedure (31%) than they did in response to the open-ended procedure (59%), even though overall target accuracy was identical (0.56) in both groups. It should also be noted that all of the observers who reported that one of the letters appeared different than the rest in the two-alternative forced choice procedure reported the correct color difference. Likewise, all those who reported something unusual in the open-ended procedure reported the changed color of the display. We therefore combined these two groups in the remaining discussion. Once again, target accuracy was significantly higher in the aware group (0.67) than it was in the unaware group (0.46).
Target accuracy was significantly above chance in the aware group and not different from chance in the unaware group. In addition, neither value was contained within the other group's 95% confidence interval, which spanned between 0.79 and 0.53 in the aware group and between 0.56 and 0.37 in the unaware group. Hence, the present findings again suggest that the processing of visual salience is impaired during search for a non-salient target even when observers have very little experience within this particular search context.
General Discussion
The primary purpose of the present study was to determine whether the salience of a visual singleton can be processed when observers are set to search for a non-salient target. In the present study, both awareness and search efficiency were used to assess the processing of visual salience. As such, the present study provided an initial investigation into the relation between the conscious awareness of visual singletons and attentional capture by those singletons. As noted above, there are two general questions that need to be addressed in order to better understand this relation. First, can a visual singleton capture attention without being consciously detected? Second, can the conscious detection of a visual singleton occur without attentional capture? Study of these possible dissociations is important, because
evidence of the first sort constrains the use of conscious awareness as a measure of visual salience while evidence of the second sort constrains the use of search efficiency as a measure of visual salience. The present study focused on the latter question and found convergence between the two measures of visual salience. As in previous studies (Gibson & Jiang, 1998), the present study found that the unexpected appearance of a color singleton had no influence on search efficiency following small or large amounts of experience searching for a non-salient target. A significant proportion of observers were also found to be unaware of the color of the target when it unexpectedly appeared as a color singleton under these conditions. Moreover, those observers who did report awareness of the target's new color were consistently better at discriminating the identity of the target on the surprise trial (though not before) than those who were unaware, suggesting that awareness occurred when observers happened to allocate focal attention to the target during the course of their random search for the target. As such, the present findings can be interpreted to suggest that the processing of visual salience does not automatically occur when observers are set to use a focused attention strategy to search for the target. Hence these findings provide direct empirical support for recent theories of visual search which contend that the processing of visual salience requires attention to be divided across the display (Nakayama & Joseph, 1998; Treisman, 1993). The evidence that visual salience is not automatically processed when observers are set to use focused attention during search may seem at odds with other findings such as those recently reported by Braun and Julesz (1998). Note, however, that there were important procedural differences between these two studies. 
In particular, recall that Braun and Julesz found that search for salient and non-salient targets could both be performed simultaneously, with little or no cost, when observers were explicitly instructed to perform both search tasks. In contrast, the present findings were obtained under conditions in which observers were only explicitly instructed to search for a non-salient target. The different pattern of findings observed across these two studies can therefore be interpreted to support a role for expectation in the processing of visual salience. In particular, these findings suggest that expectation can influence the processing of visual salience by controlling whether the attentional window is divided, focused, or both, depending on the search environment, which in turn can have important functional consequences for the processing of visual salience. That expectation can play this role has sometimes been overlooked in other studies that have investigated the role of attention in the processing of visual salience. For instance, Nakayama and Joseph (1998) failed to consider the importance of expectation and argued that efficient singleton detection may be shown to depend on attention only under relatively extreme task conditions in which attention can be completely diverted from the singleton search task. According to this account, observers in Braun and Julesz' (1998) dual-task experiment were capable of efficiently detecting the presence of the visual singleton while also searching for a non-salient target, because the search for the non-salient target did not completely exhaust the available attentional resources. However, if this account
is correct, then efficient detection of the singleton should also have been observed in the present study, as very similar non-salient search tasks were used across the present study and the Braun and Julesz study. Thus, expectation appears to play a critical role in the processing of visual singletons. As such, the present findings are inconsistent with theories of visual search such as guided search which contend that visual salience is automatically processed in all search contexts (Wolfe, 1994). Fortunately, however, these theories can be easily fixed by allowing the perception of salience to be controlled by voluntary processes that determine whether the attentional window is divided across display elements, focused on individual display elements, or both. Moreover, the findings obtained in Experiment 2 also suggested that the attentional window may be configured in such a way as to preclude the processing of visual salience even though the observer has had very little experience with the search task. It should be noted, however, that the observers in Experiment 2 were shown examples of the visual search display before the search task began, which may have assured them that a focused attention strategy was appropriate in that situation. Further experiments will therefore be needed to determine whether observers will spontaneously process visual salience when even less is known about the nature of the search environment. The failure to find evidence that the salience of visual singletons was automatically processed in the present study thus has important implications for theories of attentional control. Recall that many theorists have postulated a direct causal relation between the processing of visual salience and attentional capture (Bravo & Nakayama, 1992; Cave & Wolfe, 1990; Koch & Ullman, 1985; Nakayama & Joseph, 1998), which has served as the primary theoretical basis for stimulus-driven forms of attentional control.
Yet, this form of attentional control has been challenged by a variety of different studies which have been interpreted to show that this relation is actually contingent on the attentional set of the observer (Folk et al., 1992; Gibson & Jiang, 1998; Yantis & Egeth, 1999). More specifically, these studies have typically been interpreted to suggest that visual salience has no direct causal effect on the allocation of focal attention. Rather, these studies have tended to emphasize the importance of top-down attentional control processes in visual search. However, as we have argued in the present chapter, the failed effects of visual salience on search efficiency may not reflect voluntary processes that underlie the control of focal attention. Rather, these failed effects may reflect voluntary processes that underlie the computation of visual salience, which in turn influences the control of focal attention. In fact, when interpreted in this light, there is actually very little evidence to suggest that top-down control processes dominate salience-based control processes in visual search (see above for details). Rather, the available evidence suggests that visual salience actually does have a strong influence on attentional control, so long as it is processed. The present findings provide converging support for this conclusion by providing stronger evidence (from two converging measures) that the processing of visual salience does not occur when observers are set to use focused attention during visual search.
We, of course, do not wish to suggest that top-down control processes never operate in opposition to salience-based control processes. However, further experiments will be needed to adequately demonstrate this effect. For instance, consider the experiment reported by Lamy and Tsal (1999) in which observers were required to search for a target that was defined by a conjunction of color and form. The allocation of focal attention in these conjunction search tasks is thought to be guided, at least in part, by an evaluation of the relative salience of display elements (Wolfe, 1994). In other words, the computation of visual salience can increase the efficiency of visual search in this situation and observers may therefore divide attention in order to accomplish this. Nevertheless, Lamy and Tsal found that search rates were unaffected (at least on target present trials) by the presence of a color singleton distractor that repeatedly appeared throughout the duration of the experiment, despite the fact that this distractor should have appeared as the most salient element in the display and therefore should have attracted focal attention. As such, these findings might be consistent with the notion that voluntary attentional control processes may operate in opposition to salience-based control processes. However, it is also important to note that Lamy and Tsal only measured search efficiency in their experiment and did not explicitly measure whether observers were consciously aware of the singleton distractor. This question could be easily addressed by combining the stimuli used in the Lamy and Tsal study with the surprise procedure used in the present study so that both search efficiency and conscious detection could be adequately measured. 
Clear-cut evidence that top-down control processes offset salience-based control processes would be obtained if observers were capable of consciously detecting the presence of the visual singleton without any change in search efficiency.
Another important topic that requires further research concerns the possibility that observers may divide their attention across a visual search display without processing visual salience; that is, observers may be able to selectively process certain global aspects of a visual search display without necessarily processing others. For instance, Bacon and Egeth (1994) have suggested that observers may be able to efficiently process a visual singleton either by indiscriminately processing feature differences (as in the processing of visual salience) or more selectively by searching for particular feature values. In support of this "feature detection" mode, Gibson and Kelsey (1998) argued that observers may become set to process certain, selective, features that are associated with the appearance of the global search display. In particular, Gibson and Kelsey presented irrelevant singleton cues just before the appearance of non-salient search displays and found that these singleton cues attracted focal attention only when they matched a global feature of the display. Based on these findings, we might surmise, in the context of the present surprise paradigm, that observers would be intentionally looking for the appearance of blue (because the appearance of the non-salient search displays as a whole was defined by the appearance of blue). More importantly, we might also surmise that the appearance of an unexpected blue singleton (presented among red distractors) on the surprise trial would have captured focal attention in
the present paradigm because of this prior set for blue. Such findings would suggest that observers can in fact selectively process global aspects of the display even though they are engaged in search for a non-salient target. Thus, different search contexts may not influence whether attention is divided or focused so much as it influences how global aspects of the display are processed using divided attention. Whether such influences, if they are found to exist, reflect voluntary processes that operate on attentional control processes that compete with salience-based control processes (see e.g., Bacon & Egeth, 1994) or on earlier, pre-control, processes remains to be determined. One final issue that should be addressed concerns the possibility that the appearance of the visual singleton was consistently perceived in the present study, but this perception was not consciously available for report immediately following its disappearance. Indeed, Moore and Egeth (1997) have argued that attention may be needed to encode the on-line perception of an unexpected event into memory (see also, Wolfe, 1999). Hence, the dependency observed between target accuracy and awareness in the present experiments might be more correctly interpreted as reflecting a relation between attention and memory for the unexpected event rather than as reflecting a relation between attention and perception for this event, the latter of which may have been revealed if we had used a more sensitive, on-line measure of these percepts. Consequently, the present findings may in fact be consistent with the notion that the salience of visual singletons is automatically processed during search for a non-salient target. 
Note, however, that Moore and Egeth's analysis was originally applied to the dual-task paradigm used by Mack and Rock which was designed in such a way that the perceptual effects of the unexpected visual singleton may not have easily influenced performance on the primary line-length judgment task. In contrast, the single-task paradigm used in the present study was designed in such a way that the perceptual effects of the unexpected visual singleton could have easily influenced performance on the visual search task. In fact, the unexpected visual singleton was the actual target of search in the present study. Consequently, target discrimination performance on the surprise trial in the present experiment had the potential to reflect the on-line perceptual effects of the unexpected color singleton. Hence, if one assumes that the perception of salience, conscious or otherwise, should attract focal attention, as many theories of visual search appear to do (Cave & Wolfe, 1990; Koch & Ullman, 1985; Wolfe, 1994), then one can interpret the lack of improvement observed on this trial as converging evidence that the salience of the visual singleton was not consistently processed. However, as noted above, search efficiency may also have been constrained by other processes and therefore may have produced misleading results as well. Thus, it is possible that the present results were obtained because both conscious awareness and search efficiency are relatively insensitive measures of visual salience. Further understanding of these possible inadequacies, and of visual salience itself, may therefore ultimately depend on the development of additional behavioral measures of visual salience.
Footnotes
1 For instance, in one interesting modification of the irrelevant singleton paradigm, Theeuwes and Burger (1998) found that observers had difficulty ignoring irrelevant color singletons whose identity was either congruent or incongruent with the response indicated by the target. Ironically, however, asking observers to ignore potentially incongruent singletons may actually require them to intentionally process visual salience so that they can efficiently detect (and hopefully avoid) the singleton's location in the display. These findings may therefore be consistent with the notion that the processing of visual salience depends on divided attention, and that once processed, visual salience does influence the allocation of focal attention.
2 Others have argued that these results reflect a sensory bias that favors the abrupt-onset element (Martin-Emerson & Kramer, 1997; Gellatly, Cole, & Blurton, 1999; Gibson, 1996a; 1996b; see also, Yantis & Jonides, 1996). For instance, Gibson and Boker (1998) used a temporal coding model of temporal integration to argue that the perception of "no-onset" elements in these experiments is significantly delayed relative to the perception of the onset singleton. Clearly, one can no longer conclude that onset singletons capture attention under these conditions.
References
Bacon, W. F. & Egeth, H. (1994). Overriding stimulus-driven attentional capture. Perception & Psychophysics, 55, 485-496.
Braun, J. (1998). Divided attention: Narrowing the gap between brain and behavior. In R. Parasuraman (Ed.), The attentive brain (pp. 327-352). MIT: Cambridge, MA.
Braun, J., & Julesz, B. (1998). Withdrawing attention at little or no cost: Detection and discrimination tasks. Perception & Psychophysics, 60, 1-23.
Braun, J., & Sagi, D. (1990). Vision outside the focus of attention. Perception & Psychophysics, 48, 45-58.
Bravo, M., & Nakayama, K. (1992). The role of attention in different visual search tasks. Perception & Psychophysics, 51, 465-472.
Cave, K. R., & Wolfe, J. M. (1990). Modeling the role of parallel processing in visual search. Cognitive Psychology, 22, 225-271.
Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96, 433-458.
Folk, C. L., & Annett, S. (1994). Do locally defined feature discontinuities capture attention? Perception & Psychophysics, 56, 277-287.
Folk, C. L., & Remington, R. W. (1998). Selectivity in distraction by irrelevant featural singletons. Journal of Experimental Psychology: Human Perception and Performance, 24, 847-858.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Folk, C. L., Remington, R. W., & Wright, J. H. (1994). The structure of attentional control: Contingent attentional capture by apparent motion, abrupt onset, and color. Journal of Experimental Psychology: Human Perception and Performance, 20, 317-329.
Gellatly, A., Cole, G., & Blurton, A. (1999). Do equiluminant object onsets capture visual attention? Journal of Experimental Psychology: Human Perception and Performance, 25, 1609-1624.
Gibson, B. S. (1996a). Visual quality and attentional capture: A challenge to the special status of abrupt onsets. Journal of Experimental Psychology: Human Perception and Performance, 22, 1496-1504.
Gibson, B. S. (1996b). The masking account of attentional capture: A reply to Yantis and Jonides (1996). Journal of Experimental Psychology: Human Perception and Performance, 22, 1514-1522.
Gibson, B. S., & Boker, S. M. (1998). Temporal integration impairs the detection of object change. Paper presented at the 38th Annual Meeting of the Psychonomic Society. Dallas, TX.
Gibson, B. S., & Jiang, Y. (1998). Surprise! An unexpected color singleton does not capture attention in visual search. Psychological Science, 9, 176-182.
Gibson, B. S., & Kelsey, E. M. (1998). Stimulus-driven attentional capture is contingent on attentional set for displaywide visual features. Journal of Experimental Psychology: Human Perception and Performance, 24, 699-706.
Gibson, B. S. & Wenger, M. J. (1999). A new look at contingent capture. Paper presented at the 39th Annual Meeting of the Psychonomic Society. Los Angeles, CA.
Hillstrom, A. P. & Yantis, S. (1994). Visual motion and attentional capture. Perception & Psychophysics, 55, 399-411.
Jonides, J., & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43, 346-354.
Joseph, J. S., & Optican, L. M. (1996). Involuntary attentional shifts due to orientation differences. Perception & Psychophysics, 58, 651-665.
Joseph, J. S., Chun, M. M., & Nakayama, K. (1997). Attentional requirements in a "preattentive" feature search task. Nature, 387, 805-807.
Koch, C. & Ullman, S. (1985). Shifts in selective attention: Toward the underlying neural circuitry. Human Neurobiology, 4, 219-227.
Lamy, D. & Tsal, Y. (1999). A salient distractor does not disrupt conjunction search. Psychonomic Bulletin & Review, 6, 93-98.
Mack, A. & Rock, I. (1998). Inattentional blindness. MIT: Cambridge, MA.
Martin-Emerson, R. & Kramer, A. F. (1997). Offset transients modulate attentional capture by sudden onset. Perception & Psychophysics, 59, 739-751.
Meyer, W. V., Niepel, M., Rudolph, U., & Schutzwohl, A. (1991). An experimental analysis of surprise. Cognition and Emotion, 5, 295-311.
Moore, C. M., & Egeth, H. (1997). Perception without attention: Evidence of grouping under conditions of inattention. Journal of Experimental Psychology: Human Perception and Performance, 23, 339-352.
Nakayama, K. & Joseph, J. S. (1998). Attention, pattern recognition, and pop-out in visual search. In R. Parasuraman (Ed.), The Attentive Brain (pp. 279-298). MIT: Cambridge, MA.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849-860.
Simons, D. J. (2000). Attentional capture and inattentional blindness. Trends in Cognitive Sciences, 4, 147-155.
Theeuwes, J. (1990). Perceptual selectivity is task-dependent: Evidence from selective search. Acta Psychologica, 74, 81-99.
Theeuwes, J. (1991). Cross-dimensional perceptual selectivity. Perception & Psychophysics, 50, 184-193.
Theeuwes, J. (1992). Perceptual selectivity of color and form. Perception & Psychophysics, 51, 599-606.
Theeuwes, J. (1994). Stimulus-driven capture and attentional set: Selective search for color and visual abrupt onset. Journal of Experimental Psychology: Human Perception and Performance, 20, 799-806.
Theeuwes, J., Kramer, A. F. & Atchley, P. (1999). Attentional effects on preattentive vision: spatial precues affect the detection of simple features. Journal of Experimental Psychology: Human Perception and Performance, 25, 341-347.
Theeuwes, J. & Burger, R. (1998). Attentional control during visual search: The effect of irrelevant singletons. Journal of Experimental Psychology: Human Perception and Performance, 24, 1342-1353.
Todd, S. & Kramer, A. F. (1994). Attentional misguidance in visual search. Perception & Psychophysics, 56, 198-210.
Treisman, A. (1988). Features and objects: The 14th Bartlett Memorial Lecture. Quarterly Journal of Experimental Psychology, 40A, 201-237.
Treisman, A. (1993). The perception of features and objects. In A. Baddeley & L. Weiskrantz (Eds.), Attention, Selection, Awareness, and Control (pp. 5-35). Oxford University Press: Oxford, UK.
Treisman, A. & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97-136.
Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202-238.
Wolfe, J. M. (1996). Extending guided search: Why guided search needs a preattentive item map. In A. F. Kramer, M. G. H. Coles, & G. D. Logan (Eds.), Converging Operations in the Study of Visual Selective Attention (pp. 247-270). APA: Washington, D.C.
Wolfe, J. M. (1998). Visual search. In H. Pashler (Ed.), Attention (pp. 13-74). Psychology Press: East Sussex, UK.
Wolfe, J. M. (1999). Inattentional amnesia. In V. Coltheart (Ed.), Fleeting Memories: Cognition of Brief Visual Stimuli (pp. 71-94). MIT Press: Cambridge, MA.
Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419-433.
Yantis, S. (1993). Stimulus-driven attentional capture. Current Directions in Psychological Science, 2, 156-161.
Yantis, S. (1998). Control of visual attention. In H. Pashler (Ed.), Attention (pp. 223-256). Psychology Press: East Sussex, UK.
Yantis, S. & Egeth, H. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Yantis, S. & Hillstrom, A. P. (1994). Stimulus-driven attentional capture: Evidence from equiluminant visual objects. Journal of Experimental Psychology: Human Perception and Performance, 20, 95-107.
Yantis, S. & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-620.
Yantis, S. & Jonides, J. (1990). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.
Yantis, S. & Jonides, J. (1996). Attentional capture by abrupt onsets: New perceptual objects or visual masking? Journal of Experimental Psychology: Human Perception and Performance, 22, 1505-1513.
Author Notes
The authors wish to thank Dan Simons for his comments on a previous draft of this chapter and Kimberly S. Salvagni for her valuable assistance in conducting the experiments. The writing of this chapter was supported by a grant from the National Science Foundation (SBR-9817245) awarded to BSG. Correspondence concerning this article should be addressed to: Brad Gibson, Department of Psychology, 118 Haggar Hall, University of Notre Dame, Notre Dame, IN 46556. Email:
[email protected].
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © 2001 Elsevier Science B.V. All rights reserved.
Involuntary Orienting to Flashing Distractors in Delayed Search?
Harold Pashler
Common sense suggests that abrupt change in the sensory environment often captures our attention, and early writers on attention generally endorsed this view. Titchener (1908), for example, remarked that any sudden change or movement, including a change in pitch, could distract someone from concentration on something else (p. 192), and James (1890/1950) made similar suggestions. Recent studies using visual search tasks to measure attention shifts have supported and refined this hypothesis. Abrupt appearance of a new object does indeed seem to trigger a shift of visual attention to the object even when the shift is unhelpful to performance. This is demonstrated by faster responses to targets that appear suddenly as compared to those that "fade in" (even when sudden appearance does not predict that a stimulus will be a target), and also by faster responses to cued items that follow nonpredictive cues (Yantis, 1994; Yantis & Hillstrom, 1994). Remington, Johnston, and Yantis (1992) found that even in blocks of trials in which the target never appeared in the position of the onset cue, thus providing a maximum incentive to ignore it, the cue still apparently drew attention. Contrary to the views of early writers, however, other changes such as offsets or changes in color do not generally seem to produce involuntary orienting (see Yantis, 2000, for an overview). While the findings just described would seem to imply that orienting to onsets is completely involuntary, recent results challenge this view, and suggest that onset-triggered shifts may be contingent on what task set a person has chosen to adopt. Folk, Remington and Johnston (1992) had subjects make a speeded discrimination, choosing between an = and an X (see Figure 1). Two different tasks were paired with two different cue sequences. In the single-item task (the upper box on the right), a single symbol (= or X) was presented in one of four positions at the corners of an imaginary square. 
In the color-selection task (lower right box), there were four symbols; one of these was red, and subjects responded to that one. One of the cue sequences consisted of tiny flashing disks surrounding one of the positions (Onset Cue sequence). Onset cues seemed to produce involuntary orienting in the single-item task (performance was worsened by the presence of the cue even when its location never predicted the target position). They did not have this effect in the color-selection task, however. In the Color Cue sequence, red dots surrounded one location and green dots surrounded the other locations. Color cues interfered with performance in the color-selection task even in blocks where the cued location never
[Figure 1. Panel labels: Fixation Display, Onset Cue, Color Cue, Target Display, Single-Item Task, Color-Selection Task.]
Figure 1. Design used by Folk, Remington and Johnston (1992). Subjects see a sequence of four displays, proceeding from left to right. The four arrows depict different cue-target sequence conditions.
predicted the location of the target. Not surprisingly, they had no such effect in the single-item task. To account for this pattern of results, Folk et al. proposed what they termed the Contingent Involuntary Orienting (CIO) hypothesis. According to this theory, there is no truly automatic (task-set independent) orienting to onsets, unique colors, or any other stimulus property. Rather, what appears to be involuntary orienting occurs when observers have adopted a task set to optimize performance in the primary task, and this task set governs the response to the cues as well as the display that requires a response. If the relevant item is going to appear suddenly in uncertain locations, according to the CIO account, people adopt a set to orient to onsets. It is evidently impossible to have this set in place by the time the display appears without having it set up at least 200 ms earlier, and thus the set affects processing triggered by the cue as well as the target display. Consequently, a rapid-onset cue causes some degree of orienting to its location even when this orienting is predictably disadvantageous. Presumably, the disadvantage of orienting to the cue is more than compensated for by the benefit of having this set in place when the target display appears. Similarly, the set to select red stimuli, adopted in
anticipation of a color-selection task, spills over to produce seemingly involuntary orienting to a red cue. The Folk et al. data are consistent with the CIO hypothesis, but for the present author at least, the results (and supporting evidence amassed by Folk & Remington, 1999, and Gibson & Kelsey, 1998) seemed less than fully convincing. For one thing, the onsets that were successfully ignored in the color-selection task appeared prior to, rather than concurrently with, the display. Thus, the results do not necessarily demonstrate that observers can shut out interfering onset stimuli while they are happening (cf. Gibson & Wenger, 1999). For another, the cue effects are relatively small (26 ms effect of onset cues in the onset target condition), as one might expect given the small display load, making it difficult to gauge their presence and their nature with complete confidence (Luck & Thomas, 1999). A third reason that the Folk et al. results seem less than compelling is that their account of the color-selection task is puzzling in certain respects. The four stimuli displayed in the color-selection task are all rapid onsets, after all. While the property of being an onset does not discriminate target from distractor, neither did it in many studies finding involuntary orienting to onsets (Jonides & Yantis, 1988). Further, even in the onset task, it is not clear in exactly what sense the property of being an onset is essential; it is not needed to discriminate target from distractors, because there are no distractors. CIO would seem to make a simple and, to the author, quite counterintuitive prediction to which the objections or conceptual puzzlements just mentioned would not apply. The prediction is this: while people search a crowded display based on color, attempting to ignore interspersed distractors of a different color, it should make no difference at all if the distractors flash on and off while the search is underway. 
The problem in testing this prediction is that, according to the CIO hypothesis, the prediction will not hold if the relevant stimuli themselves appear suddenly as they do in a conventional search experiment. If relevant stimuli are onsets, observers may be expected voluntarily to set themselves to orient to onsets as they are presumed to have done in the single-item task used by Folk et al. To get around this problem, search displays in the experiment described below were presented before the subject knew what target he or she would be searching for. The subject was informed about the target by a spoken message played through the computer speakers. To produce robust distractor effects, very busy displays were used. Each search display consisted of 30 red digits scattered quasi-randomly throughout the CRT monitor screen, sometimes with 30 additional distractors added (Figures 2 and 3). Three hundred milliseconds after onset, the computer played a wave file consisting of a spoken digit and the subject began searching the red items to determine if this target digit was present. In no-distractor blocks, the display contained only the 30 red digits. In static-distractor blocks, it contained the red digits plus another 30 green digits interspersed among them (some of which might have the same numerical identity as the target). In flashing-distractor blocks, the display contained the red digits plus 30 green digits flashing 200 ms on, 200 ms off, and so on.
Figure 2. Schematic of display used in present experiments; 30 red target digits (shown black) interspersed with 30 green distractor digits (shown gray).
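The composition of a single search display can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in the experiments: it uses the 6 x 10 grid of subregions given in the Method section, it omits the random placement of each digit within its subregion, and it assumes that a target-present display contains exactly one red target, which the chapter does not state. The name `make_display` and its parameters are hypothetical.

```python
import random

GRID_ROWS, GRID_COLS = 6, 10      # 60 subregions, as given in the Method section
N_RED = 30                        # red (relevant) digits per display
DIGITS = list(range(1, 10))       # digits 1-9

def make_display(target, target_present, with_green, rng):
    """Return a list of (row, col, color, digit) tuples for one trial."""
    cells = [(r, c) for r in range(GRID_ROWS) for c in range(GRID_COLS)]
    rng.shuffle(cells)            # random assignment of subregions to colors
    red_cells, green_cells = cells[:N_RED], cells[N_RED:]

    # Red non-targets are drawn from the eight digits other than the target.
    others = [d for d in DIGITS if d != target]
    red_digits = [rng.choice(others) for _ in red_cells]
    if target_present:
        # Assumption: exactly one red target on target-present trials.
        red_digits[0] = target

    items = [(r, c, "red", d) for (r, c), d in zip(red_cells, red_digits)]
    if with_green:
        # Green distractors are drawn with replacement from the full range 1-9,
        # so some of them may share the target's identity.
        items += [(r, c, "green", rng.choice(DIGITS)) for r, c in green_cells]
    return items
```

Calling `make_display(5, True, True, random.Random(0))` yields 60 items, 30 of each color, with one item in every subregion.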
Experiment 1
Method
Subjects. Fifty-four UCSD undergraduates (11 male) participated, 53 in partial fulfillment of a course requirement, one in return for payment. Apparatus and Stimuli. Experiments were controlled by Pentium II computers controlling 15-inch SONY Trinitron Multiscan 100GS SVGA monitors. Each display consisted of 30 red digits, and, in some conditions, an additional 30 green digits (using readily discriminable, high-saturation colors). Each digit measured .6 cm in height by .5 cm in width (based on a viewing distance of 70 cm, this corresponded to visual angles of .49 by .40 deg). The digits were scattered in a quasi-random fashion about the entire CRT display, which measured 21.5 cm high by 28.5 cm wide (17.1 X 22.2 deg visual angle). This was done as follows. The overall display was divided into a grid of subregions (6 high and 10 across). Thirty of these were selected at random without constraint independently on each trial, and a single red digit was placed randomly within each of these 30 subregions. If green digits were present, one of them was placed in each of the remaining 30 subregions. Design. Each subject performed 9 blocks of 40 trials per block. The three conditions (no-distractor; static-distractor; flashing-distractor) were presented in
Figure 3. Procedure used in both experiments. Target is specified auditorily 1 sec after onset of display. In no-distractor condition only red digits are present; in static-distractor condition, green digits remain present throughout. In flashing-distractor condition, green digits flash on 200 ms, off 200 ms, etc.
separate blocks, with rotation through the block types and the initial block type counterbalanced across subjects. On a given trial, the target was selected randomly and independently from the range 1-9. The red distractors were selected randomly from the other 8 digits. When the green distractors were present, they were selected randomly with replacement from the entire range 1-9, so usually there would be green distractors identical to the target digit (which would not themselves count as targets, of course). Procedure. Subjects were given written instructions, stating that they would begin each trial by fixating on the center of the screen; that they would be told by the computer what digit to look for, and that they should look for this target only among the red digits; that there would sometimes be green distractor digits that they would need to ignore; and that they should respond as rapidly and accurately as possible. Each trial began with a plus sign presented for 1 second, followed by a 500-ms blank screen, and then the appearance of the display. Not until three hundred milliseconds after the display onset did the computer begin playing a wave file of the spoken target name, resulting in a significant delay from the onset of the display to the time where the subject knew what to search for. Of course, different wave files took slightly different amounts of time to communicate this information, but these differences were not confounded with the variables of interest. Subjects pressed the M key for target present, and the N key for target absent; as soon as they had done so, the display disappeared from the screen. The computer provided feedback by playing different sounds after errors and correct responses. A period of 2.5 sec elapsed before the next fixation point was presented. 
At the end of each block, the average response time and percent correct during the preceding block were displayed, and the subject was allowed to rest until he or she felt ready to resume.
Results
Data from four subjects were discarded because they had overall error rates in excess of 25%, leaving 50 subjects. RTs (measured from onset of the display) that exceeded the mean by three standard deviations were trimmed (simulations by van Selst & Jolicoeur, 1994, suggest that this procedure is appropriate under conditions like these). Figure 4 shows the mean reaction times for correct responses in target-present and target-absent trials in the three conditions. The effect of flicker condition was significant, F(2,98)=37.9, p<.001, as was the effect of target presence/absence, F(1,49)=217, p<.001. The two variables did not interact, p>.10. Flashing-distractor displays produced faster rather than slower responses (3035 ms) compared to static-distractor displays (3148 ms); this difference was reliable, F(1,49)=15.4, p<.001. Error rates are shown in Table 1. Most errors were misses (18%) rather than false alarms (2%), as is typical in visual search tasks. The presence of distractors increased the miss rate compared to the no-distractor condition. There was no significant effect of flashing- vs. static-distractors on overall error rates
(p>.60) nor any significant interaction of this variable with type of error, i.e., false alarm vs. miss (p>.15). The slight elevation in mean false alarm rates with flashing distractors (3% vs. 2%) was tested by itself and proved not to be statistically significant (p>.35).
[Figure 4: plot of mean correct RT (vertical axis, 2000 to 4000 ms) against Distractor Condition (No Distractor, Static Distractor, Flashing Distractor).]
Figure 4. Mean correct RTs (in ms) in Experiment 1 (same green digits flash on and off) as a function of distractor condition and target presence/absence.
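The trimming rule mentioned in the Results (discarding RTs that exceed the mean by three standard deviations) can be illustrated with a short Python sketch. The chapter does not say over which set of trials the mean and standard deviation were computed, nor whether trimming was applied once or repeatedly, so the single-pass version below, applied to whatever list of RTs it is given and using the sample standard deviation, is an assumption. The function name `trim_rts` is hypothetical.

```python
from statistics import mean, stdev

def trim_rts(rts, n_sd=3.0):
    """Drop RTs more than n_sd sample SDs above the mean of the input list.

    Single pass: the cutoff is computed once from the untrimmed data.
    """
    if len(rts) < 2:
        return list(rts)          # SD is undefined for fewer than two values
    cutoff = mean(rts) + n_sd * stdev(rts)
    return [rt for rt in rts if rt <= cutoff]
```

One property worth keeping in mind is that with very few observations no value can lie three sample standard deviations above the mean, so a rule of this kind only removes anything when it is applied to reasonably large cells.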
Table 1. Mean percent errors in Experiments 1 and 2 as a function of target presence/absence and distractor condition.

                          No Distractor   Static Distractor   Flashing Distractor
Exp. 1 Target Present         16.4              19.1                 17.9
Exp. 1 Target Absent           1.8               1.9                  2.5
Exp. 2 Target Present         19.5              17.2                 18.4
Exp. 2 Target Absent           2.0               1.7                  2.1
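As a check on how the entries of Table 1 relate to the overall miss and false-alarm rates quoted in the text, the following Python sketch averages each row across the three distractor conditions. It assumes that errors on target-present trials are misses, that errors on target-absent trials are false alarms, and that the three conditions can be weighted equally; the variable and function names are illustrative only.

```python
from statistics import mean

# Percent errors transcribed from Table 1, ordered
# (no distractor, static distractor, flashing distractor).
TABLE_1 = {
    ("Exp. 1", "miss"): (16.4, 19.1, 17.9),        # target-present errors
    ("Exp. 1", "false alarm"): (1.8, 1.9, 2.5),    # target-absent errors
    ("Exp. 2", "miss"): (19.5, 17.2, 18.4),
    ("Exp. 2", "false alarm"): (2.0, 1.7, 2.1),
}

def mean_rates(table):
    """Unweighted mean error rate per row, rounded to one decimal place."""
    return {key: round(mean(vals), 1) for key, vals in table.items()}
```

Under these assumptions `mean_rates(TABLE_1)` gives 17.8 and 2.1 for Experiment 1 and 18.4 and 1.9 for Experiment 2, in line with the rounded values of 18% and 2% reported for Experiment 1 and the 18.4% and 1.9% reported for Experiment 2.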
Experiment 2
Using visual search designs, Yantis and colleagues have found that visual transient signals that do not signal the appearance of new objects generally do not produce involuntary orienting (Yantis & Hillstrom, 1994; see Yantis, 2000, for a review). It is not clear whether or not the reappearance of the green distractor digits in the displays used in Experiment 1 should be regarded as signaling the appearance of new objects. To see whether the flashing of the distractors would remain harmless (and indeed, helpful) even when new objects appeared, Experiment 2 was conducted with one change: when the green digits reappeared every 400 ms, each digit was replaced with a new (usually different) digit in the same location.
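The difference between the flashing-distractor conditions of the two experiments can be made concrete with a small Python sketch of the flash schedule. It assumes that the green digits are in their "on" phase at display onset, which matches the "200 ms on, 200 ms off" description but is not stated explicitly, and it uses an arbitrary per-cycle seeding scheme purely for illustration. The function names and parameters are hypothetical.

```python
import random

ON_MS, OFF_MS = 200, 200          # flash timing given in the text
CYCLE_MS = ON_MS + OFF_MS         # a new flash begins every 400 ms

def green_visible(t_ms):
    """True while the green distractors are on screen, t_ms after display onset."""
    return (t_ms % CYCLE_MS) < ON_MS

def green_digits(t_ms, n_items, seed, resample):
    """Digits carried by the green distractors during the cycle containing t_ms.

    resample=False mimics Experiment 1 (the same digits reappear on every flash);
    resample=True mimics Experiment 2 (a new random set appears on every flash).
    """
    cycle = t_ms // CYCLE_MS if resample else 0
    rng = random.Random(seed * 1_000_003 + cycle)   # hypothetical seeding scheme
    return [rng.randint(1, 9) for _ in range(n_items)]
```

With `resample=False` the list returned at 0 ms and at 800 ms is identical, whereas with `resample=True` each 400-ms cycle draws its own set, so a digit at a given location will usually change identity from one flash to the next.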
Method
Subjects. Forty-two UCSD undergraduates (9 male) participated in partial fulfillment of a course requirement. The Apparatus, Procedure and Design were exactly as in Experiment 1. The only difference was that in the flashing-distractor condition, every 400 ms a new randomly chosen set of green digits flashed on.
Results
No subjects had overall error rates in excess of 25%, and RTs were trimmed as in Experiment 1. Figure 5 shows the mean correct reaction times for target-present and target-absent trials in the three conditions. The effect of distractor condition was significant, F(2,82)=22.5, p<.001, as was the effect of target presence/absence, F(1,42)=298, p<.001. The two variables interacted, F(2,82)=4.1, p<.02, apparently due to a slightly smaller effect of target absence in the no-distractor condition. Flashing-distractor displays produced faster rather than slower responses (2999 ms) compared to static-distractor displays (3119 ms); this difference was reliable, F(1,41)=10.3, p<.003. Error rates are shown in Table 1. Again, most errors were misses (18.4% of trials) rather than false alarms (1.9%). There was no significant effect of distractor condition on overall error rates (p>.10) nor any significant interaction of this variable with type of error, i.e., false alarm vs. miss (p>.15).
[Figure 5: plot of mean correct RT (vertical axis, 2000 to 4000 ms) against Distractor Condition (No Distractor, Static Distractor, Flashing Distractor).]
Figure 5. Mean correct RTs (in ms) in Experiment 2 (new green digits appear on each flash) as a function of distractor condition and target presence/absence.
Experiment 3
The third experiment examined several diverse forms of transient activity in the distractors. The experiment included the three conditions of Experiment 2 (no-distractor, static-distractor, and flashing-distractors) plus two additional forms of distractor change. The first was "twinkling", whereby distractors disappeared and were replaced with new distractors independently, rather than pulsing off and on in synchrony with each other as in the previous studies. The second was a form of motion that will be referred to as "shimmying", whereby distractors shuttled back and forth along short individually determined paths (motion was constrained in this way because if distractors were free to wander, the overall geometry of the display would likely deform during the search, conceivably impairing search by disrupting eye movement control rather than grabbing attention). The preview period prior to the vocal presentation of the target was lengthened to one second to make it more certain that onset-hood would not be a useful criterion for locating relevant materials.
Method
Subjects.
Twenty-five UCSD undergraduates (10 male) participated in partial fulfillment of a course requirement. The Apparatus, Procedure and Design were as in previous experiments except as noted. There were five conditions presented in separate blocks: no-distractor, static-distractor, flashing-distractor, asynchronously twinkling-distractor, and shimmying-distractor. Subjects performed 15 blocks of 24 trials per block with counterbalanced order of conditions. The first three conditions were as in Experiment 2. In the twinkling-distractor condition, four distractors were selected randomly every 100 ms and caused to disappear; 200 ms later, they were replaced with new items. A constraint on selecting distractors to disappear was that any distractor was ineligible to be selected while it was already undergoing replacement and for 400 ms after the appearance of its replacement. In this way, items scattered around the display were continuously seen to disappear with new ones popping up in their place; the span during which objects remained in the display varied greatly and the lifespans of individual items were temporally overlapping in a haphazard fashion. The shimmying-distractor blocks were constructed as follows. Distractors were pseudo-randomly assigned a home location, just as in the other conditions. For each individual distractor, a motion vector was determined in advance of the trial (with its tail on the home location, a length equal to approximately one-half character and a randomly chosen direction). Each distractor now had two resting places, its home location and the endpoint of this vector, at a distance equal to one-half the size of the letter. Every 200 ms, four distractors were selected at random from the entire display and moved to their alternative position. Thus, every 200 ms a random subset of the display would traverse a fixed trajectory one-half character width in distance.
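The twinkling schedule described above can be sketched as a short simulation. This is an illustrative reconstruction rather than the original experiment code: the function name `twinkle_schedule`, the display size of 30 distractors, and the treatment of the 400 ms boundary (a distractor becomes eligible again exactly 400 ms after its replacement appears) are all assumptions.

```python
import random


def twinkle_schedule(n_distractors, duration_ms, seed=0,
                     step_ms=100, batch=4, gap_ms=200, refractory_ms=400):
    """Return a list of (offset_time, onset_time, location) events.

    Every `step_ms`, `batch` eligible distractors disappear; each is
    replaced `gap_ms` later. A location cannot be selected again until
    `refractory_ms` after its replacement has appeared.
    """
    rng = random.Random(seed)
    # Earliest time at which each location may be selected again.
    eligible_at = [0] * n_distractors
    events = []
    for t in range(0, duration_ms, step_ms):
        candidates = [i for i in range(n_distractors) if eligible_at[i] <= t]
        # Guard against displays too small to supply a full batch.
        chosen = rng.sample(candidates, min(batch, len(candidates)))
        for loc in chosen:
            onset = t + gap_ms
            events.append((t, onset, loc))
            eligible_at[loc] = onset + refractory_ms
    return events


if __name__ == "__main__":
    # Hypothetical display of 30 distractors, simulated for 2 seconds.
    for offset, onset, loc in twinkle_schedule(30, 2000)[:8]:
        print(f"location {loc:2d}: off at {offset} ms, replaced at {onset} ms")
```

Under these assumptions a selected location is unavailable for 600 ms in total, so at most 20 locations are blocked at any 100 ms step; any display with at least 24 distractors can always supply a full batch of four.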
Over the whole display, therefore, distractors could be seen to exhibit temporally chaotic jumping motion, with different distractors moving in different directions. The wave file began playing after a delay of 1 second in all conditions.
Results
No subjects had overall error rates in excess of 25%, and RTs were trimmed as in Experiment 1. Table 2 shows the mean correct reaction times for target-present and target-absent trials in the five conditions. As expected, the overall ANOVA yielded significant results, with the no-distractor condition fastest of all by a good measure (for distractor condition: F(4,96)=9.0, p<.001; for target presence/absence: F(1,24)=170.7, p<.001; for the interaction: F(4,96)=2.7, p<.05).
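The degrees of freedom in these F ratios follow from the design. The sketch below assumes a standard univariate repeated-measures ANOVA in which each effect is tested against its own effect-by-subject error term; the helper name `within_subject_df` is hypothetical and is not taken from the original analysis.

```python
from math import prod


def within_subject_df(n_subjects, levels):
    """Degrees of freedom for a within-subject effect.

    `levels` lists the number of levels of each factor involved in the
    effect (one entry for a main effect, two for a two-way interaction).
    Returns (effect_df, error_df).
    """
    effect_df = prod(k - 1 for k in levels)
    error_df = effect_df * (n_subjects - 1)
    return effect_df, error_df


if __name__ == "__main__":
    n = 25  # subjects in Experiment 3
    print("distractor condition:", within_subject_df(n, [5]))
    print("target presence/absence:", within_subject_df(n, [2]))
    print("interaction:", within_subject_df(n, [5, 2]))
```

With 25 subjects, five distractor conditions and two levels of target presence, this gives (4, 96), (1, 24) and (4, 96), matching the values reported for Experiment 3.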
Table 2. Mean correct RTs (in ms) in Experiment 3 as a function of target presence/absence and distractor condition.

                  No          Static      Flashing    Twinkling   Shimmying
                  Distractor  Distractor  Distractor  Distractor  Distractor
Target Present    3016        3141        3114        3119        3207
Target Absent     4226        4545        4379        4325        4517

Table 3. Mean percent errors in Experiment 3 as a function of target presence/absence and distractor condition.

                  No          Static      Flashing    Twinkling   Shimmying
                  Distractor  Distractor  Distractor  Distractor  Distractor
Target Present    16.8        16.7        16.2        25.4        17.9
Target Absent     1.4         1.8         1.4         1.7         1.9
As in Experiment 2, flashing-distractor displays produced faster rather than slower responses (3747 ms) compared to static-distractor displays (3843 ms); this difference was not quite reliable, however, F(1,24)=4.0, p<.06. The shimmying-distractor displays (3862) were not reliably different from the static-distractor displays (3843), F(1,24)=0.17, p>.65, nor did this difference interact with target presence/absence in a 2 x 2 ANOVA, F(1,24)=2.1, p>.15. Twinkling distractors, too, produced overall faster responses (3723) as compared to static distractors (3843), a significant speedup, F(1,24)=5.5, p<.03. Twinkling sped detection of target presence by 22 ms and responses to target-absent displays by 220 ms, yielding a significant distractor type x target presence/absence interaction in a 2 x 2 ANOVA, F(1,24)=6.9, p<.02. Error rates are shown in Table 3. Again, most errors were misses (18.6% of trials) rather than false alarms (1.6%), numbers very similar to those of the previous experiment. In this case, however, there was a significant effect of distractor condition on error rates, F(4,96)=8.2, p<.001, and this variable interacted with target presence, F(1,24)=7.9, p<.001. This chiefly reflected an excess of about 8% in misses occurring in the twinkling-distractor condition (a comparison of twinkling against static showed a significant effect on errors and a significant interaction as well).
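The condition means quoted in this paragraph can be recovered from the cell means in Table 2. The sketch below assumes that target-present and target-absent trials are weighted equally; the dictionary and function names are illustrative only. Because the table entries are themselves rounded, a recomputed mean can differ from the value in the text by about 1 ms.

```python
# Cell means from Table 2: (target present, target absent), in ms.
RT_MS = {
    "no distractor": (3016, 4226),
    "static": (3141, 4545),
    "flashing": (3114, 4379),
    "twinkling": (3119, 4325),
    "shimmying": (3207, 4517),
}


def overall_mean(condition):
    """Unweighted mean of the target-present and target-absent cells."""
    present, absent = RT_MS[condition]
    return (present + absent) / 2


def speedup_vs_static(condition):
    """Static minus condition RT, separately for present and absent trials."""
    s_present, s_absent = RT_MS["static"]
    c_present, c_absent = RT_MS[condition]
    return s_present - c_present, s_absent - c_absent


if __name__ == "__main__":
    for name in RT_MS:
        print(f"{name:14s} overall mean = {overall_mean(name):.1f} ms")
    print("twinkling speedup (present, absent):", speedup_vs_static("twinkling"))
```

The twinkling speedup comes out as 22 ms on target-present trials and 220 ms on target-absent trials, which is the asymmetry behind the reported interaction.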
General Discussion
Conclusions
When people searched the red digits in a large crowded display for a target that was specified by a spoken message played after the display had been previewed, the presence of interspersed green distractors slowed them down, as one might expect. In this task the relevant and irrelevant material was all present at the beginning of the search and remained so until response. Thus, the events on the trial were contrived so that there was no need or reason to attend to or search for rapid onsets (especially in Experiment 3, where the display was previewed for a full second before the verbal presentation of the target began). For that reason, the contingent involuntary orienting hypothesis (Folk et al., 1992) predicts that having the green digits flashing on and off throughout the search will not cause them to be any more disruptive than the static digits. Indeed, it would suggest that flashing might even make them less disruptive because flicker provides an additional feature that differentiates them from targets. The first two experiments examined the situation where distractors pulsed off and on again in synchrony with each other (newly chosen distractors replacing old ones in Experiment 2). In both studies, the CIO prediction was strikingly (and, to the present author, unexpectedly) borne out, with flashing distractors producing faster, not slower, responses as compared to static distractors. The great majority of subjects' errors were misses. Flashing digits caused a small and statistically nonsignificant reduction in the miss rate, and a tiny and nonsignificant increase in the false alarm rate, but were nonetheless far more helpful than harmful overall. Experiment 3 again replicated this observation, but also included two more chaotic forms of distractor-related transients: temporally unpredictable motion ("shimmying") and disappearance and replacement occurring at unpredictable and typically asynchronous moments in time ("twinkling").
Shimmying distractors did not impair performance as compared to static distractors. Twinkling the distractors had a more complex effect. It sped up correct RTs, but much more so on target-absent trials as compared to target-present trials, and produced some modest but significant elevation in the miss rate. Terms like "speed-accuracy tradeoff" do not do full justice to this pattern. One possible interpretation is that the twinkling distractors caused subjects prematurely to give up on a small proportion of trials, or to mistakenly conclude they had already searched the entire display. While puzzling in some respects, the effects of this kind of twinkling are modest and do not particularly seem to suggest that twinkling distractors have any unusual power to summon attention involuntarily. In summary, the results suggest that the contingent involuntary orienting hypothesis is much closer to reality than is the traditional common-sense view espoused by Titchener and others quoted in the Introduction. It does appear that one form of distractor activity ("twinkling") has some modest effects on search performance but these do not appear to indicate involuntary grabbing of attention on
any large proportion of trials. All in all, then, CIO is well supported; most importantly, it seems to describe what happens when people must deal with protracted transient activity while engaged in an ongoing selective attention task, not merely their response to single isolated flashes occurring prior to their engagement in an attention task.
Limitations
A number of limitations of this study should be noted. First, the selective attention task was relatively easy. While the aggregate cost of the distractors was substantial, the cost per distractor was quite small. This was expected given the degree of chromatic contrast present (red vs. green). While it would be interesting to see if the same results hold with a less discriminable selection criterion, this would have to be done with some caution. If the color judgment had been much more difficult, an alternative strategy would be encouraged: detecting any digit targets regardless of their color and then checking their color. Obviously, such a strategy would thwart the design of the experiment. A second limitation is the use of relatively large amounts of transient activity. In the first two experiments, all the distractors flashed; even in the asynchronous twinkling condition of Experiment 3, a substantial amount of transient activity occurred per second. Thus, the results do not rule out the possibility that set-independent attention capture by transients does occur, but only when some special set of conditions is met, which includes a requirement that the transients be isolated.1 (Of course, the putative set of necessary conditions just mentioned would have to include some factors not satisfied by the studies of Folk et al., 1992, either.) A third limitation that should be kept in mind is the fact that the transient activity here was always confined to the distractors, not the targets. Thus, not only did subjects lack an incentive to attend to transients; they were given a very strong incentive to ignore them. It would be interesting to know if transient activity that was orthogonal to, rather than negatively correlated with, task relevance, would also prove as innocuous as that examined here.
Answering this question might not be straightforward, however, because flashing or moving the relevant stimuli might change the difficulty of the task due to sensory and perceptual factors unrelated to attention capture.
Broader Questions
The new support for CIO presented here raises an obvious question that relates back to points mentioned at the beginning of the Introduction above: why does it seem almost self-evident to most people (including distinguished early writers on attention) that stimuli that flash, move, or jump involuntarily attract our attention? Are advertisers, for example, wasting their time and money in putting flashing lights on signs by roadsides or on websites? Perhaps not. When we are engaged in passive viewing with no particular task to perform, we may adopt a
"default" setting that favors orienting to transients (as Folk et al., 1992, suggested). Alternatively, there could be a more general default setting, to orient to whatever is unusual; in most visual scenes (some parts of Las Vegas being a possible exception) static objects are much more common than flickering or moving ones. Recent work from our laboratory (Pashler & Harris, in press) has examined spontaneous allocation of attention in tasks not involving any set to locate targets or even reports. For example, in experiments using just a single trial, we told subjects they would be required to make an aesthetic judgment, and only after the display had been presented did we ask them to describe what they had seen. The results supported the suggestion that there is a default tendency to attend to transients, and even more strongly, to attend to stimuli bearing unique properties (Pashler & Harris, in press). It is possible, therefore, that the commonplace observations correctly describe this default mode, while the Folk et al. model correctly describes a mode that people readily adopt when presented with the requirement to search (see Pashler, Ruthruff & Johnston, 2001, for further discussion). Some previous research suggests that effects of abrupt pre-cues can sometimes be eliminated by allocating attention in advance to a relevant position (Yantis & Jonides, 1990; Juola, Koshino, & Warner, 1995). Potentially, therefore, one might view the present results as showing that this generalization extends even to cases where relevant (attended) locations are spatially intertwined with distractor locations. This assumes of course that attention can be simultaneously allocated to noncontiguous locations, as some research indicates (e.g., Awh & Pashler, 2000; Bichot, Cave & Pashler, 1999; Kramer & Hahn, 1995; but see Pan & Eriksen, 1993).
On the other hand, nullification of the effects of abrupt-onset cues by advance knowledge of distractor locations (in situations where the relevant items are onsets) has proven to be a rather tenuous phenomenon. Several studies by Folk and Remington (1996) found that involuntary orienting continued to occur when the position of the irrelevant stimulus was fixed for a whole block of trials. Further, it should be kept in mind that in the experiments reported here, the distractors were not ineffective; in aggregate, they imposed total costs much larger than one typically finds in experiments involving displays of just a few items. Additionally, displays in the present studies were large in spatial extent and required several eye movements to search exhaustively.2 Thus, the relevant and irrelevant stimuli occupied different retinal locations at different times during the search. Even if knowledge of distractor positions is generally sufficient to nullify effects of abrupt-onset distractors within a fixation (which is questionable, as noted above), it is hardly obvious this would apply when saccades occur, altering the retinal locations of all the stimuli during the time the search is taking place. In summary, the results of the present study suggest that when people adopt a set to search for stimuli of a particular color, rapid onsets of the other color are deprived of much ability to "grab" attention involuntarily even under demanding conditions that provide every opportunity for involuntary orienting to show itself (search of a busy display with intertwined items flashing on and off). Indeed, the very properties often hypothesized to produce automatic grabbing of attention seem
to facilitate exclusion by adding additional redundant features that help differentiate relevant and irrelevant inputs.
Footnotes
1 The author is grateful to Jan Theeuwes for pointing this out.
2 Though eye movements were not measured, skeptical readers are invited to scatter .6-cm-high digits about a whole CRT display and try to search them from a single point of regard.
References
Awh, E. & Pashler, H. (2000). Evidence for split attentional foci. Journal of Experimental Psychology: Human Perception and Performance, 26, 834-846.
Bichot, N. P., Cave, K. R., & Pashler, H. (1999). Visual selection mediated by location: Feature-based selection of noncontiguous locations. Perception & Psychophysics, 61, 403-423.
Folk, C. L. & Remington, R. W. (1996). When knowledge does not help: Limitations on the flexibility of attentional control. In A. F. Kramer, M. G. H. Coles, & G. D. Logan (Eds.), Converging Operations in the Study of Visual Selective Attention (pp. 271-295). Washington, DC: American Psychological Association.
Folk, C. L. & Remington, R. W. (1999). Can new objects override attentional control settings? Perception & Psychophysics, 61, 727-739.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Gibson, B. S. & Kelsey, E. M. (1998). Stimulus-driven attentional capture is contingent on attentional set for displaywide visual features. Journal of Experimental Psychology: Human Perception and Performance, 24, 699-706.
Gibson, B. S., & Wenger, M. (1999). A closer look at contingent capture. Paper presented at the 40th Annual Meeting of the Psychonomic Society, Los Angeles, CA.
Hahn, S. & Kramer, A. F. (1998). Further evidence for the division of attention between noncontiguous locations. Visual Cognition, 5, 217-256.
James, W. (1890/1950). The Principles of Psychology, Vol. 1. New York: Dover.
Jonides, J. & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43, 346-354.
Juola, J. F., Koshino, H., & Warner, C. B. (1995). Tradeoffs between attentional effects of spatial cues and abrupt onsets. Perception & Psychophysics, 57, 333-342.
Koshino, H., Warner, C. B., & Juola, J. F. (1992). Relative effectiveness of central, peripheral, and abrupt-onset cues in visual search. Quarterly Journal of Experimental Psychology, 45A, 609-631.
Kramer, A. F. & Hahn, S. (1995). Splitting the beam: Distribution of attention over noncontiguous regions of the visual field. Psychological Science, 6, 381-386.
Luck, S. J. & Thomas, S. J. (1999). What variety of attention is automatically captured by peripheral cues? Perception & Psychophysics, 61, 1424-1435.
Pan, K. & Eriksen, C. W. (1993). Attentional distribution in the visual field during same-different judgments as assessed by response competition. Perception & Psychophysics, 53, 134-144.
Pashler, H. & Harris, C. (in press). Spontaneous allocation of visual attention: Dominant role of uniqueness. Psychonomic Bulletin & Review.
Pashler, H., Ruthruff, E., & Johnston, J. C. (2001). Attention and performance. In Annual Review of Psychology. San Diego: Academic Press.
Remington, R. W., Johnston, J. C., & Yantis, S. (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51, 279-290.
Titchener, E. B. (1908). Lectures on the Elementary Psychology of Feeling and Attention. New York: The MacMillan Company.
Van Selst, M. & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 47, 631-650.
Yantis, S. (1994). Stimulus-driven attentional capture. Current Directions in Psychological Science, 2, 156-161.
Yantis, S. (2000). Goal-directed and stimulus-driven determinants of attentional control. In S. Monsell & J. Driver (Eds.), Control of Cognitive Processes: Attention and Performance XVIII. Cambridge, MA: MIT Press.
Yantis, S. & Hillstrom, A. (1994). Stimulus-driven attentional capture: Evidence from equiluminant visual objects. Journal of Experimental Psychology: Human Perception and Performance, 20, 95-107.
Yantis, S. & Johnson, D. N. (1990). Mechanisms of attentional priority. Journal of Experimental Psychology: Human Perception and Performance, 16, 812-825.
Yantis, S. & Jonides, J. (1990). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved
Attentional Capture in the Spatial and Temporal Domains
Howard E. Egeth, Charles L. Folk, Andrew B. Leber, Takehiko Nakama, and Sharma K. Hendel
In the study of what kinds of stimuli do or do not capture attention, one of the most productive ideas has been the notion of contingent capture, that is, the idea that what captures attention depends on the observer's current task goals. Most of this research has been conducted in the context of visual search studies either with or without precues indicating the likely location of the target. We begin with a discussion of an example of this paradigm that appears to challenge the notion of contingent capture. We then move on to an extension of the contingent capture idea to the temporal domain with the use of the RSVP paradigm, that is, the rapid serial visual presentation of stimuli. In this paradigm there is uncertainty about the time at which a target will occur. We first introduce some novel results bearing on the nature of the attentional blink, and then use a variant of this paradigm to study attentional capture. In our experiments, subjects will have the opportunity to resolve the uncertainty about time of arrival of the target by using a stimulus property (color) that just so happens to also define a spatially irrelevant stimulus. Under these conditions, will subjects be able to ignore the irrelevant stimulus?
The Contingent Capture Hypothesis
Attentional allocation has often been described as having two sources of control: stimulus-driven and goal-directed selection. Stimulus-driven selection is characterized by allocation based entirely on the salience of features in the display; specific task demands or goals of the observer are irrelevant. Attentional allocation of this sort can be said to be involuntary and not under the observer's control (Remington, Johnston & Yantis, 1992; Yantis, 1993; Yantis & Jonides, 1990). Alternatively, attentional deployment may be a direct result of the objectives of the observer so that attention is allocated to visual elements that are task-relevant, familiar, or otherwise important to the observer (e.g., Francolini & Egeth, 1979; Posner, 1980; Yantis & Egeth, 1999). Thus spatial shifts of attention have been thought to occur in two ways: either as an involuntary response to a salient stimulus (attentional capture) or as a voluntary process determined by the observer's goals. The fact that attention may
be summoned involuntarily and automatically has been much studied with the goal of determining what kinds of stimuli can generate such shifts. To date, a variety of experiments reveal that abrupt stimulus onsets (such as new objects) will generate exogenous spatial shifts (e.g., Jonides & Yantis, 1988; Muller & Rabbitt, 1989; Remington, et al., 1992; Posner & Cohen, 1984; Yantis & Hillstrom, 1994; Yantis & Jonides, 1990). A crucial question in the area of attentional allocation is whether salient features other than abrupt onsets can result in attentional capture. Jonides and Yantis (1988) examined whether color and intensity discontinuities, in addition to abrupt onsets, led to attentional capture and concluded that onsets were unique in this regard. Subsequently, some observations have cast doubt on the generality of that conclusion. For one thing, Yantis and Egeth (1999) found circumstances in which large or bright stimuli captured attention while moving or differently colored stimuli did not. More generally, Folk, Remington, and Johnston (1992) obtained evidence that suggested the deployment of attention depends critically on what the subject is set for. In a cuing paradigm they independently manipulated the nature of the target and the nature of a preceding cue. Subjects might have to locate the target on the basis of a color discrepancy (it was the only red element among three white elements), or they might have to locate it on the basis that it was the only element with a sudden onset (it was the only element in the display). Similarly, the cue might consist of one red element among three white elements, or it might be the only sudden onset element (again, it was alone in the display). The most important finding was that when cue and target were of the same type (i.e., both sudden onsets or both color singletons), subjects could not ignore the cue even when it was known to be 100% invalid.
A Challenge to the Contingent Capture Hypothesis
In a paper by Joseph and Optican (1996), the authors suggested that attention can be captured by an orientation discontinuity within a cue stimulus. What is particularly provocative about this finding is that there was no obvious relationship between the target and cue, and thus application of the contingent involuntary capture hypothesis seems strained. Joseph and Optican (1996) used a cuing design to assess the degree to which oriented bars captured attention. In Experiments 1 and 2, subjects were presented with a brief cue display consisting of a dense array of vertical (or horizontal) bars. One of the bars in the array was oriented differently (e.g., it was horizontal while the others were vertical) creating a "pop-out" stimulus. In order to assess the degree to which this stimulus captured attention, a subsequent target display was presented containing a similar array composed entirely of Ts except for a single target L; subjects were instructed to find the L and report its location. (Note that the Ts and Ls in these experiments were not
randomly rotated.) Cue and target stimuli only appeared in one of four locations in the display: the center element in each of the four quadrants of the array. The assumption was that if the cue led to a shift of attention to its location, it would result in improved accuracy for a target that subsequently appeared at the same location (i.e., a valid cue). The cue, however, was not informative; when a cue was presented, it occurred equally often at the target location (valid) and the three other locations (invalid). On one-fifth of the trials all of the bars were of a uniform orientation; these were uncued trials. Figure 1 illustrates the stimuli and time course of a typical trial.
[Figure 1 diagram: successive displays in a trial, in order: fixation (+), cue, probe, mask.]
Figure 1. Overview of the sequence of trial events in experiments by Joseph and Optican (1996) and Hendel (1998/1999). In Joseph and Optican the fixation cross appeared for 500 ms and remained throughout the trial. Cue durations were varied over a range from 50 to 800 ms (across Experiments 1 and 2), and the probe duration was 50 ms. Event durations for Hendel's experiments are given in the text. In Experiment 1 of Joseph and Optican an array of adapter squares was shown prior to the cue display, but this had no effect on the pattern of results.
In these experiments, Joseph and Optican (1996) found that subjects' accuracy in locating the target was considerably greater when the target occurred at a location previously occupied by an orientation difference than when it appeared where there had been no orientation difference. Given this validity effect, they concluded that "orientation differences can cause shifts of visual attention,
regardless of whether they convey any information relevant to the task at hand" (p. 661). While it is certainly possible that the orientation cues used in their experiment resulted in shifts of attention, Joseph and Optican (1996) further suggested that these attentional shifts are involuntary, claiming that their experiments "...might be said to be probing bottom-up processing, which includes both the conventional notion of preattentive vision and the involuntary attentional shifts that can follow from it" (p. 661). A similar conclusion from a study using a rather different paradigm is due to Theeuwes (1992; see also Mounts, 2000; Turatto & Galfano, 2001). Theeuwes (1992) proposed that early attentional allocation is mediated entirely by stimulus-driven characteristics of the display. In several experiments using a visual search design, he demonstrated that observers were slower to respond to a target when there was a salient singleton distractor in the display than when no singleton distractor was present. He argued that slower response in the presence of a distractor occurred because observers' attention was captured by the salient distractor (Theeuwes, 1991, 1992, 1994, 1996). In fact, he claimed that goal-directed selection was not possible during early preattentive visual search (Theeuwes, 1992). Another possibility is that subjects employ two different strategies depending on circumstances. Thus, subjects might rely on a strategy to look for the target-defining feature, as suggested by Folk et al. (1992), when the target is defined by a single simple feature such as "red" or "vertical". However, subjects might also use the strategy of trying to find a salient singleton, or odd-man-out, when there is no single feature that defines the target or when the target-defining feature does not easily segregate from the nontarget elements (e.g., L among Ts).
The idea is that "singleton" itself can be the defining-feature of a target and observers can choose to search for a salient singleton (Bacon & Egeth, 1994; Pashler, 1988). Using a visual search experiment identical to that of Theeuwes (1991, 1992), Bacon and Egeth (1994) demonstrated that when observers searched for a target that was always a singleton, other singleton distractors impaired performance. Significantly, when observers searched for a target that was not always a singleton, a singleton distractor did not impair performance. In one experiment, the number of targets present on each trial varied from 1 to 3. When multiple targets were present they were not true singletons, which should have discouraged observers from setting their attention for singletons. In this condition, even when there was only a single target present on a given trial, the presence of a singleton distractor did not lead to slower responses. Bacon and Egeth (1994) took this as evidence that subjects could adopt a "singleton detection mode" strategy when appropriate (see also Pashler, 1988). In several other experiments, Bacon and Egeth demonstrated that observers relied heavily on singleton search when it was an efficient strategy. However, when singleton detection mode was inefficient, subjects used a feature search strategy.
New Evidence of Top-Down Control in Visual Search
It is arguable that the capture of attention found by Joseph and Optican (1996) was due to the voluntary adoption of singleton detection mode; after all, their target was the sole L in a field of canonically oriented Ts. However, it is not clear a priori whether this discrimination is sufficiently salient to invoke "singleton detection mode." One piece of evidence is that Egeth and Dagenbach (1991) found that two upright characters (T/L) could be searched in parallel, but the conditions of that study are very different from those of Joseph and Optican, especially in regard to the number of elements in the display (2 vs. 48). To explore whether an orientation singleton captures attention independently of the subject's search task, Hendel (1998/1999) conducted an extensive series of experiments, which will be summarized briefly here. Hendel's first experiment was essentially a replication of Joseph and Optican's Exp. 2. She found that subjects were more accurate in locating a target when it occurred at the cued location. However, unlike Joseph and Optican, she also found evidence for a very large response bias effect, which rendered the cuing effect of dubious validity. When subjects were incorrect they tended to select the cued location much more frequently than would be expected on the basis of chance. In her second experiment she changed the paradigm to eliminate the opportunity for response bias. Now on each trial the L was presented either in its normal orientation or in its left-right mirror image orientation; subjects had to identify the orientation of the L rather than locate it. Here too she found a validity effect, as had been found by Joseph and Optican.
In her next experiments she addressed the theoretical issue of whether the effect was due to bottom-up salience, as suggested by Joseph and Optican, or the adoption of a goal-directed process motivated by the subject's knowledge of target-relevant information. Such is the suggestion of Folk et al.'s (1992) contingent capture hypothesis. A crucial test of the salience hypothesis is to determine whether the same (orientation) cue that gave evidence of capture when the target was an L in a field of Ts would also give evidence of capture when the target task is changed. Conversely, if the cue captures attention because of some feature similarity between the cue and the target, then a different target task, in which the target-defining feature is not the same as the cue-defining feature, should eliminate attentional shifts to the cue (Folk et al., 1992). In Hendel's Experiment 3a the target was defined in terms of color; it was the red element in a field of white elements. Specifically, the target display was a 7 x 7 matrix of Ls, half normal, and half mirror-image reversed. One of the Ls was red, and the remainder were white. The subject's task was to indicate the orientation of the red L. Two cue conditions were used. One was the orientation cue used earlier; all of the bars were horizontal (vertical), except for one
Egeth, Folk, Leber, Nakama and Hendel
which was vertical (horizontal). The other was a color cue the same color as the target; all of the bars were of the same orientation and one of them was red. As before, cue location and target location were independent, and one-fifth of the trials were uncued (i.e., no singleton was present in the cue display). The cue and target displays were each displayed for 100 ms with no interval between them. Thirteen subjects each received 320 color-cued trials and 320 orientation-cued trials, randomly mixed.
Recall that there were four possible locations where the cues and targets could appear. Consider, for example, the situation where the target appears in the upper left location. If the cue on that trial also appears in the upper left, it is a valid trial. There are three kinds of invalid trial, which are characterized with respect to the relative locations of the cue and target. If the cue appears in the upper right this is an invalid horizontal trial (i.e., the cue and target locations differ by only a horizontal translation). Similarly, if the cue appears in the lower left location this is an invalid vertical trial. If the cue appears in the lower right this is an invalid diagonal trial. If the cue display contained no singleton it is an uncued trial.
To determine whether the validity effect differed between the two cue types, percentages of correct responses were entered in a 2 x 5 repeated measures ANOVA. This analysis revealed a significant main effect of validity, F(4, 48) = 11.64, p < .0001, and a significant interaction between validity and cue type, F(4, 48) = 6.72, p < .001; however, there was no main effect of cue type itself, F(1, 12) = 2.07, p > .17. As is evident from Figure 2a, the effect of validity is due predominantly to the color cue. In the color-cue condition, each cue validity was compared to the null cue.
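The five validity conditions defined above (valid, three kinds of invalid, and uncued) can be illustrated with a short sketch. It assumes that the four possible locations are labeled by (row, column) in a 2 x 2 arrangement; the function name `classify_trial` and the labels are hypothetical, introduced only for illustration, and are not taken from Hendel's materials.

```python
from typing import Optional, Tuple

# (row, column); row 0 = upper, column 0 = left (assumed labeling)
Location = Tuple[int, int]

def classify_trial(cue: Optional[Location], target: Location) -> str:
    """Return the validity condition for one trial.

    `cue` is None when the cue display contained no singleton.
    """
    if cue is None:
        return "uncued"
    same_row = cue[0] == target[0]
    same_col = cue[1] == target[1]
    if same_row and same_col:
        return "valid"
    if same_row:
        return "invalid horizontal"  # cue and target differ only left/right
    if same_col:
        return "invalid vertical"    # cue and target differ only up/down
    return "invalid diagonal"

upper_left, upper_right = (0, 0), (0, 1)
lower_left, lower_right = (1, 0), (1, 1)
print(classify_trial(upper_right, upper_left))  # invalid horizontal
```

With the target in the upper left, as in the example in the text, the three remaining cue locations map onto the horizontal, vertical, and diagonal invalid conditions.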
These comparisons confirm that the valid cue led to greater accuracy than the null cue, F(1, 12) = 8.53, p < .05, and the invalid cues led to lower accuracy than the null cue: horizontal, F(1, 12) = 10.53, p < .01, vertical, F(1, 12) = 8.96, p < .05, and diagonal, F(1, 12) = 22.66, p < .001. Similar comparisons were made for validity in the orientation-cue condition. There were no reliable differences found between the null cue and any of the other validity conditions: null cue and valid cue, F(1, 12) = 2.19, p > .16, null cue and horizontal, F(1, 12) = .157, p > .69, null cue and vertical, F(1, 12) = .001, p > .97, null cue and diagonal, F(1, 12) = .001, p > .97. In this experiment in which the target was defined by a color discrepancy, the orientation cue did not lead to a reliable shift of attention, calling into question the generality of the effect reported by Joseph and Optican. However, the color cue did lead to a reliable shift of attention. This pattern is consistent with the contingent capture hypothesis and suggests that the shifts of attention found in this experiment may be due to the similarity between cue and target. This finding prompted reexamination of why the original task (identifying an L among Ts) showed a validity effect when an orientation cue was used. One possibility is that subjects shifted
Figure 2. Mean percentage of correct determinations of target orientation for the various levels of cue type (color and orientation) and cue-target validity in Experiment 3 of Hendel (1998/1999). See text for a full description of the validity conditions. Experiment 3a (red target among white nontargets) is represented in the top panel and 3b (L or backwards L target among nontarget Ts) in the lower panel.
attention to the orientation cue because they were searching for singletons. If this were the case, then we might expect that both color cues and orientation cues would capture attention in the L among Ts task. In Hendel's Experiment 3b the target was a forwards or backwards L among upright Ts. Cues were the same as in Exp. 3a, consisting of either an orientation or a color singleton (as before, one-fifth of the trials were uncued). In all other respects the experiment was identical to the previous one. Thirteen subjects participated. The results are shown in Figure 2b. Analysis of variance revealed a significant main effect of validity, F(4, 48) = 19.18, p < .0001, but neither cue type, F(1, 48) = .06, p > .80, nor the interaction between cue type and validity, F(4, 48) = 1.88, p > .12, was significant. In the absence of both a main effect of cue type and an interaction between cue type and validity, contrasts were analyzed collapsed across the two cue types. The increase in accuracy on valid as compared to null trials was significant, F(1, 12) = 26.11, p < .001, as was the decrease for horizontal trials relative to the null cue, F(1, 12) = 7.05, p < .05. However, the differences between the vertical and null cues and the diagonal and null cues did not reach significance, F(1, 12) = 3.06, p > .10, and F(1, 12) = 1.63, p > .22, respectively. In Hendel's Experiment 3b, both the red cue and the orientation cue led to shifts of attention. This result is expected on the contingent capture model assuming subjects are in singleton detection mode. It is also expected on the salience model, if we can assume that both the color and the orientation cue displays contain a salient discontinuity. However, the results of Hendel's Experiment 3a suggest that salience alone cannot explain the observed pattern of validity effects.
In that study, significant validity effects were obtained only when the target-defining feature and the cue-defining feature were the same (red), suggesting that observers maintained a set for red stimuli. Taken as a whole, Hendel's Experiments 3a and 3b are more consistent with the contingent capture hypothesis than the salience hypothesis for the following reasons. As we have argued, the salience hypothesis predicts that subjects will shift their attention to the most salient element in the cue display. In each of these experiments, on any given cue display, there is only one salient difference: the cue. Consequently, it was expected that a shift of attention to the most salient element would always be to the cue, leading to a cue-validity effect for both cues in Experiments 3a and 3b. As no validity effect was observed with the orientation cue in Experiment 3a, the predicted outcome for the salience hypothesis was not obtained.
Nonetheless, a different interpretation of these data may be entertained. Theeuwes (1994) argued that the typical brief presentation of stimuli in the cueing paradigm may lead subjects to integrate the cue and the target displays. Thus attention would be likely to go to the most salient element in the entire trial sequence. If this were the case, then the salience hypothesis could be interpreted to predict that attention would be likely to shift to the target, rather than the cue, if the target were more salient than the cue. In Experiment 3a, it is possible, then, that the target (a red L among white Ls) was more salient than the oriented bar cue presented on half of the trials. The red bar cue, on the other hand, may have been even more salient than the red target, leading to the validity effects observed for the red cue. Note that this interpretation requires that we assume that the red bar cue was the most salient element in the trial sequence, followed by the red target L. The orientation cue is assumed to be less salient than the red cue and the red target. This interpretation would be ruled out if one could find a target task in which the orientation cue leads to attentional capture but the color cue does not. This was the rationale for Hendel's Experiment 4.
In Experiment 4 the cue types used previously (a color singleton and an orientation singleton) were used in conjunction with a new target task in which 24 subjects were set for a stimulus of a particular orientation. Subjects had to detect the presence of a Q-like character among Os. The "stem" of the Q was a horizontal bar on either the left or the right of the target. Subjects were asked to report whether the stem was on the left or right side of the character. The rationale was that this task would lead subjects to set their attention for horizontal bars. The experiment was similar to the preceding ones except that the orientation cue was now always a matrix of vertical bars with a single horizontal bar; as before the color cue consisted of a single red vertical bar among white vertical bars. The results are shown in Figure 3. The crucial result is that contrast analysis showed that for color cues there were no reliable differences between the null cue and any of the other cues (all p values > .15). However, the orientation cue led to a reliable improvement in the valid cue condition compared to the null cue condition (p < .01), and to reliable decrements for the three invalid cue conditions relative to the null condition (all p values < .01). This study confirmed that attention was most likely to be drawn to cues that matched the defining features of the target.
Hendel went on to discuss an alternative interpretation of such results proposed by Theeuwes (1994). According to Theeuwes, disengagement of attention from a cue that contains a target-defining feature may take longer than disengagement of attention from a cue that does not contain a target-defining feature. Thus cue-validity effects are not observed in the latter case because recovery (i.e., disengagement) occurs sufficiently quickly so that attention is available to shift to the target location. Hendel provides a detailed rebuttal of this argument, but its details are beyond the scope of the present paper.
The series of studies by Hendel provides converging evidence for the existence of some top-down control in visual search. Her data show, using an accuracy measure, that the relationship between target-defining features and
Figure 3. Mean percentage of correct determinations of target orientation for the various levels of cue type and cue-target validity in Experiment 4 (Q-like target among nontarget Os) of Hendel (1998/1999). This figure represents combined data for the 13 subjects who participated in her Experiment 4a and the 11 who served in her Experiment 4b.
distractor features is crucial to eliciting involuntary shifts of attention. In addition, these studies have succeeded in uncovering a specific task in which salient cues do reliably capture attention (Exp. 3b). This indicates that when subjects are in singleton detection mode, the salience of the cue is critical.
The RSVP Paradigm and the Attentional Blink
Before turning to our research on the applicability of the contingent capture hypothesis to stimuli appearing in rapidly presented streams, we discuss some research that speaks to the nature of the capacity demands of identifying targets in such streams. Part of the appeal of the rapid serial visual presentation (RSVP) paradigm is that it seems subjectively to be very attentionally demanding. A typical RSVP task is to present a stream of letters to a subject that is all black except for one that is green. The subject has to name the green letter. At a rate of 10 letters per second the task feels difficult. Overall accuracy rate will be high, but distinctly less than perfect, say 85-90% correct. Moreover, if subjects have to identify a second element in the stream following identification of the green letter, they may perform very poorly for a period of up to a half second or more, a refractory period that has been dubbed the attentional blink (e.g., Broadbent & Broadbent, 1987; Raymond, Shapiro, & Arnell, 1992). An interesting example of an attentional blink appears in a paper by Joseph, Chun, and Nakayama (1997). They presented an RSVP stream of black letters at fixation. There were two targets. The first was a green letter embedded in
the stream; this was followed after a variable interval by the second target, which was a ring of Gabor patches surrounding (at 5.3 deg eccentricity) one of the later letters in the stream. Subjects had to name the colored letter and indicate if the patches were all oriented in the same direction or if one was misoriented by 90 deg. In a control condition subjects could ignore the RSVP stream and just indicate if the ring of Gabor patches did or did not contain an orientation oddball. Performance in the control condition was in excess of 90% correct and was independent of the lag between the green letter and the Gabor patches. In the experimental condition performance was poor (about 60% correct detections) when the ring of Gabor patches was simultaneous with the green target letter (i.e., lag=0), and improved monotonically to nearly 90% correct when the lag between the target letter and the Gabors was 700 ms. This monotonic improvement over time is one of the standard forms of the attentional blink, with recovery taking over half a second (cf. Visser, Bischof, & DiLollo, 1999). What makes this study so interesting is the implications it may have for the study of visual attention. Specifically, Joseph et al. (1997) pointed out that this result is something of an embarrassment for the notion of preattention. Detection of an orientation singleton is usually thought to be preattentive (e.g., Sagi & Julesz, 1985; Treisman & Gormican, 1988). Why, then, should there be such a huge dual-task decrement? Indeed, Braun and Sagi (1990; 1991) have conducted conceptually similar experiments and found seemingly contrary results. In their studies, identification of a single element at fixation was used as the central primary task. This target element was followed by a masking stimulus, thus this situation is equivalent to the case in which the number of letters in an RSVP stream is two. 
No decrement was found in the detection of an orientation singleton in a field of elements surrounding fixation. The difference between the two studies may result in part from the length of the RSVP stream. There are several possible reasons why the attention demand of RSVP may be greater when the RSVP stream contains more letters. To cite just one, it may become more difficult to identify just which letter is the colored target when there are more opportunities for falsely conjoining color and identity. There are, of course, numerous differences between the Joseph et al. (1998) and the Braun and Sagi (1990, 1991) studies, and so a direct test is required to evaluate our suspicion that stream length may be important. In particular, Braun (1998) has observed that naive subjects were used in the Joseph et al. experiments, whereas highly trained psychophysical observers were used by Braun and Sagi. Braun (1998) has shown that the RSVP version of the experiment also yields no attentional decrement when sufficiently experienced observers are tested. The question remains, however: When an attentional blink is found, what determines its magnitude?
The role of stream length in the attentional blink
Egeth and Nakama (1999) explored the role of stream length in naive subjects. The central RSVP string consisted of a stream of letters presented at fixation on a gray screen. Each letter was presented for 67 ms; there was an 84 ms interval between successive letters. Thus, letters were presented at the rate of approximately 6.6 per sec. One major independent variable was the length of the RSVP string. In one condition that approximates the typical attentional blink paradigm, the white target letter appeared in the middle of a 19-letter sequence; the other 18 letters were all green. In another condition, the white target letter appeared as the first of two letters, the second being green; i.e., this was a sequence of length two. The two stream lengths were not randomly mixed, but were tested in separate conditions. The second major independent variable was the temporal lag (SOA) between the white target letter and the display of line segments; these lags were 0, 200, and 500 ms and varied randomly from trial to trial. The stimulus for the singleton detection task consisted of a ring of eight bars centered at fixation. The outside diameter of this array was 12 deg of visual angle, and each bar was 1.2 deg in length. On half of the trials all of the bars were oriented in the same direction (either +20 or -20 deg from vertical); on the other half of the trials one of the bars was oriented +20 while the others were at -20 deg, or vice versa. Each array of oriented bars was displayed for 150 ms and was followed by a masking field. Note that the line segments never preceded the white letter. The 0 ms lag is typical of the Braun and Sagi experiments (e.g., 1990); the remaining two lags permit us to sketch out the time course of any attentional blink that may be generated in our experimental conditions.
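The presentation rate quoted above follows directly from the stated letter and blank durations. The sketch below recomputes it; the helper name `items_per_second` is hypothetical, and the only inputs are the 67 ms letter duration and the 84 ms inter-letter interval given in the text.

```python
def items_per_second(on_ms: float, blank_ms: float) -> float:
    """Presentation rate given item duration and blank interval, both in ms."""
    soa_ms = on_ms + blank_ms  # onset-to-onset time between successive letters
    return 1000.0 / soa_ms

rate = items_per_second(67, 84)
print(round(rate, 1))  # 6.6
```

The onset-to-onset time is 151 ms per letter, which gives the rate of approximately 6.6 letters per second reported in the text.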
Finally, half of the subjects were assigned to the dual task version of the experiment, while the other half were instructed to ignore the letter stream and just perform the singleton detection task. In the dual task version the importance of identifying the target within the RSVP stream was emphasized. The results are shown in Figure 4. They indicate two important points. For length=19 (the right panel), there was a significant attentional blink; attending to a 19-item stream severely hampered detection of a misoriented bar in the periphery. For length=2 (left panel) there was relatively little difference between the RSVP-attended and RSVP-ignored condition. (In another experiment in which stream length was 1 rather than 2, there was literally no attentional blink. That is, there was no difference between the RSVP-attended and RSVP-ignored conditions.) Interestingly, these results tend to support important features of both the Braun and Sagi (1990, 1991) studies and the Joseph, Chun, and Nakayama (1997) study; attending to a 2-letter RSVP "stream" has a small effect on singleton detection in the periphery, whereas attending to a 19-item stream has a large effect. Second, note that for the 2-item stream there is a decline in performance for SOA=0, even when the RSVP task is unattended. (This was also the case for the
[Figure 4 plot. Panels: (a) Length = 2; (b) Length = 19. Horizontal axis: Lag (ms). Legend: RSVP ignored; RSVP attended.]
Figure 4. Mean percentage correct detections of an orientation singleton as a function of the delay (SOA) between presentation of white target letter in foveal RSVP stream and presentation of display of oriented bars. Filled circles show performance for the RSVP-ignored control condition, open circles show performance for the RSVP-attended dual task. Upper panel represents data for the condition in which a white target letter is followed by a single green nontarget letter. Lower panel represents data for the condition in which a white target letter appears in the middle of a 19-letter stream (the other 18 letters being green).
1-item stream.) This struck us as a surprising finding. It is known that alphanumeric characters may be processed to the point of identification even when subjects do not intend to do so. This is the basis of the Stroop effect among other phenomena. Teichner and Krebs (1974) argued that even simple RT tasks may be subject to "compulsive encoding" effects when alphanumeric stimuli are used. With this in mind we simplified the preceding experiment by testing only RSVP-ignored trials, and using a 1-element "stream" that consisted of a small filled white square presented at fixation. The results were striking: at SOA = 0 performance was 85% correct; at SOA = 150 and 600 ms performance was 95% and 94%, respectively. The effect is not due to the subject being alerted to the upcoming line discrimination task by the onset of the white square. When the white square was replaced by a brief tone, accuracy was independent of SOA at about 90% correct. The decrement at SOA=0 most likely does not reflect simple masking, as the separation between the central target element and the peripheral bars was greater than 5 deg (e.g., Breitmeyer, 1984). The effect is reminiscent of the interfering
effect of strong transients (or "mudsplashes") found in the literature on change blindness (e.g., O'Regan, Rensink, & Clark, 1999). However, when strong transients have interfered with the perception of other items they have typically been presented in advance of those other items; that was not the case here, as the SOA was zero. It may be, then, that the effect is an instance of cognitive masking, like the filtering cost discussed by Treisman, Kahneman, and Burkell (1983). With respect to the focus of the present chapter, the point of the first set of experiments is that the attentional blink paradigm, with its rapidly streaming alphanumeric characters, is attentionally demanding. A 2-element central stream (i.e., a target character followed by a mask) as used by Braun and Sagi among many others, may not be sufficiently demanding to interfere with detection of a peripheral orientation singleton, but a lengthy central stream apparently is. However, we do not know just what it is about the attentional blink paradigm that results in the performance deficit on the second target. Presumably, it is the attentional demands of having to identify the first target that makes it difficult to detect or identify the second target (see, e.g., models proposed by Chun & Potter, 1995 and Jolicoeur, 1998).
An attentional blink at negative SOA?
At this point we reflected on the fact that Joseph, Chun and Nakayama (1997), following standard practice in the field, had explored only positive lags, with the presumably attention demanding letter identification task coming first. What would happen if the supposedly preattentive task came first? Using the same paradigm described earlier, Nakama and Egeth (1999) tested both positive and negative lags. When letter identification came first the results were much like those of Joseph, Chun, and Nakayama although in our study recovery wasn't complete even by 600 ms. When the orientation discrimination came first we also found a large decrement in the ability to detect an orientation singleton. This suggests that the dual-task decrement was not caused solely by identification of the green letter; much of the deficit may be caused by the need to monitor the stream for the target letter. This was confirmed in our next study in which we omitted the green letter on some trials. On these trials performance stayed low throughout the trial (see Figure 5). Again, the importance of identifying the target in the RSVP stream was emphasized. The main point we draw from this is that, regardless of whether it is the identification of the target letter itself or simply preparation for the identification that is the basis of the effect, the RSVP task is indeed attentionally demanding. As a secondary point, we speculate that the need to monitor a rapid stream of stimuli at the fovea may both center attention at that location and narrow it to the approximate size of the letters that appear there (e.g., LaBerge, 1983), at least when the RSVP
[Figure 5 plot. Horizontal axis: SOA (ms), from -800 to 800. Legend: RSVP ignored; RSVP attended, target-present; RSVP attended, target-absent.]
Figure 5. Mean percentage correct detections of an orientation singleton as a function of the delay (SOA) between presentation of white target letter in foveal RSVP stream and presentation of display of oriented bars. Negative SOA means bar display preceded target letter. Open circles show data for standard dual task condition. Filled circles show data for condition in which the RSVP stream was all green and thus did not contain a target letter. Triangles show data for a control condition in which subjects are instructed to ignore the RSVP stream.
task is given high priority. We turn next to an exploration of whether peripheral stimuli may capture attention when attention is narrowly focused by a foveal RSVP task.
Attentional Capture in the RSVP paradigm
We earlier described the contingent involuntary orienting hypothesis that assumes attentional control settings are a function of the behavioral goals of the observer, such as searching for a red letter. Another form of top-down control over attentional capture involves the degree to which attention is spatially focused prior to the presentation of a salient irrelevant stimulus. Yantis and Jonides (1990) found that when the location of a target letter in a visual search task was uncertain, the presence of an irrelevant abrupt onset letter produced evidence of attentional capture. However, when subjects were given a 100% valid precue regarding the subsequent location of a target, capture effects were eliminated. Similar results were reported by Theeuwes (1991). Such findings support the widely held belief that when spatial attention is in a highly focused state, salient stimuli (such as abrupt
onsets) are no longer capable of capturing spatial attention. A potentially important aspect of these studies, however, is that the use of a 100% valid spatial precue not only eliminated uncertainty about the target location, it also eliminated any uncertainty about which object in the display was the target. That is, on any given trial, only one object ever occurred at the cued location. One could imagine a situation in which the target location is known, but multiple objects appear at that location, producing uncertainty with regard to which object is the target. This is precisely the situation in an RSVP paradigm in which all of the letters appear at fixation. Although all of the letters in the stream are spatially attended, there is still uncertainty about which is the target. This uncertainty is resolved through an additional act of non-spatial selection based on properties such as color or shape (e.g., report the white letter in the stream). An important question then, is whether the elimination of attentional capture by events outside the focus of attention still holds when there remains uncertainty about which object within the focus of attention is the target. In other words, is the spatial focusing of attention sufficient to override attentional capture? It is likely that the non-spatial selection of a target involves the establishment of attentional control settings for the defining property of the target. For example, determining the identity of a red letter in a sequence of white letters would presumably require an attentional control setting for the color red. An interesting issue concerns the extent to which this attentional control setting for nonspatial selection would influence the allocation of attention in space. Specifically, what effect would an irrelevant distractor have if it appeared outside the focus of spatial attention, but matched the attentional control setting (e.g., "red") for nonspatial selection from a temporal sequence? 
On the one hand, if the focusing of spatial attention is sufficient to eliminate capture, then peripheral events should not interfere with the identification of targets at the focused location. On the other hand, if attentional control settings for non-spatial selection influence the allocation of spatial attention, then, consistent with the contingent involuntary orienting hypothesis, we might expect an irrelevant stimulus that matches the attentional control setting to capture attention even if the stimulus occurs outside the focus of attention. To address this issue, we used a variant of the rapid serial visual presentation (RSVP) paradigm. In our task, subjects were required to monitor a centrally presented stream of letters for a target letter of a particular color, and to report the identity of that letter. It was assumed that this task would require attention to be tightly focused at fixation. However, instead of an additional target in the stream, a task-irrelevant, peripheral distractor was presented at different temporal positions relative to the target. As the most interesting data derive from the conditions in which the peripheral distractor preceded the central target, one can think of that distractor as playing the role of T1 in the standard attentional blink
paradigm. However, it is important to keep in mind that there was no task associated with that stimulus; it was an irrelevant distractor. The critical manipulation was whether this distractor shared the color that defined the central target or not. We reasoned that if a peripheral distractor matching the color of the central target captures attention, then even under the focused attentional state required by the central stream, a decrement in the identification of the centrally presented target should obtain. In our initial study with peripheral distractors in an RSVP stream (Folk, Leber, & Egeth, submitted) subjects were shown a stream of 15 letters centered at fixation coming at a rate of about 12 characters per second (42 ms on, 42 ms blank ISI). The letters were all gray except for one that was colored (see Figure 6). The colored target character appeared equally often in positions 8-12 of the sequence. For 17 of the subjects the one colored letter was always red; for the remaining 16 subjects it was green. The task was to name the colored letter. On trials containing a distractor, one of the letters in the series was surrounded by four #'s whose inner edges were 5.2 deg above, below, left, and right of the center of the letter. Depending on the distractor condition, the #'s were either all gray, or three were gray and one was red or green. Subjects received four different distractor conditions that occurred randomly and equally frequently within blocks. In the no-distractor condition each letter in the RSVP stream appeared alone at the center of its frame. In the other three conditions, on one frame four #'s appeared along with the central letter. In the four-gray distractor condition the #'s were all gray. In the same-color distractor condition, one frame contained three gray #'s plus one that was the same color as
[Figure 6 appears here: schematic of the trial sequence, with labels for fixation, RSVP stream onset, distractor, ISI, and target.]
Figure 6. Representation of stimuli and sequence of events on a trial with a distractor-target lag of 2. The characters printed in black were actually red or green (see text for details). In the RSVP stream blank frames were inserted between successive letters. The durations of the letter and blank displays for each experiment are given in the text.
the target letter. In the different-color distractor condition one frame contained three gray #'s and one # that was different in color from the target. (If the subject was searching for a red target, the distractor would be green, and vice versa.) Distractors could appear at any of four temporal lags with respect to the target. The target could appear two frames after the distractor (lag 2), one frame after the distractor (lag 1), simultaneous with the distractor (lag 0), or one frame before the distractor (lag -1). For the purpose of analysis, each trial in the no-distractor condition was assigned a lag value, but the distractor was omitted from the sequence. Subjects were fully informed about the nature of the trials and were urged to ignore the distractor, if possible. There were 24 practice trials and 320 experimental trials. The results are shown in Figure 7. The four-gray distractor condition yielded performance that was independent of lag and essentially identical to the no-distractor condition. In contrast, both the same- and different-color distractor conditions yielded substantial and equal interference that increased as lag varied from -1 to 2. Analysis of variance yielded significant main effects of distractor condition, F(3, 93) = 23.53, p < .0001, and distractor-target lag, F(3, 93) = 34.56,
[Figure 7 appears here: line plot of percent correct (y-axis, 30-100) against distractor-target lag (x-axis: -1, 0, 1, 2); legend: no distractors, all gray, 1 target-colored, 1 non-target-colored.]
Figure 7. Mean percentage of correct target identification as a function of distractor condition and distractor-target lag. For some subjects targets were red, for others green. The four distractor conditions were: (1) no distractors; (2) four gray number signs (#s); (3) one target-colored number sign and three gray number signs; (4) one number sign colored differently from the target (e.g., green if the target was red) and three gray number signs. In this study all nontarget stream letters were gray. Lag was dummy coded for the no-distractor condition; i.e., each of these trials was assigned one of the lag values arbitrarily.
p < .0001, and a significant distractor condition by distractor-target lag interaction, F(9, 279) = 14.25, p < .0001. The fact that interference was no greater for the four-gray distractor condition than for the no-distractor condition indicates that the disruption in the two conditions with colored distractors was not due to the mere presence of abrupt onsets in the periphery. It is interesting to compare this aspect of the data with the results of Egeth and Nakama (1999) described earlier. There, it appeared that sudden onset at the fovea disrupts peripheral processing. There are many differences between the studies, but the discrepancy in outcomes is provocative.
Feature search versus singleton detection
The results make it clear that top-down attentional control settings influence the allocation of attention. Although the spatial location of the target was known with certainty, selection of the target required an attentional set for color. When the distractor matched this attentional set it disrupted performance even though its presence and location were irrelevant to the task at hand. In this connection, the fact that the interference was the same in the same- and different-color distractor conditions is intriguing. Keep in mind that a given subject only saw one of these colors as the target. The fact that both colors were equally effective as distractors suggests that subjects may have been operating in what Bacon and Egeth (1994; see also Pashler, 1988) have referred to as singleton detection mode. That is, despite the fact that subjects were supposedly looking for just a specific color, it appears that they may have been doing something more like looking for any non-gray item to name. The non-gray distractor was irrelevant to the task both by dint of its shape and its location, but nevertheless subjects could not effortlessly ignore it.
An alternative possibility is that the disruption in performance observed in this task does not have anything to do with attentional control settings, but instead reflects capture of attention by any color discrepancy in the periphery (i.e., by a spatial singleton). One way to distinguish these accounts is to set subjects to look for a letter of a particular color in the stream, but to use several different colors in the stream. With a heterogeneous stream, singleton detection mode should not permit target acquisition. In this circumstance, the same-color distractor should produce a performance decrement, but the different-color distractor should not. In our next experiment, we again tested at lags of -1, 0, 1, and 2, but now the RSVP stream consisted of variously colored items.
For subjects searching for red targets, the other colors in the stream were green, blue, purple, and gray. For subjects searching for a green target the other colors in the stream were red, blue, purple, and gray. The task was somewhat more difficult, and so the duration of letters was increased from 42 to 56 ms for a total frame duration of 98 ms. Note that
the distractor conditions were the same as in the preceding experiment. The results are shown in Figure 8. Analysis of variance showed that the main effects of distractor condition and distractor-target lag were significant, F(3, 90) = 51.80, p < .0001, and F(3, 90) = 37.10, p < .0001, respectively, as was the distractor by lag interaction, F(9, 279) = 27.33, p < .0001. At lag 1 a Tukey test showed that only the same-color distractor differed from the no-distractor condition, while at lag 2 all three distractor conditions differed significantly from the no-distractor condition. However, in contrast to the preceding experiment, the mean for the same-color distractor was significantly lower than the mean for the different-color distractor.
[Figure 8 appears here: line plot of percent correct (y-axis, 30-100) against distractor-target lag (x-axis: -1, 0, 1, 2); legend: no distractors, all gray, 1 target-colored, 1 non-target-colored.]
Figure 8. Mean percentage of correct target identification as a function of distractor condition and distractor-target lag. For description of distractor conditions see text or Figure 7 caption. In this study, letters of various colors were presented in the RSVP stream.
These results establish two important points. First, it would appear that the subjects in the preceding experiment had adopted an attentional control setting for any singleton, and not just the specific color of the target for which they were instructed to search. Second, the results also make it clear that, absent the adoption of singleton detection mode, attentional capture is not simply produced by any color discontinuity in the periphery. What these experiments do not tell us is why subjects adopt the strategy of looking for any singleton, especially since adoption of such a strategy leaves one susceptible to interference from salient distractors. This puzzle awaits the results of further research.
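The timing of the two RSVP experiments can be made concrete with a small sketch. The letter and blank durations are taken from the text; the 42 ms blank in the heterogeneous-stream experiment is inferred from the stated 98 ms total frame duration, and all function and variable names are illustrative assumptions rather than part of the original method.

```python
# Timing sketch for the two RSVP experiments described in the text.
# Durations come from the text; names are hypothetical.

def frame_soa_ms(letter_ms: int, blank_ms: int) -> int:
    """Onset-to-onset interval between successive stream letters."""
    return letter_ms + blank_ms


def distractor_to_target_ms(lag: int, soa_ms: int) -> int:
    """Time from distractor onset to target onset.

    Positive values mean the distractor precedes the target;
    lag -1 means the target came one frame before the distractor.
    """
    return lag * soa_ms


EXPERIMENTS = {
    "gray stream": frame_soa_ms(42, 42),      # 42 ms on, 42 ms blank
    "colored stream": frame_soa_ms(56, 42),   # 56 ms on, 98 ms total (blank inferred)
}
LAGS = (-1, 0, 1, 2)

for name, soa in EXPERIMENTS.items():
    rate = 1000 / soa  # letters per second
    offsets = {lag: distractor_to_target_ms(lag, soa) for lag in LAGS}
    print(f"{name}: SOA {soa} ms, {rate:.1f} letters/s, offsets (ms) {offsets}")
```

Under these assumptions, a lag of 2 corresponds to a distractor leading the target by roughly 170 to 200 ms, which is the range in which the largest decrements were reported.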
Spatial capture versus filtering cost
An implicit assumption underlying our experiments on capture by peripheral distractors in RSVP streams is that the deficits in detection of the central target are due to the involuntary orienting of spatial attention to an irrelevant location containing a color singleton. However, it is possible that the deficit is not spatial at all, and might even reflect a process as general as the "filtering cost" of Treisman, Kahneman, and Burkell (1983; see Folk & Remington, 1998, for evidence of both spatial and nonspatial forms of attentional capture). To address this issue the design of the preceding experiment, which used a heterogeneous RSVP stream, was changed in several ways. First, the four number-sign (#) distractors were replaced by four boxes. (As before, there were four gray; three gray and one red; or three gray and one green; there was also a no-box condition.) Second, in the frame immediately following the boxes, four gray letters appeared, each one at the center of the space previously occupied by a box. On each trial one of the four peripheral letters (the prime) was the same as the target letter for that trial. The position of the prime was varied systematically across trials. In the same-color and different-color distractor conditions the position of the prime varied such that it appeared at the location of the colored peripheral singleton box on an unpredictable 1/4 of the trials and at the location of one of the three gray non-singleton boxes on 3/4 of the trials. (Note that singleton vs. non-singleton status here refers to the color of the boxes.)
If spatial attention is drawn to the location of a same-color peripheral singleton, then the likelihood that the gray letter that follows immediately at that location will be identified should be increased, and, if that character is the prime, then we might expect that identification of the central target would be more likely (via perceptual priming, for example). Fifteen subjects searched for a red target, and fifteen searched for a green target. Only two lags were studied, 0 and 2. Note that lag refers here (as before) to the separation between the distractor boxes (not the gray peripheral letters) and the target.
Mean percentage of correct target identifications as a function of distractor condition, distractor-target lag, and prime status are presented in Figure 9. Note first that there is a substantial decrement in performance at lag 2 when the distractor box is the same color as the target letter. This decrement is consistent with the effect of distractors observed in the immediately preceding experiments. Analysis of variance showed that distractor condition and distractor-target lag both produced significant main effects, F(3, 84) = 39.65, p < .0001, and F(1, 28) = 6.93, p < .01, respectively. These two variables also entered into a significant interaction, F(3, 84) = 16.96, p < .0001. As is evident in the figure, this interaction is driven by a deficit in performance that is specific to lag 2 in the same-color distractor condition.
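The statement that the prime fell at the singleton location on an unpredictable 1/4 of the trials follows if singleton position and prime position are crossed and then shuffled. The sketch below illustrates that logic only; it is not the authors' trial-generation procedure, and the position labels, function name, and number of repeats are assumptions made for illustration.

```python
import itertools
import random

# Four peripheral locations (labels assumed, following the earlier # layout).
POSITIONS = ("above", "below", "left", "right")


def make_trials(repeats: int, seed: int = 0) -> list:
    """Cross singleton position with prime position, then shuffle the order."""
    trials = []
    for _ in range(repeats):
        for singleton_pos, prime_pos in itertools.product(POSITIONS, POSITIONS):
            trials.append({
                "singleton": singleton_pos,
                "prime": prime_pos,
                "prime_at_singleton": singleton_pos == prime_pos,
            })
    random.Random(seed).shuffle(trials)  # order is unpredictable to the subject
    return trials


trials = make_trials(repeats=5)
share = sum(t["prime_at_singleton"] for t in trials) / len(trials)
print(f"{len(trials)} trials, prime at singleton location on {share:.2f} of them")
```

Because the prime is no more likely to appear at the singleton location than at any other, a benefit for primes at that location cannot be attributed to a strategic incentive to attend there.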
Consider next the influence of the peripheral prime on central target identification. Prime status (i.e., whether the prime letter was at the location of a singleton or a nonsingleton distractor square) produced a significant main effect, F(1, 28) = 4.51, p < .05, and interacted significantly with distractor condition. Simple effects analyses of prime status at each distractor condition yielded a
[Figure 9 appears here: line plot of percent correct (y-axis, 30-100) against distractor-target lag (x-axis: 0, 2); distractor conditions: no distractors, all gray, 1 target-colored, 1 non-target-colored; separate lines for prime at singleton location and prime at non-singleton location.]
Figure 9. Mean percentage of correct target identification as a function of distractor condition and distractor-target lag and prime status. In this study peripheral boxes appeared in one frame, followed by a frame containing peripheral gray letters, one of which was identical to the target. The distractor conditions refer to the relation between the presence and color of the boxes and the color of the target. The novel finding here is that performance was significantly better when the prime letter was at a singleton as opposed to a non-singleton location in the condition in which there was a target-colored singleton distractor in the stream (denoted by triangles in the figure).
significant effect in the same-color condition only, F(1, 28) = 7.68, p < .01. In this condition, gray primes at the singleton location produced a significant enhancement in central target identification relative to trials on which primes appeared at a nonsingleton location. In an effort to obtain converging evidence for the spatial capture of attention by distractor squares that were the same color as the target, we examined error trials. On error trials, subjects report a letter other than the target letter in the central stream. If attention is shifted to the spatial location of the singleton, this should increase the likelihood that on error trials subjects will report the letter at the singleton location. The percentage of error trials on which subjects incorrectly reported the singleton letter instead of the central target letter is shown in Figure 10.
Figure 10. On error trials, mean percentage of erroneous reports of the letter appearing in the same location as a singleton distractor square as a function of distractor condition and distractor-target lag. In the four gray square condition, where there was no singleton, on each trial one randomly selected location was treated as if it had contained a color singleton. The other two distractor conditions represent cases where the singleton square was the same color as the target or a different color from the target.
(Dummy coding was used in this analysis; in the four gray square condition, where there was no singleton, on each trial one randomly selected location was treated as if it had contained a color singleton.) Note that there are substantially more errors of this type in the same-color condition than in either of the other two distractor conditions, and that this is particularly evident at lag 2. An ANOVA yielded
significant main effects of both distractor condition and lag, F(2, 56) = 7.45, p < .01, and F(1, 28) = 4.57, p < .05, respectively. Although the interaction between distractor condition and distractor-target lag just failed to reach significance, focused comparisons yielded a significant effect of lag in the same-color condition only, F(1, 28) = 11.73, p < .01.
General Discussion
The general thrust of our conclusions should already be clear. The contingent involuntary orienting hypothesis of Folk et al. (1992) is able to account for a wide variety of data. It was originally developed to account for data in cuing experiments in which reaction time was the chief dependent variable. Hendel's (1998/1999) work has shown that similar results can be obtained when stimuli are shown briefly and masked, with accuracy the dependent variable. Further, her studies suggest that an apparent exception to the hypothesis (Joseph & Optican, 1996) can be explained on the assumption that search for an L among Ts is carried out in singleton detection mode.
Folk, Leber, and Egeth (submitted) have subsequently extended the application of the contingent capture hypothesis to the realm of RSVP tasks. As several investigators have shown, the RSVP task is attentionally demanding. More specifically, Joseph, Chun, and Nakayama (1997) have shown that when attention is focused at the center of the field the seemingly preattentive task of detecting an orientation singleton in the periphery becomes difficult. Nakama and Egeth (1999) have shown that the task is demanding even when the first target is absent from the stream. This may challenge theories that claim the blink is due solely to processing of the first target, but it does nothing to detract from the idea that identifying a letter in an RSVP stream is an attentionally demanding task. These results set the stage for determining whether spatially focused attention can be disrupted by a peripheral distractor that shares the defining feature of a centrally displayed target. Such disruption was observed. The form of this disruption is much like that of the attentional blink, but with the role of the traditional first target played here by an irrelevant peripheral distractor. The conditions under which the disruption occurred follow closely what would be expected on the basis of the contingent capture hypothesis.
When the target was, say, a red letter in a stream of variously colored letters, only the red distractor interfered with performance. However, when the target was a red letter in a stream of gray letters, then both a red and a green distractor interfered with performance. This suggests subjects perform this latter task by adopting singleton detection mode. Leber and Egeth (2001) have begun to systematically explore what determines whether subjects adopt the feature-search or singleton-detection mode of processing.
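The contrast between the two search modes can be written as a toy decision rule. This is only a sketch of the verbal account summarized above, not a model proposed by the authors; the mode labels, function names, and the all-or-none treatment of capture are simplifying assumptions.

```python
GRAY = "gray"


def is_color_singleton(distractor_colors) -> bool:
    """True when exactly one peripheral item differs from gray."""
    return sum(color != GRAY for color in distractor_colors) == 1


def predicts_capture(mode: str, target_color: str, distractor_colors) -> bool:
    """Predicted capture by the peripheral display under a given search mode.

    singleton_detection: any color singleton matches the control setting.
    feature_search: only an item in the target-defining color matches.
    """
    if mode == "singleton_detection":
        return is_color_singleton(distractor_colors)
    if mode == "feature_search":
        return target_color in distractor_colors
    raise ValueError(f"unknown mode: {mode}")


CONDITIONS = {
    "four gray": [GRAY, GRAY, GRAY, GRAY],
    "same color": [GRAY, GRAY, GRAY, "red"],
    "different color": [GRAY, GRAY, GRAY, "green"],
}

for mode in ("singleton_detection", "feature_search"):
    for name, colors in CONDITIONS.items():
        print(mode, "|", name, "|", predicts_capture(mode, "red", colors))
```

On this rule, the gray-stream experiment (both colored distractors interfere) fits singleton detection mode, whereas the heterogeneous-stream experiment (mainly the target-colored distractor interferes) fits feature search.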
References
Bacon, W.F., & Egeth, H.E. (1994). Overriding stimulus-driven attentional capture. Perception & Psychophysics, 55, 485-496.
Braun, J. (1998). Vision and attention: the role of training. Nature, 393, 424-425.
Braun, J., & Sagi, D. (1990). Vision outside the focus of attention. Perception & Psychophysics, 48, 45-58.
Braun, J., & Sagi, D. (1991). Texture-based tasks are little affected by second tasks requiring peripheral or central attentive fixation. Perception, 20, 483-500.
Breitmeyer, B.G. (1984). Visual masking: An integrative approach. New York: Oxford University Press.
Broadbent, D.E., & Broadbent, M.H. (1987). From detection to identification: Response to multiple targets in rapid serial visual presentation. Perception & Psychophysics, 42, 105-113.
Chun, M.M., & Potter, M.C. (1995). A two-stage model for multiple target detection in rapid serial visual presentation. Journal of Experimental Psychology: Human Perception and Performance, 21, 109-127.
Egeth, H.E., & Dagenbach, D. (1991). Parallel versus serial processing in visual search: Further evidence from subadditive effects of visual quality. Journal of Experimental Psychology: Human Perception and Performance, 17(2), 551-560.
Egeth, H.E., & Nakama, T. (1999). Pop-out line detection affected by string length of concurrent RSVP task and "attentional capture" by initial item [Abstract]. Investigative Ophthalmology & Visual Science, 40, S808.
Folk, C.L., Leber, A.B., & Egeth, H.E. (submitted). Made you blink! Contingent attentional capture produces a spatial blink.
Folk, C.L., & Remington, R. (1998). Selectivity in distraction by irrelevant featural singletons: Evidence for two forms of attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 24, 847-858.
Folk, C.L., Remington, R.W., & Johnston, J.C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Francolini, C.M., & Egeth, H.E. (1979). Perceptual selectivity is task-dependent: The pop-out effect poops out. Perception & Psychophysics, 25, 99-110.
Hendel, S.K. (1999). Attentional shifts to orientation differences are contingent on task demands. (Doctoral dissertation, Johns Hopkins University, 1998). Dissertation Abstracts International, 60-02B, 849.
Jolicoeur, P. (1998). Modulation of the attentional blink by on-line response selection: Evidence from speeded and unspeeded task decisions. Memory & Cognition, 26, 1014-1032.
Jonides, J., & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43, 346-354.
Joseph, J.S., Chun, M.M., & Nakayama, K. (1997). Attentional requirements in a "preattentive" feature search task. Nature, 387, 805-807.
Joseph, J.S., & Optican, L.M. (1996). Involuntary attentional shifts due to orientation differences. Perception & Psychophysics, 58, 651-665.
LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimental Psychology: Human Perception and Performance, 9, 371-379.
Leber, A.B., & Egeth, H.E. (2001, May). Exploring mode selection in visual search. Poster session presented at the annual meeting of the Vision Sciences Society, Sarasota, FL.
Mounts, J.R.W. (2000). Evidence for suppressive mechanisms in attentional selection: Feature singletons produce inhibitory surrounds. Perception & Psychophysics, 62, 969-983.
Mueller, H.J., & Rabbitt, P.M. (1989). Spatial cueing and the relation between the accuracy of "where" and "what" decisions in visual search. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 41, 747-773.
Nakama, T., & Egeth, H.E. (1999). The dependence of attentional blink on the temporal position of the target in RSVP [Abstract]. Investigative Ophthalmology & Visual Science, 40, S49.
O'Regan, J.K., Rensink, R.A., & Clark, J.J. (1999). Change-blindness as a result of "mudsplashes". Nature, 398, 34.
Pashler, H. (1988). Cross-dimensional interaction and texture segregation. Perception & Psychophysics, 43, 307-318.
Posner, M.I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Posner, M.I., & Cohen, Y.A. (1984). Components of visual orienting. In H. Bouma & D.G. Bouwhuis (Eds.), Attention and Performance X (pp. 531-556). Hillsdale, NJ: Erlbaum.
Raymond, J.E., Shapiro, K.L., & Arnell, K.M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849-860.
Remington, R.W., Johnston, J.C., & Yantis, S. (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51, 279-290.
Sagi, D., & Julesz, B. (1985). "Where" and "what" in vision. Science, 228, 1217-1219.
Teichner, W.J., & Krebs, M.J. (1974). Laws of visual choice reaction time. Psychological Review, 81, 75-98.
Theeuwes, J. (1991). Cross-dimensional perceptual selectivity. Perception & Psychophysics, 50, 184-193.
Theeuwes, J. (1992). Perceptual selectivity for color and form. Perception & Psychophysics, 51, 599-606.
Theeuwes, J. (1994). Stimulus-driven capture and attentional set: Selective search for color and visual abrupt onsets. Journal of Experimental Psychology: Human Perception and Performance, 20, 799-806.
Theeuwes, J. (1996). Perceptual selectivity for color and form: On the nature of the interference effect. In A.F. Kramer, M. Coles, & G. Logan (Eds.), Converging operations in the study of visual selective attention (pp. 297-314). Washington, DC: American Psychological Association.
Treisman, A., & Gormican, S. (1988). Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95, 15-48.
Treisman, A., Kahneman, D., & Burkell, J. (1983). Perceptual objects and the cost of filtering. Perception & Psychophysics, 33, 527-532.
Turatto, M., & Galfano, G. (2001). Attentional capture by color without any relevant attentional set. Perception & Psychophysics, 63, 286-297.
Visser, T.A.W., Bischof, W.F., & Di Lollo, V. (1999). Attentional switching in spatial and nonspatial domains: Evidence from the attentional blink. Psychological Bulletin, 125, 458-469.
Yantis, S. (1993). Stimulus-driven attentional capture. Current Directions in Psychological Science, 2, 156-161.
Yantis, S., & Egeth, H.E. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Yantis, S., & Hillstrom, A.P. (1994). Stimulus-driven attentional capture: Evidence from equiluminant visual objects. Journal of Experimental Psychology: Human Perception and Performance, 20, 95-107.
Yantis, S., & Jonides, J. (1990). Abrupt visual onsets and selective attention: Voluntary vs. automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.
Author Note
The research reported in this chapter was supported in part by grants to H.E. Egeth from NIMH (R01MH57388) and the FAA (2001-G-020) and by a grant from NSF to C.L. Folk (BCS-9817673). The authors would like to thank Vince DiLollo, Anne Hillstrom, Jim Johnston, Julian Joseph, Peter Jusczyk, Hal Pashler, Matt Peterson, Eric Ruthruff, Jeremy Wolfe, and Steve Yantis for helpful comments on various parts of this work.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved.
Attentional and Oculomotor Capture
Jan Theeuwes and Richard Godijn
In everyday life we are constantly subjected to complex visual stimulation from the environment. In order to behave in a goal-directed manner, it is important that we select only the relevant information from the environment and ignore information that is irrelevant, particularly when this information disrupts our actions. A visual environment may contain many objects, i.e., potential targets for action, yet the visual system is limited in the number of objects that can be processed at a time. This limitation implies that at some stage (or stages) in the information flow, some objects are excluded from processing. This process of selecting among simultaneous sources of information, by enhancing the processing of some objects and/or suppressing information from others, can be accomplished either covertly or overtly (Posner, 1980). When selection occurs covertly, only attention (and not the eyes) is directed to a location in space. For example, by directing attention to the right side of the visual field without moving one's eyes, one is able to detect a car approaching from a side street. When selection occurs overtly, not only attention but also the eyes are moved to a particular location in space. Thus, by shifting the eyes to the car approaching from the side street one is able to determine not only that there is a car but also what type of car it is (i.e., one can identify the car). Even though in everyday life attention and eye movements are usually correlated, attention precedes the overt movement of the eyes and therefore attention and eye movements may be dissociated.
A crucial research question is the extent to which we are able to exert cognitive control over what we select from the visual environment. Overt or covert selection may be controlled either by the properties of the stimulus field or by the intentions, goals and beliefs of the observer (see recent reviews, e.g., Egeth & Yantis, 1997; Theeuwes, 1993, 1994a; Yantis, 1996, 2000).
When we intentionally select only those objects and events needed for our current tasks, selection is said to occur in a voluntary, goal-directed manner. When, irrespective of our goals and beliefs, specific properties present in the visual field determine what we select, this selection is said to occur in an involuntary, stimulus-driven manner. These two mechanisms of selection have been referred to as top-down and bottom-up attentional control, respectively (e.g., Eriksen & Hoffman, 1972; Posner, 1980; Theeuwes, 1991b; Yantis & Jonides, 1984). When objects or events receive priority independent of the observer's goals and beliefs, one refers to attentional capture when such an event or object captures only our attention (e.g., Yantis, 1996), and one
refers to oculomotor capture when such an event triggers an exogenous saccade to the location of the object or event (Theeuwes, Kramer, Hahn & Irwin, 1998).
Attentional Capture
When confronted with a display in which one element is unique in a basic visual dimension (such as a red element surrounded by green elements), one is able to detect this element immediately and without effort. Typically, the search time needed to determine whether such a salient prespecified target is present or not is independent of the number of elements in the display. Elements that pop out from the display are referred to as feature singletons or simply singletons. Given the observation that a feature singleton can be detected immediately upon presentation, it has been claimed that feature singletons receive attentional priority independent of the intentions of the observer. In other words, when searching for a prespecified target (such as a red circle between green circles) one may argue that attention is captured in a bottom-up way by the uniquely colored element. As noted by Yantis and Egeth (1999), such a conception is wrong because in this particular example the target of search is also the feature singleton. In other words, it may very well be a top-down strategy to attend to the feature singleton because it is the target one is looking for. As pointed out by Yantis and Egeth (1999), one can only speak of attentional capture in a purely stimulus-driven fashion when the stimulus feature in question is completely task-irrelevant, so that there is no incentive for the observer to attend to it deliberately. As expressed by Yantis and Egeth (1999): "If an object with such an attribute captures attention under these conditions, then and only then can that attribute be said to capture attention in a purely stimulus-driven fashion" (p. 663). To investigate whether salient singletons capture attention in a purely stimulus-driven manner, Theeuwes (1991b, 1992, 1994a) developed a paradigm referred to as the "irrelevant singleton" (Yantis & Egeth, 1999) or "additional singleton" paradigm (e.g., Simons, 2000).
In line with the definition above, Theeuwes ensured that the singleton he was investigating was always completely irrelevant for the task. The logic underlying the additional singleton task is simple: participants perform a visual search task, and one item in the search display is a unique salient feature singleton that is unrelated and completely irrelevant to the search task. The feature singleton is never the search target. This condition is compared to a condition in which such an irrelevant featural singleton is not present. For example, Theeuwes (1992) presented participants with displays consisting of colored circles or diamonds appearing on the circumference of an imaginary circle. Line segments of different orientations appeared in the circles and diamonds. Participants had to determine the orientation of the line segment appearing in the target shape. The target shape that participants searched for was a singleton because it was the only circle present in the display (see Figure 1). In the
distractor condition, an irrelevant color singleton distractor was also present in the display (see right panel of Figure 1). Time to find the shape singleton increased when an irrelevant color singleton was present (i.e., one of the diamonds was red). Figure 1 presents an example of the stimulus display and the results.
[Figure 1 appears here. Top: example search displays, panels "No distractor" and "Color Distractor". Bottom: line plot against display size (x-axis: 5, 7, 9; y-axis ticks 500-650) with separate lines for "Color Distractor" and "No Distractor".]
Figure 1. Stimuli and data from Theeuwes (1992). Top: A vertical or horizontal line segment appeared within a green circle (the other line segments were oblique). On the left, all diamonds are green (solid lines); on the right, one of the diamonds is red. Participants were to report the orientation of the line segment in the circle. Bottom: The presence of the irrelevant color singleton slowed responses to the line segment (from Theeuwes, 1992; Experiment 1).
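The structure of an additional-singleton display can be sketched in a few lines. The sketch below only mirrors the description in the text and the Figure 1 caption (one green circle among green diamonds on an imaginary circle, with at most one red diamond); the function name, the unit radius, the orientation labels, and the random choices are illustrative assumptions, not details of the original stimuli.

```python
import math
import random


def make_display(display_size: int, with_color_distractor: bool, seed: int = 0) -> list:
    """Build one search display as a list of item dictionaries."""
    rng = random.Random(seed)
    target_index = rng.randrange(display_size)
    items = []
    for i in range(display_size):
        angle = 2 * math.pi * i / display_size  # equally spaced on an imaginary circle
        is_target = i == target_index
        items.append({
            "x": math.cos(angle),
            "y": math.sin(angle),
            "shape": "circle" if is_target else "diamond",  # shape singleton = target
            "color": "green",
            # Target line is vertical or horizontal; nontarget lines are oblique.
            "line": rng.choice(["vertical", "horizontal"]) if is_target
                    else rng.choice(["tilted_left", "tilted_right"]),
        })
    if with_color_distractor:
        nontargets = [i for i in range(display_size) if i != target_index]
        items[rng.choice(nontargets)]["color"] = "red"  # irrelevant color singleton
    return items


for size in (5, 7, 9):
    display = make_display(size, with_color_distractor=True, seed=size)
    reds = [item["shape"] for item in display if item["color"] == "red"]
    print(size, "items; red item shape:", reds)
```

The point the sketch makes explicit is that the color singleton is never the target: the response depends only on the line inside the circle, so any slowing caused by the red diamond cannot reflect a deliberate search for color.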
Even though participants had a clear top-down set to search for the shape singleton (i.e., the single green circle), the presence of an irrelevant singleton (i.e., the single red diamond) caused interference. It was shown that selectivity depended on the relative salience of the stimulus attributes: when the color singleton was made less salient (by reducing the color difference between the target and the
nontarget elements) than the shape singleton, the shape singleton interfered with search for the color singleton while the color singleton no longer interfered with the search for the shape singleton. Theeuwes (1991b, 1992, 1994a, 1996) explained the increase in search time in conditions in which an irrelevant singleton was present in terms of attentional capture. Because the irrelevant singleton exogenously captured attention, it required more time before a response could be emitted. Given the observation that selectivity completely depended on the relative salience of the singleton target and the singleton distractor, it was suggested that early visual pre-attentive processing is only driven by bottom-up factors such as salience. Irrespective of the attentional set of the observer, it was argued that spatial attention is automatically and involuntarily captured by the location containing the most salient singleton. The shift of spatial attention to the location of the singleton implies that the singleton is selected for further processing. If this singleton is the target, a response is made. If it is not the target, attention is directed to the next most salient singleton. The initial shift of attention to the most salient singleton is thought to be the result of relatively inflexible, "hardwired" mechanisms, which are triggered by the presence of feature difference signal interrupts. It is assumed that at each location in the visual field a local feature contrast is calculated that represents how different that object is within a particular primitive feature dimension (e.g., color, shape, contrast, etc.). The notion suggested by Theeuwes is similar to that of Koch and Ullman (1985), who introduced the notion of a salience map to accomplish preattentive selection. This map is a two-dimensional map that encodes the salience of objects in their visual environment. Neurons in this map compete among each other, giving rise to a single winning location (cf. winner take all) that contains the most salient element. If this location is inhibited, the next most salient location will receive spatial attention (see also Itti & Koch, 2000; Sagi & Julesz, 1985; Nothdurft, 2000).
The observation that the irrelevant yet very salient singleton received attentional priority even though it was never relevant for the task is a clear example of purely stimulus-driven attentional capture. The requirement that "one can only speak of attentional capture in a purely stimulus-driven fashion when the stimulus feature in question is completely task-irrelevant" (Yantis & Egeth, 1999) is clearly fulfilled. The salient singleton was never task relevant. Other researchers using Theeuwes' paradigm (or variations of the paradigm) also demonstrated attentional capture, i.e., the presence of a salient singleton interfered with search for the target (see e.g., Bacon & Egeth, 1994, Experiment 1; Caputo & Guerra, 1998; Joseph & Optican, 1996; Kawahara, 1986; Kim & Cave, 1999; Mounts, 2000; Kumada, 1999; Todd & Kramer, 1994).
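The selection scheme described above (local feature contrast, a winner-take-all choice, then inhibition of the winner) can be illustrated with a minimal Python sketch. It is not the authors' model or the Koch and Ullman implementation: the function names, the display encoding and the per-dimension weights are all assumptions introduced here, with the weights standing in for the relative salience of color and shape differences.

```python
def local_feature_contrast(items, weights):
    """Weighted count of feature differences between each item and all others.

    `weights` maps a feature dimension to a hypothetical salience weight.
    """
    contrasts = []
    for i, item in enumerate(items):
        total = 0.0
        for j, other in enumerate(items):
            if i == j:
                continue
            for dim, weight in weights.items():
                if item[dim] != other[dim]:
                    total += weight
        contrasts.append(total / (len(items) - 1))
    return contrasts


def attention_order(salience, n_shifts):
    """Winner-take-all selection with inhibition of each visited location."""
    remaining = list(salience)
    order = []
    for _ in range(min(n_shifts, len(remaining))):
        winner = max(range(len(remaining)), key=lambda k: remaining[k])
        order.append(winner)
        remaining[winner] = float("-inf")  # inhibit the selected location
    return order


# Display size 7: index 0 is the green circle (target), index 1 the red
# diamond (distractor), the rest are green diamonds.
display = [{"shape": "circle", "color": "green"},
           {"shape": "diamond", "color": "red"}]
display += [{"shape": "diamond", "color": "green"} for _ in range(5)]

# Assumed weights: color difference more salient than shape difference.
salient_color = attention_order(
    local_feature_contrast(display, {"color": 1.0, "shape": 0.6}), 2)
# Assumed weights: color difference reduced below the shape difference.
weak_color = attention_order(
    local_feature_contrast(display, {"color": 0.3, "shape": 0.6}), 2)
print(salient_color, weak_color)
```

Under these assumed weights the distractor location is visited before the target when the color difference dominates, and the order reverses when the color difference is weakened, which mirrors the dependence on relative salience described in the text.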
Filtering costs or capture of spatial attention?
Theeuwes (1991a, 1992, 1994a, 1995b) observed an increase in reaction time for those conditions in which the irrelevant singleton was present. As noted above, he explained these findings in terms of attentional capture: attention moved exogenously to the location of the salient singleton before it could move to the location of the (less salient) singleton target. Recently, Folk and Remington (1998) offered an alternative explanation for the increase in RT in conditions in which a distractor was present. They suggested that the increase in search time caused by the irrelevant singleton is due to what they call "filtering costs," a notion first introduced by Kahneman, Treisman and Burkell (1983). The idea of filtering costs is that the presence of an irrelevant singleton may slow the deployment of attention to the target item by requiring an effortful and time-consuming filtering operation. According to this line of reasoning, attention is employed in a top-down way and goes directly to the singleton target; simply because another irrelevant singleton is present, directing attention to the target may take more time than when no such irrelevant singleton is present. Note that this view does not entail a shift of spatial attention to the location of the irrelevant singleton. The filtering cost explanation is compatible with the notion that top-down control can selectively guide spatial attention to the singleton target.
Even though a filtering notion can account for the effects of Theeuwes' earlier studies, it appears to be inconsistent with the results of Theeuwes (1996). Participants performed a typical "additional singleton" search as, for example, in Theeuwes (1992). Instead of having a neutral element at the location of the irrelevant singleton, Theeuwes (1996) manipulated the congruency of the character at the location of the irrelevant distractor. In half of the trials the character at the distractor location was associated with the same response as was required by the target, while in the other half it was associated with the opposite response. Theeuwes (1996) argued that the identity of the character at the location of the irrelevant singleton could only have an effect on responding if at some point spatial attention had been deployed at the location of the distractor. Figure 2 presents an example of the stimulus material and the data. If, in line with Folk and Remington's (1998) filtering notion, attention goes directly and exclusively to the target location, then there should be no congruency effect of a character presented at the irrelevant singleton location. In other words, if attention never goes to the location of the irrelevant singleton, it is impossible that the identity of a character there can have any effect on responding. However, Theeuwes (1996) did find a clear congruency effect of about 20 ms (see Figure 2), which provided strong evidence that before a response was given, spatial attention was at the location of the irrelevant singleton. This finding was completely in line with the notion that spatial attention was captured by the irrelevant singleton.
[Figure 2 graphic. Top: sample stimulus displays. Bottom: reaction time (axis ticks 550 to 600) as a function of display size (7, 9), with separate curves for the incongruent, congruent and no distractor conditions.]
Figure 2. Stimuli and data from Theeuwes (1996). Top: Sample stimulus display (display size 7). In the no-distractor condition (top left) the green diamond shape appears among green circles. In the congruent condition (top middle), the letter inside the green diamond target shape (in this case the letter R) is identical to the letter inside the red circle distractor. In the incongruent condition (top right), the letter inside the green diamond target shape is different from the letter inside the red circle distractor. Solid lines indicate green and dotted lines indicate red. Bottom: Data from Theeuwes (1996, Exp. 1). The incongruent condition is significantly slower than the congruent condition suggesting that the letter at the location of the to-be-ignored singleton was processed. This finding indicates that spatial attention was captured by the irrelevant colored singleton enabling the processing of the letter inside the irrelevant singleton (from Theeuwes, 1996; Experiment 1).
However, Folk and Remington (1998) argued that the results of Theeuwes (1996) might not reflect a serial shift of spatial attention to the location of the irrelevant singleton. Instead they suggested parallel processing of both the target and distractor identity. In line with the idea of perceptual load (Lavie, 1995), they argued that when the number of elements is small, the identity information can influence
response mechanisms in parallel. In other words, they claimed that attention went in parallel to both the target and the irrelevant distractor, causing a congruency effect on responding. It is clear that this explanation in terms of parallel processing is not in line with the notion that there is first a shift of spatial attention to the irrelevant singleton location before attention shifts to the location of the target. Folk and Remington (1998) are not clear on whether this suggested parallel processing reflects the same processing mechanisms that cause what they called "filtering costs."
In a recent study, Theeuwes & Godijn (submitted) used the phenomenon of "inhibition of return" (Posner & Cohen, 1984) to resolve the dispute about whether or not spatial attention ever went to the location of the irrelevant singleton. The basic claim underlying inhibition of return (IOR) is that after attention is reflexively shifted to a location in space, there is delayed responding to stimuli subsequently displayed at that location (see Klein [2000] for a review). Theeuwes & Godijn (submitted) presented a stimulus display consisting of 8 outline circles equally spaced around the fixation point on an imaginary circle. In the center of each of the eight outline circles there was a small gray dot. All outline circles were gray except one circle, which was red, constituting the uniquely colored irrelevant singleton. This colored singleton was completely irrelevant for the task. Participants viewed the display for 1300 ms and then had to detect whether or not one of the small dots was turned off. The results showed that participants were slower to detect the offset of the small gray dot when it was located in the irrelevant singleton distractor than when the gray dot was extinguished in another, non-singleton circle. The observation of an IOR effect at the location of the salient red distractor outline circle strongly suggests that attention was captured by that location. Theeuwes & Godijn (submitted) claimed that their results could only be interpreted as evidence for exogenous, purely stimulus-driven attentional capture by the location of the irrelevant singleton. They argued for such an interpretation because there is agreement that IOR is the result of the reflexive, involuntary orienting system and not of the endogenous, voluntary system (e.g., Posner & Cohen, 1984; Pratt, Kingstone & Khoe, 1997; Pratt, Sekuler & McAuliffe, 2001). For example, Pratt et al. (2001) showed that a top-down attentional control setting for a color singleton does not result in IOR, suggesting that IOR is indeed associated with involuntary, bottom-up, stimulus-driven attentional processes.
In conclusion, there is strong evidence that salient singletons do capture spatial attention in a purely bottom-up, exogenous way. The most parsimonious interpretation of the current findings is that top-down control during early pre-attentive search is not possible. Selectivity is determined by the salience of objects in the visual field, i.e., the most salient singleton gets attention first; after this location is inhibited the next most salient location will receive spatial attention. There seems to be no need to interpret the current findings by notions such as non-spatial filtering costs and/or parallel processing due to low attentional load.
Speed of disengaging spatial attention
Even though there is now considerable evidence from different laboratories for the notion that salient elements capture attention exogenously (e.g., Caputo & Guerra, 1998; Joseph & Optican, 1996; Kim & Cave, 1999; Kumada, 1999; Mounts, 2000; Turatto & Galfano, 2000), a group of other researchers have argued for a position that is completely opposite to that of Theeuwes. Folk and colleagues (Folk, Remington & Johnston, 1992; Folk, Remington & Wright, 1994; Folk & Remington, 1998) have argued that the ability of a singleton to capture attention is completely contingent on whether an attention-capturing stimulus is consistent with top-down settings which are established "off-line" on the basis of current attentional goals. According to the "contingent capture" model, only stimuli that match the top-down control settings will capture attention; stimuli that do not match the top-down settings will be ignored. Thus, according to this theory top-down control is possible even when target and distractor are both salient singletons.
A crucial question is why Folk and colleagues report top-down control over attentional capture while Theeuwes (and others) have reported evidence for bottom-up attentional capture. The answer to this question may lie in the procedural differences between the paradigms employed by Folk et al. and Theeuwes. Folk et al. use a spatial cueing paradigm in which participants have to ignore a "cue" that appears 150 ms prior to the presentation of the target display (see, e.g., Folk et al., 1992). Participants respond to a character shape (X vs. =) which, in different conditions, has either a unique color or a unique abrupt onset.
When the search display is preceded by a to-be-ignored featural singleton (the "cue") that matches the singleton for which they are searching, the cue captures attention as evidenced by a prolonged reaction time to identify the target (i.e., when the cue and target appear in different spatial locations). On the other hand, if the to-be-ignored featural singleton "cue" does not match the singleton for which they are searching, its appearance apparently does not have an effect on responding, i.e., the cue does not capture attention. This "contingent" capture of attention occurred for both color and onset conditions, and was considered evidence that involuntary capture is contingent on the adoption of some attentional set. The critical finding in these studies is that a cue that does not match the top-down search goal (i.e., the defining property of the target) does not affect RT (i.e., a zero effect), while a cue that matches the search goal has an effect on RT. In other words, if participants were searching for a red plus sign, they were more likely to be distracted by a red cue than a cue that was an abrupt onset and vice versa. Folk et al. (1992) suggested that the absence of an effect on RT for a cue that does not match the target indicates that the cue did not capture attention.
Contrary to the spatial cueing paradigm, Theeuwes used the above-described additional singleton search task in which the target and singleton distractor were simultaneously present. He showed that, independent of any top-down goal, an irrelevant singleton that was more salient than the singleton target interfered with search. The increase in RT, which was always 15 to 25 ms, was taken as evidence that attention was captured by the salient singleton distractor. Given that the distraction effects in basically all experiments of Theeuwes were relatively small, it is quite feasible that attention shifts to the irrelevant singleton for a relatively brief time before it moves on to the singleton target. As Theeuwes et al. (2000) have argued, it is quite feasible that in Folk et al.'s spatial cueing paradigm the irrelevant cue also captured attention. Because there was a delay of 150 ms between the presentation of the cue display and the search display, participants may have been able to overcome the attentional capture by the time the search display was presented (see also Theeuwes, 1994a, 1994b). Disengagement of attention from the cue may have been relatively fast when the cue and target did not share the same defining properties (e.g., the cue is red and the target is an onset), while disengagement from the cue may have been relatively slow in the case where the cue and target share the same defining properties (e.g., both were red). Such a mechanism could explain why there are RT costs when the cue and target have the same defining characteristics and no costs when cue and target are different. In this view the contingent capture hypothesis can explain why it may be easier to disengage attention from a particular location when an element presented at that location is not in line with the top-down control settings. However, this does not imply that there is no capture of attention by the irrelevant cue singleton; it simply indicates that after a certain time participants are able to exert top-down control over the erroneous capture of attention by the irrelevant singleton.
Theeuwes et al. (2000) provided strong evidence for the claim that once attention is exogenously captured by an irrelevant singleton it only takes a very brief time to disengage attention from that location. Theeuwes et al. (2000) used a visual search task similar to that of Theeuwes (1992) in which participants searched for a shape singleton (a single gray diamond among 8 gray circles). Prior to the presentation of the target display, at different SOAs (50, 100, 150, 200, 250 and 300 ms), a color singleton was presented. Figure 3 presents an example of the stimulus display and the data (Experiment 1 from Theeuwes et al., 2000). As is clear from Figure 3, the presence of an irrelevant salient distractor only had an effect when the singleton target and distractor were presented in close succession. Theeuwes et al. argued that in conditions in which target and distractor were presented in close temporal proximity there was not enough time to exert top-down control that could have overcome attentional capture by the salient distractor. However, when the singleton distractor was presented a considerable time (SOAs 150 to 300 ms) before the presentation of the singleton target, it was possible to
exert sufficient top-down control such that by the time the singleton target was presented there is no sign of attentional capture by the distractor anymore.
[Figure 3 graphic. Top: sample premask, distractor and search displays. Bottom: reaction time (axis ticks 700 to 750) as a function of SOA (ms) for the conditions no distractor, 50, 100, 150, 200, 250 and 300, with error percentages printed next to the data points.]
Figure 3. Stimuli and data from Theeuwes et al. (2000). Top: Sample stimulus display. The premask display (top left) was presented for 700 ms. At an SOA of 50, 100, 150, 200, 250 or 300 ms before the presentation of the search display, the color of one of the elements of the premask changed from gray to equiluminant red (top middle). The search display (top right) contained both the color distractor singleton (the red element) and a shape singleton target (the diamond). Bottom: The irrelevant color distractor only had an effect on responding when it was presented close in time to the target singleton (50 and 100 ms). The results indicate that attentional capture occurs early in processing and is relatively short lived. When the irrelevant singleton precedes the target singleton by 150 ms there is no longer any evidence of capture (from Theeuwes et al., 2000; Experiment 1).
The results of Theeuwes et al. (2000), showing no sign of attentional capture when the interval between the target and singleton distractor is more than 100 ms, provide some new insights regarding the findings obtained with the spatial cueing paradigm of Folk and colleagues (Folk et al. 1992, 1993; Folk & Remington, 1998). As noted before, in the spatial cueing paradigm participants have to ignore a cue that appears 150 ms before the search display. The critical finding is that a cue that does not match the top-down search goal (e.g., as in Theeuwes' experiments, the search goal is a shape singleton, the cue is a color singleton) does not affect RT while a cue that does match the search goal slows search. The finding that there is an effect on RT in Folk et al.'s experiments when the cue and target share the same defining property (e.g., the cue is red and the target is red) is not surprising since it is likely that disengagement and redirection of attention from the distractor location will take much longer when the distractor and target have the same defining property. It will be clear that this explanation of Folk et al.'s data does not suggest anything like a "contingent capture" hypothesis, but merely confirms Theeuwes' stimulus-driven model of selection (Theeuwes, 1992) in which early processing is driven by bottom-up salience factors (see also Gibson, 1999; Gibson & Wenger, 2000).
The current interpretation fits very well with data reported by Kim & Cave (1999) employing the additional singleton search task (e.g., as in Theeuwes, 1992) in combination with a probe detection task. Kim and Cave (1999) presented probes either 60 or 150 ms after the presentation of the search display at the location of the target and the location of the distractor. It was hypothesized that if the early preattentive processing is solely driven by bottom-up salience as suggested by Theeuwes (1991a, 1992), then the location of the salient singleton distractor should be attended first. Therefore, the probe RT at the distractor location should be faster than at any of the other locations in the short SOA condition regardless of whether the unique feature is relevant or not. On the other hand, if top-down control is possible somewhat later in time as the current experiments suggest, then it is expected that in the late SOA condition, attention will no longer be at the distractor location but instead will be at the location of the singleton target. For those conditions in which target and distractor were locally unique (and therefore salient enough) Kim and Cave (1999) did indeed find these results. At the 60 ms SOA the probe RT at the location of the singleton distractor was about 20 ms faster than at the singleton target location. At the 150 ms SOA, however, this pattern was reversed: the probe RT at the target location was about 15 ms faster than at the distractor location. The bottom line is that Kim and Cave also show that after 150 ms, attention is no longer at the location of the distractor but instead at the location of the target. Recently, Kim & Cave (2001) replicated these findings using a focused attention task in which participants responded to a central letter that was flanked on one side by a letter having the same color as the target letter and on the other side by the uniquely colored letter. Again, it was shown that at 60 ms attention was first
captured by the uniquely colored distractor, while soon thereafter (at 150 ms) the flanker that shared the target color received more attentional activation. Given the findings of Kim and Cave (1999, 2001) and those of Theeuwes et al. (2000) it is likely that it takes somewhere between 100 and 150 ms to disengage attention from the location of the distractor and redirect it to the location of the singleton target, at least when the distractor and singleton target have different defining properties. The current findings give a clear answer as to why Folk et al. never reported any attentional capture while Theeuwes always did find evidence for attentional capture. Because it only takes 100 to 150 ms to disengage attention from the location of a singleton distractor, in Folk et al.'s experiments the interval between the to-be-ignored cue and the target display was too long to find evidence for capture. By the time the search display was presented, participants were able to overcome initial attentional capture by the irrelevant singleton. Note that the probe studies of Kim and Cave (1999, 2001) and Theeuwes et al. (2000; Experiment 2) also disconfirm the notion of filtering costs as discussed in the previous section. To find a probe RT benefit at the location of the singleton distractor at 60 ms, spatial attention must have been at that location.
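The timing argument of this section can be made concrete with a toy Python sketch. It is only an illustration under stated assumptions, not a model proposed in the chapter: attention is treated as sitting at the distractor until a fixed disengagement time and at the target afterwards. The value of 125 ms for a non-matching distractor is an assumed point inside the 100 to 150 ms range given in the text, the value of 250 ms for a matching cue is purely hypothetical (the text only says disengagement takes "much longer"), and all names are invented for the example.

```python
# Assumed disengagement times in ms (hypothetical parameters, see lead-in).
MISMATCH_DISENGAGE_MS = 125  # distractor does not share the target property
MATCH_DISENGAGE_MS = 250     # cue shares the target-defining property

# SOAs between distractor and target used by Theeuwes et al. (2000).
SOAS_MS = [50, 100, 150, 200, 250, 300]


def attended_location(time_since_distractor_ms, disengage_ms):
    """Where attention is assumed to be at a given moment."""
    if time_since_distractor_ms < disengage_ms:
        return "distractor"
    return "target"


def capture_cost_expected(soa_ms, disengage_ms):
    """An RT cost is expected only if the target display arrives while
    attention is still at the distractor location."""
    return attended_location(soa_ms, disengage_ms) == "distractor"


# SOAs at which a non-matching distractor should still produce a cost.
costly_soas = [s for s in SOAS_MS
               if capture_cost_expected(s, MISMATCH_DISENGAGE_MS)]

# Spatial cueing paradigm: the cue precedes the search display by 150 ms.
cue_mismatch_cost = capture_cost_expected(150, MISMATCH_DISENGAGE_MS)
cue_match_cost = capture_cost_expected(150, MATCH_DISENGAGE_MS)

print(costly_soas, cue_mismatch_cost, cue_match_cost)
```

With these assumed parameters the sketch reproduces the qualitative pattern discussed above: costs only at the 50 and 100 ms SOAs, no cost for a non-matching cue at a 150 ms cue-target interval, and a cost for a matching cue at the same interval.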
Attentional capture and inhibitory surrounds
In a recent study Mounts (2000) showed that attentional selection of one object results in the inhibition of processing of neighboring objects. Mounts' Experiment 1 was similar to that of Theeuwes (1992). Mounts showed that the identification of a letter (E or H) was slowed by the presence of an irrelevant color singleton. Importantly, Mounts (2000) showed that the identification of the letter was slowest when it was located next to the irrelevant color singleton. With increasing target-distractor separation, RT to the target letter decreased. Mounts (2000) argued that finding a gradient, i.e., greater latencies in target discrimination when the target is near an object that captures attention, can be considered as another "way to identify attentional capture" (p. 1486). The earlier discussed Theeuwes (1992) study, which showed attentional capture by a color singleton while searching for a shape singleton, also showed a clear target-distractor distance effect. Note that these data were never published in the open literature (only as a technical TNO report; see Theeuwes, 1991c) because at that time it was unclear what this target-distractor separation effect could imply. Figure 4 presents the data from Experiment 1 of Theeuwes (1992), in which eight participants searched for a green circle while a red color singleton was present. Figure 4 presents the data from display size 9 (the effect of distance was reliable [F(3,21) = 3.7; p < 0.05]). The data show a clear effect of the distance between the target shape and the color distractor. In line with Mounts (2000), finding this clear target-distractor separation effect can be considered as additional evidence that the distractor singleton captured attention. Mounts (2000) argues that these results indicate that the attentional selection of the salient singleton distractor results in inhibition of perceptual processing for neighboring objects. Mounts (2000) explains the mechanism by which selection may occur in these tasks either in terms of inhibition to prevent ambiguities in perceptual coding (see e.g. Luck, Girelli, McDermott & Ford, 1997) or in terms of resources, i.e., the salient item pulls more resources away from the closer than from the more distant elements (e.g., Bahcall & Kowler, 1999).
[Figure 4 graphic. Reaction time (axis ticks 525 to 650) as a function of the distance between target and distractor (in elements and degrees; tick labels 0, 1, 2, 3 and 2.3, 4.6, 6.9, 9.2 degrees), with separate data for the color distractor and no distractor conditions.]
Figure 4. Mean RT as a function of the distance between the shape singleton target and the color singleton distractor (in number of items and visual angle) for display size 9. Data were re-analyzed from Theeuwes 1992; Experiment 1.
However, Mounts does not address the issue of how attention gets to the singleton target after it is captured by the distractor. The probe data from Kim and Cave (1999, 2001) indicate a clear serial mechanism of spatial attention: first focal attention goes to the singleton distractor (at 60 ms) and then it goes to the target (at 150 ms). If this is the case then the distance effects of Mounts (2000) and of Theeuwes (see above) may not be in line with an explanation in terms of attentional capture. If attention is first at the location of the singleton distractor and then has to be shifted to the location of the target, one would have expected that the distance effect would increase with increasing separation between target and distractor, at
least when one believes that the further the focus of attention has to travel the longer it takes (see Tsal, 1983). However, even though this may seem plausible, there is not much evidence that the time to shift attention depends on the distance over which attention must be shifted. In fact, Kwak, Dagenbach and Egeth (1991) showed quite elegantly that attention might shift in a time-invariant way. Therefore, if anything, the target-singleton distractor separation effect may not reflect the time to shift attention from one location to the next and may indeed represent inhibition of neighboring objects due to attentional capture as suggested by Mounts.
However, there is yet another explanation for the target-distractor distance effect (see Theeuwes, 1991c) which was not considered by Mounts (2000). It is possible that the salience of the singleton target (in Theeuwes' case the green circle) depends on the extent to which this singleton target stands out from its environment. An element surrounded by a homogeneous group of contrasting elements (in Theeuwes' case green squares) will be more salient than an element that is surrounded by a less homogeneous local environment. Therefore it is possible that the build-up of activation signaling the singleton target is slower when a color singleton (e.g., a red square) is nearby than when the color singleton is on the opposite side of the visual field. According to this view, the closer the singleton distractor is to the singleton target, the less salient the singleton target becomes and the longer it takes to re-shift attention to the singleton target after it has been captured by the color singleton. This mechanism is related to the "weight linkage" process described by Duncan and Humphreys (1989), which refers to the ease with which one is able to reject nontarget elements. 
Obviously, if the activation signaling the singleton target depends on the extent to which the singleton target is unique in its local surroundings then the separation effect simply represents the strength of activations instead of inhibitory processing.
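The local-environment account just described can be illustrated with a small Python sketch. It is a deliberately crude assumption rather than anything specified in the chapter: items sit on a ring, and the target's local homogeneity is taken to be the fraction of its neighbors within a hypothetical radius that are standard nontargets (i.e., not the color singleton). The function names and the radius of 2 elements are invented for the example.

```python
def ring_distance(a, b, n_items):
    """Shortest distance, in elements, between two positions on a ring."""
    d = abs(a - b) % n_items
    return min(d, n_items - d)


def target_local_homogeneity(n_items, target_pos, distractor_pos, radius=2):
    """Fraction of the target's neighbors (within `radius` elements) that
    are standard nontargets rather than the color singleton distractor."""
    neighbours = [p for p in range(n_items)
                  if p != target_pos
                  and ring_distance(p, target_pos, n_items) <= radius]
    standard = sum(1 for p in neighbours if p != distractor_pos)
    return standard / len(neighbours)


# Display size 9, target at position 0 (assumed layout).
near = target_local_homogeneity(9, target_pos=0, distractor_pos=1)
far = target_local_homogeneity(9, target_pos=0, distractor_pos=4)
print(near, far)
```

Under this assumption a nearby distractor lowers the homogeneity of the target's surroundings while a distant one leaves it intact, so a separation effect could arise from weaker target activation without any inhibitory process.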
Oculomotor capture
Theeuwes et al. (1998, 1999; Kramer, Hahn, Irwin, & Theeuwes, 1999) developed a paradigm referred to as the oculomotor capture paradigm, which uses the same logic as the additional singleton paradigm. Instead of inferring capture on the basis of a slowed response to the target, capture is reflected by an inappropriate eye movement toward the irrelevant item. As in Theeuwes' earlier paradigms, in the oculomotor capture paradigm goal-directed selection was pitted against stimulus-driven selection. In addition to examining the capture of attention in terms of patterns of manual RTs and accuracies, in these experiments the eye movement pattern was also registered. Participants had the explicit instruction to search for and make a saccadic eye movement towards the only gray element in the display. On some trials, an irrelevant item presented with abrupt onset was added to the display. The condition in which a to-be-ignored onset was presented somewhere in the visual
field was compared to a control condition in which there was no onset added to the display. The results showed that when no onset was added to the display, observers made saccades that generally went directly to the uniquely colored circle. However, in those trials in which an onset was added to the display, in about 40% of the trials the eye went in the direction of the onset, stopped briefly, and then went on to the target. Figure 5 shows the results. The graphs on the left side depict the control condition without the onset; the graphs on the right side depict the condition in which an onset was presented. Note that in the condition with the onset, the eye often went to the distractor. This occurred even when the onset appeared at a side opposite to that of the target circle (see Figure 5 bottom panels: an abrupt onset is presented at 150 degrees separation). When asked, none of the observers was aware that their eye movement behavior was influenced by the appearance of the onset, even though in many cases the eyes went in a direction completely opposite where the target was located (see Theeuwes et al., 1998). The observation that irrelevant singletons may not only capture attention but may also capture our eyes is not surprising given the fact that there is a close relationship between the attentional and oculomotor system (e.g., Rizzolatti, Riggio, Dascola, & Umilta, 1987). It is generally agreed that there is an obligatory and selective coupling between saccade execution and visual attention to one common target object (see e.g., Hoffman and Subramaniam,1995; Deubel and Schneider, 1996; Remington, 1980). When a saccade is executed, attention precedes the eyes to the saccade target location. If one accepts the notion that attention may get captured exogenously by a salient singleton, and one accepts the notion of an obligatory coupling between attention and saccade execution, it is not surprising that attentional capture is accompanied by oculomotor capture. 
In this sense, finding oculomotor capture can be interpreted as evidence against the earlier discussed contingent capture hypothesis (e.g., Folk et al., 1992), according to which the control of selection is never purely stimulus driven. If, in Theeuwes et al. (1998, 1999; see also Irwin, Colcombe, Kramer, & Hahn, 2000), an attentional set to attend to a color singleton had overridden the bottom-up capture of the abrupt onset, as claimed by the contingent capture hypothesis, then one would have expected no oculomotor capture whatsoever. The results displayed in Figure 5 clearly show that there is oculomotor capture.
Capture of spatial attention even when there is no oculomotor capture?
In Theeuwes et al. (1998, 1999), in about 30 to 40% of the trials, the eyes first moved to the onset before they moved on to the singleton target. In the other 60 to 70% of the trials the eyes moved directly to the singleton target as if the presence of the onset had no effect on processing at all (see Figure 5). The question is whether indeed in those trials in which the eyes go directly to the singleton target,
[Figure 5: six panels of first-saccade sample points. Left column: Control (30 degrees), Control (90 degrees), Control (150 degrees). Right column: Abrupt onset at 30 degrees, Abrupt onset at 90 degrees, Abrupt onset at 150 degrees.]
Figure 5. Oculomotor capture. Data from Theeuwes et al. (1999). Eye movement behavior in the condition in which an abrupt onset was presented simultaneously with the target. The results are collapsed over all eight participants and normalized with respect to the position of target and onset. Sample points (every 4 ms) are taken only from the first saccade. Left panels: Eye movement behavior in the control condition in which no abrupt onset was presented. Right panels: Eye movement behavior in the condition in which an abrupt onset was presented, either close to the target (top), somewhat away from the target (middle), or at the opposite side from the target (bottom). (From Theeuwes et al., 1999.)
the presence of the onset had no effect. Given the evidence that onsets capture attention exogenously (e.g., Theeuwes, 1994a; Yantis & Jonides, 1984), it is quite feasible that even for the trials in which the eyes never moved to the onset, attention did. Recently, Godijn and Theeuwes (submitted-a) addressed this question using the phenomenon of "inhibition of return" (Posner & Cohen, 1984) as a way to determine whether spatial attention ever went to the location of the onset, even for those trials in which the eyes went directly to the target. The basic claim underlying inhibition of return (IOR) is that after attention is reflexively shifted to a location in space, there is delayed responding to stimuli subsequently displayed at that location (see Klein [2000] for a review). For example, Abrams and Dobkin (1994) showed that participants were slower to initiate an eye movement to a previously attended location. We used a modified version of the oculomotor capture paradigm of Theeuwes et al. (1998, 1999). As in Theeuwes et al., observers made a goal-directed saccade to a uniquely colored target element while an irrelevant sudden onset was presented somewhere in the visual field. After the initial target element was fixated, another element in the visual field became the next target element to fixate. This new target element was presented either at the location at which the sudden onset had previously appeared or at the location of a non-onset element. The trials of interest were those in which the eyes went directly to the initial singleton target. The results were straightforward: even when the eyes never went to the location of the abrupt onset, participants were slower to move their eyes to the location at which the onset had previously appeared. In other words, regardless of whether or not an actual reflexive saccade to the onset was made, subsequent orienting to this location was inhibited.
These findings clearly show a dissociation between the oculomotor and the attentional system: Even when attention is reflexively moved to one location, the eyes may endogenously move to another location. An explanation for the current results is to assume that the presentation of the sudden onset always caused a reflexive shift of attention to the location of the onset, regardless of whether the eyes actually moved there or not. This is in line with previous research showing that sudden onsets capture attention exogenously (e.g., Jonides, 1981; Theeuwes, 1991b, 1994a; Remington, Johnston, & Yantis, 1992; Yantis & Jonides, 1984). On some trials this shift of attention resulted in oculomotor capture, i.e., a reflexive saccade was executed to the onset, while on other trials it did not result in oculomotor capture. Regardless of whether oculomotor capture occurred, we assume that at some point attention was at the location of the sudden onset, resulting in an inhibitory aftereffect. This inhibition of already attended locations prevents observers from re-orienting to the same locations, thereby encouraging the sampling of new information in the visual scene (e.g., Klein & MacInnes, 1999; Posner & Cohen, 1984). How this inhibition is actually achieved is still a matter of debate (for overviews see Klein, 2000; Taylor & Klein, 1998).
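The logic of the inhibition-of-return analysis described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis actually used by Godijn and Theeuwes: the field names (`first_saccade`, `second_target`, `latency_ms`), the function name `ior_effect`, and all latency values are hypothetical and made up for the example. The key step is that only trials in which the first saccade went directly to the target are kept, and the latency of the next saccade is then compared between the former onset location and other locations.

```python
from statistics import mean

def ior_effect(trials):
    """Mean latency of the second saccade when it is directed at the former
    onset location minus the mean latency when it is directed elsewhere.
    A positive value indicates inhibition at the onset location."""
    # Keep only trials in which the first saccade went straight to the target,
    # i.e., trials without oculomotor capture.
    direct = [t for t in trials if t["first_saccade"] == "target"]
    at_onset = [t["latency_ms"] for t in direct
                if t["second_target"] == "onset_location"]
    elsewhere = [t["latency_ms"] for t in direct
                 if t["second_target"] == "other_location"]
    return mean(at_onset) - mean(elsewhere)

# Made-up trials for illustration only; these are not data from the study.
trials = [
    {"first_saccade": "target", "second_target": "onset_location", "latency_ms": 260},
    {"first_saccade": "target", "second_target": "onset_location", "latency_ms": 280},
    {"first_saccade": "target", "second_target": "other_location", "latency_ms": 240},
    {"first_saccade": "target", "second_target": "other_location", "latency_ms": 250},
    # A capture trial: excluded because the eyes first went to the onset.
    {"first_saccade": "onset", "second_target": "onset_location", "latency_ms": 300},
]

effect = ior_effect(trials)
```

With these invented numbers the difference is positive, which is the pattern the text describes: slower orienting to the location where the onset had appeared, even though the eyes never went there.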
Further speculations
Is top-down control possible?
The outline above may suggest that our view implies that there is no top-down control whatsoever in early visual processing. Obviously this is not correct. If spatial location can be used as the basis for selection, it is very well possible to prevent capture of attention by irrelevant singletons. For example, Yantis and Jonides (1990) showed the ability of spatial attention to filter out irrelevant information. When the location of the impending target letter was endogenously cued by means of a central arrow and this cue was 100% valid, an irrelevant onset singleton elsewhere in the visual field no longer captured attention (see also Theeuwes, 1991b). Another example of spatial location as the basis for selection to prevent interference from distractors comes from the classic study of Eriksen and Eriksen (1974). In this study participants made a choice reaction to a letter appearing at a known location. If other letters, appropriate to a different reaction, were separated from the target by more than 1 degree of visual angle, their identity had no effect on responding. Because the location of the target was known, it was possible to filter out irrelevant information from other locations. Only elements falling within the hypothetical spotlight (i.e., within 1 degree of the target) affected responding to the target. These results fit well with the notion of a spotlight as a filtering mechanism: within the beam, events are admitted to further processing and outside the beam they are not. Theeuwes (1994b) argued for an attentional spatial window. The size of this attentional window is fully under top-down control. The function of such a window is to limit processing to only those elements within the attentional window and prevent processing of elements outside the attentional window (but see Theeuwes, Kramer & Atchley [2001], who showed that this filtering mechanism is far from perfect).
With respect to typical visual search in which there is no endogenous cue to direct attention, it is possible to prevent attentional capture by irrelevant singletons by varying the spatial attentional window. Typically, when a search task requires the detection of an element that is not available preattentively (e.g., a conjunction target or a target letter among nontarget letters), participants may engage in a focused attentional state to allow serial search through the display. Under these circumstances, in which there is serial search for a target, salient singletons may not always capture attention exogenously (e.g., Jonides & Yantis, 1988). Only when these singletons are very salient, such as abrupt onsets (as in e.g., Yantis & Jonides, 1984; Theeuwes, 1990), or when very sensitive behavioral measures are used (i.e., the identity intrusion effect as in Theeuwes & Burger, 1998), may one still find evidence for attentional capture. If attention is focused, there is
basically no pre-attentive processing in the periphery, as was recently shown by Joseph, Chun and Nakayama (1997). However, when participants divide attention over the visual field, setting the attentional window across the whole stimulus field, selection within this attentional spatial window is determined by salience. The question is why participants choose to divide attention across the visual field, because this may result in capture by feature singletons that may not be the target of search. The answer is simple: such a strategy is beneficial when the target is also a feature singleton. Only by dividing attention across the visual field can preattentive features be picked up (as shown recently by Joseph et al., 1997). In order to detect the singleton target by parallel processing, attention has to be divided across the stimulus field, and the result is that any salient feature that stands out grabs attention. Within the top-down controlled attentional window no top-down control is possible. It should be noted, however, that the size of the attentional window, which is under top-down control, might be adjusted and optimized for a visual search task so that target detection is still fairly easy while interference is reduced to a minimum. So when the target of search is not a feature singleton that can be detected by parallel search, participants may adapt the attentional window to a smaller size. For example, when searching for a conjunction target, instead of dividing attention across the whole field, a spatial window that covers, for example, three items (so-called "clumpwise" search), within which a target may pop out, will give relatively fast search times. Ultimately, when the difference between target and distractors is so small that attentive processing is required to detect this difference (a low signal-to-noise ratio between target and distractor), the attentional window may be so small that it covers individual items (i.e., the sequential focused search mode).
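The attentional window account above can be made concrete with a toy sketch in Python. It is an illustration under stated assumptions, not a model proposed in the chapter: items are reduced to a 2-D position and a single salience number, the window is assumed to be circular, and the names `select_item`, `window_center` and `window_radius` as well as all values are hypothetical. The sketch only encodes the two claims in the text: the size of the window is set top-down, and within the window the most salient item wins.

```python
import math

def select_item(items, window_center, window_radius):
    """Return the most salient item inside a circular attentional window,
    or None if no item falls inside the window."""
    # Top-down part: only items inside the window compete for selection.
    inside = [item for item in items
              if math.dist(item["pos"], window_center) <= window_radius]
    if not inside:
        return None
    # Bottom-up part: within the window, selection follows salience.
    return max(inside, key=lambda item: item["salience"])

# Hypothetical display: positions in degrees of visual angle, salience in
# arbitrary units. None of these numbers come from the chapter.
items = [
    {"name": "target", "pos": (2.0, 0.0), "salience": 0.6},
    {"name": "distractor", "pos": (-5.0, 0.0), "salience": 0.9},
    {"name": "nontarget", "pos": (0.0, 4.0), "salience": 0.1},
]

# Window spread over the whole display: the more salient distractor is selected.
wide = select_item(items, window_center=(0.0, 0.0), window_radius=10.0)

# Small window around the target location: the distractor no longer competes.
narrow = select_item(items, window_center=(2.0, 0.0), window_radius=3.0)
```

In this toy setting a wide window yields capture by the distractor, while a narrow window around the target location excludes it, which mirrors the contrast between divided and focused attention drawn in the text.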
Note that this interpretation invalidates the strict orthodox parallel-serial dichotomy. It suggests that search can be serial-to-parallel, with an increasingly finer attentional window until the target is detected. Directing attention to a location in space can not only prevent attentional capture but can also prevent oculomotor capture. In Theeuwes et al. (1998), Experiment 2, the location of the upcoming singleton target was endogenously cued by means of a central arrow. The results showed that the irrelevant abrupt onset no longer captured the eyes. If attention is endogenously focused on one particular location, an onset distractor at another location causes neither attentional nor oculomotor capture.
Singleton versus feature detection mode
Experiments addressing attentional capture show that under certain conditions irrelevant singletons capture attention while under other conditions these same singletons do not capture attention. When participants search for a target that is a singleton itself (as in the additional singleton paradigm) other more salient
singletons capture attention. On the other hand, if the target of search is not a feature singleton (e.g., the target is a letter among other letters or a conjunction target) then feature singletons do not capture attention (e.g., Jonides & Yantis, 1988; Theeuwes, 1990). To account for these differences, Bacon and Egeth (1994) suggested that participants may enter either a feature detection mode or a singleton detection mode. When participants engage in a singleton detection mode they choose to direct attention to the location having the largest feature contrast. When engaged in this mode, the most salient singleton will capture attention regardless of whether it is the target or not. Note, however, that according to Theeuwes such findings reflect bottom-up capture, while Bacon and Egeth (1994) suggest that this is a top-down strategy (i.e., the strategy is to engage in a singleton detection mode). When participants engage in a feature detection mode they choose to direct their attention to a particular feature (e.g., a green circle), and when choosing this mode there will be no attentional capture. The account of Bacon and Egeth (1994) has been accepted as a way to reconcile differences in attentional capture between the different paradigms. On the face of it, it seems like a reasonable explanation. However, it remains completely unclear why participants would choose a singleton detection mode when this mode will result in erroneous attentional capture. Take the experiments of Theeuwes (1992): participants always searched for a green circle (see Figure 1). It was known that in some conditions a red irrelevant square was present on each and every trial. Why would participants in this condition not switch to a feature detection mode (e.g., pick up the green circle) when this would prevent attentional capture by the red square? If these modes exist and they are under top-down control, as suggested by Bacon and Egeth (1994), switching would be the best option.
One may argue that switching is possible but that it would require some time to set or allow the switch. This explanation does not hold either. In Experiment 2 of Theeuwes (1992) participants received 6 blocks of 144 trials in which the irrelevant red square was present on each and every one of these 864 trials. This did not alter the results whatsoever: the red square interfered with search for the green circle. On more theoretical grounds it is also questionable whether one can hold on to the idea of these two attentional modes. If we simply follow the definition of attentional capture as given by Yantis and Egeth (1999) that "one can only speak of attentional capture in a purely stimulus-driven fashion when the stimulus feature in question is completely task-irrelevant", then it is clear that finding attentional capture by a singleton that is always completely task irrelevant (as in Theeuwes' experiments) is a clear case of attentional capture. If one argues that participants choose the mode to attend to task-irrelevant features in a top-down way, then it becomes basically impossible to argue on empirical grounds for attentional capture. If, against the instructions of the experimenter, attention is captured by an irrelevant
object or our eyes move in a completely wrong direction, and one would still claim that this reflects a top-down strategy, then it becomes impossible to empirically address issues regarding attentional and oculomotor capture. One way to reconcile the viewpoints of Bacon and Egeth regarding the two search modes is to use the notion that the size of the attentional window may be under top-down control. When searching for a singleton, the attentional window is set wide to encompass the whole stimulus display, and any salient element within that window can capture attention. If, however, the target of search is not a singleton, as in Bacon and Egeth's (1994) Experiment 2, then the size will be adjusted so that salient singletons outside the attentional window do not compete for attention (and do not capture attention). The data of Bacon and Egeth (1994) in fact suggest that this happened. In Experiment 2, when the target was no longer a feature singleton, search became partially serial through the display (i.e., small positive search functions of about 10 to 11 ms/item). Because search was partially serial (i.e., clumpwise search), the distracting effect of irrelevant singletons was greatly attenuated. Without claiming the existence of two search modes, one can simply argue that when search becomes serial (as in Yantis & Jonides, 1984; Theeuwes, 1990) or partly serial (as in Bacon & Egeth, 1994), distracting effects are attenuated or may even be absent. Whether search is serial, partly serial or parallel most likely depends on the stimulus material (i.e., target-nontarget relationship; nontarget-nontarget relationship, as suggested by Duncan & Humphreys, 1989).
How to measure attentional capture?
What we have seen is that attentional capture effects are small (15 to 25 ms) and that they are short-lived (capture occurs within 100 ms after display onset). When the eyes are captured by an onset, these effects also occur relatively fast. Mean eye latencies toward the onset are usually around 160 ms, and fixations on the onset only last about 90 ms. Within 250 to 300 ms the eyes are at the location of the target even when they first went to the onset. The fact that capture effects are small and occur within a very small time window suggests that one has to employ sensitive measures to find attentional capture. If the measures are not sensitive enough, one may find no effect of the irrelevant singleton and one may conclude inappropriately that there is top-down control. A good example of a more sensitive measure to determine attentional capture was recently offered by Gibson and Wenger (1999, 2000). They developed an alternative measure of attentional orienting that proved to be able to reveal attentional capture even when mean RTs do not show such an effect. Gibson and Wenger replicated the spatial cueing experiments of Folk et al. (1992) and also found that RTs were not affected by the presence of a cue when the cue did not match the attentional set. The null effect (no RT differences) when the cue and target do not
match (e.g., the target is a color singleton, the cue is an onset) has always been considered as evidence for the notion that attentional capture is contingent on top-down control settings. However, in addition to calculating mean RTs, Gibson and Wenger (2000) also calculated the integrated hazard functions based on the underlying response distributions. These functions represent the processing capacity as the cumulative amount of work performed over time. When a target and a cue match (e.g., the target is a color singleton and the cue is a color singleton) and they are presented at the same location, one will find RT benefits, and the alternative measure involving integrated hazard functions will show a higher processing capacity, i.e., observers can perform more work across time. This is exactly what Gibson and Wenger found. However, the more important condition is the condition in which the cue is assumed not to capture any attention because the cue and target did not match. In this condition, too, there was a clear increase of processing capacity, in particular for the short RTs. The temporary capacity benefit associated only with the shortest RTs also explains why there was no overall effect observed in RT: the same amount of work was performed in both valid and invalid conditions across most of the RT distributions. Gibson and Wenger (2000) concluded that, consistent with Theeuwes (1994), attention was captured by the cue regardless of whether it matched the target or not. These findings provide strong evidence against the contingent capture hypothesis of Folk et al. (1992). It is important to note that Gibson and Wenger point out that matching cues have a stronger and longer-lasting effect than non-matching cues, because attention may be disengaged more efficiently from non-matching cues than from matching cues, as suggested by Theeuwes (1994b).
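To illustrate why an integrated hazard function can reveal an effect that mean RT hides, the following Python sketch uses the standard definition H(t) = -ln S(t), where S(t) is the survivor function, i.e., the proportion of responses not yet completed by time t. The estimation procedure actually used by Gibson and Wenger is not described here, so this simple empirical estimator, the function name `integrated_hazard`, and the RT values are all assumptions made for illustration only.

```python
import math

def integrated_hazard(rts, t):
    """Empirical integrated hazard H(t) = -ln S(t), where S(t) is the
    proportion of response times in `rts` that are longer than t."""
    survivors = sum(1 for rt in rts if rt > t)
    s = survivors / len(rts)
    if s == 0:
        # Every response has finished by time t.
        return math.inf
    return -math.log(s)

# Made-up response times (ms) with identical means; not data from any study.
valid_cue = [400, 440, 500, 520, 540]    # some very fast responses
invalid_cue = [420, 460, 480, 520, 520]

mean_valid = sum(valid_cue) / len(valid_cue)
mean_invalid = sum(invalid_cue) / len(invalid_cue)

# Early in the distribution the valid-cue condition has completed more work.
early_valid = integrated_hazard(valid_cue, 410)
early_invalid = integrated_hazard(invalid_cue, 410)

# Later in the distribution the two conditions are indistinguishable.
late_valid = integrated_hazard(valid_cue, 510)
late_invalid = integrated_hazard(invalid_cue, 510)
```

In this invented example the two conditions have the same mean RT, yet the integrated hazard is higher for the valid-cue condition at the early time point and identical at the later one, which is the kind of short-lived capacity benefit the text describes.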
Another way to determine whether an element captures attention is to make use of the identity intrusion technique (see Figure 2; see e.g., Theeuwes & Burger, 1998). The main features of this technique were already discussed when addressing the question of whether there is spatial attentional capture or filtering costs. The basic idea underlying this technique is related to the Eriksen and Eriksen (1974) congruency manipulation, which was typically used for focused attention tasks. In this paradigm the singleton element that observers had to ignore was either identical to or different from the target element they were looking for. For example, in Theeuwes and Burger (1998) participants searched for the target letter E or R among a variable number of nontarget letters. In each display there was one letter that had a unique color, constituting the singleton that had to be ignored. The singleton to be ignored was either identical to the target letter ("congruent" condition: i.e., both letters were "E"s or "R"s) or different from the target letter ("incongruent" condition: the singleton was an "E" and the target letter was an "R", or vice versa). If participants could ignore the color singleton successfully, then it was expected that the identity of the singleton would have no effect on search for the target element. Alternatively, if participants were not capable of completely
ignoring the color singleton, then it was expected that the processing of a response-incongruent singleton would produce performance costs relative to a response-compatible singleton. This latter effect is referred to as the identity intrusion effect (see Theeuwes & Burger, 1998). Finding an identity intrusion effect not only shows that attention was directed at the location of the to-be-ignored item but also reveals that the presence of the item was associated with active stimulus processing. The results from this particular task, in which participants searched for a target letter while a to-be-ignored uniquely colored letter was present, indicate that it is only possible to ignore an irrelevant singleton when both the target color and the distractor color are known and remain fixed over trials. When either the color of the target or the color of the distractor was varied over trials, participants were no longer able to filter out the distractor. By measuring the identity intrusion effect one may have a more sensitive measure of attentional capture than the more traditional measures such as RT interference. Another way to assess attentional capture, which has recently been used quite frequently, is simply to ask participants whether they noticed anything unusual. In these so-called inattentional blindness studies (Mack & Rock, 1998), the notion is that an object that captures attention reaches our awareness and therefore one should be able to report such an object. For example, in the original Mack and Rock experiments participants viewed a briefly presented cross and had to decide which arm of the cross was longer. After several of these trials, another object was presented along with the cross. After the critical trial, participants were asked whether they noticed anything unusual that was not present on the other trials. When an unexpected object was presented at fixation (e.g., a uniquely salient colored object, or a moving object etc.)
while the cross was presented in the periphery, 75% of the participants did not report this object. The fact that many people did not notice the unusual and very salient object has led to the conclusion that salient objects do not capture attention. Obviously, this conclusion may be wrong. This type of research is based on the implicit assumption that objects that capture attention receive attention and therefore will enter our awareness and will be remembered for later report. Obviously there is no reason to assume such a relationship. In fact, there is not much evidence for such a relationship. For example, in our studies (e.g., Theeuwes et al., 1998) we showed that participants had no awareness at all and could not report anything about what captured their attention and their eye movements. Even though the eyes were captured and moved in the wrong direction over a distance of more than 12 degrees of visual angle, none of the participants reported anything about these erroneous eye movements. The bottom line is that the inattentional blindness studies simply use the wrong measure to determine attentional capture. The failure to report an object does not say anything about what captured attention, or what captured the eyes.
Even though inattentional blindness may have nothing to do with attentional capture, and as such should not be used as a measure for attentional capture, it remains an important research issue why objects that receive attention do not reach awareness and cannot be reported (see e.g., Simons, 2000). One way to explain this is to assume that attention did not stay long enough at the attended object to reach awareness. As we have shown, attentional capture takes place within 100 ms after display onset. One may argue that 100 ms at one location is not enough to reach awareness. In our recent oculomotor capture studies (Godijn & Theeuwes, submitted-b) we show that the eye needs to be at the location of an object for more than 100 ms in order to pick up the color information at that location. If the eyes fixated for less than 100 ms, no information was picked up.
Final comment
In this chapter we have shown that goal-directed (preattentive) selection is relatively ineffective. Attentional and oculomotor capture occur irrespective of any top-down control. Attentional capture effects are relatively small (15 to 25 ms) and short-lived (they occur within the first hundred ms after display onset). We have shown that other findings that do seem to suggest top-down control over attentional capture report such effects because the paradigms or the measures employed may not be sensitive enough to reveal these short-lived and small attentional capture effects.
References
Abrams, R.A. & Dobkin, R.S. (1994). Inhibition of return: Effect of attentional cuing on eye movement latencies. Journal of Experimental Psychology: Human Perception and Performance, 20, 467-477.
Bacon, W.F. & Egeth, H.E. (1994). Overriding stimulus-driven attention capture. Perception & Psychophysics, 55, 485-496.
Bahcall, D.O., & Kowler, E. (1999). Attentional interference at small spatial separations. Vision Research, 39, 71-86.
Breitmeyer, B.C., & Ganz L. (1976). Implications of sustained and transient channels for theories of visual pattern masking, saccadic suppression, and information processing. Psychological Review, 83, 1-36.
Caputo, G. & Guerra, S. (1998). Attentional selection by distractor suppression. Vision Research, 38, 669-689.
Deubel, H. & Schneider, W.X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36, 1827-1837.
Duncan, J., & Humphreys, G.W. (1989). Visual search and stimulus similarity. Psychological Review, 96, 433-458.
Egeth, H.E. & Yantis, S. (1997). Visual attention: control, representation and time course. Annual Review of Psychology, 48, 269-297.
Eriksen, B.A. & Eriksen, C.W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16, 143-149.
Eriksen, C.W., & Hoffman, J.E. (1972). Temporal and spatial characteristics of selective encoding from visual displays. Perception & Psychophysics, 12, 201-204.
Folk, C.L., Remington, R.W. & Johnston, J.C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Folk, C.L., Remington, R., & Wright, J.H. (1994). The structure of attentional control: Contingent attentional capture by apparent motion, abrupt onset and color. Journal of Experimental Psychology: Human Perception and Performance, 20, 317-329.
Folk, C.L. & Remington, R.W. (1998). Selectivity in distraction by irrelevant featural singletons: Evidence for two forms of attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 24, 847-858.
Gibson, B.S., & Wenger, M.J. (1999). A closer look at contingent capture. Paper presented at the 40th annual meeting of the Psychonomic Society.
Gibson, B.S. & Wenger, M.J. (2000). A new look at contingent capture. Unpublished manuscript.
Godijn, R.J. & Theeuwes, J. (submitted-a). Oculomotor capture and inhibition of return.
Godijn, R.J. & Theeuwes, J. (submitted-b). Parallel programming of saccades: Evidence for a competitive integration model.
Hoffman, J.E. & Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics, 57, 787-795.
Itti, L. & Koch, C. (2000). Saliency based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489-1506.
Irwin, D.E., Colcombe, A.M., Kramer, A.F., & Hahn, S. (2000). Attention and oculomotor capture by onset luminance and color singletons. Vision Research, 40, 1443-1458.
Jonides, J. & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43, 346-354.
Joseph, J.S. & Optican, L.M. (1996). Involuntary attentional shifts due to orientation differences. Perception & Psychophysics, 12, 201-204.
Joseph, J.S., Chun, M.M. & Nakayama, K. (1997). Attentional requirements in a "preattentive" feature search task. Nature, 387, 805-807.
Kahneman, D., Treisman, A. & Burkell, J. (1983). The costs of visual filtering. Journal of Experimental Psychology: Human Perception and Performance, 9, 510-522.
Kaptein, N.A., Theeuwes, J. & Van der Heijden, A.H.C. (1995). Search for a conjunctively defined target can be selectively limited to a color-defined subset of elements. Journal of Experimental Psychology: Human Perception & Performance, 21, 1053-1069.
Kawahara, J. & Toshima, T. (1996). Stimulus-driven control of attention: Evidence from visual search for moving target among static nontargets. The Japanese Journal of Psychonomic Science, 15, 77-87.
Koch, C. & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4, 219-227.
Kim, M.S. & Cave, K.R. (1999). Top-down and bottom-up attentional control: On the nature of interference from a salient distractor. Perception & Psychophysics, 61, 1009-1023.
Kim, M.S. & Cave, K.R. (2001). Perceptual grouping via spatial selection in a focused-attention task. Vision Research, 41, 611-624.
Klein, R. (1980). Does oculomotor readiness mediate cognitive control of attention? In R.S. Nickerson (Ed.), Attention and Performance VIII (pp. 259-276). Hillsdale, NJ: Erlbaum.
Klein, R.M. (2000). Inhibition of return. Trends in Cognitive Sciences, 4, 138-147.
Kramer, A.F., Hahn, S., Irwin, D.E., & Theeuwes, J. (1999). Attentional capture and aging: Implications for visual search performance and oculomotor control. Psychology and Aging, 14, 135-154.
Kwak, H.W., Dagenbach, D., & Egeth, H.E. (1991). Further evidence for a time-independent shift of the focus of attention. Perception & Psychophysics, 49, 473-480.
Lavie, N. (1995). Perceptual load as a necessary condition for selective attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 451-468.
Luck, S.J., Girelli, M., McDermott, M.T. & Ford, M.A. (1997). Bridging the gap between monkey neurophysiology and human perception: An ambiguity resolution theory of visual selective attention. Cognitive Psychology, 33, 64-87.
Kumada, T. (1999). Limitations in attending to a feature value for overriding stimulus-driven interference. Perception & Psychophysics, 61, 61-79.
Nothdurft, H.C. (2000). Salience from feature contrast: variations with texture density. Vision Research, 40, 3181-3200.
Mack, A. & Rock, I. (1998). Inattentional blindness. MIT Press.
Mounts, J.R.W. (2000). Attentional capture by abrupt onsets and feature singletons produces inhibitory surrounds. Perception & Psychophysics, 62, 1485-1493.
Posner, M.I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Posner, M.I. & Cohen, Y. (1984). Components of visual orienting. In H. Bouma & D. Bouwhuis (Eds.), Attention & Performance X (pp. 531-556). Hillsdale, NJ: Erlbaum.
Pratt, J., Sekuler, A.B. & McAuliffe, J. (2001). The role of attentional set on attentional cueing and inhibition of return. Visual Cognition, 8, 33-46.
Pratt, J., Hillis, J. & Gold, J.M. (in press). The effect of the physical characteristics of cues and targets on facilitation and inhibition. Psychonomic Bulletin & Review.
Remington, R.W. (1980). Attention and saccadic eye movements. Journal of Experimental Psychology: Human Perception & Performance, 6, 726-744.
Remington, R.W., Johnston, J.C. & Yantis, S. (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51, 279-290.
Rizzolatti, G., Riggio, L., Dascola, I. & Umilta, C. (1987). Reorienting attention across the horizontal and vertical meridians: evidence in favor of a premotor theory of attention. Neuropsychologia, 25, 31-40.
Sagi, D. & Julesz, B. (1985). Detection versus discrimination of visual orientation. Perception, 14, 619-628.
Simons, D.J. (2000). Attentional capture and inattentional blindness. Trends in Cognitive Sciences, 4, 147-155.
Taylor, T.L. & Klein, R.M. (1998). On the causes and effects of inhibition of return. Psychonomic Bulletin & Review, 5, 625-643.
Theeuwes, J. (1990). Perceptual selectivity is task dependent: Evidence from selective search. Acta Psychologica, 74, 81-99.
Theeuwes, J. (1991a). Cross-dimensional perceptual selectivity. Perception & Psychophysics, 50, 184-193.
Theeuwes, J. (1991b). Exogenous and endogenous control of attention: The effect of visual onsets and offsets. Perception & Psychophysics, 49, 83-90.
Theeuwes, J. (1991c). Selective search for the target properties color and form. Report IZF 1991 B-13, TNO Institute for Perception, Soesterberg.
Theeuwes, J. (1992). Perceptual selectivity for color and form. Perception & Psychophysics, 51, 599-606.
Theeuwes, J. (1993). Visual selective attention: a theoretical analysis. Acta Psychologica, 83, 93-154.
Theeuwes, J. (1994a). Stimulus-driven capture and attentional set: selective search for color and visual abrupt onsets. Journal of Experimental Psychology: Human Perception and Performance, 20, 799-806.
Theeuwes, J. (1994b). Endogenous and exogenous control of visual selection. Perception, 23, 429-440.
Theeuwes, J. (1995a). Abrupt luminance change pops out; abrupt color change does not. Perception & Psychophysics, 57, 637-644.
Theeuwes, J. (1995b). Temporal and spatial characteristics of preattentive and attentive processing. Visual Cognition, 2, 221-233.
Theeuwes, J. (1996). Perceptual selectivity for color and form: On the nature of the interference effect. In A.F. Kramer, M.G.H. Coles & G.D. Logan (Eds.), Converging Operations in the Study of Visual Attention (pp. 297-314). Washington DC: American Psychological Association.
Theeuwes, J. & Burger, R. (1998). Attentional control during visual search: The effect of irrelevant singletons. Journal of Experimental Psychology: Human Perception and Performance, 24, 1342-1353.
Theeuwes, J., Kramer, A.F., Hahn, S. & Irwin, D.E. (1998). Our eyes do not always go where we want them to go: capture of the eyes by new objects. Psychological Science, 9, 379-385.
Theeuwes, J., Kramer, A.F., Hahn, S., Irwin, D.E. & Zelinsky, G.J. (1999a). Influence of attentional capture on oculomotor control. Journal of Experimental Psychology: Human Perception & Performance, 25, 1595-1608.
Theeuwes, J., Atchley, P. & Kramer, A.F. (2000). On the time course of top-down and bottom-up control of visual attention. In S. Monsell & J. Driver (Eds.), Attention & Performance (Vol. 18). Cambridge: MIT Press.
Theeuwes, J., Kramer, A.F. & Atchley, P. (2001). Spatial attention in early vision. Acta Psychologica, 108, 1-20.
Theeuwes, J. & Godijn, R.J. (submitted). Irrelevant singletons capture attention: evidence from inhibition of return.
Todd, S. & Kramer, A.F. (1994). Attentional misguidance in visual search. Perception & Psychophysics, 56, 198-210.
Turatto, M. & Galfano, G. (2000). Color, form and luminance capture attention in visual search. Vision Research, 40, 1639-1643.
Tsal, Y. (1983). Movements of attention across the visual field. Journal of Experimental Psychology: Human Perception & Performance, 9, 523-530.
Yantis, S. (1996). Attentional capture in vision. In A.F. Kramer, M.G.H. Coles & G.D. Logan (Eds.), Converging Operations in the Study of Visual Attention (pp. 45-76). Washington DC: American Psychological Association.
Yantis, S. (2000). Goal directed and stimulus driven determinants of attentional control. In S. Monsell & J. Driver (Eds.), Attention and Performance (Vol. 18). Cambridge: MIT Press.
Yantis, S. & Egeth, H.E. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception & Performance, 25, 661-676.
Yantis, S. & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from selective search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621.
Yantis, S. & Jonides, J. (1990). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.

Authors' Notes

Correspondence concerning this article should be addressed to Jan Theeuwes, Dept. of Cognitive Psychology, Vrije Universiteit, Van der Boechorststraat 1, 1081 BT Amsterdam, The Netherlands. Electronic mail may be sent to [email protected].
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© 2001 Elsevier Science B.V. All rights reserved.
Attention Capture, Orienting, and Awareness

Steven B. Most and Daniel J. Simons
Any viable model of attention must navigate between two requirements: the need for sustainability and the need for interruptibility (Allport, 1989). Successful completion of our goals often requires extended periods of focused attention. For example, building a house out of playing cards requires sustained attention to the cards, the table, and the position of your hands. Any disruption of focused attention would likely result in failure. However, some distractions are important to notice. If a lion were to appear just as you were putting the last couple of cards in place, it would be better to notice it than to obliviously finish building the house; attentional engagement should be interrupted or re-allocated in the face of unexpected dangers.

This book focuses on how and when attention is diverted away from a primary goal by an irrelevant or unexpected event, a phenomenon known as attention capture. This definition of attention capture accommodates most of the current approaches to studying the diversion of attention, in part because it is sufficiently broad. However, conflicting operational definitions of capture, based on different assumptions about the role of attention, have muddied the theoretical landscape of capture research. For example, the term "attention capture" sometimes refers to changes in response time caused by irrelevant stimuli, regardless of whether or not the stimuli also capture awareness. This operational definition implies that attention itself is a mechanism by which selection occurs prior to or independent of awareness. What, then, if not attention, drives processing of items to the level of consciousness? On the other hand, the notion of inattentional blindness--the failure to become cognizant of an unexpected stimulus when already engaged in a primary task (e.g., Mack & Rock, 1998; Most et al., 2001)--assumes that if an unexpected stimulus does not reach awareness, it must not have been attended.
If, as implied by this assumption, attention functions as the gateway to awareness, what is captured when response times are affected in the absence of awareness? We use the term implicit attention capture to refer to instances when irrelevant stimuli affect response times in a primary task but do not necessarily lead to awareness. We use the term explicit attention capture when unexpected stimuli leap into conscious awareness despite an individual's efforts to attend to something else (Simons, 2000). The explicit capture of awareness is perhaps the more intuitive notion of attention capture. When someone remarks that an attractive stranger at a dinner party "captured their attention," we usually take this to mean that they noticed the stranger--not that they were a moment slower in reaching for a cocktail
wiener. There are moments when it is absolutely essential for objects and events to enter our consciousness. When a small child runs into the path of our car, it is essential that we register him explicitly. The distinction between implicit and explicit capture provides a reasonable way to cluster existing capture research (Simons, 2000), but few studies have directly explored the relationship between implicit and explicit capture (see Gibson, this volume, for initial studies along these lines). For example, studies using implicit measures of capture often ignore entirely whether or not observers were aware of the capturing stimulus, and studies of explicit capture typically ignore the involuntary spatial orienting often measured in implicit capture studies. Consequently, it would be premature to make strong claims about the functional independence of these forms of capture. Nonetheless, in this chapter we hope to show that a complete understanding of attention capture depends on considering both implicit and explicit measures, and we provide a model for how these forms might interact. Readers should keep in mind, however, that the implicit and explicit capture literatures are largely separate, and that the relationship between these literatures has not yet been firmly established. Our model stresses the need for research that measures both implicit effects on performance and explicit effects on awareness by providing one speculative view of how these distinct forms of capture might interact. The attention literature is replete with proposals to account for different types of attention shifts. For example, attention shifts can be characterized based on the degree to which they are voluntarily directed (e.g., Jonides, 1981) and based on their relative time-courses (e.g., Nakayama & Mackeben, 1989). To complicate matters, these dimensions are neither cleanly overlapping nor completely orthogonal. 
The central goal of this chapter is to place findings using implicit and explicit measures of capture into the broader context of research on attention. We first review evidence for different forms of attention orienting, noting the distinction between voluntary orienting and reflexive orienting. We then review evidence for implicit attention capture, noting the difficulty in determining when orienting is entirely reflexive and when it is influenced by the observer's expectations. Given this difficulty, we suggest that implicit and explicit capture can be better understood by appealing to the distinction between transient and sustained components of attention. After discussing evidence for implicit capture, we turn our focus to the explicit capture of awareness, providing an overview of recent work from our lab on selective looking and inattentional blindness. Finally, we reintroduce and update Neisser's (1976) model of a perceptual cycle as a way to integrate implicit and explicit capture into a single framework. By considering different forms of orienting as well as the importance of both implicit and explicit capture, we can gain a more complete understanding of the role of capture in perception and awareness.
Orienting and attention

Most early work on attention capture focused on visual orienting: shifts of attention in response to visual cues or other stimuli. Unlike recent work on implicit
attention capture, early empirical work on orienting emphasized the relationship between attention and awareness. Studies explored how orienting affects conscious awareness of a target, finding that subjects are faster, more accurate, and more likely to detect targets when a prior cue forewarns them where they should shift their attention (e.g., Colegate, Hoffman, & Eriksen, 1973; Eriksen & Hoffman, 1972). This section briefly reviews evidence for different forms of orienting and the nature of the cues that drive them. Orienting can produce observable behavioral responses such as eye or head movements, but it can also be measured in the absence of observable behavior (Posner, 1980; Posner & Petersen, 1990; Posner, Snyder, & Davidson, 1980). Such covert orienting is typically inferred from differences in response times between trials in which a cue accurately predicts the target location (a valid cue) and trials in which it signals the wrong location (an invalid cue). If observers are better able to respond to targets on valid than on invalid trials, then they must have oriented to the presence of the cue. Perhaps the most important distinction to arise from studies of orienting is that between reflexive (exogenous) and voluntary (endogenous) shifts of attention. To explore the difference between these two forms of attention shifts, orienting studies have used two distinct types of cueing. Peripheral cues appear away from fixation, usually at the location of the target (e.g., Eriksen & Hoffman, 1972; Eriksen & Hoffman, 1973) whereas central cues appear at fixation and indicate symbolically where the target is likely to appear (e.g., the cue might be an arrow; see Posner, 1980). Because central cues are separated from the location of the target of an attention shift, they must be interpreted before the observer knows where to direct attention. 
In contrast, peripheral cues typically appear at the target location, so they require no such cognitive effort or interpretation. Consistent with this claim, shifts following peripheral cues are unaffected by concurrent cognitive load and are difficult to suppress (Jonides, 1981). Furthermore, they lead to faster responses to valid cues and slower responses to invalid cues than do central cues. Thus, compared to shifts in response to central cues, attention shifts following peripheral cues appear to be more reflexive and automatic (Jonides, 1981).

Building on these initial demonstrations that peripheral cues produce reflexive orienting, more recent research has sought to determine whether any peripheral cue can automatically capture attention or whether capture depends on the characteristics of the cue. Note that this question is not a new one: William James (1950/1890) suggested that one class of cue, object motion, is especially likely to draw attention: "movement is the quality by which animals most easily attract each other's attention. The instinct of 'shamming death' is no shamming of death at all, but rather a paralysis through fear, which saves the insect, crustacean, or other creature from being noticed at all by his enemy. It is paralleled in the human race by the breath-holding stillness of the boy playing 'I spy' to whom the seeker is near; and its obverse side is shown in our involuntary waving of arms, jumping up and down, and so forth, when we wish to attract someone's attention at a distance." (pp. 173-174)
Although it has long been recognized that not all cues are equally effective in drawing attention, findings of reflexive orienting in response to peripheral cues led to a marked shift in the focus of attention research. Rather than exploring how attention influences conscious detection of a target, much recent work has focused on the sorts of stimuli that induce automatic orienting. This emphasis on the characteristics of the cue rather than on conscious detection of the target has led to the operationalization of attention capture as an effect on response time, or implicit capture.
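The cue-validity logic described in this section (covert orienting inferred from faster responses on valid than on invalid trials) can be illustrated with a small calculation. The following is only a minimal sketch: the function name `validity_effect`, the trial format, and the response times are all hypothetical and made up for illustration; they are not taken from any of the studies cited here.

```python
from statistics import mean


def validity_effect(trials):
    """Return mean invalid-cue RT minus mean valid-cue RT, in ms.

    `trials` is a list of (cue_type, rt_ms) pairs, where cue_type is
    "valid" or "invalid". A positive value means responses were faster
    after valid cues, the pattern taken as evidence of orienting.
    """
    valid = [rt for cue, rt in trials if cue == "valid"]
    invalid = [rt for cue, rt in trials if cue == "invalid"]
    if not valid or not invalid:
        raise ValueError("need at least one valid and one invalid trial")
    return mean(invalid) - mean(valid)


# Made-up response times (ms), for illustration only.
example_trials = [
    ("valid", 400), ("valid", 420),
    ("invalid", 450), ("invalid", 470),
]
effect = validity_effect(example_trials)
```

With these invented numbers the difference is positive, which under the logic above would be read as orienting toward the cue. A real analysis would also need to handle errors, outliers, and per-subject variability.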
Implicit evidence for capture

Most studies infer the presence of capture from response times in visual search tasks, without systematically measuring whether or not the capturing stimulus was explicitly noticed. The conclusions drawn from these studies are far from consistent. Some suggest that only a select class of stimuli produce automatic, reflexive orienting whereas others suggest that many stimuli will capture if observers have the appropriate expectations. More importantly, some of these tasks lead to the conclusion that capture only occurs if observers have the appropriate attention set. In other words, differences in task characteristics have led to a debate about whether attention shifts can ever be determined solely by the properties of the stimulus or whether all capture depends on the observer's expectations. If, as some suggest, there is no such thing as a completely stimulus-driven shift of attention, does it make sense to talk about attention "capture" at all? This section reviews evidence from studies using implicit measures of capture, focusing especially on this debate.

The stimulus perhaps most frequently mentioned as one that might capture attention is the abrupt onset of a new stimulus. Indeed, when subjects engage in a speeded search for a target, abruptly appearing stimuli consistently seem to affect response time (e.g., Jonides & Yantis, 1988; Yantis & Jonides, 1984). In fact, studies using the irrelevant feature task often find that only abrupt onsets capture attention. In a typical experiment using this task, subjects search for a letter target in an array. Initially, the array contains a set of place-markers in the form of figure-8s. Then, after 1 second, segments on the place markers disappear to reveal letters. At the same time that these segments disappear, an additional letter appears in a location not previously occupied by a place marker.
This new letter constitutes an abrupt onset because no object had previously occupied that location. However, the location of this abrupt onset is not predictive of the target location--it is the target of the search only 1/n of the time, where n is the number of items in the display. Under these conditions, capture is inferred from the relationship between the search speed and the number of items in the display. If the onset captures attention, then observers should search that item first. Thus, on those trials when the onset just happens to be the target of the search, search speed should be unaffected by the number of items in the display.
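The inference described above rests on the slope relating response time to display size. As a rough sketch of that reasoning, the code below fits a least-squares slope (ms per item) to mean response times. The helper names `search_slope` and `onset_target_probability`, the display sizes, and the response times are hypothetical and invented for illustration; they do not reproduce data from the cited experiments.

```python
def search_slope(display_sizes, mean_rts):
    """Least-squares slope of mean RT against display size (ms per item)."""
    n = len(display_sizes)
    if n < 2 or n != len(mean_rts):
        raise ValueError("need matching lists with at least two points")
    mean_x = sum(display_sizes) / n
    mean_y = sum(mean_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in display_sizes)
    if sxx == 0:
        raise ValueError("display sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(display_sizes, mean_rts))
    return sxy / sxx


def onset_target_probability(n_items):
    """Chance that the onset item is the target when it is non-predictive."""
    return 1 / n_items


# Invented mean RTs (ms) at three display sizes, for illustration only.
sizes = [3, 5, 7]
onset_is_target = [500, 502, 504]      # nearly flat: onset searched first
onset_not_target = [520, 560, 600]     # RT grows with display size

flat_slope = search_slope(sizes, onset_is_target)
steep_slope = search_slope(sizes, onset_not_target)
```

Under the logic in the text, a near-zero slope on onset-target trials alongside a clearly positive slope on the remaining trials would be the signature of capture by the onset.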
In the irrelevant feature task, abrupt onsets appear to be special: color, luminance, and motion singletons are not as effective at capturing attention (Hillstrom & Yantis, 1994; Jonides & Yantis, 1988; but see Franconeri & Simons, 2001 for evidence that some types of motion do capture in this task). One interpretation of this finding is that abrupt onsets signify the appearance of a new object, an event which might have special significance for the visual system (Yantis & Hillstrom, 1994). However, whatever special role onsets may play, they only seem to capture attention when observers are uncertain about where the target will appear. Sudden onsets do not capture attention when subjects know in advance where to attend in order to see the target (Yantis & Jonides, 1990). Although most findings from the irrelevant feature task suggest that only onsets capture attention, results from the additional singleton task suggest instead that the most salient features in a scene determine capture. In these studies, observers typically search for a target defined by a unique color or shape (e.g., a green circle among red circles or a green circle among green diamonds) and report the orientation of a line positioned inside the target (Theeuwes, 1992; Theeuwes, 1994). On some trials, an additional distracter singleton appears (e.g., a blue item) and on other trials no such distracter appears. The additional singleton is antipredictive of the target location (i.e., it is never the target of the search) and capture is indicated by a slowed search in the presence of this additional singleton. If observers are unable to avoid attending to this additional singleton, then it will slow their search, thereby indicating attention capture. Unlike the irrelevant feature task, in the additional singleton paradigm, not only can sudden onsets capture attention, but so can color and shape singletons (Theeuwes, 1992; Theeuwes, 1994). 
Capture is determined by the salience of the additional singleton when compared to the rest of the items in the display. For example, when subjects search for a red item among green ones, an irrelevant shape singleton does not affect search speed. However, when the color discrimination is more difficult, the irrelevant shape singleton does affect response time (Theeuwes, 1992). One interpretation of findings from the additional singleton task is that the visual system calculates how different each item is from the rest of the display, preattentively and in parallel, and attention is allocated serially and in descending order, starting with items having the largest difference signals (Theeuwes, 1992; Theeuwes, 1994). Alternatively, interference may depend on the search strategy observers adopt. If observers know that the target of their search will be a unique item, they will actively search for singletons in the display. Consequently, if the additional item is also a singleton, they would search it as well. That is, observers establish an attention set for singletons, and any singleton that appears in the display will disrupt performance (Bacon & Egeth, 1994). According to this view, capture occurs not because it is the most salient item in the display, but because observers are actively searching for singletons. Consistent with this interpretation, when the target appears amid a heterogeneous assortment of distracters, thereby eliminating the usefulness of the strategy of searching for a singleton, additional form and color singletons no longer capture attention (Bacon & Egeth, 1994).
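The salience-based interpretation above (a difference signal computed for every item, with attention visiting items in descending order of that signal) can be sketched as a toy computation. This is an illustrative assumption, not the model used in the cited work: representing each item by a single number, the mean-absolute-difference measure, and the name `attention_order` are all hypothetical simplifications.

```python
def attention_order(items):
    """Order item names by descending difference signal.

    `items` maps an item name to one numeric feature value (a toy
    stand-in for color, shape, etc.). The difference signal of an item
    is its mean absolute difference from every other item.
    """
    if len(items) < 2:
        raise ValueError("need at least two items in the display")

    def difference_signal(name):
        others = [value for key, value in items.items() if key != name]
        return sum(abs(items[name] - value) for value in others) / len(others)

    # sorted() is stable, so items with equal signals keep display order.
    return sorted(items, key=difference_signal, reverse=True)


# Invented feature values: the irrelevant singleton differs most from the
# distracters, the target differs less.
display = {"target": 10, "singleton": 30, "d1": 0, "d2": 0, "d3": 0, "d4": 0}
order = attention_order(display)
```

In this invented display the irrelevant singleton has the largest difference signal and is therefore visited before the target, which is the kind of ordering that would slow search in the additional singleton task. Making the target more distinct than the singleton reverses the order, consistent with the observation that the singleton interferes only when it is the more salient item.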
The notion that the observer's attention set can influence whether or not stimuli capture attention gains additional support from yet another search task. Evidence from the irrelevant pre-cue task indicates that top-down control can override reflexive orienting, thereby suggesting that all attention capture is mediated by the observer's attention set (Folk & Remington, 1998; Folk & Remington, 1999; Folk, Remington, & Johnston, 1992; Folk, Remington, & Wright, 1994). In this task, observers make a speeded response to a pre-specified target that appears in the location of one of four peripheral placeholders. Just prior to the target display, a pre-cue appears at one of the placeholders. Capture is indicated when an invalid pre-cue slows performance on the search task--it draws attention to the wrong location. In this task, when the pre-cue is a unique color, it disrupts performance only if observers are searching for a uniquely colored item; if observers are searching for an abruptly onsetting item, the color pre-cue does not capture attention. Similarly, when the pre-cue is a sudden onset, it captures attention when subjects are searching for an onset, but not when they are searching for a unique color. Thus, orienting is contingent upon the demands of the task; observers establish an attention set for the target feature, and any feature that subsequently matches their attention set will capture attention. This idea is known as the contingent involuntary orienting hypothesis (Folk et al., 1992). More recent work (Folk & Remington, 1998) further suggests that these control settings can restrict capture to specific values on a feature dimension (e.g., red items will only capture attention when observers have a set for red). The claim that attention capture is contingent on the observer's attention set has been controversial.
The primary support for this claim comes from the pre-cue task, but this task is open to the same criticism levied at the additional singleton task: because of the task demands, performance may be heavily influenced by the observer's strategy. In order to verify that capture is driven entirely by the stimulus itself, the task must be devoid of such strategic influences. Because subjects in the pre-cue task know that the target will be characterized by a critical feature, they might intentionally marshal attention resources the moment a stimulus sharing that feature appears. Thus, the influence of the pre-cue on task performance might result from a voluntary attention shift rather than from capture by the stimulus itself (Yantis, 1993). Nevertheless, the finding that irrelevant onsets do not always capture attention, even in cases where observers are uncertain about the target's location, has constrained claims about involuntary orienting. Onsets may be unique in their power to capture attention, but their impact might be limited to cases in which observers have not already established a search strategy or an attention set for another feature (Yantis, 1993). The idea of contingent orienting raises a more general problem for claims that a stimulus can influence processing independent of the attention set of the observer. The irrelevant feature task assumes that observers have no task-relevant attention set--that observers are effectively in a "neutral" attention state (Yantis, 1993). However, observers may never be completely free of constraints on attention (Folk, Remington, & Johnston, 1993). Past experiences or individual differences
might lead to long-lasting attentional biases, regardless of the demands of the current task. For example, clinically anxious patients respond faster when a target appears in the location previously occupied by a threat cue than in a location previously occupied by a neutral cue (MacLeod, Mathews, & Tata, 1986). These patients may have developed a default attention set for threat-related stimuli, which could influence performance even when the threatening nature of the cue is irrelevant. Perhaps a lifetime of experiences can contribute to default attention biases such that no person is ever in a truly "neutral" state. The notion that attention settings might vary across individuals is entirely consistent with the contingent involuntary orienting hypothesis (Folk et al., 1992).

The difficulty of determining an observer's default attention set raises a broader concern about the dichotomy between stimulus-driven, exogenous orienting of attention and goal-directed, endogenous orienting. Variability in the default attention set, together with findings that strategic control can modulate involuntary attention shifts (Bacon & Egeth, 1994; Folk et al., 1992; Müller & Rabbitt, 1989; Yantis & Jonides, 1990) and with evidence that people can learn to orient to parts of a transient peripheral cue (Kristjánsson, Mackeben, & Nakayama, in press; Kristjánsson & Nakayama, submitted), contribute to a blurring of the exogenous/endogenous dichotomy.
An alternative dichotomy: The time-courses of orienting

The distinction between exogenous and endogenous orienting divides attention shifts into those driven by the stimulus and those driven by the intentions of the observer. Consequently, it focuses on the locus of control (external vs. internal) for attention shifts rather than on the nature of the attention shifts themselves. Clearly some attention shifts are relatively reflexive and others are more voluntary (Jonides, 1981; Müller & Findlay, 1988; Nakayama & Mackeben, 1989; Posner, 1980). Although this distinction does provide a useful categorization of attention shifts, a different distinction might help to bypass the somewhat fuzzy category boundary between voluntary and involuntary shifts. Specifically, two distinct classes of attention shifts emerge when we focus on the time course of shifts rather than on the factors driving the shift. These distinct timing profiles are related to the locus of control of the shift, but the mapping might not be perfect.

One class of shift tends to be transient in nature: facilitation of processing at the cued location occurs almost immediately, but the effectiveness of the cue diminishes rapidly. This decline in effectiveness is particularly interesting because it appears obligatory, occurring even when subjects know that the cue validly predicts the target location (Nakayama & Mackeben, 1989). That is, following this sort of attention shift, processing at the cued location is temporary and is inhibited briefly following peak facilitation, even if observers try to avoid this inhibition. Such transient shifts are generally thought to be reflexive rather than voluntary. This sort of transient orienting to a cue is maximally effective when the cue precedes a target by 50-250 ms (Müller & Findlay, 1988; Nakayama & Mackeben, 1989); if
the cue precedes the target by less than 50 ms, attention does not have enough time to shift to the cued location before the target appears, so facilitation is reduced. If the cue precedes the target by more than 250 ms, attention can shift to the cue, but the facilitation at the cued location diminishes and inhibitory processes take effect.

In contrast to transient shifts, another class of attention shifts tends to produce maximal facilitation only after as much as 300 ms from the appearance of the cue (Müller & Findlay, 1988). Following this somewhat delayed shift of attention, processing benefits at the cued location can persist for an extended period. That is, such sustained shifts lead to a benefit that survives even a substantial delay between the cue and target (Müller & Findlay, 1988; Müller & Rabbitt, 1989; Nakayama & Mackeben, 1989). Thus, whereas transient shifts tend to be relatively reflexive, sustained shifts tend to be associated with voluntary control. For example, peripheral cues such as a flash often elicit transient attention shifts, whereas central cues such as an arrow can elicit sustained, voluntary attention shifts.

Unfortunately, most studies of attention capture have not measured the time course of the attention shift to the capturing stimulus. It is possible that the kinds of processing costs and benefits apparent in abrupt onset studies reflect transient shifts of attention, in which case processing of a valid onset item should be slowed given an appropriate lead time. More extensive studies of the presence of transient or sustained shifts in capture tasks might help to account for some of the variability in the literature without the need to determine whether or not a shift was influenced by an attention set or a voluntary goal. In some respects, the distinction between transient and sustained orienting might provide a truer dichotomy than the exogenous/endogenous distinction.
Transient and sustained shifts might well reflect the operation of independent attention mechanisms for orienting, conceivably with different functions (see Briand & Klein, 1987 for further discussion of independent attention mechanisms). In fact, evidence suggests that they can operate simultaneously and independently. When cues eliciting sustained and transient orienting (e.g., an arrow and a flash) are presented in the same display, their effects are additive if the shifts are compatible and subtractive if they are incompatible. That is, if the flash and an arrow both cue the same location, facilitation is greater than when they cue conflicting locations. If they cue different locations, then peripheral cueing interferes with sustained orienting more strongly than central cueing interferes with transient orienting. However, facilitation is attenuated in both cases (Müller & Rabbitt, 1989). The potential independence of these two forms of orienting might allow observers to maintain sustained attention on one aspect of a display without precluding transient attention capture by sudden, important events. We consider a possible functional role for the interplay between transient and sustained attention more fully later in the chapter.
Interim summary
Early work in modern attention research focused on the relation between attention and conscious detection of a target (Colegate et al., 1973; Eriksen & Hoffman, 1972; Posner et al., 1980). However, the discovery that peripheral cues tend to draw attention more automatically than central cues (Jonides, 1981) led to a surge of interest in the kinds of stimuli capable of automatically attracting attentional resources. Research has since investigated the effects of stimulus salience (Theeuwes, 1992; Theeuwes, 1994), abrupt onsets (Hillstrom & Yantis, 1994; Jonides & Yantis, 1988; Yantis & Hillstrom, 1994; Yantis & Jonides, 1984; Yantis & Jonides, 1990), and top-down control settings (Bacon & Egeth, 1994; Folk & Remington, 1998; Folk & Remington, 1999; Folk et al., 1992) on attention capture. Much of this research has focused on the distinction between exogenously- and endogenously-driven shifts of attention. However, this distinction might be a red herring, displacing the components of attention themselves as the main focus of investigation. An emphasis on the distinction between transient and sustained attention shifts may prove more useful, especially when both implicit and explicit aspects of attention capture are considered together. In the next section we provide an overview of work on explicit attention capture, and in the final section, we attempt to integrate these findings with the implicit capture literature.
Selective Looking, Inattentional Blindness, and Explicit Attention Capture
Orienting and awareness
Most of the studies reviewed thus far inferred the existence of attention capture from indirect response time measures, without systematically measuring whether or not stimuli also capture conscious awareness. If implicit attention capture were always associated with instances of conscious detection, the need to measure explicit awareness would be obviated. However, this is not the case. For example, despite the robust effects of abrupt onsets on response time measures, subjects often report not noticing the onsets (Yantis, 1993, footnote 2). The dynamic signal produced by the onset has no temporal persistence, and observers do not always notice its occurrence even if they are drawn to it. The notion that orienting may occur independently of awareness has been with us for some time. To quote Posner, Snyder, and Davidson (1980): "...it is possible to entertain the hypothesis that subjects may orient toward a signal without having first detected it. This would mean simply that the signal was capable of eliciting certain kinds of response (e.g., eye movements or shifts of attention) but has not yet reached systems capable of generating responses not habitual for that type of signal" (p. 162). This hypothesis gains further support from work with blindsight patients (Kentridge, Heywood, & Weiskrantz, 1999) as well as nonpatient populations (Lambert, Naikar, McLachlan, & Aitken, 1999; McCormick, 1997). For example, normal subjects were given a task in which a target could
appear in one of two locations, preceded by a cue that produced transient orienting. In contrast to typical cueing tasks, the target was more likely to appear in the uncued location and observers knew this. When the cue was visible to observers, they were faster to respond in the target location (opposite the cue). That is, they were able to make a voluntary attention shift to the appropriate location. However, when the cue was presented below the threshold for conscious awareness, observers were faster to respond to a target appearing in the cued location, indicating that they had oriented to the cue without consciously perceiving it (McCormick, 1997). Such findings of orienting without awareness are consistent with the operational definition of capture used in the irrelevant feature, additional singleton, and irrelevant pre-cue tasks: provided that a stimulus automatically influences behavior, it can be said to have captured attention. Awareness of the stimulus is irrelevant in such tasks. However, not all researchers accept this operational definition of capture. An alternative approach argues that a stimulus has not captured attention unless it has entered into conscious awareness (e.g., Mack & Rock, 1998). According to this view, "attention is nothing but perception" (Neisser, 1976, p. 87). Effects on performance might involve the diversion of attentional resources, but they do not necessarily involve attention capture. Attention capture must involve the explicit capture of awareness by a previously unexpected stimulus. Paralleling studies using implicit measures, studies of explicit capture have rarely assessed indirect evidence such as response times, instead focusing exclusively on awareness. Thus, they provide little further insight into the relationship between orienting and awareness. 
As a result, studies of explicit capture are consistent with the possibility that awareness can occur without prior orienting or with the possibility that orienting always precedes awareness. In the sections that follow, we propose a model in which we assume that orienting must precede awareness. This supposition is speculative and has yet to be empirically tested. However, the model based on this assumption produces a number of interesting and testable empirical predictions.
The perceptual cycle
One framework that helps to explain the difference between these two definitions of capture draws on what Neisser (1976) termed the perceptual cycle. According to this model, conscious perception is a gradual, constructive process, rather than an all-or-none phenomenon. Observers have schemas or expectations for what belongs in the scene (i.e., which objects should be present, what they should look like, etc.), which are modified by information in the environment. These schemas guide attention, thereby allowing the observer to pick up more information from the scene. As observers gain more details about the objects in the world, they accommodate their schemas to these details and adjust subsequent visual exploration appropriately (see Figure 1). The immediate past constantly guides subsequent information processing.
The perceptual cycle model has two central tenets: 1) conscious awareness of a stimulus accumulates gradually, and 2) the observer plays an active role in this process. Therefore, awareness of a stimulus requires a degree of sustained processing, and unless it is incorporated into a cycle of expectation and exploration, it might not be "seen" at all (Neisser, 1979). In order for unexpected stimuli to be seen they must either modify the existing perceptual cycle or trigger the formation of a new cycle. In order to modify an existing cycle, the stimulus must be
Figure 1. Schematic diagram of a perceptual cycle. Adapted from Neisser (1976).
sufficiently relevant to the current cycle to be noticed in the exploration phase. Some classes of stimuli, such as brightly flashed lights, might automatically trigger the formation of a new perceptual cycle or they could simply induce a transient orienting response without conscious perception. Observers will only become aware of an unexpected stimulus if it is available long enough for a complete cycle of accommodation and exploration to take place. 1 Furthermore, if a person is already engaged in a perceptual cycle--for example, engaged in an attentionally demanding task--then even new stimuli that elicit transient attention shifts and are present for prolonged periods of time may fail to capture awareness, because to do so would mean interrupting the current cycle.
Inattentional blindness and early studies of selective looking
The hypothesis that new stimuli will remain "unseen" if they fail to interrupt or modify an ongoing perceptual cycle gains support from increasing evidence for inattentional blindness, the finding that unexpected salient events often go unnoticed when attention is otherwise engaged (Mack & Rock, 1998). In a typical inattentional blindness task, observers view a series of trials in which a cross appears for 200 ms followed immediately by a mask. On each trial, they judge whether the horizontal or the vertical line of the cross was longer. After a few trials in which nothing else happens, a critical trial occurs in which an additional stimulus appears simultaneously with the cross. After that trial, observers are asked if they
saw anything other than the cross. Under these conditions, nearly 25% of observers failed to detect the additional stimulus, even when it had a unique color, shape, or motion (Mack & Rock, 1998). 2 In these inattentional blindness studies, the unexpected object was only present for 200ms. Consequently, observers might not have had time to complete a perceptual cycle. However, even unexpected objects that are visible for extended periods of time can escape detection if the observer's attention is otherwise engaged. For example, in one series of studies on selective looking, observers watched a movie in which a group of people in white shirts and a group of people in black shirts each passed a basketball among themselves (Becklen & Cervone, 1983; Neisser & Dube, 1978, cited in Neisser, 1979). The two groups were filmed separately and the films were then overlaid so that the figures had a partially transparent appearance and often shared the same space on the screen. The primary task was to count the total number of passes made by one of the two groups. Partway through the movie, a woman carrying an open umbrella (overlaid in the same manner) walked through the middle of the basketball players, from one side of the display to the other. Despite the fact that she shared the same physical space as the basketball players, was present for an extended period of time, and was clearly visible to anyone not engaged in the counting task, in one study only 21% of the subjects noticed her (Neisser & Dube, 1978, cited in Neisser, 1979). These results have recently been replicated and extended to a condition in which none of the figures are partially transparent (Simons & Chabris, 1999). Even when the umbrella woman is fully visible, many observers fail to notice her. In fact, 50% of observers failed to notice a person in a gorilla outfit who stopped in the middle of the display and thumped her chest at the viewer before leaving the screen. 
Based on these sorts of selective looking studies, Neisser (1979) concluded, "we do not know what preattentively noted fragments of information lead to noticing... We do not know what a perceiver must bring to a situation if he or she is to notice what another equally skilled perceiver would overlook" (p. 218). Our recent work has attempted to gain insight into this issue by combining the dynamic and sustained aspects of selective looking experiments (Becklen & Cervone, 1983; Neisser & Dube, 1978, cited in Neisser, 1979; Simons & Chabris, 1999) with the more precise, controllable computerized inattentional blindness task (Mack & Rock, 1998). We find that expectations alone cannot account for the detection of or blindness to an unexpected object.
Controlled studies of sustained inattentional blindness
Many factors may influence whether or not someone will notice an unexpected object when they are absorbed in a demanding task. For example, distinctive or salient features might pop into awareness in the same way that they do when subjects are actually searching for them in a field of distracters (e.g., Treisman & Gelade, 1980). Spatial proximity to the focus of sustained attention could also influence conscious detection; unexpected objects appearing close to the focus of
attention might be noticed more readily. Furthermore, the observer's attention set might influence noticing, as would be predicted if the contingent involuntary orienting hypothesis were applied to explicit capture (Folk et al., 1992). In this section, we discuss studies that explore each of these hypotheses in turn. These studies engage subjects in an ongoing, attentionally demanding task and then test whether or not observers notice an unexpected stimulus. In one version of our task (Most et al., 2001), four white objects and four black objects move on random paths in a rectangular computer window for 15 seconds. As they move, each object occasionally "bounces" off one of the display's edges, and the observer's task is to count the total number of bounces made by either the white or black objects (as indicated by the experimenter). During the first two trials, nothing unexpected occurs. However, on the third, critical trial, an additional, unexpected object enters the right side of the display, travels in a linear path behind a fixation point, and exits the left side of the display, remaining visible for a total of 5 seconds (see Figure 2). After this critical trial, observers are asked whether or not they saw anything other than the original eight items in the display. Even salient unexpected objects go unnoticed. For example, in one experiment, almost 30% of observers failed to notice a bright red cross that was fully visible for 5 seconds (Most et al., 2001, Experiment 3). This example by itself illustrates the power of expectations: observers did not expect an additional object, and many failed to notice it even though it was the only red item in the display. However, an alternative explanation for this failure to notice is that the unexpected object fell outside the focus of attention. Perhaps it did not appear in close enough proximity to any of the target items to be detected.
Indeed, observers were looking for target bounces at the edges of the display when the unexpected object traversed the middle of the display. To explore this possibility, we modified the task so that observers counted the number of times that the target set of items (black or white Ls and Ts) came into contact with a horizontal line bisecting the display. With this task, we could vary the proximity of the unexpected object (a gray cross) to the attended region by varying its distance from the line. We found a small effect of proximity in this experiment, suggesting that spatial location is relevant to noticing. However, 47% of observers still failed to detect the unexpected item when it traveled on the line, which was presumably the locus of attention (Most, Simons, Scholl, & Chabris, 2000b). Although spatial proximity to the attended region appears to play some role in noticing, a more important determinant is the relationship between the unexpected object and the observer's goal-orientation in the primary task. Noticing was strongly influenced by the similarity of the unexpected item to the objects that the observers were attending and ignoring. The more similar the unexpected object was to the target items, and the less similar it was to the distracter items, the more likely
Figure 2. A critical trial in a sustained inattentional blindness task (adapted from Most et al., 2001). On each trial, the black and white L's and T's move on random paths, bouncing off the edges of the display, and subjects count the number of bounces made by either the black or white items. On a critical trial, an additional object (here, a white cross) enters from the right, travels in a linear path behind a fixation point, and exits to the left. (Arrows were not present in the experimental display.)
it was to be detected. For example, when observers attended to white L's and T's and ignored black ones, 94% saw an unexpected white cross on the critical trial but only 6% saw an unexpected black cross. Detection was intermediate with gray crosses. Furthermore, when observers were attending to the black shapes and ignoring the white ones, these noticing rates were reversed (Most et al., 2001, Experiment 1). Even though the unexpected cross always had a unique shape and motion, these distinctive features did not lead to detection; however, variations in luminance influenced detection. Why did variations in luminance so strongly influence noticing? One possibility is that luminance affected noticing because luminance is a privileged stimulus dimension for the visual system, often implicated in scene segmentation and motion perception (Marr, 1982). Alternatively, luminance might have influenced noticing because it was the only dimension distinguishing the attended from the ignored items. This possibility raises the intriguing hypothesis that observers established an attention set on the basis of the task demands, and that the attention set allows some classes of items to enter into conscious awareness while keeping others out. To test this possibility directly, we designed a new version of this task in which the moving, bouncing items in the display were 2 black circles, 2 white circles, 2 black squares, and 2 white squares. Depending on the condition, observers
counted the total number of bounces made by all the white shapes (both circles and squares), all the black shapes (both circles and squares), all the circles (both black and white), or all the squares (both black and white). Thus, the critical dimension could be either luminance or shape (depending on the instructions), and the display items across all four conditions were identical. In all conditions, the unexpected object was an additional black circle, which entered the display, traveled on a unique linear path, and exited the display. As in our earlier experiment (Most et al., 2001, Experiment 1), observers who attended to the black shapes were more likely to notice the additional black circle than those who attended to the white shapes. More importantly, the effect seems to be driven by the nature of the critical dimension: observers who attended to all the circles were more likely to detect the additional black circle (82% noticing) than those who attended to the squares (6% noticing; Most, Clifford, Scholl, & Simons, 2000a). The magnitude of this difference in noticing was comparable to that for luminance-based attention sets. Given the strong effect of top-down control settings on noticing rates, perhaps variations along feature dimensions unrelated to the attention set will fail to influence conscious awareness. For example, if observers are discriminating attended from ignored items on the basis of shape, will variations in color, however extreme, fail to affect noticing? In a study designed to explore this question, observers attended to black squares while ignoring black circles, and the unexpected object was either a black triangle or a white triangle. In other words, shape was the dimension relevant to the attention set but the two unexpected items differed from each other only in luminance, an irrelevant dimension. 
Even though luminance was irrelevant, it still influenced noticing rates: 68% noticed the white triangle and 38% noticed the black (Most et al., 2000a). Thus, features can influence noticing even when they are unrelated to the critical, attended dimension. However, as in the case of the red cross, more than 30% still failed to notice the white triangle, the only white item in the display, suggesting that even extreme salience does not completely override the attentional selection required by the primary task. 3
Interim summary
Do some features automatically capture awareness in the absence of expectations? Or does explicit capture depend more on what the observer brings to the situation? Explicit capture likely requires more processing than does implicit capture because the former may depend on the active construction of a conscious percept (Neisser, 1976). Because an individual's expectations and schemas guide this process, it is reasonable to expect that attention sets will play an important role and, in fact, this is exactly what we found in our studies. When subjects were engaged in an attentionally demanding task that required them to establish an attention set, unexpected stimuli consistent with that set were much more likely to be noticed than set-inconsistent stimuli (Most et al., 2000a; Most et al., 2001; see also Simons & Chabris, 1999). Stimulus variations along irrelevant dimensions had only a limited effect on noticing, and even salient, but irrelevant features failed to
override completely the selectivity imposed by the attention set (Most et al., 2000a; Most et al., 2001). Perhaps surprisingly, spatial proximity of an unexpected object to the focus of sustained attention appeared not to play a large role in noticing (Most et al., 2000b). If conscious detection requires a relatively extended period of sustained attention, then it might well rely on a different attention mechanism than does implicit capture. The relatively reflexive shifts revealed by implicit measures appear strongly linked to a transient component of attention, whereas explicit capture likely requires the diversion of sustained attention. The next section considers how the distinction between sustained and transient components of attention may aid in the integration of the implicit capture and explicit capture literatures.
Integrating Implicit and Explicit Attention Capture
Implicit measures of capture are based on performance, whereas inattentional blindness is a measure of awareness. However, to say that studies of implicit capture and inattentional blindness assess different aspects of attention, and to leave it at that, seems rather hollow. A more satisfying reconciliation would show how these two literatures can be integrated. In this brief discussion, we attempt an integration by returning to the notion of a perceptual cycle (Neisser, 1976). This notion posits a repeated and sustained visual exploration of the environment, which eventually produces conscious awareness. The following anecdote illustrates the framework. As one of us was working on the manuscript for this chapter, he noticed a darting motion in his peripheral vision. Presumably the motion had induced a transient shift of attention. Further exploration revealed a color discontinuity with the carpeting along with ongoing motion signals, leading to a tentative interpretation of the information: some kind of animal in the room. But what kind of animal? An insect? Further inspection revealed that the source of the motion was a mouse (a description of the author's subsequent reaction is beyond the scope of this chapter). The following evening, while again working on this manuscript, the author was distracted by a color discontinuity in his peripheral vision. The immediate reaction, upon an initial interpretation of it as another mouse, was quickly followed by further visual analysis, which revealed the source of the scare to be a 1969 copper penny. In both cases, the properties of the stimulus were combined with a schema that then guided subsequent visual exploration. Had the objects been present only for an instant, an orienting response might have occurred, but this orienting quite likely would not have produced a conscious percept.
Furthermore, had the author not seen the original mouse, his schema for what should appear in his office likely would not have included a mouse, and he would have been unlikely to initially interpret the coin as an animal. In fact, without the prior influence of the mouse on his schema, he might not have noticed the coin at all. Neisser hypothesized that information with "no temporal dimension...can lead to an orienting response, but...cannot specify the identity or the meaning of events and has no phenomenal impact" (Neisser, 1979, p. 214). Although the
original perceptual cycle model captures some intuitions about how conscious perception might proceed, the model does not explicitly incorporate different forms of orienting. An updated version of this model might provide a useful framework for considering how implicit and explicit forms of capture relate to each other and to orienting in general. Given that implicit capture requires no awareness of a cue and is more closely aligned with reflexive shifts of attention than with voluntary shifts, implicit capture studies might help to illuminate the kinds of stimuli (e.g., onsets, unique colors) capable of triggering a new perceptual cycle in the absence of expectations. Like repeatedly striking a match, each transient shift caused by such stimuli can potentially kindle the cyclical process leading to awareness. However, a new or modified perceptual cycle will proceed only if attention, like the spark of the match, is sustained. This possibility is consistent with our finding that distinctive unexpected items, while not always noticed, are detected more often than less salient stimuli (Most et al., 2000a). Figure 3 presents our modified version of the perceptual cycle, reframed in terms of recent work on attention. In this model, information from or about the visual scene initially establishes a schema for the sorts of objects that belong in the scene. This initial schema is likely to be fairly crude. When engaged in selective processing of the scene, an individual may adopt an attention set based on his or her schema. The attention set determines which specific objects or features the observer will attend to and then guides sustained attention to those aspects of the scene consistent with the attention set. As a result of the new information gained through sustained attention to the scene, the schema is fleshed out and the attention set is then updated again. As this process is repeated, perception of the scene is gradually enriched. 
The key question is what happens when a new cue appears in the scene, something that might capture attention. When such a signal occurs, it has the potential to disrupt sustained attention, thereby redirecting attention to the signal itself. Even when such signals do not fully disrupt sustained attention, they could still influence performance. That is, they could implicitly capture attention by inducing a shift of transient attention. Some stimuli, either due to consistency with the attention set or to the strength of the signal might either become part of the ongoing perceptual cycle or initiate a new one. Through repeated cycles of exploration and accommodation into a schema, such stimuli might reach awareness. This model of the perceptual cycle allows for sustained shifts of attention as a function of the observer's schema and attention set. It also allows for transient shifts of attention in response to a new signal in the scene. Implicit capture is attributed to a transient signal that may or may not completely disrupt the perceptual cycle. Explicit capture results from a signal that succeeds in disrupting or being incorporated into the cycle, thereby becoming a focus of sustained attention. Of course, this model is necessarily incomplete in that it cannot readily encompass all of the claims in the implicit and explicit capture literatures, and it cannot account for contradictory claims within them. Also, the mechanisms underlying the formation and operation of attention sets are not clearly understood. The model does have the advantage that it refocuses capture research on the components of attention (e.g.,
types of attention shifts), rather than on the features that may or may not automatically draw attention. Notably, it predicts that all attention capture, including explicit capture, results from a transient shift. That is, explicit capture will not occur in the absence of implicit capture. This is an empirically testable hypothesis, although one that has not yet been addressed systematically. Although this model is necessarily vague, it provides a potentially valuable framework that accommodates both implicit and explicit capture.
Figure 3. A modified version of Neisser's (1976) perceptual cycle model. When observers are engaged in selective processing, crude information from a visual scene (or from task instructions) initially establishes a schema for what is likely to be in the scene. This schema then contributes to an attention set which guides sustained attention as the observer inspects the scene. This inspection yields more information about what is in the scene, which, in turn, leads to modification of the schema to incorporate the new information. Unless interrupted or redirected, this cycle repeats, eventually leading to awareness of the attended aspects of the scene. This part of the model is comparable to Neisser's original proposal. Importantly, when a new signal appears, it can cause a reflexive shift of attention to a different aspect of the scene, thereby disrupting sustained attention. In so doing, it might produce evidence for implicit attention capture. However, unless it is incorporated into the current perceptual cycle or triggers a new one, it will not garner additional sustained attention and will not reach awareness.
Conclusion
What is "attention capture"? Here we have distinguished between implicit and explicit measures of capture, with the former focusing on effects of an irrelevant stimulus on performance and the latter focusing on the effects of an unexpected
stimulus on awareness. Although both approaches have been common in the capture literature, relatively few studies have yet explored the relationship between implicit and explicit capture (but see Gibson, this volume). Our review of this distinction highlights the need for studies that directly compare different forms of orienting while simultaneously measuring effects on awareness of the cue. Although the distinction between implicit and explicit capture helps to classify the current results, neither literature in isolation can resolve the fundamental question of whether or not a stimulus can draw attention regardless of the goals, schemas, and expectations of an observer. In attempting to address this question, the capture literature has often focused on the distinction between exogenous and endogenous capture. There are times when objects and events affect our behavior and/or enter into conscious awareness without our having explicitly decided to attend to them. However, the impossibility of determining the observer's attention set with sufficient precision (Folk et al., 1993) precludes any strong claim that a stimulus has automatically drawn attention. Rather than emphasizing the types of cues that automatically draw attention, perhaps we can more effectively operationalize capture in terms of the nature of the shift itself. A transient shift of attention is likely to be a relatively reflexive response to a stimulus, and it has a number of characteristics that seem consistent with attention capture. Furthermore, such shifts can be measured without regard to the observer's attention set. This treatment of capture has the advantage that it is readily measured and that it can be distinguished from sustained shifts of attention, which are more likely to be under voluntary control. Moreover, it returns the emphasis of capture research to the components of attention rather than the nature of the stimulus.
However, it does not distinguish between effects on behavior and effects on awareness. Perhaps a slightly broader definition can accommodate both forms of capture: when a person is engaged in a primary task and has no explicit intention to process additional stimuli, awareness of an unexpected object constitutes explicit attention capture and a shift of attention to a stimulus without awareness of it constitutes implicit attention capture. The main goal of this chapter was to integrate findings using implicit measures of attention capture with those using explicit measures. In the process, we have argued that implicit and explicit measures actually reflect different phenomena that, while intimately linked, may be dissociated from each other. Transient orienting responses can occur in the absence of awareness; whether or not awareness of unexpected stimuli can occur without an initial transient shift of attention is a question for further research. By thinking of the function of capture in terms of a perceptual cycle, we can combine both implicit and explicit capture into a single framework. If a person is engaged in a hazardous task requiring sustained and focused attention, anything more than a transient shift of attention to an unexpected object could result in disaster (Nakayama & Mackeben, 1989). On the other hand, if the unexpected object is also of critical importance or is related to the task, then it would be appropriate to incorporate it as a focus of sustained attention. Implicit
capture resulting in a transient shift of attention might allow the perceptual system to rapidly evaluate whether or not sustained attention should be directed to a stimulus. This shift might affect performance, but if the eliciting stimulus is not centrally relevant for the current task it will not reach awareness; particularly salient stimuli might also trigger a new perceptual cycle, but even highly salient stimuli often go unnoticed if they are irrelevant to the current task. When the transient shift reveals a stimulus to be consistent with the observer's attention set, sustained attention is reallocated and the stimulus becomes part of a perceptual cycle. Explicit capture of awareness results from the integration of a stimulus into the current perceptual cycle or from the formation of a new cycle. By thinking of capture in terms of the perceptual goals of the organism rather than in terms of the nature of the stimulus itself, we can gain a better appreciation for how different forms of capture are linked and for the functional roles they play. Footnotes
1 A reasonable objection to this notion is that we often are aware of very brief stimuli. For example, briefly flashed stimuli are often seen, and it is not uncommon to be able to make out whole scenes during a flash of lightning (Neisser, 1976). Two points, however, defuse the effectiveness of this objection. First, even when stimuli appear briefly, subjects often expect to see them, and such expectations may serve to facilitate the processing required for conscious awareness. Second, although it is true that brief stimuli are often perceived, it is also true that these stimuli tend to persist in iconic memory, thereby allowing processing to continue well after the actual items have disappeared (Neisser, 1976; Sperling, 1960).
2 These findings are consistent with implicit evidence for capture that even abrupt onsets do not capture attention when observers know the spatial location of the target in advance (Yantis & Jonides, 1990). It would be worthwhile to investigate whether inattentional blindness still occurs when subjects are uncertain about where the cross would appear. However, the data do suggest that observers are equally bad at detecting the additional stimulus when it appears on one of the arms of the cross as when it appears in one of the quadrants (Mack & Rock, 1998). Thus, failure to detect it might not have been dependent on prior knowledge of the target's location.
3 Ongoing experiments are exploring whether unexpected objects with sudden onsets are noticed more than those that appear gradually. Preliminary data from our lab suggest that even sudden onsets fail to capture awareness much of the time.
References
Allport, A. (1989). Visual attention. In M. I. Posner (Ed.), Foundations of
cognitive science (pp. 631-682). Cambridge, MA: MIT Press.
Bacon, W. F., & Egeth, H. E. (1994). Overriding stimulus-driven attentional capture. Perception & Psychophysics, 55(5), 485-496. Becklen, R., & Cervone, D. (1983). Selective looking and the noticing of unexpected events. Memory and Cognition, 11(6), 601-608. Briand, K. A., & Klein, R. M. (1987). Is Posner's "beam" the same as Treisman's "glue"?: On the relation between visual orienting and feature integration theory. Journal of Experimental Psychology: Human Perception and Performance, 13(2), 228-241. Colegate, R. L., Hoffman, J. E., & Eriksen, C. W. (1973). Selective encoding from multielement visual displays. Perception & Psychophysics, 14(2), 217-224. Eriksen, C. W., & Hoffman, J. E. (1972). Some characteristics of selective attention in visual perception determined by vocal reaction time. Perception & Psychophysics, 11(2), 169-171. Eriksen, C. W., & Hoffman, J. E. (1973). The extent of processing of noise elements during selective encoding from visual displays. Perception & Psychophysics, 14(1), 155-160. Folk, C. L., & Remington, R. (1998). Selectivity in distraction by irrelevant featural singletons: Evidence for two forms of attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 24(3), 847-858. Folk, C. L., & Remington, R. (1999). Can new objects override attentional control settings. Perception & Psychophysics, 61(4), 727-739. Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18(4), 1030-1044. Folk, C. L., Remington, R. W., & Johnston, J. C. (1993). Contingent attentional capture: A reply to Yantis (1993). Journal of Experimental Psychology: Human Perception and Performance, 19(3), 682-685. Folk, C. L., Remington, R. W., & Wright, J. H. (1994). The structure of attentional control: Contingent attentional capture by apparent motion, abrupt onset, and color. 
Journal of Experimental Psychology: Human Perception and Performance, 20(2), 317-329. Franconeri, S. L., & Simons, D. J. (2001, May). Dissocluding and looming objects capture attention. Poster presented at the Annual Meeting of the Vision Sciences Society. Hillstrom, A. P., & Yantis, S. (1994). Visual motion and attentional capture. Perception & Psychophysics, 55(4), 399-411. James, W. (1950/1890). The principles of psychology. (Vol. 2). New York: Dover. Jonides, J. (1981). Voluntary versus automatic control over the mind's eye's movement. In J. B. Long & A. D. Baddeley (Eds.), Attention and performance IX (pp. 187-203). Hillsdale, NJ: Lawrence Erlbaum. Jonides, J., & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43(4), 346-354.
Kentridge, R. W., Heywood, C. A., & Weiskrantz, L. (1999). Attention without awareness in blindsight. Proceedings of the Royal Society of London, B., 266, 1805-1811. Kristjánsson, Á., Mackeben, M., & Nakayama, K. (in press). Rapid, object based learning in the deployment of transient attention. Perception. Kristjánsson, Á., & Nakayama, K. (submitted). A primitive memory system for the deployment of transient attention. Lambert, A., Naikar, N., McLachlan, K., & Aitken, V. (1999). A new component of visual orienting: Implicit effects of peripheral information and subthreshold cues on covert attention. Journal of Experimental Psychology: Human Perception and Performance, 25(2), 321-340. Mack, A., & Rock, I. (1998). Inattentional blindness. Cambridge, MA: MIT Press. MacLeod, C., Mathews, A., & Tata, P. (1986). Attentional bias in emotional disorders. Journal of Abnormal Psychology, 95(1), 15-20. Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman. McCormick, P. A. (1997). Orienting attention without awareness. Journal of Experimental Psychology: Human Perception and Performance, 23(1), 168-180. Most, S. B., Clifford, E., Scholl, B. J., & Simons, D. J. (2000a, November). What you see is what you set: The role of attentional set in explicit attentional capture. Poster presented at Object Perception and Memory (OPAM), New Orleans, LA. Most, S. B., Simons, D. J., Scholl, B. J., & Chabris, C. F. (2000b). Sustained inattentional blindness: The role of location in the detection of unexpected dynamic events. Psyche, 6(14), http://psyche.cs.monash.edu.au/v6/psyche-6-14most.html. Most, S. B., Simons, D. J., Scholl, B. J., Jimenez, R., Clifford, E., & Chabris, C. F. (2001). How not to be seen: The contribution of similarity and selective ignoring to sustained inattentional blindness. Psychological Science, 12(1), 9-17. Müller, H. J., & Findlay, J. M. (1988). The effect of visual attention on peripheral discrimination thresholds in single and multiple element displays. Acta Psychologica, 69, 129-155. Müller, H. J., & Rabbitt, P. M. A. (1989). Reflexive and voluntary orienting of visual attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15(2), 315-330. Nakayama, K., & Mackeben, M. (1989). Sustained and transient components of focal visual attention. Vision Research, 29(11), 1631-1647. Neisser, U. (1976). Cognition and reality: Principles and implications of cognitive psychology. San Francisco, CA: W. H. Freeman. Neisser, U. (1979). The control of information pickup in selective looking. In A. D. Pick (Ed.), Perception and its development: A tribute to Eleanor J. Gibson (pp. 201-219). Hillsdale, NJ: Lawrence Erlbaum.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25. Posner, M. I., & Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13, 25-42. Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160-174. Simons, D. J. (2000). Attentional capture and inattentional blindness. Trends in Cognitive Sciences, 4(4), 147-155. Simons, D. J., & Chabris, C. F. (1999). Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception, 28, 1059-1074. Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74(11), 1-29. Theeuwes, J. (1992). Perceptual selectivity for color and form. Perception & Psychophysics, 51, 599-606. Theeuwes, J. (1994). Stimulus-driven capture and attentional set: Selective search for color and visual abrupt onsets. Journal of Experimental Psychology: Human Perception and Performance, 20(4), 799-806. Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136. Yantis, S. (1993). Stimulus-driven attentional capture and attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 19(3), 676-681. Yantis, S., & Hillstrom, A. P. (1994). Stimulus-driven attentional capture: Evidence from equiluminant visual objects. Journal of Experimental Psychology: Human Perception and Performance, 20(1), 95-107. Yantis, S., & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621. Yantis, S., & Jonides, J. (1990). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 16(1), 121-134.
Acknowledgments
Thanks to Christopher F. Chabris, Erin Clifford, Bradley Gibson, Árni Kristjánsson, and Stephen Mitroff for commenting on an earlier draft of this chapter and to Patrick Cavanagh for helpful discussion. Daniel Simons was supported by NIMH grant #R01-MH63773-01 and by an Alfred P. Sloan Research Fellowship.
Part III Multiple Modalities
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved.
Using Pre-pulse Inhibition to Study Attentional Capture: A Warning About Pre-pulse Correlations
J. Toby Mordkoff and Hilary Barth
Many abrupt stimuli (such as loud noises, sudden taps, or electrical shocks) will evoke an involuntary startle response. For example, an unexpected tap to the glabella -- which is the flat surface of the lower forehead between the eye-brows -- will elicit an immediate eye-blink reflex. More interesting still, the magnitude of this reflex can be reliably reduced by presenting a less-intense stimulus (called a pre-pulse) just prior to the startle stimulus. This reduction in the magnitude of startle due to a preceding stimulus is known as pre-pulse inhibition or reflex modification (for an introduction and review, see, e.g., Anthony, 1985; Graham, 1975; Hackley, 1993; Hoffman & Ison, 1980). Pre-pulse inhibition has been observed in a variety of species using a wide assortment of pre-pulse and startle stimuli. For simplicity, the present chapter will be primarily concerned with the results from tone/tap/blink experiments on humans -- where the pre-pulse is a quiet tone, the startle stimulus is a tap to the glabella, the reflex is an eye-blink, and the subjects are people. In this case, when the tone is presented between 100 and 1000 ms before the tap, the magnitude of the blink is reliably reduced. The amount of inhibition depends on a variety of factors, including the specific delay between the two stimuli (i.e., the tone-tap stimulus-onset-asynchrony or SOA; see, e.g., Hoffman & Stitt, 1980); the rise-time, duration, and intensity of the pre-pulse (e.g., Blumenthal, 1995; Blumenthal & Levey, 1989; Stitt, Hoffman, & DeVido, 1980); whether the pre-pulse is attended or ignored (e.g., Hackley & Graham, 1984, 1987); and the emotional or clinical state of the subject (e.g., Bradley, Cuthbert, & Lang, 1993; Dawson, Hazlett, Filion, Nuechterlein, & Schell, 1993; Lang, Bradley, & Cuthbert, 1990).
To give an idea of how powerful pre-pulse inhibition can be: At a tone-tap SOA of 150 ms, the magnitude of the eye-blink reflex can be reduced by as much as 65% in normal undergraduates by a moderate, attended, pre-pulse tone (Hoffman & Stitt, 1980).
Temporal Dynamics of Pre-pulse Inhibition
The specific time-course of pre-pulse inhibition has been explored by systematically varying the SOA between the pre-pulse tone and the startling tap. For example, Stitt et al. (1980; Exp. 1) employed tone-tap SOAs of 100, 150, 250, 450, and 850 ms, as well as the required control condition that does not include a
pre-pulse tone. Like many others, these researchers found that pre-pulse inhibition grows quickly, peaks, and then fades within this range. In particular, a maximum amount of inhibition (i.e., a minimum amount of startle) was found at an SOA of 150 ms, while at all of the other SOAs some inhibition was observed, demonstrating an extended time-course. However, on at least one end of the SOA continuum, an important boundary exists: If the pre-pulse tone is presented simultaneously with or just after the startling tap, then the magnitude of the startle reflex can be augmented, instead of reduced (e.g., Hoffman, Cohen, & Stitt, 1981). Other studies of the time-course of pre-pulse inhibition have examined the latency of the startle reflex and shown that this, too, can be affected by a preceding tone. For example, Graham and Murray (1977) measured the latency of the eyeblink reflex (to a glabellar tap) and found that a tone presented just prior to the startle stimulus significantly decreased the latency of the blink. These results are important because they show that pre-pulse tones do not suppress all aspects of the startle reflex; in this case, while the magnitude of startle was reduced, the speed of the reflex was increased.
Pre-pulse Inhibition and Attentional Capture
The models of pre-pulse inhibition that have emerged from these studies almost always refer to some form of sensory gating (see, e.g., Anthony, 1985; Brunia, 1993; Hackley, 1993; Hoffman & Ison, 1980; Perlstein, Fiorito, Simons, & Graham, 1993). For any of a variety of reasons (ranging from capacity limits to attentional switching), the encoding and subsequent processing of the startle stimulus is reduced when it follows a pre-pulse. Because of this gating, the startle stimulus is perceived as being less intense and, therefore, produces a smaller reflex (Cohen, Hoffman, & Stitt, 1981; Perlstein et al., 1993). These models of pre-pulse inhibition are somewhat reminiscent of more recent explanations of certain phenomena from the literature on attentional capture. In both cases, the processing of some stimuli is delayed or inhibited because the relevant processes are occupied by some other, salient stimulus. Going further, the waxing and waning of pre-pulse inhibition over time -- while much longer in extent -- bears some similarity to the time-courses of the attentional blink (Raymond, Shapiro, & Arnell, 1992) and attentional dwell time (Ward, Duncan, & Shapiro, 1996). On the assumption that pre-pulse inhibition is a form of attentional capture (or vice versa, by historical precedence), one might be tempted to use the magnitude of pre-pulse inhibition as a new measure of attentional capture. Such an approach to the study of capture would have the distinct advantage of being based on an involuntary response (to the startling stimulus) and, therefore, would be less open to strategic interpretations and demand characteristics. However, before such a research project is begun, it would seem important to consider an alternative
account of pre-pulse inhibition. In particular, it would seem important to ask whether pre-pulse inhibition is truly automatic and unaffected by the context in which it is studied.
Classical Conditioning of Pre-pulse Inhibition
One issue that has received relatively little attention is whether pre-pulse inhibition is caused by or is subject to classical conditioning. With regard to cause, Graham (1975) has reported that pre-pulse inhibition can be observed on the very first trial experienced by a participant, which goes far to suggest that conditioning is not necessary (see, also, the discussion in Hackley, 1993). At the same time, however, there is evidence from animal studies (see, e.g., Crofton, Dean, Sheets, & Peele, 1990; also Blumenthal, 1997) to suggest that correlations between the pre-pulse and startling stimuli can alter the magnitude of pre-pulse inhibition. To see why classical conditioning might be a factor in studies of pre-pulse inhibition, consider the following experimental design (which is typical of the time-course studies from the 1980s). There are six main conditions: the control condition (i.e., startle stimulus only) and five different pre-pulse conditions with various SOAs between 100 and 1000 ms; the pre-pulses are tones and the startle stimuli (presented at a rate of about three per minute) are glabellar taps. Now take the position of the experimental participant and consider whether the pre-pulse tones provide any cues with regard to the startling taps. The answer is clearly yes. Five out of six taps are preceded (within 1000 ms) by a tone; conversely, if a tone has just been detected, then the probability of a tap within the next 1000 ms is 1.00 (when the baseline probability under this particular design is only 0.05). In summary, the experimental design used for most studies of the time-course of pre-pulse inhibition includes the main ingredient to encourage classical conditioning: viz., a strong correlation between stimulus events.
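The arithmetic behind these probabilities can be made explicit with a short sketch. The function and variable names below are illustrative assumptions, as is the reading of the "baseline" as the chance that an arbitrary 1-s window contains a tap when taps arrive at about three per minute; the block is not taken from any published analysis.

```python
# Sketch of the cue-validity arithmetic for the typical time-course design.
# Assumptions (hypothetical, for illustration only): all names, and treating
# the baseline as the chance that an arbitrary 1-s window contains a tap.


def p_tap_given_tone(tone_tap_trials: int, tone_alone_events: int = 0) -> float:
    """Proportion of tones that are followed by a tap within the cue window."""
    return tone_tap_trials / (tone_tap_trials + tone_alone_events)


def p_tone_given_tap(tone_tap_trials: int, tap_alone_trials: int) -> float:
    """Proportion of taps that are preceded by a tone."""
    return tone_tap_trials / (tone_tap_trials + tap_alone_trials)


def baseline_tap_probability(taps_per_minute: float, window_s: float = 1.0) -> float:
    """Chance that an arbitrary window of `window_s` seconds contains a tap."""
    return taps_per_minute * window_s / 60.0


# One block of the typical design: five tone-then-tap trials, one tap-only control.
cue_validity = p_tap_given_tone(tone_tap_trials=5)
taps_signalled = p_tone_given_tap(tone_tap_trials=5, tap_alone_trials=1)
baseline = baseline_tap_probability(taps_per_minute=3)

print(f"P(tap within 1 s | tone) = {cue_validity:.2f}")
print(f"P(tone before tap)       = {taps_signalled:.2f}")
print(f"Baseline P(tap in 1 s)   = {baseline:.2f}")
```

The optional `tone_alone_events` argument shows the lever available to an experimenter: any tones presented without a following tap enlarge the denominator and so lower the probability of a tap given a tone.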
Furthermore, it has been known for some time that several human reflexes (such as the skin-conductance response) are subject to classical conditioning (e.g., Mordkoff, Edelberg, & Ustick, 1967). Therefore, while most of the time-course studies of pre-pulse inhibition have been discussed in terms of the effects of an irrelevant tone on the eye-blink reflex, it is not yet clear how much of the inhibition is actually due to some conditioned, as opposed to an automatic, response.
Overview
The primary goal of this experiment was to test the idea that correlations between pre-pulse tones and startling taps can have significant effects on the amount and time-course of pre-pulse inhibition. This was done by manipulating these correlations (between subjects). A secondary goal was to test whether the
correlations between pre-pulse tones and startling taps (that have existed in most previous designs) are responsible for the finding of reduced eye-blink latency in pre-pulse conditions. This was done by examining peak reflex latency, as well as peak magnitude. Finally, on the assumption that evidence of conditioning would be found, a third goal was to separate and explore the automatic and conditioned components of pre-pulse inhibition. The separate components would then be compared to the effects that are seen in the literature on attentional capture. Methods
Participants. The twelve undergraduate volunteers (all female; ages 18-21) were recruited by poster and each paid 5 dollars (U.S.) for one hour of participation. No subject had participated in a similar experiment previously and all were naive as to the present purposes. Apparatus. The stimulus events were controlled by an IBM-compatible micro-computer. The 50-ms, 55-dB, pre-pulse tones were produced by a grey-noise generator (Coulbourn S81) and played through the left speaker of a set of stereo head-phones (Telephonics TDH-39). The glabellar taps were administered by sending a 50-ms pulse of 23 volts to a small solenoid with a teflon tip that was mounted on the band of the head-phones (see Marsh, Hoffman, & Stitt, 1979, for details). Both stimulus devices were controlled using custom-made, 5-volt (dc) relays interfaced to the micro-computer by a parallel port, which provides better than millisecond timing resolution. The data were collected using three Ag/AgCl electrodes (Grass Instruments): one bi-polar pair approximately 5 cm above and below the right eye (for the vertical electro-oculogram; vEOG), plus a "ground" on the left side of the forehead. Electrode impedance was always less than 5K ohm. The vEOG signals were amplified using one channel of an isolated dc amplifier (Scientific Instrumentation; band-pass: 0.2 to 100 Hz) and then digitized at 200 Hz by an analog-to-digital translation board (ADAC). The data were stored in a compressed format and analyzed off-line. Procedure. The three electrodes were first installed (after cleaning the subject's skin), then the head-phones were placed over the participant's ears, and finally the tip of the solenoid was aligned with the subject's glabella. Next, a series of three tones and two taps were administered and the participant was given the opportunity to withdraw from the experiment (none did). The participant was then moved to be seated in a comfortable chair in a small (6' x 6'), moderately-lit room.
Each was provided with reading materials (that they could read, but only if they wished) and a small box on which a response key was mounted (used only to move forward through the instructions at the start of the session and then set aside). The instructions (provided on a small CRT monitor and controlled by the response key)
asked the participants to "avoid making large movements and try to avoid thinking about the tones and taps." At that point the experiment began, with a total of 72 taps being administered over a 25-minute period. Design. The experiment employed a 2 x 6 mixed-factor design, with Correlation Condition (high vs low) being manipulated between subjects, and Pre-pulse Condition (five SOAs, plus a control) being manipulated within subjects. As regards the second factor, this experiment was a near-replication of many previous studies (e.g., Stitt et al., 1980, Exp. 1) in that there was a no-tone, control condition (under which only a tap was administered), and five different tone-tap conditions with SOAs of -50 ms (i.e., tap then tone), +50 ms (tone then tap), +150 ms, +350 ms, and +750 ms. Each of the six levels of the pre-pulse factor occurred once per block in a randomly-selected order. (Blocking was used to equalize any habituation effects across these six conditions.) The inter-trial interval was randomly selected on each trial to be between 15 and 25 seconds. There were 12 blocks of trials in the experiment, although subjects were not made aware of this grouping. The correlation between the tones and the taps was manipulated (between subjects) by the inclusion of additional tones (in the absence of taps) during the inter-trial intervals. The high-correlation condition did not include these extra tones (replicating what is typical of experiments designed to measure the time-course of pre-pulse inhibition). The low-correlation condition included three to five tones -- each identical to the pre-pulse tones -- during each inter-trial interval (i.e., one tone every five seconds; distributed randomly). The one limit placed on this procedure was that the last inter-trial tone was always at least three seconds before the start of the next trial. Data Reduction.
Off-line, the individual vEOG waveforms were "baselined" using the mean of the measures taken during a 200-ms window just prior to each trial. Next, average waveforms for each participant in each of the six pre-pulse conditions were found, and then quantified in terms of the peak amplitude and the latency of the peak. Finally, the peak amplitudes in each of the five conditions that included a pre-pulse were re-expressed as a proportion of the control condition (i.e., pre-pulse peak minus control peak, divided by control peak; hereafter: relative peak amplitude). This was done to prevent participants with larger mean amplitudes from "dominating" the subsequent analysis; it is the amplitude equivalent of the Vincentizing procedure that is used in response-time research (see, e.g., Mordkoff & Yantis, 1991). Results
The mean relative peak amplitudes of the eye-blinks in each of the five pre-pulse conditions are shown in Figure 1. A mixed-factor ANOVA revealed a main effect of Pre-pulse Condition [F(4,40) = 35.91, Huynh-Feldt ε = .800, p < .001], no
main effect of Correlation Condition [F(1,10) < 1], but a significant interaction [F(4,40) = 35.91, p < .001 ]. For the high-correlation group, the simple
Figure 1. Mean peak amplitude of the eye-blink startle reflex (relative to the control condition) as a function of Correlation Condition and the SOA between the pre-pulse tone and the glabellar tap. (The filled points represent the low-correlation condition; the open points represent the high-correlation condition.) (Horizontal axis: Tone-Tap SOA (ms), with points at -50, 50, 150, 350, and 750; vertical axis ticks run from -1.00 to 0.50.)
main effect of SOA was significant [F(4,20) = 38.24, ε = .937, p < .001], with the -50-ms condition showing marginal augmentation [t(5) = 2.46, p < .06], the +150- and +350-ms conditions showing significant pre-pulse inhibition [t(5) = 6.14 & 5.39, respectively, both p < .005], and the other two conditions not differing from control [both t(5) < 1.50]. For the low-correlation group, the simple main effect of SOA was also significant [F(4,20) = 9.60, ε = .705, p < .01], with the +50-, +150-, and +350-ms conditions all differing from control [t(5) = 3.63, 3.90, & 2.70, respectively, all p < .05 (or better)] and the -50- and +750-ms conditions not differing from control [both t(5) < 1.20]. Finally, pair-wise comparison between the two Correlation Conditions at each SOA (equal variance not assumed; see Figure 1) showed significant differences at -50 ms [t(6.10) = 2.59, p < .05], +150 ms [t(7.34) = 3.75, p < .005], and +350 ms [t(7.76) = 3.43, p < .01], but not at +50 ms [t(8.28) = 1.33] or +750 ms [t(5.78) < 1]. The mean peak latencies of the eye-blinks are shown in Figure 2. A mixed-factor ANOVA revealed a main effect of Pre-pulse Condition [F(5,50) =
18.26, Huynh-Feldt ε = .681, p < .001], a main effect of Correlation Condition [F(1,10) = 14.19, p < .005], and a significant interaction [F(5,50) = 28.20, p < .001]. For the high-correlation group, the simple main effect of SOA was significant [F(5,25) = 159.39, ε = .698, p < .001], with the control and -50-ms conditions differing from the other four (by the adjacent-cells or "repeated" test). For the low-correlation group, the simple main effect of SOA was not significant
Figure 2. Mean peak eye-blink startle latencies as a function of Correlation Condition and the SOA between the pre-pulse tone and the glabellar tap (plus the control condition). (The filled points represent the low-correlation condition; the open points represent the high-correlation condition.) (Horizontal axis: control condition, then Tone-Tap SOA (ms) at -50, 50, 150, 350, and 750; vertical axis ticks run from 50 to 120.)
[F(5,25) = 2.12, ε = .669, p > .15] (and the adjacent-cells test, albeit unjustified, also revealed no pair-wise differences). Finally, the startle peak latencies across the high- vs low-correlation conditions were not different in the control condition [t(7.50) = 1.55, p > .15], nor in the -50-ms condition [t(6.75) < 1], but did differ at all of the positive SOAs [+50 ms: t(6.45) = 4.37, p < .005; +150 ms: t(6.70) = 3.84, p < .01; +350 ms: t(6.21) = 5.33, p < .005; and +750 ms: t(5.75) = 6.61, p < .005].
Discussion
In general, the data from the high-correlation condition replicated previous studies of pre-pulse inhibition that have used this sort of experimental design (e.g.,
Stitt et al., 1980). In particular, pre-pulse inhibition waxed and waned over the SOA range of +50 to +750 ms, reaching a maximum at an SOA of +150 ms, and there was weak (and, in this case, statistically insignificant) augmentation at an SOA of -50 ms. Also replicating and extending some previous work (e.g., Graham & Murray, 1977), pre-pulse tones reduced startle peak latency in the high-correlation condition, but only when they preceded the startle stimulus. In contrast, the data from the low-correlation condition show a very different pattern. With regard to pre-pulse inhibition, the maximum effect was here much smaller (25%, as opposed to 60%), started and peaked sooner in terms of SOA (at +50 ms, as opposed to at +150 ms), and faded in half the time (less than 350 ms, as opposed to 750 ms). Furthermore, no effect on startle peak latency was observed at any SOA. Taken as a whole, the results from the high- and low-correlation conditions make several points concerning classical conditioning of pre-pulse inhibition. First, the impressive size and extended time-course of this phenomenon are probably due to some form of conditioning. When the correlation between pre-pulse tones and startling taps is weakened -- as was done here with the addition of pre-pulses during the inter-trial intervals -- the magnitude and time-course of pre-pulse inhibition shrink to a significant degree. This extends the related results that have been observed in non-humans (e.g., Gewirtz & Davis, 1995). Second, the finding of reflex augmentation by tones that follow the startling taps (as opposed to precede them) is probably also due to classical conditioning. When the correlation between the tones and taps is weakened, no such augmentation is observed. Finally, the reduction in startle latency that has been observed when a tone precedes the tap is most likely due to some form of conditioning. As above, when the correlation is decreased, no such effects are observed.
An alternative explanation: Pre-pulse habituation
Before continuing to discuss other implications of these findings, at least one alternative interpretation of the difference in results between the high- and low-correlation conditions must be addressed. This alternative focuses on the number of pre-pulse tones that were administered during the experimental session, as opposed to the correlation between the pre-pulse tones and startling taps. In other words, this alternative concerns possible habituation of pre-pulse inhibition (see, e.g., Gewirtz & Davis, 1995). As a start, note that participants in the low-correlation condition experienced approximately five times as many pre-pulse tones as those in the high-correlation condition. Therefore, if repeated presentation of the tones leads to a weakening of their effect (regardless of correlation), then the smaller amount of pre-pulse inhibition (and the null effect of the tones on startle peak latency) in the low-correlation condition can be explained without reference to classical conditioning. While this alternative cannot be ruled out definitively, there are several reasons to doubt that it is the sole cause of the difference between the high- and low-correlation conditions. First, while the overall amount of pre-pulse inhibition was smaller in the condition that involved more tones, the specific amount of pre-pulse inhibition at one SOA was actually larger (albeit, not significantly so). Second, not only was the amount of pre-pulse inhibition affected by correlation condition, but the time-course was altered, as well, and it is not clear why habituation would affect the latter. Third, a post-hoc analysis of the relative peak amplitudes that included "practice" as an additional factor (by dividing the twelve blocks of trials into three sets of four) produced neither a main effect nor any interactions involving "practice" [all F < 1.00]. Finally, at least one direct test for habituation of pre-pulse inhibition (after controlling for changes in the startle reflex, as was done here) failed to find any evidence for such an effect (Blumenthal, 1997).
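To see why the number of tones and the tone-tap correlation are confounded in this design, the two quantities can be tallied together. The sketch below is only a rough bookkeeping exercise under stated assumptions: it assumes one inter-trial interval per trial, counts only the four positive-SOA tones as tones that precede a tap, and uses hypothetical names throughout. The design specifies three to five extra tones per interval, so the low-correlation totals are given as a range.

```python
# Rough tally of tones per session in the two Correlation Conditions.
# Assumptions (not stated in the chapter): one inter-trial interval per trial,
# and only the positive-SOA tones count as tones that precede a tap.

BLOCKS = 12
TRIALS_PER_BLOCK = 6             # five tone conditions plus the no-tone control
TONE_TRIALS_PER_BLOCK = 5        # SOAs of -50, +50, +150, +350, and +750 ms
TONE_FIRST_TRIALS_PER_BLOCK = 4  # the tone precedes the tap at the positive SOAs


def session_summary(extra_tones_per_interval: int) -> dict:
    """Total tones heard and the proportion of tones followed by a tap."""
    trials = BLOCKS * TRIALS_PER_BLOCK
    paired_tones = BLOCKS * TONE_TRIALS_PER_BLOCK
    total_tones = paired_tones + trials * extra_tones_per_interval
    tones_before_tap = BLOCKS * TONE_FIRST_TRIALS_PER_BLOCK
    return {
        "total_tones": total_tones,
        "p_tap_given_tone": tones_before_tap / total_tones,
    }


high = session_summary(extra_tones_per_interval=0)
low_min = session_summary(extra_tones_per_interval=3)
low_max = session_summary(extra_tones_per_interval=5)

for label, summary in [("high", high), ("low (3/ITI)", low_min), ("low (5/ITI)", low_max)]:
    ratio = summary["total_tones"] / high["total_tones"]
    print(f"{label:12s} tones={summary['total_tones']:4d} "
          f"ratio={ratio:.1f} P(tap|tone)={summary['p_tap_given_tone']:.2f}")
```

Under these assumptions, adding tone-alone presentations necessarily raises the tone count and lowers the proportion of tones followed by a tap at the same time, which is why the habituation account has to be addressed with the separate arguments given above.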
Automatic and conditioned components of pre-pulse inhibition
On the assumption that the low-correlation condition did not evoke any classical conditioning and the high-correlation condition did, the present results may now be used to provide separate estimates of the automatic and conditioned components of pre-pulse inhibition. This analysis is based on the auxiliary assumption that the two components are additive, such that the amount of one has no influence on the amount of the other. (This assumption cannot be tested using these data, but is used here to gain a foothold on the results.) To obtain the estimates, one first uses the results from the low-correlation condition as a direct estimate of the automatic component. Next, because it is assumed here that the high-correlation condition provides a measure of the sum of the automatic and conditioned components, one merely subtracts the amount of pre-pulse inhibition in the low-correlation condition from the amount in the high-correlation condition to find the conditioned component. The results from this procedure are shown in Figure 3 (plotted in terms of inhibition vs. augmentation, as opposed to relative peak magnitude). As can be seen, the automatic component of pre-pulse inhibition is relatively small, always inhibitory, and very short-lived. In contrast, the conditioned component is larger, both facilitatory and inhibitory (depending on SOA), and extends over a wider period of time.
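The subtraction procedure just described can be sketched in a few lines. This is only an illustration under the additivity assumption stated above: the function name, the sign convention (positive values for inhibition, negative for augmentation), and all numerical values are hypothetical and are not the data of the present study.

```python
def decompose(high_corr, low_corr):
    """Split pre-pulse effects into automatic and conditioned estimates.

    Assumes additivity: high-correlation effect = automatic + conditioned,
    and low-correlation effect = automatic only.
    Both arguments map SOA (ms) to an effect score.
    """
    if high_corr.keys() != low_corr.keys():
        raise ValueError("both conditions must be measured at the same SOAs")
    automatic = dict(low_corr)  # direct estimate of the automatic component
    conditioned = {soa: high_corr[soa] - low_corr[soa] for soa in high_corr}
    return automatic, conditioned


# Made-up scores for illustration only (positive = inhibition).
high_corr = {-50: -20.0, 50: 5.0, 150: 30.0, 350: 25.0, 750: 10.0}
low_corr = {-50: 0.0, 50: 10.0, 150: 5.0, 350: 0.0, 750: 0.0}

automatic, conditioned = decompose(high_corr, low_corr)
for soa in sorted(conditioned):
    print(soa, automatic[soa], conditioned[soa])
```

If the two components interact rather than add, the difference score no longer isolates the conditioned component, which is why the additivity assumption is flagged in the text.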
[Figure 3 plot: Automatic Component (filled points) and Conditioned Component (open points) as a function of Tone-Tap SOA (ms).]
Figure 3. Separate estimates of the automatic and conditioned components of reflex modification. (The filled points represent the automatic effect; the open points represent the conditioned effect. This analysis assumes additivity between the two components and uses the low-correlation condition as a direct estimate of the automatic effect.)
Going further, the automatic component of pre-pulse inhibition shows a remarkable resemblance to two of the phenomena from the capture literature: viz., the attentional blink (e.g., Raymond et al., 1992) and attentional dwell-time (e.g., Ward et al., 1996). These effects are observed in visual tasks that require participants to watch for and report the presence of various stimuli, usually displayed in rapid succession. As can be seen by comparing the results across these studies, the time-course of these perceptual interference effects is very similar to the present estimate of the automatic component of pre-pulse inhibition. Also, as in several recent discussions of pre-pulse inhibition (e.g., Brunia, 1993; Hackley, 1993), the attentional blink and attentional dwell-time have mostly been discussed in terms of sensory or attentional gating (see, also, Moore, Egeth, Berglan, & Luck, 1996).
In contrast, the conditioned component of pre-pulse inhibition is probably better understood in terms of what a correlated pre-pulse can "tell" an experimental participant, and what the likely response to this information would be. Recall that under the high-correlation design, a pre-pulse tone is a perfect predictor of a startling tap; if the participant has just detected a tone, a tap must occur within 750 ms (if it hasn't already). In light of this, the participant's initial reaction could well
be something like fear, which has been shown to increase the startle reflex (see, e.g., Falls & Davis, 1994; Leaton & Cranney, 1990); hence the finding of pre-pulse augmentation of the eye-blink reflex at SOAs near zero (see, also, Flaten, 1993). Upon further processing, however, the participant could use the "warning" provided by the pre-pulse to prepare for the upcoming tap; hence the reduction of the reflex at SOAs of 150 ms or more (see, also, Ison, Sanes, Foss, & Pinckney, 1990).
Conclusions
In summary, the present study has shown that several of the important and well-known effects of pre-pulse tones on the glabellar-tap, eye-blink reflex are probably at least partially due to classical conditioning. In particular, the size of the automatic component of pre-pulse inhibition is much smaller (and much shorter-lived) than the total effect that is observed when the experimental design includes a strong correlation between the pre-pulse and startle stimuli. Furthermore, while the effects of a pre-pulse tone on eye-blink peak latency are large and facilitatory in the correlated condition, the effects of an uncorrelated tone are nil. Therefore, besides raising the general issue of classical conditioning in studies of pre-pulse inhibition in humans, the present study should also be seen as a warning to researchers to pay special attention to the experimental designs that are used to examine this class of phenomena.
On the positive side, as long as these warnings are heeded, the present study goes far to suggest that pre-pulse inhibition could well be used as an important new tool for the study of attentional capture. The main value of this new measure is that it does not require instructions (in that people cannot help but blink when they are tapped on the glabella) and the involuntary reaction (an eye-blink) is easily measured.
References
Anthony, B. J. (1985). In the blink of an eye: Implications of reflex modification for information processing. In Advances in Psychophysiology, Vol. 1, pp. 167-218.
Blumenthal, T. D. (1995). Prepulse inhibition of the startle eyeblink as an indicator of temporal summation. Perception & Psychophysics, 57, 487-494.
Blumenthal, T. D. (1997). Prepulse inhibition decreases as startle reactivity habituates. Psychophysiology, 34, 446-450.
Blumenthal, T. D., & Levey, B. J. (1989). Prepulse rise time and startle reflex modification: Different effects for discrete and continuous prepulses. Psychophysiology, 26, 158-165.
Bradley, M. M., Cuthbert, B. N., & Lang, P. J. (1993). Pictures as prepulse: Attention and emotion in startle modification. Psychophysiology, 30, 541-545.
Brunia, C. H. M. (1993). Waiting in readiness: Gating in attention and motor preparation. Psychophysiology, 30, 327-339.
Cohen, M. E., Hoffman, H. S., & Stitt, C. L. (1981). Sensory magnitude estimation in the context of reflex modification. Journal of Experimental Psychology: Human Perception and Performance, 7, 1363-1370.
Crofton, K. M., Dean, K. F., Sheets, L. P., & Peele, D. B. (1990). Evidence for an involvement of associative conditioning in reflex modification of the acoustic startle response with gaps in background noise. Psychobiology, 18, 467-474.
Dawson, M. E., Hazlett, E. A., Filion, D. L., Nuechterlein, K. H., & Schell, A. M. (1993). Attention and schizophrenia: Impaired modulation of the startle reflex. Journal of Abnormal Psychology, 102, 633-641.
Falls, W. A., & Davis, M. (1994). Fear-potentiated startle using three conditioned stimulus modalities. Animal Learning & Behavior, 22, 379-383.
Flaten, M. A. (1993). Startle reflex facilitation as a function of classical eyeblink conditioning in humans. Psychophysiology, 30, 581-588.
Gewirtz, J. C., & Davis, M. (1995). Habituation of prepulse inhibition of the startle reflex using an auditory prepulse close to background noise. Behavioral Neuroscience, 109, 388-395.
Graham, F. K. (1975). The more or less startling effects of weak prestimulation. Psychophysiology, 12, 238-248.
Graham, F. K., & Murray, G. M. (1977). Discordant effects of weak prestimulation on the magnitude and latency of reflex blink. Physiological Psychology, 5, 108-114.
Hackley, S. A. (1993). An evaluation of the automaticity of sensory processing using event-related potentials and brain-stem reflexes. Psychophysiology, 30, 415-428.
Hackley, S. A., & Graham, F. K. (1984). Early selective attention effects on cutaneous and acoustic blink reflexes. Physiological Psychology, 11, 235-242.
Hackley, S. A., & Graham, F. K. (1987). Effects of attending selectively to the spatial position of reflex-eliciting and reflex-modulating stimuli. Journal of Experimental Psychology: Human Perception and Performance, 13, 411-424.
Hoffman, H. S., Cohen, M. E., & Stitt, C. L. (1981). Acoustic augmentation and inhibition of the human eyeblink. Journal of Experimental Psychology: Human Perception and Performance, 7, 1357-1362.
Hoffman, H. S., & Ison, J. R. (1980). Reflex modification in the domain of startle: I. Some empirical findings and their implications for how the nervous system processes sensory input. Psychological Review, 87, 175-189.
Hoffman, H. S., & Stitt, C. L. (1980). Inhibition of the glabella reflex by monaural and binaural stimulation. Journal of Experimental Psychology: Human Perception and Performance, 6, 769-776.
Ison, J. R., Sanes, J. N., Foss, J. A., & Pinckney, L. A. (1990). Facilitation and inhibition of the human startle blink reflexes by stimulus anticipation. Behavioral Neuroscience, 104, 418-429.
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (1990). Emotion, attention, and the startle reflex. Psychological Review, 97, 377-395.
Leaton, R. N., & Cranney, J. (1990). Potentiation of the acoustic startle response by a conditioned stimulus paired with acoustic startle stimulus in rats. Journal of Experimental Psychology: Animal Behavior Processes, 16, 279-287.
Marsh, R. R., Hoffman, H. S., & Stitt, C. L. (1979). Eyeblink elicitation and measurement in the human infant. Behavior Research Methods & Instrumentation, 11, 498-502.
Moore, C. M., Egeth, H., Berglan, L. R., & Luck, S. J. (1996). Are attentional dwell times inconsistent with serial visual search? Psychonomic Bulletin & Review, 3, 360-365.
Mordkoff, A. M., Edelberg, R., & Ustick, M. (1967). The differential conditionability of two components of the skin conductance response. Psychophysiology, 4, 40-47.
Mordkoff, J. T., & Yantis, S. (1991). An interactive race model of divided attention. Journal of Experimental Psychology: Human Perception and Performance, 17, 520-538.
Perlstein, W. M., Fiorito, E., Simons, R. F., & Graham, F. K. (1993). Lead stimulation effects on reflex blink, exogenous brain potentials, and loudness judgments. Psychophysiology, 30, 347-358.
Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849-860.
Stitt, C. L., Hoffman, H. S., & DeVido, C. J. (1980). Modification of the human glabella reflex by antecedent acoustic stimulation. Perception & Psychophysics, 27, 82-88.
Ward, R., Duncan, J., & Shapiro, K. L. (1996). The slow time-course of visual attention. Cognitive Psychology, 30, 79-109.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
9
Temporal Expectancies, Capture, and Timing in Auditory Sequences
Mari Riess Jones
It is surely true that we know more about the way people attend to visual than to auditory arrays. Yet anyone who has tried to follow a friend's conversation or listened to a musical performance knows that attention is not limited to a specific modality. This chapter considers guided attending as well as stimulus-driven attention in the auditory domain, with a special emphasis upon the dynamics of attending. It is divided into four parts. The introduction reviews major paradigms and findings in research on attending to visual and auditory events, with primary concentration on auditory research. A second section describes dynamic attending in the context of temporal sequences; it incorporates a new role for abrupt onsets and capture in events that transpire over time. The third section presents results from recent experiments on dynamic attending to auditory sequences, and a final section provides concluding remarks.
I. Attending to Visual and Auditory Events: An Overview
In Part I, I attempt to provide an overview of research on auditory attention with some focus on issues of timing. Relative to research on visual attention, much less is known either about attention to auditory events or the role of time in attending to these events. Consequently, in this introductory section, although I draw comparisons between attending to visual and auditory stimuli, my main focus is upon auditory events and the role of time.
My stepping-stone into the world of auditory attention is a discussion of paradigms used in research on visual attention, because these are familiar and well established. At the same time, it invites observations about important differences between visual and auditory paradigms. In the visual domain, it is common to rely on search tasks in which presented elements form static spatial arrays, whereas in the auditory domain tasks are more likely to rely on monitoring of elements that comprise dynamic temporal arrays.
Format differences between visual-spatial and auditory-temporal presentations are critical because they levy different constraints on attending. Consequently, my review of auditory attending concentrates on temporal properties of tasks and stimuli.
Visual arrays
In attending to visual arrays, it has been useful to distinguish voluntary control of a spatial search process, which involves expectancies, from an involuntary process which involves stimulus-driven capture. Although the voluntary versus involuntary distinction becomes less clear-cut in research with auditory arrays, I organize this part of the chapter around two topics associated with this distinction, namely expectancy and capture.
Expectancy. Expectancy typically refers to an anticipatory orienting of attention. By its very nature, anticipation implies a temporal component of expectancy. Nevertheless, it is most common to link attentional orientation simply to a spatial locale of a future target; thus, expectancy involves either a specific or non-specific orientation to some location in space. A specific expectancy is one confined to a narrow spatial region, whereas a non-specific expectancy is one in which an attender anticipates that a future target may occur anywhere within a wide spatial region. Operationalized, specific and non-specific orientations may be associated, respectively, with cued and uncued search paradigms. A cued search paradigm relies on distinct cues to instill specific expectancies about "where" a target might occur in space, whereas an uncued search (by definition) does not.
In visual attention, cued search usually involves two discrete, successively presented visual elements, namely a cue and a target; together they form a short sequence. This paradigm often examines the locative function of a cue stimulus: The cue is used to signal some future spatial location of the second element in the sequence, the target. As shown in Figure 1a, the cue-target task involves time constraints; these largely concern the time interval between the onset of the cue and that of the target. This interval, known as the inter-onset-interval, IOI, usually assumes only one or two values (an IOI is also termed SOA) within a session.
If the cue is an arbitrary stimulus (e.g., a symbol such as an arrow), neutrally located in space, then it is termed an endogenous cue; this is distinguished from an exogenous cue (discussed shortly), which may be similar to a cued target and/or located close to a possible target location. Usually an endogenous cue acquires its meaning for a viewer by virtue of its probabilistic connection to a forthcoming target. For instance, an endogenous cue is effective in guiding attending if a viewer knows, in some sense, that there is a high conditional probability (validity) of the target at a given location following a given cue (Kahneman & Treisman, 1984; Kahneman & Tversky, 1982; Posner, 1980; Theeuwes, 1991). In other words, operationally cue validity is taken as a determinant of expectancy. In this context, a valid symbolic cue typically generates faster and more accurate responding to a target that appears at the specified location than does an invalid cue (Downing, 1988; Posner, 1980; Posner, Snyder, & Davidson, 1980; Shulman, Remington, & McLean, 1979). This sort of cuing task, along with the evident influence of cue validity, has reinforced the widespread practice of reserving the term 'expectancy' to describe endogenous cuing. In visual attention, valid endogenous cues are commonly considered to be an
important vehicle for the voluntary orienting of visual attending to locations in space.
Endogenous cuing paradigms often equate cued attending with specific expectancies about "where" a target will occur. But emerging evidence suggests that we can enlarge the description of endogenous cuing to include the temporal component of an expectancy. People appear able to take advantage of temporal constraints of the cue-target task to anticipate "when" as well as "where" a future target may occur. Admittedly, to suggest that an endogenous cue directs attending to cued regions in time is less conventional than to suggest the allocation of attention in space. Nevertheless, recent research shows such effects (Coull & Nobre, 1998; Miniussi, Wilding, Coull & Nobre, 1999; Coull, Frith, Buchel & Nobre, 2000; Kingstone, 1992; Rothstein, 1985). The general strategy pairs two different symbolic cues (e.g., cue A and cue B), respectively, with two different time intervals (IOIs) of forthcoming targets. Figure 1a (left panel) shows that cue A is paired most often with a long IOI (t3 - t1) and cue B with a short one (t2 - t1). People pick up on these probabilities, responding more quickly to validly cued targets appearing after a long IOI (cue A) than to those invalidly cued by cue A. That is, a target that suddenly occurs earlier than expected (i.e., after cue A and a short IOI) catches the viewer by surprise. One interpretation offered for these findings is that unexpectedly early targets produce an automatic reaction, whereas unexpectedly late ones stimulate a voluntary re-orientation of attention. Thus, with only two IOIs, when a target fails to appear following B by a short IOI (at t2), one can confidently predict "when" it will occur, namely after the long IOI (i.e., at t3) in Figure 1a (left). To sum up, given the right visual cue as well as some consistency of experienced time intervals, people can learn to specifically anticipate "when" a target will occur after a cue.
Clearly, people can allocate attention in time as well as space.
Many cue-target tasks use only a single visual cue, as in the right panel of Figure 1a. Note that they offer greater specificity of cues with respect to space but not with respect to time. In these, because a wide range of IOIs is associated with the same cue, the target will be anticipated to occur anywhere within a broad temporal region from t2 to t3. For instance, if short and long IOIs are 1/2 sec and 1 sec, respectively, then attention will be focused in time over this 1/2-sec region. All of this means that attending has a temporal component which can be focused narrowly or widely (in time) depending on the range of IOIs used in a task. In other words, depending on temporal constraints of the task, a cue-target experiment designed to specifically orient attending to a "where" in some space-like dimension may also inadvertently provoke distinct temporal expectancies.
By contrast, in uncued visual search, expectancies in both space and time are less specific. The common uncued search task presents a large number, d, of elements simultaneously within a 2D spatial array. By definition, this task lacks the sequential presentation of two items (cue, target). Because both cues and IOIs are absent, specific expectancies about a target's location in space or time are less likely. In spite of such uncertainties, a general expression regarding expectancies about target location in space and time is possible. Probabilistically, the canonical search
task is maximally uncertain; on average, when a target is present its probability of being at a given location is 1/d (see, e.g., Nissen & Corkin, 1985, for unequal target probabilities). Often such search is conceived as a series of deliberate attention shifts from one location to another resulting from various goal-oriented strategies (e.g., feature search, singleton search, etc.). Imagine, for instance, an array of d elements, say short lines, that differ from one another with respect to orientation and color. A conjunctive search task defines a target in terms of a certain co-occurrence of two features, as in a vertical red line, a definition that motivates the search. In such tasks, it has been argued that attention shifts serially over many spatial locations, guided by the goal of target localization and possibly by prioritized relationships among displayed elements (Cave & Wolfe, 1990; Treisman & Sato, 1990; Wolfe, Cave, & Franzel, 1989; Wolfe, 1994).[1] I speculate that subjectively such shifts transform a simultaneous array into a successive one, leading to the creation of an attentional trajectory in space and time: Attention moves from location X at time t(n) to location Y at time t(n+1), and so on. Building on probabilistic notions, rough estimates of an expected time to target discovery can be determined from the expected number of shifts, d/2 (assuming equal probability weights). If the average attention shift time is T, then the expected search time is Td/2. Thus, Td/2 represents a kind of temporal expectancy in that it gives an expected time to target discovery. Furthermore, letting T be the average pace of attending (i.e., how fast or slow one shifts attending) associated with this space-time trajectory, this pace can be affected by instructions and task goals, among other things.
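The rough estimate Td/2 can be made concrete with a short sketch. The function names and the values of d and T below are hypothetical. The second function is an added illustration rather than part of the argument above: it assumes a serial, self-terminating search in which each of the d locations is equally likely to hold the target and no location is revisited, in which case the mean number of inspected locations is (d + 1)/2, which approaches d/2 as d grows.

```python
from fractions import Fraction


def expected_search_time(d, shift_time):
    """Rough estimate from the text: expected shifts d/2 times shift time T."""
    return shift_time * d / 2


def mean_inspections_exact(d):
    """Mean number of locations inspected before the target is found.

    Assumes the target is equally likely at each of d locations and that
    locations are inspected one at a time without revisiting any of them.
    """
    # Target found on inspection k with probability 1/d, for k = 1..d.
    return Fraction(sum(range(1, d + 1)), d)


d = 10            # hypothetical number of array elements
shift_time = 50   # hypothetical average shift time T, in ms

print(expected_search_time(d, shift_time))   # rough estimate, T*d/2
print(mean_inspections_exact(d))             # (d + 1)/2 under the assumptions above
```

Changing shift_time while holding d constant corresponds to the flexible, self-paced attending described in the text.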
My motive for describing uncued search of a large spatial array in this manner is strategic, because later in this chapter I suggest that an important difference between searching static spatial arrays and monitoring dynamic temporal ones concerns the way attending is paced in the two situations. The description of uncued search I have outlined emphasizes that in these visual-spatial tasks one's attentional pace can be flexible because people are free to pace themselves; thus, when asked to respond quickly, they can voluntarily change T. Later, I will contrast this with constraints imposed on attentional pacing in a sequence monitoring task involving d temporally distributed auditory elements; in this task, people are "paced" by the rate imposed by a succession of elements.
Despite the flexibility available in visual search, an expected trajectory of attention can be short-circuited, in some cases, by a single distinctive stimulus, i.e., a singleton (Egeth & Yantis, 1997). If the singleton is putatively task irrelevant, but nonetheless is so distinctive that it grabs our attention, then attention shifts to an unscheduled spatial location at an unexpected time. According to Yantis and Egeth (1999), in this case, the singleton captures attending. When this happens in visual search, where the expected time to target discovery is Td/2, capture represents a kind of violation of the expectancy algorithm. That is, a target is discovered either earlier or later than expected. If a distracting singleton is the target, then observed search time will be less than the expected Td/2; if not,[2] then observed search time will exceed Td/2.
To sum up, probabilistically based expectancies can be identified in both
cued and uncued spatial search tasks. In cued search tasks, they express relatively specific anticipations about "where" in space and "when" in time a target may occur, whereas in uncued search, they are less specific. Nonetheless, in spite of uncertainties in the latter, it is possible to estimate a global temporal expectancy about target discovery that suggests the presence of flexible attentional allocation over time in uncued search. Moreover, expectancies about where and when a target may occur are generally seen as reflecting mainly voluntary attentional activities.
Capture. Capture has been touched upon above in the uncued search task. It refers to an apparent truncation of a largely voluntary search due to the presence of a distinctive singleton which "pulls" attention to a specified spatial location at an unexpected time. Central to the debate over visual capture is a concern over whether certain stimulus properties, inherent in the attractor element, are responsible for a derailment of attending. Some contend that task parameters and the goal object 'set' attention in such a way that, under the right circumstances, any unique or relationally distinct element can pull attention to itself (or its location) by virtue of its relevance to the task (Folk, Remington, & Johnston, 1992; Folk, Remington, & Wright, 1994; Gibson & Amelio, 2000). Others maintain that certain special properties of a task-irrelevant singleton, such as its abrupt onset, are important for capture to occur, either because they signal a new object (Jonides & Yantis, 1988; Yantis, 1993; Yantis & Egeth, 1999; Yantis & Jonides, 1984) or because they introduce a significant luminance change over time (Gellatly, Cole & Blurton, 1999; Gibson, 1996a,b; Yantis & Jonides, 1996). Abrupt onsets are central to debates over capture. Perhaps this is because they differ intriguingly from other salient features of a singleton, such as color, form, etc.
The defining property of an abrupt onset is dynamic rather than static: it involves relative timing. Relative to onsets of other elements in a display, a singleton onset is one that happens early or late; in a real sense, its 'abruptness' is a matter of relative timing because it depends on onset times and time intervals associated with other elements (e.g., see Miller, 1989; Remington, Johnston, & Yantis, 1992). Any element with an unusual onset time relative to the established time structure of a task can draw attention. For example, a cue-target IOI of 200 ms can seem surprisingly short when it follows a series of 1,100 ms intervals but less so when it follows a series of 400 ms intervals. Some timing deviations make powerful claims on our attention. In this respect, time deviations, including abrupt onsets, assume an exogenous cueing function in that, regardless of how often they occur (cue validity), they seem to "pull" attention to their spatio-temporal locations. Furthermore, because abrupt onsets seem to override cue validity and instructions, the attentional shifts they provoke are considered, at least partially, automatic (Theeuwes, 1991). To sum up, although often overlooked, the relative timing of a stimulus element, within a larger experimental context, may be an important aspect of the larger debate about capture in attending.
The debate over capture really turns on the issue of automaticity. A popular dichotomy holds that expectancies are voluntarily controlled whereas stimulus-driven attending is involuntary (where involuntary equates with automatic).
This dichotomy goes hand-in-hand with the above distinctions about stimulus cues; endogenous cues are purported to instantiate voluntary attending whereas exogenous cues provoke involuntary, i.e., automatic, control of attending. Because involuntary attending is equated with "stimulus-driven" attending, one area of concern in debates over capture regards whether a particular stimulus item "qualifies" as an exogenous cue. Presumably, an item qualifies if it (1) immediately over-rides voluntary, intentional control of search and (2) is insensitive to memory/cognitive load (Kahneman & Treisman, 1984; Theeuwes, 1991). Folk and colleagues (Folk et al., 1992, 1994, 1999) have maintained that, by these criteria, capture is not automatic; rather, it is contingent upon the attentional set one voluntarily assumes in a given task (i.e., whether or not a feature is task relevant). In their view, involuntary attentional orientation depends on voluntary control settings conferred by task-relevant features of the goal object (e.g., Gibson & Amelio, 2000). In this case, it is possible to ignore even compelling stimulus singletons (e.g., abrupt onsets) if they are not relevant to a task goal. In contrast, Yantis maintains that even task-irrelevant abrupt onsets can capture attention, given a neutral attentional set, i.e., a relatively wide attentional focus (Yantis & Jonides, 1990; 1996; Theeuwes, 1991). Only when a task tightly circumscribes the goal item and search, such that one's attention is narrowly focused on a particular property or spatial region, will an irrelevant abrupt onset fail to capture attending. Thus, the kernel of the debate on automaticity of stimulus-driven attending concerns whether or not the attention-getting potential of certain stimulus properties is independent of one's goals in a task.
Summary. Expectancy is often defined operationally in a probabilistic fashion in visual attention (using endogenous cue validity).
Such expectancies not only appear to direct attending to "where" in space a target might occur, but it also seems that people can allocate attention in time to anticipate "when" that target will happen. The latter finding realizes a temporal component of attending inherent in the concept of expectancy. The phenomenon of attentional capture also incorporates a role for timing in that often those elements most likely to compel attention are ones that violate an established task time structure: They arrive unexpectedly early or late (e.g., abrupt onsets). These sudden onsets seem to over-ride expectancies set in motion by valid endogenous cues. But debates over the automaticity of visual capture remain.
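The idea that an onset is "early" or "late" only relative to an established time structure can be illustrated with the earlier example of a 200 ms IOI. The sketch below is one simple, hypothetical way to express this: it scores an IOI by its signed deviation from the mean of the preceding IOIs, as a proportion of that mean. The function name and the choice of a proportional measure are assumptions made for illustration, not a measure proposed in this chapter.

```python
def relative_deviation(ioi_ms, context_iois_ms):
    """Signed deviation of an IOI from the mean context IOI.

    Expressed as a proportion of the context mean; negative values
    indicate an onset that arrives earlier than the context predicts.
    """
    mean_context = sum(context_iois_ms) / len(context_iois_ms)
    return (ioi_ms - mean_context) / mean_context


slow_context = [1100, 1100, 1100, 1100]   # series of 1,100 ms intervals
fast_context = [400, 400, 400, 400]       # series of 400 ms intervals

after_slow = relative_deviation(200, slow_context)
after_fast = relative_deviation(200, fast_context)
print(after_slow, after_fast)
```

On this measure the same 200 ms interval is a larger proportional deviation after the 1,100 ms series than after the 400 ms series, consistent with the claim that it seems more surprisingly short in the former context.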
Auditory arrays
In the auditory domain much research employs temporal rather than spatial arrays; these arrays comprise short or long sequences of sounded elements. I discuss attending to these sequences by revisiting the topics of expectancy and capture outlined in the preceding section. Thus, I consider auditory versions of the cue-target task involving short sequences (d = 2 elements), as well as auditory counterparts to the uncued spatial search tasks involving many (d > 2) temporally distributed elements used in sequence monitoring tasks. Precise parallels between
auditory and visual tasks are risky, especially in the latter case (i.e., uncued spatial search versus sequence monitoring). Some reasons for caution are instructive. For instance, the large visual arrays of the uncued search task are usually spatial and static, whereas the long auditory arrays of sequence monitoring are temporal and dynamic. I contend that these formatting constraints differentially affect the flexibility and the pacing of attending. A second reason for caution involves the respective functions underlying responses to spatial and temporal arrays. In our everyday lives, visual search tasks often call for responses that resemble quests for a lost item, whereas auditory sequence monitoring resembles situations where we must listen, reply to and/or assess communications of others (speech, music). Both require attending, but in the former it serves a locative function whereas in the latter it serves as the foundation for communication.
[Figure 1a graphic: Cue-Target Tasks. Two panels plot cue and target events against Time (t1, t2, t3). Left panel: Cue A with a Long IOI and Cue B with a Short IOI. Right panel: a single cue with both Long and Short IOIs.]
Figure 1a. Left panel: The cue-target paradigm is shown indicating different temporal contingencies over trials, using two distinct cues (A, B) paired, respectively, with different IOIs for target occurrences (targets are solid ovals). Given the onset time, t1, of a cue on a trial, one cue (cue A) is shown to provide a specific temporal expectancy for a target at time t3, whereas the other (cue B) provides a temporal expectancy at t2. Right panel: The same cue is associated with two different IOIs on different trials; targets (solid ovals) are anticipated in the temporal region between t2 and t3, yielding a less specific temporal expectancy than in the left panel.
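The temporal contingencies in Figure 1a can be sketched as conditional probabilities that are updated as time passes within a trial. The sketch below is illustrative only: the function name, the cue validity of 0.8, and the IOI values are hypothetical, and it assumes that a target occurs on every trial. It shows why, with only two IOIs, the absence of a target at the short IOI makes the long IOI certain.

```python
def update_after_no_target(ioi_probs):
    """Renormalize IOI probabilities once the earliest IOI has elapsed.

    ioi_probs maps each possible IOI (ms) to its probability given the cue.
    Assumes a target is presented on every trial, so the remaining
    probabilities are rescaled to sum to 1.
    """
    earliest = min(ioi_probs)
    remaining = {ioi: p for ioi, p in ioi_probs.items() if ioi != earliest}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no later IOI remains to carry the expectancy")
    return {ioi: p / total for ioi, p in remaining.items()}


# Hypothetical contingencies: cue B predicts the short IOI on 80% of trials.
cue_b = {500: 0.8, 1000: 0.2}
print(update_after_no_target(cue_b))       # only the long IOI remains

# With three possible IOIs, some temporal uncertainty survives the update.
three_iois = {500: 0.5, 750: 0.25, 1000: 0.25}
print(update_after_no_target(three_iois))
```

The wider the range of IOIs paired with a cue, the more probability remains spread over time after each elapsed interval, which corresponds to the less specific temporal expectancy of the right panel.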
[Figure 1b. A Sequence Monitoring Task: a standard tone, interpolated tones and a comparison tone (judged "Same" or "Lower") plotted against Time, with the critical IOI marked.]
Figure 1b. A sequence monitoring task showing tones interpolated between standard and comparison pitches. Time intervals between successive interpolated tones are IOIs; these specify a sequence rate and rhythm; the final IOI, the critical IOI, may be a variable in certain tasks.
Expectancy. Expectancies can be conceived in several ways. As we have seen, the probabilistic association view links expectancy with a slow and deliberate voluntary control process. It is most evident in visual cue-target tasks where expectancy is operationalized probabilistically as cue validity (e.g., with endogenous cues). However, a different conception emerges in sequence monitoring tasks. It involves pattern-directed attending in which expectancy is linked to pattern relationships (Jones & Yee, 1993; Garner, 1974). In this view, an expectancy is an extrapolation of attending that is determined partly by relationships among sequence elements. In this case, expectancy is neither necessarily slow nor inevitably voluntary.

Cue-target paradigm. Paradigmatically, auditory versions of the cue-target task offer nice parallels to their visual counterparts. In the auditory domain, these designs continue to facilitate questions about the locative function of a cue, now a sound cue: How does one sound signal the location of another? Again, Figure 1a is applicable to the cue-target paradigm. A sound that is arbitrarily related to the sound of a target (e.g., a church bell followed by a bird whistle) may acquire a signaling function merely through probabilistic associations. When endogenous cues are correlated with a target in terms of conditional probability, they become valid and function to determine a listener's expectancy about a target's location in space. A word is necessary about the term "space-like" on the ordinate of Figure 1a. Whereas in visual cueing studies, the cue often orients attending to a location in two or three-dimensional space in the world, in auditory studies two variants of this exist. In one, the sound cue signals the location of another sound in the 3D environment, meaning that space-like refers to a measure of real spatial distance.
In the other variant, one sound cue signals another's position in Hertz (Hz) i.e., tone frequency (pitch); here "space-like" refers to an interval distance in pitch space. It has been argued that in the auditory domain pitch space functions psychologically much like real space (Kubovy, 1981; Jones, 1976; Woods et al., 2001).
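The probabilistic notion of cue validity described above can be made concrete with a minimal sketch. It assumes that validity is estimated as the proportion of trials on which the target occurs at the location the cue signals; the function name, the trial format and the trial records are hypothetical and serve only as illustration.

```python
def cue_validity(trials):
    """Estimate P(target at signaled location | cue) for each cue.

    `trials` is a hypothetical record format: a list of
    (cue, signaled_location, target_location) tuples.
    """
    counts = {}
    for cue, signaled, actual in trials:
        valid, total = counts.get(cue, (0, 0))
        # A trial is valid when the target appears where the cue signaled.
        counts[cue] = (valid + (signaled == actual), total + 1)
    return {cue: valid / total for cue, (valid, total) in counts.items()}


# Illustrative (made-up) session: one sound cue that signals "right",
# with the target appearing on the right on 6 of 8 trials.
trials = [("tone", "right", "right")] * 6 + [("tone", "right", "left")] * 2
validity = cue_validity(trials)
print(validity)
```

On this reading, a cue becomes "valid" to the extent that the estimated proportion departs from chance; nothing in the sketch speaks to whether the resulting expectancy is voluntary.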
In pursuit of probabilistically determined expectancies, Spence & Driver (1994) asked whether valid sound cues can orient attending in real space. They pitted endogenous sound cues against so-called exogenous ones in a task where listeners judged the spatial elevation (high versus low) of a target sound. The target sound could occur either on the listener's left or right side (Experiment 5). Listeners were told that a 2 kHz endogenous sound would validly (.75) signal the location of a forthcoming target on the side opposite that of the endogenous sound cue. On invalid trials, the same sound exogenously cued the target, which then appeared on the same side as the exogenous sound cue. The design was similar to that of Figure 1a (right panel) where a cue sound preceded a target by one of three cue-target IOIs (from 100 ms to 1,000 ms, with a variable cue onset time, t1). People were faster at identifying valid versus invalid targets at longer IOIs, a finding consistent with the probabilistic view that valid endogenous sound cues are linked to slow voluntary expectancies. In turn, these expectancies support a listener's shift of attention to a region in space, as reported with visual cue-target designs (e.g., Downing, 1988; Posner, 1980). Others confirm that acoustic cues can orient listeners to specific regions in 3D space, but these studies involved (arguably) exogenous cues and are discussed shortly (Mondor, 1999; Mondor & Breau, 1999; Mondor & Zatorre, 1995; Mondor, Zatorre, & Terrio, 1998; Spence & Driver, 1994). It appears that useful parallels exist between auditory and visual cuing of attending to regions in space in cue-target designs. One, not mentioned above, involves time constraints. If attending is not modality-specific (and I suspect it is not), then time intervals (IOIs) may operate in a similar way in all cue-target designs.
I previously argued that over the course of a session, the IOIs used in these designs permit people to anticipate a region in time as well as one in space. This remains relevant to understanding the anticipatory aspect of expectancies in auditory as well as in visual cue-target designs. For example, in the Spence and Driver experiment, the target follows a cue sound within 1/10th to 1 sec (with a probability of 1.00); this consistent pairing of temporal range with a single sound cue circumscribes a region in time for which attending may be heightened (see also Swets & Green, 1966). In sum, regardless of modality, it appears that cue-target designs afford the orienting of attending in space... and in time. People may come to expect not only the "where" of future targets, but also their "when".

Sequence Monitoring Tasks. Rigid parallels between visual and auditory arrays are problematic in the monitoring of long sequences (d > 2). It is probably wiser to acknowledge that auditory sequences are important objects of study in their own right because they approximate the stimuli we routinely encounter in our acoustic environment in the guise of speech, music and other environmental sound patterns. Unlike research in uncued visual search of large spatial arrays, fledgling research on attending to many elements distributed in temporal arrays is less standardized. One version of the monitoring task embeds a target sound (e.g., a change in pitch, timbre or duration) either within or at the end of a sequence. Figure 1b illustrates the case where a target (pitch) is specified by an initial standard tone and a comparison tone terminates the sequence. Although superficially some target
embedding tasks bear similarities to visual search tasks, as a rule these tasks differ importantly in the temporal constraints they levy on attending. As indicated earlier, in visual search people can voluntarily pace attending to each of d visual elements by varying the rate (defined earlier as T) of attention shifts. By contrast, in auditory monitoring people have less freedom to voluntarily adjust the pace of attending. In fact, because a sequence conveys a series of IOIs between sounds, a temporal sequence may preclude voluntary attentional pacing. A static visual array does not change over time: Objects do not appear, then disappear. But this is precisely what happens within a temporal sequence. Somehow, people must "attend at the right time" by coordinating their moment-to-moment attending with "when" successive elements appear, either voluntarily or involuntarily. I suspect this means that people must adjust the pace of attending (T) to match the rate (IOI) of the sequence (Jones et al., 2001; Large & Jones, 1999). I consider this topic in part II.

Let us return to conceptions of expectancy mentioned earlier. The probabilistic association view links expectancy to a slow voluntary control of attending based on probabilities associated with a single discrete cue. This approach to expectancy is eminently well suited to studying the locative function of sounds. But to apply it to sequence monitoring is challenging. First, this requires identifying a plausible discrete cue within a sequence comprising many potential cues. Yet, just as with its static visual counterpart (i.e., uncued search), the sequence monitoring task offers many potential sound cues, leading to great uncertainty about cuing.
Second, when presented with a sound sequence, people rarely respond to it by listening for a discrete sound as an indicator of the spatial location (in the environment) of a subsequent sound; rather, their default attention mode seems to involve a tracking of relationships among a series of sounds, much as we naturally do when listening to the prosody of speech or music. Consequently, in sequence monitoring tasks, expectancy has also been based on pattern-directed attending. This view links expectancy to an extrapolation of compelling aspects of pattern structure. Instead of relying strictly on cue validity, it assumes that relationships among successive elements, including their time relationships, contribute to attending and to the induction of expectancies. In both short and long sequences, these relationships include pitch changes and pitch arrangements as well as temporal relationships between onset times (t1) and IOIs (rate, rhythm).

Expectancies have often been conceived as slow and voluntary. Is this necessarily the case with expectancies in sequence monitoring? According to some probabilistic interpretations, the answer is "yes." This implies that expectancies should appear only in response to slow sequences. By contrast, according to certain pattern-directed approaches, attentional pace and expectancies are rate dependent because time is part of pattern structure. As I propose later (part II), certain aspects of tracking a pattern that are involved in anticipatory orienting may become less efficient at fast rates. However, this view does not necessarily link changes in tracking efficiency with a sharp dichotomy between involuntary processes (e.g., at fast rates) and voluntary ones (e.g., at slow rates). Clearly, attentional pacing is a focal issue in this chapter and I formalize it
shortly. But to pave the way for this, let us first consider evidence for attentional pacing in the monitoring of slow auditory sequences (mean IOI > 200 ms). Expectancies, as extrapolations of attentional pace, can be assessed by examining effects of the time structure of an induction sequence on listeners' judgments about "when" future targets may occur. Barnes and Jones (2000) found that a regular stimulus rhythm produced temporally paced expectancies, which influenced people's judgments about subsequent time intervals. Time judgments were more accurate with expected than unexpected target timing, given the rate and rhythm of the induction sequence (cf. McAuley & Kidd, 1998; Large & Jones, 1999). Such findings suggest that temporal expectancies exist in responding to long sequences of tones and, once again, are based on stimulus timing (IOIs). Because these were slow sequences, it is tempting to also infer that these expectancies are voluntary. But, at least by one criterion of involuntary control (over-riding instructions), we cannot firmly conclude this because listeners in the Barnes and Jones study had trouble complying with instructions to "ignore" the induction sequence. Sequence time structure also affects other aspects of monitoring performance. Jones, Boltz and Kidd (1982) found that sequence rhythm (as well as pitch patterning) affected monitoring; listeners were better at detecting a change in pitch of a target when it was located on a rhythmic accent than when it occurred at a temporally unaccented time point (see also Boltz, 1993; Kidd, 1993; Kidd, Boltz & Jones, 1984). Often when instructions are used to direct attending to certain pitch regions (e.g., attend to high tones), their influence on performance is qualified by listeners' bias toward relying on pattern structure itself, including its rhythm (Jones, Jagacinski, Yee, Floyd, & Klapp, 1995; Klapp, Hill, Tyler, Martin, Jagacinski, & Jones, 1985; Klein & Jones, 1996).
Finally, classic findings have shown that people's anticipations about future elements in slow auditory (as well as visual) sequences are often predicated on various relationships among elements earlier in a sequence (e.g., Garner, 1974; Garner & Gottwald, 1968; Restle, 1970; see Jones, 1981, for a review). For example, a high tone that occurs rarely in a sequence may, in spite of its low probability of occurrence, be strongly expected at a particular point in time simply if it "fits" within local pattern relationships. All of these findings indicate that time relations as well as other pattern relationships (e.g., pitch) between discrete tones strongly affect people's expectancies about the "when" in time and the "where" in pitch space of some target tone.

In sum, research with slow sequences reveals the presence of pattern-directed attending but offers no conclusive evidence bearing on whether or not the resulting expectancies are voluntary. According to Bregman (1990) any expectancies evident at these rates, including those based on rhythm, are voluntary and result from domain-specific learning that operates only at slow rates. Others eschew the voluntary/involuntary dichotomy. Thus, I have suggested that to the extent pattern-directed attending is engaged by sequence rhythm at any rate, it rests on internal activities that are unlearned, rather primitive and responsive to timing (Jones, 1976). What happens with fast auditory patterns? If expectancies indeed reflect
slow voluntary attending, based on learned schemes, then they should disappear in tasks where people monitor fast sequences. Although findings with fast sequences are less clear-cut, there remains some evidence of pattern-directed attending at these rates. Howard, O'Toole, Parasuraman, and Bennett (1984) assessed pattern-directed attending in fast tone sequences using trained listeners. Performance depended on several factors, including the relative pitch of the target, the conditional probability of a given (target) pitch within a particular pattern and the pattern itself (rising vs falling pitch patterns). This is one of the few reports in which conditional probability of a single tone within a pattern was pitted against the nature of pattern structure itself; in a sense, the whole pattern functioned as a "cue." The data suggest that both conditional probability (validity) of a target and pattern structure contribute to the allocation of attending in a sequence. Dowling, Lung and Herrbold (1987) reached similar conclusions. People listened for probe tones within a sequence when distractor tones were added. In different experiments, variations of pitch and timing relationships significantly affected performance, suggesting their influence on attending to fast sequences (replicated by Puente & Jones, under review). However, other research fails to support pattern-directed attending and I consider it shortly (e.g., Mondor & Terrio, 1998).

In sum, with fast auditory sequences there is mixed evidence for pattern-directed attending. One theoretical account of such findings assumes that different processes underlie monitoring of slow versus fast sequences. Whereas voluntary expectancies have been proposed as the vehicle of selective attending in slow sequences, involuntary perceptual grouping processes (using Gestalt principles) are supposed to determine pattern perception in fast sequences (Bregman, 1990).
In this view grouping is an after-the-fact automatic process that precludes attention and expectancies. Another account assumes that pattern-directed attending operates at all rates: Attending is paced by stimulus time relationships but accurate rhythmic pacing systematically falters at fast rates (Jones, 1976). Summary of research on auditory expectancies. Research with short auditory sequences (using the cue-target paradigm) indicates that expectancies about locations of future sounds can be probabilistically manipulated by cue validity of endogenous sound cues. Research with longer sequences (using monitoring tasks) indicates that pattern relationships, including rate and rhythm, influence expectancies and target identification performance in slow sequences. With rapid auditory sequences, people continue to respond to pattern relationships but evidence for the impact of pitch and time relationships on attending and expectancies is mixed.
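The idea that a temporal expectancy extrapolates the pace of an induction sequence can be illustrated with a minimal sketch. It assumes the simplest possible rule, linear extrapolation of the mean IOI, which is a stand-in chosen for illustration and not the formal model developed in Part II; the function names and onset times are hypothetical.

```python
def expected_onset(onsets_ms):
    """Extrapolate the next onset time from an induction sequence.

    Assumption: the expectancy is the last onset plus the mean IOI.
    """
    iois = [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]
    mean_ioi = sum(iois) / len(iois)
    return onsets_ms[-1] + mean_ioi


def timing_deviation(onsets_ms, target_onset_ms):
    """Signed difference (ms) between the actual and the expected onset.

    Negative values mean the target arrived earlier than expected.
    """
    return target_onset_ms - expected_onset(onsets_ms)


# Hypothetical isochronous induction sequence with a 600 ms IOI.
induction = [0, 600, 1200, 1800]
expected = expected_onset(induction)
early = timing_deviation(induction, 2300)
print(expected, early)
```

Under this simplification, a target at 2300 ms counts as temporally unexpected (early) relative to the rate implied by the induction sequence.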
Capture
Capture is often cited as the signature example of stimulus-driven attending, although its claim of automaticity invites scrutiny. Issues pertinent to auditory capture can be addressed using exogenous sound cues in a cue-target paradigm, and in principle, using monitoring tasks by embedding a distinctive sound
singleton in a sequence of sounds. However, common criteria for automaticity of stimulus-driven attending seem more applicable to the former task than to sequence monitoring. That is, criteria such as the power of a stimulus cue or singleton to quickly over-ride voluntary control conferred by instructions/intentions, cue validity, and cognitive/memory load, can be easily adapted to the cue-target task, whereas this is more challenging given sequence monitoring. In this section, I follow the organization of the preceding one, assessing first cue-target and then sequence monitoring tasks. To preview, relatively little research using auditory stimuli directly speaks to stimulus-driven attending and issues of attentional automaticity.

The cue-target paradigm. The simplest question relating to auditory capture asks whether a discrete sound cue can specifically "call" attention to its location either in the environment or in pitch space, regardless of its cue validity. Surely, the fact that alarm sounds are so ubiquitous across different cultures suggests something rather universal about the compelling effect of sudden or unusual environmental sounds. Intuitively, such sounds certainly seem to grab our attention. But a sudden sound may only be surprising. And a threatening sound has little survival value if it merely provokes surprise; in principle, such sounds afford information about location, distance or time-of-arrival of a threatening sound source. Such information can be picked up and used, if the sound waves reaching the listener automatically effect an orienting of attention characteristic of exogenous cuing. Spence and Driver (1994) addressed the issue of exogenous cueing. In Experiment 1 they found evidence for stimulus-driven attention by sound cues using the task of Figure 1a.
They told listeners to ignore an initial cue sound that was temporally surprising (i.e., t1 was unpredictable) but which carried no informative validity about the target's location. The cue was followed by a target with IOIs in the range of 100 to 1,000 ms. The sudden onset of the exogenous sound cue indeed "pulled" attending to the perceived location of this sound cue, thereby quickening responses to targets appearing there, especially at short IOIs. These findings meet certain automaticity criteria: first, a temporally unpredictable exogenous cue over-rode instructions to ignore the cue; second, it operated in spite of low validity; and third, its effects were immediate, i.e., confined to short IOIs. Related research suggests that the compelling effect of an exogenous sound cue is stronger as its distance from the target sound source diminishes and its validity increases (Mondor & Zatorre, 1995; see also Mondor, 1999; Mondor & Bregman, 1994; Mondor, Zatorre, & Terrio, 1998). It is possible to limit the discussion of exogenous cuing to the automatic orientation of attending to a locale in space (as in the Spence & Driver study above). But given the focus of this chapter, let me comment on implications of time constraints used in these tasks. I suggest that spatial orientation of attending is only part of the story in cue-target designs. Because these designs incorporate a defined set of time intervals between an exogenous cue and target, they inevitably also invite expectancies about "when" a target will occur. So, is there evidence for such a
claim? Admittedly, there is very little evidence, largely because the topic is rarely addressed. Consequently we find only suggestive evidence that time constraints may be important to understanding exogenous cueing. For example, Mondor (1999), using exogenous sound cues to orient attending in real space, manipulated IOIs to assess Inhibition of Return (IOR). He found that performance, specifically the locus of IOR, changed systematically depending on whether targets were temporally predictable or not. This sort of data is intriguing in suggesting that experiments designed to assess exogenous cues to spatial location may also inadvertently be introducing temporal expectancies associated with these cues. In this case, people are more likely to specifically anticipate "when" temporally predictable targets will happen than unpredictable ones.

Not all exogenous cuing studies meet automaticity criteria. Nevertheless, a number do show that people's attention is somehow drawn to the region in 3D space of the sound cue. Similar effects emerge in another variant of the auditory cueing design where the pitch distance between cue and target sounds is varied. An exogenous sound cue with a given tone frequency appears to draw attending to that region in pitch space, hence facilitating responding to a target of similar pitch (e.g., Scharf, Quigley, Aoki, Peachey, & Reeves, 1984; Mondor & Bregman, 1994). However, we do not know how attention is allocated in pitch space: Is it shifted voluntarily or involuntarily? Available research offers no conclusive answers. Research on this topic has not directly assessed whether exogenous cues to a pitch region override either instructions or the effect of valid endogenous cues at brief IOIs (as research with visual exogenous cues demonstrates). Nor have invalid exogenous sound cues been pitted against valid exogenous ones to determine whether the former overrides the latter in orienting attention to specific sound frequencies.
In a few studies exogenous sound cues, when invalid, do appear to orient attending to the target's pitch space locale as indicated by a decline in performance with increased separation of cue and target in pitch space (cf. Greenberg & Larkin, 1968; Scharf et al., 1984). But in others, cue similarity (tone frequency) and validity have been correlated. For example, Mondor and Bregman (1994) varied cue validity, with valid cues always identical to a particular target frequency. With the cue-target IOIs ranging from 550 to 1,600 ms,³ listeners were faster and more accurate identifying target properties (e.g., tone duration) with valid cues at longer IOIs; this outcome pattern resembles that which is often found with endogenous cueing. Nevertheless, Mondor and Bregman suggest that these sounds exogenously guided attending to regions in pitch space, with attending distributed as a space-like gradient. [See also Jones (1976) who earlier hypothesized that people allocate attending to regions of space (real or pitch space) and of time.]

To sum up, it seems clear that unexpected sounds do more than simply surprise people. Exogenous cues can serve a locative function in facilitating attentional orientation to locations in real or pitch space related to a cue sound. Nevertheless, several questions remain. One concerns how attentional allocation is accomplished: Is exogenous cueing truly automatic? Another concerns the degree to which timing constraints modulate observed effects of exogenous cuing.
Sequence monitoring tasks. I conclude Part I by considering whether something akin to capture by sound singletons occurs in auditory sequence monitoring. Theoretically, to emulate capture paradigms in sequence monitoring, listeners must be required to respond to a feature of a target sound which is coincident with a singleton... only sometimes. The idea is to discover if the singleton facilitates feature identification when target feature and singleton co-occur, and inhibits target identification when they do not co-occur, due to the power of the singleton to "call" attending to itself and its serial location in a sequence. Moreover, to ensure that any observation of capture is determined entirely by an automatic "pull" levied by the singleton and not contingent on the task, the singleton feature should be irrelevant to the task goal and it should be neither a defining nor a reported feature. In practice, these guidelines are very rarely applied in the research reviewed.

One reason that few monitoring studies have addressed capture and automaticity, as such, is that other issues have dominated this field. Because sequences are foundations of communication patterns, common questions have concerned how people perceive, attend to and make sense of the sequences themselves. In this context, it is not surprising to learn from sequence monitoring studies that people are more likely to notice elements that "stand out" relationally from surrounding elements in various ways (e.g., increasing pitch or time difference, etc.; Bregman, 1990; Miller & Heise, 1950; Heise & Miller, 1951; van Noorden, 1975; Woods et al., 2001). Relatedly, distinctive tones are especially important in musical sequences where their attention-getting potential dignifies them as "accents"; accents are often distributed strategically in time by composers and performers with the goal of manipulating "when" listeners should attend (e.g., Jones, 1987).
Generally, the degree to which any sound element grabs attention depends on its relationships to surrounding tones; if these are all similar to one another, then a distinctive singleton will seem still more prominent and attention-getting. Conversely, the more similar a singleton becomes to other items in a well-formed group, the less accurately it is judged (Bregman, 1990; Bregman & Rudnicky, 1978; Divenyi & Hirsh, 1978; Jones, Kidd, & Wetzel, 1981; Jones & Yee, 1993; Mondor, Zatorre, & Terrio, 1998; Watson, Kelly & Wroton, 1976; van Noorden, 1975). Furthermore, the salience of a cue may depend on sequence rate; for instance, at fast rates, singletons based on frequency differences are more salient cues than those associated with sound source location (left versus right ear), but the reverse obtains at slow rates (Woods et al., 2001). One interpretation of all of this is that the attention-getting potential of a single tone within a larger serial context depends on the way a listener responds to the context. If, in listening to an unfolding sequence, the singleton "fits" together relationally with surrounding tones and/or confirms an expectancy about the pattern structure, then that tone will be less likely to be noticed as a separate object and more likely to be integrated into the ongoing sequence.

Recently, direct attempts have been made to emulate visual capture in auditory sequence monitoring. Listeners monitor a sequence for a given feature (duration, rise time, intensity, etc.) that may or may not coincide with a distinctive
singleton (Woods, Alho, & Algazi, 1994; Woods et al., 2001). For example, Mondor and Terrio (1998) asked untrained listeners to respond to an irrelevant target feature in regularly timed (isochronous) tone sequences forming either rising or falling pitch trajectories. The singleton either departed from a trajectory in pitch (near or far in pitch space) or fell on the trajectory (null pitch change). A to-be-identified target feature (duration, intensity, etc.) always coincided with the singleton. Listeners were best with very distinctive pitch singletons (far) and worst when the singleton tone "fit into" a pitch trajectory (null pitch change). That distinctive singletons facilitate target identification suggests that they may capture attention. At the same time, because performance was poorest for the target on the pattern-directed trajectory, Mondor and Terrio concluded that pattern-directed expectancies were not present. Nevertheless, it is interesting that in a subsequent study, they found that irregularities within a pitch sequence weakened the capture-like effect, thus underscoring the importance of pattern structure. It remains possible that capture-like phenomena are somehow contingent upon listeners' use of pattern regularities. Although currently the contingent capture idea suggested here differs in important respects from that proposed by Folk and his colleagues for visual capture, greater convergence may emerge over time.

I would be remiss if I ignored the attentional blink (AB) task in a discussion about the impact of singletons planted in fast sequences (Raymond & Shapiro, 1992; Shapiro, Raymond, & Arnell, 1994). Only a few auditory AB studies exist; all employ random sequences of sounded digits or letters that are conveyed in a regular time pattern (Arnell & Jolicoeur, 1997; Chun & Potter, 1995).
Unlike other sequence monitoring research, issues of pattern relationships have not been central to explaining the common AB finding that a distinctive singleton (target) briefly interferes with identification of a subsequent element (probe). Nevertheless, I speculate that a case might be made that the time pattern induces a regular attentional pace and this pace is somehow briefly disrupted (the blink) by the target in these sequences. In this interpretation, a kind of capture is initiated by the singleton. Moreover, because the blink has been shown to emerge only when people are instructed to explicitly monitor for the target, one may infer that such capture is contingent on attention set and task relevance; I propose that the blink may also be contingent on sequence rate and rhythm.

Finally, given that abrupt onsets have been central to the debate over visual capture of attention, it is surprising that no comparable published work exists with abrupt onset singletons in auditory sequences. I suspect this is due to format differences between spatial search and temporal monitoring because in sequences all onsets are, in one sense, abrupt. Following my earlier claim that abrupt onsets may operate by virtue of their relative (not absolute) time properties, it is possible that the most attention-getting onsets in auditory sequences are ones which deviate from a temporal regularity implied by other sound onsets. Certainly larger temporal violations of an ongoing time pattern are more noticeable than smaller ones (e.g., Jones & Yee, 1998; Large & Jones, 1999). Of course, in these time judgment studies, time was task relevant; it was both the defining and reported feature.
Accordingly, we cannot conclude that evidence for noticeability of a time change represents stimulus-driven capture by an unexpected time change if people are already set to attend to timing. Recently, however, Ralph Barnes (in our lab) was able to demonstrate capture by tones with unexpected timings in isochronous sequences; temporally deviant tones were better identified in pitch where pitch (not time) was both a defining and reported dimension.

Summary of research on auditory capture. In both cue-target and sequence monitoring tasks, evidence is less clear-cut regarding stimulus-driven attending than for expectancies based on sound stimuli. Nevertheless, some findings indicate that a sudden, invalid cue sound "calls" attention to locations in real and (possibly) pitch space. Other research suggests that in sequence monitoring a version of attentional capture may obtain, but it is contingent on listeners' use of sequence structure.
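The suggestion that the most attention-getting onsets are those that deviate from an implied temporal regularity can be sketched as follows. The sketch assumes that the regularity is summarized by the median IOI and that an onset's deviance is the difference between the IOI preceding it and that median; both choices, the function name and the onset times are illustrative assumptions, not a procedure taken from the studies cited.

```python
from statistics import median


def most_deviant_onset(onsets_ms):
    """Return (index, signed deviation in ms) of the most deviant onset.

    Assumption: the implied regularity is the median IOI, and each onset
    (from the second on) is scored by how much the IOI preceding it
    departs from that median. Ties go to the earliest onset.
    """
    iois = [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]
    base = median(iois)
    deviations = [ioi - base for ioi in iois]
    # Position k in `deviations` describes the onset at index k + 1.
    k = max(range(len(deviations)), key=lambda i: abs(deviations[i]))
    return k + 1, deviations[k]


# Hypothetical near-isochronous sequence (500 ms IOI); the fourth tone is 80 ms late.
onsets = [0, 500, 1000, 1580, 2000, 2500]
print(most_deviant_onset(onsets))
```

A late tone lengthens the IOI before it and shortens the IOI after it by the same amount, so the tie-breaking rule attributes the violation to the late tone itself and not to its successor.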
Summary of part I
A recurrent theme of this, admittedly selective, review of attending to auditory events involves the role of stimulus timing. This theme emerges in my interpretations of data arising from both cue-target and sequence monitoring tasks. It is justified, I think, because auditory events, unlike visual objects, are preeminently temporal. In light of this, it is rather astonishing that we know less about the orienting of auditory attending in time than we do about its orienting in real space and pitch space. Accordingly, in my summary I return to two questions implicit in my discussions of attending in time.

The first question simply asks whether people can allocate attending in time. Evidence from both visual and auditory domains indicates that they can. Temporal expectancies in cue-target designs have been manipulated mainly via valid endogenous cues and in sequence monitoring tasks by assessing effects of sequence rate and rhythm on judgments about "when" future sounds will occur. My bias is that the basis for these expectancies rests in the time structure of a task and its stimuli; thus, we will find that consistency of onset times and IOIs in certain cue-target designs facilitates expectancies about "when" a target may occur; similar temporal expectancy effects arise from the regularities of tone onsets and recurrent IOIs in stimulus sequences. It is conceivable that people use consistent time relationships in either short (cue-target) or long (monitoring) sequences to pace attending and that a temporal expectancy represents an extrapolation of this pace. But one lesson to be drawn from this review is that relatively little current research has directly addressed this topic. I believe that this is largely because time (e.g., as in IOIs) is not usually conceived as part of the structure of a task or a stimulus; it is rarely considered a potentially relevant aspect of an exogenous or endogenous "cue" itself.
Although time has certainly been an important variable in the reviewed research, its manipulation often reflects the view that time is a void that can be filled with various processing activities (e.g., rehearsal, decay, etc.). I suggest that this view of time limits its interpretation as part of the stimulus structure that people use to attend.
The second question concerns capture. What is capture in the auditory domain and is it contingent on aspects of the task, including task goals? Capture has received far less study in auditory than in visual events. Yet sudden sounds indeed can orient attending in exogenous cueing designs, and distinctive singleton elements appear to attract attention within auditory sequences. However, because it is the "out-of-context" sounds in sequences that tend to grab attention, my inclination is to view capture as an expression of a listener's response to a violation of structural relationships that characterize a task or stimulus sequence. Capture-like performance in response to relationally distinct singletons has been demonstrated in monitoring of long sequences where it appears contingent on people's use of pitch relationships among surrounding sequence tones. Does this mean that capture reflects a listener's response to an expectancy violation? This question is more speculative and pertinent research is limited. Finally, it is odd that debates over abrupt onsets as a possible source of automatic control of attending find no counterpart in auditory sequence monitoring. Is this because all onsets are abrupt in sequences? If the singleton quality of an abrupt onset is removed, does this reduce its salience? Or is there a role for abrupt onsets in attending to auditory sequences? I address such questions in the next section.
II. Dynamics of Attending to Auditory Sequences
In this section, I develop the idea of dynamic attending as a form of pattern-directed attending based on stimulus time relationships. I attempt to show how dynamic attending explains auditory expectancies and temporal capture in slow sequence monitoring tasks.
Temporal expectancies: If people can allocate attention in time, how do they do this? The clearest evidence that people can allocate attending to regions in time comes from two sources. As indicated in Part I, over the course of a session, valid visual cues that are probabilistically paired with time intervals effectively elicit expectancies about the future onset times of a target. It is possible to argue that this dynamic attending is voluntary and based on the kind of learning that enables us to use clock codes to plan for future events. The second source for temporal targeting of attending comes from findings that people show temporal expectancies in their monitoring of slow sequences. In this case, dynamic attending is more directly influenced by stimulus time relationships and possibly this attending is more primitive. As suggested earlier, curiously little research exists in the auditory domain on orienting attending in time. For instance, is it possible to exogenously "cue" a time interval? I suspect this is difficult for many of us to imagine. But it is not unreasonable to propose that such a cue would simply be another time interval. Although I have suggested that IOIs indeed play a role in understanding auditory
attention, this role remains to be examined and the applicability of terminology such as exogenous/endogenous is unsettled; in fact, it is not clear that IOIs should even be considered "cues" because they participate in time patterns. But if forced into such dichotomies, then an interesting candidate for an exogenous time cue certainly might be the repeated IOIs that form a rhythm. This analogy tends to misconstrue both rhythm as a time pattern and dynamic attending as it relates to rhythm. Nevertheless, because the cue terminology is so familiar, perhaps this flawed analogy will facilitate understanding forthcoming experiments that raise the possibility of stimulus-driven aspects of attending and expectancies.
In this part of the chapter my concern is with demonstrating how the idea of dynamic attending can be used to explain temporal expectancies that are influenced by temporal aspects of slow auditory sequences. Therefore, I do not focus upon how a temporal expectancy is acquired in response to discrete symbolic cues, although this is an important problem. Instead, I consider ways in which certain stimulus properties in sequences facilitate allocation of attending in time; mainly these properties relate to onsets of single sounds and IOIs between successive sounds within sequences (although I do not rule out the impact of recurrent time intervals throughout a session).
Key to understanding this approach is the principle of synchrony (Jones, 1976). The idea is simple: Attending (let's say, one's attentional focus) must be selectively synchronized with a to-be-attended object to ensure accurate judgments about that object. My colleagues and I have proposed that synchrony is achieved in one or both of two ways, by:
1. Anticipatory attending, which is directed in advance of the onset of a tone; and/or
2. Reactive attending, which entails a quick re-direction of an attentional focus in time to a tone following its onset (Jones, Moynihan, MacKenzie & Puente, in press; Barnes & Jones, 2000).
Anticipatory attending is pattern-directed; it realizes a stimulus-controlled pace based on pattern timing. In this view, an expectancy is an extrapolation of such an attentional pace. Reactive attending refers to people's rapid responses to element onsets that are not correctly anticipated, i.e., expected. Reactive attending is stimulus-driven by abrupt onsets, with strong parallels to conventional capture in visual attention; I return to this point shortly. What is unusual about this argument is the claim that anticipatory attending can also be stimulus-driven. As I will show, this is because it is responsive to patterns of stimulus onsets and stimulus sequence IOIs; in this regard, we find a new role for abrupt onsets. Together, both anticipatory and reactive attending occur in response to stimulus sequences and they effectively pace attending.
Let me clarify anticipatory attending in order to justify the controversial claim that expectancies can be stimulus driven. First, I assume that, locally, an abrupt onset captures attending by time-locking an individual's attentional focus (Jones, 1976; Posner, 1980). Sequences, by definition, comprise many onsets, and while the abrupt onset is no longer a singleton within a sequence, it nevertheless retains its abrupt quality in conveying a local change from silence to sound; in auditory sequences such onsets remain quite salient (Vos & Ellerman, 1989). Furthermore, if each onset in a tone sequence commandeers a brief reactive
attentional shift, then together, a series of attentional shifts occurs, with each attention shift lasting T ms. I claim that T comes to approximate sequence IOIs (T ≈ IOI). Thus, attentional pace, first discussed as a feature of flexible attentional search in uncued visual search, returns in sequence monitoring where it is much more constrained by the temporal array. Furthermore, a new role emerges for abrupt onsets: They summon attention to particular points in time, thereby outlining IOIs that ultimately determine T. Clearly, anticipatory attending depends, in part, on abrupt onsets. Effectively, anticipatory attending builds on the local capture of attending by successive onsets. But it also depends on stimulus IOIs, which participate in sequence rate and rhythm. The average IOI of a sequence determines its rate, and when all IOIs are equal the sequence has a very simple and regular (isochronous) rhythm. I suggest that these stimulus properties come to control the pace of attending in an anticipatory manner. Moreover, if the same or relationally congruent IOIs occur over several onsets, then sequence rate determines attentional pace, thus limiting attentional flexibility. Instead, I propose that a periodic activity is engaged that synchronizes attending with tone onsets and matches T to pattern IOIs. Extrapolation of this induced attending activity means that it may persist over time. In this manner, stimulus-driven temporal expectancies reflect specific anticipatory targeting of an attentional focus in time.
Now consider reactive attending. This provides a more elementary way of pacing attending which relies largely on fast reflexive attentional shifts to prior tones. Whereas anticipatory attending is dependent on the oscillator period and stimulus rate, reactive attending is not. It involves a fast response to onsets and so may be more evident in fast sequences than anticipatory attending.
Not only this; it is also more likely to occur in irregular than in regular rhythms. Nevertheless, in this view both anticipatory and reactive attending are largely stimulus driven.
Figure 2a. An entrainment model responding to a series of tone onsets (black bars) by changing its period (peak to subsequent peak) to match inter-onset-time intervals (IOIs) and aligning its phase (expected - observed onset times) to the stimulus rhythm. An expected point in time is given by a pulse peak.
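The behavior summarized in Figure 2a can be illustrated with a deliberately simplified sketch. It is not the entrainment model itself, which is formulated with limit cycle oscillators; it is a linear error-correction stand-in in which the period drifts slowly toward the observed IOIs and the expected onset time is pulled toward each observed onset. The function name `entrain` and the gain values `phase_gain` and `period_gain` are assumptions chosen only for illustration.

```python
def entrain(onsets, period, phase_gain=0.8, period_gain=0.3):
    """Track tone onsets with a simple phase- and period-correcting clock.

    onsets: observed onset times in ms, in ascending order.
    period: initial period of the internal clock in ms.
    Returns (history, final_period, next_expected_onset), where history
    holds one (expected, observed, error) tuple per tone.
    """
    expected = onsets[0]              # assume the first onset time-locks the clock
    history = []
    for observed in onsets:
        error = observed - expected   # positive when the tone arrives late
        history.append((expected, observed, error))
        period += period_gain * error            # slow period adaptation
        expected += phase_gain * error + period  # phase pull, then extrapolate
    return history, period, expected


if __name__ == "__main__":
    tones = [600 * n for n in range(9)]          # isochronous sequence, IOI = 600 ms
    history, period, nxt = entrain(tones, period=500.0)
    for exp, obs, err in history:
        print(f"expected {exp:8.1f}  observed {obs:5d}  error {err:7.1f}")
    print(f"final period {period:.1f} ms, next expected onset {nxt:.1f} ms")
```

Starting from a period that is too short, the timing error shrinks on every tone and the period settles near the stimulus IOI; the last value returned is the extrapolated expectation for a tone that has not yet sounded.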
Figure 2b. A predicted expectancy profile in which proportion correct (PC) judgments about tones is shown as a function of the onset time of a final tone (critical IOI) relative to an induction rhythm. The profile 'recovers' the shape of an attention pulse from the entrainment model (top panel). [Panel title: A Predicted Expectancy Profile. Vertical axis: Proportion Correct, 0.5 to 0.8. Horizontal axis: Critical IOI, with levels Very Early, Early, On Time, Late, Very Late.]
Recently Ed Large and I outlined a formal model of dynamic attending (developed by Large) that incorporates both anticipatory and reactive attending (Large & Jones, 1999). Figure 2a illustrates some properties of this approach, which describes attending as the entrainment of one or more internal oscillations (limit cycle oscillators; see note 4). Entrainment refers to a real-time 'locking' of attending to certain properties of an unfolding stimulus (Jones, 1976). In this model, Large proposed that a given oscillator carries a pulse of attentional energy, where the pulse expresses an attentional focus in time. In other words, pulse width represents the width of an attentional focus over a region in time; it covers a temporal region of heightened attending surrounding an expected onset time. Basically, the oscillator entrains to timing patterns by generating, quasi-periodically, a pulse of attentional energy, as suggested in Figure 2a. When entrained to an isochronous rhythm, an oscillator temporally targets an attentional focus (pulse) over a span of time equal to sequence IOIs. If we further assume that accuracy in identifying an element (tone) within a sequence increases with attending energy associated with that tone in pitch space and time, then tones that fall at temporally expected times in the future will enjoy a greater concentration of attention and will be judged more accurately than ones that happen at unexpected times.
Temporal capture: How does it relate to stimulus-driven temporal expectancies?
A temporal expectancy is associated with a shift of attention in time that realizes anticipatory attending. Expectancies are affected by stimulus time properties, such as sequence (and session) IOIs; in other words, they are strongly
affected by sequence rate. Furthermore, because anticipatory attending relies on discrete stimulus onsets, as well as IOIs, in a real sense it depends upon reactive attending. Thus, in various ways, the two kinds of attending are interdependent. In part for these reasons, I postpone discussion of whether we can confidently describe anticipatory attending, and hence expectancies, as strictly voluntary and reactive attending (which underlies capture) as strictly automatic and involuntary. For the present, I merely stipulate that different stimulus timing properties contribute, respectively, to determining these two aspects of entrainment.
If expectancies can be, in a sense, stimulus driven, where does capture fit into this picture? In the entrainment model, capture, by which I will mean temporal capture (Barnes & Jones, 2000), is related to reactive attending. This is modeled as phase entrainment of oscillatory attending. Reactive attending is determined mainly by a phase parameter; this parameter describes a time-locking function of attending to an abrupt onset. Although the phase parameter governs reflexive shifts of attention in time to tone onsets, each shift is contingent on the degree to which an onset time deviates from an expected time. In other words, reactive attending is contingent on pattern-directed attending in time. Thus, if a listener correctly anticipates the time of a tone's onset, then expected and observed times are identical and no adaptive phase shift will occur. Nor does reactive attending appear in this case. Adaptive shifts occur only when anticipatory attending incorrectly targets attending in time. In the latter case, the phase parameter modulates alignment of an attentional pulse to the onset time of a tone; this is phase entrainment. It governs the extent to which an attentional pulse is pulled to each stimulus onset in time, reducing the temporal difference between observed and expected time, thereby achieving attentional synchrony.
Reactive attending, then, parallels in time the more familiar version of capture in space often found with abrupt visual onsets. However, in temporal events, we refer to this type of capture as temporal capture. Temporal capture is less evident as attentional focus narrows.
In this model, period entrainment occurs along with phase entrainment. An oscillator's period comes to match IOIs within a stimulus pattern. This gravitation of the oscillator's period to the average IOI (sequence rate) of an auditory pattern is a slower adaptive process than phase entrainment. It is determined by the value of a period parameter which governs how rapidly an oscillator adapts its period in response to changes in stimulus IOIs. Adaptability means, for instance, that if an auditory pattern begins at one rate and then speeds up (a common happening), then, within limits, the oscillator will keep pace with it. In real time, the oscillator always attempts to align attentional focus with stimulus onsets and to match its period with the average IOI of a sequence. In sum, in Large's model an attending oscillator adapts to shifts in the rate of a pattern by changing its pace accordingly.
This dynamic version of pattern-directed attending carries two implications for understanding temporal capture. The first we have already mentioned: All tone onsets have a potential for capturing attention via phase entrainment. The second implication draws this model closer to debates on capture because it qualifies the
condition of temporal capture in terms of attentional focus. That is, attentional focus width changes over time in response to the ongoing time pattern and capture is more or less likely depending on focus width. Specifically, attentional focus in time widens in irregular time patterns and narrows in regular ones. With a narrow focus, attention targeting in time is more precise than with a wide focus; thus, expectancies are more specific to certain temporal regions with regular sequences. More attending energy is allocated to expected than to unexpected onset times. Accordingly, with a narrow focus, the model predicts that accuracy of identifying tones will be better when they arrive at expected rather than unexpected times. Indeed, people may even fail to notice extremely unexpected tones, given a very narrow attentional focus, implying that temporal capture is less likely with a narrow focus. By contrast, irregular time patterns induce a wide attentional focus; expectancies are less specific, ranging over a broader region in time. In irregularly timed sequences, attending becomes more variable and attentional pulses are erratically targeted in time. In these cases, listeners have a better chance of detecting tone onsets that are very unexpected than with a narrow attention focus and hence are more susceptible to temporal capture. In short, temporal capture by abrupt onsets is contingent on the width of an attentional focus in time.
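The trade-off between a narrow and a wide attentional focus can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the attentional pulse is a Gaussian of fixed total energy centered on the expected onset time; the model's own pulse shape, the function name `attending_energy`, and the two width values are not taken from the chapter.

```python
import math


def attending_energy(deviation_ms, focus_width_ms):
    """Relative attending energy at a given deviation from the expected time.

    Assumes a Gaussian pulse with unit area, so narrowing the focus piles
    energy onto the expected time and widening it spreads energy outward.
    """
    z = deviation_ms / focus_width_ms
    return math.exp(-0.5 * z * z) / (focus_width_ms * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    # Hypothetical focus widths (ms) for a regular and an irregular rhythm.
    for label, width in (("narrow focus", 30.0), ("wide focus", 120.0)):
        on_time = attending_energy(0.0, width)
        deviant = attending_energy(115.0, width)
        print(f"{label}: on time {on_time:.5f}, very unexpected {deviant:.7f}")
```

Under these assumptions the narrow pulse allocates more energy to the expected time but almost none to a very unexpected onset, whereas the wide pulse leaves appreciable energy at the unexpected onset, which is the sense in which a wide focus leaves a listener more open to temporal capture.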
III. Evidence for Dynamic Attending to Slow Auditory Sequences
In this section, I describe experiments concerned with stimulus timing and dynamic attending (see note 5). First, I show that slow auditory sequences, simply by virtue of recurrent IOIs and tone onsets, may pace anticipatory attending and effect a directed allocation of attending in time. Second, I describe experiments that suggest a means of allocating attending in time based on sequence rate and rhythm. Finally, I show that when the temporal regularity is removed from an inducing pattern, effects of temporal targeting of attending vanish.
All of the research I report uses listeners with minimal musical training who receive relatively little practice in the task. The task requires that they judge the pitch of a comparison tone relative to a preceding standard tone. Our strategy was to render the onset of a comparison tone either temporally expected or unexpected by introducing an interpolated rhythmic sequence between the standard and comparison tones to assess its effects on comparison judgments. In some respects this represents a reversal of a task used by Mondor and Bregman (1994) in which people judged the duration of a tone cued by the frequency of another tone. We required that people judge the pitch of a tone whose onset time is cued by a series of stimulus IOIs. Although our task presents auditory sequences, thereby qualifying as a sequence monitoring task, in our version of this task the comparison (i.e., target) tone always follows the sequence and listeners are actually told to ignore the interpolated sequence. Following Yantis and Egeth (1999), in this task pitch is both the defining and reported dimension; the frequency of a to-be-attended tone (comparison) is systematically varied and people must report its relative pitch.
Our aim is to manipulate relative timing in a task where time is putatively irrelevant and probabilistically uninformative. In addition, we sought to discover if sequence timing might over-ride instructions to ignore the distractor tones.
The task and general methodology
The task adapts an old procedure, the interpolated tones task (see Deutsch, 1999, for a review). It is shown in Figure 1b. A standard tone of 150 ms is followed by eight 60 ms interpolated tones (randomly re-arranged on each trial). All tones (recorder voice) and sequences were generated via customized software (MIDILAB 6.0; Todd, Boltz, & Jones, 1989) using a Yamaha TG 100 tone generator interfaced with a Pentium PC. Sound sequences were delivered over Beyerdynamic DT770 headphones at a comfortable listening level. Altogether we used six different standard pitches, 415 Hz (Ab4), 440 Hz (A4), 466 Hz (Bb4), 622 Hz (Eb5), 659 Hz (E5), and 698 Hz (F5), each associated with three different comparison pitches (+1, -1, or 0 semitone, ST, differences). Interpolated tones randomly varied within three semitones (544.4 Hz to 789 Hz) centered on 659 Hz if the standard was between 415 and 466 Hz; they varied between 370 and 523.3 Hz, centered on 440 Hz, if the standard was between 622 Hz and 698 Hz. Many different interpolated pitches and arrangements of pitches were employed (one constraint, described below, relates to the final interpolated tone). The listener's task was to judge the pitch of a comparison (Same, Higher, Lower) relative to the standard.
In adapting this task to study dynamic attending we introduced three modifications. First, in our initial experiments we fixed all IOIs of interpolated tones to the same 600 ms value to create a time pattern with a regular rhythm and varied the relative onset time of the comparison tone to render it either temporally expected or unexpected given this rhythm. The final IOI, which immediately preceded the comparison tone, termed the critical IOI, assumed five different values rendering a comparison either Very Early, Early, On Time (600 ms IOI), Late, or Very Late.
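The structure of a single trial can be sketched as follows. The sketch assumes equal-tempered semitones (each step scales frequency by the twelfth root of two) and assumes that the IOI between the standard and the first interpolated tone equals the 600 ms sequence IOI, which is not stated explicitly; the names `shift_semitones`, `trial_onsets`, and `lead_ioi_ms` are hypothetical.

```python
def shift_semitones(freq_hz, semitones):
    """Transpose a frequency by a number of equal-tempered semitones."""
    return freq_hz * 2 ** (semitones / 12)


def trial_onsets(critical_ioi_ms, n_interpolated=8, ioi_ms=600, lead_ioi_ms=600):
    """Onset times (ms) of the standard, the interpolated tones, and the comparison.

    lead_ioi_ms, the IOI from the standard to the first interpolated tone,
    is an assumption made for this sketch.
    """
    onsets = [0, lead_ioi_ms]                    # standard, first interpolated tone
    for _ in range(n_interpolated - 1):
        onsets.append(onsets[-1] + ioi_ms)       # remaining interpolated tones
    onsets.append(onsets[-1] + critical_ioi_ms)  # comparison tone
    return onsets


if __name__ == "__main__":
    standard_hz = 440.0                          # A4, one of the six standards
    for step in (-1, 0, 1):
        hz = shift_semitones(standard_hz, step)
        print(f"{step:+d} ST comparison: {hz:.1f} Hz")
    print("onsets (ms):", trial_onsets(critical_ioi_ms=600))
```

With a 440 Hz standard the one-semitone comparisons land close to the neighboring standards listed above (roughly 415 Hz and 466 Hz), and an On Time comparison simply continues the 600 ms grid set up by the interpolated tones.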
A second modification built upon Deutsch's finding that a single repetition of the standard pitch within the interpolated sequence boosts overall accuracy in this rather difficult task (Deutsch, 1972). Therefore, we included one repetition of the standard in our interpolated sequence, constraining it to be the final one. This had two advantages. In addition to rendering the task less difficult, it prevented spurious frequency cuing associated with the final interpolated tone, and controlled biased responding based on whether the final interpolated tone was higher or lower than the comparison, evident in many pilot studies (see note 6).
The third modification involved instructions. From prior research (our own and others') we know that people do well in this task in the absence of interpolated tones. Therefore, we asked participants to "Ignore all intervening tones." They were told (validly) that this would help their performance in the task. However, our motives were not entirely altruistic. One criterion of automaticity of stimulus-driven attending involves ascertaining whether a task-irrelevant stimulus property cannot be ignored, in spite of instructions to do so (Yantis & Egeth, 1999). In principle
timing is irrelevant in this task because listeners must judge only pitch. Moreover, to the degree they succeed in "tuning out" these distracting tones, they will be more accurate and less likely to show influences of sequence time structure. But, if people involuntarily respond to these timing patterns, then they should be unable to comply with instructions.
Experiment 1: Allocation of attending to expected regions in time
The main independent variable in Experiment 1 was the critical IOI, which assumed five levels, one of which matched sequence rate (IOI). The dynamic attending model (Part II) suggests that sequence rate and rhythm systematically guide attending and determine the accuracy of pitch judgments as a function of the critical IOI. Specifically, if people extrapolate the pace induced by the auditory sequence, then they will most likely target attending to a comparison consistent with that rate. This predicts an expectancy profile as a function of critical IOIs where the expected time corresponds to the critical IOI identical to sequence IOIs. Figure 2b suggests such a profile; proportion of correct pitch judgments, PC, about a comparison tone is best for the temporally expected comparison and worst for very unexpected ones, thus recovering the shape of an attentional pulse (Figure 2a).
In light of discussions in Part I, a broad expectancy region is outlined by the range of critical IOIs in this task. At the same time, the model of Part II suggests that a regular sequence rhythm can focus attention more narrowly within this region. On each trial entrainment to tone onsets (phase entrainment) and to IOIs (period entrainment) paces attending to sequence IOIs. This leads to the prediction of a narrow attentional focus and specific temporal expectancies, weighted in favor of the critical IOI of 600 ms. Accordingly, attentional energy should be maximal at the expected time point and symmetrically drop as critical IOIs depart from the expected one.
Methodological Details. A full description of our methodology appears elsewhere (Jones et al., in press). Twenty-one participants, all with little musical training, served for 45 minutes in Experiment 1. They received 180 trials with five levels of critical IOI occurring equally often (but randomly).
Critical IOI values were 524, 579, 600, 621, and 676 ms (ranging, respectively, from Very Early to Very Late). Participants also received a post-session questionnaire that queried listeners on compliance with instructions (among other things).
Results and discussion. Figure 3 presents the results of Experiment 1 where mean PC is a function of comparison onset time (five critical IOIs). On average, listeners were best in judging pitch when the comparison tone sounded at an expected time (On Time) and worst in the two very unexpected conditions (Very Early and Very Late), yielding a main effect of timing (critical IOI), F(4,80) = 3.79, Mse = .012, p = .007. The quadratic trend traced out by the observed expectancy profile was also significant, F(1,20) = 9.27, Mse = .005, p = .006.
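The distinction between a quadratic and a linear trend over the five timing levels can be illustrated with orthogonal polynomial contrasts. This is a sketch only: it treats the five levels as equally spaced ordinal steps (an assumption, since the critical IOIs themselves are not equally spaced), and the mean PC values below are invented to show the two profile shapes; they are not the observed data.

```python
# Orthogonal polynomial weights for five equally spaced levels.
LINEAR = (-2, -1, 0, 1, 2)
QUADRATIC = (2, -1, -2, -1, 2)


def contrast(means, weights):
    """Weighted sum of condition means; near zero when that trend is absent."""
    return sum(m * w for m, w in zip(means, weights))


# Hypothetical mean PC values, Very Early through Very Late (not the data).
inverted_u = (0.62, 0.68, 0.74, 0.68, 0.62)   # peak at On Time
rising = (0.60, 0.64, 0.68, 0.72, 0.76)       # steady gain with longer IOIs

if __name__ == "__main__":
    for name, means in (("inverted U", inverted_u), ("rising", rising)):
        lin = contrast(means, LINEAR)
        quad = contrast(means, QUADRATIC)
        print(f"{name}: linear = {lin:.3f}, quadratic = {quad:.3f}")
```

A symmetric profile that peaks at the On Time level loads only on the quadratic contrast (with a negative sign under these weights), whereas a profile that changes steadily with the length of the critical IOI loads only on the linear contrast.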
Figure 3. Observed expectancy profiles of PC from Experiments 1 (panel a) and 2 (panel b) as a function of critical IOIs. [Vertical axis: PC, 0.5 to 0.8. Horizontal axis: Onset Time of Comparison, critical IOI (ms), at levels Very Early, Early, On Time, Late, and Very Late: 524, 579, 600, 621, and 676 in Experiment 1; 1124, 1179, 1200, 1221, and 1276 in Experiment 2. Panel b also carries the label Next Beat.]
These findings suggest that the rhythm of an interpolated tone sequence significantly affects subsequent pitch judgments. Although sequence timing is, in principle, irrelevant in this task, people apparently tacitly "used" consistent stimulus time relationships to allocate attention in time. The observed expectancy profile emerges, reinforcing the hypothesis that stimulus timing contributes to expectancies. These were relatively slow sequences, best suited to revealing anticipatory attending stimulated by the pattern's time structure. Automatic Gestalt grouping processes are probably not involved because, according to Bregman (1990), they operate at faster rates; furthermore, grouping principles do not predict anticipatory attending. For instance, if the Gestalt rule of temporal proximity were operational, then the comparison tone most likely to group with interpolated tones is given by a critical IOI of 524 ms. In other words, the resultant temporal grouping due to proximity predicts a linear, not a quadratic, PC trend with lowest scores for a critical IOI of 524 ms, due to maximal interference (from grouping), and highest for the critical IOI of 676 ms.
Experiment 2: How do people allocate attending in time?
The entrainment model proposes that the regular timing of interpolated tones induces an attentional periodicity. If so, then we anticipate that this oscillation of attentional energy should persist for at least a few (IOI) cycles before dying out. In musical terms: "the beat goes on...." To test this we inserted a "missing beat,"
namely a lengthened silence equal to two sequence IOIs, between the last interpolated tone and the onset of the expected comparison tone. Thus, the On Time comparison had a critical IOI of 1,200 ms instead of the 600 ms IOI of Experiment 1. The dynamic attending hypothesis continues to predict a quadratic PC profile in this case because stimulus rhythm should induce a periodic entrainment process where attending oscillations persist, cyclically, over lengthened silent time intervals before they die out.
The temporal separation of a comparison tone from the interpolated sequence also permits us to assess several predictions. One concerns instructions; others concern the role of absolute time. With respect to instructions, it is possible that people in Experiment 1 had difficulty following instructions to ignore the interpolated tones simply because they did not distinguish the comparison from interpolated tones. In Experiment 2, a clear temporal segregation of the comparison from interpolated tones is given. If this segregation improves listeners' ability to comply with instructions to ignore the interpolated tones, then we should observe much better performance in Experiment 2 than in Experiment 1 and the quadratic profile associated with the interpolated rhythm should vanish.
Absolute time refers to the length of a critical IOI on an interval scale. People may be responding to time in an absolute (linear) fashion, rather than in a relative (periodic) fashion. If so, then the absolute time feature of a critical IOI may form the basis either for Gestalt grouping, by time interval similarity, or for a retention interval, leading to forgetting. In either case, a linear, not a quadratic, trend over critical IOIs is predicted. For example, if the Gestalt rule of similarity applies to these time intervals, then on an interval scale similarity is greatest between interpolated IOIs (600 ms) and the Very Early critical IOI; it is lowest with the Very Late critical IOI.
Because grouping tends to increase errors, this leads to the prediction of a linear trend with poorest performance for maximum grouping (see note 7). Finally, if memory loss due to decay or interference during a retention interval is operative, then we expect poorer performance in Experiment 2 than Experiment 1 and a monotonic decline over time (absolute critical IOI), with greatest accuracy for the shortest critical IOI.
Methodological Details. In Experiment 2 we used 19 participants in a task identical to that of Experiment 1 with the exception that we added 600 ms to all levels of the critical IOI. Relative to the last tone in the interpolated sequence, these were Unexpected Very Early (1,124 ms), Unexpected Early (1,179 ms), Expected On Time (1,200 ms), Unexpected Late (1,221 ms), and Unexpected Very Late (1,276 ms). Although the critical IOIs were enlarged, we continued to use the same magnitude of time deviations from an expected On Time of 1,200 ms; this means that expectancy violations associated with very unexpected onsets were proportionally smaller (.06).
Results and discussion. Results of Experiment 2 appear in Figure 3 along with those of Experiment 1. Overall, accuracy was slightly, but not significantly, higher in Experiment 2 than in Experiment 1. But again we found that people were significantly more accurate in judging the pitch of comparisons that occur at the
expected time (On Time comparisons), F(4,72) = 2.51, Mse = .009, p < .05. This suggests that the rhythmic expectancy established by interpolated timing persists through a missing beat. The quadratic function, although weaker than in Experiment 1, was significant due to relatively good performance with the On Time comparison, F(1,18) = 7.18, Mse = .003, p = .013.
Our findings are more compatible with a dynamic attending hypothesis than with the hypothesis that a time gap facilitates ignoring interpolated tones. They are also not compatible with absolute time explanations, based either on Gestalt rules or on memory loss. The latter two accounts predict linear, not quadratic, functions over time. Instead, the observed quadratic trend suggests that temporal expectancies are carried by a persisting periodic process induced by stimulus rhythm. This interpretation is supported by evidence that the peak accuracy occurred when the critical IOI equaled two interpolated IOIs. If the attentional shift of an oscillator, entrained to the stimulus rhythm, corresponds to a period of T = IOI (from pulse peak to pulse peak), then when this oscillator extrapolates the internalized stimulus rhythm, two periodic attending shifts will require a total time of 2T = 2 IOIs.
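The contrast between absolute and relative (periodic) time can be made concrete by expressing each critical IOI as a phase within the induced 600 ms cycle. The helper `relative_phase` and the convention of reporting phase in cycles between -0.5 and 0.5 are assumptions of this sketch rather than part of the reported analysis.

```python
def relative_phase(interval_ms, period_ms):
    """Signed distance from the nearest expected beat, in cycles (-0.5 up to 0.5)."""
    return (interval_ms / period_ms + 0.5) % 1.0 - 0.5


PERIOD_MS = 600                                   # IOI of the interpolated rhythm
LABELS = ("Very Early", "Early", "On Time", "Late", "Very Late")
EXP1_IOIS = (524, 579, 600, 621, 676)
EXP2_IOIS = tuple(ioi + 600 for ioi in EXP1_IOIS)  # one missing beat added

if __name__ == "__main__":
    for label, short, long_ in zip(LABELS, EXP1_IOIS, EXP2_IOIS):
        p1 = relative_phase(short, PERIOD_MS)
        p2 = relative_phase(long_, PERIOD_MS)
        print(f"{label:10s} {short:5d} ms -> {p1:+.3f}   {long_:5d} ms -> {p2:+.3f}")
```

On an absolute scale every Experiment 2 interval is 600 ms longer than its Experiment 1 counterpart, yet each pair occupies the same phase of the cycle. A process that keeps cycling with period T therefore predicts the same profile in both experiments, centered on 2T in Experiment 2.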
Experiment 3: Larger expectancy violations
In Experiment 3 we enlisted new participants and assigned some to an experimental condition, in which they received the same interpolated rhythmic sequence as in Experiment 2, but with larger expectancy violations. Other participants were assigned to a control condition, in which they received no interpolated rhythm between the standard and comparison tones. Two alternative hypotheses address possible differences between experimental and control groups. First, if the quadratic trends observed in earlier experiments result from range or midpoint effects conferred merely by critical IOIs, then both groups in Experiment 3 should exhibit the same quadratic PC trend as a function of critical IOIs. If listeners are responding only to the set of critical IOIs encountered in a session, and not to the rhythm of interpolated sequences, then we should not be able to reject a null hypothesis that control and experimental groups do not differ. On the other hand, if rhythm has a special role in determining this expectancy profile, then we should find a quadratic trend in the performance of the experimental but not the control group.
Methodological Details. We recruited 13 naive participants for the experimental condition. For this condition, we duplicated Experiment 2 sequences except that the very unexpected time changes were increased to render proportionally larger (.09) changes, given the longer critical IOIs. The four unexpected comparison times were Unexpected Very Early (-115 ms), Unexpected Early (-15 ms), Unexpected Late (+15 ms), and Unexpected Very Late (+115 ms), all relative to 1,200 ms. The very unexpected deviations exceed those of Experiment 2 in absolute magnitude (76 ms vs 115 ms) whereas the other two are slightly smaller in magnitude (21 ms vs 15 ms). The Expected On Time comparison
remained identical in both Experiments 2 and 3 (0 ms deviation relative to 1,200 ms IOI). In the control condition, 16 other naive listeners received the same set of five critical IOIs as in the experimental condition. But on all trials, all interpolated tones but the final one were eliminated. Participants in this condition received the standard tone, a silence of 4,800 ms, then the single (final) interpolated tone followed, equally often, by one of the five critical IOIs, which was always terminated by the comparison tone onset.
Results and discussion. Results of Experiment 3 indicated that the performance of the control listeners differed significantly from that of the experimental listeners, both overall, F(1,27) = 20.17, Mse = .368, p = .0001, and in terms of an interaction with time level, F(4,108) = 10.16, Mse = .006, p < .0001. The control group performed very well (PC > .95) at all five critical IOIs, evidencing a flat expectancy profile. The experimental group performed less well (mean PC = .67) and evidenced a significant quadratic trend over the time levels, F(1,12) = 23.27, Mse = .002, p = .004. Figure 4a presents the mean PC scores for this group as a function of time level. The observed expectancy profile is significantly sharper than that observed in Experiment 2, reflected in a significant difference in the two quadratic trends, F(1,30) = 7.30, Mse = .004, p = .011. This suggests that the proportionately larger expectancy violations in Experiment 3 indeed produced expectancy levels that were more difficult to cope with.
Figure 4. Observed expectancy profiles of PC from Experiments 3 (panel a) and 4 (panel b) as a function of critical IOIs (Very Early, Early, On Time, Late, Very Late).
The findings of Experiment 3 indicate that people in the experimental condition did not succeed in "tuning out" the interpolated rhythm as instructed. Rather, their performance indicates that with a lengthened critical IOI (missing beat), attending persists in a periodic fashion over time before it fades.
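The quadratic trend referred to throughout these experiments can be illustrated with standard orthogonal polynomial contrast weights for five levels. The sketch below is not the analysis reported above (which used F tests); it treats the five time levels as equally spaced ordinal steps, which is an assumption, and the PC values are hypothetical numbers chosen only to show a peaked versus a flat profile.

```python
# Orthogonal polynomial contrast weights for five equally spaced levels.
LINEAR = (-2, -1, 0, 1, 2)
QUADRATIC = (2, -1, -2, -1, 2)


def trend_contrast(pc_by_level, weights):
    """Weighted sum of mean PC scores across the five time levels."""
    if len(pc_by_level) != len(weights):
        raise ValueError("need one PC score per contrast weight")
    return sum(w * pc for w, pc in zip(weights, pc_by_level))


if __name__ == "__main__":
    # Hypothetical mean PC per level: Very Early, Early, On Time, Late, Very Late.
    peaked_profile = [0.60, 0.66, 0.78, 0.66, 0.60]  # illustrative only
    flat_profile = [0.96] * 5                        # illustrative only
    for name, profile in [("peaked", peaked_profile), ("flat", flat_profile)]:
        quad = trend_contrast(profile, QUADRATIC)
        lin = trend_contrast(profile, LINEAR)
        print(f"{name}: quadratic contrast = {quad:+.3f}, linear contrast = {lin:+.3f}")
```

With these weights a negative quadratic contrast indicates an inverted-U profile centered on the On Time level, whereas a flat profile yields a contrast near zero.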
Experiment 4: Attentional focus, stimulus timing and attentional capture
In this final experiment, we consider other ways in which stimulus timing may exert control over moment-to-moment attending. Preceding experiments were designed to determine whether anticipatory attending is induced by regular stimulus IOIs; in these, we hypothesized that attending would involve a relatively narrow attentional focus. Our dynamic attending model assumes that variability of stimulus timing, i.e., rhythmic irregularity, widens the focus. In this case, we might find a greater role for reactive attending to abrupt onsets. The idea is that different IOIs in an irregular rhythm (or an experimental session) can force a widening of listeners' attentional focus. In turn, more incorrect temporal expectancies will appear, providing greater opportunity for reactive attending (see e.g., Large & Jones, 1999). Accordingly, in Experiment 4 we increased the IOI variability of interpolated sequences to produce irregular rhythms without changing average rate (mean IOI = 600 ms). Our model predicts that resulting expectancy profiles will be symmetrical around the mean IOI but flatter than observed with regular rhythms.
Methodological Details. We recruited 11 naive listeners as before. Methodologically, all aspects of Experiment 4 were identical to those of the experimental condition of Experiment 3 except that sequences were temporally irregular. The following properties characterized the context IOIs that were randomly arranged: 1. Mean IOI of interpolated sequences remained 600 ms; 2. Standard deviation of IOI was 211 ms for half the sequences and 249 ms for the others; 3. The IOIs ranged from 200 ms to 850 ms; 4. First and last context IOIs remained 600 ms; 5. The On Time comparison IOI remained 1,200 ms and the four unexpected comparison IOIs were identical to those of Experiment 3.
Results and Discussion. Results appear in Figure 4b.
The mean PC scores do not follow a quadratic trend over critical IOI levels (this trend component is not significant); instead the expectancy profile is flat with an overall average PC of .71. A comparison of performance with regular (Experiment 3, experimental group) and irregular (Experiment 4) rhythms indicated that although the two conditions did not produce significant differences in overall accuracy, a significant interaction of context timing (regular vs irregular) with comparison timing (five levels of comparison IOIs) did appear, F(4,88) = 4.90, Mse = .009, p = .0013. In contrast to listeners with regular timing, those encountering the irregular rhythms do better with very unexpected comparisons, especially those that are unexpectedly late, suggesting a widening of the attentional focus. In this experiment the distribution of critical IOIs was identical to that in Experiment 3, yet in Experiment 3 a clear expectancy profile emerged whereas it did not in Experiment 4. This suggests that the dominant stimulus timing influence in Experiments 1-4 involved the sequence rhythm. In addition, the Experiment 4 finding that listeners are better in judging unexpectedly late comparisons in the irregular context than in the regular one raises two interpretations. One assumes that a wide attentional focus, instigated by an irregular rhythm, renders the listener more sensitive to reactive attending and temporal capture. The other evokes the visual
cue-target designs where unexpectedly early targets were deemed to automatically slow responding and unexpectedly late ones were interpreted to induce a voluntary re-orientation of attending (see part I). In other words, listeners may simply "wait" for a late comparison when a wide focus is induced. We are exploring both of these possibilities.
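To make the timing constraints on the irregular context sequences of Experiment 4 concrete, the following sketch generates one sequence that satisfies the listed properties. It is not the procedure actually used to build the stimuli. The number of context IOIs (`n_iois=8`), the choice to compute the standard deviation over the interior IOIs only, and the sample-then-rescale method are all assumptions made for illustration.

```python
import random
import statistics


def irregular_context_iois(n_iois=8, mean_ioi=600.0, target_sd=211.0,
                           lo=200.0, hi=850.0, seed=0, max_tries=100_000):
    """Return n_iois context IOIs (ms) for one hypothetical irregular sequence.

    The first and last IOIs are fixed at mean_ioi. The interior IOIs are drawn
    at random, then rescaled so that their mean is mean_ioi and their
    population SD is target_sd; draws falling outside [lo, hi] are rejected.
    """
    rng = random.Random(seed)
    n_free = n_iois - 2
    for _ in range(max_tries):
        raw = [rng.uniform(lo, hi) for _ in range(n_free)]
        m = statistics.fmean(raw)
        s = statistics.pstdev(raw)
        if s == 0:
            continue
        scaled = [mean_ioi + target_sd * (x - m) / s for x in raw]
        if min(scaled) >= lo and max(scaled) <= hi:
            return [mean_ioi] + scaled + [mean_ioi]
    raise RuntimeError("no admissible sequence found; relax the constraints")


if __name__ == "__main__":
    for sd in (211.0, 249.0):
        seq = irregular_context_iois(target_sd=sd, seed=1)
        print(f"SD {sd:.0f} ms:", [round(x) for x in seq])
```

Because the interior IOIs are rescaled rather than merely filtered, every returned sequence has exactly the requested mean and standard deviation, and only the range check requires resampling.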
Summary of experimental findings
The main outcomes of these experiments are as follows:
1. Regular rhythms, interpolated between standard and comparison tones, selectively facilitated pitch judgments for comparisons occurring at rhythmically expected times, as indicated by significant quadratic PC profiles over time (observed expectancy profiles).
2. The quadratic expectancy profile emerged even with critical IOIs lengthened relative to the interpolated IOIs, suggesting that the rhythm created by onsets and IOIs of the interpolated rhythm induced a persisting periodic attending process.
3. The quadratic profile was sharper with larger violations of the rhythmic expectancy.
4. The quadratic profile flattened for the same critical IOIs when the interpolated rhythm was missing or irregular.
5. The quadratic profile was based on data from listeners told to ignore the interpolated sequences.
IV. General Concluding Remarks
A main goal of our research is to learn about the effects of event timing on attending to auditory events. Although space dimensions figure prominently in attending to visual elements, the time dimension becomes more critical in attending to auditory elements. Yet, we know relatively little about effects of timing on attentional orienting to sounds. In the introduction, I suggested that one important difference between attending to spatial versus temporal arrays is that the former offers an attender greater opportunity for voluntary control, i.e., to search at a comfortable pace among spatially distinct elements. In uncued search, where spatial arrays are static, no obvious time constraints enforce a particular attentional pace. But, with temporal arrays (sequences) come stimulus time constraints that transform the search task into a monitoring one; time constraints in sequence monitoring derive from stimulus properties and limit the flexibility of timed attending and the voluntary pacing of attending. In this respect, such stimulus properties control attending by enforcing a particular pace not only on attending but also on expectancies. At least two quite general consequences follow from our experimental findings. The first involves the stipulation of time as an irrelevant dimension in this task. In
principle, timing could be ignored because listeners were not explicitly required to make judgments about it or to attend to time; that is, pitch was both the defining and reported dimension. Nevertheless, the timing of interpolated tones had a systematic influence on pitch judgments. The second general consequence is that these results suggest a blurring of the strict dichotomy between voluntary and involuntary attention that has been fruitful in research on visual attending. Voluntary control of attention is often equated with expectancies and involuntary control with stimulus-driven attending. However, if stimulus properties contribute to expectancies, then perhaps attending in time depends upon an interaction of voluntary and involuntary components.
Expectancies: What are they?
According to a probabilistic-association tradition, expectancies entail an orientation of attention that is conveyed by knowledge acquired from conditional probabilities (cue validity) linking endogenous symbolic cues to targets. In this view an expectancy is not essentially stimulus-driven in that it derives from the acquisition of statistical contingencies, which, in turn, determine slow, deliberate and voluntary processes. According to a pattern-directed approach, an expectancy may also be an orientation of attention that is given by an extrapolation of stimulus relationships inherent in task structure (within a session) and/or sequence structure (within a trial or stimulus pattern). This suggests that expectancies are, at least in part, stimulus based. The claim that an expectancy may be stimulus driven is clearly baffling in light of the widespread belief that expectancies are determined by cue validity. But this presents a puzzle only if we overlook the defining feature of an expectancy. I have argued that this is simply an orientation of attending toward a future happening (as opposed to reactive attending or remembering, both of which suggest orienting to past events). This future orientation is manifest in anticipatory attending, which may or may not be voluntary, conscious and deliberate. And anticipations can be affected in many ways! One way entails cue validity. For instance, people can acquire future-oriented attending from training with valid symbolic cues to time intervals (Kingstone, 1992; Coull & Nobre, 1998). But another way involves the direct reliance on stimulus time relationships where strong anticipatory attending is likely when these are regular (current findings, plus Barnes & Jones, 2000; Large & Jones, 1999). The real puzzle comes with understanding how these different circumstances actually promote timed anticipations and whether all anticipations are necessarily entirely voluntary.
Furthermore, it is not clear whether the same mechanisms are responsible for these different ways of establishing expectancies. The research reported here examined the second way of establishing an attentional orientation and temporal expectancies. In this case, reinforcement may entail entrainments of internal periodicities. The dynamic attending model instantiates this stimulus-driven approach to temporal expectancy. It does so in a fashion that makes clear that an inter-dependency obtains between anticipatory
attending (expectancies) and reactive attending (responses to expectancy violation and capture).
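The inter-dependency between anticipatory and reactive attending can be sketched with a deliberately simplified oscillator. This is a linear error-correction scheme, not the equations of the dynamic attending model (Large & Jones, 1999): each tone onset yields an expectancy violation, which then corrects both the next expected time (phase) and the oscillator period. The function name `entrain` and the gain values are hypothetical choices for illustration.

```python
def entrain(onsets_ms, period_ms, phase_gain=0.5, period_gain=0.2):
    """Track tone onsets with a linear error-correcting oscillator.

    Returns one (expectancy_violation_ms, period_ms) pair per onset, where the
    period is the value after adapting to that onset.
    """
    expected = onsets_ms[0]  # oscillator starts aligned with the first tone
    history = []
    for onset in onsets_ms:
        violation = onset - expected          # > 0: tone later than expected
        period_ms += period_gain * violation  # reactive period adaptation
        history.append((violation, period_ms))
        # Anticipatory step: extrapolate the next expected onset, with a
        # partial phase correction toward the onset just heard.
        expected += period_ms + phase_gain * violation
    return history


if __name__ == "__main__":
    # Isochronous 600 ms rhythm tracked by an oscillator that starts too fast.
    onsets = [600.0 * i for i in range(12)]
    for i, (violation, period) in enumerate(entrain(onsets, period_ms=500.0)):
        print(f"tone {i:2d}: violation = {violation:+8.2f} ms, period = {period:7.2f} ms")
```

In this sketch the violations shrink as the period approaches the stimulus IOI, so expectancies become more accurate precisely because each violation triggers a reactive correction.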
Stimulus properties and temporal capture
Although expectancy figures prominently in this analysis, capture also is an essential part of the story. Theoretically, I suggest that temporal capture involves an oscillator's phase adjustment to a tone onset. It can happen in two contexts: One where anticipatory attending is weak, meaning a wide attentional focus; the other where anticipatory attending is strong, meaning a narrow focus in time. Initially, in novel situations, anticipatory attending is always weak, nonspecific and associated with a wide attentional focus. This characterizes attending as a listener monitors the first few sounds of a sequence and/or a session as well as attending to irregularly timed sounds throughout a session (e.g., Experiment 4). In either case, a wide attentional focus renders a listener open to onsets that are associated with a broad range of time intervals. Here temporal capture refers to the phase shift of attending locally to any tone onset. Thus, one interpretation of the flat expectancy profile in Experiment 4 is that people are more adept at responding to very unexpected onsets than are listeners in experiments with regular rhythms (Experiments 1-3). Alternatively, because this is especially true for unexpectedly late onsets, it raises the possibility that phase adjustments are made more readily for late than for early unexpected onsets when the attentional focus is wide (see Coull and Nobre in part I for a related hypothesis for endogenous temporal cuing). Each sound onset falling within this focal region locally commandeers a swift corrective attending shift. Anticipatory attending is strong and specific, instantiating a narrow attentional focus, in contexts where a time structure has been consistently established either rhythmically or otherwise. In such contexts, singular unexpected onsets always offer a potential for violating specific expectancies about "when" an element will occur.
The degree to which an expectancy is corrected and the attentional pulse realigned to a tone's onset depends upon the magnitude of the expectancy violation. However, given a narrow focus of attention in time, people are relatively capable of "tuning out" extremely deviant onsets. In Experiments 1-3, the regular rhythmic context appeared to induce a narrow focus in time as suggested by resulting expectancy profiles. In other words, if a tone onset is not very unexpected, given a rhythmic context, then phase adjustments can "fine tune" a temporal expectancy by correcting such violations. In this respect, there are parallels with visual capture where a distinctive singleton may derail attending, if it is not too remote in space. In temporal sequences, attentional re-orienting in time is reactive attending and involves temporal capture. To sum up, both the dynamic attending model and the data suggest that temporal capture may be contingent upon attentional set (focus width). According to the entrainment model, a phase shift in time to an abrupt onset ensues whenever that onset time departs from an expected time. However, temporal capture is more
likely to be evident with a wide than a narrow attentional focus. Clearly, this analysis assumes that the presence of a pattern-directed expectancy and the nature of an attentional focus determine the extent to which temporal capture makes an impact on performance.
Footnotes
1 For illustrative purposes I assume serial search in this example although Ward, Duncan, & Shapiro (1996) review many alternative strategies.
2 Others suggest priority tags and queues (Yantis & Johnson, 1990).
3 The ISIs ranged from 500 ms to 1,500 ms but tone durations varied over different experiments.
4 Limit cycle oscillators are dynamical systems in which in the limit a stable period of some specified value is achieved. Stability means that whenever a single perturbation to the period occurs (e.g., a different IOI than preceding ones, implying a new period), the oscillator's period will change briefly in the direction of the new IOI, but return to the attractor state of matching the oscillator period with the average IOI following the perturbation.
5 Experiments (and related data) of part III are reported in greater detail in Jones et al. (in press).
6 In post-session questionnaires, very few listeners reported even being aware of this; data from those who did were not analyzed. In addition, we eliminated from the analysis the data from any subject who produced a significantly high proportion of "Same" responses.
7 Grouping in temporal sequences, unlike in visual arrays, leads to predictions of lowered accuracy for judgments about tones within a common group. Whereas in visual attention, facilitation of elements belonging to a common object may occur, in auditory attention, interference occurs when elements are grouped (see part I).
References
Arnell, K. M., & Jolicoeur, P. (1999). The attentional blink across stimulus modalities: Evidence for central processing limitations. Journal of Experimental Psychology: Human Perception and Performance, 25, 630-648.
Barnes, R., & Jones, M. R. (2000). Expectancy, Attention, and Time. Cognitive Psychology, 41, 254-311.
Boltz, M. (1993). The generation of temporal and melodic expectancies during musical listening. Perception & Psychophysics, 53, 585-600.
Bregman, A. (1990). Auditory Scene Analysis. Cambridge: MIT Press.
Bregman, A., & Rudnicky, A. I. (1975). Auditory segregation: Stream or streams? Journal of Experimental Psychology: Human Perception and Performance, 1, 263-267.
Cave, K. R., & Wolfe, J. M. (1990). Modeling the role of parallel
processing in visual search. Cognitive Psychology, 22, 225-271.
Chun, M. M., & Potter, M. C. (1995). A two stage model for multiple target detection in rapid serial visual presentation. Journal of Experimental Psychology: Human Perception and Performance, 21, 109-127.
Coull, J. T., Frith, C. D., Buchel, C., & Nobre, A. C. (2000). Orienting attention in time: behavioral and neuroanatomical distinction between exogenous and endogenous shifts. Neuropsychologia, 38, 808-819.
Coull, J. T., & Nobre, A. C. (1998). Where and When to pay Attention: The neural systems for directing attention to spatial locations and to time intervals as revealed by both PET and fMRI. The Journal of Neuroscience, 18, 7426-7435.
Deutsch, D. (1999). The processing of pitch combinations. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 349-411). London: Academic Press.
Deutsch, D. (1972). Effect of repetition of standard and comparison tones on recognition memory for pitch. Journal of Experimental Psychology, 93, 156-162.
Divenyi, P. L., & Sachs, R. M. (1978). Discrimination of time intervals bounded by tone bursts. Perception & Psychophysics, 24, 429-436.
Dowling, W. J., Lung, K. M., & Herrbold, S. (1987). Aiming attention in pitch and time in the perception of interleaved melodies. Perception & Psychophysics, 41(6), 642-656.
Downing, C. J. (1988). Expectancy and visual-spatial attention: Effects on perceptual quality. Journal of Experimental Psychology: Human Perception and Performance, 14, 188-202.
Egeth, H., & Yantis, S. (1997). Visual Attention. Annual Review of Psychology, 48, 269-297.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Folk, C. L., Remington, R. W., & Wright, J. H. (1994). The structure of attentional control: Contingent attentional capture by apparent motion, abrupt onset and color. Journal of Experimental Psychology: Human Perception and Performance, 20, 317-329.
Garner, W., & Gottwald, R. L. (1968). The perception and learning of temporal patterns. Quarterly Journal of Experimental Psychology, 20, 97-109.
Garner, W. R. (1974). The perception and learning of temporal patterns. The Processing of Information and Structure. Potomac, MD: Erlbaum.
Gellatly, A., Cole, G., & Blurton, A. (1999). Do equiluminant object onsets capture visual attention? Journal of Experimental Psychology: Human Perception and Performance, 25, 1609-1624.
Gibson, B. S. (1996). Visual quality and attentional capture: A challenge to the special role of abrupt onsets. Journal of Experimental Psychology, 22, 1496-1504.
Gibson, B. S., & Amelio, J. (2000). Inhibition of return and attentional control settings. Perception & Psychophysics, 62, 496-504.
Green, D. M., & Swets, J. (1966). Signal detection theory and psychophysics. New York: Wiley.
Greenberg, G. Z., & Larkin, W. D. (1968). Frequency-response characteristic of auditory observers detecting signal of a single frequency in noise: The probe-signal method. Journal of the Acoustical Society of America, 44, 1513-1523.
Heise, G. A., & Miller, G. A. (1951). An experimental study of auditory patterns. American Journal of Psychology, 64, 68-77.
Howard, J. H. J., O'Toole, A. J., Parasuraman, R., & Bennett, K. B. (1984). Pattern-directed attention in uncertain-frequency detection. Perception & Psychophysics, 35, 256-264.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention and memory. Psychological Review, 83, 323-335.
Jones, M. R. (1981). Music as a stimulus for psychological motion: Part I. Some determinants of expectancies. Psychomusicology, 1(2), 34-51.
Jones, M. R. (1987). Dynamic pattern structure in music: Recent theory and research. Perception & Psychophysics, 41, 621-634.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context.
Jones, M. R., Jagacinski, R. J., Yee, W., Floyd, R. L., & Klapp, S. (1995). Tests of attentional flexibility in listening to polyrhythmic patterns. Journal of Experimental Psychology: Human Perception and Performance, 21, 293-307.
Jones, M. R., Kidd, G., & Wetzel, R. (1981). Evidence for rhythmic attention. Journal of Experimental Psychology: Human Perception and Performance, 7, 1059-1073.
Jones, M. R., & McAuley, D. J. (2000). Categorical time judgments in extended temporal contexts.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (in press). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science.
Jones, M. R., & Ralston, J. T. (1991). Some influences of accent structure on melody recognition. Memory and Cognition, 19, 8-20.
Jones, M. R., & Yee, W. (1993). Attending to auditory events: the role of temporal organization. In S. McAdams & E. Bigand (Eds.), Thinking in Sound: The Cognitive Psychology of Human Audition (pp. 69-112). Oxford: Clarendon Press.
Jonides, J., & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43, 346-354.
Kahneman, D., & Treisman, A. (1984). Changing Views of Attention and Automaticity. In R. Parasuraman & D. R. Davies (Eds.), Varieties of Attention. New York: Academic Press.
Kahneman, D., & Tversky, A. (1982). Variants of uncertainty. Cognition, 11, 143-157.
Kidd, G. R. (1993). Temporally directed attention in the detection and discrimination of auditory pattern components. Poster (2pPP19) presented at
conference of Acoustical Society of America. Toronto.
Kidd, G. R., Boltz, M., & Jones, M. R. (1984). Some effects of rhythmic context on melody recognition. American Journal of Psychology, 97, 153-173.
Kingstone, A. (1992). Combining Expectancies. The Quarterly Journal of Experimental Psychology, 44, 69-104.
Klapp, S., Hill, M. D., Ryler, J. G., Martin, A. E., Jagacinski, R. J., & Jones, M. R. (1985). On marching to two different drummers: Perceptual aspects of the difficulties. Journal of Experimental Psychology: Human Perception and Performance, 6, 814-827.
Klein, J. M., & Jones, M. R. (1996). Effects of attentional set and rhythmic complexity on attending. Perception & Psychophysics, 58, 34-46.
Kubovy, M. (1981). Concurrent pitch segregation and the theory of indispensable attributes. In M. Kubovy & J. Pomerantz (Eds.), Perceptual organization (pp. 55-98). Hillsdale, NJ: Erlbaum.
Large, E. W., & Jones, M. R. (1999). The Dynamics of Attending: How People Track Time-Varying Events. Psychological Review, 106(1), 119-159.
McAuley, D. J., & Kidd, G. R. (1998). Effect of deviations from temporal expectations on tempo discrimination of isochronous tone sequences. Journal of Experimental Psychology: Human Perception and Performance, 24, 1786-1800.
Miller, G. A., & Heise, G. A. (1950). The trill threshold. The Journal of the Acoustical Society of America, 22, 637-638.
Miniussi, C., Wilding, E. L., Coull, J. T., & Nobre, A. C. (1999). Orienting attention in time. Brain, 122, 1507-1518.
Miller, J. (1989). The control of attention by abrupt visual onsets and offsets. Perception & Psychophysics, 45, 567-572.
Mondor, T. (1999). Predictability of the cue-target relation and the timecourse of auditory inhibition of return. Perception & Psychophysics, 61, 1501-1509.
Mondor, T., & Zatorre, R. J. (1995). Shifting and focusing auditory spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 387-409.
Mondor, T., Zatorre, R. J., & Terrio, N. A. (1998). Constraints on the selection of auditory information. Journal of Experimental Psychology: Human Perception and Performance, 24, 66-79.
Mondor, T. A., & Breau, L. M. (1999). Facilitative and inhibitory effects of location and frequency cues: Evidence of a modulation in perceptual sensitivity. Perception & Psychophysics, 61, 438-444.
Mondor, T. A., & Bregman, A. S. (1994). Allocating attention to frequency regions. Perception & Psychophysics, 56, 268-276.
Mondor, T. A., & Terrio, N. A. (1998). Mechanisms of perceptual organization and auditory selective attention: The role of pattern structure. Journal of Experimental Psychology: Human Perception and Performance, 24, 1628-1641.
Nissen, M. J., & Corkin, S. (1985). Effectiveness of attentional cueing in
older and younger adults. Journal of Gerontology, 40, 185-191.
Posner, M. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160-174.
Puente, J., & Jones, M. R. (under review). Determinants of attending and expectancy in listening to auditory patterns.
Raymond, J. E., Shapiro, K., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: an attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849-860.
Remington, R. W., Johnston, J. C., & Yantis, S. (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51, 279-290.
Restle, F. (1970). Theory of serial pattern learning: Structural trees. Psychological Review, 77, 481-495.
Rothstein, A. (1973). Effect on temporal expectancy of the position of a selected foreperiod within a range. The Research Quarterly, 44, 132-139.
Scharf, B., Quigley, S., Aoki, C., Peachey, N., & Reeves, A. (1987). Focused auditory attention and frequency sensitivity. Perception & Psychophysics, 42, 215-223.
Shapiro, K., Raymond, J. E., & Arnell, K. M. (1994). Attention to visual pattern information produces the attentional blink in rapid serial visual presentation. Journal of Experimental Psychology: Human Perception and Performance, 20, 357-371.
Shulman, G. L., Remington, R. W., & McLean, J. P. (1979). Moving attention through visual space. Journal of Experimental Psychology: Human Perception and Performance, 5, 522-526.
Spence, C., & Driver, J. (1994). Covert Spatial Orienting in Audition: Exogenous and Endogenous Mechanisms. Journal of Experimental Psychology: Human Perception and Performance, 20, 555-574.
Swets, J. A., & Green, D. M. (1966). Signal Detection Theory and Psychophysics. New York: John Wiley.
Theeuwes, J. (1991). Exogenous and endogenous control of attention: The effect of visual onsets and offsets. Perception & Psychophysics, 49, 83-90.
Treisman, A., & Sato, S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human Perception and Performance, 16, 459-478.
Todd, R., Boltz, M., & Jones, M. R. (1989). The MIDILAB auditory research system. Psychomusicology, 8, 17-30.
Ward, R., Duncan, J. Z., & Shapiro, K. (1996). The slow time-course of visual attention. Cognitive Psychology, 30, 79-109.
Watson, C. S., Kelly, W. J., & Wroton, H. W. (1976). Factors in the discrimination of tonal patterns. II. Selective attention and learning under various levels of uncertainty. Journal of the Acoustical Society of America, 60, 1176-1186.
van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. University of Technology, Eindhoven.
Vos, P., & Ellerman, H. H. (1989). Precision and accuracy in the reproduction of simple tone sequences. Journal of Experimental Psychology: Human Perception and Performance, 15, 179-187.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202-238.
Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided Search: An alternative to the Feature Integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419-433.
Woods, D. L., Alho, K., & Algazi, A. (1994). Stages of auditory feature conjunction: An event related brain potential study. Journal of Experimental Psychology: Human Perception and Performance, 20, 81-94.
Woods, D. L., Alain, C., Diaz, R., Rhodes, D., & Ogawa, K. H. (2001). Location and frequency in auditory selective attention. Journal of Experimental Psychology: Human Perception and Performance, 27, 65-74.
Yantis, S. (1993). Stimulus-driven attentional capture. Current Directions in Psychological Science, 2, 156-161.
Yantis, S., & Egeth, H. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Yantis, S., & Jonides, J. (1984). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621.
Yantis, S., & Johnson, D. N. (1990). Mechanisms of attentional priority. Journal of Experimental Psychology, 16, 812-825.
Yantis, S., & Jonides, J. (1990). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.
Yantis, S., & Jonides, J. (1996). Attentional capture by abrupt onsets: new perceptual objects or visual masking? Journal of Experimental Psychology: Human Perception and Performance, 22, 1505-1513.
Acknowledgements
I am grateful to colleagues who assisted in this research and who read and commented on earlier versions of this chapter. These include Ralph Barnes, Jennifer Hoffman, Susan Holleran, Noah Mackenzie, Heather Moynihan, and Amandine Pennel. I also thank William Johnston and Charles Folk, who commented on an earlier version of this chapter. This research was sponsored in part by a grant awarded to Mari Riess Jones from the National Science Foundation (BCS-9809446). Portions of this research were also reported at an annual meeting of the Psychonomic Society (New Orleans, Louisiana), November, 2000.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved.
10
Crossmodal Attentional Capture: A Controversy Resolved?
Charles Spence
There has been a rapid growth of interest in the study of crossmodal attentional capture in recent years, as more and more researchers have started to address the question of whether or not the presentation of a spatially-nonpredictive peripheral event in one sensory modality will lead to a reflexive shift of attention in another modality (such as, for example, whether a sudden white noise burst or tap on the hand will capture visual attention). Although there has been a great deal of controversy regarding the existence of crossmodal capture between audition and vision (e.g., Spence & Driver, 1997a; Ward, 1994; Ward, McDonald, & Lin, 2000), empirical research now supports the view that crossmodal capture effects occur between all combinations of auditory, visual, and tactile stimuli, at least under certain conditions. In the present chapter, the key behavioural findings on crossmodal capture are reviewed, and an attempt is made to resolve this controversy over the existence of audiovisual capture effects.
Introduction
Our senses are constantly bombarded by information from the multitudinous distal events occurring in our everyday environment. These events are often specified by information that is available to several sensory modalities simultaneously. For example, when conversing we not only listen to the sound of a person's voice, but also watch their lip movements to hear what is being said (Driver & Spence, 1994). Although the information arriving at the various sensory epithelia is initially processed independently, converging neural pathways rapidly lead to extensive multisensory integration in a variety of neural structures, such as the superior colliculus and the inferior parietal lobe (e.g., Bushara et al., 1999; Sherrington, 1920; see Stein & Meredith, 1993, for a review). In fact, multisensory integration has been reported to occur in all species known to possess more than one sensory system (Stein, London, Wilkinson, & Price, 1996). Given this extensive multisensory convergence it would make sense for our attentional mechanisms to be coordinated across the modalities as well.¹ To date, however, the majority of empirical research has focused on the study of attentional capture within just vision (e.g., Folk, Remington, & Johnston, 1992; Yantis, 2000), or just audition (e.g., Jones, Moynihan, MacKenzie, & Hoffman, in press; McDonald & Ward, 1999;
Spence & Driver, 1994). Everyday life provides numerous examples of overt crossmodal attentional capture, for example, when we suddenly turn our heads to inspect the source of a loud bang (the auditory capture of visual attention), or else to look at a fly that has unexpectedly landed on our arm (the tactile capture of visual attention). However, most research has focused on the question of whether or not crossmodal links also exist for the case of covert attentional orienting (i.e., for orienting that takes place in the absence of eye, head, or hand movements; cf. Posner, 1978; Spence & Driver, 1994). Many researchers have made a distinction between two different types of covert attentional orienting: the exogenous (also referred to as reflexive, automatic, involuntary, or stimulus-driven) orienting sometimes found in response to salient but spatially nonpredictive peripheral events, such as a sudden sound or an unexpected tap on the hand; and the endogenous (or voluntary) orienting induced by advance knowledge regarding where a target is most likely to occur, such as a verbal instruction informing participants that targets will be more likely on one hand than the other (e.g., Klein & Shore, 2000; Spence & Driver, 1994). Numerous qualitative differences have been found between these two forms of covert orienting, and different neural substrates have been implicated (e.g., Briand, 1998; Briand & Klein, 1987; Ladavas, 1993; Rafal, 1996; Rafal, Henik, & Smith, 1991).
Although the present review will focus primarily on the nature of any crossmodal links in purely exogenous spatial orienting, it should be noted that extensive crossmodal links in endogenous spatial attention have also been reported (e.g., Driver & Spence, 1994; Eimer & Driver, 2000; Eimer & Schröger, 1998; Hillyard, Simpson, Woods, Van Voorhis, & Münte, 1984; Lloyd, Merat, McGlone, & Spence, submitted; Spence & Driver, 1996; Spence, Pavani, & Driver, 2000; Teder-Sälejärvi, Münte, Sperlich, & Hillyard, 2000).
The majority of experiments on crossmodal attentional capture (or exogenous crossmodal spatial orienting)² have utilized chronometric measures of performance (Posner, 1978), where changes in the speed and/or accuracy of performance have been taken to show that the presentation of a particular spatial cue in one modality can exogenously capture attention in another sensory modality. For example, in a typical spatial-cueing study, participants are instructed to maintain central fixation while making a speeded-detection or discrimination response to a target presented on either side of fixation. A spatially-nonpredictive peripheral cue (such as a sudden visual onset or short noise burst) is presented shortly before the target (typically at stimulus onset asynchronies [SOAs] of 0-1,000 ms) on either the same or opposite side. Numerous studies have now shown that response latencies are often faster for targets presented on the same side as the cue (sometimes referred to as ipsilateral or valid trials) than for targets appearing on the uncued side (contralateral or invalid trials). These spatial cueing effects, which typically last for several hundred milliseconds after the cue onset, have been shown to occur both when the cue and target are presented in the same modality and, more importantly for present purposes, when they are presented in different modalities as well. However, although such crossmodal cueing results have often been attributed
to a beneficial shift in covert exogenous crossmodal attention toward the cued position (i.e., to crossmodal attentional capture), there are several alternative, nonattentional, explanations for the behavioral effects reported in the majority of previous studies. In this review, I will summarize the key findings from studies of crossmodal attentional capture, and highlight the methodological confounds that compromise the interpretation of many of these previous studies. I hope to show that although there has been a great deal of controversy in this area in recent years (e.g., Spence & Driver, 1997a; Ward, 1994; Ward et al., 2000), a consensus view is now emerging in this fertile research area that crossmodal attentional capture effects can occur between all combinations of auditory, visual and tactile stimuli.
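The cueing effect described above is simple arithmetic: the mean latency on contralateral (invalid) trials minus the mean latency on ipsilateral (valid) trials, usually computed separately at each SOA. The following Python sketch illustrates that calculation. The trial layout, the function name `cueing_effects`, and all of the latencies are hypothetical and are not taken from any study reviewed here; restricting the latency analysis to correct responses is a common convention that is assumed for illustration.

```python
from collections import defaultdict
from statistics import mean


def cueing_effects(trials):
    """Return {soa_ms: mean invalid RT - mean valid RT}, correct trials only.

    Each trial is a (soa_ms, cue_side, target_side, rt_ms, correct) tuple.
    This layout is an assumption made for illustration only.
    """
    rts = defaultdict(lambda: {"valid": [], "invalid": []})
    for soa, cue_side, target_side, rt, correct in trials:
        if not correct:
            continue  # errors are analysed separately from latencies
        kind = "valid" if cue_side == target_side else "invalid"
        rts[soa][kind].append(rt)
    # A positive value means faster responses on the cued (ipsilateral) side.
    return {soa: mean(c["invalid"]) - mean(c["valid"]) for soa, c in rts.items()}


# Hypothetical trials: a cueing effect at a short SOA, none at a long SOA.
trials = [
    (100, "left", "left", 400, True),
    (100, "left", "right", 440, True),
    (100, "right", "right", 420, True),
    (100, "right", "left", 460, True),
    (100, "left", "left", 900, False),  # error trial, excluded
    (700, "left", "left", 430, True),
    (700, "right", "left", 430, True),
]
effects = cueing_effects(trials)
print(effects)
```

A positive difference of this kind is the behavioural signature that the studies below report; as the following sections argue, it does not by itself show that the effect is attentional.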
Speeded Detection Tasks
Perhaps the task most commonly used by researchers to investigate crossmodal capture effects has been the simple detection task. Many studies have shown that simple detection latencies for visual targets presented to the left or right of fixation can be facilitated by the prior (or simultaneous) presentation of a spatially nonpredictive auditory (or visual) cue from the same, rather than opposite, side in both normal participants (e.g., Klein, Brennan, D'Aloisio, D'Entremont, & Gilani, 1987, Experiment 1; Reuter-Lorenz & Rosenquist, 1996, Experiment 2; Schmitt, Postma, & de Haan, 2000, Experiments 1 & 2; Schmitt, Postma, & de Haan, in press, Experiment 1; though see Ward et al., 1998, Experiment 1, for contradictory results), and unilateral parietal patients (Farah, Wong, Monheit, & Morrow, 1989). By contrast, auditory detection latencies in normal participants have been reported to be unaffected by either auditory or visual spatial cueing (e.g., Klein et al., 1987, Experiment 6; Schmitt et al., 2000, Experiment 1; Schmitt et al., in press, Experiment 1; Spence & Driver, 1994, Experiment 8; Ward et al., 1998, Experiment 2). Some researchers (e.g., Buchtel & Butter, 1988) have taken these results to demonstrate the existence of asymmetrical crossmodal capture effects, such that auditory cues can capture visual attention, while visual cues do not capture auditory attention. However, an alternative interpretation is that auditory detection latencies may simply be relatively insensitive to the spatial distribution of attention (e.g., Klein et al., 1987; Posner, 1978; Spence & Driver, 1994).
Some of the strongest evidence in support of this claim comes from the fact that visual and auditory cues still have no effect on auditory detection latencies when they are made spatially predictive with regard to the likely location of the upcoming target (i.e., when both exogenous and endogenous attention should facilitate auditory target detection on the cued side). For example, Buchtel and Butter (1988) reported that visual cues which predicted the target side on 80% of trials had no effect on auditory detection latencies, a result which has been replicated by several other researchers (e.g., Posner, 1978; Schmitt et al., 2000, Experiment 3; Schmitt et al., in press, Experiment 3). Moreover, Spence and Driver (1994) have also reported that spatially-predictive auditory cues (75% valid with respect to the likely target
location) have no effect on auditory detection latencies, despite the fact that the same cues lead to clear attentional effects in a variety of auditory discrimination tasks (see also Hugdahl & Nordby, 1994; McDonald & Ward, 1999; Schmitt et al., 2000; though see Buchtel, Butter, & Ayvasik, 1996). Taken together, these results suggest that the most parsimonious explanation for why visual cues have no effect on auditory detection latencies is that they (auditory detection latencies) are simply insensitive to the spatial distribution of attention.³
To date, very few published studies have examined crossmodal capture between other pairs of sensory modalities using the simple detection task (Butter, Buchtel, & Santucci, 1989; Tassinari & Campara, 1996). For example, Butter et al. reported that visual and tactile detection latencies were facilitated by the prior presentation of a spatially-predictive peripheral cue in either vision or touch, hence apparently showing symmetrical crossmodal attentional capture effects between vision and touch. However, it is important to note that the use of informative and peripheral cues in Butter et al.'s studies means that both exogenous orienting (to the location of the peripheral cueing event) and endogenous orienting (to the same location, but only because the subsequent target was more likely to appear there) may have been induced by the cues. It therefore remains uncertain whether the facilitatory effects on tactile and visual detection latencies they reported should be attributed to exogenous orienting (i.e., to crossmodal attentional capture), to endogenous orienting, or to some unknown combination of these two effects (see Mondor & Amirault, 1998; Spence & Driver, 1996; Ward, 1994, on this point). In fact, because the cues were spatially informative, they may have produced a strategic shift in attention to the likely side in just the expected target modality (i.e., a purely unimodal shift of attention).
For example, in the case of a visual target presented after an informative tactile cue, there may have been only an endogenous shift in just visual attention to the likely side of the anticipated visual target, exactly as would have taken place following, say, the interpretation of a central arrow (see Driver & Spence, 1994; Johnen, Wagner, & Gaese, 2001; Pashler, 1998, pp. 91-92). It should be noted that this uncertainty regarding the most appropriate interpretation of Butter et al.'s results also applies to many other studies where researchers have attempted to investigate crossmodal attentional capture by using predictive peripheral cues (e.g., Buchtel & Butter, 1988; Mondor & Amirault, 1998, Experiments 2 & 3; Schmitt et al., 2000, Experiments 3, 5, & 6; Schmitt et al., in press, Experiments 3 & 4). Finally, it is also important to note that the facilitatory effects reported in all of these simple detection studies may reflect a shift in the participant's criterion for responding rather than a genuine perceptual effect. In fact, it has been argued for many years within the visual attention literature (e.g., Duncan, 1980; Müller & Findlay, 1987; Sperling & Dosher, 1986) that spatial cueing effects on simple RT may reflect criterion shifts, rather than genuine attentional effects. That is, participants may simply reduce the amount of evidence necessary for deciding that a target has occurred on the cued side, and also possibly increase their criterion for responding on the uncued side, thus resulting in differences in simple detection
latencies for targets presented ipsilateral versus contralateral to the cue, without the need to invoke attention. Therefore the most parsimonious conclusion here is probably that the cueing effects reported in previous crossmodal attentional capture studies that have used a simple detection response measure reflect a combination of attentional and/or criterion-shifting effects.
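The distinction between a criterion shift and a genuine perceptual effect can be made concrete with the equal-variance Gaussian model from signal detection theory. The sketch below is a textbook illustration only, not the analysis used in the studies reviewed here (which measured simple reaction times); the function names and the parameter values are hypothetical. It shows an observer whose sensitivity (d') is identical on the cued and uncued sides, yet who responds "target present" more often on the cued side purely because the criterion there is laxer.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def rates_from(d_prime, criterion):
    """Hit and false-alarm rates under the equal-variance Gaussian model."""
    hit = _STD_NORMAL.cdf(d_prime / 2 - criterion)
    false_alarm = _STD_NORMAL.cdf(-d_prime / 2 - criterion)
    return hit, false_alarm


def sdt_measures(hit, false_alarm):
    """Recover sensitivity (d') and criterion (c) from the two rates."""
    z_hit = _STD_NORMAL.inv_cdf(hit)
    z_fa = _STD_NORMAL.inv_cdf(false_alarm)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


# Hypothetical observer: equal sensitivity on both sides, but a laxer
# (more negative) criterion on the cued side than on the uncued side.
cued = rates_from(d_prime=1.5, criterion=-0.3)
uncued = rates_from(d_prime=1.5, criterion=0.3)

d_cued, c_cued = sdt_measures(*cued)
d_uncued, c_uncued = sdt_measures(*uncued)
print(f"cued:   hits={cued[0]:.3f}, false alarms={cued[1]:.3f}")
print(f"uncued: hits={uncued[0]:.3f}, false alarms={uncued[1]:.3f}")
```

In this sketch the cued side yields more hits and more false alarms while d' is unchanged, which is the pattern a pure criterion shift would predict. A simple detection latency cannot separate the two quantities, which is one reason for preferring tasks in which accuracy is measured alongside speed.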
Speeded Discrimination Tasks
Given the problems inherent in the use of simple detection latencies to assess the spatial distribution of attention, many researchers have opted to use speeded discrimination tasks instead, so that both speed and accuracy can be measured. (The adoption of a more risky criterion should result in faster but less accurate performance.) These tasks can be grouped into four broad categories depending on the particular discrimination involved: non-spatial discrimination tasks, left/right discrimination tasks, orthogonal-cueing tasks, and implicit spatial discrimination tasks. Markedly different conclusions regarding the existence of crossmodal attentional capture have been developed on the basis of studies using each of these methodologies, as highlighted below.
Non-spatial discrimination tasks
Klein et al. (1987; Experiment 5) reported an experiment in which participants made speeded duration discrimination responses (short vs. long) to visual targets presented to either side of fixation. Spatially nonpredictive auditory cues were presented 250 ms before the target from either exactly the same position as the target, or from the mirror-symmetrical location on the other side. Target discrimination latencies were significantly faster for visual targets presented on the cued side (mean RT of 612 ms) than for targets presented on the uncued side (mean RT of 651 ms). However, participants also tended to make more erroneous responses on cued trials than on uncued trials (means of 20% vs. 18% errors respectively), making it uncertain whether Klein et al.'s RT facilitation effects reflect a genuine perceptual attentional benefit for targets on the cued side, a simple criterion-shifting effect, or some unknown combination of the two effects.
Mondor and Amirault (1998, Experiment 1) failed to demonstrate crossmodal attentional capture in their study when participants were required to make speeded discrimination responses to an unpredictable sequence of auditory and visual targets presented from the left or right of fixation. Participants had to judge the colour (red vs. green) of visual targets and the direction of frequency change (upward vs. downward frequency glide) of auditory targets. Every target was preceded unpredictably by an auditory or visual cue from the same or opposite side of fixation (at SOAs of 150 or 300 ms). Mondor and Amirault reported that while auditory cues reliably captured auditory spatial attention, and visual cues reliably captured visual attention, there was no evidence of crossmodal attentional capture (either from auditory cues to visual targets or vice versa; though see
Widmann & Schröger, 1999), unless the cues were made spatially predictive with regard to the likely target location (i.e., 75% valid, 25% invalid; Experiments 2 & 3) making this finding ambiguous for present purposes. Similarly, Ward et al. (1998, Experiments 3 & 4) also reported no evidence of crossmodal attention capture in their studies when participants made non-spatial discrimination responses ('x' vs. '+' discrimination for visual targets, and 3,000 Hz vs. 4,000 Hz pure tone frequency discrimination in audition; though see also Ward et al., Experiment 9).
Mondor and Amirault's (1998) failure to demonstrate any crossmodal cueing effects following the presentation of spatially nonpredictive cues may have been due to the relative positioning of their auditory and visual stimuli. The visual stimuli were presented from a computer monitor 14 degrees to either side of fixation, while auditory stimuli were presented from loudspeakers centered 17 degrees to either side. Similarly, Ward et al. (1998) also presented visual targets from an eccentricity of 12 degrees and auditory targets from an eccentricity of 24 degrees to either side of fixation. (Note that Klein et al., 1987, conducted one of the few early crossmodal attentional cueing studies to present cue and target stimuli from exactly the same spatial location on ipsilateral trials). Research has shown that introducing even small lateral discrepancies (of as little as 3 degrees) between the locations of auditory and visual stimuli can lead to a dramatic reduction, or even elimination, of crossmodal attentional effects (see Eimer & Schröger, 1998, for a particularly convincing demonstration of this).
Dufour (1999; Experiment 2) also reported that auditory cues do not facilitate speeded orientation discrimination responses for visual targets (line segments oriented at ±45 degrees presented 40 ms later) presented amongst visual line segment distractors, even when presented from the same lateral eccentricity as the visual target on ipsilateral trials. Interestingly, Dufour reported that auditory cues did improve visual performance on an unspeeded conjunction discrimination task. On each trial of this experiment, a target letter 'T', flanked by 4 'T' distractors in different orientations was presented randomly to either the left or right of fixation, and participants had to make an unspeeded discrimination response regarding the orientation of the target. Participants performed significantly better (by approximately 8%; overall performance was in the range of 55-65% correct) on this conjunction discrimination task when the auditory cue was presented ipsilateral to the target. These results suggest that the crossmodal capture of visual attention by auditory cues may only occur when the task requires a particularly attention-demanding discrimination response (such as when searching for a conjunction target; Treisman & Gelade, 1980).⁴ Unfortunately, however, an overt orienting explanation of Dufour's cueing effects cannot be ruled out, since the eye position of the participants was not monitored. The results of these studies of crossmodal attentional capture using non-spatial discrimination tasks do not, therefore, provide any unequivocal evidence to support the existence of audiovisual crossmodal attentional capture effects.
More promising results have been reported by Spence, Nicholls, Gillespie, and Driver (1998) in experiments where participants made speeded continuous vs.
pulsed discrimination responses to tactile targets presented unpredictably to the index finger of either hand. Every target was preceded by a spatially uninformative auditory or visual cue (at an SOA of 150, 200, or 300 ms) on either the same or opposite side. Tactile discrimination response latencies were significantly faster, and also more accurate, when the cue was presented on the same side as the target, revealing the crossmodal capture of tactile attention by both auditory and visual cues. An overt crossmodal capture account of these results was ruled out by monitoring the eye position of participants, and eliminating all trials on which an eye movement was detected. Spence et al.'s (1998) results therefore provide the strongest evidence to date for crossmodal attentional capture (the auditory capture of touch, and the visual capture of touch) using non-spatial discrimination tasks. Only further research will reveal whether unambiguous crossmodal capture effects can also be demonstrated between other pairs of sensory modalities using nonspatial discrimination tasks.
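The interpretive rule applied throughout this section is that a latency benefit on the cued side counts as unambiguous only if it is not accompanied by an accuracy cost. The following Python sketch states that rule explicitly and applies it to the means reported above for Klein et al. (1987, Experiment 5). The function name and the category labels are hypothetical, and the rule is purely descriptive: it compares means and says nothing about statistical reliability. The second call uses made-up values.

```python
def describe_cueing_result(rt_cued, rt_uncued, err_cued, err_uncued):
    """Classify a cueing result from mean RTs (ms) and error rates (%)."""
    rt_benefit = rt_uncued - rt_cued    # positive: faster on the cued side
    error_cost = err_cued - err_uncued  # positive: more errors on the cued side
    if rt_benefit <= 0:
        return "no RT benefit on the cued side"
    if error_cost > 0:
        return "RT benefit with more errors: possible criterion shift"
    return "RT benefit not compromised by accuracy"


# Means reported in the text for Klein et al. (1987, Experiment 5).
klein = describe_cueing_result(rt_cued=612, rt_uncued=651,
                               err_cued=20, err_uncued=18)

# Hypothetical result that is both faster and more accurate on the cued side.
clean = describe_cueing_result(rt_cued=480, rt_uncued=500,
                               err_cued=5, err_uncued=7)
print(klein)
print(clean)
```

On this rule the Klein et al. means fall into the ambiguous category, whereas a result that is both faster and more accurate on the cued side, as reported by Spence et al. (1998), does not.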
Left/right discrimination task
Simon and Craft (1970) reported that participants made speeded left/right spatial discrimination responses to visual targets more rapidly when they were accompanied by a spatially-uninformative auditory cue (presented over headphones) on the same rather than the opposite side. Similar results have also been reported by Bernstein and Edelstein (1971) for sounds presented monaurally up to 45 ms after the lateralized visual target. By contrast, Ward (1994; Experiment 1; see also Ward et al., 1998, Experiments 5 & 6) reported no effect of free-field auditory cues on visual left/right discrimination RTs, despite the fact that visual cues facilitated visual discrimination responses. However, in a second experiment, Ward (1994) found that auditory spatial discrimination responses were facilitated by both visual and auditory cues, with the largest facilitation effects when auditory and visual cues were presented simultaneously from the same side. Ward (1994) took the asymmetrical results from his left/right discrimination experiments to demonstrate the existence of asymmetrical crossmodal attentional capture effects, such that visual cues capture auditory attention, but auditory cues do not capture visual attention. However, Spence and Driver (1997a) have argued that Ward's results, together with those reported in other left/right discrimination studies (e.g., Bernstein & Edelstein, 1971; Simon & Craft, 1970) may partially reflect the facilitatory effects of response priming (or spatial compatibility) instead. The lateralized cues could have biased participants toward making a response on the side of the cue, which would in turn be expected to speed responses to targets appearing on that side (i.e., ipsilateral to the cue), and hence facilitate ipsilateral target performance, given that participants responded with their left hand to left targets and with their right hand to right targets.
Moreover, Spence and Driver (1997a) argued that Ward's (1994) failure to demonstrate any facilitatory effect of auditory cues on visual left/right discrimination latencies could also be explained in terms of relative stimulus-response compatibility effects, since
auditory and visual stimuli were presented from different lateral eccentricities (e.g., auditory cues were presented from 24 degrees from fixation, whereas visual targets were presented from only 12 degrees from fixation; though see Ward et al., 2000). However, Spence and Driver's account of Ward's null results now seems less tenable given that the same null effect of auditory cues on visual left/right discrimination responses has subsequently been replicated by Ward et al. (1998, Experiment 7) when auditory and visual stimuli were presented from the same
lateral eccentricity. Interestingly, however, Schmitt et al. (2000; Experiment 2) reported robust facilitatory effects between all four possible combinations of auditory and visual cue and target stimuli in their study (in which each cue-target combination was presented in a separate block of trials), when both stimuli were presented from the same eccentricity on ipsilateral trials. The most obvious methodological difference between Schmitt et al.'s study and the experiments reported by Ward and colleagues (Ward, 1994; Ward et al., 1998), is that Schmitt et al. presented auditory cues in a simple blocked cueing environment (i.e., where only auditory cues were presented within a particular block of trials), whereas Ward and colleagues presented auditory cues in a more complex cueing environment (where they were unpredictably mixed with visual and multimodal cues; see Ward et al., 1998, on this point). It now seems that left/right discrimination responses to visual targets are facilitated by the ipsilateral presentation of an auditory cue only when they are presented in a simple cueing environment (e.g., Bernstein & Edelstein, 1971; Schmitt et al., 2000; Simon & Craft, 1970), but not when they are presented in more complex cueing environments (e.g., Ward, 1994; Ward et al., 1998, Experiments 5-7; a point to which we return later). Nevertheless, it is important to note that no firm conclusions regarding the magnitude of crossmodal attentional capture effects can be drawn from the results of experiments utilizing the left/right discrimination task, given the possibility of response bias confounds.
Orthogonal-cueing paradigm
Spence and Driver (1994, 1997a; Driver & Spence, 1998) developed the
orthogonal spatial-cueing paradigm to investigate attentional capture in a spatial task that was free from response bias (thus extending the non-spatial orthogonal crossmodal cueing paradigm first developed by Klein et al., 1987, Experiment 5). Participants in Spence and Driver's studies made speeded discrimination responses regarding the elevation (up vs. down) of a series of targets presented from above or below fixation on either the left or right (see Figure 1). Every target was preceded by a spatially-nonpredictive cue in either the same or different sensory modality. Participants were instructed to ignore the cue as much as possible, and to make a speeded discrimination response (using either response buttons or foot pedals) to the elevation of the target, regardless of the side on which it appeared, and also regardless of its modality.
Figure 1. Schematic view of the position of cue and target loudspeakers (shown by ellipses), the target lights (black circles), the central fixation light, and the participant in Spence and Driver's (1997a) studies of audiovisual links in covert spatial attention. (Labels in the figure: Target Loudspeaker and Light; Cue Loudspeaker.)
Over a number of studies, it has been shown that spatially nonpredictive auditory cues on one side lead to better elevation judgments (on average, around 20-30 ms faster, and somewhat more accurate) for auditory, visual and tactile events presented in the vicinity of the sound shortly after its onset (at SOAs of 100-300 ms; Driver & Spence, 1998; Schmitt et al., 2000, Experiment 5; Spence & Driver, 1994, 1997; Vroomen et al., in press, Experiment 1; though see Ward et al., 1998, Experiment 8). These results show that salient auditory events can lead to a rapid capture of covert visual and tactile spatial attention, though spatially-coincident bimodal audiovisual cues have been shown to be no more effective at capturing auditory attention than unimodal auditory cues (e.g., Vroomen, Bertelson, & de Gelder, in press, Experiment 1; Spence & Driver, 1999). Similarly, spatially nonpredictive tactile events on one hand lead to better auditory, visual, and tactile judgments on that side (e.g., Chong & Mattingley, 2000; Kennett, Eimer, Spence, & Driver, 2001; Kennett, Spence, & Driver, submitted; Spence & McGlone, in press; Spence et al., 1998). This shows that tactile events also elicit crossmodal attentional capture. Finally, nonpredictive visual flashes have been shown to lead to better visual and tactile judgments in their vicinity (Chong & Mattingley, 2000; Kennett et al., 2001, submitted). Importantly, these crossmodal capture effects occur even when the cues are presented in a modality which is completely irrelevant to the participant's task (i.e., if the cues are always auditory, while the targets are always visual). Spence and Driver (1997a) have, however, repeatedly found that visual cues do not affect auditory judgments (at least when eye movements are prevented). This null result has been shown to hold up across numerous variations in the physical properties of the particular visual and auditory stimuli used, and has now been replicated by several different researchers (e.g., Rorden & Driver, 1999,
Experiment 4; Schmitt et al., 2000, Experiment 5; Spence & Driver, 2000; Vroomen et al., in press, Experiments 1-3).⁵ In fact, the only situation where spatially nonpredictive visual cues have been shown to lead to the crossmodal capture of auditory attention in the orthogonal-cueing task is when they are paired with a centrally-presented auditory cue (Spence & Driver, 2000; Vroomen et al., in press). For example, Vroomen et al. reported a series of experiments in which they showed that a laterally-presented visual cue only captured auditory attention when it was presented at the same time as an auditory cue from a loudspeaker at fixation (but not when it was presented in isolation), presumably due to the well-known ventriloquism effect. (Note that the auditory cue, by itself, could not have induced a lateralized orienting effect since it was presented centrally.) Similar results, albeit with a somewhat different time-course, have been reported by Spence and Driver (2000) in their study of attentional capture in the vertical dimension. They showed that a visual cue presented either above or below fixation led to a vertical shift of auditory attention only when it was paired with a hard-to-localize auditory pure tone cue centered at fixation, but not when it was paired with a highly-localizable white noise cue (see Figure 2). These results demonstrate that attentional capture can be directed toward the apparent location of a ventriloquized sound, suggesting that multisensory integration precedes (or else co-occurs with) reflexive shifts of covert attention (see Driver, 1996, for a similar conclusion for the case of endogenous attention).
Figure 2A. Schematic front-on view of the position of the cue and target loudspeakers, cue lights (grids of LEDs), and fixation light as seen by participants in Spence and Driver's (2000) study of attentional capture by ventriloquized sounds. On each trial, either the upper or lower grid of lights was illuminated, serving as the visual cue. This was paired either with a hard-to-localize pure tone cue from all five loudspeaker cones, or else with an easy-to-localize white noise burst from just the loudspeaker situated behind the fixation light. The target consisted of pulsed white noise presented from one of the four corner loudspeakers. In order to maintain the orthogonality of the design, participants were required to make a left-right discrimination response, given that cueing occurred in the vertical direction. (Labels in the figure: Cue Lights; Fixation Light; Loudspeaker Cone.)
Figure 2B. Mean cueing effects in reaction time (contralateral minus ipsilateral trials) as a function of the cue sound and the cue-target stimulus onset asynchrony (SOA). Pure tone cue trials are represented by the striped bars, and localizable white noise cue trials by the dotted bars. The * indicates a significant attentional capture effect, which was not compromised by the accuracy data. (Horizontal axis: Stimulus Onset Asynchrony [SOA].)
Researchers have also used the orthogonal-cueing paradigm to investigate whether crossmodal attentional capture effects lead to the facilitation of responses to all stimuli presented on the cued side, or to a more spatially-specific cueing effect. For example, participants in a study by Driver and Spence (1998) were presented with spatially nonpredictive auditory cues from one of four possible loudspeakers, two on either side of fixation (situated at eccentricities of 13 and 39 degrees; see Figure 3A). Lights were placed directly above and below each of these loudspeakers, and visual targets consisted of the brief offset of one of these eight lights. Participants were required to make a speeded discrimination response regarding the elevation of the visual offset. The maximal facilitation of response latencies (and the most accurate responses) occurred when cue and target were presented from the same lateral position, with cueing effects dropping off as the lateral eccentricity between the cue and target increased, irrespective of whether the cue and target stimuli were presented in the same or different hemispaces. These results show that the peripheral presentation of a spatially-nonpredictive auditory cue leads to spatially-specific crossmodal capture of visual attention (see Schmitt et al., in press, Experiment 2, for similar results using a 4-button localization paradigm) to a particular location within a hemifield. Similar results have been reported in unimodal studies of both visual and auditory attention (see Rorden & Driver, 2001). To date, virtually all studies of crossmodal attentional capture have been performed with the eyes and head in alignment (i.e., eyes looking straight-ahead
with respect to the head). However, gaze is frequently deviated in daily life, which realigns visual receptors relative to auditory receptors. This raises the important issue of whether audiovisual links in spatial attention are controlled by a fixed ear-retina mapping, or whether instead the relationship between the modalities gets spatially remapped whenever gaze is deviated. Recent studies using the orthogonal-cueing paradigm suggest that spatial alignment is maintained even when the eyes are deviated with respect to the head, or when the hands are crossed over the midline (see Figure 3B; Driver & Spence, 1998). Crossmodal links are thus maintained under receptor misalignment, so that our attention may be focused on the same external location across the modalities regardless of the posture adopted (see also Figures 3C and 3D).
Figure 3A. Schematic view of Driver and Spence's (1998) study in which participants made speeded elevation discriminations for target lights, regardless of where the immediately preceding sound cue had been. In a typical crossmodal attention study, participants fixate directly ahead, and visual discriminations are best at the same eccentricity and side as the immediately preceding auditory cue. In Figure 3B, where participants fixated eccentrically (note that all visual events have been laterally translated along with gaze), visual discriminations were again best for lights at the same external location as the immediately preceding sound, but these now occupied different retinal locations as compared with Figure 3A. This result demonstrates remapping between auditory locations in the control of exogenous crossmodal attention, to keep vision and audition in register as regards external space despite deviations in gaze. A similar remapping of visuotactile space has also been demonstrated when participants adopt a crossed hands posture. In particular, a tactile cue presented to the left hand will facilitate elevation discrimination responses to visual targets on the left when the hands are uncrossed (Figure 3C), but will facilitate responses to lights on the right when the hands are crossed (Figure 3D).
Implicit spatial discrimination task
McDonald and Ward (1999, 2000, submitted) have developed another spatial task, called the implicit spatial discrimination task, to investigate intramodal and crossmodal attentional capture effects. In one combined behavioral and electrophysiological study, McDonald and Ward (2000) presented spatially nonpredictive auditory cues from either the left or right of fixation, followed a short time later by a visual stimulus on either the left, the right, or else at fixation. Participants were required to make a speeded detection response to visual onsets presented on either side (go signals, 80% of trials), but to refrain from responding on trials where the visual stimulus was presented at fixation (the no-go signals, 20% of trials). Participants responded significantly faster on ipsilaterally-cued trials than on contralaterally-cued trials at short SOAs (100-300 ms). McDonald and Ward (2000) also recorded event-related brain potentials (ERPs) while participants performed the implicit spatial discrimination task to examine the neural basis of their capture effect. They found that the presentation of the spatially nonpredictive auditory cue modulated ERPs to visual targets over modality-specific, extrastriate visual cortex, as reported previously in unimodal exogenous studies of visual attention (e.g., Hopfinger & Mangun, 1998). Interestingly, this crossmodal effect took place only after the initial sensory processing of the visual target had been completed (i.e., spatial cueing had no effect on the N1 and P1 components at posterior brain sites, the early negative and positive peaks related to early sensory-perceptual processing).
6 These results show that crossmodal attentional capture effects can influence the sensory processing of visual target stimuli as early as the extrastriate visual cortex, presumably via reentrant input to visual cortex from higher multisensory areas (see Driver & Spence, 2000; Kennett et al., 2001; Miyauchi et al., 1993, for similar results regarding the tactile capture of visual attention). In a number of other studies using the implicit spatial discrimination task, McDonald and Ward (1999, 2000, submitted; Ward et al., 2000) have also shown that auditory cues capture auditory attention, and that visual cues capture both visual and auditory attention. Unfortunately, however, it has proved difficult to rule out a criterion-shifting account of the behavioral effects reported in these experiments, given the use of a speeded detection response (see Ward et al., p. 1263; McDonald et al., 2001, p. 144; though note that it seems unlikely that criterion shifting could explain the electrophysiological effects reported by McDonald and Ward in modality-specific visual cortex). Using a modified version of the task, McDonald et al. (2001) have recently shown that visual cues can still facilitate responses to ipsilaterally-presented auditory stimuli even when speed-accuracy trade-offs have been ruled out. Participants in McDonald et al.'s study were presented with a spatially nonpredictive visual cue to either side of fixation, which was followed by a high or low tone (1,175 Hz vs. 1,109 Hz respectively) presented from the left, the right, or else from the center. Participants were required to make a speeded frequency discrimination response to tones presented from either of the peripheral locations, but to refrain
from responding to tones presented from fixation. Participants responded significantly faster on ipsilateral trials (mean RT of 642 ms) than on contralateral trials (mean RT of 696 ms). Importantly, by using a two-alternative discrimination response, rather than the simple detection response used in their previous go/no-go studies, McDonald et al. were able to show that participants made no more errors on the ipsilateral trials than on the contralateral trials (3.9% errors vs. 4.0% errors respectively), hence ruling out a criterion-shifting account of their findings. Analysis of ERP data from their study also revealed that visual cues led to a significant modulation of the neural processing of auditory targets at both early and late stages of auditory processing. The early negative peak (occurring 120-140 ms after the presentation of the target), thought to be related to the initial sensory-perceptual selection, was significantly larger for sounds presented ipsilateral to the visual cue than for sounds presented contralateral to the cue. Importantly, McDonald and colleagues (McDonald, Teder-Sälejärvi, & Hillyard, 2000; McDonald, Teder-Sälejärvi, Di Russo, & Hillyard, 2000; see also Widmann & Schröger, 1999) have also recently demonstrated, using both psychophysical (signal detection) and electrophysiological measures, that the auditory capture of visual attention still occurs when participants perform a task in which criterion shifting can be ruled out.
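The logic of this argument can be made concrete with a small sketch. The Python below is illustrative only and is not taken from McDonald et al.: the function names are hypothetical, and the only empirical values used are the mean RTs and error rates quoted above. It computes the cueing effect as contralateral minus ipsilateral RT, checks whether the faster condition is also the more error-prone one (the signature of a speed-accuracy trade-off), and includes the standard equal-variance signal detection formulas for sensitivity (d') and criterion (c), which are the kind of measures that can separate a genuine perceptual benefit from a criterion shift.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the z-transform used in signal detection).
# Note: it is only defined for proportions strictly between 0 and 1.
_z = NormalDist().inv_cdf


def cueing_effect_ms(rt_contralateral_ms, rt_ipsilateral_ms):
    """Cueing effect in ms; positive values mean faster responses on the cued side."""
    return rt_contralateral_ms - rt_ipsilateral_ms


def suggests_tradeoff(rt_ipsi_ms, rt_contra_ms, err_ipsi_pct, err_contra_pct):
    """True if the RT benefit on cued trials is paid for with more errors."""
    rt_benefit = rt_contra_ms - rt_ipsi_ms
    error_cost = err_ipsi_pct - err_contra_pct
    return rt_benefit > 0 and error_cost > 0


def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance sensitivity index: d' = z(H) - z(F)."""
    return _z(hit_rate) - _z(false_alarm_rate)


def criterion(hit_rate, false_alarm_rate):
    """Response criterion: c = -(z(H) + z(F)) / 2."""
    return -0.5 * (_z(hit_rate) + _z(false_alarm_rate))


# Values quoted in the text for the visual-cue, auditory-target study.
effect = cueing_effect_ms(rt_contralateral_ms=696, rt_ipsilateral_ms=642)
tradeoff = suggests_tradeoff(642, 696, err_ipsi_pct=3.9, err_contra_pct=4.0)
print(f"cueing effect: {effect} ms; speed-accuracy trade-off suspected: {tradeoff}")
```

On an attentional account, cueing should change d' while leaving c roughly constant, whereas a pure criterion shift would change c without any change in d'.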
Crossmodal Attentional Capture
It should be clear from the preceding review that there has been a great deal of research on the topic of crossmodal attentional capture over the last few years. Using a variety of experimental paradigms, researchers have demonstrated different patterns of crossmodal capture, and consequently developed a variety of different, and often conflicting, theories to account for their data. Many of the inconsistencies in this area can, however, be attributed to non-attentional explanations based on one or more of the following: criterion shifting, response priming, overt orienting, and the use of insensitive response measures. With the development of more advanced experimental paradigms, such as the orthogonal spatial-cueing paradigm and the implicit spatial discrimination task, researchers are now able to show robust and reliable crossmodal attentional capture between most combinations of successive auditory, visual, and tactile stimuli. The most contentious remaining issue is whether auditory cues capture visual attention, and conversely, whether visual cues capture auditory attention (e.g., Spence & Driver, 1997a; Ward, 1994; Ward et al., 1998, 2000). However, research using the orthogonal-cueing paradigm (Driver & Spence, 1998; Spence & Driver, 1997, 1998; Schmitt et al., 2000; Vroomen et al., in press), the implicit spatial discrimination paradigm (McDonald & Ward, 2000, submitted), and other psychophysical and electrophysiological measures (e.g., McDonald et al., 2000a, b; Spence & Lupiáñez, 1998), now shows that auditory cues can capture visual attention. Similarly, recent work from McDonald et al. (2001; McDonald & Ward,
2000; see also Spence & Lupiáñez, 1998; Widmann & Schröger, 1999) also provides irrefutable evidence that visual cues can capture auditory attention under certain conditions. For example, Spence and Lupiáñez reported a temporal order judgment study in which two pure tones (2,000 Hz and 500 Hz respectively) were presented, one to either side of fixation, at SOAs between 15 and 250 ms. Participants made unspeeded discrimination responses regarding which tone (either high or low) had been presented first (or second). On each trial, a spatially nonpredictive peripheral visual cue was presented from an LED placed directly in front of one of the loudspeakers used to present the tones. As in other crossmodal capture studies, participants were instructed to ignore the visual cue as much as possible, while still keeping their eyes open. Participants judged the tone presented on the visually cued side to have occurred 44 ms before the tone on the uncued side, demonstrating psychophysically that the visual capture of auditory attention also speeds up the 'time of arrival' of stimuli on the cued side (the phenomenon of crossmodal spatial prior entry; e.g., Shore, Spence, & Klein, 2001; Spence, Shore, & Klein, in press). Taken together, the empirical evidence therefore supports the conclusion that crossmodal attentional capture can occur between all possible combinations of auditory, visual, and tactile stimuli (see Figure 4). The question that must now be answered is why researchers have sometimes failed to demonstrate such crossmodal capture effects.
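A prior-entry shift of this kind is usually summarized as the point of subjective simultaneity (PSS), the SOA at which the two stimuli are equally likely to be judged first. The Python sketch below shows one simple way such a value could be read off a psychometric function, by linear interpolation at the 50% point. It is an illustration under stated assumptions, not the analysis used by Spence and Lupiáñez: the function name, the SOA values between 15 and 250 ms, the spread of the simulated observer, and the sign convention (positive SOA means the cued-side tone physically leads) are all hypothetical, and the response proportions are synthetic, generated from a cumulative Gaussian with a built-in 44 ms shift.

```python
from statistics import NormalDist


def pss_by_interpolation(soas_ms, p_cued_first):
    """Return the SOA at which 'cued side first' responses cross 50%.

    Uses linear interpolation between the two neighbouring SOAs that
    bracket the 50% point.
    """
    points = sorted(zip(soas_ms, p_cued_first))
    for (s0, p0), (s1, p1) in zip(points, points[1:]):
        if p0 <= 0.5 <= p1 and p1 > p0:
            return s0 + (0.5 - p0) * (s1 - s0) / (p1 - p0)
    raise ValueError("responses never cross 50% within the tested SOAs")


# Synthetic observer (NOT real data): a cumulative Gaussian shifted so that
# the uncued tone must lead by 44 ms to be judged simultaneous with the cued tone.
TRUE_SHIFT_MS = 44.0   # size of the simulated prior-entry effect
SPREAD_MS = 60.0       # hypothetical temporal resolution of the observer
observer = NormalDist(mu=-TRUE_SHIFT_MS, sigma=SPREAD_MS)

# Hypothetical SOAs; negative values mean the uncued-side tone leads.
soas = [-250, -90, -45, -15, 15, 45, 90, 250]
p_cued_first = [observer.cdf(soa) for soa in soas]

pss = pss_by_interpolation(soas, p_cued_first)
print(f"estimated PSS: {pss:.1f} ms (negative: uncued tone must lead)")
```

Under this sign convention, a negative PSS indicates that the tone on the cued side is perceived earlier than it would otherwise be. In practice a fitted psychometric function would normally be preferred to interpolation, which is used here only to keep the sketch short.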
Figure 4. Schematic view of the crossmodal attentional capture effects demonstrated to date. Note that auditory, visual, and tactile cues have now been shown to facilitate responses to ipsilaterally-presented targets in all three modalities.
Many of the apparent failures to demonstrate crossmodal cueing effects for auditory target stimuli can be accounted for simply in terms of the use of response measures, such as auditory simple detection latencies, which are insensitive to the spatial distribution of attention (e.g., Buchtel & Butter, 1988; Klein et al., 1987; Spence & Driver, 1994). Many other null results may be attributed to the fact that
auditory stimuli have often been presented from different (in particular, more eccentric) locations than the visual stimuli (e.g., Mondor & Amirault, 1998; Ward, 1994). Moreover, closer inspection of some of the apparent failures to demonstrate the auditory capture of visual attention reveals numerical trends toward a cueing effect, suggesting that some of these null results may reflect a lack of statistical power, rather than the absence of crossmodal cueing per se. For example, there was an 11 ms non-significant trend in Ward et al. (2000; which was only 1 ms away from significance), a 7 ms trend toward the auditory capture of visual attention in Ward (1994), a 10 ms trend in Mondor and Amirault's (1998, Experiment 1) study which was associated with a similar trend in the error data, and a 20 ms trend in Dufour's (1999) study (see also Spence & Driver, 1997a, p. 17). Finally, it now seems likely that the null effect of visual cues on auditory elevation discrimination responses in the orthogonal-cueing task may have been caused by the fact that auditory targets were presented from different elevations than the visual cues, even on ipsilateral trials (McDonald & Ward, submitted; Ward et al., 2000). A review of previous studies reveals that, in the majority of cases, the auditory targets were presented at least 14 degrees away from the visual cue (e.g., Schmitt et al., 2000; Spence & Driver, 1997; Vroomen et al., in press). 7 The spatial distribution of attention following crossmodal capture may depend more on the attributes of the cue modality than on the modality of the target (cf. Spence & Driver, 1998, p. 134). In particular, as suggested by Ward et al. (2000, p. 1264), it is possible that visual cues may lead to a more spatially-localized shift of attention than either auditory or tactile cues.
It is already well known that the spatial acuity of the visual system is far better than that seen in response to tactile or auditory stimuli (e.g., Fisher, 1962; Simpson, 1972; Warren, 1970), and consequently it is possible that visual cues may also elicit a more spatially-localized crossmodal capture of auditory and tactile attention than either auditory or tactile cues. Empirical support for this claim comes from a recent study reported by Chong and Mattingley (2000) in which they investigated crossmodal attentional capture between vision and touch using the orthogonal-cueing paradigm. They found that the presentation of a tactile cue led to the facilitation of responses to visual targets irrespective of their distance from the tactile cues. By contrast, the presentation of a visual cue was shown to facilitate tactile elevation discrimination responses more when the visual cues were close to the hand than when they were further away. If, as Chong and Mattingley's results suggest, visual cues lead to a more spatially-focused attentional capture effect than tactile, and perhaps auditory, cues (see Figure 5), this would provide one possible explanation for why visual cues have only been shown to capture auditory attention when auditory targets are presented from close to the cue, as in the implicit spatial discrimination task (e.g., McDonald et al., 2001; Spence & Lupiáñez, 1998; Widmann & Schröger, 1999), or when targets are presented from similar elevations to the cues, as in the orthogonal-cueing task when visual or tactile targets are used (e.g., Chong & Mattingley, 2000; Kennett et al., 2001, submitted; see Figure 3). 8 This means that it may be
particularly important in the future to ensure that cue and target stimuli are presented from the same position when assessing crossmodal attentional capture following visual cues (cf. Kadunce, Vaughan, Wallace, Benedek, & Stein, 1997). 9 It will clearly also be important for future research to address more carefully the spatial distribution of attention following the peripheral presentation of spatially nonpredictive cues.
Figure 5. Schematic illustration of how spatially nonpredictive cues in different sensory modalities (here audition, Figure 5A; and vision, Figure 5C) might elicit attentional capture effects of different spatial specificity, hence providing one explanation for why auditory cues may facilitate elevation discrimination responses for visual targets (Figure 5B), while visual cues may fail to elicit any significant effect on auditory elevation discrimination latencies (Figure 5D).
Taken together, it seems that methodological factors, together with these differential spatial-cueing effects of the different cue modalities, can account for many of the failures to demonstrate crossmodal attentional capture effects in previous studies. Before moving on, however, it is important to assess Ward and colleagues' (e.g., Ward, 1994; Ward et al., 2000) recent claim that crossmodal attentional capture effects may be modulated by strategic factors. In particular, they suggest that participants may adopt different strategies in situations where they are presented with a single cue, as compared to situations in which a variety of different cues are presented (i.e., situations in which the cueing environment is complex). They suggest that participants can ignore auditory cues when they are intermixed with visual cues (i.e., when the cueing environment is complex), but cannot ignore the
auditory cues when only a single type of cue is presented at any given time. According to this claim, auditory cues should only capture visual attention in simple, but not complex (i.e., multimodal) cueing environments. Support for this view comes from the fact that auditory cues only facilitate visual performance in the left/right discrimination task when the cue modality is predictably auditory (e.g., Bernstein & Edelstein, 1971; Schmitt et al., 2000; Simon & Craft, 1970), but not when it is unpredictably either auditory or visual (e.g., Ward, 1994; Ward et al., 1998, Experiment 8). Until recently, the majority of orthogonal-cueing studies also presented auditory cues in a simple unimodal cue environment and so provided results consistent with Ward's claim (e.g., Driver & Spence, 1998; Schmitt et al., 2000; Spence & Driver, 1997, 1998). However, more recent studies have shown that auditory cues still facilitate visual elevation discrimination responses even when a complex cueing environment is used (e.g., Spence & Driver, 2000; Vroomen et al., in press; Spence & Driver, 1999), showing that the cue complexity account of crossmodal attentional capture cannot be correct, at least when performance is assessed in the orthogonal-cueing task. 10
Spatial relevance
One general finding to emerge from the study of crossmodal attentional capture is that cueing effects appear more robust when participants make some form of spatial discrimination response, as compared to when they make a non-spatial response. A similar pattern of results has also been reported in purely auditory studies of attentional capture (e.g., McDonald & Ward, 1999; Spence & Driver, 1994; see also Pavani, Làdavas, & Driver, in press, for similar results in a patient population). It seems clear that neither methodological confounds nor strategic factors can account for such findings. In a thorough review of this area, McDonald and Ward (1999) have suggested that attentional capture effects for auditory target stimuli will only be demonstrated under conditions where space is made relevant to the accomplishment of the participant's task, either by forcing them to make a spatial discrimination response, or by some other means (the so-called spatial relevance hypothesis; see also Klein & Taylor, 1994). 11 It seems likely that making space relevant to their task forces participants into responding on the basis of some representation of auditory stimuli in which spatial information is explicitly coded. Many of the neural structures in which auditory spatial information is represented (e.g., the superior colliculus, and the inferior parietal lobule; Bushara et al., 1999; King, 1993; Weeks et al., 1999) are multimodal, hence providing plausible neural substrates for the behavioural effects identified in these crossmodal capture studies. It seems possible that crossmodal attentional capture effects will only be demonstrated in situations where participants are forced to respond on the basis of information coded in multimodal brain structures, such as those typically implicated in spatial representation. For example, using PET, Bushara et al. (1999) recently found that both auditory and visual localization tasks result in activation in the inferior parietal lobe (see also Weeks et al., 1999). The fact that common neural
substrates are involved in both auditory and visual spatial processing might help to explain why crossmodal capture effects are so strong when a spatial discrimination response is required. It will be an interesting question for future research to determine whether certain tasks may be more affected by intramodal attentional capture, whereas other tasks will be affected by both intramodal and crossmodal attentional capture equally (see McDonald & Ward, 2000, and Hopfinger & Mangun, 1998, for preliminary electrophysiological support for such a distinction).
Modality-Specific vs. Supramodal Attention Systems
Over the years, researchers have proposed a number of different accounts of how attention may be coordinated across the modalities (see Spence & Driver, 1996; McDonald & Ward, submitted, for reviews). One of the earliest suggestions came from Wickens (e.g., 1980, 1984) who proposed that people have entirely modality-specific (i.e., auditory, visual, and tactile) attentional systems, such that the distribution of attention in one modality has no effect on attention in the other modalities (see Figure 6A). This purely modality-specific resource account is
Figure 6. Schematic illustration of the ways in which researchers have conceptualized how the attentional systems might be coordinated across the different sensory modalities. A) Independent modality-specific attentional resources; B) Single supramodal attention system; C) Hierarchical supramodal plus modality-specific attentional systems; and D) Separable-but-linked attentional systems (see McDonald & Ward, submitted; and Spence & Driver, 1996, for reviews).
clearly inconsistent with many of the behavioral and electrophysiological results reported here. Other researchers have argued for a single supramodal attentional system (e.g., Farah et al., 1989, pp. 469-470) that allocates attention to locations in space regardless of the modality of the stimuli presented there (see Figure 6B). The single supramodal account has been ruled out for the case of endogenous spatial orienting by researchers who have shown that people can direct their auditory, visual, and/or tactile attention in different directions simultaneously (e.g., Driver & Spence, 1994; Lloyd et al., submitted; Spence & Driver, 1996; Spence et al., 2000). Researchers have also suggested more complex attentional architectures, such as the hybrid account proposed by Posner (1990, pp. 202-203), whereby the various modality-specific attentional subsystems are thought to feed into a higher-level supramodal system (see Figure 6C; see also Bushara et al., 1999, p. 764; Woods, Alho, & Algazi, 1992). By contrast, Spence and Driver (1996) argued for separable-but-linked modality-specific attentional systems to account for their endogenous spatial attention data (i.e., without the need for a higher-order supramodal system). According to Spence and Driver, separate modality-specific attentional systems may operate upon the representations of auditory, visual, and tactile space, but strong crossmodal links ensure that attention in the different modalities is normally directed to the same spatial location (see Figure 6D). Until recently, the lack of any unambiguous evidence that visual cues could pull auditory attention led many researchers to argue that the separable-but-linked account of crossmodal links in spatial attention might also provide the most parsimonious account of crossmodal links in exogenous spatial attention (e.g., Driver & Spence, 1998; Mondor & Amirault, 1998, p. 753; Schmitt et al., 2000, in press; Spence et al., 1998).
However, given that recent empirical data now show that crossmodal capture can occur between all combinations of auditory, visual, and tactile stimuli (at least when cue and target are presented from the same spatial location), the validity of the single supramodal account needs to be reassessed. 12 One critical behavioral experiment that might help to tease apart these various alternatives would involve simultaneously presenting spatially nonpredictive cues from different positions in different modalities (i.e., a visual cue on the left together with an auditory cue on the right), followed unpredictably by auditory or visual targets. According to the separable-but-linked hypothesis, one might expect to see facilitation of responses to auditory targets on the side of the auditory cue, and the facilitation of responses to visual targets on the visually-cued side (clearly this effect might well be modulated by the relative intensity and timing of the cues used). Ward (1994) actually carried out this experiment, but his use of the confounded left/right localization task makes any interpretation of his results difficult. Using their implicit spatial discrimination task, Ward et al. (2000) reported no significant spatial cueing effects when auditory and visual cues were presented simultaneously from opposite sides. Resolving this issue will clearly be important for future research (e.g., Farah et al., 1989; McDonald et al., 2001; Spence & Driver, 1996), but may well require an increasing reliance on
cognitive neuroscience techniques to elucidate the brain structures underlying these crossmodal attentional effects (see Driver & Spence, 2000; Macaluso, Frith, & Driver, 2000; Spence & Driver, 1996).
Neural Correlates of Crossmodal Capture
It is clear that the crossmodal capture effects reported here imply that some degree of spatial integration must arise between the sensory modalities prior to participants making a response, such that common locations are treated as such across the different senses when different postures are adopted (Driver & Spence, 1998). However, how such crossmodal coordination of spatial representation arises is a nontrivial problem, given that information is initially coded in very different coordinates in the various senses. For instance, visual stimuli are initially coded retinotopically, auditory stimuli tonotopically, and tactile stimuli somatotopically, so there is little in common between a shared location across the modalities at input (i.e., at the level of the sensory epithelia). One candidate neural substrate for the crossmodal capture effects reviewed here is the superior colliculus (SC; Spence & Driver, 1997a). A majority of cells in the deeper layers of this subcortical structure are multimodal (Stein & Meredith, 1993), and neurophysiological and neuropsychological studies have implicated it in the subcortical control of overt and covert exogenous orienting both in animals (e.g., Peck, 1987; Robinson & Kertzman, 1995; Stein, Meredith, Honeycutt, & McDade, 1989; Stein, Wallace, & Meredith, 1995) and in humans (e.g., Rafal et al., 1991). Neurophysiological studies in a number of species have demonstrated that, by the time sensory information reaches the deeper layers of the SC, it has been transformed into spatiotopically arrayed 'maps' of auditory, visual, and somatosensory space (e.g., see Groh & Sparks, 1996; King, 1993; Stein & Meredith, 1993).
Moreover, these maps are in approximate spatiotopic register with each other (e.g., a bimodal visual-auditory cell that responded to visual stimuli above and to the right of the animal would also respond to sounds coming from the upper right as well), and with the motor maps found in the deepest layers, which are associated with overt orienting of the eyes, head, and body. Although much of the neuroscience interest in recent years has tended to focus on multimodal integration and its implications for spatial orienting within just the SC (perhaps because the deeper layers of the SC have one of the densest concentrations of multisensory neurons in the brain; Stein et al., 1995), there are actually many other neural centers, such as the posterior parietal cortex, the putamen, and the premotor cortex, that also show multimodal spatial integration and may also be involved in the modulation of attention (e.g., Graziano & Gross, 1994, 1998; Rizzolatti, Scandolara, Matelli, & Gentilucci, 1981). For example, neurophysiological studies reported by Graziano and Gross (1994, 1998) have demonstrated the existence of bimodal cells in several areas of the monkey brain (including the premotor cortex, parietal area 7b, and the putamen) that respond to tactile stimuli on the hand, as well as to visual stimuli presented near the hand.
Critically, the visual receptive fields (RFs) of such neurons follow the hand around as different postures are adopted, hence maintaining appropriate visuotactile register across posture change. Cells in these areas therefore provide another possible neural substrate for the crossmodal capture effects between vision and touch when participants adopt different postures, such as crossing their hands (e.g., Kennett et al., 2001, submitted; Spence, Kingstone, Shore, & Gazzaniga, 2001).
Crossmodal Capture in the Applied Domain
Given the extensive evidence for crossmodal attentional capture in the laboratory, it is important to ask whether such findings have any application outside the laboratory. In recent years, there has been a rapid growth of interest in the use of auditory, tactile and multimodal warning signals to capture the attention of operators working in visually-cluttered environments (e.g., Liu, 2001; Selcon, Taylor, & McKenna, 1995; Sklar & Sarter, 1999; see Spence & Driver, 1997b, for a review), particularly in situations where operators have to respond rapidly to time-critical information, such as missile approach warning signals for pilots, when even small time savings can be vital (see Doll, Gerth, Engelman, & Folds, 1986; Selcon et al., 1995). For example, free-field (and virtual) auditory cues have been shown to provide an effective means of crossmodally capturing a pilot's visual attention, particularly when searching for visual targets in the large and cluttered visual displays typical of many aircraft (e.g., Bolia, D'Angelo, & McKinley, 1999; Doyle & Snowden, 1999; Perrott, Cisneros, McKinley, & D'Angelo, 1996; Perrott, Saberi, Brown, & Strybel, 1990; Perrott, Sadralodabai, Saberi, & Strybel, 1991). Perrott et al. (1996) showed that the time required by pilots to localize and respond to a visual target presented amongst visual distractors, at any azimuth from 0 degrees to 360 degrees, and from elevations 90 degrees above to 70 degrees below fixation, can be dramatically reduced by presenting a localized free-field auditory warning signal from the same spatial location (without any concomitant increase in errors, thus ruling out a non-attentional explanation of these findings).
These auditory facilitation effects have not only been reported for visual stimuli presented out of the current field of view, but also for visual targets lying within just a few degrees of fixation, where the RT facilitation seen in cluttered visual environments can still exceed 300 ms. In fact, in certain situations, the benefits of using auditory cues to capture a pilot's visual attention have been shown to outweigh those achieved by enhancing the saliency of the visual target stimuli themselves (e.g., Perrott et al., 1991). Given the growing interest in this area, it seems clear that cockpit designers will increasingly move toward using auditory, tactile, and/or multimodal warning signals to crossmodally capture pilots' (and other interface operators') visual attention. However, as Spence and Driver (1997b; see also McBride & Ntuen, 1997) have pointed out, it is important to note the potential trade-off associated with using multimodal warning signals, which is that their implementation may require the operator to monitor additional channels (and hence to divide their attention between several modalities simultaneously; see Spence, Nicholls, & Driver, 2001).
Conclusions
There has been a rapid growth of interest in the study of crossmodal attentional capture in recent years. Taken together, this research clearly shows that crossmodal attentional capture can occur between all possible combinations of auditory, visual, and tactile stimuli, at least under certain conditions. These crossmodal capture effects occur even when the cue modality is entirely irrelevant to the participant's task, suggesting that crossmodal capture occurs automatically. It will nevertheless be an important question for future research to determine to what extent exogenous crossmodal capture effects can be modulated by endogenous factors, such as the direction of endogenous spatial attention to a particular modality or location (cf. Klein et al., 1987; Spence, Ranson, & Driver, 2000; Widmann & Schröger, 1999). Crossmodal attentional capture seems to have a particularly robust effect on spatial tasks, or in situations in which space is relevant to the participant's task (McDonald & Ward, 1999, submitted; Posner, 1978; Spence & Driver, 1994), and this may be because such tasks require participants to respond on the basis of a multimodal neural representation of space in the brain. In conclusion, it is clear that the existence of extensive crossmodal links in spatial attention between audition, vision, and touch makes good functional sense, given that information regarding an event presented in different modalities will normally occur in the same spatial location. The existence of crossmodal attentional capture ensures that mechanisms of attention are coordinated across the modalities, so that the common relevant information from novel events in our environment will get selected together across the different senses, regardless of posture.
Footnotes
1 It should be noted that such coordination poses a considerable computational challenge, because the stimulus properties signalling a common source across the modalities (e.g., the various cues to location in audition, vision, and touch) differ so greatly at the initial stages of sensory processing (e.g., vision is retinotopic, whereas audition is initially tonotopic and then head-centred, while touch is initially coded somatotopically).
2 The term 'crossmodal attentional capture' is used here to denote situations in which the presentation of a spatially-nonpredictive peripheral event in one sensory modality leads to an exogenous shift of attention in another modality to the cued location. It is important to distinguish this use of the term 'crossmodal capture' from that seen in crossmodal conflict situations, where information in one modality is shown to dominate over conflicting information presented in another modality. For example, in the well-known ventriloquist effect, we hear a voice as coming from the lips that we see move, even when the voice and the lips are presented from different (i.e., conflicting) locations. This use of the term 'capture' to describe such intersensory bias effects (i.e., crossmodal perceptual capture) has a long history in experimental psychology (e.g., Caclin, Soto-Faraco, Kingstone, & Spence, submitted; Posner,
Nissen, & Klein, 1976; Rock & Harris, 1967), but should be distinguished from the crossmodal attentional capture effects discussed here.
3 It has been argued that simple detection responses to auditory stimuli may be based on an 'early' tonotopic stimulus representation, in which spatial location information is not made explicit (see Spence & Driver, 1994; McDonald & Ward, 1999, on this point). This contrasts with vision, where even the earliest representations are spatial (i.e., retinotopic).
4 It should be noted that Dufour's use of a speeded discrimination response in one experiment and an unspeeded discrimination response in the other experiment makes it difficult to draw any firm conclusions regarding the underlying reason why crossmodal capture effects were reported in only one experiment (i.e., it is unclear whether the difference should be attributed to differences in the nature of the tasks, or of the particular response measures used). It is also interesting to note that Briand and Klein (1987) reported a somewhat different pattern of results in their unimodal study of visual capture. They showed that the peripheral presentation of a visual cue facilitated performance on both visual feature detection and conjunction detection tasks, though the effects were larger for the conjunction task.
5 The only exception to this finding was reported by Ward et al. (1998, Experiment 8), who actually reported that visual cues had a significant inhibitory effect on ipsilateral auditory elevation responses in the orthogonal cueing task. As discussed later, this atypical result may have been caused by the fact that Ward et al. used highly localizable white-noise cues, rather than the pure tone cues (which are hard to localize in terms of their elevation) used in previous studies.
6 It is interesting to note here that Hopfinger and Mangun (1998) showed P1 modulation at short SOAs, suggesting a possible difference between intramodal and crossmodal attentional capture effects.
7 It should be noted that this spatial elevation discrepancy between cue and target stimuli in the orthogonal-cueing task is particularly pronounced for the case of auditory targets, where the target loudspeakers have to be separated by a large elevation difference in order for participants to be able to discriminate target elevation reliably.
8 One result which does not immediately fit into this framework is the finding that visual cues facilitate elevation discrimination responses for visual targets positioned 14 degrees or more above or below the cue light (Spence & Driver, 1997; Ward et al., 1998, Experiment 8). However, it is possible that such a result may partially reflect local landmarking (which may facilitate elevation discrimination responses for targets on the cued side), rather than attentional facilitation.
9 Although cue and target stimuli can be presented from the same, or very similar, locations in the majority of orthogonal-cueing studies (e.g., see Figure 3), the one situation in which spatial co-location of cue and target stimuli is more difficult is when visual cues precede auditory targets (see Figure 1), precisely the situation in which crossmodal attentional capture effects have not been demonstrated using this task.
10 Note that one problem with Ward et al.'s account of cue complexity is that no definition has yet been given of what constitutes a complex, rather than a simple, cueing environment.
11 Though note that McDonald and Ward's (1999) spatial relevance hypothesis cannot account for the intramodal auditory capture effects reported by Mondor and Amirault (1998, Experiment 1).
12 However, as McDonald et al. (2001) point out, even the demonstration of reciprocal crossmodal capture effects between all possible combinations of auditory, visual, and tactile stimuli does not necessarily imply that attentional capture is mediated by a purely supramodal mechanism, because one cannot rule out the possibility that a shift of attention in one modality might elicit a separate shift of attention in the other 'tightly-linked' modalities.
References
Bernstein, I. H., & Edelstein, B. A. (1971). Effects of some variations in auditory input upon visual choice reaction time. Journal of Experimental Psychology, 87, 241-247.
Bolia, R. S., D'Angelo, W. R., & McKinley, R. L. (1999). Aurally-aided visual search in three-dimensional space. Human Factors, 41, 664-669.
Briand, K. A. (1998). Feature integration and spatial attention: More evidence of a dissociation between endogenous and exogenous orienting. Journal of Experimental Psychology: Human Perception and Performance, 24, 1243-1256.
Briand, K. A., & Klein, R. M. (1987). Is Posner's "beam" the same as Treisman's "glue"?: On the relation between visual orienting and feature integration theory. Journal of Experimental Psychology: Human Perception and Performance, 13, 228-241.
Buchtel, H. A., & Butter, C. M. (1988). Spatial attention shifts: Implications for the role of polysensory mechanisms. Neuropsychologia, 26, 499-509.
Buchtel, H. A., Butter, C. M., & Ayvasik, B. (1996). Effects of stimulus source and intensity on covert orientation to auditory stimuli. Neuropsychologia, 34, 979-985.
Bushara, K. O., Weeks, R. A., Ishii, K., Catalan, M.-J., Tian, B., Rauschecker, J. P., & Hallett, M. (1999). Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nature Neuroscience, 2, 759-765.
Butter, C. M., Buchtel, H. A., & Santucci, R. (1989). Spatial attentional shifts: Further evidence for the role of polysensory mechanisms using visual and tactile stimuli. Neuropsychologia, 27, 1231-1240.
Chong, T., & Mattingley, J. B. (2000). Preserved cross-modal attentional links in the absence of conscious vision: Evidence from patients with primary visual cortex lesions. Journal of Cognitive Neuroscience, 12 (Supp.), 38.
Doll, T. J., Gerth, J. M., Engelman, W. R., & Folds, D. J. (1986). Development of simulated directional audio for cockpit applications (USAF Report AAMRL-TR-86-014). Wright-Patterson Air Force Base, OH: Armstrong Aerospace Medical Research Laboratory.
Doyle, M. C., & Snowden, R. J. (1999). The effect of auditory warning signals on visual target identification. In D. Harris (Ed.), Engineering Psychology and Cognitive Ergonomics, Vol. 4: Job Design, Product Design and Human-Computer Interaction (pp. 245-251). Ashgate Publishing: Hampshire.
Driver, J. (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature, 381, 66-68.
Driver, J., & Spence, C. J. (1994). Spatial synergies between auditory and visual attention. In C. Umiltà & M. Moscovitch (Eds.), Attention and performance: Conscious and nonconscious information processing (Vol. 15, pp. 311-331). MIT Press: Cambridge, MA.
Driver, J., & Spence, C. (1998). Crossmodal links in spatial attention. Philosophical Transactions of the Royal Society Section B, 353, 1319-1331.
Driver, J., & Spence, C. (2000). Multisensory perception: Beyond modularity and convergence. Current Biology, 10, R731-R735.
Dufour, A. (1999). Importance of attentional mechanisms in audiovisual links. Experimental Brain Research, 126, 215-222.
Duncan, J. (1980). The demonstration of capacity limitation. Cognitive Psychology, 12, 75-96.
Eimer, M., & Driver, J. (2000). An event-related brain potential study of cross-modal links in spatial attention between vision and touch. Psychophysiology, 37, 697-705.
Eimer, M., & Schröger, E. (1998). ERP effects of intermodal attention and cross-modal links in spatial attention. Psychophysiology, 35, 313-327.
Farah, M. J., Wong, A. B., Monheit, M. A., & Morrow, L. A. (1989). Parietal lobe mechanisms of spatial attention: Modality-specific or supramodal? Neuropsychologia, 27, 461-470.
Fisher, G. H. (1962). Resolution of spatial conflict. Bulletin of the British Psychological Society, 46, 3A.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Graziano, M. S. A., & Gross, C. G. (1994). Mapping space with neurons. Current Directions in Psychological Science, 3, 164-167.
Graziano, M., & Gross, C. (1998). Spatial maps for the control of movement. Current Opinion in Neurobiology, 8, 195-201.
Groh, J. M., & Sparks, D. L. (1996). Saccades to somatosensory targets. 2. Motor convergence in primate superior colliculus. Journal of Neurophysiology, 75, 428-438.
Hillyard, S. A., Simpson, G. V., Woods, D. L., Van Voorhis, S., & Munte, T. F. (1984). Event-related brain potentials and selective attention to different modalities. In F. Reinoso-Suarez & C. Ajmone-Marson (Eds.), Cortical integration (pp. 395-414). New York: Raven Press.
Hopfinger, J. B., & Mangun, G. R. (1998). Reflexive attention modulates processing of visual stimuli in human extrastriate cortex. Psychological Science, 9, 441-447.
Hugdahl, K., & Nordby, H. (1994). Electrophysiological correlates to cued attentional shifts in the visual and auditory modalities. Behavioral and Neural Biology, 62, 21-32.
Johnen, A., Wagner, H., & Gaese, B. H. (2001). Spatial attention modulates sound localization in barn owls. Journal of Physiology, 85, 1009-1012.
Jones, M. R., Moynihan, H., MacKenzie, N., & Hoffman, J. (in press). Stimulus-driven attending in dynamic arrays. Psychological Science.
Kadunce, D. C., Vaughan, J. W., Wallace, M. T., Benedek, G., & Stein, B. E. (1997). Mechanisms of within- and cross-modality suppression in the superior colliculus. Journal of Neurophysiology, 78, 2834-47.
Kennett, S., Eimer, M., Spence, C., & Driver, J. (2001). Tactile-visual links in exogenous spatial attention under different postures: Convergent evidence from psychophysics and ERPs. Journal of Cognitive Neuroscience, 13, 462-468.
Kennett, S., Spence, C., & Driver, J. (submitted). The spatial coordinates of visuo-tactile links in covert exogenous spatial attention. Perception & Psychophysics.
King, A. J. (1993). A map of auditory space in the mammalian brain: Neural computation and development. Experimental Physiology, 78, 559-590.
Klein, R., Brennan, M., D'Aloisio, A., D'Entremont, B., & Gilani, A. (1987). Covert cross-modality orienting of attention. Unpublished manuscript.
Klein, R. M., & Shore, D. I. (2000). Relationships among modes of visual orienting. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: Attention and performance XVIII (pp. 195-208). Cambridge, MA: MIT Press.
Klein, R. M., & Taylor, T. L. (1994). Categories of cognitive inhibition with reference to attention. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 113-150). Academic Press.
Ladavas, E. (1993). Shifts of attention in patients with visual neglect. In I. H. Robertson & J. C. Marshall (Eds.), Unilateral neglect: Clinical and experimental studies (pp. 193-209). Hillsdale, NJ: Erlbaum.
Liu, Y.-C. (2001). Comparative study of the effects of auditory, visual and multimodal displays on drivers' performance in advanced traveller information systems. Ergonomics, 44, 425-442.
Lloyd, D. M., Merat, N., McGlone, F., & Spence, C. (submitted). Crossmodal links in covert endogenous spatial attention between audition and touch. Perception & Psychophysics.
Macaluso, E., Frith, C., & Driver, J. (2000). Modulation of human visual cortex by crossmodal spatial attention. Science, 289, 1206-1208.
McBride, M. E., & Ntuen, C. A. (1997). The effects of multimodal display aids on human performance. Computers and Industrial Engineering, 33, 197-200.
McDonald, J. J., Teder-Sälejärvi, W. A., Di Russo, F., & Hillyard, S. A. (2000a). Looking at sound: Involuntary auditory attention modulates neural processing in extrastriate visual cortex. Poster presented at the Annual Meeting of the Society for Psychophysiological Research. San Diego: California, October.
McDonald, J. J., Teder-Sälejärvi, W. A., & Hillyard, S. A. (2000b). Involuntary orienting to sound improves visual perception. Nature, 407, 906-908.
McDonald, J. J., Teder-Sälejärvi, W. A., Heraldez, D., & Hillyard, S. A. (2001). Electrophysiological evidence for the "missing link" in crossmodal attention. Canadian Journal of Experimental Psychology, 55, 143-151.
McDonald, J. J., & Ward, L. M. (1999). Spatial relevance determines facilitatory and inhibitory effects of auditory covert spatial orienting. Journal of Experimental Psychology: Human Perception and Performance, 25, 1234-1252.
McDonald, J. J., & Ward, L. M. (2000). Involuntary listening aids seeing: Evidence from human electrophysiology. Psychological Science, 11, 167-171.
McDonald, J. J., & Ward, L. M. (submitted, 2000). Crossmodal consequences of involuntary spatial attention and inhibition of return. Journal of Experimental Psychology: Human Perception and Performance.
Miyauchi, S., Hikosaka, O., Shimojo, S., & Okamura, H. (1993). Spatial attention is cross-modal: An evoked potential study. Investigative Ophthalmology and Visual Science, 34, 1234.
Mondor, T. A., & Amirault, K. J. (1998). Effect of same- and different-modality spatial cues on auditory and visual target identification. Journal of Experimental Psychology: Human Perception and Performance, 24, 745-755.
Müller, H. J., & Findlay, J. M. (1987). Sensitivity and criterion effects in the spatial cueing of visual attention. Perception & Psychophysics, 42, 383-399.
Pashler, H. E. (1998). The Psychology of Attention. MIT Press: Cambridge, MA.
Pavani, F., Làdavas, E., & Driver, J. (in press). Selective deficit of auditory localisation in patients with visuospatial neglect. Neuropsychologia.
Peck, C. K. (1987). Visual-auditory interactions in cat superior colliculus: Their role in control of gaze. Brain Research, 420, 162-166.
Perrott, D. R., Cisneros, J., McKinley, R. L., & D'Angelo, W. (1996). Aurally aided visual search under virtual and free-field listening conditions. Human Factors, 38, 702-715.
Perrott, D. R., Saberi, K., Brown, K., & Strybel, T. Z. (1990). Auditory psychomotor coordination and visual search performance. Perception & Psychophysics, 48, 214-226.
Perrott, D. R., Sadralodabai, T., Saberi, K., & Strybel, T. Z. (1991). Aurally aided visual search in the central visual field: Effects of visual load and visual enhancement of the target. Human Factors, 33, 389-400.
Posner, M. I. (1978). Chronometric explorations of mind. Hillsdale, NJ: Erlbaum.
Posner, M. I. (1988). Structures and functions of selective attention. In T. Boll & B. K. Bryant (Eds.), Master lectures in clinical neuropsychology and brain function: Research, measurement and practice (pp. 171-202). Washington, DC: American Psychological Association.
Posner, M. I. (1990). Hierarchical distributed networks in the neuropsychology of selective attention. In A. Caramazza (Ed.), Cognitive neuropsychology and neurolinguistics: Advances in models of cognitive function and impairment (pp. 187-210). Hillsdale, NJ: Erlbaum.
Posner, M. I., Nissen, M. J., & Klein, R. M. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review, 83, 157-171.
Rafal, R. (1996). Visual attention: Converging operations from neurology and psychology. In A. F. Kramer, M. G. H. Coles, & G. D. Logan (Eds.), Converging operations in the study of visual selective attention (pp. 139-102). Washington, DC: American Psychological Association.
Rafal, R., Henik, A., & Smith, J. (1991). Extrageniculate contributions to reflex visual orienting in normal humans: A temporal hemifield advantage. Journal of Cognitive Neuroscience, 3, 322-328.
Reuter-Lorenz, P. A., & Rosenquist, J. N. (1996). Auditory cues and inhibition of return: The importance of oculomotor activation. Experimental Brain Research, 112, 119-126.
Rizzolatti, G., Scandolara, C., Matelli, M., & Gentilucci, M. (1981). Afferent properties of periarcuate neurons in macaque monkeys. II. Visual responses. Behavioural Brain Research, 2, 147-163.
Robinson, D. L., & Kertzman, C. (1995). Covert orienting of attention in macaques. III. Contributions of the superior colliculus. Journal of Neurophysiology, 74, 713-721.
Rock, I., & Harris, C. S. (1967, 17 May). Vision and touch. Scientific American, 216, 96-104.
Rorden, C., & Driver, J. (1999). Does auditory attention shift in the direction of an upcoming saccade? Neuropsychologia, 37, 357-377.
Rorden, C., & Driver, J. (2001). Spatial deployment of attention within and across hemifields in an auditory task. Experimental Brain Research, 137, 487-496.
Schmitt, M., Postma, A., & de Haan, E. (2000). Interactions between exogenous auditory and visual spatial attention. Quarterly Journal of Experimental Psychology, 53A, 105-130.
Schmitt, M., Postma, A., & de Haan, E. (in press). Cross-modal exogenous attention and distance effects in vision and hearing. European Journal of Cognitive Psychology.
Selcon, S. J., Taylor, R. M., & McKenna, F. P. (1995). Integrating multiple information sources: Using redundancy in the design of warnings. Ergonomics, 38, 2362-2370.
Sherrington, C. S. (1920). Integrative action of the nervous system. New Haven: Yale University Press.
Shore, D. I., Spence, C., & Klein, R. M. (2001). Visual prior entry. Psychological Science, 12, 205-212.
Simon, J. R., & Craft, J. L. (1970). Effects of an irrelevant auditory stimulus on visual choice reaction time. Journal of Experimental Psychology, 86, 272-274.
Simpson, W. E. (1972). Latency of locating lights and sounds. Journal of Experimental Psychology, 93, 169-175.
Sklar, A. E., & Sarter, N. B. (1999). Good vibrations: Tactile feedback in support of attention allocation and human-automation coordination in event-driven domains. Human Factors, 41, 543-552.
Spence, C. J., & Driver, J. (1994). Covert spatial orienting in audition: Exogenous and endogenous mechanisms facilitate sound localization. Journal of Experimental Psychology: Human Perception and Performance, 20, 555-574.
Spence, C., & Driver, J. (1996). Audiovisual links in endogenous covert spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 22, 1005-1030.
Spence, C., & Driver, J. (1997a). Audiovisual links in exogenous covert spatial orienting. Perception & Psychophysics, 59, 1-22.
Spence, C., & Driver, J. (1997b). Cross-modal links in attention between audition, vision, and touch: Implications for interface design. International Journal of Cognitive Ergonomics, 1, 351-373.
Spence, C., & Driver, J. (1998). Auditory and audiovisual inhibition of return. Perception & Psychophysics, 60, 125-139.
Spence, C., & Driver, J. (1999). A new approach to the design of multimodal warning signals. In D. Harris (Ed.), Engineering Psychology and Cognitive Ergonomics, Vol. 4: Job Design, Product Design and Human-Computer Interaction (pp. 455-461). Ashgate Publishing: Hampshire.
Spence, C., & Driver, J. (2000). Attracting attention to the illusory location of a sound: Reflexive crossmodal orienting and ventriloquism. Neuroreport, 11, 2057-2061.
Spence, C., Kingstone, A., Shore, D. I., & Gazzaniga, M. S. (2001). Representation of visuotactile space in the split brain. Psychological Science, 12, 90-93.
Spence, C., & Lupiáñez, J. (1998). Crossmodal links in attention revealed by the orthogonal temporal order judgment task. Paper presented at II Congreso de la Sociedad Espanyola de Psicologia Experimental (SEPEX 98). Granada, Spain, 17th December.
Spence, C., & McGlone, F. P. (in press). Reflexive orienting of tactile attention. Experimental Brain Research.
Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception & Psychophysics, 63, 330-336.
Spence, C., Nicholls, M. E. R., Gillespie, N., & Driver, J. (1998). Crossmodal links in exogenous covert spatial orienting between touch, audition, and vision. Perception & Psychophysics, 60, 544-557.
Spence, C., Pavani, F., & Driver, J. (2000). Crossmodal links between vision and touch in covert endogenous spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 26, 1298-1319.
Spence, C., Ranson, J., & Driver, J. (2000). Crossmodal selective attention: On the difficulty of ignoring sounds at the locus of visual attention. Perception & Psychophysics, 62, 410-424.
Spence, C., Shore, D. I., & Klein, R. M. (in press). Multimodal prior entry. Journal of Experimental Psychology: General.
Sperling, G., & Dosher, B. A. (1986). Strategy and optimization in human information processing. In K. Boff, L. Kaufman, & J. Thomas (Eds.), Handbook of Perception and Performance, Vol. 1 (pp. 2-1 - 2-65). New York: Wiley.
Stein, B. E., London, N., Wilkinson, L. K., & Price, D. P. (1996). Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497-506.
Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press.
Stein, B., Meredith, M. A., Honeycutt, W. S., & McDade, L. (1989). Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience, 1, 12-24.
Stein, B. E., Wallace, M. T., & Meredith, M. A. (1995). Neural mechanisms mediating attention and orientation to multisensory cues. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 683-702). Cambridge, MA: MIT Press.
Tassinari, G., & Campara, D. (1996). Consequences of covert orienting to non-informative stimuli of different modalities: A unitary mechanism? Neuropsychologia, 34, 235-245.
Teder-Sälejärvi, W. A., Münte, T. F., Sperlich, F.-J., & Hillyard, S. A. (2000). Intra-modal and cross-modal spatial attention to auditory and visual stimuli. An event-related brain potential study. Cognitive Brain Research, 8, 327-343.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Vroomen, J., Bertelson, P., & de Gelder, B. (in press). Directing spatial attention towards the illusory source of a ventriloquized sound. Acta Psychologica.
Ward, L. M. (1994). Supramodal and modality-specific mechanisms for stimulus-driven shifts of auditory and visual attention. Canadian Journal of Experimental Psychology, 48, 242-259.
Ward, L. M., McDonald, J. A., & Golestani, N. (1998). Cross-modal control of attention shifts. In R. Wright (Ed.), Visual attention (pp. 232-268). Oxford University Press: New York.
Ward, L. M., McDonald, J. J., & Lin, D. (2000). On asymmetries in crossmodal spatial attention orienting. Perception & Psychophysics, 62, 1258-1264.
Warren, D. H. (1970). Intermodality interactions in spatial localization. Cognitive Psychology, 1, 114-133.
Weeks, R. A., Aziz-Sultan, A., Bushara, K. O., Tian, B., Wessinger, C. M., Dang, N., Rauschecker, J. P., & Hallett, M. (1999). A PET study of human auditory spatial processing. Neuroscience Letters, 262, 155-158.
Welch, R. B., & Warren, D. H. (1986). Intersensory interactions. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and performance, Vol. 1: Sensory processes and perception (pp. 25-1 - 25-36). John Wiley and Sons, New York.
Widmann, A., & Schröger, E. (1999). Do lateralized visual stimuli exogenously orient auditory attention? Poster presented at the Annual Meeting of the Society for Psychophysiological Research, Granada: Spain, October.
Woods, D. L., Alho, K., & Algazi, A. (1992). Intermodal selective attention 1: Effects on event-related potentials to lateralized auditory and visual stimuli. Electroencephalography and Clinical Neurophysiology, 82, 341-355.
Yantis, S. (1996). Attentional capture in vision. In A. F. Kramer, M. G. Coles, & G. D. Logan (Eds.), Converging operations in the study of visual selective attention (pp. 45-76). Washington, DC: American Psychological Association.
Yantis, S. (2000). Goal-directed and stimulus-driven determinants of attentional control. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: Attention and performance XVIII (pp. 73-103). Cambridge, MA: MIT Press.
Author Notes
The author wishes to extend his thanks to Ray Klein and John McDonald for extremely helpful comments on an earlier version of this manuscript, to David Shore and Steffan Kennett for helpful discussions on many of the points covered here, and to Chris Rorden, Francesco Pavani, and Steffan Kennett for artistic assistance. Correspondence concerning this article should be addressed to Dr. Charles Spence, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD, UK. Electronic mail may be sent to
[email protected].
Part IV Developmental
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
11
Testing Models of Attentional Capture During Early Infancy
James L. Dannemiller
The Selectivity of Visual Attention Early in Life
There is a large literature on the development of visual attention during the first year of life (e.g., Atkinson, Hood, Wattam-Bell & Braddick, 1992; Casey & Richards, 1988; Cohen, 1972; Freeseman, Colombo & Coldren, 1993). Ever since Fantz (1958) demonstrated that infants are selective in their looking behavior, developmental psychologists have used this selective visual attention to ask questions about what infants at various ages can discriminate and what natural preferences exist during this early period. Models of these preferences have been proposed based on factors such as contour density (Karmel, 1969) or visibility and contrast sensitivity (Banks & Ginsburg, 1985; Gayl, Roberts & Werner, 1983). More recently, specific phenomena associated with visual attention in adults such as inhibition of return (Hood, 1993), cued facilitation and covert orienting (Johnson & Tucker, 1996), the gap effect in saccadic responding (Matsuzawa & Shimojo, 1997), and attentional pop-out (Catherwood, Skoien & Holt, 1996; Quinn & Bhatt, 1998) have been studied in this age range as well. What do we know about attentional capture as it develops during infancy? It is important in answering this question to understand that capture has an operational definition in visual search work with adults that is not necessarily the same as its definition in the developmental literature. Capture in the adult visual search literature refers to the interference that typically results from the appearance of an unexpected, odd or novel stimulus (Folk & Remington, 1998; Theeuwes, 1991). When attention is captured by such stimuli, reaction times to detect a different target increase modestly, but significantly. There is controversy in the adult literature on the issue of whether or not involuntary capture ever really occurs (Folk & Remington, 1998; Yantis & Egeth, 1999).
Notice, however, that the operational definition of capture in the adult literature requires instructions to attend to a primary target and is measured by the extent to which a strong, unexpected stimulus interferes with the detection of the primary target. It should be obvious that such an operational definition of visual capture will not work for infants because it is impossible to "instruct" infants to attend to a primary target.
Despite this obvious difference in how capture is operationalized in infancy and adulthood, there is evidence of attentional capture of a different sort in infancy. During early infancy, an infant's visual attention may truly be captured by a strong stimulus to the point that s/he has a difficult time disengaging attention to switch it to another location. This type of capture has been called "sticky fixation" because it is as though the infant's fixation is stuck on a particular object or location with an inability to disengage attention from that location. This dramatic type of capture usually disappears by two months of age (Hood & Atkinson, 1993). Other forms of capture during infancy are less dramatic, but they are reasonably reliable. Overt orienting to a strong, exogenous stimulus, especially one presented in the periphery, is easily observable from birth. I have used this type of exogenous orienting to ask questions about what captures an infant's attention early in life, how multiple objects might compete for attention, and how easily young infants can differentiate multiple stimuli within their visual fields when those stimuli appear simultaneously (Dannemiller, 1998). Our visual fields are typically populated by numerous surfaces and features, so the studies that I will describe below are designed to bring that type of complexity into the laboratory to study the development of visual attention. It is my long-term goal to be able to use the results of studies like these not only to understand the development of visual attention but also to inform our understanding of visual attention and attentional capture in its mature state. To accomplish this goal, I think that it is necessary to model exogenously driven visual attention quantitatively. I will describe one such model and tests of its predictions in this chapter.
Methodology and Modeling
Methodology
It is necessary to discuss some preliminary issues before proceeding to a model of attentional capture. The paradigm that I have used to study the early development of visual attention relies on a stimulus display that looks much like a visual search display used with adults. Figure 1 shows an example of a display that I have used over the last several years to study the selectivity of visual attention in infants. There are always equal numbers of bars on both sides of the display. All of the bars are static throughout a trial with the exception of one bar (the target) that oscillates in place usually at 1.2 or 2.4 Hz (periods of 833 and 417 ms, respectively) through one degree of visual angle. This target bar appears randomly across trials on the right or the left side of the display usually 10 degrees from the center. Across trials all of the bars on the display may be the same color and have the same contrast, or they may differ in color or contrast. In all of the work to be reported below, a maximum of two different types of bars is presented on the display. For example, half of the bars on the display may be red, and half may be green. Half may have positive contrast polarity (brighter than the background), and half may
have negative contrast polarity (darker than the background). The balance across the display between the two colors or contrasts is manipulated. Balance here refers simply to the number of bars of each of the two types on each side of the display. A balanced display has an equal number of each of the two types of bars on both sides of the display. An unbalanced display has more bars of one type on one side of the display and more bars of the other type on the other side of the display. It is this spatial balance variable that is the major independent variable used to assess the selectivity of early orienting. More details on this manipulation follow below. Finally, the number of bars on the display may also be manipulated. All of the bars on the display appear simultaneously from a uniform field. In other words, at the start of a trial the infant is looking at a display with no bars on a spatially uniform background (typically white), and the bars then appear with a sharp temporal onset.
[Figure 1 image: two display panels, labeled "ipsilateral" (left) and "contralateral" (right)]
Figure 1. Heterochromatic ipsilateral (left panel) and heterochromatic contralateral (right panel) displays. The display was 40 degrees horizontally by 31 degrees vertically (not drawn to scale). The bars were 5 degrees tall and 0.75 degrees wide. The bars were distributed randomly within 14 imaginary, equal-size columns (seven columns per side) with the constraints that no more than two bars could occupy the same column and all bars had to be completely visible. The moving target (indicated by arrows) appeared randomly across trials 10 degrees to the right or left of center (a static bar always appeared in the same position on the side opposite to the moving singleton). The target horizontally oscillated in place at 1.2 Hz through either 0.75 or 1.0 degrees (peak to mean), all the other bars in the field were static throughout the trial, and all of the bars on the display appeared simultaneously from a uniform background. In these two heterochromatic conditions, the two classes of bars were distributed in the ratio of 11:3 across the two halves of the display with 14 bars of each class always present on each trial. The terms ipsilateral and contralateral always refer to the location of most of the putatively higher salience bars (red) with respect to the moving singleton. Red bars in this figure are represented by the black bars, and pink bars are represented by the white bars. In the homochromatic condition (not shown), all of the bars on the display were identical.
An online observer starts each trial and watches the infant's overt orienting behavior to make a quick (usually < 2 sec), forced-choice judgment about the side of the display with the oscillating target. This observer is "blind" to the location of the target on each trial, so the only way to produce data that exceed 50% correct given a sufficiently large number of trials is for the infant to orient preferentially to the
side of the display with the moving target, and for the adult observer to be sensitive to these overt orienting cues. The observer is given feedback on each trial in an attempt to maximize the percentage of correct judgments within the constraint of responding quickly. With several exceptions, this paradigm is essentially equivalent to Teller's (1979) Forced Choice Preferential Looking (FPL) procedure that has been used successfully to study early visual development. One difference is that in the present research the FPL observer is instructed to respond as quickly as possible while maintaining the percentage of correct judgments as high as possible. This contrasts with the standard use of FPL in which the observer can wait indefinitely to accumulate information from the infant before making a judgment. The other major difference between this technique as I am using it and how it typically has been used is that I am using this technique to study discrimination rather than detection. FPL usually generates data on the detection of threshold stimuli. One side of the display is typically blank and the other side has a near-threshold stimulus. This is very different from the display shown in Figure 1 in which both sides of the display contain visible elements. In this sense, then, an infant faced with a display such as that shown in Figure 1 is doing something more like a discrimination task than a detection task. The discrimination may involve motion because only one side of the display contains a moving target, or it may involve differences on other stimulus dimensions such as color or luminance contrast.
Is This Visual Search?
Those familiar with visual search studies will notice the resemblance of the display in Figure 1 to visual search displays used with adults. The resemblance stops there. There are several critical differences between visual search tasks with adults and the attentional phenomena that I have been studying with these displays.
1. Adults can be instructed to search for a specific target; young infants cannot be so instructed, at least not directly.
2. As a result of point number one, the data from such displays with adults reflect their interpretation of the task and any attentional set or strategy induced by the instructions. In contrast, the data from such displays when presented to infants reflect their natural orienting tendencies.
3. The data from visual search studies with adults reflect the sensitivities of the adults directly. The data from infants are filtered through the sensitivities of the FPL observer. I have used the same FPL observer for the last six years in my lab, and this observer has tested more than 1000 infants, so I consider the contribution of this observer to the data collected with this paradigm to be stable and in a sense transparent. It might be true that a different observer could yield higher average percentages of correct judgments, but the important variables in these studies are manipulated within-subjects, so such inter-observer differences are largely irrelevant.
A model of attentional capture during early infancy
Definitions. Given these methodological preliminaries, it is now time to consider a model of what attentional capture might look like during this early postnatal period. What would it mean for an infant's attention to be captured with the display shown in Figure 1? I propose that capture would be indicated by two pieces of evidence: a) the percentage of correct judgments about the location of the moving singleton target would be above chance (50%), and b) the percentage of correct judgments would be influenced reliably by the balance across the two sides of the display of different static elements. This latter criterion essentially pits motion against the spatial imbalance manipulation to determine whether or not responding is systematically related to the latter for trials on which attention does not appear to be captured by the motion singleton. Consider these two criteria. First, capture by a motion singleton is similar to such effects in the literature on adult visual search (Nothdurft, 2000). Second, and perhaps more importantly, on trials when attention apparently is not captured by this singleton, it is nonetheless not randomly directed. Instead, it is influenced systematically by other elements in the visual field. This latter criterion appears to be more similar than the first to the meaning of capture in the literature with adults. These other elements interfere with orienting to the target. In adults, interference from salient distractors can also occur, although its likelihood may depend on whether or not the interfering element shares a feature with the target of the visual search (Yantis & Egeth, 1999). I have modeled this process quantitatively using signal detection theory. The oscillating bar is the signal, and there are multiple noise sources (all of the other static bars on the display).
As such it is similar to signal detection models that have been proposed to explain certain types of visual search data collected from adults (Palmer, Ames & Lindsey, 1993; Palmer, Verghese & Pavel, 2000). It should be noted that despite the fact that the bars on the display may differ on multiple dimensions (e.g., color, contrast, motion), the decision variable is unidimensional as described below.
Model Assumptions. It is useful to make the assumptions of this model explicit.
1. On each trial, each of the elements on the display produces a signal to orient to its location.
2. These signals are perturbed by internal noise. This noise is independent of all display characteristics.
3. The variance of this internal noise is equal across all of the elements on the display, independent of the mean level of response, random from trial to trial, and independent across different elements on the display.
4. When two different classes of bars are present on this display (e.g., red bars and pink bars), these classes may differ in their mean internal responses. It would typically be assumed that bars with the more saturated color (red) or
with more luminance contrast would lead to higher mean internal responses than bars with less saturation or luminance contrast.
5. The internal noise that perturbs these orienting signals is the only significant source of noise in the system.
6. The decision rule that characterizes the infant's overt attentional orienting is to orient to the side of the display with the element that produces the maximum internal response.
Notice that an overt, directional response is determined by the element that produces the largest internal response (maximum response decision rule). This is not the only decision rule that could be used. Perhaps the response is actually determined by the side of the display with the greater aggregate response. Indeed, other decision rules are possible, and I will return to this important issue below. Additionally, one might reasonably question the assumption that there are no other significant sources of noise in the system. The FPL observer may contribute noise perhaps in the decision stage, but in this model, this noise is considered negligible relative to the noise internal to the infant's visual system. One way of thinking about this is that given hypothetically identical orienting behaviors exhibited by the infant (e.g., direction of first look, directional head movement) on multiple trials, the FPL observer would have a high probability of making the same forced choice on all of these trials. In contrast, given identical stimulus configurations on multiple trials, the probability would be lower that the infant would orient in exactly the same way on those trials. It is also worth pointing out that there is no way to verify directly the assumption that orienting is always driven by the element with the largest internal response. Just as it is impossible to observe the internal noise that perturbs responses in a standard signal detection paradigm, so here it must be inferred from a pattern of data consistent with the predictions of the model.
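The assumptions listed above can be sketched as a small Monte Carlo simulation. This is a minimal sketch and not the author's implementation: the function names, the number of simulated trials, and the choice of standard Gumbel (double exponential) noise are assumptions made for illustration. The listed assumptions themselves only require equal-variance noise that is independent across elements.

```python
import math
import random


def gumbel(rng: random.Random) -> float:
    """Draw one standard Gumbel (double exponential) deviate by inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_pc(mu_target: float, mu_static: float, n_per_side: int,
                trials: int = 20000, seed: int = 0) -> float:
    """Estimate the proportion of trials on which the maximum internal
    response falls on the target side (maximum response decision rule)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # Target side: one moving target plus n - 1 static bars.
        target_side = [mu_target + gumbel(rng)]
        target_side += [mu_static + gumbel(rng) for _ in range(n_per_side - 1)]
        # Opposite side: n static bars.
        other_side = [mu_static + gumbel(rng) for _ in range(n_per_side)]
        # Orient toward the side holding the single largest response.
        if max(target_side) > max(other_side):
            correct += 1
    return correct / trials


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(simulate_pc(mu_target=2.0, mu_static=0.0, n_per_side=4))
```

When the target's mean response equals that of the static bars, the simulated proportion should hover near chance (0.5), because every element is then equally likely to produce the maximum.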
A potential problem arises when alternative models predict the same pattern of data, but this is a problem in any empirical study, so I will address it below when I consider alternative models. Why would data that obeyed this model constitute evidence of attentional capture? First, everyday, common definitions of capture are consistent with the idea that a strong or odd stimulus typically captures our attention. The maximum response decision rule simply instantiates this everyday definition of capture. Second, it is clear that overt attention in this case is nearly synonymous with eye and head movements. In other words, looking at one side of the display is prima facie evidence that attention has been drawn to that side of the display. This model is the simplest, testable model of capture that I could generate given the limitations of the paradigm and common definitions of capture. It is not necessarily a model of attentional capture in adults. As is shown below, it is also not the only model that might reasonably predict the data from infants in this paradigm. One advantage of studying these processes in infants is that it permits us to examine the origins of attentional capture uncontaminated by issues of task relevance versus irrelevance (Yantis & Egeth, 1999). Highly salient featural
singletons (e.g., a red target among all green distractors) may only capture attention to the extent that they provide some reliable information relevant to the location of the actual target. Because it is impossible to instruct infants to search for the moving target in the displays that we use, task relevance is a moot issue. Instead, this paradigm may allow us to examine the influence of salience when the salient elements are neither task-relevant nor task-irrelevant. It is interesting to speculate on when task-relevance may first come to play a role in visual attention, although it is beyond the scope of this chapter. Is the FPL paradigm the best way to test this model? One could argue that the response measure - a speeded forced-choice left versus right judgment - is not well matched to the predictions of the model. Wouldn't it be better to measure eye movements directly because the model predicts orienting to the specific location of the element that produced the largest internal response? It should be noted again that the maximum internal response is not an observable quantity, so the model does not really predict the specific location of the maximum response on each trial. The model contains the assumption that the processes that govern early orienting use the maximum response as the basis for a directional decision. From this is derived the pattern of results that should be observed when the spatial balance of the two types of static bars on the display is manipulated. The model explains differences in the percentage of correct judgments made by the FPL observer as arising from differences in the mean internal responses to the two classes of stimuli on this display. These internal responses cannot be observed any more than the noise that is a standard part of signal detection theory can be observed.
As is shown below, it is possible with such FPL data to refute the model proposed above, so for now the direct measurement of saccadic eye movements and fixational dwell times would provide useful but not absolutely necessary converging evidence.
Model Predictions. What would the pattern of data look like if this were a valid model of attentional capture in early infancy? Consider the following three conditions all involving 28 bars: a) homochromatic trials in which all of the bars on the display are the same color, b) heterochromatic trials in which more of the putatively higher salience bars appear on the side ipsilateral to the moving target (ipsilateral condition), and c) heterochromatic trials in which more of the putatively higher salience bars appear on the side of the display contralateral to the moving target (contralateral condition). Salience in this case might simply and intuitively be operationalized by color saturation: red bars appear more saturated than pink bars when both are embedded in a white background. In the model presented above, the mean internal signal to orient produced by a red bar is represented as being greater than the mean internal signal to orient produced by a pink bar. Both of these mean internal signals, in turn, are less than the mean internal signal to orient produced by the singleton moving target. To derive quantitative predictions from such a model, it is necessary to specify the noise distributions that perturb these internal signals to orient as well as the mean internal responses to these various elements. In most signal detection
models, equal-variance Gaussian distributions are used to model the noise (e.g., Green & Swets, 1966). I have used such distributions, but it is easier to model these distributions using double exponential distribution functions (Yellott, 1977). These double exponential distribution functions are probably indistinguishable from Gaussian distribution functions given the precision of the data generated by the paradigm described above. Their advantage is that they make the mathematical predictions from the single-target/multiple-noise maximum-response model more tractable. Yellott (1977) has shown that choice responses such as the left versus right choices involved in the FPL task described above have particularly simple expressions when these double exponential distribution functions are used. The interested reader is referred to Yellott (1977, p. 123 and pp. 137-141) for details of this derivation. For example, to predict the percentage of correct judgments in homochromatic conditions, Equation 1a will suffice:
(1a) $$P_c = \frac{e^{\mu_m} + (n-1)\,e^{\mu_s}}{e^{\mu_m} + (2n-1)\,e^{\mu_s}}$$
Here, $\mu_m$ represents the mean internal response to the moving target, and $\mu_s$ represents the mean internal response to the static distractors. There are n - 1 static distractors on the same side as the target, and 2n - 1 static distractors on the whole display. Equation 1a simply shows the probability that the maximum internal response will arise either from the moving target or from one of the static distractors on the same side as the moving target. Equation 1a can be simplified to Equation 1b by assuming without loss of generality that the internal response to the static bars averages zero, thus yielding exp($\mu_s$) = 1.0.
(1b) $$P_c = \frac{e^{\mu_m} + (n-1)}{e^{\mu_m} + (2n-1)}$$
In order to generate a point prediction from Equation 1b, it is necessary to assume a value for the mean internal response to the moving target, $\mu_m$. Alternatively, one could estimate this parameter from the observed data. If one were further willing to assume that this mean internal response to the moving target remained invariant over changes in the number of static bars on the display, then Equation 1 becomes most useful for predicting how the percentage of correct judgments should vary as the number of static noise bars on the display is manipulated.
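To make the use of Equation 1 concrete, the following minimal sketch evaluates it for several display sizes. The function name `pc_homochromatic` and the value of 2.0 for the mean internal response to the moving target are illustrative assumptions, not estimates from data.

```python
import math


def pc_homochromatic(mu_m: float, n: int, mu_s: float = 0.0) -> float:
    """Equation 1a: probability that the maximum internal response falls on
    the target side, with n bars per side (one of them the moving target).
    With the default mu_s = 0.0 this reduces to Equation 1b."""
    target = math.exp(mu_m)   # weight of the moving target
    static = math.exp(mu_s)   # weight of each static bar
    return (target + (n - 1) * static) / (target + (2 * n - 1) * static)


if __name__ == "__main__":
    mu_m = 2.0  # assumed mean internal response to the moving target
    for n in (2, 4, 8, 14):
        print(f"n = {n:2d} bars per side: P(correct) = "
              f"{pc_homochromatic(mu_m, n):.3f}")
```

With this assumed target strength, the predicted probability is about 0.72 for four bars per side and about 0.59 for 14 bars per side, and it equals 0.5 for any n when the target is no stronger than the static bars.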
Equations 1a and 1b suffice for the first of the three conditions described above: the homochromatic condition. Now consider the equations for the two conditions that represent the spatial imbalance manipulation. Let the mean internal response to the putatively lower salience class of bars (pink) be $\mu_l$ with the mean internal response to the static red bars again assumed to be 0. Each side of the display has 14 bars. Equation 2a represents the ipsilateral condition with 11 of 14 bars on the target side being red, and 3 of 14 bars on the side opposite to the target being red. The balance is reversed on the contralateral trials represented by Equation 2b. The moving target is assumed here to be red although the prediction can be easily altered if it is assumed to be pink. These two heterochromatic conditions with half red and half pink bars lead from the same model to Equations 2a and 2b shown next:
(2a) $$P_{c,\mathrm{ipsi}} = \frac{e^{\mu_m} + 10 + 3\,e^{\mu_l}}{e^{\mu_m} + 13 + 14\,e^{\mu_l}}$$
(2b) $$P_{c,\mathrm{contra}} = \frac{e^{\mu_m} + 2 + 11\,e^{\mu_l}}{e^{\mu_m} + 13 + 14\,e^{\mu_l}}$$
It is worth stating that these equations capture the model in the sense that they show the probability that the maximum internal response on a given trial will arise from one of the 13 static bars or from the one moving bar on the target side of the display. In other words, it is possible to be correct in this two-alternative forced-choice paradigm either because the maximum internal response came from the moving singleton target or from one of the other 13 (or more generically, n - 1) static bars on that side. This probability depends on the balance of the two classes of colored bars on the heterochromatic trials, so Equations 2a and 2b look a little more complicated, but they adhere to the same rule as in Equations 1a and 1b. Using Gaussian noise distributions makes these equations more complicated because they involve integrals that can't easily be simplified, but the results are basically the same. Foley and Schwarz (1998) have used a similar model to explain the detection of single targets at threshold in the presence of multiple spatially-displaced distractors by adults. Some intuitive sense for these equations can be derived from considering the following simpler situation. Suppose that the decision rule is to look to the side with the element that produced the largest internal response. Suppose further that all of the elements on the display are identical, that none of them is moving, and that one side has n elements and the other side of the display has m elements. What is the
probability that the maximum response will come from the side with n elements? It is not even necessary to specify the form of the density function that describes the internal noise in this case. All the elements have an equal probability of contributing the maximum response, so the probability that this maximum will occur on the side with n elements is simply n/(n + m). Equations 1a, 1b, 2a and 2b are generalizations of this intuition to the case where all of the elements are not identical, but where their mean internal responses can be modeled as shifts along the same internal scale. I will concentrate first on the pattern of data predicted by this model in the two heterochromatic conditions described by Equations 2a and 2b. To give a feel for the predictions from Equations 2a and 2b, consider a case in which the mean internal response to the moving target was 2.0, and the mean internal response to the lower-salience, pink bars was assumed to be -0.3. Both of these values are relative to the mean internal response to the higher salience, red bars of 0. The values of 2.0 and -0.3 are similar to d' values in standard signal detection theory. As in standard signal detection theory, what matters is just the separation between the centers of the relevant distributions relative to their spreads and subsequent distributions of the maxima of a set of random draws from these distributions. With these parameters, and with 14 bars on each side of the display distributed in the ratio of 11:3 or vice versa, the model predicts 64% correct in the ipsilateral condition and 57% correct in the contralateral condition. The absolute values of these numbers are not as important as the fact that the maximum response model predicts the difference between the conditions when the static bars compete to capture initial orienting.
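The worked example above can be reproduced with a few lines of code. This is a minimal sketch under the stated parameter values (2.0 for the moving target, -0.3 for the pink bars, 0 for the red bars); the function name and argument names are hypothetical, and the moving target is assumed to be red, as in Equations 2a and 2b.

```python
import math


def pc_heterochromatic(mu_m: float, mu_l: float,
                       high_on_target_side: int, n_side: int = 14) -> float:
    """Equations 2a/2b: probability that the maximum internal response
    falls on the target side.

    high_on_target_side counts the higher-salience (red) bars on the target
    side, including the moving target itself. The display holds n_side bars
    of each class in total, with n_side bars on each side.
    """
    e_m = math.exp(mu_m)  # weight of the moving (red) target
    e_l = math.exp(mu_l)  # weight of each lower-salience (pink) bar
    static_high_target = high_on_target_side - 1   # static red, weight 1
    low_target = n_side - high_on_target_side      # pink bars on target side
    numerator = e_m + static_high_target + low_target * e_l
    denominator = e_m + (n_side - 1) + n_side * e_l
    return numerator / denominator


if __name__ == "__main__":
    ipsi = pc_heterochromatic(2.0, -0.3, high_on_target_side=11)
    contra = pc_heterochromatic(2.0, -0.3, high_on_target_side=3)
    print(f"ipsilateral:   {ipsi:.2f}")    # about 0.64
    print(f"contralateral: {contra:.2f}")  # about 0.57
    # The ipsilateral advantage shrinks as the target signal grows stronger.
    for mu_m in (0.0, 2.0, 4.0, 6.0):
        diff = (pc_heterochromatic(mu_m, -0.3, 11)
                - pc_heterochromatic(mu_m, -0.3, 3))
        print(f"mu_m = {mu_m:.1f}: ipsi - contra = {diff:.3f}")
```

The final loop varies only the assumed target strength; it illustrates that a very strong motion signal makes the balance of the static bars largely irrelevant.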
Essentially, the difference in the percentage of correct judgments between these two conditions depends on the likelihood that the maximum response on a given trial will come from one of the static bars on the display. This probability is higher when more of the higher salience bars are placed ipsilaterally. Examination of Equations 2a and 2b makes it clear that altering the spatial balance of the two classes of bars should produce an effect on the percentage of correct judgments with the percentage being higher on ipsilateral trials than on contralateral trials because $\mu_l$ is constrained to be negative. The denominators to these two equations are the same; only the numerators are different. The contribution of the motion singleton is the same in the numerator of both equations. The predicted difference in the percentage of correct judgments between these two conditions depends only on the relative salience of the red and pink bars for a fixed value of the moving target strength. The effects are straightforward given Equations 2a and 2b. Larger differences in salience between the two classes of bars will produce larger swings in the percentage of correct judgments as the balance between the two classes across the two sides of the display is manipulated. This ipsilateral versus contralateral difference will also depend on the strength of the signal from the moving stimulus if that is also allowed to vary. An extremely strong signal from this stimulus will always capture attention making the spatial balance between the two classes of static bars largely irrelevant; both the numerators and the denominators of Equations 2a and 2b will be dominated by the response to the moving singleton. In
contrast, a very weak signal from this moving target will make it more likely that one of the static bars will capture attention, whether ipsilateral or contralateral to the target.
Summary of model predictions. In the sections below I will concentrate on two of the predictions from this model. These predictions were made by assuming that the 2n bars on the display were divided equally between the two sides of the display, and that one of these 2n bars was the moving target.
1. For displays with only one type of element (e.g., all red bars), as n increases, the probability of a correct judgment decreases.
a. This prediction can be understood as follows. The distribution of the maximum of the n responses on the target side is dominated by the distribution of responses to the target because of its greater mean internal response. As more static bars are added on the target side, there is little effect on the distribution of the maximum from this side. In contrast, as more static bars are added to the side without the moving target, the distribution of the maximum shifts to higher levels on the internal scale. The probability of a correct response depends on the separation between these two distributions, and this separation decreases (the overlap increases) as the number of static bars on the display is increased.
2. When two types of static bars are displayed with the single moving target, the probability of a correct judgment will depend on the balance in the spatial distribution of the two types of bars. The percentage of correct judgments (indexed by the location of the moving target) should be higher when more of the higher contrast, saturation, etc. bars are placed on the same side as the moving target. Conversely, the percentage of correct judgments should be lower when more of the higher contrast, saturation, etc. bars are placed on the side opposite to the moving target.
a. This prediction follows from the property of the model that if the two types of static bars differ in their salience (i.e., mean internal responses), then on trials when the internal response is not completely dominated by the moving target, the probability that one of the static bars on the target side will produce the maximum internal response depends on how many of these bars are on the target side.
I will only discuss these two predictions in the sections below. The model makes other interesting predictions that I am in the process of testing. For example, one of the interesting properties of this model is that it predicts invariance in the percentage of correct judgments with uniform expansion of the set of elements on the display (see Yellott, 1977, p. 137 for a discussion of invariance under uniform expansion). In other words, if the number of elements of each type including the moving target is simply increased by a factor of k, then the model makes the prediction that this manipulation should have no effect. The factor of k would
multiply both the numerator and the denominator for all the terms in Equations 1 and 2, so it would effectively cancel leaving the same predicted percentage of correct judgments. Here is a simple intuition for understanding this property of the model. If the probability that the maximum internal response will come from the left side of the display in a display with m identical elements on the left and n identical elements on the right is p, then the probability is also p that the maximum will come from the left side if the display is uniformly expanded to include km elements on the left and kn elements on the right.
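The invariance under uniform expansion can be checked numerically. The sketch below is illustrative only: it assumes the double exponential choice rule, in which each element contributes a weight of exp(mean response), and the helper names `p_max_on_left` and `expand` as well as the example displays are hypothetical.

```python
import math


def p_max_on_left(left, right):
    """Probability that the maximum internal response comes from the left.

    left and right are lists of (mean_response, count) pairs describing the
    elements on each side of the display.
    """
    w_left = sum(count * math.exp(mu) for mu, count in left)
    w_right = sum(count * math.exp(mu) for mu, count in right)
    return w_left / (w_left + w_right)


def expand(side, k):
    """Uniformly expand one side: multiply every element count by k."""
    return [(mu, k * count) for mu, count in side]


if __name__ == "__main__":
    # Identical static elements: m = 3 on the left, n = 5 on the right.
    left, right = [(0.0, 3)], [(0.0, 5)]
    print(p_max_on_left(left, right))                        # m / (m + n)
    print(p_max_on_left(expand(left, 4), expand(right, 4)))  # unchanged

    # A mixed display: moving target (2.0), red (0.0) and pink (-0.3) bars.
    left = [(2.0, 1), (0.0, 10), (-0.3, 3)]
    right = [(0.0, 3), (-0.3, 11)]
    for k in (1, 2, 5):
        print(k, round(p_max_on_left(expand(left, k), expand(right, k)), 4))
```

Because k multiplies every term in both the numerator and the denominator, the printed probability is the same for every k.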
Some Sample Data on Selectivity and Capture
Spatial imbalance effects
Figure 2 shows data from 32 3.5-month-olds using these red and pink bars (each having 66% luminance contrast with the white surround). Sixteen of these infants were tested with the moving target oscillating at 1.2 Hz through 0.75 degrees (left panel), while the other sixteen were tested using the same target oscillating through 1.0 degrees (peak to mean). The data in each panel have been averaged over the color of the moving target. Notice several things about these data. First, performance is above chance (50%) in all conditions. Second, not unexpectedly, performance is slightly better with the larger amplitude (right panel) at least in the
[Figure 2 image: two panels, "0.75 degrees" and "1.0 degrees"; x-axis: Balance Condition (ipsi, contra); y-axis ticks from 40 to 100]
Figure 2. Percentages of correct judgments with the target oscillating through 0.75 degrees (peak to mean) (left panel) or 1.0 degrees (right panel). Data in each panel have been collapsed across the color of the moving target. The dashed, horizontal line indicates chance responding. The ipsilateral advantage in each case is predicted by the maximum-response model.
ipsilateral conditions. The important point is that performance is systematically related to the balance across the display between the red (more saturated) and pink (less saturated) bars, just as predicted by the model if a) saturation is assumed to lead to salience differences and b) the mean internal response to the oscillating target is assumed to be monotonically related to the amplitude of oscillation. Putting 11 of the 14 red bars on the display on the side opposite to the target interferes with orienting to the target presumably by capturing attention on a nontrivial proportion of trials. This same pattern of ipsilateral performance being higher than contralateral performance has been observed repeatedly across various salience manipulations. For example, red bars are more effective in capturing attention than green bars when both are embedded in a white background (Dannemiller, 1998), but this difference disappears when they are embedded in a yellow background that more nearly equalizes the two color contrasts (Ross & Dannemiller, 1999). When pure luminance contrast is used, higher luminance contrasts capture attention more effectively than lower luminance contrasts, although the effect does not appear to be as strong as it is with color contrasts (Ross & Dannemiller, 1999). When pink and green are paired, the ipsilateral versus contralateral effect is eliminated as would be predicted from the red/green and red/pink results (Dannemiller, in press). Luminance decrements are more effective in capturing attention than equal-magnitude luminance increments just as would be predicted if perceived contrast at this young age followed Michelson contrast as it does in adults (Dannemiller & Stephens, under review).
In all of these cases, an imbalance across the display between two classes of elements results in lower percentages of correct judgments when most of the putatively higher salience elements appear contralaterally to the moving target, just as predicted by the maximum response model.
The maximum response model fails
Does this model of attentional capture fare as well in its other predictions? No. Equation 1 shows how the percentage of correct judgments should depend on signal strength, μ_m, and on the number of static bars on the display. Again, for very strong signals, performance should be independent of the number of bars on the screen because the numerator and denominator are dominated by the response to the signal, so the probability of discriminating the two sides of the display approaches 1.0, as it does in adults when there is a highly suprathreshold, featural singleton, and search is described as being parallel (McLeod, Driver & Crisp, 1988; although see also Theeuwes, Kramer & Atchley, 1999). For less powerful motion signals, the number of bars on the display should have a significant impact on orienting. For example, compare the predictions of the maximum-response model in Equation 1 when there are four bars on each side versus 14 bars on each side. With the mean internal response to the moving target singleton set at 2.0, the percentage of correct judgments with four bars on each side of the display should be 72%. With 14 bars
on each side, the predicted percentage decreases to 59%, a difference of 13%. The actual difference will depend on the strength of the motion signal.
To test this model, I collected data from 97 infants over the age range from seven weeks to 21 weeks. The amplitude of movement was fixed at 0.75 degrees (peak to mean) and the temporal frequency was set to 1.2 Hz for all infants. We know from previous work (Roessler & Dannemiller, 1997) that sensitivity to movement improves substantially over this age range, so there should be considerable variance in the overall percentages of correct judgments. This should make it easier to test the predictions of the model from Equation 1. Each infant was tested both with eight and with 28 bars on the display. Twenty-four trials of each were presented, and all of the bars on the display were red with the same luminance contrast (66%) against the white background. Figure 3 shows the percentages of correct judgments for all 97 subjects in each of the two conditions. The two regression lines show the best fits to the data from the two conditions. There was a steady increase in the overall percentage of correct judgments with age, as expected. What was not expected, however, was the absence of the effect of the number of bars that the maximum-response model predicts. This model basically predicts that near 50% correct, the differences between the two conditions should be close to zero. As the percentage of correct judgments increases toward approximately 75%, the model predicts approximately a 15% difference in favor of the condition with eight bars. As the percentage of correct
[Figure 3 plot: scatter of individual infants' proportions of correct judgments (vertical axis ticks from 0.3 to 1.0) against Age (days) (horizontal axis ticks from 40 to 150), with a regression line for each condition. Legend: Number of Bars, 8 and 28.]
Figure 3. Proportions of correct judgments with eight (solid symbols) bars (4 per side) and 28 (open symbols) bars (14 per side) for infants from approximately seven to 21 weeks of age. Number was manipulated within subjects. All of the bars on this display were identical in color and size. The two lines are from the regressions of the proportion of correct judgments against age. The predictions from the maximum response model do not fit these data well. The proportions of correct judgments should have been systematically higher with eight bars compared to 28 bars, especially at the older ages.
judgments then exceeds 75% and approaches 100%, the difference should once again approach zero. It is clear that the data do not conform to this prediction. Notice that in Equation 1, there is only one free parameter that relates the internal response to the motion singleton to the percentage of correct judgments for a given number of static bars. To test the predictions of the model more precisely, the percentage of correct judgments for each infant from the condition with eight bars was used to estimate the mean internal response to the motion target for that infant. This estimated parameter, μ_m, was then used to predict the percentage of correct judgments for each infant with 28 bars.² Given the distribution of estimates of this parameter from the condition with eight bars, the data from the condition with 28 bars should have averaged approximately 10% lower in this sample across age. Instead, the average percentages of correct judgments in these two conditions were 63.4% (SEM = 1.12%) and 61.8% (SEM = 1.22%). The observed difference of 1.6% is far short of the theoretical prediction of 10%.
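The estimate-then-predict procedure just described can be sketched in a few lines. Equation 1 is not reproduced in this part of the chapter, so the closed form below is an assumption: the target contributes exp(μ_m), each static bar contributes 1, the numerator counts the target plus the static bars on the target side, and the denominator counts the target plus all static bars. This form reproduces the 72% and 59% predictions quoted for μ_m = 2.0 and the treatment of below-chance scores in footnote 2, but it should be read as a reconstruction rather than the author's exact equation. The function names are hypothetical.

```python
import math


def p_correct(mu_m, bars_per_side):
    """Predicted proportion correct under the ASSUMED form of Equation 1."""
    signal = math.exp(mu_m)                 # weight of the moving target
    static_target_side = bars_per_side - 1  # static bars sharing the target's side
    static_total = 2 * bars_per_side - 1    # all static bars on the display
    return (signal + static_target_side) / (signal + static_total)


def estimate_mu(p_observed, bars_per_side):
    """Invert p_correct to estimate mu_m from an observed proportion correct."""
    # Footnote 2: below-chance scores are treated as 50% correct.
    p = max(p_observed, 0.5)
    if p >= 1.0:
        raise ValueError("mu_m is unbounded when performance is perfect")
    static_target_side = bars_per_side - 1
    static_total = 2 * bars_per_side - 1
    signal = (static_total * p - static_target_side) / (1.0 - p)
    return math.log(signal)


# Predictions quoted in the text for mu_m = 2.0.
print(round(p_correct(2.0, 4), 2), round(p_correct(2.0, 14), 2))  # 0.72 0.59

# One hypothetical infant scoring 75% correct with eight bars (4 per side).
mu_hat = estimate_mu(0.75, 4)
print(round(p_correct(mu_hat, 14), 3))  # predicted proportion with 28 bars
```

Applying `estimate_mu` to each infant's eight-bar score and averaging the resulting `p_correct(mu_hat, 14)` values is one way to obtain a sample-level prediction of the kind compared with the observed 61.8%.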
Considering Other Models
Explaining the failure of the maximum-response model
There are several possible explanations for why the model of attentional capture described by Equations 1, 2a, and 2b might have failed to capture the full set of results from the experimental conditions described above. In particular, the model appears to predict the differences in ipsilateral versus contralateral performance, but it fails when the number of static bars is manipulated. Here are several potential explanations for the failure of this model of attentional capture:
1. Increasing the number of bars from eight to 28 also increased the density of bars within the display because the size of the display was fixed. The proximity of static bars near the oscillating target was greater on average with 28 bars than it was with eight bars. Perhaps this increased sensitivity to the moving target so that the assumption of equal mean internal responses to the moving target in both conditions was wrong.
2. The method is just not sensitive enough to measure the predicted differences when the number of bars is manipulated.
3. While the actual numbers of bars on the display were eight and 28, the effective number of bars may have been far fewer, leading to less substantial effects of this variable.
4. An alternative model of attentional capture explains both sets of results better.
Consider these possibilities in turn.
Increased density: Detection of the moving singleton might have been easier in the condition with 28 bars because the proximity of nearby static reference bars was greater than it was in the condition with eight bars. In adults, the presence of nearby static reference lines enhances the detection of oscillatory motion,
especially at temporal frequencies below approximately 5 Hz (Tyler & Torres, 1972). There is little evidence on this issue in the infant literature, although Dannemiller & Freedland (1989) found no evidence that attention to movement was enhanced at 20 weeks of age by the presence of nearby static reference bars. Nonetheless, if sensitivity to the movement were affected in infants by the density of static bars in the vicinity of the moving target, then finding less than the predicted decrease in performance as the number of bars was increased wouldn't necessarily invalidate the maximum response model. Recall that to generate the predictions for these two conditions I assumed that the mean internal response to the moving target remained invariant. If the proximity of the static reference bars plays a role in determining the value of the μ_m parameter, then the model could still be correct but the value of this parameter could have been larger in the condition with 28 bars, and this could have offset the predicted drop in the percentage of correct judgments as more bars were added to the display.
There are several ways to test the hypothesis that density near the moving target contributed to the differences or the lack of differences between the two conditions. On each trial, the location of every static bar (as well as the target) on the display was recorded. From these data, it was possible to calculate the average distance of the three static bars nearest to the moving target on each trial. I used three static bars because this comprised all of the static bars on the target side in the condition with eight bars on the display. The average distances between the target bar and the nearest three static bars were 117.75 arcmin (SD = 21.16 arcmin) and 61.88 arcmin (SD = 13.38 arcmin) for the trials with eight and 28 bars, respectively.
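The per-trial density measure can be illustrated with a short sketch. It assumes that bar positions are stored as (x, y) coordinates in arcmin and that distance is Euclidean between bar centers; neither detail is stated in the text, and the function names and example coordinates are hypothetical.

```python
import math


def mean_nearest_distance(target, statics, k=3):
    """Mean distance from the target to its k nearest static bars (same units as input)."""
    tx, ty = target
    distances = sorted(math.hypot(sx - tx, sy - ty) for sx, sy in statics)
    if len(distances) < k:
        raise ValueError("need at least k static bars")
    return sum(distances[:k]) / k


def in_overlap(distance, low=48.1, high=117.0):
    """True if a trial's density measure lies in a range shared by both conditions.

    The default bounds are the overlap range reported in the text (arcmin).
    """
    return low <= distance <= high


# Made-up positions for a single trial, in arcmin.
trial_target = (0.0, 0.0)
trial_statics = [(30.0, 40.0), (60.0, 80.0), (0.0, 100.0), (300.0, 400.0)]
d = mean_nearest_distance(trial_target, trial_statics)
print(round(d, 2), in_overlap(d))  # 83.33 True
```

Restricting each infant's trials to those for which `in_overlap` is true is one way to compare the two conditions at matched density.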
Although the mean proximities of static bars near the moving target on displays with eight and 28 bars differed as they should have, there was overlap on this measure for the two conditions in the range from 48.1 to 117 arcmin. In other words, even on trials with only three static bars on the same side as the moving target, these static bars were occasionally positioned as close to the moving target as, or even closer than, the three nearest static bars on displays with 28 bars. For each subject, I calculated the percentage of correct judgments with eight and 28 bars for those trials on which the density measure was within the overlapping range. The average percentages of correct judgments were 66.1% (SEM = 1.6%) and 61.9% (SEM = 1.3%) for eight and 28 bars, respectively. A two-tailed direct difference t-test showed that this difference was significant, t(96) = 2.40, p = .018. Thus, considering only those trials on which the densities of static bars near the moving target were similar in the two conditions did increase the difference in the percentage of correct judgments in the direction predicted by the model. The percentage of correct judgments held nearly constant with 28 bars at 61.9%. With eight bars, this percentage increased from 63.4% to 66.1% when the density was equalized. The density-adjusted difference (4.2%) is still less than half of the difference (10%) predicted by the maximum response model for this sample. Density differences between the two conditions may be part of the answer, but they are not the complete answer for why the model failed with the number manipulation.
Sensitivity of the measurements: As noted above, the observed difference between the percentages of correct judgments with eight versus 28 bars was not significant. Could this simply be the influence of large amounts of measurement error leading to insensitive measures? It is hard to argue for this possibility because of the results from numerous previous studies (see above) with heterochromatic conditions that yielded robust differences between ipsilateral and contralateral conditions. In these previous studies, ipsilateral versus contralateral differences from 5% to 10% in the mean percentages of correct judgments were routinely detected by the experimental procedures.
Notice that this null effect of the number of static bars on the screen is exactly what would be predicted from studies with adults with featural singletons. Indeed, many researchers studying visual search in adults would be surprised by the predictions of the maximum response model for a decrement in performance as set size was increased, given that a single moving target is so easily distinguished from its static neighbors. Visual search is usually labeled parallel when detection of a singleton target is relatively uninfluenced by the number of distractors on the display. For all of the reasons listed above, however, visual search in adults is very different from the paradigm with infants employed here, and sensitivity to movement is not nearly as high in infants in this range as it is in adults (cf. Roessler & Dannemiller, 1997 with Wright & Johnston, 1985). Perhaps the maximum response model is wrong, and detection of the moving target by these infants should just be considered a case of parallel detection because the percentage of correct judgments was relatively unaffected by an increase in the number of bars from eight to 28. There is one potential problem with this argument.
If discrimination of the two sides of the display and subsequent orienting were parallel in these infants in these two conditions, then why would the spatial balance manipulation in heterochromatic conditions produce such reliable effects? Detection should be parallel in those conditions as well leading to no effects when the spatial balance between bars of the two colors was manipulated. Yet, the procedures used here revealed reliable effects in the heterochromatic conditions, but essentially no effects in the homochromatic conditions. Additionally, parallel detection is usually indexed by almost perfect performance, but the average percentages across conditions were near 65%. It seems unlikely that the answer to why the model failed to predict the results of the number manipulation was because of low power or insensitive measures. Alternatively, one could argue that when all bars on the display are identical, then the process of detecting the moving singleton is explained by a model that differs qualitatively from the model used to explain the data from conditions with different types of bars on the display simultaneously. One might also suppose that parallel detection occurred, but after the detection of the moving target an additional source of noise reduced the overall percentage of correct judgments. These are certainly both possibilities, but it would be more parsimonious to find a
single model that could explain both sets of results or to avoid proliferating different noise sources. I consider this issue in more detail below. Actual versus effective number of bars: The calculations from Equation 1 used the actual number of bars in each display: eight and 28. There is no guarantee that all of these bars affected the process that determines orienting in these infants. Any factor that tended to reduce the actual number of bars that influenced the orienting process could invalidate the predictions of the model. For example, suppose that only the bars near the center of the screen actually influenced orienting. Fixation was drawn to the center of the display before starting each trial, so it is certainly possible that bars near the center of the display might be weighted more heavily in the orienting process than bars in the periphery. There is some evidence for a gradient of detectability away from the center of the visual field in infants in this age range (Dannemiller & Nagata, 1995). Conversely, this type of orienting to peripheral stimuli tends to be driven more strongly by elements in the temporal visual field than by elements in the nasal visual field (Rafal, Henik & Smith, 1991), so it is also possible that the bars near the edges of the display may have been more effective than bars near the center. Recall that the bars were positioned randomly among seven equal-size imaginary columns on each side of the display. The argument above for a differential gradient of effectiveness makes a clear prediction. A reduction in the actual number of bars to some effective number of bars should result in a proportional reduction across the two conditions, so that the effective number of bars on each side should remain in the ratio of 4:14. For example, if either of the above gradient effects held with perhaps half of each side of the display containing effective elements, then the effective manipulation would be 2:7. 
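As a worked check of this halving argument, the two manipulations can be evaluated under an assumed form of Equation 1b, which is not reproduced in this section. The form used here is an assumption consistent with the description in footnote 3 (static bars on the target side in the numerator, all static bars in the denominator): the target contributes e^{μ_m} and each static bar contributes 1. The symbols n_t (static bars on the target side) and n_s (total static bars) are labels introduced only for this illustration, and μ_m = 2.0 is used as the example value.

```latex
\[
P(\text{correct}) \;=\; \frac{e^{\mu_m} + n_t}{e^{\mu_m} + n_s},
\qquad e^{2.0} \approx 7.389
\]
\[
\text{4 vs. 14 bars per side:}\quad
\frac{7.389 + 3}{7.389 + 7} \approx 0.72,
\qquad
\frac{7.389 + 13}{7.389 + 27} \approx 0.59
\]
\[
\text{2 vs. 7 bars per side:}\quad
\frac{7.389 + 1}{7.389 + 3} \approx 0.81,
\qquad
\frac{7.389 + 6}{7.389 + 13} \approx 0.66
\]
```

Under this assumed form, the predicted difference is about 13 percentage points for the 4:14 manipulation and about 15 for the 2:7 manipulation, so halving the effective number of bars does not shrink the predicted effect.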
This is easily accommodated in the model. In fact, the model actually predicts slightly larger differences when the effective number of bars changes in this direction.³ For a given value of the mean internal response to the moving singleton (e.g., μ_m = 2.0), the model predicts percentages of correct judgments of 72% and 59% for four versus 14 bars per side, 81% versus 66% for two versus seven bars per side, and 88% versus 74% for one versus 3.5 bars per side. Once again, either the model is incorrect or the lack of a difference in the percentages of correct judgments is not explained by an effective number of bars that is less than the actual number of bars.
Alternative models: It is worth considering alternative models that might explain all of the results described above. I will consider one alternative model next. Others are certainly possible. There may be a clue to an alternative model in the fact that the maximum response model succeeds in predicting the ipsilateral versus contralateral difference under heterochromatic conditions but fails to predict the data from homochromatic conditions when the number of bars is manipulated. The ipsilateral versus contralateral manipulation is inherently a spatially directional manipulation. One side is loaded with more higher salience bars than the other side. In contrast, the manipulation of the number of bars does not involve any directional imbalance
beyond that induced by the motion singleton. Both sides of the display are stochastically identical with the exception of the moving singleton. But the orienting response is inherently directional. It is not possible to look to the right and the left simultaneously. A final common pathway resolves competition between the two directional responses, and one wins out. The alternative model that might capture both results is one that recognizes the global nature of this directional response competition. Suppose that the maximum-response model is wrong in the sense it assumes that the bars on the screen ultimately lead to 28 different internal responses that are independent at the point at which overt, directional orienting movements are planned (e.g., saccades and head movements). According to this model, the individual bars on the display are differentiated from each other and lead to independent internal responses. The model assumes independence of these responses once they are perturbed by internal noise. I will call this assumption the differentiated visual field assumption. The differentiated visual field assumption may not be correct for infants in this age range under these conditions.
A hemifield comparison model
The elements that appear on the display are certainly visible and easily differentiated by a normal adult human visual system. Are they necessarily differentiated by the visual systems of young human infants? We know from numerous studies that acuity (Dobson & Teller, 1978) and contrast sensitivity (Gwiazda, Bauer, Thorn & Held, 1997) in this age range (2 to 5 months) are far from adult-like. Is it possible that the initial response to the appearance of these bars in the visual field is much less differentiated than it is in adults, to the point that there is only an initial, coarse hemifield differentiation? One way to model this would be to assume that the effective variable that determines orienting is based on the aggregate or summed response to each half of the visual field. Once these two aggregate responses have been computed, they are compared (differenced) and perhaps some post-comparison noise is added to the computed difference. Orienting is then directed to the hemifield with the larger aggregate response. Large receptive fields with their attendant extensive spatial summation of local responses to the individual elements could implement this part of the alternative model. If the internal noise that perturbed the individual responses prior to their summation were negligible relative to the post-comparison noise, then this model would essentially predict no difference between the percentages of correct judgments in the condition with eight versus 28 bars. It is easy to understand this prediction. Essentially all that matters in this model is that there is a moving element on one side of the visual field that is substituted for a static element on the other, regardless of whether there are eight or 28 bars. This leads to the same difference in the aggregate response to the two sides of the display in both conditions. This difference may be corrupted by some additional decision noise leading to less than
perfect performance even for a relatively strong motion singleton, but importantly, as long as the two sides differ only by one moving target, then this model predicts no difference as the number of static bars is increased. The model explicitly denies the differentiated visual field assumption. Instead, it is the global balance between the two hemifields that drives orienting to one side or to the other. This alternative model may be referred to as the hemifield comparison model.
Does this model also predict the ipsilateral versus contralateral difference observed in the numerous studies reported above? Yes. If there is a difference in salience (mean internal response) to the two classes of static bars that are distributed unevenly across the two halves of the display, then there will be an additional difference in the aggregate responses between the two sides of the display beyond the motion singleton. When more of the higher salience static bars are on the same side as the moving target, this imbalance will favor the target side. When more of the higher salience static bars are on the side opposite to the target, then this imbalance will favor the contralateral side and compete with the moving target. The outcome of the competition will depend on the strength of the motion stimulus and the relative saliences of the two classes of static bars. The important point is that the hemifield comparison model qualitatively predicts the pattern of results in both the homochromatic condition when the number of static bars is manipulated and in the heterochromatic conditions when the balance between the two sides of the display is manipulated. Additional experiments are underway to test quantitative predictions from this alternative model.
Arguments against the hemifield comparison model. It is worth considering some of the implications of the alternative model.
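Before turning to those implications, the two qualitative predictions of the hemifield comparison model can be made concrete with a minimal sketch. It assumes Gaussian post-comparison noise and no noise before summation, so that the probability of orienting to the target side is the normal CDF of the scaled difference between the two aggregate responses. The noise distribution, all response values, and the bar layouts are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hemifield_p_correct(target_side, other_side, decision_sd=2.0):
    """Probability of orienting to the target side.

    target_side, other_side: mean internal responses of every element on
    each side (the moving target is included in target_side).
    decision_sd: SD of the ASSUMED Gaussian post-comparison noise.
    """
    difference = sum(target_side) - sum(other_side)
    return normal_cdf(difference / decision_sd)


# Hypothetical mean responses (arbitrary units).
STATIC, MOVING = 1.0, 2.0
HIGH, LOW = 1.0, 0.875  # two classes of static bars, e.g. red and pink


def homochromatic(bars_per_side):
    """All bars identical; the target replaces one static bar on its side."""
    target_side = [STATIC] * (bars_per_side - 1) + [MOVING]
    other_side = [STATIC] * bars_per_side
    return hemifield_p_correct(target_side, other_side)


# Number manipulation: the aggregate difference is the same with 8 or 28 bars.
print(homochromatic(4), homochromatic(14))

# Balance manipulation: most high-salience bars with, or opposite, the target.
ipsi = hemifield_p_correct([HIGH] * 10 + [LOW] * 3 + [MOVING],
                           [HIGH] * 3 + [LOW] * 11)
contra = hemifield_p_correct([HIGH] * 3 + [LOW] * 10 + [MOVING],
                             [HIGH] * 11 + [LOW] * 3)
print(ipsi, contra)
```

In this sketch the number of bars cancels out of the homochromatic difference, while the heterochromatic layouts shift the difference toward or away from the target side, which is the qualitative pattern the model is meant to capture.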
One of the most striking aspects of this model is that it implies a largely undifferentiated visual field early in development. This implication would appear to be contradicted by numerous studies with young infants showing differentiated responses to elements in the visual field. Two types of studies are relevant. First, eye movement studies over this age range clearly show that infants direct saccades to individual elements in the visual field (e.g., Aslin & Salapatek, 1975; Bronson, 1994). Second, changes to the internal features of patterns are discriminable by 4-month-olds, although 1-month-olds have difficulty (Milewski, 1976). These latter studies imply that attention can be directed to specific parts of a pattern, and not just to the pattern as a whole. Both types of studies would appear to contradict the idea that multiple elements within the visual field are only coarsely differentiated early in development.
The contradiction may be more apparent than real. It is important to keep in mind that there are methodological differences between the current results and paradigm and the previous studies that imply a differentiated visual field. First, recall that the empirical results reported above testing the maximum response model relied mostly on orienting behavior within the first two seconds after the simultaneous appearances of all of the bars in the visual field. In contrast, scanning eye movement studies with infants typically involve extended inspection (e.g., Bronson, 1994). Second, in the pattern discrimination
studies, it was also the case that the patterns were available for inspection for long periods of time, and the measures of discrimination generally involved cumulative durations of fixation on these patterns. Finally, in the eye movement study by Aslin and Salapatek (1975), the visual field was populated by one or at most two elements on each trial. This is very different from the studies reported above with eight or 28 small bars scattered more or less randomly across the visual field. It is possible that the initial response to the onset of multiple elements in the visual field at this age is a transient, low spatial resolution response that globally compares the two hemifields to generate an initial orienting reaction. After the elements have been present in the visual field for some time, more spatially refined and local responses may then guide subsequent inspection of the visual field. The studies reported above and the previous literature may not necessarily be in conflict because of the very different temporal parameters in these studies and because of the complexity of the visual patterns. Lasky and Spiro (1980) reported that 5-month-old infants required at least two seconds between the offset of a brief visual pattern and the onset of a mask before showing recognition of the familiar pattern. Masks that followed the offset of the pattern by less than two seconds disrupted recognition. This could imply that the information available to infants as old as five months within the first two seconds of the onset of a visual pattern is too coarse to support good pattern recognition, although it may be sufficient to support a global comparison of the two hemifields as indicated in the results reported above.
Hemifield comparisons in adults? Is there any evidence for this type of coarse, hemifield competition process in mature visual attention? In adults, there are several lines of evidence suggesting this type of hemifield competition.
Rizzolatti, Riggio, Dascola and Umilta (1987) have argued for a premotor theory of attention. Part of this theory involves the idea that eye movements and attention are closely linked. There is a cost if an eye movement program to one hemifield has to be cancelled and a new movement to the opposite hemifield reprogrammed. Moving the eyes in one direction versus the other involves different muscle activation patterns, so there is a sharp divide of attention at the vertical meridian. Shifting attention horizontally across the vertical meridian incurs costs that are more severe than would be predicted solely from the distance between an invalid cue and a target. Although Rizzolatti et al. (1987) do not argue for complete homogeneity of attention within a hemifield, the premotor theory may be compatible with the alternative model described above because in both cases, a response in one direction or the other has to be resolved based on a comparison of activity in the two hemifields.
A second line of evidence for hemifield competition comes from the phenomenon of visual extinction (e.g., Friedrich, Egly, Rafal & Beck, 1998; Mattingley, Pisella, Rossetti, Rode, Tiliket, Boisson, & Vighetto, 2000). Individuals with unilateral brain lesions, especially right parietal cortical lesions, can detect single targets in the contralesional visual field when these targets are presented alone. They have great difficulty detecting these targets, however, if a competing
stimulus is simultaneously presented in the ipsilesional hemifield. Mattingley et al. (2000) have suggested that this type of extinction may result from an inability to divide attention across the vertical meridian. They attribute this difficulty ultimately to the conflicting requirements of programming eye movements in opposite directions. Once again, a process that requires a resolution of conflicting movement into the two hemifields appears to play an important role in how visual attention works. I would suggest that this hemifield competition may be revealed in the studies reported above with young infants. A coarse comparison of stimulation in the two hemifields is used to resolve the problem of where to look first when multiple elements appear in the visual field simultaneously.
Nakayama and Mackeben (1989) have argued that in adults there is an initial transient component to focal visual attention that differs both in locus and in time course from a more sustained component. Additionally, Nakayama and Mackeben (1989) have argued that both the transient and sustained components of focal visual attention are cortical in origin, although they don't necessarily share the same cortical substrates. The transient component is supposed to operate at earlier stages of visual cortical processing. Both the transient nature of this system and its cortical substrate are compatible with aspects of the infant data reported above. The hemifield comparison model is meant to apply only to the initial orienting response to the appearance of multiple potential attentional targets in the visual field. The transient aspect of the process is similar to the transient portion of the Nakayama and Mackeben (1989) model. Additionally, several of the heterochromatic studies cited above apparently involved the differential salience of color contrast. Color is thought to be processed cortically (Lueck, Zeki, Friston, Deiber, Cope, Cunningham, Lammertsma, Kennard, & Frackowiak, 1989).
Other data on hemifield comparisons in infants. What evidence in infants is there for these kinds of hemifield differences? Monocularly, there are hemifield differences in the simple detection of visual targets even in young infants. Targets that appear in the temporal visual field are more easily detected than identical targets that appear at the same eccentricity in the nasal visual field (Lewis & Maurer, 1992). Fogel, Karns & Kawai (1990) have argued for a model of right-side dominance for attention control in young infants. Liegeois and De Schonen (1997) showed that simultaneous attention to the two hemifields emerges very late in development: as late as 24 months of age. In all of these studies, there is evidence that visual attention may involve coarse, hemifield mechanisms from early in development. The hemifield comparison model is compatible with this type of coarse differentiation. Whether it provides a quantitatively robust account of the development of visual capture early in life can only be determined from future experiments.
Conclusions
A maximum response model of visual capture, which assumes a well-differentiated visual field, was tested with infants from approximately two months of
age to five months of age. The model predicted observed differential salience effects in initial orienting to the simultaneous appearance of multiple potential attentional targets when those salience effects involved an imbalance between the two sides of the stimulus display. When more of the putatively higher salience elements in the visual field appeared ipsilaterally to a moving singleton target, orienting was biased toward the target side. In contrast when more of these higher salience elements appeared contralaterally to the moving target, competition ensued and attention was drawn less reliably to the moving target. These effects could be explained by a signal detection model in which it is assumed that initial orienting is determined by the element in the visual field that leads to the maximum internal response. In contrast, this maximum response model failed to predict the effects of a manipulation of the number of elements in the visual field. Whereas the model predicted a decrease in orienting to the moving singleton target as more bars were added to both sides of the display, the percentage of correct judgments was essentially unaffected by the number of elements in the visual field. The density of static elements near the moving target differed between the two conditions and may have been responsible for some of the reduction in the size of the predicted effect, but it was probably not responsible for the full reduction. An alternative hemifield comparison model was proposed to account for the results of both types of manipulations (spatial imbalance and number). According to this model, attention is captured initially by the side of the visual field with the greater aggregate response. 
In contrast to the maximum response model, which assumes that all of the elements in the visual field lead to independent, noise-perturbed responses (the differentiated visual field assumption), the hemifield comparison model assumes that a coarse comparison (difference computation) of the two visual hemifields drives initial orienting. Evidence from adult models of visual attention involving costs associated with switching attention across the vertical meridian, visual neglect and a transient component to focal visual attention bears some similarity to the proposed hemifield comparison model with infants. Data on the development of visual hemifield asymmetries are also compatible with this model. Additional studies will be necessary to test the quantitative predictions from this alternative model of attentional capture during early postnatal life.
Footnotes
1 See Sheliga, Craighero, Riggio & Rizzolatti (1997) for similarities between spatial attention and directional response systems.
2 Especially at the youngest ages, the percentage of correct judgments with eight bars was sometimes below chance (50% correct). This was considered measurement error from binomial sampling, and the percentage of correct judgments was set to 50% to estimate μ_m. This yields an estimate of 0 for this parameter and a predicted percentage of correct judgments of 50% with 28 bars.
3 Equation 1b can be used to generate predictions assuming that the effective number of bars on each side of the display is approximately half of the actual number of bars. For example, instead of 13 static bars on the target side and 27 total static bars, one would use 6.5 static bars on the target side (numerator) and 13.5 total static bars (denominator). The half of a bar could be handled conceptually (although technically not exactly) by assuming that on half the trials there were six effective bars and on half the trials there were seven effective static bars on the target side. Instead of comparing the predictions of Equation 1b for 8 versus 28 bars, they can be compared for 4 versus 14 bars. When this is done, one of the properties of the model is that the same ratio produces larger differences in the percentage of correct judgments when the number of bars is small. The intuition is that as the number of bars grows, the predicted percentage of correct judgments approaches an asymptote of 50% because the contribution of the moving target, as it appears in both the numerator and in the denominator, gets diluted by the additional static bars. Although the same 8:28 ratio can be realized in many ways, as the number of bars grows, the floor on the percentage of correct judgments at 50% constrains the predicted difference between two conditions that differ in this ratio.
References
Aslin, R.N., & Salapatek, P. (1975). Saccadic localization of peripheral targets by the very young human infant. Perception & Psychophysics, 17, 293-302.
Atkinson, J., Hood, B., Wattam-Bell, J., & Braddick, O.J. (1992). Changes in infants' ability to switch visual attention in the first three months of life. Perception, 21, 643-653.
Banks, M.S., & Ginsburg, A.P. (1985). Infant visual preferences: A review and new theoretical treatment. In: Reese, H.W. (Ed.), Advances in Child Development and Behavior. New York: Academic Press.
Bronson, G. (1994). Infants' transitions toward adult-like scanning. Child Development, 65, 1243-1261.
Casey, B.J., & Richards, J.E. (1988). Sustained visual attention in young infants measured with an adapted version of the visual preference paradigm. Child Development, 59, 1514-1521.
Catherwood, D., Skoien, P., & Holt, C. (1996). Colour pop-out in infant response to visual arrays. British Journal of Developmental Psychology, 14, 315-326.
Cohen, L. (1972). Attention-getting and attention-holding processes of infant visual preferences. Child Development, 43, 869-879.
Dannemiller, J.L. (1998). A competition model of exogenous orienting in 3.5-month-old infants. Journal of Experimental Child Psychology, 68, 169-201.
Dannemiller, J.L. (in press). Relative color contrast drives competition in early exogenous orienting. Infancy.
Dannemiller, J., & Freedland, R. (1989). The detection of slow stimulus movement in 2- to 5-month olds. Journal of Experimental Child Psychology, 47, 337-355.
Dannemiller, J.L., & Nagata, Y. (1995). The robustness of infants' detection of visual motion. Infant Behavior & Development, 18, 371-389.
Dannemiller, J.L., & Stephens, B. (under review). Contrast polarity and moving target detection in young human infants. Journal of Vision.
Dobson, V., & Teller, D.Y. (1978). Visual acuity in human infants: a review and comparison of behavioral and electrophysiological studies. Vision Research, 18, 1469-1483.
Fantz, R.L. (1958). Pattern vision in young infants. Psychological Record, 8, 43-47.
Fogel, A., Karns, J., & Kawai, M. (1990). Lateral asymmetry in attention for three-month-old human infants during face-to-face interaction with mother. Developmental Psychobiology, 23, 1-14.
Folk, C.L., & Remington, R. (1998). Selectivity in distraction by irrelevant featural singletons: Evidence for two forms of attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 24, 847-858.
Foley, J.M., & Schwarz, W. (1998). Spatial attention: effect of position uncertainty and number of distractor patterns on the threshold-versus-contrast function for contrast discrimination. Journal of the Optical Society of America Part A, Optics and Image Science, 15, 1036-1047.
Freeseman, L.J., Colombo, J., & Coldren, J.T. (1993). Individual Differences in Infant Visual Attention - 4-Month-Olds' Discrimination and Generalization of Global and Local Stimulus Properties. Child Development, 64, 1191-1203.
Friedrich, F.J., Egly, R., Rafal, R.D., & Beck, D. (1998). Spatial attention deficits in humans - a comparison of superior parietal and temporal-parietal junction lesions. Neuropsychology, 12, 193-207.
Gayl, I.E., Roberts, J.O., & Werner, J.S. (1983). Linear systems analysis of infant visual pattern preferences. Journal of Experimental Child Psychology, 35, 30-45.
Green, D.M., & Swets, J.A. (1966). Signal Detection Theory and Psychophysics. New York: Wiley.
Gwiazda, J., Bauer, J., Thorn, F., & Held, R. (1997). Development of spatial contrast sensitivity from infancy to adulthood - psychophysical data. Optometry and Vision Science, 74, 785-789.
Hood, B.M. (1993). Inhibition of Return Produced by Covert Shifts of visual Attention in 6-Month-Old Infants. Infant Behavior and Development, 16, 245-254.
Hood, B.M., & Atkinson, J. (1993). Disengaging visual attention in the infant and adult. Infant Behavior and Development, 16, 405-422.
Johnson, M.H., & Tucker, L.A. (1996). The development and temporal dynamics of spatial orienting in infants. Journal of Experimental Child Psychology, 63, 171-188.
Karmel, B.Z. (1969). The effect of age, complexity, and amount of contour on pattern preferences in human infants. Journal of Experimental Child Psychology, 7, 339-354.
Lasky, R.E., & Spiro, D. (1980). The processing of tachistoscopically presented visual stimuli by five-month-old infants. Child Development, 51, 1292-1294.
Lewis, T.L., & Maurer, D. (1992). The development of the temporal and nasal visual fields during infancy. Vision Research, 32, 903-911.
Liegeois, F., & De Schonen, S. (1997). Simultaneous attention in the two visual hemifields and interhemispheric integration: A developmental study on 20- to 26-month-old infants. Neuropsychologia, 35, 381-385.
Lueck, C.J., Zeki, S., Friston, K.J., Deiber, M.P., Cope, P., Cunningham, V.J., Lammertsma, A.A., Kennard, C., & Frackowiak, R.S.J. (1989). The colour centre in the cerebral cortex of man. Nature, 340, 386-389.
Matsuzawa, M., & Shimojo, S. (1997). Infants' fast saccades in the gap paradigm and development of visual attention. Infant Behavior and Development, 20, 449-455.
Mattingley, J.B., Pisella, L., Rossetti, Y., Rode, G., Tiliket, C., Boisson, D., & Vighetto, A. (2000). Visual extinction in oculocentric coordinates: a selective bias in dividing attention between hemifields. Neurocase, 6, 465-475.
McLeod, P., Driver, J., & Crisp, J. (1988). Visual search for a conjunction of movement and form is parallel. Nature, 332, 154-155.
Milewski, A.E. (1976). Infants' discrimination of internal and external pattern elements. Journal of Experimental Child Psychology, 22, 229-246.
Nakayama, K., & Mackeben, M. (1989). Sustained and transient components of focal visual attention. Vision Research, 29, 1631-1647.
Nothdurft, H.C. (2000). Salience from feature contrast: additivity across dimensions. Vision Research, 40, 1183-1201.
Palmer, J., Ames, C.T., & Lindsey, D.T. (1993). Measuring the effect of attention on simple visual search. Journal of Experimental Psychology: Human Perception and Performance, 19, 108-130.
Palmer, J., Verghese, P., & Pavel, M. (2000). The psychophysics of visual search. Vision Research, 40, 1227-1268.
Quinn, P.C., & Bhatt, R.S. (1998). Visual pop-out in young infants: Convergent evidence and an extension. Infant Behavior and Development, 21, 273-288.
Rafal, R., Henik, A., & Smith, J. (1991). Extrageniculate contributions to reflex visual orienting in normal humans: A temporal hemifield advantage. Journal of Cognitive Neuroscience, 3, 322-328.
Rizzolatti, G., Riggio, L., Dascola, I., & Umilta, C. (1987). Reorienting attention across the horizontal and vertical meridians: evidence in favor of a premotor theory of attention. Neuropsychologia, 25, 31-40.
Roessler, J., & Dannemiller, J. (1997). Changes in infants' sensitivity to slow displacements over the first 6 months. Vision Research, 37, 417-423.
Ross, S., & Dannemiller, J.L. (1999). Color contrast, luminance contrast and competition within exogenous orienting in 3.5-month-old infants. Infant Behavior and Development, 22, 383-404.
Sheliga, B.M., Craighero, L., Riggio, L., & Rizzolatti, G. (1997). Effects of spatial attention on directional manual and ocular responses. Experimental Brain Research, 114, 339-351.
Teller, D.Y. (1979). The forced-choice preferential looking procedure: A psychophysical technique for use with human infants. Infant Behavior and Development, 2, 135-153.
Theeuwes, J. (1991). Exogenous and endogenous control of visual attention: The effect of visual onsets and offsets. Perception & Psychophysics, 49, 83-90.
Theeuwes, J., Kramer, A.F., & Atchley, P. (1999). Attentional effects on preattentive vision: spatial precues affect the detection of simple features. Journal of Experimental Psychology: Human Perception and Performance, 25, 341-347.
Tyler, C.W., & Torres, J. (1972). Frequency response characteristics for sinusoidal movement in the fovea and periphery. Perception and Psychophysics, 12, 232-236.
Wright, M.J., & Johnston, A. (1985). The relationship of displacement thresholds for oscillating gratings to cortical magnification, spatiotemporal frequency and contrast. Vision Research, 25, 187-193.
Yantis, S., & Egeth, H.E. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Yellott, J.I. (1977). The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15, 109-144.
Author Note
James L. Dannemiller is in the Department of Psychology and the Waisman Center at the University of Wisconsin - Madison. This research was supported by NICHD R01 HD32927. I thank Mari Riess Jones for helpful comments on an earlier version of this manuscript. I thank Jacqueline Roessler for observing the infants, Manya Qadir for scheduling the infants, and Daniel Replogle for the computer programming. Correspondence concerning this article should be addressed to James L. Dannemiller, Waisman Center, University of Wisconsin - Madison, 1500 Highland Avenue, Madison, WI 53705-2280. Electronic mail may be sent via the Internet to
[email protected].
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
12
Attentional Capture, Attentional Control and Aging
Arthur F. Kramer, Charles T. Scialfa, Matthew S. Peterson and David E. Irwin
The goal of the present chapter is to address the issue of whether there are age differences in attentional capture and if so to explore the nature of such age-related changes in attention. However, before addressing this specific question we provide a brief description of the theoretical context within the realm of cognition and aging in which the issue of attentional capture might be addressed. More specifically, we discuss current models of cognition and aging, such as general slowing models, inhibitory deficit models, and executive control models, and describe how these models treat the issue of attentional control and attentional capture. We then provide a brief discussion of the major issues of concern to the field of attentional capture and control. Next, we review the literature of relevance to age-related changes in attentional capture and control in a number of different domains including spatial cueing, visual search, focused attention, and overt attention. Finally, we conclude with suggestions for future research on aging and attentional capture.
Cognitive Aging: Theory and Research
Perhaps the most robust observation in the literature on cognitive aging over the past several decades has been that performance declines on a multitude of laboratory and real-world tasks from young adulthood to old age. Indeed, decreases in performance during aging have been observed from the simplest laboratory tasks such as simple, choice and disjunctive reaction time to complex real-world tasks such as driving, flying and the operation of automated teller machines (Birren & Schroots, 1996; Mead & Fisk, 1998; Salthouse, 1996; Tsang, 1996). However, although observations of age-related declines in cognition abound, the psychological mechanisms which underlie these observations continue to be studied and debated. One line of research and theorizing has focused on processing speed as a central explanatory construct for age-related declines in cognitive function. Early processing speed models suggested that slowing was the result of a general decline in function as a consequence of increased noise in the central nervous system attributable to neuronal and glial degeneration (Birren, 1965). More recent slowing models have suggested that multiple independent factors are responsible for slowing under different conditions and in different tasks (Cerella, 1985; Lawrence, Myerson & Hale, 1998; Salthouse, 1996). For example, distinctions have been made between
verbal and visuospatial slowing with visuospatial processing showing more substantial age-related declines (Jenkins et al., 2000). Indeed, a large body of research now suggests that a substantial amount of age-related variance in complex cognitive tasks can be accounted for by a relatively small set of slowing factors. Another line of inquiry concerning cognitive decline during aging can be traced to a seminal paper by Hasher and Zacks (1988). In that paper, the authors provided a detailed and critical review of age-related differences in the inhibition of representations of and actions towards environmental events as well as information stored in long-term and working memory. Hasher and Zacks (1988; see also Zacks & Hasher, 1997) suggested that age-related processing deficits in a variety of cognitive skills can be accounted for by a decrease in the efficiency of inhibitory processing. More specifically, inefficient inhibition could result in failures of selective attention which may, in turn, result in the intrusion of task-irrelevant information into working memory. The consequences of these inhibitory failures would include both increased processing time and reductions in recognition and recall of relevant information. A thorough review of the evidence for and against this general inhibitory hypothesis is beyond the scope of this paper (see Burke, 1997; McDowd, 1997; Zacks & Hasher, 1997 for reviews of this literature). However, within the context of visual attention it is becoming increasingly apparent that specific rather than general inhibitory deficits are observed during the course of normal aging. Consider a classic interference paradigm, the Stroop task. In this task subjects are to verbalize the color in which a word is printed while ignoring the semantic content of the word. Older adults take substantially longer to verbalize colors which are inconsistent with the semantics of the word (e.g.
the word blue printed in red ink - Houx, Jolles & Vreeling, 1993; Kwong See & Ryan, 1995; Rogers & Fisk, 1991; Spieler, Balota & Faust, 1996; but see Salthouse, 1996; Vakil, Manovich, Ramati & Blachstein, 1996). Thus, it would appear that older adults have more difficulty suppressing word meaning during color naming. The negative priming paradigm has also served as a popular testbed for this proposed age-related general inhibitory decline. In the negative priming task, subjects are asked to respond to targets and ignore simultaneously presented distractor stimuli. The critical comparison is between trials in which a distractor from trial n-1 becomes a target on trial n (i.e., the ignored repetition (IR) condition) and trials in which different target and distractor stimuli are presented on trials n and n-1 (i.e., the control condition). In general, longer reaction times (RTs) are obtained in IR conditions than in control conditions, defining the negative priming effect. The negative priming effect is quite robust and has been obtained in a variety of tasks, and for a number of stimulus and response types (Neill & Valdes, 1996). Initial aging and negative priming studies suggested that older adults failed to produce the difference between ignored repetition and control conditions seen in younger adults (Kane et al., 1994; Hasher, Stoltzfus, Zacks & Rypma, 1991; Tipper, 1991). These results were interpreted as indicating a failure of selective inhibition by the older adults. However, more recent studies (Kieley & Hartley, 1997; Kramer
et al., 1994; Sullivan & Faust, 1993; Sullivan, Faust & Balota, 1995) have reported equivalent negative priming effects for young and old adults. Although there is no agreement on why this discrepancy exists, possibilities include the sensitivity of the experimental design to the relatively small negative priming effect (on the order of 10 to 20 ms) and the difficulty of selection in the task (i.e. larger negative priming effects have been reported when selection of the target is difficult; Moore, 1994). The greater variability in response time for older than younger adults may mask the negative priming effects for older adults. It is also conceivable that since the utilization of inhibition in the negative priming task is presumably effortful (Engle, Conway, Tuholski & Shisler, 1995), its use by older adults will only be observed in difficult selection tasks. Finally, the negative priming effect may be subserved by multiple inhibitory mechanisms only some of which are sensitive to aging (May et al., 1995; Zacks & Hasher, 1997). In any event, it is clear from the extant literature that at least some aspects of inhibitory processing are compromised in older adults (Dempster, 1992; Kramer et al., 1994). Whether such processing deficits impact manifestations and mechanisms which support attentional capture (and attentional control) in older adults will be addressed below. Although inhibitory failures might be implicated in lifespan differences in attentional capture, other types of attentional control might also play a role. A broader view of age-related changes in cognition, which subsumes inhibitory control, has been offered in the form of executive control/frontal lobe theories of aging. In his recent critical review of the literature on the neuroanatomy, neurophysiology and neuropsychology of aging, West (1996) concluded that relatively strong evidence exists for the frontal lobe hypothesis of cognitive aging (see also Dempster, 1992; Kramer et al., 1994).
The frontal lobe hypothesis suggests that older adults are disproportionately disadvantaged on tasks that rely heavily on cognitive processes (e.g. executive control processes) that are supported, in large part, by the frontal and prefrontal lobes of the brain. Indeed, there is a good deal of evidence to suggest that morphological and functional changes in brain activity do not occur uniformly during the process of normal aging (Raz, 2000). Researchers have reported substantially larger reductions in gray matter volume in association areas of cortex, and in particular in the prefrontal and frontal regions, than in sensory cortical regions (Coffey et al., 1992; Pfefferbaum et al., 1992; Raz et al., 2000). Studies of functional brain activity employing Positron Emission Tomography (PET) have reported similar trends, with prefrontal regions showing substantially larger decreases in metabolic activity than sensory areas of cortex (Azari et al., 1992; Salmon et al., 1991; Shaw et al., 1984). These data on the structure and function of the aging brain are consistent with numerous reports of large and robust age-related deficits in the performance of tasks that are largely supported by the frontal and prefrontal regions of the cortex, as compared to relatively small age-related deficits on non-frontal lobe tasks (Ardila & Rosselli, 1989; Daigneault et al., 1992; Shimamura & Jurica, 1994). Indeed, many of the tasks subserved, in large part, by the frontal lobes involve processes associated with executive control functions such as the selection, control, and
coordination of computational processes that are responsible for perception and action. For example, large age-related deficits have generally been reported when adults are required to perform two or more tasks at the same time or to rapidly shift emphasis among tasks (Korteling, 1991; Kramer et al., 1999; Mayr & Liebscher, 1996; Rogers et al., 1994). Functional magnetic resonance imaging and positron emission tomography studies have shown enhanced activation of regions of the prefrontal and frontal cortices when two tasks are performed together but not when they are performed separately (Corbetta et al., 1991; D'Esposito et al., 1995). Kliegl and colleagues (Mayr & Kliegl, 1993; Verhaeghen et al., 1997) have also reported that reliably larger age-related performance decrements are observed in tasks which require coordinative operations (i.e. mental arithmetic operations in which a product must be held in working memory as other computations are performed) than for tasks which require sequential operations (i.e. mental arithmetic operations that do not require storing and retrieving products from working memory while carrying out arithmetical operations). Furthermore, such differences were independent of general age-related differences in the speed of performance. In summary, a good deal of behavioral, neuropsychological, and neuroanatomical evidence has accrued in recent years which supports the view that selective aspects of executive control (e.g. inhibitory processing, coordination of multiple skills and tasks) decline with advancing age. Given the role of control processes in attention we might then expect to observe age-related changes in attentional capture, at least under conditions in which top-down factors such as expectations and intentions oppose stimulus-driven factors such as the appearance of new objects or other changes which render some stimuli dimensionally unique (e.g. a flashing marquee on a theatre).
Attentional Control: Interaction of Stimulus-Driven and Goal-Directed Attention
Although the central focus of this chapter is on the phenomenon of attentional capture, it is difficult to discuss this construct without first describing the two highly interacting components of attention which determine whether attentional capture will be realized in any particular context. Concepts like goal-directed (top-down) and stimulus-driven (bottom-up) attention have been discussed for at least the past century (James, 1890) in an effort to describe the interactions that occur in the human information processing system in the service of visual selection. Goal-directed attention refers to an individual's ability to selectively process information in the environment. Central to the definition of goal-directed attention is that this form of attentional control relies on an observer's expectancies about events in the environment, knowledge of and experience with similar environments, and the ability to develop and maintain an attentional set for particular kinds of environmental events. In contrast, stimulus-driven attention entails the control of attention by characteristics of the environment, independently of an observer's intentions, expectancies or experience.
Quite often goal-directed and stimulus-driven aspects of control interact to determine the focus of attention (Egeth & Yantis, 1997). For example, searching your office for a particular book might entail some, however imperfect, knowledge of where you last left the book (i.e. goal-directed attention) along with the fact that the book is unusually large (i.e. stimulus-driven attention). In order to understand how these two forms of control interact to influence the focus of attention, researchers have deemed it important to examine the mechanisms and characteristics of each of these constructs independently of the other. Indeed, the phenomenon of attentional capture, the main focus of this chapter, has been defined as an expression of stimulus-driven attention in the absence of observer expectancies or attentional set or preparation (Pashler, 1988; Theeuwes, 1992; Yantis & Jonides, 1984). That is, capture occurs when a feature of the environment which an observer is not searching for grabs attention. Although it is beyond the scope of this chapter to provide an extensive review of the literature on attentional capture, we briefly mention a few important points in order to provide a context in which to explore changes in attentional capture and control during aging. First, task-irrelevant singletons have been shown to capture attention under some circumstances. For example, Pashler (1988) had subjects search for an obliquely oriented line among "O" distractors, a feature search task. On a subset of trials one of the distractors appeared in a unique color. Although subjects were instructed to ignore these task-irrelevant singletons, their appearance disrupted search performance (see also Theeuwes, 1991). Other studies have failed to find evidence of capture of attention by task-irrelevant singletons. For example, Jonides and Yantis (1988) instructed subjects to search for a predefined target letter among letter distractors.
In each display one letter differed from the other letters in either luminance or color. However, subjects were instructed that this singleton was uncorrelated with the location and the identity of the target; that is the singleton was the target on 1/n trials with n being the total number of distractors in the display. In this situation evidence of attentional capture by the task irrelevant singleton would be provided by an observation of search performance which was fast and independent of the number of distractors in the display when the singleton served as the target. However, performance did not differ for the singleton and non-singleton targets (see also Hillstrom & Yantis, 1994; Todd & Kramer, 1997; Yantis & Egeth, 1999). Several lines of research have helped to uncover the reasons for the discrepancies between studies which have found evidence that task-irrelevant singletons capture attention and those that did not. Bacon and Egeth (1994) argued that the attentional strategies adopted by subjects influence whether a task-irrelevant singleton will capture attention. More specifically, Bacon and Egeth suggested that under certain conditions subjects can adopt a singleton detection search strategy in which they search for the most salient object in the display. In the case of search for a singleton target in the presence of a singleton distractor (e.g. a uniquely shaped target and a uniquely colored distractor) utilization of the singleton detection search strategy will lead to the capture of attention by the distractor, at least on a proportion
of the experimental trials. Indeed, when Bacon and Egeth made it difficult for subjects to use the singleton search strategy, by presenting a non-singleton target (i.e. a target that was not uniquely different along some stimulus dimension from other objects in the display), the task-irrelevant singleton distractor failed to capture attention (see also Yantis & Egeth, 1999). The effectiveness of task-irrelevant singletons to capture attention has also been suggested to be modulated by attentional control settings (Folk, Remington, & Johnston, 1992, 1993; Folk & Remington, 1998; see also Atchley et al., 2000; Gibson & Kelsey, 1998), which determine what aspects of the environment can automatically guide attention. For example, Folk et al. (1992) found that when the target was a uniquely colored item, an onset distractor precue with no predictive qualities failed to capture attention; likewise, when the target was defined as the onset item, a colored precue showed no evidence of attentional capture. Only when the precue matched the target-defining attribute did it show evidence of attentional capture. These results led Folk and colleagues to propose the contingent involuntary orienting hypothesis which suggests that stimulus-driven shifts of attention are contingent on attentional control settings, expectancies and experience. However, although high level cognitive processes determine the nature of the attentional set, attentional capture by environmental events is purely stimulus-driven without the opportunity for further control (Folk et al., 1993). The results of a number of studies suggest that a subset of stimulus characteristics, specifically abrupt onsets and the appearance of new objects in the visual field, may engender attentional capture independently of at least some forms of top-down control and attentional control settings. For example, Yantis and Jonides (1984) had subjects search for a predefined target letter among other letters in a display.
In each display all but one of the letters were constructed by removing segments of figure 8 premasks. These letters were referred to as non-onset stimuli. In addition, one new letter was added to the display concurrently with the removal of segments of the figure 8 premasks. This new letter was referred to as an onset. Although in these experiments the onset letter was no more likely to be the target than any of the other letters (i.e. the onset letter was the target on 1/n of the trials, with n being equal to the total number of letters in a display), when the onset letter was the target search performance was fast and independent of the number of letters in the display. These data were interpreted as evidence that the onset was always attended first, that is, that abrupt onsets capture attention. More recent research (Yantis & Hillstrom, 1994) has suggested that it is not the abrupt onset of the stimulus per se which captures attention but instead the fact that a new object has appeared in the visual field. Similar capture effects for abrupt onsets or new objects have also been observed even when the onset never serves as a target (Remington et al., 1992) and for saccadic eye movements as well as covert attention (Irwin et al., 2000; Kramer et al., 1999, 2000; Theeuwes et al., 1998). However, even capture of attention by onsets or new objects can be moderated by other factors. The effectiveness of onsets to capture attention can be reduced by focusing attention elsewhere in the visual field (Theeuwes, 1991; Theeuwes et al., 1998; Yantis & Jonides, 1990),
with extensive practice in searching for a target in the presence of an onset distractor singleton (Warner et al., 1990), and when multiple offsets occur in a display (Martin-Emerson & Kramer, 1997). In summary, a great deal of evidence now exists for the independence of stimulus-driven and goal-directed components of attention as well as their interaction in guiding attention in the environment. It also appears clear that while stimulus-driven attention can have a substantial influence on the prioritization of stimuli for visual selection, a variety of goal-directed and strategic factors often interact with stimulus-driven aspects of control in influencing the prioritization hierarchy.
Aging and Attentional Capture
In subsequent sections we examine the influence of age, from young adulthood to old age, on attentional capture and control. The following sections are organized around specific experimental and theoretical issues in attention as well as the paradigms which have been employed to examine substantive theoretical issues concerning aging and attention.
Attentional capture and spatial cueing
The study of spatial cueing has long served an important role in the examination of top-down and bottom-up factors in the control of attention. For example, Jonides (1981) conducted a series of studies in which he examined the influence of centrally located symbolic cues (e.g. an arrow) and peripheral location markers (e.g. bar markers) on visual spatial attention in tasks which required finding a target among distractors. Important manipulations in these studies included the validity with which the cue predicted the location of the target and the instructions to subjects to either attend to or ignore the cues. When subjects were instructed to attend to the cue, substantial validity effects (i.e. faster and more accurate performance when the cue predicted the location of the target than when it did not) were obtained for both the peripheral and central cues. However, when subjects were instructed to ignore the cues, a validity effect was only obtained for the peripheral cue (see also Remington et al., 1992). These data were interpreted as evidence that peripheral onset cues captured attention automatically while central symbolic cues required voluntary effort to direct attention (i.e. were employed in a strategic fashion). More recently, Müller and Rabbitt (1989; see also Cheal & Lyon, 1991) observed that the temporal dynamics differed for central and peripheral cues, with cueing effects reaching asymptote earlier and decreasing more rapidly for peripheral than for central cues. The results of studies such as these prompted researchers to argue for two different attentional mechanisms. The exogenous mechanism presumably responds to peripheral onsets, captures attention automatically and engenders rapid but transitory cueing effects. On the other hand, the endogenous mechanism appears to
respond to symbolic cues, require voluntary effort and display a slower but more sustained time course. Studies which have examined age-related differences in peripheral and central cueing effects have, until recently, produced mixed results. For example, studies of cueing effects with centrally located symbolic cues have observed similar cueing effects for older and younger adults (Hartley et al., 1990), larger cueing effects for older than for younger adults (Greenwood & Parasuraman, 1994; Nissen & Corkin, 1985), and smaller cueing effects for older adults (Folk & Hoyer, 1992). Peripheral, abrupt onset cues have been observed to produce similar cueing effects for young and old adults, at least up to age 75 (Folk & Hoyer, 1992; Greenwood & Parasuraman, 1994; Hartley et al., 1990) as well as smaller cueing effects for older adults (Madden, 1990). Discrepancies in the results of these studies could be due to a number of factors including: the validity of the cues, the age and general health of the populations, the nature of the tasks, and the stimulus onset asynchronies (SOAs) between the cues and the imperative displays. A number of recent studies have addressed some of these issues during the examination of potential age differences in spatial cueing effects. Lincourt, Folk and Hoyer (1997) examined both the time course and the magnitude of cueing effects with central and peripheral cues. The central arrow cues indicated the location of the target with 75% validity while the peripheral cue possessed a validity of 25% (chance given four possible target locations). The time course and magnitude of the central cue effects were similar for young and old adults. Peripheral cueing effects were substantially larger for the old than for the young adults. However, there is one important caveat. The peripheral cueing effects for the young adults were unusually small and not significantly different from zero. 
Nevertheless, these results are potentially important in that they showed larger peripheral cueing effects for the older adults under conditions in which endogenous or top-down control would not likely be effective (i.e. since the peripheral cues predicted neither the location nor the identity of the target). Juola, Koshino, Warner, McMickell and Peterson (2000) presented both central and peripheral cues on each trial. Subjects were instructed that central arrow cues would always predict the target location with high reliability (i.e. 75%). One group of young and older adults was instructed that the peripheral cues would also be highly reliable (i.e. 75%) while a second group of subjects was instructed to ignore the peripheral cues since they would not reliably predict the location of the subsequent target (i.e. validity was 25%). SOAs were also manipulated such that the peripheral cue either appeared simultaneously with the central cue or 157 ms later. The old and young adults showed similar cueing effects when both the peripheral and central cues were valid or invalid. However, while young adults were able to ignore the unreliable peripheral onset cue, especially when it appeared after the central cue, the older adults were unable to ignore the unreliable peripheral cue even when it followed the central cue. Thus, these results, like those obtained by Lincourt et al. (1997), suggest that older adults have a diminished ability to inhibit attentional capture. It is conceivable, however, that age equivalence might be observed if
subjects are given a longer period of time to prefocus attention on the basis of a central cue before the appearance of an abrupt onset peripheral cue. Pratt and Bellomo (1999) employed Folk et al.'s (1992) precueing paradigm to examine potential age-related differences in response to color and onset cues. Subjects were presented with either a spatially unreliable color or onset cue and asked, on different trials, to report the identity of a target which appeared as either an onset or in a unique color. For both young and old adults, a spatially unpredictive color cue captured attention when it preceded a color target but not when it preceded an onset target. Furthermore, the magnitude of the cueing effect was similar for the two age groups. On the other hand, older adults showed a larger capture effect than younger adults when an onset cue preceded an onset target. Moreover, the age difference remained even with an analysis of proportional validity effects, suggesting that the larger capture effects for the older adults cannot be accounted for by general slowing. In summary, the bulk of the data collected thus far suggests that older adults have more difficulty inhibiting the capture of covert attention by unreliable peripheral onset cues than do young adults. However, additional research is needed to determine if such deficiencies can be overcome by additional preparation time (i.e. a longer interval to shift attention to a different location in the visual field prior to the appearance of an onset cue) or training.
Attentional capture and overt attention
In the majority of the research that we discuss in the present chapter attention is assumed to be covert in nature. That is, researchers are interested in attentional capture which occurs independently of the position of the eyes. Indeed, in many of these studies displays are presented briefly to avoid eye movements or subjects are asked to maintain fixation as they perform a task. However, outside the laboratory door shifts of attention often entail shifts of the eyes in order to foveate areas of interest in the visual field. Thus, it appears reasonable to ask whether age related differences are observed with overt attention (i.e. eye movements or saccades) as they are with covert attention. Prior to discussing the research which inquires as to age-related differences in eye movement control and capture we believe that it is important to briefly discuss the relationship between covert and overt (eye movements) attention. It is clearly the case that we can shift attention independently of the eyes (Eriksen & Yeh, 1985; Posner, 1980). However, research which has examined the relationship between covert attention and overt attention (i.e. via saccades) in free viewing situations has, in general, found a close coupling between saccade programming and covert attention. For example, Deubel and Schneider (1996) found that letter identification performance was best when the letter to be identified was also the target of a saccade. Similarly, Hoffman and Subramaniam (1995) had subjects detect a visual target just prior to making a saccade and found that detection performance was best when the location of the target and the subsequent saccade
were the same (see also Henderson & Hollingworth, 1999; Kowler et al., 1995; Sheliga et al., 1997). Indeed, we have recently obtained evidence that attention precedes not only voluntary but also involuntary or reflexive saccades (Irwin, Brockmole & Kramer, submitted). Finally, a number of recent Positron Emission Tomography (PET) and functional Magnetic Resonance Imaging (fMRI) studies have reported highly overlapping activation patterns in the brain during covert and overt attentional tasks (Corbetta, 1998; Nobre et al., 2000). Thus, in summary, it would appear that attention often precedes saccades to locations in the visual field. A number of aging studies have employed an eye movement paradigm that would appear to be ideally suited to the study of attentional control and capture. The antisaccade task, first introduced by Hallett in 1978, involves the presentation of an abrupt onset stimulus to the right or left of fixation in an otherwise empty visual field. The subject's task is to detect the onset using peripheral vision and rapidly look in the opposite direction. Performance on the antisaccade task, which clearly requires that subjects suppress a reflexive eye movement towards the onset stimulus while programming and executing a goal-directed saccade in the opposite direction, is dramatically affected by lesions in the frontal and prefrontal regions of the brain that are involved in the programming of goal-directed saccades (Guitton et al., 1985; Pierrot-Deseilligny et al., 1991; Rivaud et al., 1994). Frontal lobe patients have great difficulty inhibiting reflexive saccades to the onset stimulus, typically making saccades to the onset on 70 to 80% of the trials (as compared with approximately 10% misfixations by non-patients). 
Given the often reported changes in frontal lobe morphology and decreases in metabolism during the course of normal aging (Azari et al., 1992; Coffey et al., 1992; Raz, 2000; West, 1996), the antisaccade task would appear to provide an excellent testbed for the examination of overt shifts of attention, particularly with regard to the study of resistance to capture by the abrupt onset stimulus. Interestingly, the first study of potential age-related differences in performance on the antisaccade task was published only recently. Olincy et al. (1997) had young and old adults perform both prosaccade (i.e. move your eyes to the peripheral onset stimulus) and antisaccade tasks. Three important findings were obtained in their study. First, the proportion of saccades to the onset stimulus (erroneous prosaccades in the antisaccade task) increased linearly from approximately 10% for 20 year olds to 50% for 80 year olds. Second, the latency on those trials on which the eyes did move in the opposite direction of the stimulus (i.e. correct antisaccade trials) increased substantially with aging. Third, the latency of eye movements for correct trials on the antisaccade task was disproportionately increased for older adults relative to the latency on trials in which subjects were instructed to move their eyes to the onset stimulus. On the basis of these results the authors concluded that the inhibitory processes necessary for the suppression of eye movements to task-irrelevant events are compromised during the course of normal aging. However, other researchers have failed to replicate Olincy et al.'s (1997) age-related differences in antisaccade performance. For example, in large lifespan studies of performance on a variety of eye movement tasks Fischer et al. (1997) and
Munoz et al. (1998) found that older adults made similar numbers of prosaccade errors (i.e. movement of the eyes toward the onset) on the antisaccade task as younger adults. These researchers also failed to find the disproportionate increase in saccadic latency on correct antisaccade trials for the older adults that was observed by Olincy et al. (1997). One potential reason for the discrepancy in effects among these studies might be the amount of practice received by the subjects on the eye movement tasks. The subjects in the Olincy et al. (1997) study received substantially less practice than subjects in Fischer et al.'s and Munoz et al.'s studies. Thus, older adults might be capable of improving their performance on the antisaccade task, that is successfully suppressing an eye movement to the abrupt onset, with a modest amount of practice. Butler et al. (1999) required subjects to identify the direction of an arrow at the position to which the eyes were to move while also performing the antisaccade task. Clearly, this dual-task version of the antisaccade task requires a greater degree of control and coordination than the traditional antisaccade task and therefore might be expected to reinstate the age related differences in performance observed by Olincy et al. (1997) with unpracticed subjects. Indeed, older subjects made a greater number of prosaccade errors in the antisaccade task but the increase in saccadic latency from the pro- to the antisaccade task was similar for the young and the old adults. The studies reviewed thus far suggest that older adults have difficulty suppressing an inappropriate eye movement in situations in which they are relatively unfamiliar with the task and procedures (Olincy et al., 1997) and when attempting to coordinate their eye movements with an additional task (Butler et al., 1999). Interestingly, Roberts et al. 
(1994; see also Walker et al., 1998) found that young adults also make substantially more prosaccade errors in the antisaccade task when required to perform the antisaccade concurrently with a working memory task. Thus, working memory, which is often reported to decline with age (West, 1996), would appear to be necessary to ensure that the eyes are directed away from the onset, possibly by maintaining this goal in an active state. In an attempt to further examine age-related changes in eye movement control Nieuwenhuis et al. (2000) manipulated the SOA between the direction cue and the peripheral target (similar to the peripheral target to be identified in Butler et al., 1999) and found that at short SOAs the older adults showed similar prosaccade errors and saccadic latency in the antisaccade task to younger adults. However, when the SOA between the direction cue and the peripheral target was lengthened, older adults' prosaccade errors and saccadic latency increased to a much greater extent than they did for younger adults. The authors interpreted these results as evidence that older adults are able to capitalize on the exogenous nature of the peripheral target at the short cue-target SOAs. That is, older (and younger) adults' eyes were captured by the peripheral abrupt onset target which, in turn, served to diminish age-related differences in eye movement errors and saccadic reaction time. On the other hand, when this was no longer possible, that is when eye movements had to be programmed prior to the presentation of the peripheral targets (i.e., at long
cue-target SOAs), older adults were impaired at maintaining and implementing the goal of the task (i.e., to look in the opposite direction of the cue). In other words, older adults appear to demonstrate goal neglect, that is, the inability to maintain the cue-action representation at a sufficient activation level to prevent inappropriate eye movements (Duncan, 1995; De Jong et al., 1999). Indeed, this distinction fits well with our increasing understanding of the components of the oculomotor system which support voluntary (goal-directed) and reflexive (stimulus-driven) saccades and their relative sensitivity to aging (LaBerge, 1995; Pierrot-Deseilligny et al., 1995; Schall, 1995). Goal-directed saccades depend on the functional integrity of a number of frontal and prefrontal areas including the frontal eye fields, supplementary eye fields, and dorsolateral prefrontal cortex. On the other hand, reflexive saccades appear to be generated in a parietal-midbrain (i.e. superior colliculus) circuit. Furthermore, frontal regions have been found to be more sensitive to aging than midbrain areas such as the superior colliculus (Raz,
2000). The results of a recent series of studies conducted in our laboratory are consistent with the notion that age-related differences in oculomotor capture will be observed to the extent that subjects must exert voluntary control over their eye movements in the presence of stimulus-driven influences such as task-irrelevant abrupt onsets. Old and young adults were presented with six gray circles with small figure-8 pre-masks inside. After 1000 ms the color of five of the circles changed to red and segments of the figure-8 pre-masks were removed to reveal letters. Subjects were instructed to move their eyes from the center of the display to the color singleton (i.e. the uniquely colored item) as soon as they detected the color change and identify the letter inside of the gray circle. On a subset of trials a new red circle (i.e. an abrupt onset) appeared simultaneously with the color change which cued the location of the color singleton target. The abrupt onset never served as the target nor did it predict the location of the target (as in the antisaccade task). This task has been referred to as the oculomotor capture paradigm (Theeuwes et al., 1998). Under the conditions described above older and younger adults misdirected their eyes to the task-irrelevant onset on essentially the same proportion of trials (approximately 20 to 40%) across three separate experiments (Kramer et al., 1999, 2000). Interestingly, the great majority of the young and the old adults were unaware of the occurrence of the task-irrelevant onset, and those few subjects who noticed its appearance, often on a small proportion of the trials on which it actually occurred, said that they never looked at it. 
However, when we made the subjects aware of the task-irrelevant onset, either by making it brighter than the other stimuli or instructing subjects as to its occurrence, and asked them to make sure that they did not look at it, older adults had a much more difficult time complying with instructions than did younger adults (Kramer et al., 2000). That is, older adults had more difficulty than young adults suppressing inappropriate eye movements when asked to exert voluntary control but not when they were unaware of the onset distractor. This may have occurred as a result of the engagement of working memory to retain multiple goals (e.g. move your eyes to the color target while
ignoring the new distractor object) when subjects were aware of the task-irrelevant distractor (De Jong, 2001; Roberts et al., 1994). Given that age is associated with diminished working memory capacity (Salthouse, 1994; Waters & Caplan, 2001), older adults would have greater difficulty maintaining multiple goals, thereby becoming more susceptible to stimulus-driven capture by the onset distractor. Of course, this hypothesis should be tested in future research. In summary, the literature on age differences in the capture of eye movements suggests that while reflexive control of saccades is relatively age-invariant, voluntary control of eye movements in the presence of task-irrelevant prepotent stimuli is subject to age-related decline. However, it is also apparent from the literature that voluntary control of saccades is subject to substantial individual differences, particularly among older adults (Fischer et al., 1997; Munoz et al., 1998; Olincy et al., 1997). Examination of how these individual differences relate to differences in the performance of tasks which tap different control processes (Kramer et al., 1994) might be useful in explicating the factors which influence age-related decline in the voluntary control of behavior and cognition.
Attentional capture and visual search
The view that we are explicating in this chapter is that attentional capture can be considered a manifestation of attentional control. That is, attentional control in many circumstances mediates efficient, preferential selection of stimulus qualities that are consistent with current goals and expectations. However, this same selection can detract from performance if it is maintained when no longer useful, as when either stimuli or goals change. From this perspective, there are phenomena in visual search that fall within the domain of attentional control and capture. As such, although research that directly and explicitly examines aging and attentional capture in search is just beginning to appear, other findings can be brought to bear on the issue. Since Rabbitt's (1965) demonstration that older adults had difficulty in card-sorting, there have been many investigations of age-related differences in visual search. In broad terms, this literature concludes that age effects are trivial in feature and conjunction search when target-distractor similarity is low (Humphrey & Kramer, 1997; Kramer, Martin-Emerson, Larish & Andersen, 1996; Plude & Doussard-Roosevelt, 1989; Scialfa, Esau & Joffe, 1998; Scialfa & Joffe, 1997; Scialfa, Thomas & Joffe, 1994). In contrast age differences can be substantial when target-distractor similarity is increased, as in difficult feature or conjunction search (Humphrey & Kramer, 1997; Plude & Doussard-Roosevelt, 1989; Scialfa et al., 1998; Scialfa & Joffe, 1997). In addition, several micro-longitudinal studies of the development of visual search skill indicate that search proficiency increases at about the same rate for older and younger adults (Anandam & Scialfa, 1997; Kramer et al., 1996; Madden & Nebes, 1980; Salthouse & Somberg, 1982; Scialfa et al., 2000; Ho & Scialfa, submitted).
Efficient search is subserved by the preferential processing of targets relative to distractors, and so these findings are relevant to the topic of age differences in attentional control. However, because efficient selection in search is intentional, it may not fall under the heading of attentional capture. On the other hand, there are times when the objects that are search targets for sustained periods suddenly become distractors and vice versa. In these "reversal" conditions, there are costs associated with the allocation of attention to the old target items and yet there is often an involuntary continuation of this disruptive selection (Shiffrin & Dumais, 1991). The linkage between disruption at reversal and attentional capture is made explicit in strength-theoretic models of skill (Schneider, 1985). Under this view, disruption occurs because the attention-attraction strength of targets relative to distractors is so great that the targets draw attention involuntarily. Anandam and Scialfa (1999) examined age differences in feature search for an oriented "Y" (taken from Enns, 1989) embedded in "Y"s with a 180 degree orientation difference. After approximately 2800 consistent-mapping (CM) trials, observers underwent a full reversal and searched for the former distractor among 1, 3, or 7 former targets. Disruption was substantial, increased with display size, particularly on target-absent trials, and brought search performance back to pre-training levels. That this disruption was the result of attentional capture is supported by the observation that there was no evidence of disruption after an equivalent amount of training on a varied-mapping (VM) task where the target and distractor changed roles from training block to training block. Under these conditions, attention-attraction strength would not accrue to any item and so no item would be expected to capture attention. 
Two age effects are worthy of note: Compared to their younger counterparts, the elderly showed less evidence of disruption at reversal. By itself, this observation would suggest that the elderly had not developed an automatic response to the target and so it did not evoke capture at reversal. This conclusion must be qualified, however, because when the analysis was restricted to targets occurring near the central regions of the display, younger and older people exhibited the same, substantial disruption. Thus, older adults demonstrated the same amount of attentional capture as the young adults, but also exhibited a reduced useful field of view (Ball, Beard, Roenker, Miller, & Griggs, 1988; Scialfa et al., 1994) that places limits on the spatial extent over which capture operates. In a more recent study, Scialfa, Jenkins, Hamaluk and Skaloud (2000) compared younger and older adults in the development of automaticity in conjunction search. Observers were trained to look for targets defined by their orientation and contrast polarity (e.g., a black, right target in white, right and black, left distractors). In Experiment 2, both RTs and eye movements were used to index the change in performance with CM practice and reversal. At reversal, the typical disruption was observed in that RTs increased, particularly for larger displays. The same effect was observed in the number of fixations prior to a correct response, indicating that capture has an oculomotor component in search (see also Irwin et al., 2000; Kramer et al., 1999, 2000; Theeuwes et al., 1998). As well, closer
examination of the objects on which fixations landed indicated that at reversal, there was a tendency to fixate the former target. Importantly, older adults showed these effects to the same degree as the young. Again, there is no evidence for an age difference in attentional capture. Fisk, Rogers and their colleagues have examined age differences in the development of automaticity in a variety of tasks. Included in this program is work on visual search, memory search and also, semantic category search, in which observers search for exemplars of target categories that are embedded in exemplars of distractor categories. The above-mentioned work on fairly traditional visual search tasks prompts the expectation of minimal age differences in performance. In fact, quite a different picture emerges. Fisk et al. (1990) compared younger and older adults in digit-letter search under CM and VM training, which alternated across trial blocks. Memory set size varied between one and four items and display size was fixed at two items. After CM training, memory set size slopes approached zero for younger adults but remained at 16 ms/item for the older observers. At transfer, only the young group showed significant disruption, presumably because attentional capture was operating only in that group. In Rogers and Fisk (1991), older and younger people were compared in consistent-mapping, varied-mapping, and attenuated priority learning where associative learning could occur but priority learning was minimized. Transfer followed all training conditions with letter and semantic category search. Age differences in consistent-mapping training were greater than in attenuated priority training, suggesting that older adults had difficulty with the priority learning that underlies the development of an automatic attention response to the target. These results were corroborated at transfer, because the young exhibited more disruption than their older counterparts. 
This finding is consistent with the view that younger adults automatize responses to the target and, in consequence, show greater attentional capture. Rogers (1992) and Rogers et al. (1994) gave observers of varying ages a semantic category search task with a memory set of one category and display sizes ranging from one to four. CM and VM blocks alternated throughout the sessions. Relative to the older observers, younger adults showed greater CM improvement and greater disruption at CM reversal. They interpreted this finding to indicate that younger adults had learned to attend better to the CM target and inhibit processing of distractor items, with the result that performance was more adversely affected when this priority learning was no longer appropriate. For the most part, studies of aging and skill acquisition have focused on the costs associated with automatic responses that are inappropriate when targets and distractors are reversed. There has been only one recent study of aging and the positive transfer of automatized skill. Fisk et al. (1997) compared younger and older observers in a semantic category visual search task. Participants were given CM training followed by one reversal session consisting of three different conditions. In the trained/trained (T/T) condition, observers searched displays containing the same
target words used in training. In the untrained target/trained category condition (U/T), the target semantic category remained unchanged but different exemplars were used. In the untrained target/untrained category (U/U) condition, new exemplars in new categories were used. Younger adults demonstrated positive transfer in the U/T condition, presumably because attention was captured by exemplars of the previously trained categories. Older adults did not show as much benefit (capture), consistent with the view that they did not automatize their responses to the target category. Thus, in several studies of practice-based changes in search performance, age deficits have been observed in the disruption that follows reversal of CM targets and distractors. These findings have been taken as evidence that older adults do not develop an automatic attention response to trained CM targets or do not exhibit attentional capture when these items become distractors. This conclusion is at odds with the more traditional visual search and aging literature, in that the latter indicates that older adults demonstrate as much capture as the young. A synthesis of these apparently contradictory views may be approached by understanding the cognitive differences between visual and semantic category search. The memory component of semantic category search is much larger than in visual search tasks. Because older adults are known to have difficulties in episodic encoding and retrieval (cf Kausler, 1994), they may be deficient in the development of automatic attention responses in semantic category search and, as a result, demonstrate less capture when these automatized responses are no longer useful. Second, though not an inherent property of semantic category search, the protocol employed by Fisk, Rogers and their colleagues generally requires observers to perform several cognitively heterogeneous tasks in inter-laced fashion. 
It is possible that older adults perform less well in these conditions because they have difficulties with task switching (Bailey & Lauber, 1998; Kramer, Hahn, & Gopher, 1999; Kray & Lindenberger, 2000) that compromise the development of automaticity and thus the attentional capture that results.
Attentional capture and focused attention
There is another general class of attentional phenomena wherein attentional control is required and attentional capture may be made manifest. These often come under the heading of focused attention tasks. They include the flanker task (Eriksen & Eriksen, 1974) and the Stroop task (MacLeod, 1991). In the flanker task, sometimes called non-search detection, observers typically decide which of several stimuli has been presented centrally while ignoring the stimuli that are presented in its immediate vicinity. Attentional capture, that is, involuntary processing of stimuli that are spatially close to the central target, is seen as benefits when the flanking information is consistent with the target response but perhaps more clearly as costs associated with incompatible flankers. Capture may also be evidenced when, relative to a no-flanker control condition, any flanking letters produce performance decrements.
In one of the first aging studies of the flanker effect, Wright and Elias (1979, Experiment 1) compared older and younger adult groups in identification of a central letter presented alone or flanked by neutral noise letters. Both age groups had longer RTs when the target was flanked by noise letters, suggesting some obligatory processing of them, but the older adults were not slowed disproportionately in this condition. In a second experiment, Wright and Elias (1979) compared younger and older adults in a more common variant of the flanker task, in which identification of a central letter occurs in no-noise, compatible-noise, and incompatible-noise conditions. For both groups, incompatible noise resulted in longer latencies, but the difference between the incompatible-noise and compatible-noise trials was 22.5 ms for the young and 11.6 ms for the elderly. Thus, the elderly showed less evidence of capture.
These findings must be qualified, however, because the elderly have a reduced useful field of view (Ball et al., 1988; Scialfa et al., 1994). As such, they may show smaller costs from the more eccentric flankers. In fact, Cerella (1985) found that age deficits in flanker effects can be large when item separation is small but actually reverse when item separation is larger. Contrary to Cerella's hypothesis, Kramer et al. (1994) found age-equivalent flanker effects with closely spaced targets and distractors (see also Madden & Gottlob, 1997). However, their subjects were well practiced in the task, unlike the participants in previous flanker studies of aging.
Although not normally considered to involve the flanker task, two additional investigations bear mention here. Madden (1983) and Nissen and Corkin (1985) carried out aging studies of the benefit from advance spatial information for visual search. For example, Madden (1983) presented observers with four-item letter displays.
On some trials, a two-sided arrow indicated the possible locations of the impending target letter. If the non-cued items fail to capture or vie for attention, then there should be benefits from the advance cues. In fact, older adults showed more benefit than did the young. Nissen and Corkin reported similar results (but see Plude & Hoyer, 1986). Results from a more recent study suggest that older adults might avoid distractor effects under such conditions by employing a narrower focus of attention than younger adults (Madden & Gottlob, 1997).
The Stroop color-word task (cf. MacLeod, 1991) has been an archetypal paradigm for the study of focused attention. Interference produced in the condition where the word is not the same as the ink's color reflects an obligatory processing of semantic information and thus is appropriately considered under the category of attentional capture. Typically, older adults exhibit greater Stroop interference (Dulaney & Rogers, 1994; Hartley, 1993; Houx, Jolles, & Vreeling, 1993; Klein, Ponds, Houx, & Jolles, 1997). While this might be seen as evidence of greater susceptibility to capture amongst the elderly, the conclusion must be tempered by at least three considerations. First, several analyses suggest that age differences in Stroop interference are eliminated once generalized slowing is controlled (Salthouse & Meinz, 1995;
Verhaeghen & De Meersman, 1998; but see Spieler et al., 1996). Second, Hartley (1993) demonstrated that when the color and the word are separated spatially, older and younger observers show equivalent Stroop effects. This might be taken to mean that older adults show greater attentional capture by semantic content but only when spatial filtering does not allow for the segregation of color and semantic information. Finally, it is surprising that researchers often give scant attention to normal age-related changes in color vision. Color naming conditions will, perhaps, reveal gross deficits in color perception but, even in the absence of pathology, opacification of the lens (Weale, 1986) will render short wavelengths less intense and can change the speed with which that color information is transduced and communicated to visual cortex and beyond. Thus, greater Stroop interference among the elderly may well be attributable to non-attentional factors.
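The point about generalized slowing can be illustrated with a short worked sketch. The Python snippet below is only an illustration under stated assumptions: the mean RTs are hypothetical values chosen for the example (they are not data from any study cited here), the function names are introduced for this sketch, and the proportional score is one simple convention for expressing interference relative to baseline speed rather than the analysis used in the cited work.

```python
def interference(congruent_ms: float, incongruent_ms: float) -> float:
    """Absolute interference: incongruent minus congruent mean RT, in ms."""
    return incongruent_ms - congruent_ms


def proportional_interference(congruent_ms: float, incongruent_ms: float) -> float:
    """Interference expressed as a proportion of the congruent (baseline) RT."""
    return (incongruent_ms - congruent_ms) / congruent_ms


# Hypothetical mean RTs in ms, for illustration only (not from any cited study).
young = {"congruent_ms": 500.0, "incongruent_ms": 550.0}
old = {"congruent_ms": 700.0, "incongruent_ms": 770.0}

young_abs = interference(**young)
old_abs = interference(**old)
young_prop = proportional_interference(**young)
old_prop = proportional_interference(**old)

# The older group shows the larger absolute cost, but the same proportional cost.
print(f"absolute interference: young={young_abs:.1f} ms, old={old_abs:.1f} ms")
print(f"proportional interference: young={young_prop:.3f}, old={old_prop:.3f}")
```

With these assumed values the older group shows a larger absolute interference effect, yet the proportional effect is the same in both groups, which is the pattern one would expect if the age difference reflected generalized slowing alone.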
Summary and Conclusion
The literature discussed above provides a relatively broad and complex view of aging and attentional control given the potentially different forms of capture that have been studied (e.g., capture engendered by training versus capture which occurs with no training, such as that observed for onset distractors), the theoretical issues that have been pursued and the paradigms that have been employed in the studies. It is also the case that the populations of young and old adults employed in studies of attentional control and capture differ in terms of age ranges, health, and overall intellectual functioning. Given such heterogeneity in important aspects of the studies one might wonder whether a set of coherent conclusions can be formulated on the basis of this literature. We believe that the answer is affirmative.
There is increasing evidence that sudden onsets or new objects which appear in the environment have a greater impact on older than younger adults (Juola et al., 2000; Lincourt et al., 1997; Pratt & Bellomo, 1999). That is, such features of the environment appear to be more likely to redirect older adults' spatial attention, even when these stimuli are clearly irrelevant and indeed harmful to the task at hand, than the spatial attention of younger adults. Whether such age-related deficiencies can be reduced with training or increased preparation time is an important question for future research.
The study of overt attention or eye movements seems to provide a similar picture of age-related changes in control and capture to that of the covert cueing literature. In situations in which subjects receive little practice or perform eye movement tasks in concert with other tasks (e.g., with a target discrimination task at the intended location of the eye movement) older adults have more difficulty suppressing inappropriate eye movements than young adults (Butler et al., 1999; Nieuwenhuis et al., 2000).
On the other hand, older adults' eye movement behavior, even in the presence of prepotent stimuli such as onsets or new but task-irrelevant objects, is similar to that of younger adults when voluntary control is not expressed (Kramer et al., 1999, 2000). Such a pattern of results may suggest either (a) that different varieties of inhibitory processes are associated with reflexive and voluntary
saccadic control, and that inhibitory processes associated with voluntary control are more sensitive to aging or (b) that older adults have more difficulty than younger adults in maintaining the appropriate task goals in order to overcome capture of the eyes by the onset stimulus. Indeed, it is certainly conceivable that inhibitory failure and goal neglect might both be implicated in age-related difficulties in avoiding covert and overt capture of attention. Future studies will be necessary to distinguish between the contributions of these mechanisms to age-related differences in covert and overt control and capture.
At first glance the research on attentional capture within the context of visual search appears to be somewhat perplexing. Some studies of training-based capture (i.e., training with specific targets and distractors for thousands of trials before reversing the role of the targets and distractors) have found similar effects for both old and young adults (Anandam & Scialfa, 1999; Scialfa et al., 2000) while other studies have failed to find the development of automaticity, and therefore capture, for older adults (Fisk et al., 1997; Rogers & Fisk, 1991). However, a potentially important difference between these studies is the role of memory and task coordination. For example, the search studies conducted by Fisk, Rogers and colleagues have, for the most part, employed hybrid memory-visual search paradigms in which subjects are required to search for a number of different targets in a display of targets and distractors. Furthermore, different search tasks have often been intermixed in their studies (e.g., searching for one set of targets on one block of trials and then searching for another set of targets on the next block of trials). On the other hand, Scialfa and colleagues have employed more traditional visual search paradigms which entail searching for a single target in a display of a multitude of distractors.
Thus, it seems reasonable to speculate that goal maintenance and updating and the efficiency of inhibition might be more of a concern in the Fisk and Rogers studies, thereby diminishing the training effects on the development of automaticity (and capture - with target/distractor reversal) for the older adults. One way to examine this hypothesis would be to systematically vary the memory and task-switching demands (see Kramer et al., 1999; Meiran et al., 2001) of a search task in the study of the acquisition of automaticity or training-based capture. Our expectation is that increasing memory load and switching demands would serve to systematically diminish older adults' ability to automatize target detection, thereby reducing capture effects when targets and distractors are reversed.
Focused attention studies of attentional control and capture present a relatively straightforward picture of the influence of age on attentional control and capture. Research with both the flanker and Stroop paradigms suggests that older adults are capable of restricting their spatial attention to successfully ignore task-irrelevant (or harmful) distractors (Hartley, 1993; Kramer et al., 1994; Madden & Gottlob, 1997). However, there is also evidence, in the Stroop paradigm, that older adults have more difficulty ignoring task-irrelevant information when the restriction of spatial attention is not a viable strategy (i.e., when the task-relevant and task-irrelevant information is integrated in a single object; Dulaney & Rogers, 1994; Hartley, 1993; Klein et al., 1997). Thus, it would appear that capture of attention by
prepotent semantic information (i.e., well-known color words), like the capture of attention by onsets and new objects, increases as a function of age. Whether similar mechanisms underlie capture of attention by these different varieties of stimuli is an interesting topic for future research.
Thus far, the research that we have discussed has concerned the behavioral study of age-related changes in attentional control and capture. However, the burgeoning field of neuroscience and neuroimaging provides another means to explore age-related changes in attentional control. At present there has been little research on age-related differences in the functional brain activity which underlies aspects of attention and memory (but see Cabeza, 2000; Grady, 2000 for recent reviews of this growing literature). However, the neuroimaging research that has been conducted on age-related differences in brain activation, employing Positron Emission Tomography (PET) and functional Magnetic Resonance Imaging (fMRI), has produced some intriguing findings that lead to a set of tentative but potentially important conclusions. First, a number of neuroimaging studies have found that older adults often show less activation than younger adults, in a variety of brain regions, across a variety of memory and attention tasks. Second, studies have reported that older adults often recruit either different regions of cortex or additional regions of cortex as compared to young adults performing the same task. For example, Madden et al. (1997) found that younger adults showed greater activation in the occipitotemporal pathway than older adults while performing a divided attention task. On the other hand, older adults showed greater activation in prefrontal regions than did younger adults.
Madden and colleagues interpreted these findings as evidence of age-related differences in the forms of control used to perform the tasks, with younger adults relying primarily on letter identification processes and older adults primarily relying on executive control processes supported by prefrontal regions. Other researchers have come to similar conclusions concerning age-related shifts in control strategies in attention and memory tasks on the basis of changes in brain activation patterns (Buckner & Logan, in press; Reuter-Lorenz et al., 2000). Thus, techniques such as PET, fMRI (D'Esposito et al., 1999) and optical imaging (Gratton & Fabiani, 1998), in conjunction with more traditional behavioral measures, offer the promise of enhancing our understanding of age-related changes in the processes which underlie attentional capture and control as well as providing an explication of how such processes are implemented in the brain.
References
Anandam, B. T. & Scialfa, C. T. (1999). Aging and the development of automaticity in feature search. Aging, Neuropsychology, and Cognition, 6, 117-140.
Ardilla, A. & Rosselli, M. (1989). Neuropsychological characteristics of normal aging. Developmental Neuropsychology, 5, 307-320.
Atchley, P., Kramer, A.F. & Hillstrom, A. (2000). Contingent capture for onsets and offsets: Attentional set for perceptual transients. Journal of Experimental Psychology: Human Perception and Performance, 26, 595-606.
Azari, N.P., Rapport, S.I., Salerno, J.A., Grady, C.L., Gonzales-Aviles, A., Schapiro, M.B. & Horwitz, B. (1992). Intergenerational correlations of resting cerebral glucose metabolism in old and young women. Brain Research, 552, 556559.
Bacon, W.F. & Egeth, H.E. (1994). Overriding stimulus-driven attentional capture. Perception & Psychophysics, 55, 485-496.
Bailey, A. & Lauber, E. J. (1998). Learning to task switch and aging. Paper presented at the meeting of the 1998 Cognitive Aging Conference in Atlanta, GA.
Ball, K., Beard, B., Roenker, D., Miller, R., & Griggs, D. (1988). Age and visual search: Expanding the useful field of view. Journal of the Optical Society of America, 5, 2210-2219.
Birren, J.E. (1965). Age changes in the speed of behavior: Its central nature and physiological correlates. In A.T. Welford & J.E. Birren (Eds.), Behavior, aging and the nervous system (pp. 114-216). Springfield, IL: Charles C. Thomas.
Birren, J.E. & Schroots, J.J. (1996). History, concepts, and theory in the psychology of aging. In J.E. Birren & K.W. Schaie (Eds.), Handbook of the Psychology of Aging (pp. 1-23). San Diego, CA: Academic Press.
Buckner, R.L. & Logan, J.M. (in press). Frontal contributions to episodic memory encoding in the young and elderly. In A.E. Parker, E.L. Wilding & T. Bussey (Eds.), The cognitive neuroscience of memory encoding and retrieval. Philadelphia: Psychology Press.
Burke, D.M. (1997). Language, aging, and inhibitory deficits: Evaluation of a theory. Journal of Gerontology: Psychological Sciences, 6, 254-264.
Butler, K.M., Zacks, R.T., & Henderson, J.M. (1999). Suppression of reflexive saccades in younger and older adults: Age comparisons on an antisaccade task. Memory and Cognition, 27, 584-591.
Cabeza, R. (in press). Functional neuroimaging of cognitive aging. In R. Cabeza & A. Kingstone (Eds.), Handbook of Functional Neuroimaging of Cognition. Cambridge, MA: MIT Press.
Cerella, J. (1985). Information processing rates in the elderly. Psychological Bulletin, 98, 67-83.
Cheal, M.L. & Lyon, R.D. (1991). Central and peripheral precueing of forced choice discrimination. Quarterly Journal of Experimental Psychology, 43A, 859-880.
Coffey, C.E., Wilkinson, W.F., Parashos, I.A., Soady, A.A.R., Sullivan, R.J., Paterson, L.J., Figiel, G.S., Webb, M.C., Spritzer, C.E., & Djang, W.T. (1992). Quantitative cerebral anatomy of the aging human brain: A cross-sectional study using magnetic resonance imaging. Neurology, 42, 527-536.
Corbetta, M., Miezin, F. M., Dobmeyer, S., Shulman, G. L., & Petersen, S. E. (1991). Selective and divided attention during visual discrimination of shape, color, and speed: Functional anatomy by positron emission tomography. Journal of Neuroscience, 11, 2383-2402.
Corbetta, M. (1998). Frontoparietal cortical networks for directing attention and the eye to visual locations: Identical, independent, or overlapping neural systems. Proceedings of the National Academy of Sciences, 95, 831-838.
Daigneault, S., Braun, C. & Whitaker, H. (1992). Early effects of normal aging on perseverative and non-perseverative prefrontal measures. Developmental Neuropsychology, 8, 99-114.
D'Esposito, M., Detre, J., Alsop, D., Shin, R., Atlas, S. & Grossman, M. (1995). The neural basis of the central executive system of working memory. Nature, 378, 279-281.
D'Esposito, M., Zarahn, E. & Aguirre, G.K. (1999). Event-related functional MRI: Implications for cognitive psychology. Psychological Bulletin, 125, 155-164.
De Jong, R. (2001). Adult age differences in goal activation and goal maintenance. European Journal of Cognitive Psychology, 13, 71-89.
De Jong, R., Berendsen, E. & Cools, R. (1999). Goal neglect and inhibitory limitations: Dissociable causes of interference effects in conflict situations. Acta Psychologica, 101, 379-394.
Dempster, F.N. (1992). The rise and fall of the inhibitory mechanism: Toward a unified theory of cognitive development and aging. Developmental Review, 12, 45-75.
Deubel, H. & Schneider, W.X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36, 1827-1837.
Dulaney, C. & Rogers, W.A. (1994). Mechanisms underlying reduction in Stroop interference with practice for young and old adults. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 470-484.
Duncan, J. (1995). Attention, intelligence, and the frontal lobes. In M.S. Gazzaniga (Ed.), The Cognitive Neurosciences (pp. 721-733). Cambridge, MA: MIT Press.
Egeth, H.E. & Yantis, S. (1997). Visual attention: Control, representation, and time course. Annual Review of Psychology, 48, 269-297.
Engle, R.W., Conway, A.R., Tuholski, S.W. & Shisler, R.J. (1995). A resource account of inhibition. Psychological Science, 6, 122-125.
Enns, J. T. (1989). Three-dimensional features that pop out in visual search. In D. Brogan (Ed.), Visual search (pp. 37-45). London: Taylor & Francis.
Eriksen, B. A., & Eriksen, C.W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16, 143-149.
Eriksen, C.W. & Yeh, Y.Y. (1985). Allocation of attention in the visual field. Journal of Experimental Psychology: Human Perception and Performance, 11, 583-597.
Fischer, B., Biscaldi, M. & Gezeck, S. (1998). On the development of voluntary and reflexive components in human saccade generation. Brain Research, 754, 285-297.
Fisk, A.D., Rogers, W.A., & Giambra, L.M. (1990). Consistent and varied memory/visual search: Is there an interaction between age and response-set effects? Journal of Gerontology: Psychological Sciences, 45, P81-P87.
Fisk, A. D., Hertzog, C., Lee, M. D., Rogers, W. A., & Anderson-Garlach, M. (1994). Long-term retention of skilled visual search: Do young adults retain more than old adults? Psychology and Aging, 9, 206-215.
Fisk, A. D. & Rogers, W. A. (1991). Toward an understanding of age-related memory and visual search effects. Journal of Experimental Psychology: General, 120, 131-149.
Fisk, A. D., Rogers, W. A., Cooper, B. P. & Gilbert, D. K. (1997). Automatic category search and its transfer: Aging, type of search, and level of learning. Journals of Gerontology Series B: Psychological Sciences & Social Sciences, 52B, 91-102.
Folk, C.L. & Hoyer, W.J. (1992). Aging and shifts of visual spatial attention. Psychology and Aging, 7, 453-465.
Folk, C.L. & Remington, R.W. (1998). Selectivity in distraction by irrelevant featural singletons: Evidence for two forms of attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 24, 112.
Folk, C.L., Remington, R.W. & Johnston, J.C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Folk, C.L., Remington, R.W. & Johnston, J.C. (1993). Contingent attentional capture: A reply to Yantis. Journal of Experimental Psychology: Human Perception and Performance, 19, 682-685.
Foster, J.K., Behrmann, M. & Stuss, D.T. (1995). Aging and visual search: Generalized cognitive slowing or selective deficit in attention? Aging and Cognition, 2, 279-299.
Gibson, B.S. & Kelsey, E.M. (1998). Stimulus driven attentional capture is contingent on attentional set for display wide visual features. Journal of Experimental Psychology: Human Perception and Performance, 24, 699-706.
Grady, C.L. (2000). Functional brain imaging and age-related changes in cognition. Biological Psychology, 54, 259-281.
Gratton, G. & Fabiani, M. (1998). Dynamic brain imaging: Event-related optical signal (EROS) measures of the time course and localization of cognitive-related activity. Psychonomic Bulletin and Review, 5, 535-563.
Greenwood, P.M. & Parasuraman, R. (1994). Attentional disengagement deficit in nondemented elderly over 75 years of age. Aging and Cognition, 1, 188-202.
Guitton, D., Buchtel, H.A., & Douglas, R.M. (1985). Frontal lobe lesions in man cause difficulties in suppressing reflexive glances and in generating goal-directed saccades. Experimental Brain Research, 58, 455-472.
Hallett, P.E. (1978). Primary and secondary saccades to goals defined by instructions. Vision Research, 18, 1279-1296.
Hartley, A. A. (1993). Evidence for the selective preservation of spatial selective attention in old age. Psychology and Aging, 8, 371-379.
Hartley, A.A., Kieley, J.M. & Slabach, E.H. (1990). Age differences and similarities in the effects of cues and prompts. Journal of Experimental Psychology: Human Perception and Performance, 16, 523-537.
Hasher, L., Stoltzfus, E.R., Zacks, R. & Rypma, B. (1991). Age and inhibition. Journal of Experimental Psychology: Learning, Memory and Cognition, 17, 163-169.
Hasher, L., & Zacks, R. (1988). Working memory, comprehension, and aging: A review and a new view. In G.K. Bower (Ed.), The psychology of learning and motivation (Vol. 22, pp. 193-225). San Diego, CA: Academic Press.
Henderson, J.M. & Hollingworth, A. (1999). The role of fixation position in detecting scene changes across saccades. Psychological Science, 10, 438-443.
Hillstrom, A.P. & Yantis, S. (1994). Visual motion and attentional capture. Perception & Psychophysics, 55, 399-411.
Ho, G., & Scialfa, C.T. (submitted). Age, skill transfer, and conjunction search.
Kausler, D.H. (1994). Learning and memory in normal aging. New York: Academic Press.
Hoffman, J.E., & Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics, 57, 787-795.
Houx, P., Jolles, J., & Vreeling, F. (1993). Stroop interference: Aging effects assessed with Stroop color-word test. Experimental Aging Research, 19, 209-224.
Humphrey, D.G. & Kramer, A.F. (1997). Age differences in visual search for feature, conjunction, and triple-conjunction targets. Psychology and Aging, 12, 704-717.
Irwin, D.E., Brockmole, J. & Kramer, A.F. (submitted). Attention precedes involuntary saccades.
Irwin, D.E., Colcombe, A.M., Kramer, A.F. & Hahn, S. (2000). Attentional and oculomotor capture by onset, luminance, and color singletons. Vision Research, 40, 1443-1458.
James, W. (1890). The principles of psychology. NY: Henry Holt & Company.
Jenkins, L., Myerson, L., Joerding, A. & Hale, S. (2000). Converging evidence that visuospatial cognition is more age-sensitive than verbal cognition. Psychology and Aging, 15, 157-175.
Jonides, J. (1981). Voluntary vs. automatic control over the mind's eye's movement. In J.B. Long and A.D. Baddeley (Eds.), Attention and Performance IX (pp. 187-203). Hillsdale, NJ: Erlbaum.
Jonides, J. & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43, 346-354.
Juola, J.F., Koshino, H., Warner, C.B., McMickell, M. & Peterson, M. (2000). Automatic and voluntary control of attention in young and elderly adults. American Journal of Psychology, 113, 159-178.
Kane, M.J., Hasher, L., Stoltzfus, E.R., Zacks, R.T. & Connelly, S.L. (1994). Inhibitory attentional mechanisms and aging. Psychology and Aging, 9, 103-112.
Kieley, J.M. & Hartley, A.A. (1997). Age-related equivalence of identity suppression in the Stroop color-word task. Psychology and Aging, 12, 22-29.
Klein, M., Ponds, R.W.H.M., Houx, P.J., & Jolles, J. (1997). Effect of test duration on age-related differences in Stroop interference. Journal of Clinical and Experimental Neuropsychology, 19, 77-82.
Korteling, J. (1991). Effects of skill integration and perceptual competition on age-related differences in dual-task performance. Human Factors, 33, 35-44.
Kowler, E., Anderson, E., Dosher, B. & Blaser, E. (1995). The role of attention in the programming of saccades. Vision Research, 35, 1897-1916.
Kramer, A.F., Hahn, S. & Gopher, D. (1999). Task coordination and aging: Explorations of executive control processes in the task switching paradigm. Acta Psychologica, 101, 339-378.
Kramer, A.F., Hahn, S., Irwin, D.E. & Theeuwes, J. (1999). Attentional capture and aging: Implications for visual search performance and oculomotor control. Psychology and Aging, 14, 135-154.
Kramer, A.F., Hahn, S., Irwin, D.E. & Theeuwes, J. (2000). Age differences in the control of looking behavior: Do you know where your eyes have been? Psychological Science, 11, 210-216.
Kramer, A.F., Humphrey, D.G., Larish, J.F., Logan, G.D., & Strayer, D.L. (1994). Aging and inhibition: Beyond a unitary view of inhibitory processing in attention. Psychology and Aging, 9, 491-512.
Kramer, A.F., Larish, J., Weber, T., & Bardell, L. (1999c). Training for executive control: Task coordination strategies and aging. In D. Gopher & A. Koriat (Eds.), Attention and Performance XVII. Cambridge, MA: MIT Press.
Kramer, A.F., Martin-Emerson, R., Larish, J. & Andersen, G.J. (1996). Aging and filtering by movement in visual search. Journal of Gerontology: Psychological Sciences, 51, 201-216.
Kray, J. & Lindenberger, U. (2000). Adult age differences in task switching. Psychology and Aging, 15, 126-147.
Kwong See, S.T. & Ryan, E.B. (1995). Cognitive mediation of adult age differences in language performance. Psychology and Aging, 10, 458-468.
LaBerge, D. (1995). Attentional processing. Cambridge, MA: Harvard University Press.
Lawrence, B., Myerson, J. & Hale, S. (1998). Differential decline of verbal and visuospatial processing across the adult lifespan. Neuropsychology and Cognition, 5, 129-146.
Lincourt, A.E., Folk, C.L. & Hoyer, W.J. (1997). Effects of aging on voluntary and involuntary shifts of attention. Aging, Neuropsychology and Cognition, 4, 290-303.
Madden, D.J. (1983). Aging and distraction from highly familiar stimuli during visual search. Developmental Psychology, 19, 499-507.
Madden, D.J. (1990). Adult age differences in the time course of visual attention. Journal of Gerontology: Psychological Sciences, 45, 9-16.
Madden, D.J. & Gottlob, L.R. (1997). Adult age differences in strategic and dynamic components of focusing visual attention. Aging, Neuropsychology and Cognition, 4, 185-210.
Madden, D. J. & Nebes, R. D. (1980). Aging and the development of automaticity in visual search. Developmental Psychology, 16, 377-384.
Madden, D.J., Turkington, T.G., Provenzale, J.M., Hawk, T.C., Hoffman, J.M. & Coleman, R.E. (1997). Selective and divided visual attention: Age related changes in regional cerebral blood flow measured by H₂¹⁵O. Human Brain Mapping, 5, 389-409.
Martin-Emerson, R. & Kramer, A.F. (1997). Offset transients modulate attentional capture by sudden onsets. Perception & Psychophysics, 59, 739-751.
May, C.P., Kane, M.J. & Hasher, L. (1995). Determinants of negative priming. Psychological Bulletin, 118, 35-54.
Mayr, U. & Kliegl, R. (1993). Sequential and coordinative complexity: Age-based processing limitations in figural transformations. Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 1297-1320.
Mayr, U. & Liebscher, T. (1998). Poster presented at the 28th Attention Performance Conference. June, Cumberland, England.
McDowd, J.M. (1997). Inhibition in attention and aging. Journal of Gerontology: Psychological Sciences, 52, 265-273.
McDowd, J.M., Oseas-Kreger, D.M., & Filion, D.L. (1995). Inhibitory processes in cognition and aging. In F.N. Dempster and C.J. Brainerd (Eds.), Interference and inhibition in cognition (pp. 363-400). San Diego, CA: Academic Press.
MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163-203.
Mead, S. & Fisk, A.D. (1998). Measuring skill acquisition and retention with an ATM simulator: The need for age-specific training. Human Factors, 40, 516-523.
Meiran, N., Gotler, A. & Perlman, A. (2001). Old age is associated with a pattern of relatively intact and impaired task-set switching abilities. Journal of Gerontology Series B: Psychological Sciences and Social Sciences, 56, 88-102.
Moore, C. (1994). Negative priming depends on probe trial conflict: Where has all the inhibition gone? Perception & Psychophysics, 56, 133-144.
Müller, H.J. & Rabbitt, P.M.A. (1989). Reflexive and voluntary orienting of visual attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315-330.
Munoz, D.P., Broughton, J.R., Foldring, J.E. & Armstrong, I.T. (1998). Age-related performance of human subjects in saccadic eye movement tasks. Experimental Brain Research, 121, 391-400.
Neill, T. & Valdes, L. (1996). Facilitatory and inhibitory aspects of attention. In A.F. Kramer, M.G.H. Coles and G.D. Logan (Eds.), Converging operations in the study of visual selective attention. Washington, D.C.: APA Press.
Nieuwenhuis, S., Ridderinkhof, K.R., de Jong, R., Kok, A. & van der Molen, M.W. (2000). Inhibitory inefficiency and failures of intention activation: Age-related decline in the control of saccadic eye movements. Psychology and Aging, 15, 635-647.
Nissen, M.J., & Corkin, S. (1985). Effectiveness of attentional cueing in older and younger adults. Journal of Gerontology, 40, 185-191.
Nobre, A.C., Gitelman, D.R., Dias, E.C. & Mesulam, M.M. (2000). Covert visual spatial orienting and saccades: Overlapping neural systems. Neuroimage, 11, 210-216.
Olincy, A., Ross, R.G., Young, D.A., & Freedman, R. (1997). Age diminishes performance on an antisaccade eye movement task. Neurobiology of Aging, 18, 483-489.
Pashler, H. (1988). Cross-dimensional interaction and texture segregation. Perception & Psychophysics, 43, 307-318.
Pfefferbaum, A., Lim, K.O., Zipursky, R.B., Mathalon, D.H., Rosenbloom, M.J., Lane, B., Ha, C.N., & Sullivan, E.V. (1992). Brain gray and white matter volume loss accelerated with aging in chronic alcoholics: A quantitative MRI study. Alcoholism: Clinical and Experimental Research, 16, 1078-1089.
Pierrot-Deseilligny, C., Rivaud, S., & Gaymard, B. (1991). Cortical control of reflexive visually-guided saccades. Brain, 114, 1473-1485.
Pierrot-Deseilligny, C., Rivaud, S., Gaymard, B., Müri, R., & Vermersch, A.I. (1995). Cortical control of saccades. Annals of Neurology, 37, 557-567.
Plude, D. & Doussard-Roosevelt, J. (1989). Aging, selective attention, and feature integration. Psychology and Aging, 4, 98-105.
Plude, D.J., & Hoyer, W.J. (1986). Age and the selectivity of visual information processing. Psychology and Aging, 1, 4-10.
Posner, M. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Pratt, J. & Bellomo, C.N. (1999). Attentional capture in younger and older adults. Aging, Neuropsychology and Cognition, 6, 19-31.
Rabbitt, P.M.A. (1965). An age decrement in the ability to ignore irrelevant information. Journal of Gerontology, 20, 233-237.
Raz, N. (2000). Aging of the brain and its impact on cognitive performance: Integration of structural and functional findings. In F. Craik & T. Salthouse (Eds.), Handbook of aging and cognition. New Jersey: Erlbaum.
Remington, R.W., Johnston, J.C., & Yantis, S. (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51, 279-290.
Reuter-Lorenz, P.A., Jonides, J., Smith, E.E., Hartley, A., Miller, A., Marshuetz, C., & Koeppe, R.A. (2000). Age differences in the frontal lateralization of verbal and spatial working memory as revealed by PET. Journal of Cognitive Neuroscience, 12, 174-187.
Rivaud, S., Müri, R.M., Gaymard, B., Vermersch, A.I., & Pierrot-Deseilligny, C. (1994). Eye movement disorders after frontal eye field lesions in humans. Experimental Brain Research, 102, 110-120.
Roberts, R.J., Hager, L.D. & Heron, C. (1994). Prefrontal cognitive processes: Working memory and inhibition in the antisaccade task. Journal of Experimental Psychology: General, 123, 374-393.
Rogers, W.A. (1992). Age differences in visual search: Target and distractor learning. Psychology and Aging, 7, 526-535.
Rogers, W.A., & Fisk, A.D. (1991). Are age differences in consistent-mapping visual search due to feature learning or attention training? Psychology and Aging, 6, 542-550.
Rogers, W.A., Fisk, A.D., & Hertzog, C. (1994). Do ability-performance relationships differentiate age and practice effects in visual search? Journal of Experimental Psychology: Learning, Memory and Cognition, 20, 710-738.
Salmon, E., Marquet, P., Sandzot, B., Degueldre, C., Lemaire, C., & Franck, G. (1991). Decrease of frontal metabolism demonstrated by positron emission tomography in a population of healthy elderly volunteers. Acta Neurologica Belgica, 91, 288-295.
Salthouse, T.A. (1996). General and specific speed mediation of adult age differences in memory. Journal of Gerontology: Psychological Sciences, 51, 30-42.
Salthouse, T.A. (1994). The aging of working memory. Neuropsychology, 8, 535-543.
Salthouse, T.A., & Meinz, E.J. (1995). Aging, inhibition, working memory, and speed. Journal of Gerontology: Psychological Sciences, 50, 297-306.
Salthouse, T.A., & Somberg, B.L. (1982). Skilled performance: Effects of adult age and experience on elementary processes. Journal of Experimental Psychology: General, 111, 176-207.
Schall, J.D. (1995). Neuronal basis of saccadic target selection. Reviews in the Neurosciences, 6, 63-85.
Scialfa, C.T., Esau, S.P., & Joffe, K.M. (1998). Age, target-distractor similarity, and visual search. Experimental Aging Research, 24, 337-358.
Scialfa, C.T., Jenkins, L., Hamaluk, E. & Skaloud, P. (2000). Aging and the development of automaticity in conjunction search. Journal of Gerontology: Psychological Sciences, 55B, P27-46.
Scialfa, C.T. & Joffe, K.M. (1997). Age differences in feature and conjunction search: Implications for theories of visual search and generalized slowing. Aging, Neuropsychology, and Cognition, 4, 1-21.
Scialfa, C.T., Thomas, D.M., & Joffe, K.M. (1994). Age differences in the Useful Field of View: An eye movement analysis. Optometry and Vision Science, 71, 1-7.
Shaw, T., Mortel, K., Meyer, J., Rogers, R., Hardenberg, J. & Cutaia, M. (1984). Cerebral blood flow changes in benign aging and cerebrovascular disease. Neurology, 34, 855-862.
Sheliga, B.M., Craighero, L., Riggio, L., & Rizzolatti, G. (1997). Effects of spatial attention on directional manual and ocular responses. Experimental Brain Research, 114, 339-351.
Shiffrin, R.M., & Dumais, S.T. (1981). The development of automatism. In J.R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 111-140). Hillsdale, NJ: Erlbaum.
Shimamura, A.P. & Jurica, P.J. (1994). Memory interference effect and aging: Findings from a test of frontal lobe function. Neuropsychology, 8, 408-412.
Spieler, D.H., Balota, D.A., & Faust, M.E. (1996). Stroop performance in healthy younger and older adults and in individuals with dementia of the Alzheimer type. Journal of Experimental Psychology: Human Perception and Performance, 22, 461-479.
Sullivan, M.P. & Faust, M.E. (1993). Evidence for identity inhibition during selective attention in older adults. Psychology and Aging, 8, 589-598.
Sullivan, M.P., Faust, M.E., & Balota, D.A. (1995). Identity negative priming in older adults and individuals with dementia of the Alzheimer type. Neuropsychology, 9, 537-555.
Theeuwes, J. (1991). Exogenous and endogenous control of attention: The effects of visual onset and offsets. Perception & Psychophysics, 49, 83-90.
Theeuwes, J. (1992). Perceptual selectivity for color and form. Perception & Psychophysics, 51, 599-606.
Theeuwes, J., Kramer, A.F., Hahn, S. & Irwin, D.E. (1998). Our eyes do not always go where we want them to go: Capture of the eyes by new objects. Psychological Science, 9, 379-385.
Theeuwes, J., Kramer, A.F., Hahn, S., Irwin, D.E. & Zelinsky, G.J. (1999). Influence of attentional capture on eye movement control. Journal of Experimental Psychology: Human Perception and Performance, 25, 1595-1608.
Tipper, S. (1991). Less attentional selectivity as a result of declining inhibition in older adults. Bulletin of the Psychonomic Society, 29, 45-47.
Todd, S. & Kramer, A.F. (1994). Attentional misguidance in visual search. Perception & Psychophysics, 56, 198-210.
Tsang, P. (1996). Boundaries of cognitive performance as a function of age and flight performance. International Journal of Aviation Psychology, 6, 359-377.
Verhaeghen, P., & De Meersman, L. (1998). Aging and the Stroop effect: A meta-analysis. Psychology and Aging, 13, 435-444.
Verhaeghen, P., Kliegl, R. & Mayr, U. (1997). Sequential and coordinative complexity in time-accuracy functions for mental arithmetic. Psychology and Aging, 12, 555-564.
Walker, R., Husain, M., Hodgson, T., Harrison, J. & Kennard, C. (1998). Saccadic eye movement and working memory deficits following damage to the human prefrontal cortex. Neuropsychologia, 36, 1141-1159.
Warner, C.B., Juola, J.F. & Koshino, H. (1990). Voluntary allocation versus automatic capture of attention. Perception & Psychophysics, 48, 243-251.
Waters, G.S. & Caplan, D. (2001). Aging, working memory, and on-line syntactic processing in sentence comprehension. Psychology and Aging, 16, 128-144.
Weale, R.A. (1986). Senescence and color vision. Journal of Gerontology, 41, 635-640.
West, R.L. (1996). An application of prefrontal cortex function theory to cognitive aging. Psychological Bulletin, 120, 272-292.
Wright, L.L., & Elias, J.W. (1979). Age differences in the effects of perceptual noise. Journal of Gerontology, 34, 704-708.
Vakil, E., Manovich, R., Ramati, E., & Blachstein, H. (1996). The Stroop color-word test as a measure of selective attention: Efficiency in the elderly. Developmental Neuropsychology, 12, 313-325.
Yantis, S. & Egeth, H. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Yantis, S. & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621.
Yantis, S. & Jonides, J. (1990). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.
Yantis, S. & Hillstrom, A.P. (1994). Stimulus-driven attentional capture: Evidence from equiluminant visual objects. Journal of Experimental Psychology: Human Perception and Performance, 20, 95-107.
Zacks, R. & Hasher, L. (1997). Cognitive gerontology and attentional inhibition: A reply to Burke and McDowd. Journal of Gerontology: Psychological Sciences, 52, 274-283.
Acknowledgments
The preparation of this chapter was supported by grants from the National Institute on Aging (R01 AG14966) and the Institute for the Study of Aging. We would like to thank Jan Theeuwes and Charles Folk for their helpful comments on an earlier draft of this manuscript.
Part V
Individual Differences
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved.
13
A Multidisciplinary Perspective on Attentional Control
Douglas Derryberry and Marjorie A. Reed
Recent years have seen an increasing interest in the higher level processes that influence attention. This trend is particularly evident in behavioral research on attentional capture, where capture by abrupt onsets and new objects appears to depend on top-down modulation by attentional control settings (e.g., Folk & Remington, 1999; Yantis & Egeth, 1999). It is also evident in neuropsychological investigations of the attentional problems that often arise from damage to the frontal cortex (e.g., Grafman, Holyoak & Boller, 1995). The study of such control processes is important to our understanding of attentional functioning, and moreover, to our understanding of the control of cognition in general. This chapter has three general goals. The first goal is to consider the role of motivational processes in regulating attention. Although motivation has received little emphasis in most models of cognition, its relation to attention is straightforward: Ongoing motivational states bias attention in favor of stimuli that are relevant to the current need (Derryberry & Tucker, 1993). Along these lines, we illustrate several examples of what might be referred to as "incentive capture". Attentional orienting is biased by the motivational valence (positive or negative) of potential target locations, as well as by the motivational "value" (low or high) for targets appearing at those locations. These effects may be distinct and involve different mechanisms from the usual forms of capture by abrupt onset or novel targets. The second goal is to consider potential mechanisms underlying motivational effects. Given the assumption that motivational processes exert both reactive and voluntary effects on attention, it is suggested that motivational effects may be mediated through several attentional systems. 
Specifically, the involuntary influences may bias posterior attentional operations involved in orienting, whereas voluntary effects are mediated via frontal executive operations (Posner & Raichle, 1994). For a person in an anxious state, for example, orienting may be reflexively biased in favor of locations carrying potential threats. Nevertheless, it is possible for the anxious person to voluntarily control this bias, allowing attention to shift to safer locations. The third goal is to consider individual differences, both at the level of motivational processes and attentional control processes. Motivational differences have long been emphasized in the field of personality, with variability related to dimensions such as Extraversion and Anxiety. In addition, recent models suggest
that attentional differences, particularly the capacity for voluntary self-control, are also of importance to personality. Thus, the person's capacity for self-control depends upon the strength of reactive motivational tendencies in relation to their more voluntary attentional skills. People with good voluntary control of attention will be better able to regulate more reactive motivational tendencies, thereby enhancing their performance in many situations (Rothbart, Derryberry & Posner, 1994).
The chapter begins with a brief review of the relations between motivation and attention and summarizes relevant behavioral and physiological evidence. Next, a temperament perspective is presented, emphasizing individual differences in the underlying motivational and attentional systems and their contribution to personality. Finally, several recent studies that address these processes are described. The studies suggest that attentional orienting, and therefore vulnerability to capture, depends on (1) the motivational valence or value attached to a location, (2) the individual's motivational tendencies related to trait anxiety and extraversion, and (3) the individual's capacity for voluntary attentional control.
Relationships Between Motivation and Attention
Most psychological models view motivation as a set of processes that are initiated by current deficits or deviations related to the organism's appetitive and defensive needs. Traditional models assumed that the primary role of these motivational processes is to bias motor and autonomic responses. This led to models suggesting that motives functioned at a relatively high level to potentiate a set of responses (e.g., various forms of approach or avoidance) that might prove useful in satisfying the current need (Gallistel, 1980). Such hierarchical approaches were consistent with emerging knowledge of the brain, which emphasized descending control of the brainstem and spinal cord by motivational projections from the limbic systems. However, more recent models have gone beyond descending response modulation to propose that motivational processes will be more effective if they also modulate incoming sensory information. For example, defensive or security needs promote avoidance behavior, but at the same time, they facilitate the processing of information relevant to avoidance such as environmental sources of danger and safety. This view is supported by recent findings that motivational processes originating within the limbic system exert a variety of ascending influences on perceptual networks within the cortex. Thus, motivational states can be more thoroughly defined as organizing influences that arise from current deficits and function to modulate both perceptual and response pathways in order to link needed goals to adaptive behaviors (Derryberry & Tucker, 1992).
The modulation of perception is carried out through several mechanisms. The most general mechanism involves motivational adjustments of "arousal", thought to be carried out by projections from the limbic regions to the ascending subsystems of the reticular core. For example, Tucker and his colleagues have suggested that anxious states recruit ascending dopaminergic subsystems, which in turn promote a tonic activation of the cortex and a consequent narrowing of attention (Tucker, 1992; Tucker & Williamson, 1984). Such a general narrowing mechanism is supported by findings that anxious individuals show enhanced processing of central as compared to peripheral stimuli (e.g., Hockey, 1979) and local as opposed to global perceptual elements (Derryberry & Reed, 1998).
More specific motivational influences may be carried out by relatively automatic pathway activation. Deutsch (1960) suggested that an animal's state of hunger functions by activating a memory representation of a spatial location associated with food. As activation spreads out and dissipates from that location, the animal uses the resulting "motivational gradient" as a navigational pathway for approaching and finding the food. Along similar lines, a defensive motivational state (e.g., fear) might function by directly activating locations related to danger and safety, thereby setting up an effective escape route. Such direct mechanisms are also consistent with anatomical evidence of direct connections from limbic to perceptual pathways. For example, Mesulam (1981) has emphasized interactions between a "motivational map" in the paralimbic cingulate cortex and a "sensory map" in the parietal neocortex.
Still other models have suggested that a motivational regulation of perception should prove most flexible if mediated by attentional processes. In the case of hunger, for example, Wise (1987) suggested that hunger functions by increasing the attentional "holding power" of food objects.
Rather than increasing the probability that an animal will make contact with a food object, hunger makes it harder for the animal to shift away. Thus, hunger prolongs eating periods without increasing their frequency. During defensive states such as anxiety, Gray proposed that attention is directed to relevant stimuli in the environment to promote an effective assessment of risk and optimal response selection. As discussed in more detail below, motivational effects on attention are supported by anatomical connections from limbic motivational circuitry to attentional networks within the posterior and anterior parts of the cortex. Also supportive are a number of human studies linking motivation and attention. Several studies using the Stroop task have found that when subjects are deprived of food, they are delayed in naming the color of food-related words, with the amount of delay correlating with their subjective report of hunger (Channon & Hayward, 1990). Similarly, studies using the dichotic listening task have found evidence for attentional biases when sexual words are presented, and enhancing the sexual motive through testosterone injections increases the bias favoring sexual
words (Alexander, Swerdloff, Wang, & Davidson, 1997). In addition, studies using the "dot-probe" task have found that when two words are simultaneously presented and followed by a detection target, trait anxious subjects are biased in favor of threatening words. This bias has been shown to increase as the anxious state increases, with students becoming more attentive to threatening words (e.g., test, flunk) immediately before an important exam (MacLeod & Mathews, 1988). While motivation models tend to view attention as a unified system, models from cognitive neuroscience indicate that attention arises from several interacting networks. This opens up the possibility that motivational effects may be exerted through different sets of attentional processes. For example, Posner and his colleagues suggest that attention involves interacting vigilance, posterior orienting, and anterior executive subsystems (Posner & DiGirolamo, 1998; Posner & Raichle, 1994; Posner & Rothbart, 1998a). The "vigilance system" is thought to involve ascending noradrenergic projections from the brainstem, and is thus similar to the relatively general arousal mechanisms described above. Single-cell studies indicate that the source cells of the noradrenergic system (the locus coeruleus) are highly responsive to the motivational significance of stimuli. In Posner's model, the vigilance mechanism functions to facilitate functioning of higher level attentional systems. The "posterior attentional system" involves a network interconnecting the parietal cortex, superior colliculus, and thalamic pulvinar nucleus. Its primary function is that of orienting attention from one location to another, for the most part in a relatively reflexive manner. Performance and neuropsychological data indicate that orienting involves three component operations: disengaging attention from the current location, moving to a new location, and engaging the new location. 
The engage operation facilitates the flow of information to frontal regions for further attentional and response-related processing. As noted above, motivationally significant objects can appear in different locations, and thus a mechanism for facilitating information in specific locations would seem highly adaptive. While the performance studies mentioned above suggest motivational influence on orienting (e.g., MacLeod & Mathews, 1988), the relevant anatomical connections remain obscure. Connections from limbic regions to the colliculus, pulvinar, and parietal cortex exist, but their roles have not been adequately characterized. The "anterior attentional system" involves frontal circuitry centering upon the anterior cingulate region. This system is thought to serve an "executive" function in regulating the posterior orienting system. In addition, the anterior system functions to inhibit dominant responses, to inhibit dominant conceptual associations, and to aid in the detection and correction of errors (Posner & DiGirolamo, 1998; Posner & Rothbart, 1998a). A similar "supervisory attentional system" involving related functions has been described by Shallice and his colleagues (Stuss, Shallice, Alexander & Picton, 1995). Motivational influences on the high level attentional
systems are supported by the extensive projections from limbic to frontal regions, which are particularly strong in the case of the "paralimbic" anterior cingulate region. Many anatomists have noted the close connectivity between the limbic and frontal regions, with some viewing the frontal lobe as the cortical representative of the limbic system (Nauta, 1971). Although clear experimental evidence is lacking, motivational influences at the executive level should be highly adaptive. For example, motivational states are often simultaneously active and may result in potential conflicts. Higher level mechanisms for controlling orienting and suppressing dominant responses may help one motive to suppress others and gain control of behavior. In addition, the anterior attentional system is closely interconnected with adjacent cortical regions that appear crucial to volition. Thus, this system may allow motivational systems to function in a more voluntary as opposed to reactive way.
It can be seen from this brief overview that motivational effects extend beyond simple behavioral facilitation or increases in effort. Both psychological and physiological perspectives suggest that motivation may also play important roles in regulating attention. Perhaps one of the reasons that motivational processes have been neglected in much of psychology is that they show considerable variability across individuals. As will be seen in the next section, however, such individual differences provide a useful perspective for understanding the underlying mechanisms, as well as for appreciating their importance in everyday life.
Individual Differences in Motivation and Attention
The topic of individual differences in motivation has been addressed most directly by temperament approaches to personality. Temperament approaches attempt to relate major personality dimensions to individual differences in the reactivity of underlying neural systems. While some earlier models emphasize arousal systems (e.g., Eysenck, 1967), recent approaches have focused on systems related to motivation and emotion (e.g., Depue & Collins, 1999; Gray & McNaughton, 1996). An underlying assumption is that by playing a central guiding role in information processing, motivational processes are also fundamental to personality and its development (Derryberry & Reed, 1996). Although it may some day be possible to identify more specific motivational/personality relations, current approaches emphasize relatively general systems related to appetitive and defensive needs. Examples of appetitive systems include Gray's (1987) "behavioral activation system," Panksepp's (1998) "expectancy system," and Depue and Collins' (1999) "behavioral facilitation system." These systems are generally thought to receive information concerning positive incentives (e.g., signaling reward, non-punishment) by means of cortical projections to the amygdala and hypothalamus. Outputs from the limbic circuits
interact with ascending dopaminergic projections to facilitate the organization of approach behavior within the basal ganglia and frontal cortex. The resulting motivational state features attention to positive incentives, approach or exploratory behavior, and emotional feelings such as hope, relief, and anticipatory eagerness. Individual differences in the reactivity of the appetitive systems are most commonly related to the personality dimension of Extraversion. Appetitive motivation is viewed as increasing in strength as one moves from more introverted to extraverted individuals, though some models suggest that it will be greatest in extraverts who are also high in neuroticism. In contrast, the defensive systems respond to punishing inputs or to threatening signals that predict punishment, and promote avoidant or inhibited behaviors accompanied by fear or anxiety. Relevant information may be transmitted from the thalamus or the cortex, through which it engages limbic circuitry within the hippocampus, amygdala and hypothalamus, and also brainstem circuitry within the periaqueductal gray. Examples of systems responding to immediate threat include Panksepp's (1998) "fear" system and Gray's "fight-flight" system. Gray has also described a "behavioral inhibition system" that responds to predicted threat by inhibiting approach behavior and promoting an attentional pattern aimed at risk assessment (Gray & McNaughton, 1996). In general, the defensive systems are most often related to the general dimension of Neuroticism or more specific traits such as anxiety. Thus, the strength of defensive motivation increases as one moves from individuals low in Neuroticism to those high in Neuroticism, with anxiety perhaps being strongest in the more introverted neurotics. Temperament models emphasizing general appetitive and defensive systems provide a useful framework for understanding personality differences related to Extraversion and Neuroticism. 
Because the models incorporate motivational effects on behavior, attention, and emotion, predictions can be made regarding the emotional and behavioral differences across personalities. In addition, these models help in understanding clinical disorders arising from different patterns of appetitive and defensive reactivity. For example, clinical anxiety most likely involves particularly strong defensive motivation, whereas depression can be viewed as weak appetitive motivation (often accompanied by strong defensive reactivity). In contrast, impulsive disorders (e.g., anti-social behavior, psychopathy) may arise from strong appetitive motivation and/or weak defensive motivation (Fowles, 1994; Gray, 1994). Furthermore, given the assumption that the underlying motivational systems influence attention, it is possible to better understand the cognitive functioning of these various individuals. Anxious persons, for example, are particularly attentive to threatening information, which may in turn activate dangerous conceptual content and thus promote worrisome, ruminative, and even catastrophic forms of thought.
When such attentional processes are taken into account, however, it can be seen that differences in motivational systems provide only part of the personality picture. Because attentional systems are separable from motivational systems, individuals may also vary in the reactivity or efficiency of their attentional subsystems. If these subsystems are recruited by a motivational system, then the efficiency of the motivational function should depend on the efficiency of the recruited attentional function. In general, individuals with relatively weak voluntary attention may be prone to more inefficient or maladaptive motivation, whereas those with stronger attention should show more successful motivation. For example, successful defense requires that attention is allocated to the environmental sources of safety as well as the current threat. To the extent that an individual's attention allows them to disengage from threat and engage safety, they should be better able to remain in and cope with the stressful situation (e.g., Derryberry & Reed, 1996). Similarly, many appetitive situations require attention both to the reward and the potential obstacles that may block its pursuit. Individuals who cannot disengage from the reward may have difficulty dealing with these sources of frustration, and in the long run may fail to learn from their unsuccessful experiences (e.g., Newman, 1987). Also, in both defensive and appetitive situations, multiple threats or rewards may be present that vary in incentive value. If the person can flexibly shift attention in order to assess the relative importance of these incentives, they should be better able to select the one that is most rewarding or most dangerous. Such ideas have led theorists to propose that individual differences in attention may be pivotal to effective motivational functioning. 
Rothbart and her colleagues have proposed that "effortful control" serves a higher level function of regulating more reactive processes related to positive and negative motivation (Derryberry & Rothbart, 1997; Posner & Rothbart, 1998b; Rothbart et al., 1994). Variability in effortful control is thought to reflect functioning of Posner's anterior attentional system, including the voluntary control of posterior orienting and the inhibition of prepotent response tendencies. Developmental studies have shown that children with high effortful control (measured by parent report) show reduced frequencies of negative affect, a finding consistent with the idea that effective use of attention may help attenuate distress (Eisenberg, Fabes, Nyman, Bernzweig & Pinuelas, 1994; Rothbart, Ziaie & O'Boyle, 1992). In addition, skillful use of attention appears important in the ability to suppress impulsive approach, as demonstrated in Mischel's studies of delay of gratification (e.g., Metcalfe & Mischel, 1999). Individual differences in effortful control are also related to socioemotional variables, correlating negatively with aggression and positively with empathy and conscience (Kochanska, Murray, Jacques, Koenig & Vandegeest, 1996; Rothbart, Ahadi & Hershey, 1994). While effortful attention may help regulate more reactive motives, it is important to avoid the notion of a single higher level executive that functions with
homuncular powers. It is thus worthwhile to reconsider the higher level processes that contribute to regulating anterior attentional functions. In terms of cognitive controls, the frontal and cingulate regions receive extensive perceptual and conceptual content arising from posterior sensory and association areas. Such afferents provide pathways through which various beliefs, expectancies, and metacognitive knowledge can influence executive attentional functions (e.g., Wells & Matthews, 1994). In terms of motivational controls, massive projections arising from limbic and paralimbic circuits (e.g., amygdala, hippocampus, hypothalamus, orbital and medial frontal) converge on the attentional regions of the anterior cingulate cortex. These afferents may provide pathways through which motivational processes may recruit attentional functions, and thereby come to function in a more voluntary and less reactive way. This can be most easily seen in conflict situations, where several relatively reactive motives compete for the control of behavior. If one of these motives can gain access to the high level control provided by the anterior attentional system, it will be at an advantage due to its enhanced capacity to suppress the orienting and response tendencies related to alternative motives. In most instances, such motives will work in conjunction with available conceptual information, such as the person's beliefs, strategies, and metacognitive knowledge. As an example, a person's motivation to suppress approach and resist temptation should be strengthened by beliefs involving the costs of approach and the benefits of resistance.
Studies Relating Temperament, Motivation, and Attention
Our research has attempted to address such processes through psychometric and performance measures. To assess individual differences in underlying motivational processes, we use standard scales thought to measure differences in appetitive and defensive motives. Measures of Extraversion and Impulsivity are taken to assess the strength of appetitive motivation, and measures of Trait Anxiety and Neuroticism to assess defensive motives. While the trait measures are assumed to reflect relatively tonic differences in motivational processes, we also manipulate more phasic motivational processes on a trial by trial or block by block basis. This is done by varying the incentive value of targets that appear in various locations. Some targets carry appetitive or positive value in the sense that fast (and correct) responses lead to an increase in points, whereas slow responses result in no loss of points. In contrast, defensive or negative targets lead to a loss of points if the response is slow, and no loss if the response is fast. Each reaction time (RT) is scored as "fast" or "slow" by comparing it to the participant's median RT on the last block of trials. Such a criterion gives rise to roughly equal numbers of fast and slow responses, and thus scores tend to stay close to zero.
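The scoring rule just described can be illustrated with a minimal sketch. It is not the software used in the studies: the function names, the default of 10 points per target, and the treatment of an RT exactly equal to the cutoff as "slow" are assumptions made for illustration, and response correctness is ignored.

```python
from statistics import median


def block_cutoff(previous_block_rts):
    """Cutoff for the current block: the median RT of the previous block."""
    return median(previous_block_rts)


def score_trial(rt_ms, cutoff_ms, valence, points=10):
    """Return the change in points for a single trial.

    valence is "positive" (gain if fast, nothing if slow) or
    "negative" (nothing if fast, loss if slow).
    points=10 is a hypothetical default, not a value fixed by the text above.
    """
    fast = rt_ms < cutoff_ms  # assumption: ties count as slow
    if valence == "positive":
        return points if fast else 0
    if valence == "negative":
        return 0 if fast else -points
    raise ValueError("valence must be 'positive' or 'negative'")


# Hypothetical data: previous-block RTs and four trials in the current block.
cutoff = block_cutoff([300, 320, 340])
trials = [(300, "positive"), (340, "positive"), (300, "negative"), (340, "negative")]
total = sum(score_trial(rt, cutoff, valence) for rt, valence in trials)
```

Because the cutoff is a median, about half of the responses count as fast, so gains on positive targets and losses on negative targets roughly cancel across a block.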
Orienting and motivational valence
An initial set of studies examined the effects of these trait and state variables on attentional orienting (Derryberry & Reed, 1994). The task was a modified spatial orienting task where positive or negative valences were attached to opposing locations. The basic trial display consisted of three outlined boxes, one located in the screen's center and the other two in the left and right visual fields. Incentive values were assigned to the left and right location by placing an arrow pointing up on one side (e.g., midway between the central and left box) and an arrow pointing down on the other (e.g., between the central and right box). The arrow pointing up indicated a positive value in the sense that fast responses to targets in that location would result in a gain of 10 points. The arrow pointing down indicated a negative value in that slow responses to targets in that location would result in a loss of 10 points. Each trial began with a brightening of one of the three boxes. When the central box brightened, targets were equally likely to appear in either peripheral box. However, when one of the peripheral boxes brightened, 80% of the upcoming targets appeared in the cued box (i.e., valid cues) and 20% in the uncued box (i.e., invalid cues). Targets appeared at SOAs of either 100 ms or 500 ms following the cue. They took the form of a small circle appearing in one of the two peripheral boxes and stayed on until the subject responded with a simple key press. Each response was immediately followed by a feedback signal indicating whether the response was fast or slow. Thus, the peripheral cues served to initiate orienting to a location carrying either a positive or negative incentive value, and the extent of such orienting could be examined by comparing RTs following valid and invalid cues across the two SOAs. Two motivational/personality effects were found in these studies.
The first involved individuals scoring below the median in Extraversion and above the median in Neuroticism as measured by the Eysenck Personality Questionnaire. These neurotic introverts, who tend to be high in trait anxiety, showed an attentional bias at the short SOAs when the negative location, where points could be lost, was cued. This finding is consistent with others demonstrating enhanced attention to threat in anxious people (Wells & Matthews, 1994). The second effect involved impulsive individuals (i.e., neurotic extraverts), who showed a bias favoring positive cues, especially on trials following negative feedback. This is consistent with other evidence that extraverts enhance their approach motivation in aversive situations such as those involving punishment (Newman, 1987). These attentional biases did not arise from faster RTs to targets in cued locations, but instead involved slower RTs to targets in uncued locations; i.e., they were present on trials involving invalid but not valid cues. For example, anxious individuals were slower than low anxious persons in detecting targets in a location
opposite to a negative cue. These effects were assessed more precisely by using RTs following central cues (which initiated no pretarget orienting) to estimate the "benefits" of valid peripheral cues and the "costs" of invalid peripheral cues. Anxiety-related differences were found only in the costs data. Their absence in the benefits data suggests that anxiety does not promote a stronger automatic activation of the threatening location, for such a direct activation might be expected to facilitate attentional movement toward the cued location. In addition, the lack of differences in benefits argues against a facilitation of the posterior "move" operation. Rather than suggesting enhanced orienting toward negative incentives, the increased costs found in anxious persons suggest a difficulty in disengaging attention from such stimuli. As discussed in more detail below, the delays in disengagement may arise from an incentive-related suppression of the "disengage" operation and/or an enhancement of the "engage" operation. Although this effect may not arise from the same mechanism involved in other studies of attentional capture, it illustrates one way in which attention can be "captured" or "held" by motivationally significant stimuli. In terms of higher level controls that adjust the settings, these effects are found regardless of whether the incentive cue is actually predictive of the target's location, and thus cognitive expectancies seem to play a minor role. What seems more influential are the tonic motivational processes related to Anxiety and Extraversion, interacting with phasic changes elicited by negative and positive cues. If motivational processes make it difficult to disengage from significant stimuli, one might expect tendencies to get locked into escalating emotional states.
This is often the case for anxious people, who report getting stuck on a threatening stimulus (e.g., an angry look, a hazardous object) along with a consequent increase in anxiety and anxious cognition. However, effective coping with threatening situations often requires attention not only to a dangerous object, but also to the available sources of safety, escape routes, and so on. Anxious people may be at a specific disadvantage because they are unable to take advantage of such information, as a result of which they cannot prepare effective coping responses and fail to experience the relief and reassurance that such options can provide. In contrast, other individuals (and perhaps some anxious people) may be able to override this bias and disengage effectively. One mechanism that may allow such control is the anterior attentional system, especially through its capacity to regulate the posterior system's orienting. Our recent studies have therefore attempted to assess individual differences related to anterior attentional functioning. Because a variety of anterior functions have been proposed, we developed a general scale aimed at assessing overall differences in voluntary "Attentional Control." The scale consists of twenty items assessing the ability to focus attention and avoid distraction (e.g., "When I am reading or studying, I am easily distracted if there are people talking in the same
room"), to shift attention between tasks (e.g., "It is easy for me to read or write while I'm also talking on the phone"), and to flexibly control thought (e.g., "It is hard for me to break from one way of thinking about something and look at it from another point of view"). As can be seen, the items focus on attentional rather than behavioral processes, and are set within neutral contexts that avoid strong defensive or appetitive motivation. The Attentional Control scale is internally consistent (alpha = .85). Its relations to other scales are consistent with the idea that high attentional control helps to constrain Trait Anxiety (r=-.50) and to facilitate positive emotionality related to Extraversion (r=.30). In addition to the orienting effects described below, the Attentional Control scale has been found to predict performance in several studies focusing on response inhibition. Subjects high in attentional control show reduced response interference in a stimulus-response compatibility task, and fast stop times in a modified stop-signal task. Our first set of orienting studies modified the paradigm described above to examine differences between anxious people with good or poor attentional control (Derryberry & Reed, 2001). Rather than signaling the opportunity to gain or lose points, the pretarget cue signaled the probable outcome of the response. "Threat" cues (an arrow pointing down) informed subjects that targets appearing in that location would be difficult and result in a slow response 75% of the time. "Safe" cues (an arrow pointing up) indicated that targets in the cued location would be easy and result in a fast response 75% of the time. Targets appearing on the uncued side of the screen always carried a probable outcome opposite to those on the cued side.
Thus, if an arrow pointing down appeared in the LVF, targets on the left would be difficult whereas those on the right would be easy, and subjects should view the left as the dangerous location and the right as the safe location. Targets were presented either 250 or 500 ms after the cue, and a central feedback signal was presented immediately after the response. Feedback following fast responses was signaled by an arrow pointing up and slow responses by an arrow pointing down, identical in form and color to the pretarget cues. Cutoffs for "fast" and "slow" responses were again based on the median RT from the previous block, but adjusted trial-by-trial in terms of the target's difficulty. Groups were based on median splits on the State Trait Anxiety Inventory and the Attentional Control Scale. When the target followed a threatening cue by 250 ms, all anxious subjects were slower than low anxious subjects in responding to targets in the uncued location. As can be seen in Figure 1, anxiety has no effect given safe cues, but given invalid threat cues (and thus targets in the uncued safe location), anxious subjects were delayed relative to low anxious subjects. This is consistent with our earlier findings, again indicating that anxious people are slow in disengaging from potentially threatening locations. At the 500 ms SOAs, however, individual
[Figure 1 plot: RT (y-axis, 280 to 340) by cue validity (V, I) for Threat and Safe cue valence, with separate lines for HA and LA.]
Figure 1. Anxiety x Validity interactions at 250 ms SOAs for threatening (i.e., hard) and safe (i.e., easy) cues. V=valid cue; I=invalid cue; HA=high Trait Anxiety; LA=low Trait Anxiety (based on median split).
differences in Attentional Control became evident. Although all subjects showed a tendency to shift from the threatening to the safe location, such disengagement was least effective in anxious people with poor control. In contrast, anxious subjects with good control shifted away more effectively, equaling the performance of the low anxious groups. These interactions are illustrated in Figure 2. The anxiety-related bias was thus limited to anxious people with poor control at long SOAs; anxious people with good control were able to shift from a cued threatening location to respond to a target at a safe location. This interaction was significant in multiple regression as well as analyses of variance, indicating that it is not a spurious effect arising from correlated personality variables. These findings are important in suggesting that our Attentional Control scale does tap individual differences related to executive attentional functions (i.e., the control of posterior orienting). Two aspects of the data are consistent with an anterior function. First, the more reactive influence of trait anxiety was evident at 250 ms, whereas the putative anterior influence became apparent at 500 ms. This is consistent with the idea that anterior intervention should take longer due to the greater time required for frontal processing. Second, the effect appears to involve a voluntary form of control. Specifically, the tendency to shift from the cued to the uncued location was stronger when the cued location was threatening than safe,
[Figure 2 plot: RT (y-axis, 280 to 340) by cue validity (V, I) for Low and High Attentional Control, with separate lines for HA and LA.]
Figure 2. Anxiety x Validity interactions at 500 ms SOAs for subjects low (left) and high (right) in Attentional Control (based on median split). V=valid cue; I=invalid cue; HA=high Trait Anxiety; LA=low Trait Anxiety (based on median split).
a tendency most easily interpreted as a voluntary or strategic shift from the harder to the easier location. The underlying mechanism will be discussed in more detail later, but for now, the simplest account is that the anterior system sends a signal to the posterior system allowing attention to disengage from the cued location. This signal may be in some way stronger or faster in individuals with good attentional control. Its influence may not be apparent in low anxious subjects, who have no underlying difficulty in disengaging. But given the early impaired disengagement arising from anxiety, the more effortful influence can become manifest. Regarding the underlying motivational mechanisms, the simplest model would be one in which the functions of responding to threat and safety were carried out by a single defensive motivational system. If this were the case, then the detection of threat should occur early and result in a biasing of posterior orienting (e.g., enhancing engagement), as demonstrated at the short SOAs. The safety detection function would occur later, and lead to a recruitment of anterior functions to regulate orienting (e.g., suppressing engagement). In the present task, these defensive functions are presumably preset based on the degree of activation within the person's motivational system, though the relevant valence and location assignments must be reset on a trial by trial basis. Alternatively, it is also possible that the threat and safety functions are carried out by different motivational systems.
In a model such as Gray's (1987), for example, the "behavioral inhibition" and "behavioral activation" systems might work together in responding to threat and safety signals. The defensive system would function primarily through the posterior system and the safety system through the anterior system. Although the biasing functions could again be preset, more complex interactions between the two systems (e.g., reciprocal inhibition) would need to be considered. For now, the most important point concerns the implications for temperament and everyday performance. The ability to disengage and take advantage of other information is pivotal in coping with threat. When attention is strongly captured and held by a threatening input, coping options become limited and anxiety tends to increase. Often, the only options available are simple endurance or avoidance of the situation. Neither of these is a particularly good option, for they limit the person's ability to learn from the situation. To the extent that the person can disengage, however, they can consider available sources of safety and alternative response options. If effectively carried out, these options should increase relief and decrease anxiety, thereby reducing the likelihood of maladaptive avoidance.
Orienting and motivational value
Another common type of conflict involves potential stimuli that vary in their value or importance rather than in their positive or negative valence. Specifically, we often face situations where we must select among stimuli with differing values, with some stimuli being more rewarding or punishing than others. Effective responding in such situations usually requires an effective distribution of attention between all of the stimuli that supports their evaluation. Otherwise, one could easily respond to a relatively trivial rather than important stimulus.
This again seems like the type of situation in which more reactive motivational and voluntary attentional processes will come into play. Such processing has been examined in recent studies that held the target's valence constant across locations but varied the point value of potential targets (Derryberry & Reed, in preparation). Subjects alternated between positive and negative blocks of trials, on which they could gain points (for fast responses) and lose points (for slow responses), respectively. Within each block, each trial began with the appearance of two numbers that signaled the number of points that could be gained or lost if the target appeared in that location. One number always had higher value than the other (e.g., 8 vs 4), thereby defining its location as strategically the more important. Five hundred milliseconds after the numbers appeared, a cue was presented by turning one number red. This cue signaled the probable location of the target, which appeared adjacent to the red number 75% of the time. Detection targets followed the location cues at SOAs of 250 or 500 ms, and each response was
immediately followed by a feedback signal. The cutoffs for fast or slow responses were again based on the median RT of the previous block of trials, and were equal for targets appearing in either location. This paradigm sets up a conflict between the two potential target locations, which should strategically be resolved in favor of the higher value location. However, a more interesting conflict arises between the point value of the location and the expected probability of a target. On the one hand, subjects should be motivated to attend to the location carrying the higher value, because it carries the possibility of gaining or losing a greater number of points. But at the same time, they should also be motivated to attend to the cued location, because it is more likely to be targeted. One way of resolving this conflict would be to adjust the cued orienting based on the value of the cued location. In other words, a reasonable strategy would be to enhance orienting when the higher value location is cued and to attenuate orienting when the lower value location is cued. We expected that subjects with good attentional control would be better at making such trial by trial adjustments. This prediction was based on the notion that efficient anterior attentional functioning should provide greater flexibility in controlling posterior orienting. In addition, we expected that motivational biases related to traits of anxiety and extraversion would render some individuals particularly vulnerable to high value negative or positive targets. The results showed that subjects were generally capable of making strategic adjustments in orienting. Most showed a larger orienting effect (i.e., the difference between targets at cued compared to uncued locations) when the higher value location was cued. Along with its strategic nature, the fact that this effect increased from the 250 ms to 500 ms SOA is consistent with an anterior regulation of posterior orienting.
Also consistent with our predictions, the strategic adjustment depended on Attentional Control. As seen in Figure 3, which graphs the data for 500 ms SOAs, poor attenders show very little if any strategic adjustment. In comparison, good attenders show stronger orienting to high value cues and weaker orienting to low value cues. To further explore these effects, we performed multiple regression analyses predicting RTs separately for trials involving valid and invalid cues. These more precise analyses indicated that the difference between good and poor attenders involved invalid trials given both low and high value cues; that is, good attenders were relatively fast to shift from a low value cue to a target in the high value location, and slow to shift from a high value cue to a low value target (see Figure 3). The involvement of uncued targets is similar to our earlier findings, and again suggests a modulation of the ease with which attention is disengaged from the cued location. In this case, good attenders are delayed in disengaging from high value locations and fast in shifting from low value locations. This is a reasonable strategy in this task, allowing good attenders to score more points than poor attenders.
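The orienting effect and its strategic adjustment, as described above, can be sketched in a few lines. This is an illustrative computation only, not the regression analysis reported in the text; the function names, the trial tuple layout, and the RT values are hypothetical.

```python
from statistics import mean


def orienting_effects(trials):
    """Mean invalid RT minus mean valid RT, separately for each cue value.

    trials: iterable of (cue_value, validity, rt_ms) tuples, where cue_value
    is "high" or "low" and validity is "valid" or "invalid".
    """
    cells = {}
    for cue_value, validity, rt_ms in trials:
        cells.setdefault((cue_value, validity), []).append(rt_ms)
    return {
        value: mean(cells[(value, "invalid")]) - mean(cells[(value, "valid")])
        for value in ("high", "low")
    }


def strategic_adjustment(effects):
    """How much larger the orienting effect is for high than for low value cues."""
    return effects["high"] - effects["low"]


# Hypothetical RTs (ms) for one subject at a single SOA.
example_trials = [
    ("high", "valid", 250), ("high", "valid", 260),
    ("high", "invalid", 300), ("high", "invalid", 310),
    ("low", "valid", 265), ("low", "valid", 275),
    ("low", "invalid", 280), ("low", "invalid", 290),
]
effects = orienting_effects(example_trials)
adjustment = strategic_adjustment(effects)
```

On this reading, a subject who orients more strongly to high value cues yields a positive adjustment, while a subject who makes little strategic adjustment yields a value near zero.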
Also of interest was a separate, noninteracting effect of Extraversion. Both introverts and extraverts showed stronger orienting given high value cues, but the adjustment was stronger in extraverts. The Extraversion influence appears similar to that of Attentional Control, but regression analyses suggest that a different mechanism may be involved. While the Attentional Control effects arose on invalid trials, the difference between introverts and extraverts was limited to valid trials; that is, extraverts were faster than introverts in responding to targets in cued high value locations. More research will of course be needed to differentiate the extraversion and attentional control effects. At this point, however, we suspect that the extraversion effect may involve a facilitation of response processing given high value cues. This interpretation is based on a follow-up study in which numbers were presented as targets adjacent to the cue numbers, and subjects responded only if the two numbers matched. Extraverts again showed faster responses to targets adjacent to high value cues, but they also made more errors in responding to nonmatching targets adjacent to high value cues. This speed-accuracy tradeoff suggests facilitated responding to potential targets in high value locations.
[Figure 3 plot: RT (y-axis, 240 to 300) by cue validity (V, I) for Low and High Attentional Control, with separate lines for Low and High value cues.]
Figure 3. Cue Value x Validity interactions at 500 ms SOAs for subjects low (left) and high (right) in Attentional Control (based on median split). V=valid cue; I=invalid cue; Low = low value cue; High = high value cue.
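The median-split grouping referred to in the figure captions can be sketched as follows. The function names, participant identifiers, and scores are hypothetical, and assigning scores equal to the median to the "low" group is an assumption, since the treatment of ties is not specified.

```python
from statistics import median


def median_split(scores):
    """Label each participant "high" or "low" relative to the sample median.

    scores: dict mapping participant id to a scale score.
    Assumption: scores equal to the median are labeled "low".
    """
    cut = median(scores.values())
    return {pid: ("high" if score > cut else "low") for pid, score in scores.items()}


def cross_groups(anxiety_scores, control_scores):
    """Combine two median splits into (anxiety, attentional control) groups."""
    anxiety = median_split(anxiety_scores)
    control = median_split(control_scores)
    return {pid: (anxiety[pid], control[pid]) for pid in anxiety_scores}


# Hypothetical scale scores for four participants.
anxiety_scores = {"p1": 30, "p2": 40, "p3": 50, "p4": 60}
control_scores = {"p1": 45, "p2": 60, "p3": 40, "p4": 70}
groups = cross_groups(anxiety_scores, control_scores)
```

A split of this kind discards the continuous information in the scale scores, which is one reason the interactions reported above were also tested with multiple regression.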
The extraversion effect is also of interest in that it does not fit neatly within the types of motivational processes typically related to Extraversion. Most models suggest that extraversion involves appetitive processes in response to rewarding or relieving cues. However, Depue and Collins (1999) describe a "behavioral
facilitation system" thought to underlie extraversion. A key component of this system is a "motive circuit" focused on the nucleus accumbens that computes the motivational value of incentive stimuli and adjusts the intensity of the response accordingly. It is possible that such circuitry gives extraverts an advantage in computing the relative value of contextual stimuli and/or facilitating responses.
Summary and Conclusions
Our studies demonstrate several types of motivational processes that contribute to the control of spatial orienting. The studies featuring dangerous versus safe targets illustrate the role of stimulus valence (negative versus positive), while the studies with high and low value targets illustrate the role of stimulus value. The findings are consistent with earlier studies demonstrating attentional biases favoring motivationally significant stimuli, as well as physiological evidence linking motivational and attentional circuits. In general, it makes considerable sense that motivational processing has evolved to promote adaptive behavior, which should clearly benefit from the use of attention. Somewhat less intuitive is the nature of the motivational effect. Although our findings are generally consistent with the idea that attention is captured by significant stimuli, this "motivational capture" does not seem to arise from an attraction of attention to a particular location. Rather than enhancing the "attracting power" of potential locations, motivational processes appear to regulate the "holding power" of such locations once they are engaged. This may seem counterintuitive in that we often think of motivation as promoting an active search for specific goals. However, a mechanism specific to regulating holding power would be adaptive in allowing initial attentional movement to remain free of bias so that it can move effectively to all locations. Once these locations are engaged, their motivational relevance can be evaluated more thoroughly, and the holding power can be adjusted accordingly. Rather than influencing the frequency with which significant objects are attended, such a mechanism would influence the duration of attention, as first suggested by Wise (1987) in regard to hunger and food objects.
More specifically, an increase in holding power could arise from a motivational enhancement of Posner's engage operation or an attenuation of the disengage operation. We have recently completed several preliminary studies suggesting that the primary influence may involve the engage operation. Peripheral cues were presented that predicted the location of the upcoming target (on either the cued or uncued side), thereby motivating subjects to engage or disengage the location of the valenced cue. Anxious subjects showed delays in shifting from the negative cues, but only when that location was engaged. When the negative cue informed subjects to disengage and shift to the other side of the screen, anxious and low anxious subjects were equally fast to disengage. If the negative cue functioned
only to suppress the disengage operation, such voluntary disengagement should have been delayed. Clearly, more research is required to isolate the focus of the motivational effect. Nevertheless, a regulation specific to the engage operation would be adaptive in several ways. First, the facilitated processing that results from the enhanced engagement should promote more effective evaluation and response selection. Second, by making it more difficult to inadvertently disengage, the enhanced engagement may protect the individual from unintentional disengagement and distraction. When dealing with imminent danger, for example, distraction by irrelevant stimuli can interfere with effective escape or avoidance responses. Third, if the disengage operation is not directly suppressed, it can remain relatively open to more voluntary forms of control. For example, an anxious person may engage a threatening stimulus quite strongly, but may still be able to voluntarily activate the disengage operation in order to shift to a source of safety. This of course does not rule out the possibility that voluntary disengagement may also be promoted by an inhibition of the engage function. The results are also consistent with temperament models that emphasize individual differences in motivational and attentional processes (e.g., Rothbart et al., 1994). The dimension of trait anxiety appears to bias attention in favor of threatening information, as might be expected given high tonic activity within a defensive motivational system (e.g., Gray's behavioral inhibition system). Such a bias may be adaptive in the sense that it facilitates information that is clearly important. However, if it makes it difficult for the anxious person to disengage, then they will be at a disadvantage when processing other crucial information relevant to safety and coping options.
Not only will this bias lead to increasing anxiety within a situation, but across time, it may lead to the development of perceptual and conceptual systems that emphasize threat at the expense of safety. Thus, anxious individuals may come to construe the world as a dangerous place and themselves as vulnerable (Derryberry & Reed, 1996). In contrast, our studies suggest that extraversion is associated with motivational circuitry that assesses the relative value of both positive and negative information (e.g., Depue and Collins' behavioral facilitation system). In several models, extraversion has been related to motivational systems that respond to positive incentives by promoting approach behavior (e.g., Gray's behavioral activation system). Our earlier studies found evidence for attentional biases favoring reward in tasks where positive and negative incentives varied randomly within a block of trials (Derryberry & Reed, 1994). However, the present studies blocked the positive and negative incentives, and extraverts showed no biases favoring positive cues. Instead, they were relatively fast given targets in high value locations, regardless of whether points could be gained or lost. In addition, we suspect that extraverts' bias may involve response-related rather than more purely
attentional processes. As mentioned earlier, this sort of value extraction processing is consistent with Depue and Collins' (1999) behavioral facilitation system, which might be viewed as functioning under both appetitive and defensive conditions to promote strong approach behavior given reward and active avoidance behavior given danger. Such rapid approach behaviors will give extraverts an advantage in situations that offer targets of varying value and require rapid responses. For example, extraverts may excel in social contexts in part because they recognize opportunities and respond quickly (e.g., Matthews, 1997). But at the same time, there may be disadvantages related to approach responses that are in some way inappropriate. Perhaps of greatest interest in the present context are the individual differences in attentional control. Anxious subjects with good control were better able than those with poor control to disengage from a threatening location and respond to a target in a safe location. In addition, all subjects with good attentional control were better able to adjust orienting in favor of high value targets. Such flexible and adaptive control is consistent with one of the functions of Posner's anterior attentional system, namely the control of posterior orienting. Assuming that reactive motivational influences enhance posterior engagement, the presumably voluntary functions mediated through the anterior system may operate in several ways. The anxiety interaction in the valence studies may involve an inhibition of the engage mechanism that allows faster disengagement to safe locations. The general effect in the value study may involve an enhancement of the engage operation leading to slower disengagement to low value locations. In addition, the anterior system may modulate the disengage operation more directly, or even the perceptual input to the disengage operation.
But whatever mechanisms are involved, the extent of "capture" by motivational stimuli appears to depend on individual differences in the ability to employ strategic attentional processing as well as motivation. We have also run additional studies examining other anterior functions related to response processing. In a stimulus-response compatibility task, subjects responded to arrows pointed right and left appearing in the right and left visual field. Anxious subjects with good attentional control were better able than those with poor control to suppress the dominant response tendency to respond with the hand corresponding to the target's irrelevant spatial location. In a stop-signal task, subjects high in impulsivity were slower to stop their responses, but only if they were also low in attentional control. These findings suggest that the Attentional Control scale taps individual differences in response inhibition as well as orienting functions attributed to frontal attentional systems. More research will be needed to explore the attentional differences related to other executive functions. A particularly important anterior function is that of inhibiting dominant conceptual associations. Individual differences in such a
function could be crucial to personality, which often involves relatively automatic and chronic ways of thinking. For example, trait anxiety involves dominant appraisal and attributional patterns that tend to exacerbate threat and undermine perceived control (Mineka & Zinbarg, 1996), self-evaluative processing that emphasizes punishment for shortcomings (Higgins, 1996), and general tendencies to focus negatively on the self (Matthews, 1997; Wells & Matthews, 1994). Anxious people report that they often get caught up in worrisome and ruminative thought, leading to considerable interference in their daily lives. If the person can inhibit such dominant thought tendencies, they should be able to constrain the associated feelings of anxiety and, one hopes, arrive at a more optimistic or controllable view of the world. It is interesting that recent therapeutic approaches have recognized the role of attentional differences, and various training techniques are being developed (e.g., Wells & Matthews, 1994). Also interesting is the fact that many pharmacological treatments, such as the monoaminergic anti-depressants, function in part by facilitating neurochemical systems involved in attention.
Additional executive functions are those involved in working memory. The notion that motivation functions to enhance the "holding power" of significant information is generally consistent with working memory models that emphasize the maintenance of task goals in the face of distraction. Conway and Kane (this volume) review a series of studies indicating that individual differences in working memory capacity are related to differences in attentional control. It makes good sense that motivational and attentional processes interact in maintaining working memory, rendering individuals more or less vulnerable to distraction and capture.
A simple example would be students prone to test anxiety, who often report difficulty in staying on task due to their distraction by thoughts about failure.
In conclusion, this chapter has provided a multidisciplinary perspective on attentional capture that emphasizes the influence of reactive motivational and voluntary attentional processes. The relation between these processes and others related to abrupt onsets and novelty marks an important area for future research. What all of these processes have in common, however, is a concern with the potential importance of incoming information. By investigating the various ways in which importance arises, as well as the ways it varies across individuals, we should be in a better position for understanding attention and its control.
References
Alexander, G. M., Swerdloff, R. S., Wang, C. W., & Davidson, T. (1997). Androgen-behavior correlations in hypogonadal men and eugonadal men: I. Mood and response to auditory sexual stimuli. Hormones & Behavior, 31, 110-119.
Channon, S., & Hayward, A. (1990). The effect of short-term fasting on processing of food cues in normal subjects. International Journal of Eating Disorders, 9, 447-452.
Depue, R. A., & Collins, P. F. (1999). Neurobiology of the structure of personality: Dopamine, facilitation of incentive motivation, and extraversion. Behavioral and Brain Sciences, 22, 521-555.
Derryberry, D., & Reed, M. A. (1994). Temperament and attention: Orienting toward and away from positive and negative signals. Journal of Personality and Social Psychology, 66, 1128-1139.
Derryberry, D., & Reed, M. A. (1996). Regulatory processes and the development of cognitive representations. Development and Psychopathology, 8, 215-234.
Derryberry, D., & Reed, M. A. (1998). Anxiety and attentional focusing: Trait, state and hemispheric influences. Personality and Individual Differences, 25, 745-761.
Derryberry, D., & Rothbart, M. K. (1997). Reactive and effortful processes in the organization of temperament. Development and Psychopathology, 9, 633-652.
Derryberry, D., & Tucker, D. M. (1992). Neural mechanisms of emotion. Journal of Consulting and Clinical Psychology, 60, 329-338.
Derryberry, D., & Tucker, D. M. (1993). Motivating the focus of attention. In P. Niedenthal & S. Kitayama (Eds.), The heart's eye: Emotional influences in perception and attention, (pp. 170-196). San Diego, CA: Academic Press.
Deutsch, J. A. (1960). The structural basis of behavior. Chicago: University of Chicago Press.
Eisenberg, N., Fabes, R. A., Nyman, M., Bernzweig, J., & Pinuelas, A. (1994). The relations of emotionality and regulation to children's anger-related reactions. Child Development, 65, 109-128.
Eysenck, H. J. (1967). The biological basis of personality. Springfield, Illinois: Thomas.
Folk, C. L., & Remington, R. (1999). Can new objects override attentional control settings? Perception & Psychophysics, 61, 727-739.
Fowles, D. C. (1994). A motivational theory of psychopathology. In W. G. Spaulding (Ed.), Nebraska symposium on motivation, Vol. 41: Integrative views of motivation, cognition, and emotion, (pp. 181-238). Lincoln, Nebraska: University of Nebraska Press.
Gallistel, C. R. (1980). The organization of action: A new synthesis. Hillsdale, New Jersey: Erlbaum.
Grafman, J., Holyoak, K. J., & Boller, F. (Eds.). (1995). Annals of the New York Academy of Sciences, Volume 769. Structure and functions of the human prefrontal cortex. New York: New York Academy of Sciences.
Gray, J. A. (1987). Perspectives on anxiety and impulsivity: A commentary. Journal of Research in Personality, 21, 493-509.
Gray, J. A. (1994). Framework for a taxonomy of psychiatric disorder. In S. H. M. van Goozen, N. E. Van de Poll, & J. A. Sergeant (Eds.), Emotions: Essays on emotion theory, (pp. 29-60). Hillsdale, NJ: Erlbaum.
Gray, J. A., & McNaughton, N. (1996). The neuropsychology of anxiety: Reprise. In D. A. Hope (Ed.), Nebraska Symposium on Motivation: Perspectives on anxiety, panic, and fear. Volume 43, (pp. 61-134). Lincoln, Nebraska: University of Nebraska Press.
Higgins, E. T. (1996). Ideals, oughts, and regulatory focus: Affect and motivation from distinct pains and pleasures. In P. M. Gollwitzer & J. A. Bargh (Eds.), The psychology of action: Linking cognition and motivation to behavior, (pp. 91-114). New York: Guilford.
Hockey, R. (1979). Stress and the cognitive components of skilled performance. In V. Hamilton & D. M. Warburton (Eds.), Human stress and cognition: An information processing approach, (pp. 141-177). New York: Wiley.
Kochanska, G., Murray, K., Jacques, T. Y., Koenig, A. L., & Vandegeest, K. A. (1996). Inhibitory control in young children and its role in emerging internalization. Child Development, 67, 490-507.
MacLeod, C., & Mathews, A. (1988). Anxiety and the allocation of attention to threat. Quarterly Journal of Experimental Psychology, 40, 653-670.
Matthews, G. (1997). Extraversion, emotion and performance: A cognitive-adaptive model. In G. Matthews (Ed.), Cognitive science perspectives on personality and emotion, (pp. 399-442). Amsterdam: Elsevier.
Mesulam, M. M. (1981). A cortical network for directed attention and unilateral neglect. Annals of Neurology, 10, 309-325.
Metcalfe, J., & Mischel, W. (1999). A hot/cool-system analysis of delay of gratification: Dynamics of willpower. Psychological Review, 106, 3-19.
Mineka, S., & Zinbarg, R. (1996). Conditioning and ethological models of anxiety disorders: Stress-in-dynamic-context anxiety models. In D. A. Hope (Ed.), Nebraska Symposium on Motivation, Volume 43: Perspectives on anxiety, panic, and fear, (pp. 133-210). Lincoln, Nebraska: University of Nebraska Press.
Nauta, W. J. H. (1971). The problem of the frontal lobe: A reinterpretation. Journal of Psychiatric Research, 8, 167-187.
Newman, J. P. (1987). Reaction to punishment in extraverts and psychopaths: Implications for the impulsive behavior of disinhibited individuals. Journal of Research in Personality, 21, 464-480.
Panksepp, J. (1998). Affective Neuroscience. New York: Oxford.
Posner, M. I., & DiGirolamo, G. J. (1998). Executive attention: conflict, target detection and cognitive control. In R. Parasuraman (Ed.), The attentive brain, (pp. 401-423). Cambridge, MA: MIT Press.
Posner, M. I., & Raichle, M. E. (1994). Images of mind. New York: Scientific American Library.
Posner, M. I., & Rothbart, M. K. (1998a). Attention, self-regulation and consciousness. Philosophical Transactions of the Royal Society of London B, 353, 1915-1927.
Posner, M. I., & Rothbart, M. K. (1998b). Attention, self-regulation and consciousness. Philosophical Transactions of the Royal Society of London B, 353, 1915-1927.
Rothbart, M. K., Ahadi, S. A., & Hershey, K. L. (1994). Temperament and social behavior in childhood. Merrill-Palmer Quarterly, 40, 21-39.
Rothbart, M. K., Derryberry, D., & Posner, M. I. (1994). A psychobiological approach to the development of temperament. In J. E. Bates & T. D. Wachs (Eds.), Temperament: Individual differences at the interface of biology and behavior, (pp. 83-116). Washington, D. C.: American Psychological Association.
Rothbart, M. K., Ziaie, H., & O'Boyle, C. (1992). Self-regulation and emotion in infancy. In N. Eisenberg & R. A. Fabes (Eds.), Emotion and self-regulation in early development: New directions in child development, (pp. 7-24). San Francisco: Jossey-Bass.
Stuss, D. T., Shallice, T., Alexander, M. P., & Picton, T. W. (1995). A multidisciplinary approach to anterior attentional functions. In J. Grafman, K. J. Holyoak, & F. Boller (Eds.), Annals of the New York Academy of Sciences, Volume 769. Structure and functions of the human prefrontal cortex. New York: The New York Academy of Sciences.
Tucker, D. M. (1992). Developing emotions and cortical networks. In M. Gunnar & C. A. Nelson (Eds.), Minnesota Symposium on Child Psychology. Vol. 24. Developmental behavioral neuroscience, (pp. 75-128). Hillsdale, N. J.: Erlbaum.
Tucker, D. M., & Williamson, P. A. (1984). Asymmetric neural control systems in human self-regulation. Psychological Review, 91, 185-215.
Wells, A., & Matthews, G. (1994). Attention and emotion: A clinical perspective. Hillsdale, NJ: Erlbaum.
Wise, R. A. (1987). Sensorimotor modulation and the variable action pattern (VAP): Toward a noncircular definition of drive and motivation. Psychobiology, 15, 7-20.
Yantis, S., & Egeth, H. E. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © 2001 Elsevier Science B.V. All rights reserved.
14
Capacity, Control and Conflict: An Individual Differences Perspective on Attentional Capture
Andrew R. A. Conway and Michael J. Kane
Webster's dictionary defines capture as "the act of catching or gaining control by force, stratagem, or guile." A clear example of capture is a military coup, such as the North Vietnamese takeover of South Vietnam in 1975. But what does it mean to capture attention? The phrase "attentional capture" suggests that some "thing" is being captured and that control has been displaced. It also suggests that the "thing" is limited in supply. After all, capture by force would not be necessary if there were an unlimited supply of the "thing." Central then to the study of attentional capture are the issues of capacity, control and conflict.
Capacity, control and conflict (or interference) have been fundamental issues in the study of attention and memory since the cognitive revolution of the 1950s. In that time, most information-processing theories have incorporated a limited capacity system and a mechanism or process of control, and implicit is the notion that conflict is resolved by the system. For example, Broadbent's (1958) model of selective attention posited an "early" filter in order to constrain the amount of information that passed to the limited capacity channel that sat just beyond the selective filter. Thus, there was a limited capacity channel and a filter that controlled the flow of information and in so doing resolved perceptual conflict or interference. Another example, from the memory literature, is Atkinson and Shiffrin's (1968) model of memory, in which there was a limited capacity short-term store and ill-specified control processes that managed the flow of information and resolved interference. Despite these influential cognitive models and half a century of experimental investigations of the processes and mechanisms involved in memory and attention, the precise relationship between capacity and control has yet to be understood.
In this chapter we present an individual differences perspective on attentional capture.
We will suggest that individuals with greater working-memory capacity (WMC) exhibit greater attentional control than do individuals with lesser WMC. In doing so, we will present a theory of memory and attention, according to which: (1) working memory is a system responsible for the maintenance of goal-relevant information in the face of concurrent processing, (2) individual differences in WMC correspond to the ability to maintain goal-relevant information, especially in contexts providing sources of competition or interference with that goal, and (3) this maintenance ability determines susceptibility to attentional capture. After a
brief review of the development of the concept of WMC, we will present evidence from a series of experiments investigating individual differences in WMC and attentional control across a variety of tasks. Our approach to exploring attentional control and capture is different from most discussed in this book, particularly insofar as our research represents a blend of experimental and differential psychology. Rigorous experimental method is necessary to explore the subtle complexities of attentional control and we therefore make use of classic experimental paradigms such as dichotic listening and the Stroop (1935) color-word task. Yet our objective is also to understand individual differences in WMC and cognitive ability more broadly, and therefore we apply ideas and statistical procedures borrowed from psychometrics. We suggest that combining these two "disciplines of scientific psychology" (Cronbach, 1957) will result in a richer understanding of capacity and control processes in attention and memory. Such a dual approach also lends itself to further exploring the increasingly evident importance of WMC and attention control to complex cognitive capacities such as reasoning and intelligence (e.g., Carpenter, Just, & Shell, 1990; Dempster, 1991; Engle, Tuholski, Laughlin & Conway, 1999; Just & Carpenter, 1992; Kyllonen & Christal, 1990; Miyake, Friedman, Emerson, Witzki, Howerter, & Wagner, 2000). Thus, while some chapters in this book present a detailed analysis of attentional capture within a single paradigm, our broad interests in attention control lead us to explore vulnerability to capture across a variety of experimental tasks. Before we discuss the empirical evidence linking WMC to attentional control, it is necessary to review the development of the concept of working memory and the development of our particular theory of individual differences in WMC. 
We therefore begin with a brief review of working memory research, paying close attention to notions of capacity and control.
Working Memory and Working Memory Capacity
In 1974, Baddeley and Hitch began their now seminal chapter on working memory with the statement, "Despite more than a decade of intensive research on the topic of short-term memory (STM), we still know virtually nothing about its role in normal human information processing" (p. 47). One of the motivating forces behind this statement was the collection of findings suggesting that short-term memory capacity is not a good predictor of more complex cognitive behavior, such as reading comprehension or problem solving. This was inconsistent with the modal model of memory (e.g., Atkinson & Shiffrin, 1968), which conceived of the short-term store as the gateway to the information processing system. The notion that the capacity of the gateway had little impact on the general performance of the system was clearly problematic (see also Crowder, 1982). Baddeley and Hitch (1974) argued that measures of short-term memory capacity, such as the digit span task, are not predictive of more general cognitive behavior because such tasks only tap a passive storage buffer. Baddeley and Hitch
argued that cognitive behavior is typically more dynamic than static, with maintenance of active memories required in the face of concurrent processing. They therefore proposed a system called "working memory," which is responsible for active maintenance of information in the service of more complex cognition. Their structural model of working memory consisted of a central executive and two storage buffers, the phonological loop for verbal information and the visuo-spatial sketchpad for spatial information. The role of the central executive was not clearly defined initially but it was supposed that the executive was responsible for coordination, integration, and control processes.
Measurement and theories of working memory capacity
Baddeley and Hitch (1974) further proposed that working memory is a limited capacity system and that this capacity constrains cognitive performance. An open question following the publication of their chapter was how to assess this capacity, given that simple span tasks such as digit span tapped only the storage aspect of the working memory system. A task was needed that not only required storage but also concurrent information processing. Indeed, it was several years before Daneman and Carpenter (1980) introduced the first task designed to measure WMC. Their reading span task required subjects to read sentences and remember the last word of each sentence for later recall. The number of sentences presented before each recall cue varied, typically from 2 to 6. The largest such series for which a subject could read each sentence and recall all the sentence-final words was scored as that subject's working-memory span or WMC. Notice that this task requires not only a storage function (maintaining the sentence-final words) but also the simultaneous reading of each sentence. Such simultaneous processing is the hallmark of the working memory system, as defined by Baddeley and Hitch (1974), and it is now incorporated into most, if not all, measures of WMC.
In contrast to prior research attempting to link measures of immediate memory to higher-order cognition, Daneman and Carpenter (1980) found that the reading span measure predicted Verbal Scholastic Aptitude Test (VSAT) scores, and it did so much better than did a simple word span task. 1 At first glance, the fact that the reading span task predicts the VSAT may not be surprising. After all, the processing component of the reading span task is reading sentences! Thus, one might argue that better readers have more time or resources to devote to the storage component of the task and therefore score higher on the span task.
According to such an argument, a skill or ability that is specific to the processing component of the span task accounts for its relation to the VSAT. An alternative account, however, is that both the span task and the VSAT tap a general process or ability that is not specific to the processing component of the span task. We refer to these two alternatives as the domain-specific and the domain-general views, respectively. A number of findings support the domain-general view of the relation between span measures and measures of more complex cognition. For example, Turner and Engle (1989) developed the operation span task, which is similar to the
reading span task, except that instead of reading sentences, the subject is required to solve mathematical operations. Thus, a math problem and a word are presented together (e.g., IS (6+4) / 2 = 5 ? TREE) and the subject must solve the math problem and attempt to remember the word for later recall. The number of operation-word pairs per series varies and working-memory span or capacity is defined as the largest series for which the subject can correctly solve the math problems and remember all the words. Turner and Engle found that the reading-span and operation-span tasks correlate equivalently with the VSAT, and furthermore, that the two measures account for the same variance in VSAT. Such findings are clearly not consistent with a strong version of the domain-specific view outlined above, for the operation span task does not involve reading comprehension per se.
Further support for the domain-general view comes from a series of experiments by Engle, Cantor, and Carullo (1992). Engle et al. had subjects perform both operation span and reading span, and they recorded a number of dependent measures from each, including the time spent viewing each portion of the processing component (i.e., time spent reading each word, time spent viewing each component of the operation). They also measured the time to read sentences and solve operations without the added requirement of recall. Partial correlations revealed that none of these task-specific (or potentially strategic) measures accounted for the correlation between span and VSAT. That is, the relation between WMC and higher-order verbal ability was not due to specific skills or strategies operating within the working-memory span tasks themselves (see also Conway & Engle, 1996).
In 1992, two influential theories of a domain-general WMC were published (Engle, Cantor & Carullo, 1992; Just & Carpenter, 1992).
These general capacity theories proposed that language comprehension and other complex cognitive tasks are constrained by the amount of activation available to the cognitive system. Moreover, WMC represents that total amount of activation. Cantor and Engle (1993) provided key empirical support for these general capacity theories of working memory. Applying the logic of spreading-activation models of cognition (e.g., Anderson, 1983), Cantor and Engle reasoned that if WMC were equivalent to the total amount of activation available to the cognitive system, then individuals who differ in WMC should also differ in tasks that tap the spread of this general activation. Cantor and Engle tested this prediction by examining individual differences in the fan effect (Anderson, 1974). The fan effect is demonstrated in experiments that require subjects to memorize a large number of sentences and then verify their memory of the sentences. Each subject memorizes a number of sentences that take the form, "The person is in the place" (e.g., "The lawyer is in the park"). The number of locations associated with each person typically varies from 1 to 6, and this is referred to as the "location fan." Also, the number of people associated with each location is varied and is referred to as the "person fan." In the test phase, sentences are presented individually and the subject must verify, as quickly and accurately as possible, whether the sentence had been studied or not. The fan effect refers to the finding
that verification time and accuracy are a function of both location-fan and person-fan, with reaction times and error rates increasing with fan size (there are exceptions, however; for updated reviews of the fan paradigm see Anderson & Reder, 1999; Radvansky, 1999). To test whether individuals with lesser WMC would reveal a more dramatic fan effect than would individuals with greater WMC, Cantor and Engle (1993) assessed each subject's WMC with the operation span task, identifying the upper and lower quartile of the distribution as high and low WMC, respectively. They then compared high and low WMC subjects' performance on the fan task. As predicted, low-WMC subjects showed a more dramatic fan effect than did high-WMC subjects. Moreover, when the slope of the fan effect was statistically partialed out of the significant correlation between operation span and VSAT scores, the correlation disappeared. These findings supported the notion that individual differences in WMC correspond to the amount of domain-general activation available to the cognitive system, as suggested by general-capacity theories of working memory.
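The partialing step in this analysis can be made concrete with the standard first-order partial correlation formula. The Python sketch below is illustrative only: the function name and the three correlation values are hypothetical and are not taken from Cantor and Engle (1993). It shows how a correlation between span and VSAT can fall to zero once a third variable (here, fan-effect slope) is statistically controlled.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z.

    r_xy, r_xz, r_yz are the three pairwise Pearson correlations.
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when z correlates perfectly with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical values chosen only to illustrate the arithmetic
# (x = operation span, y = VSAT, z = fan-effect slope).
r_span_vsat_given_slope = partial_correlation(r_xy=0.30, r_xz=-0.60, r_yz=-0.50)
print(round(r_span_vsat_given_slope, 3))
```

With these made-up inputs the product of the two correlations involving slope equals the span-VSAT correlation, so nothing is left over once slope is partialed out, which is the pattern the text describes as the correlation "disappearing."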
Working memory "capacity" or working memory "control"?
An alternative interpretation of Cantor and Engle's (1993) results is that WMC corresponds to the regulation or control of activation rather than the sheer amount of activation or capacity. Note that in their implementation of the fan paradigm, Cantor and Engle manipulated the person fan, with each person associated with one, three, or four places, and so the fan effect was a function of location. That is, reaction time and error rate increased as the number of locations associated with an individual person increased. Importantly, Cantor and Engle did not manipulate location-fan. That is, every location was associated with two people. For example, if one of the sentences was, "The lawyer is in the park," another sentence might have been, "The artist is in the park." It was therefore possible that response competition between different characters in the same location influenced performance in the verification stage. In order to verify that the sentence, "The lawyer is in the park," was studied, the subject may have had to block or inhibit the sentence, "The artist is in the park." Thus, individual differences in WMC might be related to the ability to resolve interference rather than the rate or amount of spreading activation.
In order to examine this possibility, Conway and Engle (1994) had subjects perform a task that married the fan paradigm with a Sternberg-type memory-scanning task. For example, subjects were required to memorize four sets of letters, each set consisting of 2, 4, 6, or 8 letters. For subjects in the response-competition condition, each letter was a member of two different sets. So, if the letter X was a member of set 2 it might also be a member of set 6. For subjects in the no-competition condition, each letter was a member of only one set.
The critical question was whether individual differences in WMC would be seen in both conditions, as predicted by a capacity view, or only when competition was present,
as predicted by a control view. In fact, low WMC subjects revealed a more dramatic set-size effect than did high WMC subjects only in the response-competition condition (the condition resembling that used by Cantor and Engle, 1993). Although the slopes of the set-size effect in the no-competition condition were also substantial, they were equivalent for high and low WMC subjects. This pattern suggested that individual differences in WMC do not correspond to the rate of spreading activation or the sheer amount of activation. Rather, individual differences in WMC are related to the ability to control the activation of relevant information and block the activation of distracting information.
The results of Conway and Engle (1994) motivate the prediction that individual differences in WMC will reveal themselves in contexts that present a significant source of interference. Subsequent work has indeed demonstrated WMC-related differences in long-term memory retrieval in the face of proactive, retroactive, and output interference (Kane & Engle, 2000; Rosen & Engle, 1997, 1998). However, at the heart of Conway and Engle's interpretation of WMC is the stronger prediction that even in tasks that place minimal demands on memory, one may find WMC-related differences in any task that requires attention control to resolve interference or competition. Thus, if WMC fundamentally reflects an attention control capability that is important in cases of competition and conflict, then one should find that high and low WMC individuals differ even in more "molecular" attention tasks tapping competition, even in those that make no explicit demands on memory retrieval. Below we present empirical evidence from four different paradigms that individual differences in WMC are, in fact, related to individual differences in performance of "attentional control" tasks. These tasks, to be discussed in turn, are dichotic listening, visual orienting (anti-saccade), Stroop, and continuous performance.
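The set-size comparison described above rests on a slope per group and condition: the increase in verification time for each additional item in the memorized set. The following Python sketch shows one way such slopes could be computed by ordinary least squares. It is a minimal illustration under stated assumptions: the function and variable names are hypothetical, and the response times are made-up values chosen to mimic the qualitative pattern, not data from Conway and Engle (1994).

```python
from typing import Sequence


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line relating ys to xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


SET_SIZES = [2, 4, 6, 8]  # set sizes named in the task description

# Made-up mean verification times in ms (NOT results from the study).
hypothetical_rt = {
    ("competition", "low_wmc"): [900, 1100, 1300, 1500],
    ("competition", "high_wmc"): [850, 950, 1050, 1150],
    ("no_competition", "low_wmc"): [800, 900, 1000, 1100],
    ("no_competition", "high_wmc"): [800, 900, 1000, 1100],
}

# Slope in ms per additional set member for each condition-by-group cell.
slopes = {
    cell: least_squares_slope(SET_SIZES, rts)
    for cell, rts in hypothetical_rt.items()
}
for cell, slope in slopes.items():
    print(cell, slope)
```

In this illustration the two groups differ in slope only when competition is present, which is the signature the control view predicts; a capacity view would predict a group difference in both conditions.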
Working Memory Capacity Predicts Attentional Control and Capture: The Evidence
Dichotic listening
A dichotic-listening task requires the subject to shadow, or repeat aloud, a message presented to one ear while ignoring a message presented to the other ear. Early work using the dichotic listening paradigm revealed that subjects were very capable of successful shadowing and successful blocking. In fact, subjects are so successful at blocking the unattended message that little or no semantic content is ever reported from the irrelevant channel (Broadbent, 1958; Cherry, 1953). However, Moray (1959) found that when one's own name is presented on the unattended channel, 33% of subjects report hearing it, and so it appears that some semantic information is capable of capturing attention and therefore reaching awareness, at least for some individuals. Using more sophisticated sound technology, Wood and Cowan (1995) replicated Moray's (1959) study and found that 34.6% of subjects reported hearing
their own name on the unattended channel. The question remained, why do some subjects recognize their name while other subjects do not? Note that by a capture/control view of dichotic listening performance, those who notice their names are those who are less successful in controlling attention by blocking task-irrelevant information. Thus, individuals with low WMC should be more likely to hear their name. In contrast, by a capacity view of dichotic listening, those who notice their names are those who have more attentional capacity to simultaneously devote to the task-relevant and task-irrelevant channels. By this view, individuals with high WMC should be more likely to hear their name. Conway, Cowan, and Bunting (2001) tested these possibilities by testing 20 high and 20 low WMC subjects in a version of Moray's dichotic-listening task, with high and low WMC reflecting the upper and lower quartiles of the distribution of operation span scores, respectively. The listening task required subjects to shadow 400 unrelated words presented to the right ear and ignore 350 unrelated words presented to the left ear. After 4 or 5 minutes of shadowing, the subject's own name was presented on the unattended channel. Words were presented simultaneously at a rate of one word per second. The attended channel was always a female voice and the unattended channel was always a male voice.
Figure 1. Proportion of high and low span subjects who reported hearing their own name in the unattended channel. [Bar chart; vertical axis from 0 to 100; legend: low span, high span.]
Conway, Cowan, and Bunting (2001) found very large WMC-related differences in name detection, such that low-WMC subjects were much more likely to hear their own name than were high-WMC subjects (see Figure 1). Although low-WMC subjects committed more overall shadowing errors (M = 30) than did high-WMC subjects (M = 10), the WMC groups did not differ in the number of shadowing errors committed on the two words presented before the presentation of the name. This suggests that the key finding of low spans disproportionately hearing their name was not simply due to attention wandering to the unattended channel at the opportune time. Finally, shadowing performance on the words following
presentation of the name was also examined. Presumably, hearing one's own name on the unattended channel would come with a cost and this was indeed the case. Regardless of WMC, subjects who reported hearing their name committed more shadowing errors on the two words following presentation of the name than subjects who did not report hearing their name. This cost only persisted for two words as there was no difference on the third or fourth word following presentation of the name. The results of Conway et al. (2001) provide strong support for the notion that WMC is related to attention control. Specifically, high and low WMC subjects differ in performance when blocking a particularly salient, and habitually attended to, stimulus. When attempting to ignore one auditory channel while shadowing another, individuals with lesser WMC are more susceptible to attentional capture by a powerful orienting cue than are those with greater WMC.
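The studies reviewed so far share one procedure: each subject receives a span score, and the upper and lower quartiles of the score distribution are then treated as high- and low-WMC groups. The Python sketch below illustrates that two-step procedure. It is a minimal sketch under stated assumptions, not the authors' scoring program: the function names, the trial format, the tie-handling rule (scores at or beyond a cut point are included), and the quartile method (the default of `statistics.quantiles`) are all choices made here for illustration, and the example scores are invented.

```python
from statistics import quantiles
from typing import Dict, List, Tuple


def span_score(trials: List[Tuple[int, bool]]) -> int:
    """Largest series length recalled with every item correct (0 if none).

    Each trial is (series_length, all_items_correct).
    """
    passed = [length for length, all_correct in trials if all_correct]
    return max(passed, default=0)


def extreme_groups(scores: Dict[str, float]) -> Dict[str, str]:
    """Label bottom-quartile subjects 'low' and top-quartile subjects 'high'.

    Subjects in the middle half of the distribution are left out,
    as in an extreme-groups design.
    """
    q1, _, q3 = quantiles(scores.values(), n=4)
    groups: Dict[str, str] = {}
    for subject, score in scores.items():
        if score <= q1:
            groups[subject] = "low"
        elif score >= q3:
            groups[subject] = "high"
    return groups


# Invented example: one subject's trials, then 20 subjects' span scores.
example_trials = [(2, True), (3, True), (4, False), (5, False)]
example_span = span_score(example_trials)

example_scores = {f"s{i:02d}": float(i) for i in range(1, 21)}
example_groups = extreme_groups(example_scores)
print(example_span, example_groups)
```

One consequence of this design is worth keeping in mind when reading the results: because the middle half of the distribution is discarded, group comparisons show whether the extremes differ but say little about the shape of the relation across the full range of WMC.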
Visual orienting
The results from dichotic listening suggest that low-WMC subjects encounter particular difficulty when a primed stimulus (hearing one's name) interferes with the task goal (ignoring the sound source of one's name). An experimental paradigm that analogously pits a task goal in conflict with a pre-potent visual response is the anti-saccade paradigm (Hallett, 1978; Hallett & Adams, 1980). The anti-saccade task requires the subject to detect an abrupt-onset visual cue in the environment and then use that cue to direct attention and the eyes away from the cue in order to identify a target stimulus presented to the opposite spatial location (for a review see Everling & Fischer, 1998). Despite its simplicity, the anti-saccade task is much more demanding than a pro-saccade version of the task, in which the visual cue predictably appears in the same spatial location as the subsequent target. In this situation, attention and the eyes may be reflexively drawn to the cued location in order to identify targets. Thus, in only the anti-saccade task is there a conflict between the more automatic, habitual response (look toward the cue) and the goal or target response (look away from the cue).
Kane, Bleckley, Conway, and Engle (2001) predicted that individual differences in WMC would not be related to performance on pro-saccade trials because orienting in these trials occurs reflexively. In contrast, given the conflict between goal and reflex presented by the anti-saccade task, Kane et al. predicted that high span subjects would be better able to resist attention capture here than would low span subjects. In order to test these predictions, Kane et al. assessed WMC using the operation span task and classified the upper and lower quartile of the distribution of span scores as high- and low-WMC, respectively. They then had 107 high- and 96 low-WMC subjects perform both pro-saccade and anti-saccade tasks (see Footnote 2).
In both tasks, subjects identified a pattern-masked target letter (either B, P, or R), presented 11.5° to the left or right of fixation on a computer screen. In the pro-saccade condition, a blinking visual cue was presented immediately before the target, one character space beneath its eventual location. Thus, the cue elicited a
pre-potent orienting response that guided detection of the target. In the anti-saccade condition, the cue was presented immediately before the target but on the opposite side of the computer screen from the target. Successful identification of the target here required blocking the pre-potent orienting response and initiating an eye movement in the opposite direction.
In the first experiment, identification latency and accuracy were recorded as dependent measures. Kane et al. (2001) found that WMC predicted visual orienting performance only in the anti-saccade condition, where high span subjects identified the target nearly 200 ms faster than did low span subjects (see Figure 2). In contrast, mean identification times for the two groups in the pro-saccade condition were within 10 ms of one another. Although high span subjects appeared to be less susceptible to capture from the abrupt-onset cue than were low spans, an open question was whether high spans were able to inhibit the pre-potent orienting response altogether, or whether they simply recovered from erroneous saccades faster than did low span subjects.
Figure 2. Mean target-identification latencies for high and low span subjects in the pro-saccade (Pro-) and anti-saccade (Anti-) tasks. Error bars depict standard errors of the means. [Bar chart; vertical axis 0-1000; horizontal axis: Saccade Task (Pro-, Anti-); bars for low span and high span.]
Kane et al. (2001) conducted a second experiment to answer this question, as well as to examine practice effects on anti-saccade performance, by monitoring subjects' eye movements during several blocks of anti-saccade trials. Twenty high span and 20 low span subjects were tested. As shown in Figure 3, the eye-movement analysis revealed that individuals with low WM spans were more likely to make erroneous reflexive saccades in the anti-saccade task than were individuals with high WM spans, and this WMC difference persisted across all 10 blocks of trials. Thus, low spans were more likely to look in the direction of the cue when they should not have, even after considerable practice. Not shown here is that after making reflexive errors, low spans also took longer to disengage their gaze from the
cue and move toward the target than did high spans (Ms = 674 and 512 ms, respectively). Thus, compared to individuals with greater WMC, those with lesser WMC not only made more saccade errors, but also, after committing an error, took much longer to correct it. As in dichotic listening, then, high and low WMC individuals demonstrated substantial performance differences in the anti-saccade task, where a powerful, reflexive response captured attention away from a relatively weak goal imposed by the experimental context.
Figure 3. Mean proportion of reflexive eye movements, made in error, across 10 anti-saccade trial blocks (A1 - A10) for high and low span subjects. Error bars depict standard errors of the means. [Line graph; vertical axis 0-0.5; horizontal axis: Anti-saccade Block (A1 to A10); separate lines for low span and high span.]
Stroop
Successful performance in tasks like dichotic listening and anti-saccade clearly requires resistance to interference through blocking or inhibition, and so WMC differences in such tasks may be taken to reflect a difference in inhibitory capability (see Engle, 1996; Hasher & Zacks, 1988). It has recently been suggested by several theorists, however, that successful inhibition or blocking may rely on the active maintenance of task-relevant information or the goal state (e.g., Cohen, Dunbar & McClelland, 1990; De Jong, Berendsen, & Cools, 1999; Roberts & Pennington, 1996). According to these theorists, inhibition is a by-product of successful goal-maintenance. Thus, whether WMC should be considered a cause or an effect of efficient inhibition is a current point of controversy (see Kane et al., 2001; May, Hasher & Kane, 1999). Kane and Engle (2001) attempted to tease apart the contributions of memory maintenance and inhibition to the performance of capture tasks by examining WMC-related differences in the Stroop (1935) task. The Stroop task requires the subject to name the colors in which words are presented. In the critical interference condition, the words themselves are color names, incongruent with the
ink color (e.g., the word BLUE presented in red). Thus, the habitual process of identifying the written word comes into conflict with the task-goal of naming the ink color. Kane and Engle (2001) reasoned that actively maintaining the task goal in working memory would be more important to success when the list-wide proportion of congruent trials was high (e.g., RED presented in red). That is, if the majority of words in a Stroop task are presented in a color that matches their name, then subjects might periodically "forget" that they were supposed to be naming the ink color, not the word itself. Congruent trials impose no cost for periodic neglect of the task goal. In contrast, when all the trials are incongruent, such that the ink color and the word are always inconsistent, the task-goal is reinforced on every single trial, making active goal maintenance less necessary. Note that accurate color naming on incongruent trials requires blocking the habitual word-reading response no matter whether the prevailing context consists of many or no congruent trials. Thus, any differences in Stroop interference between such contexts, and span differences therein, may be better attributed to differences in goal maintenance rather than inhibition.
Figure 4. Response-time interference effects for high and low span subjects, by proportion-congruency condition. Interference effects were calculated by subtracting neutral-trial latencies from incongruent-trial latencies. Vertical lines depict standard errors of the means. [Bar chart; vertical axis 0-260; horizontal axis: Proportion of Congruent Trials (0%, 50%, 75%); bars for low span and high span.]
Kane and Engle (2001) therefore manipulated the proportion of congruent trials in the Stroop task and compared the performance of high and low span subjects. As with the dichotic listening and anti-saccade studies, WMC was assessed using the operation span task and the upper and lower quartiles of the span distribution were classified as high and low span, respectively. Subjects were randomly assigned to one of three conditions representing different list-wide proportions of congruent trials (0%, 50%, or 75%) out of 288 total trials. In addition to congruent trials, which presented the words RED, BLUE, or GREEN in their matching color, all subjects saw 36 neutral trials (JKM, XTQZ, FPSTW, presented
in red, blue, or green), and the remaining trials were incongruent (presenting mismatched words and colors). Subjects were instructed to name the ink color as quickly and accurately as possible, and were told that even if many trials were congruent, performance on the critical incongruent trials would be best if they always tried to ignore the word. Stroop interference was assessed by taking the difference between the incongruent trials and the neutral trials in both naming time and error rate.
As illustrated in Figure 4, there was no difference between high- and low-WMC subjects in interference in any condition, as measured by latencies. In contrast, Figure 5 shows that, in errors, high- and low-WMC subjects differed in interference in the 75% congruent condition only. Low-WMC subjects committed almost twice as many errors on incongruent trials as did high-WMC subjects. These results suggest that individual differences in WMC may not be evident when the Stroop context contains a large proportion of incongruent trials. In contrast, a large WMC-related difference was clearly seen when the experiment contained a large proportion of congruent trials, and this difference represented low WMC individuals naming the word aloud instead of the color. That is, individuals with lesser WMC were especially likely to perform as if they periodically lost access to the goal of the task, to name the color and ignore the word.
These findings, in conjunction with those from dichotic listening and anti-saccade tasks, provide a further demonstration that WMC may determine susceptibility to attentional capture of a habitual, dominant response when habit and goal are in conflict. Moreover, the results also indicate that inhibitory processes, as measured by tasks like the Stroop test, may vary in their successful application with the active memory maintenance of goals (for a formal implementation of this idea, see Cohen et al., 1990; Cohen & Servan-Schreiber, 1992). Thus, individual differences in inhibitory ability may actually be due to individual differences in goal-maintenance (Kane et al., 2001).
Figure 5. Error interference effects for high and low span subjects, by proportion-congruency condition. Interference effects were calculated by subtracting the number of neutral-trial errors from the number of incongruent-trial errors. Vertical lines depict standard errors of the means. [Bar chart; vertical axis 0-10; horizontal axis: Proportion of Congruent Trials (0%, 50%, 75%); bars for low span and high span.]
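The interference measure used in the Stroop study (incongruent-trial performance minus neutral-trial performance) can be sketched as follows. This is only an illustration under stated assumptions: the trial records, field names (`condition`, `rt_ms`, `error`), and the function `interference` are hypothetical, and errors are expressed here as per-trial rates rather than as counts.

```python
from statistics import mean

def interference(trials, measure):
    """Incongruent-minus-neutral difference in the mean of `measure`.
    Congruent trials do not enter the calculation."""
    def cell_mean(condition):
        return mean(t[measure] for t in trials if t["condition"] == condition)
    return cell_mean("incongruent") - cell_mean("neutral")

# Made-up trials for one hypothetical subject (not data from the study).
trials = [
    {"condition": "neutral", "rt_ms": 600, "error": 0},
    {"condition": "neutral", "rt_ms": 640, "error": 0},
    {"condition": "incongruent", "rt_ms": 750, "error": 1},
    {"condition": "incongruent", "rt_ms": 730, "error": 0},
    {"condition": "congruent", "rt_ms": 580, "error": 0},
]

rt_interference = interference(trials, "rt_ms")     # latency-based interference
error_interference = interference(trials, "error")  # error-based interference
```

Computing the two measures separately matters for the pattern described above, in which the span groups did not differ in latency interference but did differ in error interference in the 75% congruent condition.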
Continuous performance
The experiments reviewed thus far support the notion that WMC is related to attentional control and capture. One limitation, however, at least from a psychometric perspective, is that all of the research has been conducted with samples of college students. An open question is whether the relation between WMC and attentional control will be evident in other populations. For instance, recent work from the developmental literature suggests that the processes underlying the performance of working memory tasks may be different for children than for adults (e.g., Towse, Hitch, & Hutton, 1998; but see Kail & Hall, 2001). Therefore, it is not clear if the relation between WMC and attentional control holds for school-aged children.
In an attempt to address this question, Conway, Bottoms, Nysse, Haegerich, and Davis (2001) examined the relation between WMC and distractibility in 7- to 8-year-old children. Working memory capacity was measured using the counting span task, originally designed as a developmental assessment tool (Case, Kurland, & Goldberg, 1982). In counting span, the subject is presented with a series of displays, each containing a varying number of targets (e.g., blue circles) and a varying number of distracters (e.g., red circles). The subject's task is to count aloud the number of targets and remember the total for later recall. After a series of displays (typically between 2 and 6) the subject recalls all the totals from the current series. This is considered a test of WMC because it taps not only the storage component of working memory (i.e., remembering digits), but also the processing component (i.e., visual search and counting). In factor analytic studies with adult subjects, the counting span task loads on the same factor as other measures of WMC, including the reading span and the operation span task (Conway, Cowan, Bunting, Therriault, & Minkoff, in press; Engle et al., 1999).
Conway et al. (2001) measured capture with a task from the Gordon Diagnostic System (Gordon, 1991), a tool used to diagnose ADHD in both children and adults. The Gordon distractibility task is a continuous performance test (CPT) with distracters. Digits are presented on a three-column display device, one every second. The subject's task is to press a button when a 1 is followed by a 9 in the middle column of the display and to ignore the digits presented in the left and right columns. Three primary measures are derived from the CPT task. First, the total number of correct responses is recorded, that is, the number of times the subject presses the button when in fact he or she should. Second, the number of commission errors is recorded, that is, the number of times the subject presses the button when in fact he or she should not have. Third, the latency of the button press is recorded. This is the time from the onset of the digit to the pressing of the button.
Conway et al. (2001) had 66 children (M age approximately 8 years) perform both the counting span and CPT tasks. Correlations among the main measures are presented in Table 1. Consistent with evidence from the adult literature linking WMC to capture, there was a significant negative correlation between counting span (CSPAN) and commission errors (r=-.32), such that greater
WMC was associated with fewer commission errors. In contrast, there was not a significant correlation between CSPAN and correct responses (r=-.02) or between CSPAN and latency (r=.13). Particularly striking is the finding that WMC is correlated with commission errors but not with the total number of correct responses. Thus, children with lesser WMC did not fail to understand the task: they simply responded with more commission errors than children with greater WMC.

Table 1. Correlations Among Main Measures from the Distractibility Task

                     1      2      3      4      5
1. AGE             ---    .23    .20   -.16   -.18
2. CSPAN                  ---    .02   -.32*   .13
3. CORRECT                       ---    .26*  -.44*
4. COMMISSIONS                          ---    .02
5. LATENCY                                     ---

* p < .05, two-tailed test

Conway et al. also examined the types of commission errors associated with WMC. They used the coding scheme "XXX" to refer to the stimuli that were presented in the middle column of the display on three consecutive displays. For example, "358" refers to a situation where a 3 was presented in the middle column, followed by a 5, followed by an 8. Also, the letter "X" was used as a wildcard. That is, "19X" refers to a situation where a 1 was presented in the middle column, followed by a 9 in the middle column, followed by any number other than 1 or 9 in the middle column. A commission error in the "19X" condition represents a case where the child pressed the button too late. That is, they should have pressed the button when the 9 was presented but they pressed it just after the presentation of the 9. The different types of commission errors (with the error occurring on the third display) were XXX, 19X, XX9, XX1, X1X, and X9X. Note that X19 is not a type of commission error because it represents the target sequence.
Correlations between CSPAN, latency to respond, and types of commission errors are presented in Table 2. Significant correlations were found between CSPAN and X1X errors (r=-.34) and X9X errors (r=-.32). CSPAN was not significantly correlated with other types of commission errors. Latency was also correlated with X1X errors (r=-.50) and X9X errors (r=.34). Given the correlations with latency, Conway et al. examined the correlations between CSPAN, X1X errors, and X9X errors, while controlling for latency. The correlations remained significant. Finally, there was a significant correlation between latency and 19X errors (r=.47). This is not surprising, as slower children would be more likely to commit a false alarm in this situation.
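The coding scheme for commission errors described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' scoring procedure: the function names, the "target" label for any press that falls on a 9 immediately preceded by a 1, and the "other" label for patterns the text does not list (e.g., "11X") are all assumptions.

```python
# Commission-error types named in the text (the press falls on the third display).
LISTED_TYPES = {"XXX", "19X", "XX9", "XX1", "X1X", "X9X"}

def code_digit(digit):
    """Keep 1 and 9 as themselves; every other digit becomes the wildcard X."""
    return str(digit) if digit in (1, 9) else "X"

def classify_press(middle_digits):
    """Classify a button press given the last three middle-column digits."""
    if len(middle_digits) != 3:
        raise ValueError("expected exactly three consecutive middle-column digits")
    pattern = "".join(code_digit(d) for d in middle_digits)
    if pattern.endswith("19"):
        return "target"   # assumed: a 9 preceded by a 1 is a correct response
    if pattern in LISTED_TYPES:
        return pattern
    return "other"        # hypothetical label for patterns not listed in the text
```

For instance, `classify_press([3, 5, 8])` yields "XXX", `classify_press([1, 9, 4])` yields "19X" (a late press), and `classify_press([4, 1, 7])` yields "X1X" (a press after a 1 that was not followed by a 9).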
Table 2. Correlations Among CSPAN, LATENCY, and Types of Commission Errors

                   1      2      3      4      5      6      7      8
1. CSPAN         ---    .07    .01   -.18   -.14   -.34*  -.32*   .13
2. XXX                  ---    .36*   .10    .35*   .12    .10    .01
3. 19X                         ---    .41*   .33*   .01    .46*   .47*
4. XX9                                ---    .16    .08    .58*   .13
5. XX1                                       ---    .23    .29*   .03
6. X1X                                              ---    .07   -.50*
7. X9X                                                     ---    .34*
8. LATENCY                                                        ---

* p < .05, two-tailed test
Table 3. (A) Average number of X1X commission errors as a function of latency and CSPAN (n = number of subjects)

                          LATENCY
CSPAN            Slow             Fast
Low capacity     1.60 (n = 15)    4.27 (n = 22)
High capacity    1.59 (n = 17)    2.00 (n = 12)

(B) Average number of X9X commission errors as a function of latency and CSPAN

                          LATENCY
CSPAN            Slow             Fast
Low capacity     1.47 (n = 15)     .45 (n = 22)
High capacity     .18 (n = 17)     .25 (n = 12)
The pattern of correlations between CSPAN, type of commission error, and latency is particularly intriguing. Specifically, the correlations between CSPAN and X1X errors and X9X errors are both negative, again suggesting that children with greater WMC exhibit fewer commission errors. In contrast, the correlations between latency and X1X errors and X9X errors are mixed, with a negative correlation between latency and X1X errors but a positive correlation between latency and X9X errors. This suggests that some children with lesser WMC are more likely to commit "fast" commission errors (i.e., X1X) while others are more likely to commit "slow" commission errors (i.e., X9X). To further demonstrate this point, Conway et al. compared X1X errors and X9X errors for fast and slow responders as well as for high and low capacity children. Fast and slow responders were determined by a median split on latency and high and low capacity children
were determined by a median split on CSPAN. Average number of X1X and X9X commission errors per group are reported in Table 3. This analysis nicely demonstrates that, (1) there are slow low spans and there are fast low spans, and (2) they exhibit the same deficiency; namely an inability to stop a response when primed with part of the target sequence. The detailed analysis of the types of commission errors provides insight into the type of responding associated with low WMC. Specifically, children with lesser WMC were most likely to make a commission error in two situations: (1) when the middle column in the display prior to the target display contained a 1 (i.e., X1X), and (2) when the middle column in the display prior to the target display contained a 9 (i.e., X9X). Importantly, the relationship between WMC and the frequency of these types of errors cannot be accounted for by processing speed. That is, controlling for latency did not cause these correlations to be diminished. Instead, it appears that children with low WMC have trouble blocking, or inhibiting, a response when primed by part of the response requirements. For example, when the display prior to the target display contained either aspect of the target combination, 1 or 9, children with low WMC were more likely to commit an error. It appears as if having part of the response requirement primed a response that the child was unable to suppress. When none of the response requirements were present on the display prior to the target (i.e., XXX, XX9, XX1), the frequency of commission errors was not associated with WMC. Thus, low WMC is not associated with general impulsiveness but is associated with a tendency to be impulsive when primed for a response.
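The claim that controlling for latency leaves the CSPAN correlations intact can be illustrated with the standard first-order partial correlation formula applied to the rounded coefficients in Table 2. This is a sketch under assumptions: the chapter does not state how the authors controlled for latency, so the formula and the function name `partial_r` are illustrative, and the inputs are two-decimal table values rather than raw data.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Rounded coefficients taken from Table 2.
r_cspan_latency = 0.13

# CSPAN with X1X errors, controlling for latency (latency-X1X r = -.50).
r_x1x = partial_r(-0.34, r_cspan_latency, -0.50)

# CSPAN with X9X errors, controlling for latency (latency-X9X r = .34).
r_x9x = partial_r(-0.32, r_cspan_latency, 0.34)
```

Under these assumptions the partial correlations come out at roughly -.32 and -.39, close to or larger in magnitude than the zero-order values, which is consistent with the statement that processing speed does not account for the relation between WMC and these errors.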
Summary of the evidence
The empirical evidence suggests that WMC plays a role in a range of tasks traditionally thought to tap "low-level" attentional control and vulnerability to capture. Furthermore, WMC is not critical for all aspects of performance. Rather, WMC is critical in very specific situations, particularly when a pre-potent or habitual response conflicts with a task goal. To help illustrate the consistency of results across paradigms, the task goal, source of conflict, and pre-potent (or habitual) response for each paradigm are listed in Figure 6. Within each paradigm, individual differences in task performance were related to individual differences in WMC when the task goal came into conflict with the pre-potent response. For example, in the anti-saccade task the goal was to direct attention and orient in the opposite direction from the cue but the pre-potent response was to orient to the cue. In this situation, individual differences in WMC predicted task performance. In contrast, when the task goal and the pre-potent response were consistent, as in the pro-saccade condition, individual differences in WMC did not predict task performance.
Experiment: Dichotic Listening
  Task goal: Shadow message presented to the right ear and ignore message presented to the left ear
  Source of conflict: One's own name presented to left ear
  Erroneous pre-potent or habitual response: Orient to name

Experiment: Anti-saccade
  Task goal: Shift eyes & attention in the opposite direction of an abrupt-onset cue in order to detect a target
  Source of conflict: Abrupt-onset cue
  Erroneous pre-potent or habitual response: Orient to the cue

Experiment: Stroop
  Task goal: Name the color of the word
  Source of conflict: The word name (especially difficult when there's a high % of congruent trials)
  Erroneous pre-potent or habitual response: Read the word

Experiment: Gordon Continuous Performance Test
  Task goal: Press a button when a 1 is followed by a 9
  Source of conflict: A 1 followed by some number other than 9 (or a 9 preceded by some number other than 1)
  Erroneous pre-potent or habitual response: Press the button

Figure 6. An overview of the experimental situations in which working memory capacity predicts attentional control.
According to our theoretical perspective (see Conway, Cowan, Bunting et al., in press; Engle, Tuholski et al., 1999; Engle, Kane et al., 1999; Kane et al., 2001), working memory is a system responsible for the active maintenance of goal-relevant information in the face of concurrent processing and/or interference. WMC does not refer to a total amount of mental capacity or a speed of information processing per se. Rather, WMC refers to an ability to maintain a task goal in the face of salient interference, as in the situations outlined in Figure 6.
In order to facilitate the comparison of our framework to other approaches discussed in this book, we embed our theory within Pashler's (1998) "controlled parallel" theory of selective attention. Pashler argued that the classic debate about early versus late filtering in selective attention has confounded two independent questions: (1) does the processing of multiple attended stimuli occur serially or in parallel? and (2) are "unattended" stimuli identified? Early selection models such as Broadbent (1958) epitomize the lower right quadrant of Figure 7. In contrast, late selection models (e.g., Deutsch & Deutsch, 1963; Norman, 1968) represent the upper left quadrant. Pashler's controlled parallel model represents a middle ground, in which multiple inputs can be processed in parallel and the extent to which unattended information is processed is under the control of the subject. Thus, if the subject's goal is to filter irrelevant information then the system will reveal characteristics of an early filter model. However, if the subject's goal is to monitor multiple inputs then the system will reveal characteristics of a late filter model.
Figure 7. Pashler's controlled parallel theory of selective attention. [2 x 2 diagram; columns: Are unattended stimuli identified? (yes, no); rows: Processing of multiple attended stimuli is (parallel, serial); regions labeled Late Selection, Controlled Parallel, and Early Selection.]
We would add an individual differences perspective to this framework. That is, individual differences in WMC are related to attentional control, such that individuals with greater capacity have greater control. Therefore, if the task goal is to block irrelevant information, then individuals with greater WMC will process less unattended information than individuals with lesser WMC (see Figure 8). In short, if the task goal is to filter irrelevant information then individuals with greater WMC will exhibit evidence for early filter theory while individuals with lesser WMC will exhibit evidence for late filter theory. The cocktail party study described above nicely illustrates this point.
An open question at this point is whether individual differences in WMC correspond to attentional control in contexts in which the goal is to process/monitor multiple inputs. All of the research discussed here forced subjects to focus on a relevant goal and block a distracting/competing response. Future research should address whether individuals with greater WMC also have greater attentional flexibility.
Figure 8. An individual differences interpretation of Pashler's controlled parallel theory of selective attention. [Diagram using the axes of Figure 7 (Are unattended stimuli identified?; Processing of multiple attended stimuli is parallel or serial), with working memory capacity marked and Early Selection labeled.]
Supporting evidence from cognitive neuroscience
We conclude with a brief discussion of two recent studies in the field of cognitive neuroscience. Each of these studies nicely illustrates the relationship between WMC and attentional control in a way that complements our approach. Moreover, these studies begin to point to a neurological basis for the relationship between WMC and control.
De Fockert, Rees, Frith, and Lavie (2001) used fMRI to examine the relation between WMC and selective attention. They had subjects classify famous names as either pop stars or politicians. The names were presented with faces that were either congruent or incongruent with the famous name (e.g., Bill Clinton's face presented with Bill Clinton's name or Ricky Martin's face presented with Bill Clinton's name). Distractor interference was assessed by subtracting reaction time in the congruent condition from that in the incongruent condition. Subjects performed the classification task along with a secondary memory task that imposed either a "low" or "high" load. In the low load condition subjects maintained four digits for later recall, but the digits were presented in the same order on every trial. In the high load condition subjects maintained four digits for later recall and digit order changed every trial. De Fockert et al. found greater interference effects in the high load condition than in the low load condition. This is conceptually equivalent to our findings that low span subjects experience greater interference than high span subjects. They also found that greater memory load was associated with greater activity in the frontal cortex, which is consistent with previous findings suggesting that the active maintenance of information is particularly reliant on prefrontal cortex (for a review see Kane & Engle, 2001).
Most relevant here is the finding that increased memory load (and a larger interference effect) was accompanied by greater activity in brain regions which have been shown to be critical for processing visual information (in particular faces) such as the fusiform gyrus and the extrastriate visual cortex. These findings suggest that when subjects experienced a greater memory load, frontal areas were less able to effect top-down control on posterior processing areas. Distractor processing was disinhibited by memory maintenance of irrelevant information.
Similar results were reported in a neuropsychological investigation of auditory selective attention using ERP (Chao & Knight, 1998). Ten patients with unilateral lesions to the dorsolateral prefrontal cortex (dPFC) and ten healthy age-matched controls were tested in an auditory delayed-matching-to-sample task with a 5000 ms delay. The sample and test stimuli consisted of real-world sounds such as coughing, dogs barking, piano notes, and dishwasher noise. Half the trials included several auditory distractor "tone pips" (4000 Hz) between the offset of the sample sound and the onset of the test sound. The patients and controls showed similar error rates on no-distractor trials, but the patients showed significantly more errors than the controls on distractor trials. Furthermore, the patients showed more responding in auditory cortex to the distractor tones than did the controls, and they
showed less cortical responding to target stimuli than did the controls. These results suggest that the dPFC patients, similar to low span subjects, had difficulty maintaining the target representation during the delay, especially in the face of interference (for similar results from the primate literature, see Goldman-Rakic, 1987).
Conclusion
To summarize, we conceive of working memory as a system responsible for the active maintenance of information in the face of concurrent processing and interference. The working memory system is limited in capacity and this capacity constrains cognitive performance in a general manner. Individual differences in working memory capacity are evident in samples of college students and these differences are related to performance of a wide variety of cognitive tasks. Most relevant to readers of this book, WMC is related to attentional control and resistance to attentional capture, such that individuals with greater WMC are less susceptible to capture. We submit that these people are less susceptible to capture because they are more capable of goal maintenance in the face of salient interference.
Footnotes
1 Daneman and Carpenter (1980) found a correlation of .59 between reading span and VSAT. However, that correlation may have been inflated due to the small number of subjects tested (n=18) and the particular method used. Correlations between .35 and .49 are more typical (Daneman & Merikle, 1996; Engle, Tuholski, Laughlin, & Conway, 1999).
2 Task type (pro, anti) as well as task order (pro-first, anti-first) were both manipulated between groups. Only the results of the first task performed are reported here. There were 52 high spans and 45 low spans who performed pro-first and 55 high spans and 51 low spans who performed anti-first. Interested readers are referred to Kane et al. (2001) for a detailed discussion of order effects.
References
Anderson, J. R. (1974). Retrieval of propositional information from long-term memory. Cognitive Psychology, 6, 451-474.
Anderson, J. R. (1983). A spreading activation theory of memory. Journal of Verbal Learning and Verbal Behavior, 22, 261-295.
Anderson, J. R., & Reder, L. M. (1999). The fan effect: New results and new theories. Journal of Experimental Psychology: General, 128, 186-197.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning & motivation: Advances in research & theory (Vol. 2, pp. 89-195). New York: Academic Press.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89). New York: Academic Press.
Broadbent, D. E. (1958). Perception and communication. New York: Pergamon Press.
Cantor, J., & Engle, R. W. (1993). Working-memory capacity as long-term memory activation: An individual differences approach. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1101-1114.
Carpenter, P. A., Just, M. A., & Shell, P. (1990). What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices test. Psychological Review, 97, 404-431.
Case, R., Kurland, M. D., & Goldberg, J. (1982). Operational efficiency and the growth of short-term memory span. Journal of Experimental Child Psychology, 33, 386-404.
Chao, L. L., & Knight, R. T. (1998). Contribution of human prefrontal cortex to delay performance. Journal of Cognitive Neuroscience, 10, 167-177.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 25, 975-979.
Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332-361.
Cohen, J. D., & Servan-Schreiber, D. (1992). Context, cortex, and dopamine: A connectionist approach to behavior and biology in schizophrenia. Psychological Review, 99, 45-77.
Conway, A. R. A., Bottoms, B. L., Davis, S. L., Nysse, K. L., & Haegerich, T. M. (2001). Working memory capacity and distractibility in children. Manuscript submitted for publication.
Conway, A. R. A., Cowan, N., & Bunting, M. F. (2001). The cocktail party phenomenon revisited: The importance of working memory capacity. Psychonomic Bulletin & Review, 8, 331-335.
Conway, A. R. A., Cowan, N., Bunting, M. F., Therriault, D., & Minkoff, S. (in press). A latent variable analysis of working memory capacity, short-term memory capacity, processing speed, and general fluid intelligence. Intelligence.
Conway, A. R. A., & Engle, R. W. (1994). Working memory and retrieval: A resource-dependent inhibition model. Journal of Experimental Psychology: General, 123, 354-373.
Conway, A. R. A., & Engle, R. W. (1996). Individual differences in working memory capacity: More evidence for a general capacity theory. Memory, 4, 577-590.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671-684.
Crowder, R. G. (1983). The demise of short-term memory. Acta Psychologica, 50, 291-323.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466.
Daneman, M., & Merikle, P. M. (1996). Working memory and language comprehension: A meta-analysis. Psychonomic Bulletin & Review, 3, 422-433.
De Fockert, J. W., Rees, G., Frith, C. D., & Lavie, N. (2001). The role of working memory in visual selective attention. Science, 291, 1803-1806.
De Jong, R. D., Berendsen, E., & Cools, R. (1999). Goal neglect and inhibitory limitations: Dissociable causes of interference effects in conflict situations. Acta Psychologica, 101, 379-394.
Dempster, F. N. (1991). Inhibitory processes: A neglected dimension in intelligence. Intelligence, 15, 157-173.
Deutsch, J. A., & Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
Engle, R. W. (1996). Working memory and retrieval: An inhibition-resource approach. In J. T. E. Richardson, R. W. Engle, L. Hasher, R. H. Logie, E. R. Stoltzfus, & R. T. Zacks, Working memory and human cognition. New York: Oxford University Press.
Engle, R. W., Cantor, J., & Carullo, J. J. (1992). Individual differences in working memory and comprehension: A test of four hypotheses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 972-992.
Engle, R. W., Kane, M. J., & Tuholski, S. W. (1999). Individual differences in working memory capacity and what they tell us about controlled attention, general fluid intelligence and functions of the prefrontal cortex. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 102-134). New York: Cambridge University Press.
Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory and general fluid intelligence: A latent variable approach. Journal of Experimental Psychology: General, 128, 309-331.
Everling, S., & Fischer, B. (1998). The antisaccade: A review of basic research and clinical findings. Neuropsychologia, 36, 885-899.
Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In F. Plum (Ed.), Handbook of physiology: The nervous system (Vol. 5, pp. 373-417). Bethesda, MD: American Physiological Society.
Hallett, P. E. (1978). Primary and secondary saccades to goals defined by instructions. Vision Research, 18, 1279-1296.
Hallett, P. E., & Adams, B. D. (1980). The predictability of saccadic latency in a novel voluntary oculomotor task. Vision Research, 20, 329-339.
Hasher, L., & Zacks, R. T. (1988). Working memory, comprehension, and aging: A review and a new view. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 22). New York: Academic Press.
Just, M., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122-149.
Kail, R., & Hall, L. (2001). Distinguishing short-term memory from working memory. Memory & Cognition, 29, 1-9.
Kane, M. J., Bleckley, M. K., Conway, A. R. A., & Engle, R. W. (2001). A controlled-attention view of working memory capacity: Individual differences in memory span and the control of visual orienting. Journal of Experimental Psychology: General, 130, 169-183.
Kane, M. J., & Engle, R. W. (2000). Working memory capacity, proactive interference, and divided attention: Limits on long-term memory retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 336-358.
Kane, M. J., & Engle, R. W. (2001a). The contributions of working-memory capacity, goal neglect, and task set to Stroop interference. Manuscript submitted for publication.
Kane, M. J., & Engle, R. W. (2001b). The role of prefrontal cortex in working memory capacity, executive attention, and general fluid intelligence. Manuscript submitted for publication.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389-433.
May, C. P., Hasher, L., & Kane, M. J. (1999). The role of interference in memory span. Memory & Cognition, 27, 759-767.
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wagner, T. (2000). The unity and diversity of executive functions and their contributions to complex "frontal lobe" tasks: A latent variable analysis. Cognitive Psychology, 41, 49-100.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Norman, D. A. (1968). Toward a theory of memory and attention. Psychological Review, 75, 522-536.
Pashler, H. (1998). The psychology of attention. Cambridge, MA: The MIT Press.
Radvansky, G. A. (1999). The fan effect: A tale of two theories. Journal of Experimental Psychology: General, 128, 198-206.
Roberts, R. J., Jr., & Pennington, B. F. (1996). An interactive framework for examining prefrontal cognitive processes. Developmental Neuropsychology, 12, 105-126.
Rosen, V. M., & Engle, R. W. (1997). The role of working memory capacity in retrieval. Journal of Experimental Psychology: General, 126, 211-227.
Rosen, V. M., & Engle, R. W. (1998). Working memory capacity and suppression. Journal of Memory and Language, 39, 418-436.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.
Towse, J. N., Hitch, G. J., & Hutton, U. (1998). A reevaluation of working memory capacity in children. Journal of Memory and Language, 39, 195-217.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent? Journal of Memory and Language, 28, 127-154.
Wood, N., & Cowan, N. (1995). The cocktail party phenomenon revisited: How frequent are attention shifts to one's name in an irrelevant auditory channel? Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 255-260.
Authors' Notes
Andrew R. A. Conway, Department of Psychology, University of Illinois at Chicago; Michael J. Kane, Department of Psychology, University of North Carolina at Greensboro. We would like to recognize our collaborators on the experiments described here. They are Kate Bleckley, Bette Bottoms, Michael Bunting, Nelson Cowan, Suzanne Davis, Randy Engle, Tamara Haegerich, and Kari Nysse. Correspondence concerning this manuscript should be sent to Andrew Conway, University of Illinois at Chicago, Department of Psychology (M/C 285), 1007 West Harrison Street, Chicago, IL 60607-7137 (email:
[email protected]).
Dynamical Systems Evolution
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
15
A Dynamic, Evolutionary Perspective on Attention Capture 1
William A. Johnston and David L. Strayer
Science, like biological evolution, may progress primarily when relatively unproductive steady states are punctuated by abrupt changes, paradigm shifts, or revolutions (e.g., Kuhn, 1970). We suggest that the study of attention may be due for such a change. In what follows, we examine some possible limitations of the current approach and then outline a new, or supplementary, direction that the field might productively take, one that is based on concepts drawn from dynamical systems theory and both biological and cultural evolution. We conclude by suggesting that one potential advantage of the proposed new direction is that it has important implications for broader, social issues.
Our primary goal in this chapter is no doubt somewhat unusual. It is neither to summarize the literature bearing on a particular research or theoretical issue with respect to attention capture nor to present new empirical data. Rather it is to suggest a different way of thinking about attention capture, an alternative to the usual metatheoretical framework within which research and theory on attention capture are carried out. Because this alternative framework has yet to be put to significant empirical and theoretical practice in the study of attention, our treatment of it is of necessity highly speculative and relies to some extent on anecdotal evidence. 2 Indeed, it is not altogether clear what epistemological procedures would be most appropriate for pursuing and "testing" the ideas generated within this paradigm. For example, as we suggest below, naturalistic observation may be more fruitful than the standard laboratory experimentation that is characteristic of contemporary cognitive psychology. Although our focus is on attention capture, the basic ideas, suggestions, and speculations apply as well to contemporary approaches to most other areas of cognition.
The Phenomena to be Explained and the Explanations
Just what is attention capture? We consider here the natural-language and typical cognitive-psychological meanings of attention capture. Further below we consider our proposed alternative view.
Phenomenology of attention capture
With respect to private, subjective experience, strong and weak forms of attention capture may be distinguished. A strong form of attention capture is experienced when an imperative stimulus or event suddenly interrupts some ongoing processing or task and "breaks into awareness." Candidates for such stimuli include sudden onsets such as an object that blinks or moves in a field of steady or stationary objects (e.g., Yantis, 1993), odd stimuli or singletons such as a red object in a field of blue objects (e.g., Folk, Remington, & J. Johnston, 1993; Treisman & Gelade, 1984), and, possibly, novel or unexpected objects in a familiar field (e.g., W. Johnston, Hawley, Plewe, Elliott, & DeWitt, 1990; W. Johnston & Hawley, 1994). A weaker form of attention capture is experienced when one becomes aware spontaneously of one of many nonimperative stimuli in the environment. An example might be when one is idly gazing out a window drifting in reverie and suddenly becomes aware of some relatively inconspicuous and innocuous object such as one of many plants in a garden or persons in a crowd.
The above description of attentional phenomenology, like the vast majority of research on attention, focuses on attention to inputs from the external environment. However, attention may at times be directed "inward" to memories, fantasies, thoughts, and feelings. This internal attention might also be said to be captured when certain of these mental events spring uninvited into awareness. Extreme forms of internal attention may characterize certain thought disorders. We shall return to a consideration of internal attention in a subsequent section. In the meantime, our discussion refers primarily to attention to external stimuli.
Conventional theoretical framework
Processing of external stimuli is typically divided into preattentional and postattentional stages. Awareness or consciousness is associated with postattentional processing. Because postattentional processing (a.k.a. awareness) is assumed to be limited in capacity and incapable of the parallel processing of the usually massive preattentive inflow of stimulus information, selection of just a small subset of this information is often necessary. In order for behavior to be adaptive and goal directed, this selection must be systematic. Attention is the process by which this systematic selection is accomplished. Although theories differ in terms of the extent or depth of preattentional processing, most of them appeal to some sort of gate-keeper that is responsible for the systematic admission of preattentive data into consciousness. This gate-keeping mechanism has been variously dubbed, among other terms, attention director, control processor, executive, and search mechanism (e.g., LaBerge, 1975; Posner & Snyder, 1975; Shiffrin & Schneider, 1977; Wolfe, 1994).
Attention may be systematically directed to certain, task-relevant, target stimuli depending on the individual's current motivation and goals. On occasion, this systematic direction of attention may be interrupted by an attention-capturing event, in which case the capturing input bursts through the gate. Although this sketch of typical theories of attention is somewhat of a caricature and may not do justice to any given theory, it attempts to capture enough of the gist of these theories to expose some of their shortcomings.
Limitations to Contemporary Theories of Attention Capture
We have discussed some of the shortcomings of contemporary theories of attention and attention capture elsewhere (e.g., W. Johnston & Dark, 1986; W. Johnston & Hawley, 1994; W. Johnston, Strayer, & Vecera, 1998). Cognitive psychology purports to be a scientific discipline, one that endorses the received view that any concepts and explanations developed be subject to empirical test. It is against this very standard that many cognitive theories of attention may be considered to come up short. 3 Below we summarize the specific problems of appealing to intelligent homunculi and consciousness and the general problems of theoretical circularity and reductionism.
The problem of homunculi
Many theories attempt to explain the intelligent behavior of an organism by appealing to an intelligent internal entity, often assumed to be the seat of consciousness, such as an executive, search mechanism, or control agent. This is related to the internal I with which we humans identify ourselves (e.g., I think..., I believe..., I am...). Although the idea of an internal homunculus may be consistent with our subjective experience and may be of some heuristic value, it is vacuous as an explanation. The appeal to an intelligent homunculus to explain the intelligent behavior of an organism begs the question of what underlies the intelligence of the homunculus and leads to an infinite regress. It leads also to the problem of consciousness.
The problem of consciousness
Consciousness or awareness of an input is usually considered to be that which is captured when attention is captured by the input. We certainly do not deny the existence of consciousness. Indeed, it is arguable that in every waking moment each of us is confined within his/her own consciousness and only indirectly in touch with anything outside of it, such as a material world. Consciousness is useful as a natural-language concept and we shall make use of it below. However, the
mind-body problem has been considered by scholars for millennia and has never been resolved. All of the classical solutions to the problem continue to be tenable, including the idea that only mind or consciousness exists and the material world is an illusion. The problem is that consciousness is not amenable to scientific analysis; there is no consensus on of what, if anything, it is composed, where, if anywhere, it resides, and how, or even if, it can be empirically assessed. Indeed, consciousness may be entirely epiphenomenal and not play a causal role in behavior at all (e.g., James, 1890/1950; W. Johnston & Dark, 1986). Therefore, to appeal to consciousness in descriptions and explanations of attention is to appeal to a concept that itself defies description and explanation within the usual parameters of Western science. Similar arguments can be directed at related concepts that are often appealed to in the literature on attention such as intention, volition, and deliberateness. 4
The problem of theoretical circularity
Subjective experience and ordinary language concepts, such as the internal I and consciousness, may be of some value as a starting point for the study of attention capture, but they cannot be relied upon to both define and explain attention capture. An explanation that incorporates the phenomena to be explained is a circular one that explains nothing at all. Attention and attention capture are not explained by claiming that the organism manages to systematically and adaptively attend to outside stimuli because intelligent attentional devices inside its head systematically and adaptively attend to the internal representations of these stimuli. Unfortunately, most contemporary theories of attention do little more than re-describe the phenomenology of attention using different words.
The problem of reductionism
Parts vs. wholes. Following the lead of traditional, Western science, cognitive and neural sciences have attempted to understand the behavior of whole organisms by analyzing them into their parts. Just as physical sciences attempt to understand complex material entities in terms of their molecular, atomic, and sub-atomic compositions, cognitive and neural sciences attempt to understand attention by focusing on the internal cognitive and neural mechanisms presumed to govern it. The problem with reductionism is that the phenomena of interest may be emergent properties of all the parts of the system (e.g., the body of an organism) acting in concert, potentially even in specific, contextually variable relationships to other, outside systems (e.g., other organisms or whole ecosystems). The phenomena may simply not exist in any components of the system or even in all the
components taken together. We suggest below that attention capture may be one such non-reducible, emergent phenomenon. It may be emergent in the relationship between the whole organism and the specific environmental context that it temporarily inhabits.
Laboratory methodology. A further limitation to the reductionistic approach is that it is usually associated with laboratory methodology. Human subjects in most studies of attention are disconnected from the rich, dynamic natural habitats in which they normally reside and engage in complex behaviors (e.g., automobile driving, cooking dinner, conversing with friends) and are placed in drab cubicles facing contrived, discrete displays of relatively simple stimuli, to some of which they may make simple, discrete, arbitrary responses such as button presses. To disconnect an organism from its natural habitat runs the serious risk of altering and distorting the very processes under investigation. Attentional processes no doubt have emerged because they have had adaptive value in the sorts of complex, natural ecologies in which the species of interest has evolved. These processes may not be fully deployed and manifested in, and may even be corrupted by, contexts sufficiently different from these naturalistic contexts. 5
A Dynamical Systems Framework
While scientific reductionism is not without merit, it is also not without serious limitations. As noted above, attention capture may be an emergent phenomenon that arises from the fluctuating interactions and relationships among all of the parts acting in concert within complex and dynamic, naturalistic contexts. The process of attention capture is likely to be altered, perhaps seriously so, when the organism is disconnected from its natural habitats and placed in contrived, artificial laboratory contexts that effectively disassemble the organism by engaging a small subset of systems (e.g., visual and brain systems) to a disproportionate degree. 6 We suggest that an understanding of attention capture can benefit from more holistic and naturalistic approaches. We now consider how dynamical systems theory might form the basis for one such approach.
Overview of dynamical systems theory
We provide here just a broad, qualitative outline of dynamical systems theory. More thorough treatments are available elsewhere (e.g., Gleick, 1987; Kauffman, 1993; Lewin, 1992; Prigogine & Stengers, 1984).
Webs of relationships. The systems approach focuses on relationships rather than material entities or things. Systems are viewed as dynamic webs or patterns of relationships. The nodes or intersections in any one web are themselves
viewed as lower-order webs of relationships rather than material things. The entire universe is viewed as a vast, complex, and dynamical web of relationships, within which are imbedded countless, interconnected lower-order systems or webs of relationships. The human body is an incredibly complex, hierarchical web of relationships composed of lower-order webs defining organs, tissues, cells, and so on all the way down to molecules, atoms, and subatomic systems. The higher-order patterns cannot be reduced to the lower-order patterns in part because of the loss of important emergent phenomena and in part because the interdependencies between the former and the latter are bidirectional (i.e., the "whole" both constrains and is constrained by the "parts"). The atoms, molecules, and cells composing the body are in constant turnover, but the pattern of relationships remains more or less the same. We still recognize people we haven't seen for years even though the matter of which they are composed has been replaced several times over. What defines the human body, then, is not a material, reducible entity but a nonreducible pattern of relationships. The human body is itself a node in many higher-order webs such as the immediate environmental context, family, profession, culture, and the planet as a whole. These webs are spun out across time as well as space. Thus, in order to understand human behavior and any phenomenon it manifests, such as attention capture, it is necessary to consider not only the human body and parts thereof but also the broader systems in which it participates, with which it has evolved, and to which it must adapt. It is necessary to consider the historical, evolutionary webs from which humans have emerged and to which they contribute. For example, human anatomy, physiology, and behavior reflect dynamical, pattern-making processes that have shaped the courses of biological and cultural evolution, not to mention geological and cosmological evolution. 7
Chaos and butterfly effects. Activity anywhere and at any time in the universal web can ripple widely and potentially affect systems anywhere else and at any other future time in the web, leading to what have been termed butterfly effects (e.g., Gleick, 1987). Most natural systems are nonlinear, often exemplifying deterministic chaos. Unlike the additive, linear components to which Western science often attempts to reduce them, such systems allow tiny causes to have big effects and big causes to have tiny effects. Butterfly effects ripple across time as well as space. An example of butterfly effects of ancient origin on human bodies and behaviors is that the first hominids (e.g., Lucy) stood upright. This upright stance required a change in hip structure and a restriction of the birth canal of females which, in turn, required that babies be born "prematurely," that is, before their crania and brains were as fully developed as those of their primate cousins. This meant that brain organization became more flexible and responsive to the actual environmental contexts into
which the infants were born. The self-organizing web of relationships emergent in the brain of an Australopithecus infant was capable of reflecting the fine-grained statistical regularities of the immediate environment in addition to the general trend, coarse-grained regularities made innately available by biological evolution. In addition, this upright stance freed up the hands for carrying and wielding objects and changed the articulatory apparatus in ways that made human speech possible. All of these changes may have contributed to the evolutionary increase in hominid brain size, the emergence of human speech and symbolic thought, and, eventually, cultural and technological evolution. We suggest that attention capture has been affected by and contributed to all of these evolutionary changes. In short, a thorough understanding of a system, such as an organism, must consider both the webs of relationships of which it is composed and the whole history or evolution of the broader webs in which it is imbedded.
Self-organizing complexification. The basic idea is that systems tend to evolve away from simple states close to thermodynamic equilibrium toward higher levels of complexity. This evolution is self-organizing, rather than based on some sort of blueprint or plan, and often exemplifies deterministic chaos (e.g., Kauffman, 1993; Lewin, 1992). Along this self-organizing course, systems pass through various attractor phases, each of which is self-perpetuating and tends to maintain the system in a dynamic stasis until a sufficiently strong perturbation forces it to undergo a phase transition into another attractor. As the system passes through these attractors, it often increases in terms of complexity and dissipates more energy (e.g., Prigogine & Stengers, 1984). Examples of self-organizing complexification include cosmological evolution, biological evolution, cultural evolution, and individual development (both prenatal and postnatal). All of these are characterized by relatively simple, embryonic states and immensely complex later states. 8
Edge between order and chaos. Kauffman (1993) suggests that systems tend to self-organize toward and thrive near the edge between order and chaos, an area or zone of optimal system adaptability (see also Lewin, 1992). This abstract edge is reminiscent of the stability/plasticity dilemma and the costs and benefits of expertise (Grossberg, 1987; W. Johnston & Hawley, 1994; W. Johnston, Strayer, & Vecera, 1998). Systems with too little plasticity may be overly rigid and vulnerable to stagnation and decay, and those with too much plasticity may be overly fragile, sensitive to even the slightest perturbations, and vulnerable to dissolution or deterioration. Experts and specialists may be excessively stable; they tend to perform very well within the particular domains and contexts to which they have become precisely attuned and adapted, but they do so at some loss of flexibility or plasticity and may suffer costs of expertise should their environmental context change. Novices and generalists may be excessively labile; they are not precisely attuned to any particular niche, but they are sufficiently flexible that they can resonate with a changing, evolving web and not become marooned in an obsolete
attractor. One who is a "jack of all trades and master of none" is likely to have the advantage over a highly specialized trade master in times of social and environmental change.
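As a purely illustrative aside, the notions of an attractor and of butterfly effects outlined above can be made concrete with a textbook toy system, the logistic map x -> r * x * (1 - x). The sketch below is not drawn from the chapter or from the sources it cites; the function names, the starting value 0.2, the perturbation of one part in a billion, and the two settings of r are all assumptions chosen for demonstration. With r = 2.8 the map settles onto a single stable value (an attractor), and a tiny difference in the starting state stays tiny. With r = 4.0 the map is chaotic, and the same tiny difference grows until the two histories bear no resemblance to one another.

```python
def logistic_trajectory(r, x0, steps):
    """Iterate x -> r * x * (1 - x) from x0 and return every state visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def max_divergence(r, x0, delta, steps):
    """Largest gap between two runs whose starting states differ by delta."""
    a = logistic_trajectory(r, x0, steps)
    b = logistic_trajectory(r, x0 + delta, steps)
    return max(abs(u - v) for u, v in zip(a, b))


if __name__ == "__main__":
    # Hypothetical settings: r = 2.8 (orderly regime), r = 4.0 (chaotic regime).
    for r in (2.8, 4.0):
        gap = max_divergence(r, x0=0.2, delta=1e-9, steps=100)
        print(f"r = {r}: largest gap between the two runs = {gap:.3g}")
```

The point of the comparison is only qualitative: the same rule, under a different parameter value, either absorbs a small perturbation or amplifies it.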
Attentional dynamics
Attentional processes lend themselves readily to a dynamical-systems perspective because they are inherently relational; they relate organisms to one another and to other systems and potentially keep organisms connected to, and sensitive to changes in, the ecological webs in which they participate. We suggest that attention capture is a non-reducible, whole body phenomenon and that it serves important functions related to the stability/plasticity dilemma.
Whole body vs. brain. Contemporary approaches to the study of attention capture tend to be human-centered and brain-centered, reflecting the reductionistic assumption that attention is entirely a cerebral phenomenon. In contrast, we suggest that attention capture is an emergent phenomenon of whole bodies and that it is a fundamental characteristic of countless species of organisms. In many species, including humans, this whole-body response often entails a whole pattern of body changes, including head orientation, pupil dilation, muscle tensing, and postural changes that prepare the body to respond quickly and appropriately to the capturing event (e.g., Sokolov, 1963; Näätänen, 1992).
Holistic, functional approach. Attention capture in humans, stink bugs, crawfish, and any number of other organisms is probably implemented by different internal, anatomical and physiological dynamics, but the primary function of attention capture is probably very much the same in all species. 9 We suggest that a holistic, functional approach can lead to an understanding of attention capture, one that applies to all species, in ways that a strictly reductionistic focus on internal mechanisms cannot. We suggest further that an important function of attention capture is to maintain viable relationships between whole organisms and their habitats. In particular, we suggest that attention capture helps to resolve the stability/plasticity dilemma.
Attentional capture and the stability/plasticity dilemma. In order to survive and remain viable, organisms must familiarize themselves with their environments so that they can efficiently and adaptively navigate through them and otherwise relate to them. Organisms become attuned to their environments, capable of anticipating, efficiently processing, and responding to environmental regularities. They become biased toward the predictable features of their habitats and settle in adaptive behavioral routines or attractors. Examples of the bias toward familiar and expected stimuli are replete in the cognitive literature (see W. Johnston & Hawley, 1994). In short, relatively stable relationships develop between organisms and their ecosystems. Of course, an ecosystem is usually composed of a complex web of
diverse forms of life, each one developing relatively stable relationships with every other one, and the whole web tends to settle into a self-perpetuating attractor. Thus, individual organisms and whole ecosystems tend to move away from the edge of chaos toward increasing order and stability. Stability has immense benefits. Without it, organisms and ecosystems could not long survive. But there can be important costs to stability as well. No organism or ecosystem exists in isolation. Every system is at least indirectly connected to every other one. The broader webs are always dynamic and evolving. If a given system has settled for too long in a self-perpetuating attractor, it can become mired there, unable to respond to changes in the broader webs to which it is connected, and fail to undergo a phase transition into a new, more viable and adaptive attractor. 10 Excessive stability can lead to obsolete and isolated attractors in which systems begin to succumb more quickly to the second law of thermodynamics. Thus, in order to avoid excessive and maladaptive rigidity, systems must maintain a degree of plasticity. They must reside somewhere in the optimal zone between order and chaos. 11 We suggest that an important function of attention capture is to help resolve the stability/plasticity dilemma for the organism and protect it against excessive rigidity. Change detection. Most supposed instances of strong attention capture in the literature involve some sort of change. A sudden onset is a change in a relatively static environment, an odd stimulus is a change in a relatively homogeneous environment, and an unexpected stimulus is a change in an otherwise familiar and predictable environment. Attention capture is a bias toward deviations from the ordinary, commonplace, and predictable, and, as such, it counteracts the strong bias noted above toward the predictable features of the environment. 
Because of attention capture, the bias of organisms toward the expected features of their environments is balanced to varying degrees by a bias toward unexpected stimuli. 12 In addition to affording organisms a degree of vigilance toward potentially important (e.g., threatening) intrusions into their habitats, the bias toward change serves to guard against excessive stability and entrenchment in obsolete and maladaptive attractors. In general, attention capture helps to keep organisms flexible and dynamic, capable of resonating to and evolving with the dynamic webs in which they are imbedded.
Segue In the remainder of this chapter we trace the evolutionary history of attention capture, especially in humans. This evolutionary approach reveals possible shortcomings of the standard, reductionistic analysis of attention capture. In particular, this approach reveals important features of attention capture not often addressed in the contemporary literature, including mutual attraction of attention in
organisms, ecology of attention capture, contextual variability of attention capture, internal capture of attention, co-opting of attention capture in humans to serve cultural and institutional systems, and the relevance of all of this to broader, social issues.
Attention Capture and Biological Evolution Attention capture is adaptive and biologically primitive Clearly, humans are not the only organisms whose attention can be captured. One can readily witness attention capture in other species, even so-called primitive ones with somewhat simpler brains. When one steps too close to a stink bug on a hiking trail, it immediately stops, collapses on its front end, and raises its rear end. When one comes too close to a crawfish in a stream, it too abruptly ceases whatever it is doing and raises up in a defensive posture with its claws extended upward and outward. Attention capture in stink bugs and crawfish is manifested in the whole body, often as an adaptive response to some environmental perturbation or change. Strong capture. Change detection is often a strong form of attention capture and probably emerged very early in the evolution of life on earth, even earlier than the arrival of stink bugs and crawfish, perhaps earlier even than vision and audition, and certainly earlier than bipedalism and frontal lobes. 13 The early onset of strong attention capture in biological evolution no doubt reflects its adaptive value. Organisms that failed to detect and respond appropriately to change in their habitats were less likely to survive. Indeed, complex organisms and ecosystems would probably not have evolved at all if attention capture had not emerged very early in the history of life. The original role of attention capture was no doubt to keep organisms sensitive to environmental events (e.g., a rustle in the bushes, a novel odor wafting in the breeze, or a shriek in the night) that signal potential biologically imperative intrusions into their habitats such as geological perturbations (e.g., storms, fires and floods) and predators, prey, and mates. 
The attention-capturing power of abrupt, odd, and unexpected stimuli in our human subjects today probably had its origin at least as early as the Cambrian Explosion of multicellular life forms a half-billion years ago, if not in the microcosmic world of bacteria some three billion years earlier. Weak capture. Weak attention capture can also be adaptive and no doubt also emerged early on in the evolution of life, though perhaps somewhat later than strong capture. Objects that happen to draw attention even when they are nonimperative and do not represent a dramatic change in the environment can also affect survival. A possible example of this may be mate selection in primates and other organisms. 14 The attention of an adult male passively monitoring his
environment might be prone to capture by a group of females more than by a grove of trees and especially by a female in the group whose morphology is indicative of pubescence, health, and child-bearing capability. Likewise, the attention of a female might be prone to capture by a male whose morphology and other characteristics suggest that he would be an excellent protector and provider of resources. Males and females whose attention is weakly captured by such signs of reproductive fitness are more likely to mate and pass these physical and attentional traits on to their offspring. Attention capture is an ecological process
One interesting aspect of the above examples of attention capture in stink bugs and crawfish and of mate selection in primates is that the capture can be reciprocal and contagious. The crawfish and the observer might capture each other's attention, and the attentional responses of the observer might capture the attention of a third party representing a different species and lead it to notice the crawfish, yielding a three-species ensemble of attention capture, a local and transitory ecological web of attentional relationships. Potential mates can capture each other's attention, and this mutual attraction of attention might capture the attention of a competitor. These examples illustrate another sense in which attention capture is not just a phenomenon of brain activity; it is a phenomenon of whole, dynamical ecosystems. We suggest that ecosystems evolve and flourish in part because of the mutual capturing of attention among many of their participant organisms. Attention capture may very well help to form and govern the complex webs of behavioral relationships among the participants that define particular ecosystems and render them viable. Thus, attention capture might be a vital feature of whole, thriving ecosystems, one that keeps them within an optimal zone between order and chaos. Attention capture is contextual
Attention capture appears to be contextually dependent. This is especially evident in naturalistic situations. What captures the attention of female lions depends on whether they are hungry or in heat. Even the attention-capturing power of sudden onsets varies with context (e.g., Folk, Remington, & J. Johnston, 1993). In hiking through a glen on a sunlit day, one may encounter many sudden onsets in the form of shafts of sunlight that break through small pores in the dense canopy of branches and leaves. At first these sudden onsets might capture one's attention, but, probably owing to their repetitiveness, their attention-capturing power soon diminishes. This diminution of attention capture by repeated events might itself be an adaptive feature of the attentional processes of organisms. Indeed, many
contextual variations in attention capture are likely to be adaptive and may have emerged very early in the course of biological evolution. Organisms are more likely to survive if what captures their attention is contextually appropriate. However, we discuss below in connection with cell phones and automobile driving an example of a context in which the attention-capturing power of sudden onsets is weakened, even though this weakening is probably maladaptive. Attention capture co-evolves
The ecology of attention capture may play an important role in the coevolution of species. This is especially evident in the evolutionary "arms races" between predators and prey such as cheetahs and gazelles and birds and moths (e.g., Dawkins, 1986). The predator whose attention is most readily captured by its main prey is most likely to survive and pass on this trait to its descendants. This is likewise true of the prey whose attention is most readily captured by its main predator. Predators and prey and other cohabitants of ecosystems may be locked into an evolutionary spiral of attention capture as well as a number of other traits and processes (e.g., morphology and sensory processes). In general, the shifting attentional relationships in multi-species ecosystems may serve the evolutionary vitality of these systems. Attention capture in human evolution
Weak and strong forms of attention capture no doubt played important roles in the early stages of human evolution. When our hominid ancestors first stood upright and ventured from the jungles onto the savannas, they must have confronted new biologically important inputs or, at least, new perspectives on these inputs. For example, they could perceive potential predators, prey, and mates at greater distances than they could in the dense jungles. Over the first several millennia of their dual residence in jungles and savannas, attention capture in these early hominids must have been shaped by biological evolution to appropriately sensitize them to these new inputs. Whatever species-specific forms of weak and strong attention capture might have evolved in early humans, contextually variable change detection very likely remained an adaptive algorithm that served them well even as they confronted new inputs in their broadened habitats. It is likely that attention capture continued to be fine-tuned in adaptive ways as human evolution passed through various attractor phases over the course of the last four million or so years, helping to give Homo habilis the competitive edge over Australopithecus, Homo erectus the competitive edge over Homo habilis, and so on. However, we suggest that attention capture in humans underwent a much more dramatic phase transition, or
series of transitions, only very recently in our evolutionary history, and that this transition is attributable more to cultural evolution than biological evolution. Attention Capture and Cultural Evolution
In this section, we first outline cultural evolution and how it has transformed the human mind and human behavior in general and then examine how it may have transformed human attention in particular. Cultural evolution Upper-paleolithic and neolithic revolutions. As Diamond (1992, 1997) points out, for most of its history, human evolution with all of its phase transitions up to and including the first anatomically modern humans was unspectacular. Although the brains of our anatomically-modern ancestors 100,000 years ago were very similar if not identical to our brains today, their behavior barely distinguished them from the earliest, smaller-brained protohumans and their primate cousins. At least 99% of human evolution passed before significant changes in human behavior occurred, changes that are attributable more to cultural evolution than biological evolution. We suggest that these profound effects of cultural evolution on human behavior have been mediated, in part, by various cognitive processes, including attentional capture. 15 The first step in this profound change was the upper-Paleolithic revolution which began around 40,000 years ago and which was characterized by, among other things, 1) a rapid diversification of human artifacts, including a variety of specialized tools and weapons, body ornaments, and pottery, and 2) the emergence of language and self-reflective, symbolic thought, as evidenced in part by cave drawings. The next big step was the transition from a hunter-gatherer to an agrarian lifestyle that characterized the Neolithic revolution around 8,000 years ago in an area of the near-east (i.e., Mesopotamia) called the fertile crescent. Third nature and the institutional order. Once linguistic, symbolic thinking humans began to settle down into villages, a whole new web of relationships with powerful emergent properties began to self-organize and complexify, one that has had profound effects on human life and, indeed, the whole planet. 
This web may even define a qualitatively new form of nature, a "third nature." First nature is material; it burst into existence with the big bang some 12 billion years ago. Second nature is biological; it began, on earth at least, with the first bacteria some 4 billion years ago. Third nature is ideological, cultural and institutional; it began with the neolithic and paleolithic revolutions in human minds and life styles between 8 and 40 thousand years ago and self-organized into the vast
web of relationships defined, in part, by what has been called the institutional order (Turner, 1997). The institutional order includes, among other dynamical systems, politics, law, religion, the press, industry, economy, technology, science, and education. Prior to the neolithic revolution, all of the functions of what was to become the institutional order were performed within kinship-based clans, but the transition to village living required a separation of these various functions out of the clans and into what was to become an immensely powerful and self-perpetuating institutional order. The history of third nature is especially evident in the history of Western civilization. The three natures are now co-evolving, and third nature is feeding back on first and second natures, often in ways that put the planet in jeopardy. To a nontrivial degree, third nature has literally re-sculpted first and second natures, extinguishing and displacing many populations and species of organisms and replacing much of the fractal geometry of first and second nature with the rectilinear structures of third nature. Humans, especially in industrialized nations, comprise the medium by which third nature lives and wields its power. We suggest that third nature has infiltrated the human mind and controls human behavior in ways that serve the institutional order. Human minds spawned third nature, continue to serve as its "survival machine," and now are affected and potentially victimized by third nature. Third nature may be considered an emergent property of human culture, a form of collective intelligence like that evidenced by ant and bee colonies (e.g., Franks, 1989). 16 Memetic evolution. The basic units of first nature might be atoms or subatomic particles, those of second nature genes, and those of third nature memes (e.g., Dawkins, 1976). 
Memes are concepts and belief systems and can be manifested in the various physical artifacts and technologies spawned by the institutional order (e.g., money, television, automobiles, and computers). Among the original memes that fueled the growth of the institutional order are the ideas of progress, control over and separation from the rest of nature, and human superiority to the rest of nature (e.g., Nisbet, 1994). The evolution of third nature is based on memetic evolution, and the human mind is the medium through which this evolution occurs. Third nature has not only restructured our environments, filling them with buildings, pavement, automobiles, and countless other artifacts, but may have altered our minds in equally profound ways. Transition from generalist to specialist. Prior to the birth of third nature, individual humans were generalists, a trait that may account for their success at survival during especially turbulent times on the earth (e.g., Potts, 1996). There was a gender-based division of labor between the hunters and gatherers, but virtually all clan members possessed all of the skills needed to survive. With the evolution of third nature, individual humans became increasingly specialized. We have been
recruited into particular niches, attractors, and areas of expertise within the institutional order. An adult Australopithecus africanus would be at a loss if he or she were thrust suddenly into the modern, third-nature environments of Western civilization, ill-equipped for any of its specializations, but most modern human adults would be similarly at a loss outside of third nature, ill-equipped with the general skills needed to long survive in the first- and second-nature ecosystems within which our distant ancestors thrived. 17 The transformation to specialists may have moved modern humans, at least those in industrialized nations, out of the optimal zone between order and chaos and rendered them more susceptible to the costs of expertise. Third nature itself is dynamic and plastic, always moving and adjusting to effects that ripple across the institutional web. Religions change, the U.S. constitution changes, governments change, national boundaries change, technology changes, and specializations (e.g., issues, theories, and methodologies in cognitive psychology) change. Yet there is a certain dynamic stasis in all of this flux. The institutional order as a whole is alive and well. As third nature evolves, the landscape of human skills, specializations, and belief systems changes, leaving some human experts to stagnate in obsolete attractors. Obsolete attractors and the people who fill them are replaced by new attractors filled by new, usually younger and more flexible, individuals. So third nature may remain in the adaptive zone between order and chaos even if the human minds on which it once depended are left marooned in obsolete attractors, too rigid to keep pace as the landscape of institutional attractors evolves. 
Just as the patterns of relationships defining our bodies remain alive and dynamic even as the cells composing them are constantly replaced, so the institutional order remains alive and dynamic even as the human minds on which it depends are constantly replaced. Attentional effects of cultural evolution
We suggest that attention capture in humans has been both exploited and transformed by the evolution of third nature. Third nature has co-opted attentional capture. Clearly our attention continues to be strongly captured by the same kinds of input that captured attention in our Paleolithic ancestors and the protohumans before them (e.g., sudden onsets and singletons of various sorts). However, both the sources and survival value of these capturing events have changed dramatically with the rise of third nature. In lieu of a rustle in the bushes or a screech in the night, what captures our attention today is more likely to be a wailing siren, rap music booming out of a passing automobile, strident admonishments from social agents such as parents and teachers, a letter from a journal editor, funding agency, or department head, or flashing, beeping, colorful advertisements or signs on television, magazine covers, or marquees on casinos, movie theaters, and store fronts. The material objects that
capture our attention are more likely to be constructions of third nature than second nature, such as computers, television, tabloids, fashion magazines, ice-cream wagons, cell phones, and alarm clocks in lieu of wild plants and animals. And what is served by attention capture in modern humans? It is less likely our survival in first and second natures that is served than it is our survival in third nature, and it may be less our survival that is served than it is the survival of the memes and institutions comprising third nature itself. Indeed, in some cases, the exploitation of our attentional processes by third nature does us a disservice, as is exemplified by the flow of carcinogens, ulcers, anorexia, and cardiovascular illnesses through our population, our fixation on physical attractiveness, wealth, and social status, and the steadily increasing proportion of our time that we spend interacting with and relating to the machines and other artifacts of third nature in lieu of other people and the dwindling ecosystems of second nature. Biologically-evolved attention capture served our ancestors well for hundreds of thousands of years, just as it has served all other species for hundreds of millions of years. Now these same processes have been co-opted by a fledgling third nature for its own self-perpetuating purposes, just as have other biologically evolved processes such as mate selection and the fulfillment of basic needs. 18 Third nature has transformed attentional capture. We suggest that the evolution of third nature has effected a shift in the orientation of attention away from primarily external stimuli to internal ones. We tend to be preoccupied with the memes with which third nature has infiltrated our minds. This internal, memetically-driven attention may be another vehicle by which the institutional order keeps us in its service; it tunes us away from the external inputs of first and second natures and into the internalized memes of third nature. 
Because of this external to internal shift, our attention may be less likely to be captured by the same sorts of external input that captured the attention of our pre-linguistic hunter-gatherer ancestors and that continue to capture the attention of our close primate cousins. Now when we engage in our routine activities at home and at work, stroll through a park, or even hike through a forest, we are likely to spend more time absorbed in some form of memetically-controlled reverie than attentive to external stimuli. Of course, the degree of attention to internal sources is likely to be contextually variable. For example, if our stroll through a city park happened to take us accidentally into a gang-infested neighborhood, our attention would very likely shift more toward external sources. 19 Even the strong attention-capturing power of sudden onsets might be vitiated by internally-directed attention, defining another example of contextually-variable attention capture. Some suggestive evidence for this has been generated by our own research on the use of cell phones while performing a simulated driving task (Strayer & W. Johnston, in press). When our subjects were deeply involved in cell phone conversations or difficult mental tasks, they not only performed the
driving task more poorly but were more than twice as likely to miss occasional sudden onsets of light at the center of fixation than when they were not on the cell phone. One might argue that these sudden onsets of lights did capture attention but that responses to them were suppressed. However, in more recent studies, we examined implicit, perceptual memory for words that were incidentally flashed at fixation and called for no responses. Implicit memory was reliably stronger when subjects were not engaged in a cell-phone conversation than when they were so engaged. This apparent reduction in the attention-capturing power of sudden onsets of words cannot be attributed to response suppression since no responses to them were required. We acknowledge that this evidence is only suggestive because attention to a cell-phone conversation entails a degree of external attention as well as internal attention. However, it bears some similarity to internal reverie in the sense that subjects are mentally engrossed in a meme-based dialog that bears no relationship to the primary visual-motor task and the proximal environment. Like much of our meme-based reveries, cell-phone conversations can pull our attention away from our current external-environmental context and into an internal, memetically-based context. 20
Social Implications of Attention Capture In addition to suggesting issues and aspects of attention capture not often encountered in the contemporary literature, an advantage of the perspective on attention capture offered here is that it has implications for some of the large-scale "social" problems and issues facing humanity and the planet as a whole. Our thesis is that cultural evolution has produced a vast institutional order, a third nature, that controls our behavior in self-perpetuating ways that serve its own survival and growth. This control is exerted in part via memetically-driven attention capture. Our attention still could be, but rarely is, captured by the same first- and second-nature inputs to which millions of years of biological evolution have tuned us (e.g., a rustle in the bushes, a movement through the grass, or an animal cry in the night), in part because first and second natures have themselves been altered and shaped by third nature (e.g., they have been replaced by buildings, highways, domesticated plants and animals, and all of the technological artifacts with which we are constantly surrounded). Our biologically-evolved attentional processes have been co-opted and exploited by third nature such that our attention is directed both externally and internally toward memes that encourage us to conform to and satisfy the needs and goals of third nature. Our attention and other cognitive processes are the "survival machines" of third nature, the means by which it self-perpetuates. Unfortunately, it has become painfully evident that third nature has altered the planet in ways that put it in peril. The rise of third nature, guided by the original memes of progress and human superiority to, conquering of, and control over
nature, has led to such serious problems as global warming, decimation of species, depletion of resources, degradation of whole ecosystems, overpopulation of the planet by humans, and, very possibly, an impoverishment of the modern human mind (e.g., Diamond, 1992, 1997; Safina, 1995; Smil, 1997). Because we are the instruments or survival machines of third nature, we are co-conspirators, witting or unwitting, in the degradation of the planet and, ultimately, of ourselves. It may not be too late for us to fight back and try to regain control over third nature, forcing it through phase transitions that place it more in harmony with first and second natures. To do this, we would ourselves need to undergo phase transitions in our minds and behaviors. One step in this direction is to cease relying so extensively on reductionistic methodological and theoretical paradigms in our scientific systems and begin putting ourselves and our three natures back together again by assessing the dynamical webs of relationships in which we are all embedded. As individuals, we might try resurrecting the Native American tradition of tuning into first and second natures, to people and natural ecosystems in lieu of third nature machines and memes, reconnecting to the planet and realizing that we are not superior to it but a vital and powerful facet of it. We certainly cannot exorcise our third nature and return to a hunter/gatherer lifestyle, but we may be able to push for a planet-friendly third nature and adopt more planet-friendly life styles. As investigators of attention capture and other cognitive and neural phenomena we might concentrate on how to conceptualize and study these phenomena within the three natural webs (i.e., the three natures) from which they evolved and to which they relate. Like most natural systems, the third nature system of psychology has self-organized and complexified for over a century. 
Since the "cognitive revolution" of around 1960, cognitive psychology has itself been carved up into an ever-increasing number of relatively isolated specializations, each with its own jargon and methodologies. The mind has essentially been broken down into a multitude of parts. Our primary goal in this chapter has been to suggest how a phase transition to a holistic, dynamical-systems approach might serve to put the mind and body back together again with the natural webs with which they have coevolved. Because these areas of psychology are dynamical systems, they almost certainly will undergo significant transformations in the coming years. If we students of attention want to keep pace with these transformations and not suffer serious costs of expertise, then we may be well advised to prepare ourselves for them. This may entail digging ourselves out of our professional attractors, moving closer to the edge of chaos, and beginning to consider, participate in, and help shape the inevitable transformations in our discipline.
Footnotes
1 We are grateful to Chip Folk and Brad Gibson for encouraging us to submit this rather radical perspective on attention capture and to Elizabeth Cashdan and Jim Dannemiller for providing comments on an earlier version of this chapter.
2 An ancillary goal of this chapter is to promote interest in these potential lines of research and theory.
3 We do not necessarily subscribe to the usual positivistic approach to and interpretation of cognitive psychology, or any other discipline for that matter. Indeed, we suspect that much of what goes on in the name of the science of cognitive psychology is to some extent socially constructed. The received views of attention capture and the appropriate methodology for studying it may to some extent be artifacts of the cultural and technological contexts in which they have arisen and currently flourish (e.g., Shimp, 2001).
4 Nonetheless, many cognitive psychologists and other scholars continue not only to rely on the concept of consciousness but regard it as being open to scientific investigation. For example, we have just received a "Call for Papers" to an international conference entitled Consciousness and its Place in Nature: Toward a Science of Consciousness (see also Dennett, 1991, and Jackendoff, 1987).
5 Admittedly, there may not be consensus on what distinguishes natural from unnatural contexts (e.g., environments and tasks). Indeed, any context that arises within the universe is, almost by definition, a natural one in the most general sense. However, we are using a more limited, natural-language meaning of naturalistic in this paper. To us, a naturalistic context for any species is one in which most members of that species are born, reared, survive, and reproduce. For our hunter/gatherer ancestors, hunting and gathering would be naturalistic. For modern humans, driving automobiles, cooking dinner, watching movies, reading books, listening to music, and working at their jobs would be considered naturalistic.
6 The breaking of organisms down into their parts, especially brain parts, is often done literally in research on animal cognition. In human research it is done by contriving situations that call into play only one or a few internal systems. Most contemporary studies of human attention effectively isolate vision and brain activity from the multi-sensory, multi-system interactions and interdependencies that characterize most naturalistic contexts. Naturalistic tasks engage multiple sensory modalities simultaneously and engage the autonomic nervous system in addition to the central nervous system and endocrine, immune, and various other systems in addition to the nervous system.
7 One reviewer of this chapter expressed concern that to place attention capture in a complete evolutionary and cultural context might be desirable but is daunting and probably impractical. We agree that it would be impossible to trace all
the strands of the vast web in which individual humans are embedded. However, we suggest that it is time to at least consider the web and to begin to venture outside the organism and examine some of the more obvious strands to which various cognitive processes are related and with which they have co-evolved.
8 Of course, ultimately all systems begin to reverse their self-organizing, complexifying course, deteriorate, and succumb to the second law of thermodynamics. But, like the phoenix, new stars arise from the ashes of old stars, new empires arise after the fall of old ones, and new life arises from the remains of old life.
9 We wonder if contemporary theorists of human attention would apply the same ideas of attentional gates and intelligent, conscious homunculi to other organisms such as stink bugs and crawfish.
10 Examples of the costs of stability in the psychological literature are language (e.g., Werker, 1989), functional fixity, problem-solving set, and self-fulfilling prophecy characteristics of social stereotypes.
11 Examples of the stability/plasticity dilemma and the edge between order and chaos in real organisms and ecosystems are abundant in ecological literature (e.g., Reice, 1994; Tilman, 1996; Vitousek, D'Antonio, Loope, & Westbrooks, 1996). For example, aquatic and forest systems that are too rarely perturbed by floods or fires tend to stagnate (e.g., Reice, 1994).
12 We have related attention capture to the stability/plasticity dilemma elsewhere (e.g., W. Johnston & Hawley, 1994; W. Johnston et al., 1996). W. Johnston & Hawley (1994) offered mismatch theory as a possible description of some of the computational dynamics from which simultaneous biases toward both expected and unexpected inputs arise. This theory does not appeal to homunculi and consciousness and is a possible example of how attention capture can be an emergent phenomenon of normal perceptual dynamics.
13 Whether or not some form of attention capture occurs in single-celled organisms and multicellular plants is moot and we take no position on this issue here. However, we do suggest that attention capture occurs in most multicellular species of animals. Therefore, in this paper, the term organism refers to multicellular animals.
14 In at least some instances, as when a female animal is in heat, potential mates might strongly capture attention.
15 The fact that our brains may have remained virtually unchanged for at least 100,000 years while our behavior has changed dramatically argues against the current heavy reliance on reductionistic neural science to explain this behavior change. One must venture outside the brain and into cultural history to understand how the same brains could mediate such dramatic behavioral differences.
16 One reviewer expressed a reluctance to accept the idea of a third nature since second-nature human beings are involved in all of the institutions (e.g.,
politics) and ideologies (e.g., fascism) that we offer as examples of third nature. We agree that humans are involved in third nature enterprises, but we suggest that there are emergent phenomena in these human-based collectives that are not found in their individual components. Atoms make up living cells, but there are important properties of cells that do not exist in atoms. Cells make up people, but there are emergent properties of people that are not manifested in their cells. People make up institutions, but there are important properties of institutions that do not exist in individual humans.
17 Of course, third nature of an embryonic sort no doubt existed even in our pre-linguistic hunter-gatherer ancestors. It was embedded in the clan culture, its rituals, mores, beliefs, and other collective attributes. Indeed, relatively simple forms of third nature, including culture and technology, can be found in other organisms, notably chimpanzees (e.g., Whiten, 2001).
18 Third nature constantly informs us of what we should wear, how we should look, and what we should consume. Our obedience to these instructions serves clothing designers and retailers, manufacturers of exercise equipment, pharmaceutical companies, food and beverage makers, and virtually the entire institutional order. The value of all of this to us, as biological organisms, is dubious.
19 We are grateful to Elizabeth Cashdan of the Department of Anthropology at the University of Utah for suggesting this example of the contextual variability of internal attention.
20 We are currently planning a variant of our cell-phone research in which subjects are probed with sudden onsets both in time blocks when they are preparing for an ensuing debate (e.g., pro- vs. anti-abortion) and in time blocks when they are reading a position on the issue. We expect to find that the attention-capturing power of the sudden onsets is less in the former condition than the latter.
References
Dawkins, R. (1976). The selfish gene. Oxford: Oxford University Press.
Dawkins, R. (1986). The blind watchmaker. New York: W. W. Norton.
Dennett, D. C. (1991). Consciousness explained. Boston: Little Brown.
Diamond, J. (1992). The third chimpanzee. New York: Harper-Collins.
Diamond, J. (1997). Guns, germs, and steel. New York: W. W. Norton.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1993). Contingent attention capture: A reply to Yantis (1993). Journal of Experimental Psychology: Human Perception and Performance, 19, 682-685.
Franks, N. R. (March-April, 1989). Army ants: A collective intelligence. American Scientist, 77, 134-145.
Gleick, J. (1987). Chaos. New York: Viking Press.
Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23-63.
Jackendoff, R. (1989). Consciousness and the computational mind. Cambridge, MA: MIT Press.
James, W. (1890/1950). The Principles of Psychology. New York: Dover.
Johnston, W. A., & Dark, V. J. (1986). Selective attention. Annual Review of Psychology, 37, 43-75.
Johnston, W. A., & Hawley, K. J. (1994). Perceptual inhibition: The key that opens closed minds. Psychonomic Bulletin & Review, 1, 56-72.
Johnston, W. A., Hawley, K. J., Plewe, S. H., Elliott, J. M. G., & DeWitt, M. J. (1990). Attention capture by novel stimuli. Journal of Experimental Psychology: General, 119, 397-411.
Johnston, W. A., Strayer, D. L., & Vecera, S. P. (1998). Broadmindedness and perceptual flexibility: Lessons from dynamic ecosystems. In J. S. Jordan (Ed.), Systems Theories and A Priori Aspects of Perception. Amsterdam: Elsevier.
Kauffman, S. A. (1993). The Origins of Order. New York: Oxford University Press.
Kuhn, T. S. (1970). The Structure of Scientific Revolutions. Second ed. Chicago: University of Chicago Press.
LaBerge, D. (1975). Acquisition of automatic processing in perceptual and associative learning. In Rabbitt, P. M. A. & Dornic, S. (eds.), Attention and Performance, Vol. 5. New York: Academic Press, 50-64.
Lewin, R. (1992). Complexity: Life at the Edge of Chaos. New York: Macmillan.
Näätänen, R. (1992). Attention and Brain Function. Hillsdale, N. J.: Erlbaum.
Nisbet, R. (1994). History of the Idea of Progress. New Brunswick, N. J.: Transaction.
Posner, M. I., & Snyder, C. R. R. (1975). Facilitation and inhibition in the processing of signals. In Rabbitt, P. M. A. & Dornic, S. (eds.), Attention and Performance, Vol. 5. New York: Academic Press, 669-682.
Potts, R. (1996). Humanity's Descent. New York: William Morrow.
Prigogine, I., & Stengers, I. (1984). Order out of Chaos. New York: Bantam.
Reice, S. R. (Sept.-Oct., 1994). Nonequilibrium determinants of biological community structure. American Scientist, 82, 424-435.
Safina, C. (Nov., 1995). The world's imperiled fish. Scientific American, 46-53.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.
Shimp, C. P. (2001). Behavior as a social construction. Behavioral Processes, 54, 11-33.
Smil, V. (July, 1997). Global population and the nitrogen cycle. Scientific American, 76-81.
Sokolov, E. N. (1963). Higher nervous functions: The orienting reflex. Annual Review of Physiology, 25, 545-580.
Strayer, D. L., & Johnston, W. A. (in press). Driven to distraction: Dual-task studies of driving and conversing on a cellular phone. Psychological Science.
Tilman, D. (1996). The benefits of natural disasters. Science, 273, 1518.
Turner, J. H. (1997). The Institutional Order. New York: Addison Wesley.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Vitousek, P. M., D'Antonio, C. M., Loope, L. L., & Westbrooks, R. (Sept.-Oct., 1996). Biological invasions as global environmental change. American Scientist, 84, 468-478.
Werker, J. F. (Jan.-Feb., 1989). Becoming a native listener. American Scientist, 77, 54-59.
Whiten, A., & Boesch, C. (Jan. 2001). The culture of chimpanzees. Scientific American, 60-67.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202-238.
Yantis, S. (1993). Stimulus-driven attentional capture and attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 19, 676-681.
Subject Index
abrupt onsets, 194, 107, 111, 128, 134, 135, 136, 137, 138, 139, 154, 155, 158, 159, 191, 195, 196, 206, 208, 209, 210, 212, 213, 220, 223, 298, 300, 301, 302, 303, 304, 344
additional singleton paradigm, 134, 139, 155
ADHD, 361
adjacent response filter (Adjar), 8, 11, 18
anterior attentional system, 328, 331, 332, 334, 343
anticipatory attending, 209, 210, 211, 212, 213, 216, 220, 222, 223
anti-saccade task, 356, 357, 360, 364
attentional blink (AB), 53, 75, 93, 102, 103, 104, 106, 108, 116, 186, 206, 224
attentional control, 30, 51, 55, 57, 70, 72, 92, 94, 107, 108, 111, 112, 121, 127, 293, 295, 296, 298, 302, 305, 306, 310, 311, 312, 325, 326, 335, 337, 339, 340, 343, 344, 349, 350, 354, 361, 364, 365, 366, 367, 368
Attentional Control scale, 335, 336, 343
attentional dwell time, 178
attentional focus, 196, 209, 210, 211, 212, 213, 215, 220, 223, 224
attentional pace, 194, 200, 201, 206, 209, 210, 221
attentional pulse, 212, 213, 215, 223
attentional set, 55, 70, 93, 111, 124, 128, 135, 141, 154, 155, 156, 167, 158, 163, 164, 165, 167, 168, 169, 170, 196, 206, 223, 268, 296, 297, 298
attractors, 381, 382, 383, 389, 392
auditory attention, 191, 209, 233, 237, 239, 241, 243, 245, 247, 251
auditory capture, 232, 237, 244, 246, 256
automaticity, 9, 15, 17, 21, 53, 78, 90, 153, 154, 179, 180, 185, 186, 187, 193, 195, 196, 202, 203, 204, 205, 208, 212, 214, 232, 306, 307, 308, 311, 334, 344, 356
awareness, 52, 60, 61, 62, 65, 66, 68, 72, 143, 144, 151, 152, 153, 159, 160, 161, 162, 164, 165, 166, 167, 168, 169, 170, 354, 376, 377
biological evolution, 375, 381, 384, 386, 387, 391
butterfly effects, 380
central cues, 153, 158, 159, 299, 300, 334
classical conditioning, 179, 184, 185, 187
co-evolution, 386
cognitive load, 7, 153, 196
conscious detection, 53, 60, 61, 62, 66, 68, 71, 74, 154, 159, 162, 166
consciousness, 151, 152, 372, 378, 381, 382, 383, 384, 437, 440
contingent involuntary orienting (CIO), 59, 78, 88, 93, 97, 98, 100, 102, 107, 108, 116, 128, 129, 131, 135, 142, 156, 157, 163, 206, 298
continuous performance, 354, 361
covert orienting, 29, 45, 153, 232, 265
crossmodal attention, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 243, 244, 245, 246, 248, 251, 252, 254, 255, 256
cued search, 192, 195
cue-saccade task, 33, 34, 40
cue-target paradigm, 28, 30, 197, 198, 202, 203
cultural evolution, 375, 380, 381, 387, 389, 391
default setting, 90
deterministic chaos, 380, 381
dichotic listening, 327, 350, 354, 355, 356, 358, 359, 360
difference signals, 51, 155
disengagement, 29, 101, 129, 131, 328, 334, 336, 337, 342, 343
divided attention, 53, 59, 72, 73, 312
dorsolateral prefrontal cortex (DLPFC), 304, 367
dual task, 52, 69, 72, 103, 104, 105, 106, 107, 303
dynamic attending, 191, 208, 209, 211, 213, 214, 215, 217, 218, 220, 222, 223
dynamical systems, 224, 375, 379, 388, 392
early selection, 6, 7, 9, 38
ecology of attention capture, 384, 386
entrainment, 210, 211, 212, 215, 216, 223
event-related potential (ERP), 3, 4, 5, 6, 9, 11, 12, 13, 15, 19, 20, 21, 27, 29, 244, 267
executive control, 293, 295, 296, 312
exogenous cue, 192, 195, 196, 199, 203, 204, 208
expectancy profile, 211, 215, 216, 218, 219, 220, 221, 223
expectations, 55, 152, 154, 160, 162, 163, 165, 167, 169, 170, 296, 305
explicit attention capture, 151, 159, 169
extrastriate cortex, 6, 20, 21, 243, 367
extraversion, 325, 326, 330, 332, 333, 334, 335, 339, 340, 342
feature search mode, 111, 139, 140
filtering cost, 106, 113, 125, 127, 132, 142
first nature, 388
flanker effect, 309
focused attention, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 66, 68, 69, 70, 71, 72, 73, 109, 116, 151, 169, 293, 308, 309
frontal lobe hypothesis, 295
Gaussian noise, 273
goal maintenance, 311, 359, 368
goal-directed selection, 57, 93, 96, 97, 157, 121, 134, 137, 296, 297, 299, 302, 304
guided search, 29, 52, 53, 54, 70
habituation, 181, 184, 185
hemifield comparison model, 283, 284, 286, 287
implicit attention capture, 151, 152, 153, 159, 168, 169
implicit spatial discrimination, 235, 243, 245, 247, 251
impulsivity, 343, 346
inattentional blindness, 52, 53, 60, 62, 65, 143, 151, 152, 161, 162, 166, 170
individual differences, 156, 305, 325, 326, 329, 331, 332, 334, 336, 342, 343, 349, 350, 352, 353, 354, 356, 360, 364, 366
inhibition, 11, 17, 20, 21, 27, 28, 29, 30, 37, 39, 40, 41, 42, 43, 127, 132, 133, 134, 137, 157, 177, 178, 179, 180, 184, 185, 186, 187, 265, 294, 311, 330, 331, 335, 338, 342, 343, 358
inhibition of return (IOR), 11, 17, 28, 29, 42, 43, 127, 137, 265
inhibitory surrounds, 119, 132
integrated hazard function, 142
institutional order, 387, 388, 389, 390, 391, 395
internal attention, 376, 391, 395
intramodal attention, 243, 249, 255, 256
involuntary orienting, 77, 78, 79, 84, 90, 113, 127, 156
late selection, 6, 7, 9, 365
maximum response model, 272, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 286, 287
memes, 388, 390, 391, 392
memetic evolution, 388
motion singletons, 56, 155
motivational valence, 325, 326, 333
negative priming, 294
Neolithic revolution, 387
neuroimaging, 6, 27, 312
neuronal processing, 6
neuronal stimulation, 40, 41, 42, 121, 286
neuroticism, 330, 332, 333
NP80 component, 5, 6
occipital, 6, 12, 14
oculomotor capture, 122, 134, 135, 137, 139, 141, 144, 304
oculomotor programming, 28, 37
oculomotor-IOR paradigm, 32
orthogonal cuing, 242, 248, 255
oscillator, 210, 211, 212, 218, 223, 224
overt orienting, 236, 244, 252, 267
P1 component, 5, 6, 12, 13, 14, 18, 19, 21, 243
P300 component, 5, 13, 15, 16, 17, 18, 19, 21
parallel processing, 126, 127, 139, 376
pattern-directed attending, 198, 200, 201, 202, 208, 212
perceptual cycle, 152, 160, 161, 162, 166, 167, 168, 169
perceptual load, 7, 126
peripheral cues, 20, 28, 30, 35, 36, 159, 153, 154, 157, 158, 232, 234, 299, 300, 333, 334
phase transitions, 381, 383, 386, 387, 392
pitch relationships, 208
posterior attentional system, 328
preattentive processes, 51, 52, 53, 54, 55, 57, 59, 124, 127, 131, 139
prefrontal cortex, 37, 367
pre-pulse inhibition, 177, 178, 179, 181, 182, 183, 184, 185, 186, 187
priming, 34, 38, 113, 237, 244, 294, 295
probabilistic association, 198, 200
probe detection task, 131
rapid serial visual presentation (RSVP), 29, 44, 52, 93, 102, 103, 104, 105, 106, 107, 108, 109, 111, 112, 113, 116
reactive attending, 209, 210, 211, 212, 220, 222, 223
reductionism, 377, 378, 379
reflexive attention, 7, 8, 9, 12, 13, 15, 17, 18, 19, 20, 21, 22, 41, 210
reflexive shifts, 166, 167, 212, 241
rhythm, 198, 200, 201, 202, 206, 207, 209, 210, 211, 213, 214, 215, 216, 217, 218, 219, 220, 221
saccades, 28, 34, 35, 38, 40, 41, 42, 43, 90, 122, 135, 136, 137, 283, 284, 290, 301, 302, 304, 305, 357
saccadic reaction time (SRT), 33, 34, 35, 36, 40, 43, 303
salience, 29, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 66, 67, 68, 69, 70, 71, 72, 93, 97, 100, 102, 123, 124, 127, 131, 134, 139, 155, 159, 205, 208, 267, 271, 273, 214, 275, 277, 282, 284, 286, 287
salience map, 124
saturation, 270, 271, 273, 277
schemas, 160, 165, 169
second nature, 388, 390, 391, 392
selective attention, 6, 63, 89, 294, 349, 365, 366, 367
selective looking, 152, 161, 162
sensory gating, 12, 178
sequence monitoring, 194, 196, 198, 200, 203, 205, 206, 207, 208, 210, 213, 221
serial search, 29, 138, 224
short-term memory, 350
signal analysis, 8
signal detection, 244, 269, 270, 271, 274, 287
singletons, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 71, 72, 73, 96, 97, 98, 100, 101, 102, 103, 104, 105, 107, 111, 112, 113, 114, 115, 116, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 139, 140, 141, 142, 143, 155, 156, 160, 194, 195, 203, 205, 206, 208, 209, 223, 267, 269, 271, 273, 274, 277, 279, 299, 304
singleton detection mode, 96, 97, 100, 102, 111, 112, 116, 140
slowing, 20, 293, 301, 309
spatial attention, 6, 12, 107, 108, 113, 124, 125, 126, 127, 128, 132, 133, 135, 137, 138, 232, 235, 239, 242, 250, 251, 254, 287, 299, 310, 311
spatial capture, 113, 115
spatial filtering, 310
spatial orienting, 29, 152, 232, 250, 252, 333, 341
spatial relevance, 249, 256
startle, 177, 178, 179, 182, 183, 184, 185, 187
sticky fixation, 266
stimulus-driven selection, 55, 56, 121, 122, 124, 131, 134, 140, 154, 157, 191, 195, 202, 203, 207, 209, 210, 211, 214, 222, 232, 296, 297, 298, 299, 304, 305
striate cortex, 6
strong capture, 384
Stroop, 105, 294, 308, 309, 311, 327, 350, 354, 358, 359, 360
superior colliculus (SC), 30, 35, 37, 38, 39, 39, 40, 41, 42, 231, 249, 251, 252, 304, 328
supramodal attention, 250, 251, 256
sustained attention, 151, 158, 159, 162, 166, 167, 168, 169
sustained inattention blindness, 162, 164
synchrony principle, 209
temperament, 326, 329, 330, 332, 338, 342
temporal capture, 208, 212, 220, 223
third nature, 387, 388, 389, 390, 391, 392, 392, 394, 395
trait anxiety, 326, 333, 336, 342, 344
transient attention, 158, 161, 167
uncued search, 193, 194, 195, 197, 200, 221
upper-Paleolithic revolution, 387
visual search, 51, 52, 54, 55, 57, 58, 59, 60, 62, 64, 65, 66, 69, 70, 71, 72, 93, 96, 101, 107, 122, 129, 138, 139, 154, 193, 194, 197, 199, 210, 265, 266, 268, 269, 281, 293, 305, 307, 308, 309, 311, 361
visual transients, 8, 9, 11, 15, 18, 84, 85, 88, 89, 90, 106, 152, 157, 158, 159, 160, 161, 166, 167, 169, 285, 286, 287
voluntary attention, 6, 7, 8, 9, 17, 18, 71, 156, 158, 160, 195, 200, 331, 338, 344
voluntary shifts, 167
working memory capacity, 305, 344, 351, 365, 368