EYE GUIDANCE IN READING AND SCENE PERCEPTION
Edited by
Geoffrey Underwood Department of Psychology University of Nottingham Nottingham NG7 2RD England
1998 ELSEVIER Amsterdam - Lausanne - New York - Oxford - Shannon - Singapore - Tokyo
ELSEVIER SCIENCE Ltd The Boulevard, Langford Lane Kidlington, Oxford OX5 1GB
Library of Congress Cataloging-in-Publication Data
Eye guidance in reading, driving and scene perception / edited by Geoffrey Underwood. -- 1st ed.
p. cm.
Includes index.
ISBN 0-08-043361-8
1. Eye--Movements. 2. Visual perception. I. Underwood, Geoffrey (Geoffrey D. M.)
QP477.5.E916 1998
152.14--dc21
98-7314 CIP
British Library of Cataloguing in Publication Data A catalogue record from the Library of Congress has been applied for. First edition 1998 ISBN: 008 043 3618
© 1998 Elsevier Science Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise, without permission in writing from the publishers.
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper).
Printed in The Netherlands.
Preface
The distinguished contributors to this volume have been set the problem of describing how we know where to move our eyes next. Are they under the control of the visual information available in words and in scenes, or under the control of the meanings of those items, or is there little or no control at all? The origins of this volume are in discussions held at a meeting of the European Society for Cognitive Psychology (ESCOP) that was held in Würzburg in September 1996. Informally and formally, a number of us with interests in eye movements were arguing about the landing position effect in reading, an effect that, if substantiated, would provide evidence of the use of parafoveal information in eye guidance. The argument was not resolved in Würzburg, and so we decided to continue talking about eye guidance at a small meeting that was held in Chamonix in February 1997. Many of the contributors to this volume were present at that meeting, but the argument was not resolved in Chamonix either. We decided that the way forward was to extend the debate to encourage contributions from psychologists who had been unable to attend the meetings, but who were known to have views on the matter. And this volume is the product. The argument is still unresolved, but the problem is now clearer.
ESCOP not only provided a forum for our initial arguments in Würzburg, but also generously supported the workshop in Chamonix. We are very grateful to the Society for giving us the opportunity to sit around a table talking about eye movements to fellow enthusiasts.
Not all of the arguments in Chamonix centred upon questions of eye guidance of course, and academic rivalries saw expression in a number of competitions during our meetings. The most memorable must be the tall-story competition, which was won by an explanation of the events leading to one colleague arriving at the airport with minutes to spare before departure.
This winning story involved a highly improbable overnight vigil and an illegal journey down a motorway. Chamonix also witnessed eye-movement researchers engaged in a glove-throwing competition (the best throw won by several hundred metres) and a self-mutilation-while-snow-boarding competition (the winning entry involved a party of French schoolchildren, blood spilt on the snow and, appropriately enough, an accident researcher). While our meetings have been fun, the most enjoyable aspect of the debate has been in helping this volume to completion. The contributors have responded to the original question with enthusiasm. They have developed a set of arguments, often supported with new data, that help us to understand how our eyes move in
characteristic ways. Each of the chapters has been reviewed by at least two other contributors to the volume, and authors were required to respond to the reviewers' comments. The authors and reviewers have kept to a strict schedule. For their readiness in making the volume a coherent series of discussions around a single question, I am pleased to acknowledge their contributions as authors and as reviewers.
Geoffrey Underwood
University of Nottingham
Contents
Preface
Contributors
Chapter 1. Eye Guidance and Visual Information Processing: Reading, Visual Search, Picture Perception and Driving
Geoffrey Underwood and Ralph Radach
Chapter 2. Definition and Computation of Oculomotor Measures in the Study of Cognitive Processes
Albrecht Werner Inhoff and Ralph Radach
Chapter 3. Eye Movements and Measures of Reading Time
Simon P. Liversedge, Kevin B. Paterson and Martin J. Pickering
Chapter 4. Determinants of Fixation Positions in Words During Reading
Ralph Radach and George W. McConkie
Chapter 5. About Regressive Saccades in Reading and Their Relation to Word Identification
Françoise Vitu, George W. McConkie and David Zola
Chapter 6. Word Skipping: Implications for Theories of Eye Movement Control in Reading
Marc Brysbaert and Françoise Vitu
Chapter 7. The Influence of Parafoveal Words on Foveal Inspection Time: Evidence for a Processing Trade-Off
Alan Kennedy
Chapter 8. Parafoveal Pragmatics
Wayne S. Murray
Chapter 9. Foveal Processing Load and Landing Position Effects in Reading
Simon P. Liversedge and Geoffrey Underwood
Chapter 10. Individual Differences in Reading and Eye Movement Control
John Everatt, Mark F. Bradshaw and Paul B. Hibbard
Chapter 11. Eye Movement Control in Reading: An Overview and Model
Keith Rayner, Erik D. Reichle and Alexander Pollatsek
Chapter 12. Eye Movements During Scene Viewing: An Overview
John M. Henderson and Andrew Hollingworth
Chapter 13. Eye Guidance and Visual Search
John M. Findlay and Iain D. Gilchrist
Chapter 14. Prefixational Object Perception in Scenes: Objects Popping Out of Schemas
Peter De Graef
Chapter 15. Functional Division of the Visual Field: Moving Masks and Moving Windows
Paul M.J. van Diepen, Martien Wampers and Géry d'Ydewalle
Chapter 16. Film Perception: The Processing of Film Cuts
Géry d'Ydewalle, Geert Desmet and Johan Van Rensbergen
Chapter 17. Visual Search of Dynamic Scenes: Event Types and the Role of Experience in Viewing Driving Situations
Peter R. Chapman and Geoffrey Underwood
Chapter 18. How Much Do Novice Drivers See? The Effects of Demand on Visual Search Strategies in Novice and Experienced Drivers
David E. Crundall, Geoffrey Underwood and Peter R. Chapman
Chapter 19. The Development of the Eye Movement Strategies of Learner Drivers
Damion C. Dishart and Michael F. Land
Chapter 20. What the Driver's Eye Tells the Car's Brain
Andrew Liu
Author Index
Subject Index
Contributors
Mark F. Bradshaw, Department of Psychology, University of Surrey, Guildford GU2 5XH, England
Marc Brysbaert, Department of Experimental Psychology, University of Ghent, Henri Dunantlaan 2, 9000 Ghent, Belgium
Peter R. Chapman, Department of Psychology, University of Nottingham, Nottingham NG7 2RD, England
David E. Crundall, Department of Psychology, University of Nottingham, Nottingham NG7 2RD, England
Peter De Graef, Laboratory of Experimental Psychology, University of Leuven, Tiensestraat 102, B-3000 Leuven, Belgium
Geert Desmet, Laboratory of Experimental Psychology, University of Leuven, Tiensestraat 102, B-3000 Leuven, Belgium
Damion C. Dishart, School of Biological Sciences, University of Sussex, Brighton BN1 9QG, England
Géry d'Ydewalle, Laboratory of Experimental Psychology, University of Leuven, Tiensestraat 102, B-3000 Leuven, Belgium
John Everatt, Department of Psychology, University of Surrey, Guildford GU2 5XH, England
John M. Findlay, Department of Psychology, University of Durham, Durham DH1 3LE, England
Iain D. Gilchrist, Department of Psychology, University of Durham, Durham DH1 3LE, England
John M. Henderson, Department of Psychology, Michigan State University, East Lansing, MI 48824-1117, USA
Paul B. Hibbard, Department of Psychology, University of Surrey, Guildford GU2 5XH, England
Andrew Hollingworth, Department of Psychology, Michigan State University, East Lansing, MI 48824-1117, USA
Albrecht W. Inhoff, Department of Psychology, State University of New York, Binghamton, NY 13902-6000, USA
Alan Kennedy, Department of Psychology, University of Dundee, Dundee DD1 4HN, Scotland
Michael F. Land, School of Biological Sciences, University of Sussex, Brighton BN1 9QG, England
Andrew Liu, Nissan Cambridge Basic Research, 4 Cambridge Center, Cambridge MA 02142, USA
Simon P. Liversedge, Department of Psychology, University of Durham, Durham DH1 3LE, England
George W. McConkie, Beckman Institute, 405 North Mathews Avenue, University of Illinois at Urbana-Champaign, IL 61801, USA
Wayne S. Murray, Department of Psychology, University of Dundee, Dundee DD1 4HN, Scotland
Kevin B. Paterson, Department of Psychology, University of Nottingham, Nottingham NG7 2RD, England
Martin J. Pickering, Department of Psychology, University of Glasgow, Glasgow G12 8QQ, Scotland
Alexander Pollatsek, Department of Psychology, University of Massachusetts, Amherst, MA 01003, USA
Ralph Radach, Institute of Psychology, Technical University of Aachen, Jaegerstrasse 17, 52064 Aachen, Germany
Keith Rayner, Department of Psychology, University of Massachusetts, Amherst, MA 01003, USA
Erik D. Reichle, Department of Psychology, University of Massachusetts, Amherst, MA 01003, USA
Geoffrey Underwood, Department of Psychology, University of Nottingham, Nottingham NG7 2RD, England
Paul M. J. van Diepen, Laboratory of Experimental Psychology, University of Leuven, Tiensestraat 102, B-3000 Leuven, Belgium
Johan Van Rensbergen, Laboratory of Experimental Psychology, University of Leuven, Tiensestraat 102, B-3000 Leuven, Belgium
Françoise Vitu, Laboratoire de Psychologie Expérimentale, Université René Descartes, 28 rue Serpente, 75006 Paris, France
Martien Wampers, Laboratory of Experimental Psychology, University of Leuven, Tiensestraat 102, B-3000 Leuven, Belgium
David Zola, Department of Educational Psychology, University of Illinois at Urbana-Champaign, IL 61801, USA
CHAPTER 1
Eye Guidance and Visual Information Processing: Reading, Visual Search, Picture Perception and Driving
Geoffrey Underwood, University of Nottingham
and Ralph Radach, Technical University of Aachen
Abstract
What determines the location of the next fixation as we move our eyes around the display in front of us? This question is addressed with reference to the specific activities necessary for reading, for the inspection of two-dimensional pictures and line drawings, and when seeing movement in a three-dimensional display such as a video or in moving around a real world scene in a vehicle. An underlying question here is whether the non-fixated parts of the text or scene can be processed to the extent that the information that is extracted can be used to guide future fixations to useful parts of the display. When we make decisions to skip over part of the display, or when we selectively fixate informative parts or regions, the evidence may be interpreted as suggesting that the movements of our eyes are under cognitive control to an extent that is determined by constraints of task and stimulus configuration. Eye guidance appears as low-level as needed and as cognitive as possible for a given set of circumstances. The evidence and the interpretations of the evidence are the subject of debate, however, and this chapter provides an introduction to this argument.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
Our eyes do not move randomly over a page of text, but move in synchrony with the course of the ideas represented in the sentences that we read. The progressive left-to-right movements are demanded by the way that our writing system has developed, with other orthographies requiring, and receiving, patterns of top-to-bottom or right-to-left fixations. But other than moving our eyes in the same direction as the words as they might be presented to us in spoken language, what else determines where we place our eyes? We do not move simply from word to word, we do not move a fixed distance with every saccadic movement, and we do not move invariably from left to right, so what is constraining selection of the next fixation? The difficulty of processing will influence the frequency with which right-to-left regressive saccadic movements are made, and we tend not to fixate the spaces between words or the very ends of words. These are examples of high-level cognitive processes and low-level visual influences, and many of the discussions in this volume debate the conditions under which one or other influence emerges. What determines whether we make progressive vs. regressive fixations, and if we do make a progressive fixation, then which of the words to the right should we fixate? If we skip the next word, is the decision made on the basis of purely visual information (the word is very short, perhaps) or on the basis of early word identification? If skipping is guided rather than random, then here is a use for parafoveal processing. Only if we can identify some of the features of an unfixated item can the guidance mechanism move our eyes towards more informative words or important objects in a scene, and so part of our task is to establish the extent to which parafoveal vision can deliver useful information.
The contrast between cognitive and visual influences is also apparent in the discussions of scene perception: do we look at meaningful parts of the scene or at areas of high visual conspicuity? And if we look at the semantically important parts of a scene early in the sequence of fixations, the question arises of how we knew that the object was important before we pointed our eyes at it. In both scene perception and in reading the contentious candidate that may be informing us where to move our eyes next is the processing of information from parts of the display that are not being fixated. Parafoveal vision may be able to deliver sufficient information to the mechanism that decides about the location of the next fixation, and many of our discussions are concerned with the question of when we can make use of this information, and when we must rely upon low-level processes and pre-programmed sequences of movements. Demonstrations of parafoveal and peripheral processing are a requirement of any account of selective fixation, and occupy a number of discussions here, but before we can come to this point, in answering our question of what controls selection of the next fixation location, we must first consider a number of choices that must be made in describing eye movements themselves.
Measurement-related and methodological issues
The question of how we decide where to look next is likely to be answered with finely controlled displays and finely executed analyses of spatial and temporal data streams. We can describe the visual processing given to a visual display with any number of measures taken from our eye movement recorders, and so the question requires that we consider with care the measures that are available. This is the starting point in our collection of discussions. There are several methodological decisions to be made in the definition and computation of basic oculomotor events: the selection of an algorithm to detect saccades, the setting of cut-off values for saccade size and fixation duration, and the treatment of saccadic overshoot, blink time and saccade duration. Inhoff and Radach (Chapter 2) discuss a number of measurement-related and methodological issues that so far have received little attention in the literature, although they play a crucial role whenever oculomotor data are collected and analysed. It is evident that these choices can alter measured oculomotor effects. For example, when a higher value for minimal fixation duration is chosen, linguistic processing is given a greater chance to influence the first fixation duration in a word. At present, no optimal solutions for these problems can be offered, but increasing awareness of their presence among researchers in the field may be a first step toward more useful discussion on the topic. Inhoff and Radach further discuss the merits and potential problems of oculomotor measures commonly used to index cognitive processes in reading, including fixation probability, single, first and second fixation duration, gaze duration, total viewing duration and "repair time". Special emphasis is given to the consideration of deviations from the assumption of a direct correspondence between current fixation position and unit of actual cognitive processing.
For example, local fixation patterns indicate that word processing is often distributed over several fixations and spill-over effects suggest that there is a variable time-span between eye and mind. Such phenomena, together with accumulating evidence on inter-relations between "foveal" and "parafoveal" information are likely to play a dominant role not only in theoretical discussions but also in the emerging general debate on methodological problems of eye movement research. The computation of complex reading time measures is one of these important methodological issues, and is addressed in detail in the discussions by Liversedge, Paterson and Pickering (Chapter 3). They start by noting that the usual technique in psycholinguistic eye movement research is to design sentences that cause specific processing difficulties and to compare the induced eye movement patterns with those recorded for control sentences. There are three possible scenarios when readers encounter processing difficulties in these experiments: the eyes can stay within the difficult region until the problem is resolved, they can proceed with a progressive saccade (perhaps with some reading time increase in the next region),
or, alternatively, a regressive saccade could be initiated to allow re-reading of earlier segments of text. The measures usually reported in psycholinguistic studies are first fixation duration, first pass reading time and total reading time for a given critical region. These spatially contiguous measures are likely to handle the first two of the scenarios outlined above. But they may fail to give an adequate account of the spatiotemporal dynamics when one or several regressions occur and a reanalysis is initiated. Therefore, Liversedge et al. propose to combine them with two temporally contiguous measures. Regression path reading time is defined as the sum of the durations of all fixations from the first fixation in a region up to but excluding the first fixation to the right of this region. It provides an index of the time spent detecting a problem and re-reading the text prior to fixating novel linguistic material. Subtracting from this the first pass reading time for the critical region yields the second measure, re-reading time. Liversedge et al. apply these measures to a specific experiment and show that they can indeed capture effects that may be overlooked when using only the established reading time measures. There have already been some studies discussing the merits of complex reading time measures, but this chapter is perhaps the first that does this, not in order to advocate a certain theoretical position, but to make a general methodological point. We hope that in the near future there will be more discussion of this kind, leading to the general acceptance of new useful reading time measures but preventing an inflation of measures. It is important to decide upon the most appropriate measures, but it is just as important that we come to a consensus, otherwise each study may use a different set of measures to the previous study and comparability would then be weakened.
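As an illustration only, the reading time measures described above can be computed from a fixation sequence along the following lines. The sketch assumes that each fixation has already been assigned to a numbered text region (regions numbered from left to right) and is stored as a (region, duration in ms) pair. The function names, the data layout and the toy fixation sequence are assumptions made for this example and are not taken from the chapters discussed. First pass is taken here to be the first unbroken run of fixations in the region, which is one common convention among several.

```python
from typing import List, Optional, Tuple

# A fixation is (region index, duration in ms); regions are numbered left to right.
Fixation = Tuple[int, int]


def _first_entry(fixations: List[Fixation], region: int) -> Optional[int]:
    """Index of the first fixation that lands in the region, or None if skipped."""
    for i, (r, _) in enumerate(fixations):
        if r == region:
            return i
    return None


def first_fixation_duration(fixations: List[Fixation], region: int) -> int:
    """Duration of the first fixation in the region (0 if never fixated)."""
    i = _first_entry(fixations, region)
    return 0 if i is None else fixations[i][1]


def first_pass_reading_time(fixations: List[Fixation], region: int) -> int:
    """Summed durations from first entry until the eyes leave in either direction."""
    i = _first_entry(fixations, region)
    if i is None:
        return 0
    total = 0
    for r, d in fixations[i:]:
        if r != region:
            break
        total += d
    return total


def total_reading_time(fixations: List[Fixation], region: int) -> int:
    """Summed durations of every fixation in the region, whenever it occurs."""
    return sum(d for r, d in fixations if r == region)


def regression_path_reading_time(fixations: List[Fixation], region: int) -> int:
    """Summed durations from first entry up to, but excluding, the first
    fixation to the right of the region (regressions to earlier text included)."""
    i = _first_entry(fixations, region)
    if i is None:
        return 0
    total = 0
    for r, d in fixations[i:]:
        if r > region:
            break
        total += d
    return total


def re_reading_time(fixations: List[Fixation], region: int) -> int:
    """Regression path reading time minus first pass reading time."""
    return (regression_path_reading_time(fixations, region)
            - first_pass_reading_time(fixations, region))


# Invented example: two fixations on region 2, a regression to regions 1 and 0,
# a return to region 2, and finally a move on to region 3.
example = [(0, 200), (1, 250), (2, 220), (2, 180),
           (1, 300), (0, 150), (2, 210), (3, 240)]
print(first_pass_reading_time(example, 2))       # 400
print(total_reading_time(example, 2))            # 610
print(regression_path_reading_time(example, 2))  # 1060
print(re_reading_time(example, 2))               # 660
```

In this invented sequence the spatially contiguous measures (400 ms first pass, 610 ms total) register only time spent inside the critical region, whereas the regression path measure also counts the 450 ms spent re-reading earlier text before the eyes moved on.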
Visual processing and eye guidance in reading
The question of what determines the location of the next fixation is addressed in a number of discussions, and some of the most powerful determinants are discussed by Radach and McConkie in their report of analyses of saccade landing positions based on a corpus of German reading data (Chapter 4). They emphasise that eye movement control in reading is strictly word-based, that is, at any given point in time the eyes try to make a saccade to a specific word-target. Once a target has been selected, the determination of fixation positions can be explained by a set of simple low-level oculomotor principles. The most fundamental of these is what they call the landing position function, a linear relation between launch distance and landing position of an initial saccade into a word. They present evidence showing that this function extends to launch distances of up to -21 characters, thus virtually including all possible visually guided progressive saccades. Radach and McConkie further show that the landing position function continues smoothly when launch sites from within the same word (refixations) are added. Together with the fact that the
"preferred viewing position" of refixations is just right of the word centre (rather than close to the word boundaries) this argues against any special status for refixation landing positions. This does not necessarily mean that a distinction between "inter-word strategies" and "within-word tactics" (O'Regan, 1990) is unjustified, but it does mean that exactly the same eye guidance principles apply in both cases. Another interesting point concerns the landing positions of inter-word regressions. When the landing position function for regressions is plotted, the part for regressive refixations behaves very much like progressive refixations. For every increment in launch distance there is a certain shift in landing position. But surprisingly, for regressive saccades coming back from the next or second next word, landing positions are always clustered at the word centre, with almost no variation due to launch distance or word length. This result implies that regressions are determined by a qualitatively different mode of control, perhaps with more cognitive mediation and certainly with more precision as compared to the control processes for inter- and intra-word progressions and refixations.
Whether to move their eyes to the left or to the right is an important decision for readers. There are costs and benefits associated with each, and Vitu, McConkie and Zola investigate the conditions under which regressions occur in reading by analysing a corpus of reading data for fifth-grade children (Chapter 5). Regressions constitute an interruption in the default left-to-right scanning direction that can be caused by problems at several levels: processing difficulties at the semantic or syntactic level, difficulties in word recognition, or problems at the level of perceptual or oculomotor mechanisms. An example of the latter is inaccurate eye positioning, as when the eyes have "accidentally" skipped a word.
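The landing position function described above for Radach and McConkie's analysis is a linear relation, so it can be summarised by a slope and an intercept. The following is a minimal sketch of such a fit using ordinary least squares. The sign convention (launch distance as a positive number of characters to the left of the word), the function name and all of the numbers are invented for illustration and are not values from the corpus. The second data set mimics the qualitative pattern reported for inter-word regressions, where landing positions barely change with launch distance.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("launch distances must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: launch distance in characters to the left of the word, and
# mean landing position in letters counted from the beginning of the word.
launch = [1, 3, 5, 7, 9]
progressive_landing = [4.0, 3.5, 3.0, 2.5, 2.0]  # shifts with launch distance
regressive_landing = [3.0, 3.0, 3.0, 3.0, 3.0]   # clustered at one position

prog_slope, prog_intercept = fit_line(launch, progressive_landing)
regr_slope, regr_intercept = fit_line(launch, regressive_landing)
print(prog_slope, prog_intercept)
print(regr_slope, regr_intercept)
```

With these made-up numbers the progressive saccades give a clearly non-zero slope while the regressive saccades give a slope of zero, which is the kind of contrast that would point to a different mode of control for regressions.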
Within-word regressive saccades may often result from a tendency to make an additional fixation when the initial fixation on a word is mislocated. Vitu, McConkie and Zola's analyses start with replicating earlier findings that regressive saccades are more likely to occur following longer forward saccades. They then consider the influence of characteristics related to the word passed before the regression, including word length, distance between word and current fixation position, fixation location in the word, word frequency and whether or not the word had been refixated. Most interesting are results relating to regressions to previously skipped words. When saccade length is controlled, skipped words are more often regressed to and regression frequency increases with skipped word length. Instead of being simply a function of the preceding progression amplitude, the likelihood of regressing to a skipped word is systematically related to fixation positions preceding and following skipping. The greater the chance to identify the word before or after skipping, the less likely is a regression. In addition, there is a tendency to regress to less frequent skipped words. The results are less clear for cases where no skipping occurs, which, as Vitu et al. point out, may be due to combined analyses of intra- and inter-word regressions. In summary, there is substantial support for a significant
role of word processing factors in the determination of regressions and a good starting point for more specific investigations of why and how they occur.
In the default case of moving left to right with a progressive movement the reader may choose to fixate the next word to the right, but on a notable number of occasions the decision will be made to not fixate the next word at all. Brysbaert and Vitu discuss the characteristics of saccade target selection and the resulting frequency of fixating vs. skipping words (Chapter 6). For a number of years, word skipping has been one of the controversial topics, because it is one of the focal points at which "low level" and "processing" theories of eye guidance differ greatly in their predictions. From the viewpoint of a strong oculomotor model, there should be only marginal, if any, influences of cognitive factors on word skipping. By contrast, a strong processing (e.g., attention-based) model would predict that word skipping is related to parafoveal word recognition and a subsequent decision to avoid fixating a word that is already identified. The most important determinants of word skipping are saccade launch distance and target word length. This is not surprising, since word skipping frequencies represent the portion of a saccade amplitude (or landing site) distribution that falls on the letters of the target word. However, these effects cannot be taken to indicate that word skipping is determined entirely by low-level processes, as larger and more distant words are also more difficult to process and therefore less likely to be parafoveally identified. In order to disentangle the contributions of low-level and word processing factors on word skipping, a number of experiments with carefully controlled stimulus sentences have been carried out.
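The purely oculomotor reading of these effects can be made concrete with a toy calculation. The sketch below assumes, for illustration only, that saccade amplitudes are normally distributed with a fixed mean and standard deviation that do not depend on the target word. The mean of 7 characters, the standard deviation of 2 characters and the function names are hypothetical and are not estimates from any study. Under these assumptions the probability of fixating a word is the part of the amplitude distribution that falls on its letters, and the probability of skipping it is the part that falls beyond its last letter.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def fixation_probability(launch_distance: float, word_length: float,
                         mean_amplitude: float = 7.0,
                         sd_amplitude: float = 2.0) -> float:
    """Share of the amplitude distribution that falls on the target word.

    launch_distance: characters from the current fixation to the word's start.
    word_length: number of letters in the target word.
    """
    if launch_distance < 0 or word_length <= 0 or sd_amplitude <= 0:
        raise ValueError("distances, lengths and sd must be positive")
    word_end = launch_distance + word_length
    return (normal_cdf(word_end, mean_amplitude, sd_amplitude)
            - normal_cdf(launch_distance, mean_amplitude, sd_amplitude))


def skip_probability(launch_distance: float, word_length: float,
                     mean_amplitude: float = 7.0,
                     sd_amplitude: float = 2.0) -> float:
    """Share of the amplitude distribution that falls beyond the target word."""
    if launch_distance < 0 or word_length <= 0 or sd_amplitude <= 0:
        raise ValueError("distances, lengths and sd must be positive")
    word_end = launch_distance + word_length
    return 1.0 - normal_cdf(word_end, mean_amplitude, sd_amplitude)


# Skipping becomes less likely as the word gets longer or the launch site
# gets further away, with no reference to any linguistic property.
for length in (2, 4, 8):
    print(length, round(skip_probability(1, length), 3))
for distance in (1, 3, 5):
    print(distance, round(skip_probability(distance, 4), 3))
```

Because nothing in this sketch refers to word frequency or predictability, any residual effect of those variables once launch distance and word length are held constant is what the controlled experiments are designed to isolate.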
Brysbaert and Vitu present an elegant meta-analysis of these experiments, including results from seven studies manipulating the easiness of parafoveal words and eight studies manipulating contextual constraints. They establish that there is a significant cognitive influence on word skipping. When assessing the relative contribution of processing load against word length, however, it turns out that only a modest portion of the variance (4% for easiness and 11% for context) can be explained by cognitive factors. If this is the situation for specifically constructed sentence material, one is tempted to speculate how often equally high variation of contextual constraint and equally strong variation of word frequency is encountered when reading 'normal' text. Brysbaert and Vitu explain word skipping by referring to an "educated guessing" process based on the perceptibility of a parafoveal target word as a function of distance. This process provides an estimation of the chance to recognise the word from the current parafoveal position. The decision to skip (or to stay: fixating word-by-word must not be the default) is made on the basis of this quick and crude estimation. In this view word skipping is executed by a low-level process but based on cognitive (i.e., high-level) knowledge. If word skipping is a product of a decision based upon parafoveal processing, then we should be able to demonstrate that words a short distance from fixation can
be identified, at least implicitly. A number of ingenious tasks have been created in the past as attempts to demonstrate effects of parafoveal processing, and in the latest experiment in this tradition Kennedy reports a task that used specific word comparisons (Chapter 7). Two words were presented to the right of a fixated prompt word, and the experiment asked whether the two parafoveal words could mutually facilitate recognition with fixation. The reader first looked at either the word "looks" or the word "means" and was asked to decide either about the visual identity or about the synonymy of the two eccentric target words. The critical experimental manipulation involved properties of the first parafoveal target word, which was shown to influence viewing duration measures while the eyes were still on the prompt word. Interestingly, gaze duration on the foveal word was shorter when the parafoveal target word was long (nine vs. five letters) and of low word frequency. For low word frequency targets, gaze duration was shorter when type trigram frequency was high, that is, when the word shared the same initial letters with many other words. This pattern of effects was largely due to a variation in refixations on the prompt word. However, an analysis restricted to cases where the prompt received only one fixation showed similar effects on fixation duration of parafoveal word length and initial trigram informativeness. Kennedy interprets these results by suggesting local process monitoring over a region larger than the current word. Saccade triggering is assumed to be sensitive to the extraction rate for parafoveal sublexical information. In the case of "difficult" parafoveal items an early saccade can be initiated when information acquisition is insufficiently rapid. The information acquired parafoveally is subsequently traded off when the word is directly fixated.
Apart from its theoretical importance, this study provokes an interesting methodological question: to what extent can the results of simple word comparison tasks be generalised to normal reading? The rationale for choosing the task was to limit oculomotor complexities while still requiring stimulus words to be lexically processed. It allowed for an orthogonal variation of word frequency, initial trigram token and type familiarity that would have been difficult to achieve in a sentence reading task. This task may indeed provide a "microscope" to better look at subtle parafoveal effects that might otherwise be obscured by other, more potent, sources of variance. Once established as a working hypothesis, results should be re-investigated in a more ecologically valid reading situation. Kennedy made an attempt in this direction by presenting results of a corpus analysis that indeed provides some support for the generality of the reported effects. Another, more reading-like comparison task was used by Murray (Chapter 8). He presented two short sentences, one below the other, that could either be identical or differ by just one word. Subjects were asked to decide whether the second sentence was identical to the first, or not. Interestingly, in this task subjects do not use a simple word by word comparison strategy, but apparently build up a representation
of the first sentence which they then compare to the second. There are two very important points about Murray's study. First, he extends the range of factors that can be shown to influence first fixation durations to the level of semantic plausibility. When a noun phrase is followed by a verb, the duration of the initial fixation on the verb is modulated by the plausibility of their semantic relation. This basically means that even the highest levels of processing in text reading can have an immediate effect on the duration of the ongoing fixation. Secondly, he shows that such effects can even operate when the critical information is available only at distances usually referred to as "parafoveal". When looking at the last fixation within the noun phrase, there is a clear effect of semantic plausibility: when the following verb is less plausible, fixation duration increases substantially. This effect is related to fixation position distance: it levels off when the preceding fixation was located more than four letters left of the critical word. On the basis of his data, Murray argues that there is no basis for a functional distinction between information acquired from foveal vs. parafoveal words. Taken together, the studies by Kennedy and by Murray provide intriguing insights into the mechanisms of parafoveal-to-foveal crosstalk in eye movement control and are likely to stimulate more research on this interesting subject. If parafoveal words can be identified during reading, then the question arises of whether this processing is restricted to aiding recognition processes, or whether identification, or part identification, can be used by the processes that determine the landing position upon the previously parafoveal word. The discussions by Liversedge and Underwood (Chapter 9) and by Everatt, Bradshaw and Hibbard (Chapter 10) address the controversial issue of the landing position effect.
They investigate the possibility that landing positions of initial saccades into words may be codetermined by cognitive properties of these words such as lexical informativeness, morphological composition or orthographic saliency. This hypothesis was investigated in a series of experiments carried out initially by Hyona, Niemi and Underwood (1989), by Underwood, Clews and Everatt (1990), and by Everatt and Underwood (1992). Stimuli for these experiments were obtained in pilot studies where subjects were asked to guess words on the basis of their first five or last five letters. The criterion value for words with informative beginnings vs. endings was that at least 89% of the respective guesses were correct. Stimulus words were presented within short passages of text which subjects were required to read. Of particular interest are two experiments, reported by Everatt et al., in which the sentence context prior to the critical word was manipulated. In one study, the general context either primed the informative beginning/ending words or was neutral. In another study, the critical words were preceded by a word with which they were either semantically related or unrelated. In both experiments there was a reliable effect of informativeness on landing position, but semantic manipulations did not mediate the effect. The authors see this as evidence against a role of parafoveal semantic pre-processing in eye
guidance. Since results of other studies also make it unlikely that the landing position effect can be attributed to the morphemic composition of target words, it appears that orthographic properties of the letter sequences themselves give rise to the phenomenon. When comparing the landing position effect for adult dyslexic and non-dyslexic readers, Everatt et al. found that in the dyslexic group the average effect is less pronounced and there is more variability. Explanations for these results are discussed with respect to deficits in parafoveal visual processing or in eye movement control. The determination of the landing position is certainly not under strong control from parafoveal cognitive processing. There are stronger determinants such as launch distance and word length, and the landing position effect is not robust (see, for example, Rayner and Morris, 1992). The effect is seen in some experiments but not in others, and it is important for us to discover the conditions under which it does appear. One possibility is that it appears only when the foveal load is light, and when attentional resources may be allocated to parafoveal processing. To examine this possibility Liversedge and Underwood report two experiments that were intended to test whether a manipulation of foveal processing load would interact with the landing position effect. The rationale was that if foveal processing is more difficult, there should be less opportunity for processing of letter strings in the parafovea, leading to a weakening of the landing position effect. In the first experiment, processing load was manipulated by inserting a category word before the target word that referred to an antecedent noun phrase which was either a typical or atypical instance of the category. 
In the second experiment, the target word was preceded by a possessive pronoun referring to an antecedent noun phrase with a stereotypical gender which was either congruous or incongruous with the gender of the pronoun. Unexpectedly, in both experiments there was no effect of initial trigram on the landing position of the first saccade into the target word. However, when in a subsequent analysis of the second experiment only items with the shortest vs. longest gaze durations in the region before the target word were considered, an interesting pattern of results emerged. When foveal load was low, saccades landed further into words containing infrequent trigrams. The saccade into a word with a distinctive beginning was more extensive if the previous region of the sentence had received a greater amount of processing. A possible explanation for the weak evidence in these studies as compared to the earlier experiments on the landing position effect is the selection of the stimulus material. In most experiments where a cognitive landing position effect occurred, the target words had either a very high guessing probability or very salient beginning letter clusters (but note that in some of these studies the effect was also not replicated). A good example is the study by Hyona (1995), who used loan words whose initial trigrams almost only occur in these words, and are certainly the most extreme variation of orthographic saliency possible in a specific language. On the
contrary, in the two experiments by Liversedge and Underwood, the differences in type initial trigram frequency were rather moderate. This, in turn, appears to indicate that obtaining cognitive landing position effects requires quite severe manipulation of experimental materials and, hence, these effects may reflect an exception rather than the rule in normal reading. On the other hand, the fact that the number of studies reporting cognitive landing position effects is increasing (e.g., Beauvillain, Dore and Baudouin, 1996; Inhoff, Briihl and Schwartz, 1996; Vonk, Radach and van Rijn, 1997) suggests that the effect is real, and that it can be observed only in specific conditions. But this does not necessarily mean that existing models of eye guidance need to be fundamentally changed. For example, when the eyes are thought to be driven by attention shifts, why should it not be possible that attention is occasionally captured by unusual letter clusters, leading to an extra fixation close to a word beginning? More generally, it may be that the cognitive landing position effect operates on the selection component rather than the amplitude computation component of the control system. Looking at the experiments on this topic presented in the current volume, there is one very interesting detail: when describing the results of earlier studies, Everatt et al. note that there was an effect of word beginning informativeness on the initial landing position, but that this was not caused by a difference in saccade length. If landing positions differ by more than one character and saccade lengths are identical, the question arises: why were launch positions of the initial saccades different by more than one character? Since the configuration of preceding words in the experimental sentences was identical, something must have happened while the eyes were still on the prior word. The obvious candidate for this is that in some cases the salient vs. 
less salient parafoveal word beginning caused an additional fixation (refixation) in word n-1. This effect may be quite similar to what is called "refixations on the prompt" in Kennedy's chapter, providing a further example of parafoveal-to-foveal crosstalk. Evidence for such an interpretation can also be derived from Hyona (1995) who reported that unusual letter clusters "attracted" fixations particularly to the space prior to the word. An alternative account for cognitive landing position effects would be to claim that they operate in a more graded fashion on the computation of saccade amplitude. (See Radach and McConkie's discussion in Chapter 4 for a more detailed description of discrete vs. graded effects.) This is not in harmony with most current eye guidance models, but it is not completely implausible, given the fact that some amplitude modulation is possible during late stages of saccade preparation, even when the decision to execute a saccade has already been taken (Becker, 1989). The overview of eye guidance during reading by Rayner, Reichle and Pollatsek (Chapter 11) starts with a discussion of evidence on the spatial and temporal components of eye guidance in reading. One interesting point they discuss is the controversy around the notion of an "optimal viewing position". They agree with
the idea that saccades are in most cases intended to go to the centre of words, but raise doubts about whether the actual landing position has serious consequences for word recognition and eye movement control. In general, Rayner et al. see "where decisions" as co-determined by low-level and cognitive mechanisms and "when decisions" as primarily determined by linguistic processing, though with some low-level modulation. After briefly reviewing recent modelling attempts by other authors, Rayner et al. present their new E-Z Reader model. What makes this model distinct from its predecessors is that it has been mathematically implemented and can be used as a simulation environment. From a theoretical perspective, the most important new feature is a decoupling of attention and saccade programming. There are now two distinct stages in the process of word recognition: a familiarity check, which initiates the programming of saccades, and the completion of lexical access, which causes attention to shift to the next word. The speed of these processes is thought to be a function of word frequency, with modulation from both retinal eccentricity and contextual predictability of the target word. Saccade programming in the new model is divided into an early, labile stage, where saccade cancellation is possible (leading to word skipping), and a later, nonlabile stage that does not allow for cancellation (accounting for very short fixation durations between two successive saccades). This construction enables E-Z Reader to overcome one major limitation of the "classic" Morrison Model (Morrison, 1984; Rayner and Pollatsek, 1989): the handling of refixations. In the earlier model, there was no specific mechanism for dealing with multiple fixations on a word. 
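The word-skipping logic implied by the two recognition stages and the labile stage of saccade programming can be illustrated with a minimal sketch. This is not the implementation of Rayner et al.; the class name, the function name and all numeric values are hypothetical, and time is simply counted from the moment the familiarity check on the fixated word completes.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    # All durations are in milliseconds, counted from the end of the
    # familiarity check on the fixated word n (hypothetical values only).
    access_lag_ms: float        # until lexical access on word n completes
    familiarity_next_ms: float  # familiarity check on word n+1 (parafoveal)
    labile_ms: float            # cancellable stage of the saccade program


def next_word_is_skipped(t: Timing) -> bool:
    """Return True if the saccade program aimed at word n+1 can be cancelled.

    Attention shifts to word n+1 only when lexical access on word n is
    complete; the familiarity check on n+1 then has to finish while the
    saccade program is still in its labile stage.
    """
    familiarity_next_done = t.access_lag_ms + t.familiarity_next_ms
    return familiarity_next_done < t.labile_ms


# An easily recognised next word versus a harder one (illustrative values).
fast = Timing(access_lag_ms=50.0, familiarity_next_ms=60.0, labile_ms=150.0)
slow = Timing(access_lag_ms=50.0, familiarity_next_ms=150.0, labile_ms=150.0)
```

With these assumed values the first configuration leads to a skip and the second does not, which is the qualitative pattern the labile/nonlabile distinction is meant to capture.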
The notion of a saccade programming deadline as proposed by Henderson and Ferreira (1990; see also the discussion by Henderson and Hollingworth in Chapter 12) turned out to be not in harmony with some aspects of the available data including fixation duration patterns (first vs. second of two) and fixation duration distribution data (e.g. Kennison and Clifton, 1995). In the new model, the labile saccade program will initiate refixations on the word as long as the lexical access stage in the word recognition module is not completed, and then provide a signal to go to the next word. As Rayner et al. point out, this mechanism is somewhat similar to what O'Regan (1990, 1992) has suggested to be the base for refixations. The chapter ends with a number of simulations run on a corpus of reading data which show that E-Z Reader is quite successful in simulating fixation duration, fixation probability and frequency of refixations while capturing specific effects related to word frequency, processing spillover and parafoveal preview benefit. The E-Z Reader model as described by Rayner et al. sets a new standard against which alternative ideas will have to compete. Its appeal is not only related to the fact that it can handle a large range of oculomotor phenomena with a limited set of provisions (although in the recent version the number of provisions has increased). On a more general level the popularity of attention-based eye guidance models may be related to the fact that they are situated in the "middle ground" between more extreme looking proposals at the oculomotor or cognitive end of the theoretical
spectrum. In the following section we will briefly look at some theoretical issues that have received prominent attention in the reading-related chapters of this volume. Since virtually all authors present their own ideas in relation to the "standard model", this discussion can be seen not only as a summary of more or less common lines of reasoning, but to a certain extent also as a collection of alternative views and potential challenges to the Rayner et al. model.

Is word recognition the engine that drives every saccade in reading?

The findings by Radach and McConkie and by Vitu, McConkie and Zola that inter-word regression landing sites cluster at the word centre indicate that regressive saccades can be much more accurate than they usually are during progressive, left-to-right reading. What makes regressions distinct from other types of reading saccades is that in most cases the saccade is initiated several hundred milliseconds after the target word first appeared in parafoveal vision. At this point in time there is much more information available about the word (i.e., with respect to its spatial co-ordinates within the line of text) as compared to normal progressions, and, hence, the usual saccadic range error can be avoided. When a normal progressive saccade is prepared, the necessary parafoveal low-level (low spatial frequency) information can be obtained already during the first 50 ms of the current fixation (Pollatsek and Rayner, 1982). At the other end of the fixation, minimal saccade reprogramming time can be estimated to be in the order of 80 ms (Becker, 1989). Within these constraints, the time and resources effectively available during a normal fixation to determine the next saccade amplitude seem to be insufficient to achieve optimal accuracy within the target word. It is an open question whether similar considerations can also be applied to saccade target selection.
Brysbaert and Vitu consider the timing of events during a fixation for the case of word skipping. In a simple additive model, they add 60 ms transfer time from the eyes to the brain, 100 ms saccade programming time and 90 ms extra processing time increase due to a (minimal) eccentricity of three letters, arriving at a sum of 250 ms. This estimate leaves virtually no time for operations like the completion of lexical access on the current word, the execution of a familiarity check on the next word, and, on the basis of the result, the cancellation of the labile saccade program to this word, which are all necessary to skip a word in the Rayner et al. model. This conclusion would change if some of the operations were executed in parallel, of course. On the other hand, as Brysbaert and Vitu's meta-analysis shows, lexical and contextual information does mediate at least some saccade target selection decisions. Also, in comparison to the classic attention-based model (e.g., Rayner and Pollatsek, 1989), the timing concerns raised above apply to a somewhat lesser extent to the E-Z reader model, because in the new familiarity check stage less cognitive processing is demanded to trigger saccade programming.
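The additive estimate discussed above can be written out explicitly. The following sketch sums the three components quoted from Brysbaert and Vitu and reports how much time would remain within a fixation of a given duration if everything ran strictly serially. The dictionary keys and the function name are illustrative assumptions, as is the choice of fixation duration passed to the function.

```python
# Components of the simple additive model quoted in the text (milliseconds).
COMPONENTS_MS = {
    "eye_to_brain_transfer": 60,
    "saccade_programming": 100,
    "eccentricity_cost_three_letters": 90,
}


def remaining_time_ms(fixation_ms, components=COMPONENTS_MS):
    """Time left for lexical operations if all components run serially."""
    return fixation_ms - sum(components.values())


# Sum of the three components, as in the text.
total_ms = sum(COMPONENTS_MS.values())
```

Under strictly serial processing a fixation of 250 ms would leave no time at all for lexical access, the familiarity check and the cancellation of the labile saccade program, which is why the conclusion changes as soon as some operations are allowed to overlap.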
It is interesting to observe that while Rayner et al. take a more moderate position on the role of cognition in determining saccades, researchers who emphasise low-level factors try to make their scanning routines smarter. For example, the alternative mechanism for word skipping proposed by Brysbaert and Vitu involves an estimation of the chances of recognising the next word within the current fixation. The decision is not based on immediate lexical processing, but guided by long-term knowledge about parafoveal word recognition that has been acquired individually by a certain reader. Another use of cognition to aid a low-level mechanism has been proposed by Radach (1996), who suggested that readers may obtain a limited repertoire of parafoveal word length configurations that statistically correspond to psycholinguistic units and may be used to mediate eye placement decisions.

Does attention move in discrete, sequential steps from word to word?

One of the cornerstones of current thinking about eye movement control in reading is the notion of the perceptual span, denoting the region from which useful information is acquired during a certain fixation. More specifically, Rayner and Pollatsek (1987) define a letter identification span encompassing a region from 3-4 letters to the left to 6-10 letters to the right of the current fixation position (sometimes more) and a word identification span including the current word, and possibly the next or the next two words. The left boundary of the perceptual span is thought to be tied to the left boundary of the current word. All variants of attention-based eye movement control models claim that within the perceptual span attention moves in discrete steps from word to word in a left-to-right direction. This has been called the sequential attention hypothesis (e.g., Henderson and Ferreira, 1990). 
Although in the new E-Z Reader model attention and saccade programming are decoupled, the serial attention shift claim is maintained, as the familiarity check on the next word is only initiated after lexical access on the current word has been completed. Although Rayner et al. provide convincing evidence to support this claim, there is also evidence accumulating that is not in harmony with their model. In the following we will discuss some problems that have been raised in the reading-related chapters of this volume. The first problem is related to the possibility that the region from which letter-level information is extracted can be extended over the left word boundary. Vitu, McConkie and Zola report that after a saccade, when the eyes land at positions just following a skipped word, the probability of making a regression back to this word is very low. Obviously, when input from the skipped word is needed and this word is not far away from the current fixation position, it is often preferred to get this input via a parafoveal "back-view" rather than by programming and executing a regression. Converging evidence comes from Inhoff and Radach's discussion which also suggests that readers may sometimes obtain information (including lexical information) from locations left of the current word beginning. In principle, the current attention-based models could accommodate these findings by postulating something like an "attentional regression" (see Duffy, 1992, for a discussion of "mental regressions") but this would lead to complications. For example, shifting attention back to a previous word would mean interrupting or delaying processing of the currently fixated word and should lead to substantial costs in processing time. However, increased fixation durations after word skipping have so far never been reported. This leads to a more general problem, which concerns the possibility that linguistic information from neighbouring words is processed in parallel with information from the currently fixated word. As Kennedy points out in Chapter 7, if attention is allocated sequentially, properties of a parafoveal word will not influence foveal processing time, because an attention shift only occurs when foveal processing is complete. Contrary to this view, he demonstrates that sublexical properties of a parafoveal word can influence fixation durations and refixation rates on the foveal word. Similar effects are shown by Murray on the semantic level. At present there is only a small number of studies that directly contradict the sequential attention hypothesis, but the number of studies directly supporting the hypothesis is also limited. If more evidence on parafovea-to-fovea crosstalk accumulates in future research this would seriously challenge the sequential attention hypothesis, and, more generally, any qualitative difference in information extraction from within and beyond a word boundary. An alternative model that could account for parafovea-to-fovea crosstalk might include an attention gradient, where letter information is acquired in parallel from all letters within the letter identification span, with the rate of extraction being a function of letter eccentricity. 
The possibility of such an attention gradient has been discussed, among others, by Inhoff, Pollatsek, Posner and Rayner (1989) and by Kennison and Clifton (1995). Within such an alternative framework, saccade processing and the extraction of text information would be less tightly coupled than in the current attention-based processing models.

Eye guidance in picture viewing and visual search

The development of attention-based processing models has been a major step forward towards an understanding of eye movement control in reading and has also influenced theoretical discussions in neighbouring domains such as scene perception. We are not yet in a position to describe such a complete model of fixation patterns on scenes, partly because we have not yet found an adequate way of describing the pictorial information, but the remaining discussions in this volume go some way to describing what it is that such a model will have to explain.

What controls the locations of our fixations when viewing scenes? Henderson and Hollingworth ask whether we select components of a scene on the basis of
purely visual characteristics, or whether eye guidance can be influenced by the semantic characteristics of an object (Chapter 12). This is one of the recurring questions asked in the discussions of reading elsewhere in this volume, of course. As Henderson and Hollingworth point out, most of our knowledge of eye-movement control has been derived from studies of reading lines of text, but a complete theory must take into account the processes that are observed in scene perception. We would go further than this, and argue that the complete theory will also describe the placement of fixations when the observer is watching dynamic scenes such as in cinema or video film, and when the observer is moving within the environment that is being inspected. Only then will the theory be ecologically valid, and there are discussions of these extensions to the scope of our question in Chapter 16 by d'Ydewalle, Desmet and Van Rensbergen, Chapter 17 by Chapman and Underwood, Chapter 18 by Crundall, Underwood and Chapman, Chapter 19 by Dishart and Land, and Chapter 20 by Liu. The relatively sparse literature suggests that informative areas receive more fixations than others. This result is consistent with the idea that we allocate visual attention, through a greater number of fixations and through longer fixations, when we encounter material that requires cognitive processing. The principle holds for reading and also for scene inspection. But the interesting question is how we know that some areas of a scene are informative and that others are not worth a fixation. The early literature suggests that uninformative areas are often not fixated at all, suggesting that the guidance decision is influenced by the processing of extrafoveal information¹. The debate about the significance of word skipping that is addressed specifically in the discussions by Brysbaert and Vitu and by Vitu, McConkie and Zola, re-emerges here as a question about object-skipping. 
¹ The distinction between parafoveal and extrafoveal is generally made in the discussions of scene perception and text inspection here because the non-fixated objects in scene perception studies are usually located beyond the parafovea. The "extrafovea" does, of course, include the parafovea.

A secondary similarity appears in Henderson and Hollingworth's distinction between visual and semantic informativeness. Do observers avoid fixating certain areas of a picture because they have detected few discontinuities of texture, for example, or because the object in an area is uninformative? If we are guided by object informativeness then we can conclude that fixation patterns are determined by cognitive operations as well as more restricted perceptual processes. This question has been approached by using scenes in which objects are congruous, and therefore have lower semantic informativeness, or incongruous. For example, a microscope on the counter in a hotel bar would be more informative than a tumbler with a straw in the same location. When first shown a picture, do observers fixate an informative object earlier than they do a more predictable object? There are inconsistencies in the literature
reviewed by Henderson and Hollingworth, but the balance of replies to the question is against the idea that the semantic properties of objects are able to attract early fixations. What then does influence fixation locations during scene inspection? The factors that are considered to be potent here are image size, viewing task, display duration, and image content and type. Henderson and Hollingworth incorporate these influences in their view of eye-movement control during scene inspection. The "saliency map framework" suggests that attention is always allocated to the region with the greatest saliency value and these regions are in turn fixated. The framework predicts which factors will be influential during the early stages of inspection, and describes how saliency will change as information about the scene is collected, such that the early stages of inspection will be driven by predominantly visual features, and later stages by predominantly semantic features. We are not only making progress in the investigation of scene inspection, but we are also seeing the development of theoretical frameworks that will help focus future investigations. Findlay and Gilchrist also address the question of the relationship between the direction of our eyes and the direction of attention, asking whether we can redirect attention without moving our eyes (Chapter 13). There is well established evidence that this dissociation is possible, but these demonstrations may be possible only in highly constrained and highly artificial laboratory situations. Distinctions between serial and parallel processes in visual search have relied upon the assumption that observers can search an array of items with rapid attentional shifts that are covert in that they are made in the absence of eye movements. Findlay and Gilchrist challenge this assumption, and explore the characteristics of the overt and covert attentional processes. 
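Returning to the saliency map framework of Henderson and Hollingworth, its central rule (fixate the region with the greatest current saliency, with visual features dominating early and semantic features later) can be sketched as follows. The weighting function, the region names and all numeric values are hypothetical; the framework as summarised here specifies only the qualitative shift from visual to semantic control.

```python
def semantic_weight(n_fixations, rate=0.5):
    """Weight on semantic saliency: 0 at first glance, approaching 1 as
    fixations accumulate (an assumed, purely illustrative growth function)."""
    return 1.0 - (1.0 - rate) ** n_fixations


def next_fixation(regions, n_fixations):
    """Pick the region with the greatest combined saliency.

    regions maps a region name to a (visual, semantic) saliency pair,
    each on an arbitrary 0-1 scale.
    """
    w = semantic_weight(n_fixations)

    def saliency(name):
        visual, semantic = regions[name]
        return (1.0 - w) * visual + w * semantic

    return max(regions, key=saliency)


# A visually conspicuous but predictable region versus a visually modest
# but semantically informative object (illustrative values).
scene = {"window": (0.9, 0.2), "microscope": (0.4, 0.9)}
first_target = next_fixation(scene, n_fixations=0)
later_target = next_fixation(scene, n_fixations=3)
```

With these assumed values the first saccade goes to the visually conspicuous region, and the semantically informative object wins only after a few fixations, in line with the balance of evidence against early semantic attraction.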
Findlay and Gilchrist's question also carries a methodological consideration, because if attention can disconnect from the point of regard from which we inspect scenes, then an analysis of fixations will give us a misleading account of search patterns. By inspecting the first saccade in a visual search task, Findlay and Gilchrist focused on the processes that result in selection of the location of the first fixation in the display. Is there evidence of a covert movement of attention to the target prior to the eye movement? There was no difference in the saccadic latency whether the target stimulus was defined on the basis of a simple feature (object shape) or on the basis of a conjunction of features (shape and colour together). Attention is thought to be necessary for the identification of targets composed of conjunctions of features, but not for simple targets, and so the absence of difference in saccadic latency suggests that the attentional processes necessary for the conjunction search are not operating prior to the first saccadic movement. With one of their tasks, however, Findlay and Gilchrist were able to show a use for covert attentional shifts followed by delayed saccades, though only after training over the course of several hours. In general, they conclude that we do not move attention and eyes at different times, in contrast with the assumptions of the Henderson and Hollingworth account of scene perception in which attention is released once the object being fixated is processed. At this point
attention is attracted by the area with the greatest saliency, and the eyes programmed to move to that point in space. The relationship between attention and the point of fixation remains a matter of some debate. Part of the argument used by Findlay and Gilchrist to distinguish between processes that require attention and those that do not is based upon the so-called "pop-out" of objects in displays when processing does not require attention. If the target in a visual search task is a solitary circle in a display otherwise consisting of squares and triangles, then the search time will be low and the target will stand out from its background. Also, the number of irrelevant items does not influence the time taken to find the target. It appears that this kind of target is detected without the need of attentional resources. In contrast, a target defined by a conjunction of features does not pop-out, and has a search time that is a function of the number of irrelevant items, and it is concluded that identification requires attention. Henderson and Hollingworth examine some of the evidence concerning the pop-out of objects in line drawings and other scenes, and De Graef's discussion focuses on this phenomenon in an examination of the capture of attention by objects that are schematically inconsistent with their contextual scene. This takes up Henderson and Hollingworth's discussion of the use of extrafoveal information in guiding attention and the observer's eyes to objects that form part of the scene. At what point in the processing of a scene can semantic processes start to have an influence on the eye guidance mechanism? Does an object need to be fixated before its meaning is appreciated, or is there a pop-out effect for objects that are incongruous with their surroundings? De Graef points out that incongruous objects variously present processing enhancement and processing difficulty. 
In one task these objects illustrate a pop-out, and in another they are the source of processing delays. In reading tasks, for example, a semantically anomalous word will cause a break in the otherwise smooth flow of ideas through the reader's mind. When should we expect facilitation from perceptual pop-out and when should we expect inhibition from schema-driven perception? De Graef accepts the view that the two effects are not logically inconsistent, and uses a mismatch principle (Johnston and Hawley, 1994) by which items that occur together repeatedly become unitized. Recognition of any component item that matches expectations will then be facilitated, and processing of any item that does not match expectations will also be facilitated and will result in rejection, or pop-out, from the unitized representation. To answer the question of whether implausible objects gain an attentional advantage that may result in early fixation, De Graef uses an ingenious display in which target objects wiggle (Chapter 14). An object that wiggles (moves up and down rapidly) may elicit attentional capture, and when the wiggling object is a target in a search task it is then possible to set up questions about the relationships between attention and the processing of implausible objects. Implausible objects were fixated earlier and more frequently than their plausible counterparts, and this
evidence must be added to the review by Henderson and Hollingworth in that it goes against the current flow which suggests that there are no fixation advantages to be gained by semantics. This argument about the potency of extrafoveal items to attract fixations has many of the characteristics of the debate elsewhere in this volume, over the matter of whether properties of a parafoveal (target) word can have a significant influence on viewing time and fixation position measures on a foveated word. Low target plausibility reduced fixations on the priming stimulus in De Graef's experiments, consistent with the idea that implausible targets present a processing difficulty while they remain in the extrafovea, and it is this difficulty that attracts attention and fixation. An interesting conclusion from the wiggle experiments comes from a manipulation of the distance between the priming object (location of the first fixation on the scene) and the target (wiggled) object. In this study implausible targets were fixated more than plausible targets, as before, and near targets (3° distance) were fixated more than far targets (8°). It is the need for foveal analysis that determines whether an object will be fixated, and this need is greater for implausible objects and for objects that are too far into the periphery for analysis. De Graef concludes that we can consider these effects as demonstrating that plausible objects, which are by definition most of the objects that we encounter, enjoy a larger useful field of view. It may be that the context of the scene reduces the possible candidates and thereby increases the importance of features that can be detected extrafoveally, an account that can be adopted quite readily as an explanation for context effects in word skipping during reading. 
The longer our eyes dwell upon one object or word, not only the greater is the information extracted from the point of regard, but also the greater is the amount of information that can be collected about extrafoveal objects. The discussion by van Diepen, Wampers and d'Ydewalle also focuses on the necessity of foveal inspection (Chapter 15). The technique used in the work reviewed in their discussion uses a display that changes according to the current location of the observer's eyes, a contingent display. The parts of the display around the point of fixation are degraded in some of the studies, to simulate the effect of a scotoma, and the area of degradation can be varied to create a blind-spot that can vary in size. The "mirror image" of this procedure gives the observer a moving window with extrafoveal information degraded. A technical appendix to their discussion explains how these displays are engineered. Contingent degradation at the point of fixation allows a direct approach to the question of how much information can be extracted from extrafoveal vision, for the simple reason that the participant observer in these experiments, in effect, has no foveal information available. However, it should be kept in mind that this technique provides an estimate of maximal rather than normal extrafoveal information acquisition, as under normal conditions foveal and extrafoveal visual processing will always have to compete for shared resources.
Studies of the recognition of simple line drawings in these artificial scotoma experiments suggest that foveation is not a requirement for identification, although gaze durations did increase. This increase suggests that we can use extrafoveal vision to determine the identity of objects, but that this takes extra time. Interestingly, individual fixation durations do not vary, suggesting that it is difficult to fixate in one (degraded) location while attending to an object in another (clear) location. By varying the onset of the degradation relative to onset of the fixation, such that the observer receives a clear image of fixated material for a brief interval before it is obscured, it is possible to establish the time-course of extraction of information from foveated images. This procedure establishes that observers can extract the information they need within the first few centiseconds of fixation. The remaining fixation time may be used for task processing, saccadic programming, or perhaps extrafoveal processing. Evidence of the early identification of foveated objects during long fixations, combined with evidence that suggests that extrafoveal information can serve object identification if the inspection is sufficiently prolonged, would result in the prediction that extrafoveal identification will be most readily observed in cases of prolonged fixation. This is the pattern of results reported in the discussion by Liversedge and Underwood, for the case of parafoveal effects during reading. Consistent with the conclusions of Findlay and Gilchrist, the studies reviewed by van Diepen, Wampers and d'Ydewalle suggest that observers can divide fixation and attention when the necessity arises, but that this does not come easily. Presumably if we made this dissociation regularly when reading and when viewing scenes it would be well-practised and thereby readily observable as a skill in laboratory experiments. 
Saccades to the attended object are unavoidable, in support of the framework for fixation guidance outlined by Henderson and Hollingworth, in which our eyes follow shifts in the location of attention.
Eye guidance while watching dynamic scenes
Our working assumption is that recognition of part of a scene benefits from foveation but is not impossible without it. Extrafoveal information can be used, if foveal information is unavailable, and we have demonstrations of object recognition without fixation. The extent of extrafoveal processing becomes increasingly significant, of course, when inspecting scenes containing moving objects. The efficient detection of movement in the extrafovea was a property of the human visual system exploited by De Graef by having objects of interest move up and down while other parts of the scene were being inspected. These wiggling objects were very successful in attracting attention. When whole objects are moving within a scene together, and when the scene itself is apparently moving, by virtue of the
relative movement of the observer, then we can expect extrafoveal processing to become more important. Different types of apparent movement result in different perceptions, and d'Ydewalle, Desmet and Van Rensbergen demonstrate the importance of taking account of the type of motion for those of us who are concerned with the recognition of moving objects in dynamic scenes. The approach taken involves the presentation of moving scenes in cinematic ("movie") format, in which different types of editing cuts are made. What do the observer's eye movements tell us about the perception of change in these dynamic real-world scenes? Three types of editing cut were investigated. At the first level there were small changes in the image portrayed, implemented by changing the camera position for example, and these changes have little effect upon fixation patterns. In contrast, at the second level the position of the camera is changed to give the view from a different direction plus a changed background, and these changes prompt eye movements. The measure of movement here is an increase in the variance in horizontal fixation locations, peaking a few hundred milliseconds after the change. This is an unconventional measure of fixation behaviour that Chapman, Crundall and Underwood also found useful in their attempts to understand the perception of dynamic scenes in their discussions. A number of the discussions in this collection illustrate not only the huge range of measures that are available once we start collecting an observer's eye movements, but also the different implications associated with changes in these measures. D'Ydewalle et al. concluded that this influence on the variance of fixation locations was an indication of post-perceptual, cognitive effects, and the third type of editing cut, involving a change to the action sequence of the narrative, also produced increased variance.
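The variance measure described above can be illustrated with a minimal sketch. It assumes that fixations from several viewers have been pooled and expressed as pairs of onset time relative to the editing cut (in ms) and horizontal screen position; the function name, the bin width and the sample values are hypothetical and are not taken from the studies discussed here.

```python
from statistics import pvariance


def horizontal_variance_by_bin(fixations, bin_ms=100, n_bins=10):
    """Variance of horizontal fixation position in successive time bins.

    fixations: iterable of (onset_ms_after_cut, x_position) pairs,
               pooled across viewers (an assumed data layout).
    Returns one value per bin; None where a bin holds fewer than
    two fixations and a variance is not meaningful.
    """
    bins = [[] for _ in range(n_bins)]
    for onset_ms, x in fixations:
        index = int(onset_ms // bin_ms)
        if 0 <= index < n_bins:  # ignore fixations outside the window
            bins[index].append(x)
    return [pvariance(xs) if len(xs) >= 2 else None for xs in bins]


# Hypothetical fixations: tightly clustered just after the cut,
# then spreading out horizontally some 200-400 ms later.
example = [(20, 400), (60, 410), (250, 200), (280, 600), (310, 150), (390, 650)]
profile = horizontal_variance_by_bin(example, bin_ms=100, n_bins=4)
```

A rise in the later bins of such a profile is the kind of pattern that would be read as increased horizontal search following a cut. The choice of bin width is a free parameter that would need to be justified for any real data set.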
The observers in the study reported by Chapman and Underwood also watched moving images (digitised video clips, in this case) of real-world scenes, but in this study there were no artificial changes to the camera position to disturb the understanding of the narrative. The changes of interest involved potentially dangerous events. The video clips were recorded from a moving vehicle as it moved along various types of roads, and the events of interest involved the actions of other road users, such as a cyclist who rides out in front of the vehicle from a side road, and a parked vehicle ahead that is obscuring a pedestrian who is about to cross the road. For the actual driver of the vehicle, these are hazardous events that would require an evasive act. When drivers watch these video clips in the laboratory, their fixation behaviour indicates the extent to which the event captures their attention, and thereby indicates differences between drivers of varying ability and between events of varying saliency. When a dangerous event unfolds, the object that is the source of the danger collects extended fixations suggesting that an object to be monitored requires greater attention — the observers are monitoring these objects to determine whether the potential for danger inherent in a pedestrian moving from behind a
parked car does in fact become an actual hazard that would require an action. There were also differences due to driving experience consistent with this interpretation, with novice drivers having longer fixation durations than experienced drivers. For the experienced driver, as for the experienced reader encountering a story for the first time, understanding of the unfolding events is facilitated through having previously encountered other configurations of the components of the scene or the sentence. There may be a further relationship between reading and driving here, in that when we move through a sentence there are limited options for the next word. Syntax and semantics restrict the range of words that can occur at any point, and these constraints are acquired by readers and used to predict what can happen next. The same may be said of driving, and if we stop the video at any point, experienced drivers will be able to give the range of events that could happen. Perhaps the reduced fixation durations of both experienced drivers and experienced readers reflect the ease of recognition of the objects that are presented in the sequence of road or sentence events, and perhaps in both cases contextual facilitation can influence recognition. A difficulty with the simple account of Chapman and Underwood's results is that fixation behaviour varied according to the type of road being presented on the video clips. Urban roads collected shorter fixations than rural roads, and perhaps this is a reflection of the overall difficulty of processing, but a more difficult sentence attracts longer gazes, not shorter ones. This is where the analogy between watching driving videos and reading collapses, of course, because reading is usually a self-paced task whereas the video is always shown at the same pace.
When a fixed-speed video contains many potentially dangerous events (e.g., a crowded street scene) rather than very few objects (e.g., a rural lane) then the observer must scan from object to object searching in case one of them becomes the hazard to be avoided. When the observers are not watching driving videos, but driving themselves, we return them to a self-paced task in which an increase in perceived difficulty can be accommodated by a reduction in speed. The final three discussions present studies of fixation behaviour in drivers of varying experience and in varying environmental situations. Chapman and Underwood found that, at least for certain types of events with an abrupt onset (e.g., a pedestrian steps out from behind a parked car), there is evidence of attentional capture. Measures of fixation location and of the variance of fixation location indicate that at these moments most observers were fixating the same point on the screen. This is the equivalent of one of De Graef's line-drawings wiggling and then popping-out of the surrounding display — it cannot be avoided, and it is difficult to look anywhere else on the screen when one of these dangerous events is unfolding. Crundall, Underwood and Chapman also consider variations in the attentional field induced by roadway events and by prior experience, in studies in which drivers
control a vehicle through a set route involving a range of roads, and in which their ability to detect extrafoveal targets is assessed in a laboratory task. Different roadways present different difficulties, and novice drivers respond to them in different ways. The variance of horizontal search proved to be a successful indicator here, as it did in the study reported by d'Ydewalle, Desmet and Van Rensbergen. In the driving study, novices showed little variation in their fixation behaviour regardless of the difficulty of the road being driven, whereas experienced drivers were sensitive to task demand. Specifically, experienced drivers had a huge increase in the variance of their horizontal fixation locations on a multi-lane road on which traffic merging was frequent. These drivers were looking around them to check on the locations of adjacent vehicles. The novices tended to look straight ahead, with longer fixation durations, possibly in response to the increased perceptual load of a crowded urban motorway on which many vehicles were changing their traffic lanes. Drivers were later presented with a laboratory task to investigate their responses to varying perceptual loads, using video clips similar to those described in Chapman and Underwood's discussion in combination with a peripheral target detection task. Low perceptual demand was associated with saccades of shorter latency in the direction of targets, and reduced fixation durations on the targets, but target detection was uniformly influenced by perceptual load for near targets (<6°) and for far targets (>6°), suggesting that perceptual load has a general effect of interference rather than any attentional narrowing that we could describe as tunnel vision. 
Comparisons between the fixation behaviour of novice and experienced drivers are also made in the discussion by Dishart and Land, but the novices here are totally inexperienced in that they are in the early stages of receiving tuition prior to attempting their driving tests. It is therefore difficult to make direct comparisons between this discussion and the previous studies, and the novices here must be regarded at a very different stage in the acquisition of skill relative to the recently qualified drivers described in the previous discussions. The question asked by Dishart and Land is how fixation patterns reflect the earliest stages of learning to drive, when controlling the vehicle is still a problem. On the basis of a study reported by Land and Lee (1994) we have the suggestion that fixation of the tangent point of a curve is important for steering control. The tangent point is the area of a curve where the road appears to change its direction, and constant fixation will provide the driver with information about the extent to which the steering wheel should be turned. There are other features of the roadway to inspect however, as in Chapman and Underwood's video clips portraying dangerous events, and this is why the Land and Lee suggestion deserves further consideration. Their study, involving three experienced drivers, used a road with no on-coming traffic and very few side roads. There were few obstacles and not only was there little need to look ahead to check that the road was clear, but the road was embanked and therefore provided little opportunity for the driver to look ahead. Regular
fixation of the tangent point in the Land and Lee study may have been a consequence of the absence of alternatives. Dishart and Land extend the investigation of tangent points in their discussion to look for changes in the early stages of driving, with evidence, at least from one driver, of increased use of the tangent point after the first four hours of driving tuition. This suggests a change from using the immediately adjacent road kerbside for lane maintenance, to use of the more distant tangent point for anticipatory steering control. This may be a stage-specific change in behaviour, however, and restricted to the case of drivers in the initial stages of learning to control the vehicle in order to keep it on the road. When ten experienced drivers are compared with ten novices who have recently passed a driving test, it emerges that with some curves it is the experienced drivers who tend to not look at the tangent point (Underwood et al., 1997). Indeed, many of the drivers in this study failed to fixate near to the tangent point during at least one of the curves while they were driving, leading to the question that if tangent point fixation is essential for steering, then how were they able to keep the vehicle in lane when they were looking elsewhere? Presumably extrafoveal vision is used extensively by more experienced drivers. The driver in the Dishart and Land study also showed an interesting increase in fixation durations over the first few hours of driving experience, and they interpret this as an indication that the driver has started to process more useful information. When first introduced to the vehicle's controls this driver may have been searching the view ahead in an attempt to learn what it is that will help with steering control, but after a few hours the relationship between road curvature and required steerage is in the process of being acquired.
When drivers are placed in a steering simulator, however, any form of active road-related experience (such as cycling and computer driving games) predicts the ability to perform in the task. Furthermore, longer fixations are usually associated with increased perceptual difficulty, and if this increase proves to be a reliable result it may be that this driver has started to realise some of the difficulties of driving safely.
A methodological decision to be made when studying driving behaviour is whether to use a simulator or an on-road vehicle. Dishart and Land used both, Crundall et al. used an instrumented vehicle with an on-board PC and eye-tracker, and Liu gave his experienced drivers a journey in a laboratory simulator. The advantage of the simulator over the natural roadway, of course, is that all participants can be presented with the same driving conditions and with the same traffic problems. The advantage of the roadway is that it is authentic: the quality of the display is usually better than that in a simulator in all possible respects including size, colour, texture and resolution. In all cases the drivers are in control of their own motion, real or apparent, and receive complex visual displays that vary according to their own position in the scene. Liu applies an analysis to his fixation data that lends itself very readily to a modelling process. Liu's Markovian analysis asks whether
each fixation location is dependent upon the location of the previous fixation. Given that the driver has just looked at the tangent point of the approaching curve, what part of the display is inspected next: the leading car, the nearest visible part of the road ahead, or the right-hand side of the road? And having then looked at the nearest part of road, on the second fixation in this sequence, does the driver look at the leading car, the right-hand side of the road, or the left-hand side of the road? Each fixation made during the simulated driving task can be entered into this sequence analysis to find a fixation transition matrix for different driving tasks. Liu instructed his drivers either to maintain the vehicle in the centre of the road, or to maintain a self-determined distance from a lead car. Well-formed fixation transition matrices were discovered for each of these driving tasks, which themselves generated different matrices. Liu proceeds to use his analysis of fixation sequences to develop a model of the driver's cognitive processes. Given that a specific sequence of eye fixations has been identified, is it possible to determine the cognitive representations that produced that sequence? The question asks whether we can determine a driver's intentions by first recognising a sequence of fixations, for if we can then the builders of the intelligent vehicles of the future will be able to modify the information provided to the driver to help achieve these goals. For example, we know that glances into the side mirrors precede overtaking and lane-changing manoeuvres, and if the intelligent vehicle's perimeter field detectors have identified an obstructing vehicle that would render the manoeuvre unsafe, then a warning could be sounded, or the action could even be prohibited.
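The idea of a fixation transition matrix can be made concrete with a minimal sketch. It assumes a first-order analysis in which each fixation has already been assigned to a labelled region of the display; the region labels, the function name and the example sequence are hypothetical, and the sketch is not a reconstruction of Liu's own procedure.

```python
from collections import Counter, defaultdict


def transition_matrix(fixated_regions):
    """Estimate first-order transition probabilities between regions.

    fixated_regions: list of region labels in the order fixated.
    Returns {current_region: {next_region: probability}}, where the
    probabilities in each row sum to 1.
    """
    counts = defaultdict(Counter)
    # Pair every fixation with the fixation that follows it.
    for current, following in zip(fixated_regions, fixated_regions[1:]):
        counts[current][following] += 1
    matrix = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        matrix[current] = {region: n / total for region, n in followers.items()}
    return matrix


# Hypothetical sequence of fixated regions from one simulated drive.
sequence = ["tangent", "lead_car", "tangent", "near_road",
            "tangent", "lead_car", "near_road"]
matrix = transition_matrix(sequence)
```

In this invented sequence the tangent point is followed by the lead car on two of three occasions, so the corresponding row of the matrix favours that transition. Comparing such matrices across driving tasks is the kind of contrast described in the text, although a real analysis would need far longer sequences and a test of whether the first-order dependency is reliable.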
While such proposals for intelligent vehicles may sound like science fiction, it should be borne in mind that Liu's investigations are made on behalf of an international car manufacturer, and further discussions of intelligent vehicle control can be found in Parkes and Franzen (1993) and Underwood et al. (1994).
Concluding remarks
The question that we started with was "what determines where we look next?" and most of the discussions have focused on the reduced form of "what determines what we look at?" For the case of driving Liu has provided a very direct answer: what we look at depends upon what we looked at last. The story is complicated by the use of displays that contain moving objects of course, with De Graef's wiggling drawings and Chapman and Underwood's moving hazards having a pronounced saliency in attracting attention. If we performed Liu's sequence analyses on the fixation patterns from these experiments, we would find that whatever was being looked at now, a moving object would be fixated next. The transition matrices found in his driving simulator suggest that there are fixed patterns of fixations, or scanpaths, that predict where a driver will look given a previous fixation in any other part of the visual field. The equivalent conclusion from studies of reading would be that having
looked at one word in a sentence the next fixation will be upon the next word. A transition matrix for reading would have the word immediately to the right of the current word as the most likely object of the next fixation (for a transition matrix analysis of reading data, see Stark, 1994). Just as with scene inspection, there are other sources of variation to be taken into account. Discussions in this volume have focused upon just three of these sources — the conditions under which a regressive fixation will be made, the conditions under which word skipping will occur, and the variation of the landing position within the next word. Some similarities are reported across the domains discussed here. For example, the debate over whether semantic information can be collected with extrafoveal vision is represented both in the discussions of reading (e.g., by Murray, by Kennedy, by Liversedge and Underwood, and by Everatt, Bradshaw and Hibbard), and in discussions of scene perception (e.g., by Henderson and Hollingworth, by Findlay and Gilchrist, by De Graef, and by van Diepen, Wampers and d'Ydewalle). In both cases the data are equivocal. The answer is probably that semantic information is potentially available, but that it is only collected under very specific circumstances. If these conditions are not met, then the experiment will conclude that no influence is present. A similarly unresolved argument concerns variations in landing positions that can be attributed to anything more than the gross visual features of the word or object that is about to be fixated. At one end of the spectrum there is Radach and McConkie's emphasis on low level determinants of fixation positions leaving little room for cognitive influences. 
At the other end there is Liversedge and Underwood's claim for cognitive landing position effects, at least when the prior fixation is long and there is a greater opportunity for an extrafoveal word or object to be part-recognised and for the saccadic programming that will lead to fixation of a selected part of that word or object. More evidence will be needed to resolve this specific controversy. But on a more general level the appropriate answer may be that eye guidance is as low-level as needed and as cognitive as possible for a certain task and stimulus configuration. To determine the details of what kind of information is acquired and processed when and how during the time course of visual and cognitive analysis is a very complex research enterprise. The interpretations of the evidence, including the new data presented here, illustrate the complexity of the enterprise, but they also demonstrate the progress that is being made.
References
Beauvillain, C., Doré, K. and Baudouin, V. (1996). The 'center of gravity' of words: Evidence for an effect of the word-initial letters. Vision Research, 36, 589-604.
Becker, W. (1989). Metrics. In: R.H. Wurtz and M.E. Goldberg (Eds.), The Neurobiology of Saccadic Eye Movements (pp. 13-61). Amsterdam: Elsevier.
Duffy, S.A. (1992). Eye movements and complex comprehension processes. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading (pp. 333-354). New York: Springer.
Everatt, J. and Underwood, G. (1992). Parafoveal guidance and priming effects during reading: A special case of the mind being ahead of the eyes. Consciousness and Cognition, 1, 186-197.
Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 417-429.
Hyönä, J. (1995). Do irregular letter combinations attract readers' attention? Evidence from fixation locations in words. Journal of Experimental Psychology: Human Perception and Performance, 21, 68-81.
Hyönä, J., Niemi, P. and Underwood, G. (1989). Reading long words embedded in sentences: Informativeness of word parts affects eye movements. Journal of Experimental Psychology: Human Perception and Performance, 15, 142-152.
Inhoff, A.W., Pollatsek, A., Posner, M.I. and Rayner, K. (1989). Covert attention and eye movements in reading. Quarterly Journal of Experimental Psychology, 41, 63-89.
Inhoff, A.W., Brühl, D. and Schwartz, J. (1996). Compound word naming in reading, online naming and delayed naming tasks. Memory and Cognition, 24, 466-476.
Johnston, W.A. and Hawley, K.J. (1994). Perceptual inhibition of expected inputs: The key that opens closed minds. Psychonomic Bulletin and Review, 1, 56-72.
Kennison, S.M. and Clifton, C. (1995). Determinants of parafoveal preview benefit in high and low working memory capacity readers: Implications for eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 68-81.
Land, M.F. and Lee, D.N. (1994). Where we look when we steer. Nature, 369, 742-744.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 395-396.
O'Regan, J.K. (1990). Eye movements and reading. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes (pp. 395-453). Amsterdam: Elsevier.
O'Regan, J.K. (1992). Optimal viewing position in words and the strategy-tactics theory of eye-movements in reading. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading (pp. 333-354). New York: Springer.
Parkes, A.M. and Franzen, S. (1993). Driving Future Vehicles. London: Taylor & Francis.
Pollatsek, A. and Rayner, K. (1982). Eye movement control in reading: The role of word boundaries. Journal of Experimental Psychology: Human Perception and Performance, 8, 817-833.
Radach, R. (1996). Blickbewegungen beim Lesen: Psychologische Aspekte der Determination von Fixationspositionen (Eye Movements in Reading: Psychological Aspects of the Determination of Fixation Positions). Münster/New York: Waxmann.
Rayner, K. and Morris, R.K. (1992). Eye movement control in reading: Evidence against semantic preprocessing. Journal of Experimental Psychology: Human Perception and Performance, 18, 163-172.
Rayner, K. and Pollatsek, A. (1987). Eye movements in reading: A tutorial review. In: M. Coltheart (Ed.), Attention and Performance XII: The Psychology of Reading (pp. 327-362). London: Erlbaum.
Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. Englewood Cliffs, NJ: Prentice Hall.
Stark, L.W. (1994). Sequences of fixations and saccades in reading. In: J. Ygge and G. Lennerstrand (Eds.), Eye Movements in Reading. Oxford: Elsevier.
Underwood, G., Chapman, P., Crundall, D., Cooper, S. and Wallen, R. (1997). The visual control of steering and driving: Where do we look when negotiating curves? Paper presented at the Vision in Vehicles VII Conference, Marseilles.
Underwood, G., Clews, S. and Everatt, J. (1990). How do readers know where to look next? Local information distributions influence eye fixations. Quarterly Journal of Experimental Psychology, 42, 39-65.
Underwood, G., Sommerville, F., Underwood, J.D.M. and Hengeveld, W. (1994). Information Technology on the Move: Technical and Behavioural Evaluations of Mobile Telecommunications. Chichester: John Wiley.
Vonk, W., Radach, R. and van Rijn, H. (1997). Eye guidance and the saliency of word beginnings. Paper presented at the Ninth European Conference on Eye Movements, Ulm.
CHAPTER 2
Definition and Computation of Oculomotor Measures in the Study of Cognitive Processes
Albrecht Werner Inhoff, State University of New York at Binghamton
and
Ralph Radach, Technical University of Aachen
Abstract
Oculomotor measures provide distinct methodological advantages in the study of cognitive and perceptual processes. Derivation and use of these measures is not, however, straightforward. This chapter reviews methodological choice points that need to be considered when raw data are used to define basic oculomotor events, such as fixations and saccades. The chapter also considers the usage of these oculomotor events for the indexing of perceptual and cognitive processes. Particular attention is given to viewing duration measures as these are used as the primary indicator of cognitive processes in current eye movement research. The discussion is limited in that the raising of measurement-related and methodological issues is not followed by the presentation of specific solutions. Nevertheless, we consider the discussion of these issues a crucial step in the eventual development of oculomotor measurement standards.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
A. W. Inhoff & R. Radach
Introduction
Oculomotor measures provide distinct methodological advantages in the study of cognitive and perceptual processes. They appear sensitive to a wide range of 'cognitive processes' and they can be obtained under relatively natural task conditions; hence, the growing acceptance and popularity of these measures. However, measurement of oculomotor activity in the cognitive and perceptual sciences is far from being straightforward. There has been some discussion pertaining to the measurement, evaluation, and reporting of oculomotor data (e.g., McConkie, 1981; Heller, 1983; McConkie, Wolverton and Zola, 1984; Mueller et al., 1993; Harris, Abramov and Hainline, 1984; Nodine et al., 1992) but no measurement standards have evolved and many methodological issues have remained either unaddressed or unresolved. As a result, there is still considerable ambiguity in the definition of basic oculomotor events and the use of different types of oculomotor measures in the study of conceptually similar issues. An informal survey conducted by us (Inhoff, Radach, and Heller), to which thirty-two researchers using oculomotor measures responded, indicated that all of them believe that there is a need for increased discussion of measurement-related and methodological issues. More specifically, two thirds of them consider the definition of functional oculomotor events controversial, and more than one third wish to appraise the specification and interpretation of extant measures. This uncertainty has led to considerable differences in the specification and interpretation of oculomotor measures, hampering the appraisal, comparison, and exchange of scholarly findings. The main goal of this chapter is to provide a review of likely methodological choice points that need to be considered when oculomotor data are collected.
The chapter is not concerned with questions related to the hard- and software setup, nor with the assessing of the accuracy and reliability of the eye tracker output (see McConkie, 1981, for an excellent review). Instead, our considerations focus on issues that are encountered 'down-stream' when raw data are used to define basic oculomotor events, such as fixations and saccades, and when oculomotor data are used to index perceptual and cognitive processes. We will show that different procedures can be (and have been) used to specify saccades and fixations and that the researcher using oculomotor measures has a relatively large degree of freedom in choosing — or defining — oculomotor measures of cognitive processes (see also Chapter 3 and Konieczny, 1996). Our discussion is limited in that it focuses on the use of oculomotor measures in the study of perceptual and cognitive processes during reading. We presume, however, that similar methodological issues are encountered in other domains of study. It is further limited in that discussion of measurement dilemmas is not followed by the specification of measurement solutions. Nevertheless, we consider the discussion a crucial step in the eventual development of oculomotor measurement standards.
Oculomotor measures
Defining fixations and saccades

Measurement of basic oculomotor events, comprising fixations and saccades during text reading, is often considered straightforward, as there appears to be little ambiguity as to whether the eyes are fixated or in motion. However, a close look at the movement-time profile shown in Fig. 1 (recorded in Binghamton using a dual-Purkinje eye tracker) indicates that the saccade-fixation transition is not necessarily clear-cut. The figure shows the ending 20 ms of a fixation near character location 9 (on a line of print containing 80 characters of text), followed by a right-directed saccade of about seven character spaces. The high-velocity right-directed movement is followed by a shifting to the left, with high- and low-velocity movement components extending across 8.4 and 1.2 character spaces and lasting 16 ms and 14 ms, respectively. As can be seen, the measured (right) eye appears to move too far, then shifts (or converges) to its final position. The low-velocity shift is followed by a 60 ms period during which the eye is relatively stable. After this, the trajectory shows another high- and low-velocity movement sequence, with movement sizes of 1 and 0.2 character spaces and
Fig. 1. Space-time diagram of eye movements recorded over a period of 200 ms with a sampling rate of 1000 Hz using a dual-Purkinje eye tracking system. Left-to-right eye movements correspond to a bottom-to-top change on the ordinate (see text for details).
A. W. Inhoff & R. Radach
movement durations of 8 ms and 4 ms, respectively. This is followed by another period during which fixation location remained 'almost' unchanged. Brief fixations which are followed by short saccades of approximately one character space are relatively common in some readers. Records from the Binghamton laboratory indicate that approximately 5-15% of their fixations follow this pattern.

Figure 1 indicates that the onset of a stable fixation is preceded by two different types of oculomotor events: a high-velocity change of fixation location and a reverse-direction, lower-velocity shifting toward a resting location. Is the reverse-direction movement part of the saccade? According to Deubel and Bridgeman (1995a,b), it reflects a corresponding lens movement in the dual-Purkinje's eye movement recording. During saccade offset, deceleration of the lens lags deceleration of the eyeball so that the lens movement overshoots the eyeball movement until it is 'pulled' back by its attachments to the eyeball tissue. Smaller reverse-direction movements can also be seen, however, when a pupil reflection method is used (see Fig. 3). Presumably they reflect the onset of vergence movements and, perhaps, small movements of the pupil relative to the eyeball. Deceleration of the eyeball fluids and the pupil may lag behind the deceleration of oculomotor muscle forces, possibly moving the pupil briefly in the direction of the saccade after muscle movements were terminated. Deubel and Bridgeman (1995a, p. 532) suggest that the "maximum of the Purkinje tracker overshoot represents a practicable estimator for the veridical end of the saccade". More generally, it may be assumed that the time at which the eye movement reaches its largest extent coincides with the veridical end of the saccade. If useful information was obtained immediately after the veridical saccade was completed, then fixation onset should be defined as the point in time at which the recorded saccade reached its largest value.
According to this view, the 14 ms and 4 ms reverse-direction shifts should be considered part of the fixation. If, however, effective perception of text was suppressed during postsaccadic movements, then fixation onset should be defined as the point in time at which the complete eyeball system was relatively stationary. According to this view, the 14 ms and 4 ms shifts should be excluded from the fixation duration. Deubel and Bridgeman (1995b) showed that brief flashes can be perceived during the lens movement, but it remains unclear whether useful information can also be gleaned from a steady image that is visible during the postsaccadic movement period. On the one hand, postsaccadic movement will smear the retinal image. On the other hand, postsaccadic movements are relatively small and principled, and the reader may be able to extract useful information during that movement period. Looking at Fig. 1, other methodological issues arise. The relatively small change in fixation location, extending across one character space, could indicate that the reader terminated the brief 74/60 ms fixation duration to seek new information. Alternatively, the small change in location may not have any functional significance
Fig. 2. Space-time diagram of eye movements recorded over a period of 160 ms using an AmTech ET4 pupil reflection eye tracking system with a sampling rate of 500 Hz. The recording shows a null-saccade, consisting of a rapid lateral jerk of both eyes that extends across approximately 2 character spaces, followed by an immediate return to the previously occupied fixation location.
for the linguistic analysis of fixated text, and the short fixation period should be considered part of a single fixation duration. The movement-time profile shown in Fig. 1 thus can be used to arrive at radically different descriptions of oculomotor activity: it may depict one functional saccade of average length which was followed by a relatively long fixation duration. Alternatively, it may depict a sequence of two saccades, their mean extent being considerably shorter than the size of an average saccade, a mean fixation location that is located at a point where the eyes never went, and two fixations, their mean duration also being shorter than the duration of a typical fixation. Figure 2 shows yet another movement-time profile (also obtained in the Binghamton laboratory), according to which the eyes begin to move one to two character spaces away from a fixation but then return to the original fixation location. These displacements need not be lateral but can occur in any direction. They may constitute a functional oculomotor event; hence fixation durations prior to the 'jerk'
may need to be discriminated from the fixation duration following the jerk. Alternatively, these jerks may be low level oculomotor 'irregularities' that are completely unrelated to the perceptual and linguistic analysis of fixated text.

The fixation period of a target area can also be interrupted by one or several blinks. Blink time can be excluded from the computation of fixation duration, as text is not visible during this period. However, it appears likely that cognitive analyses of text continue during the blink period, unless the blink occurs at the very beginning of a fixation; hence, blink time may be added to the fixation duration. Another methodological option is to exclude fixations with blinks from analysis.

The methodological treatment of saccade durations adds yet a further option to the computation of fixation durations. Most researchers do not include saccadic movement in their computation of fixation durations. Yet it is plausible to assume that the cognitive analysis of a fixated segment of text continues during the saccade that follows the fixation. Hence postfixation movement time may need to be included in the fixation duration measure when it is used to measure cognitive processes.

Selecting cutoff values

Small microsaccades of the eyes, extending across a fraction of a character space, occur frequently when the eyes are fixated. They are, however, not necessary to maintain stable fixation (Kowler and Steinman, 1980). Interestingly, during prolonged fixation, the intersaccade interval for microsaccades is in the order of normal saccade latencies (Cunitz and Steinman, 1969). McConkie (1983) considers two possible functions of these very small movements: they could indicate fine attentional adjustments or reflect an intrinsic tendency to move the eyes after some delay has passed.
So far, no consistent functional explanation has emerged and there is no empirical evidence for relations to perceptual or cognitive processes (Kowler and Steinman, 1980). When detected, small saccades may be excluded from the set of data by selecting a minimum saccade size value. Saccades smaller than the criterion value can either be considered part of a larger saccade or they can be excluded from analysis. Similarly, the functional role of brief fixations is also poorly understood. In recent studies, conducted in the Binghamton laboratory, we have excluded fixation durations of less than 50 ms from the data set, assuming that shorter duration fixations are not determined by on-line cognitive processes (e.g., Inhoff, Briihl and Schwarz, 1996). Other researchers (e.g., Vitu, O'Regan and Mittau, 1990; Hyönä, Niemi and Underwood, 1989) have used cutoff values in the order of 70-100 ms. Again, fixation durations that are shorter than the criterion value can either be considered part of a preceding or following fixation or they can be excluded from analysis. Possibly, the optimal cutoff value is determined by the procedure used to
compute fixation durations and by effect sizes. Cutoff values may be higher when movement time is included in the computation of fixation duration. Furthermore, relatively high cutoff values may be advantageous when linguistic processes are studied, as McConkie, Reddix and Zola (1992) showed that fixation durations of 140 ms or less are unaffected by the lexical properties of fixated text. In addition, some, but not all, researchers exclude fixation durations that lie more than two or three standard deviations above or below the mean.

Use of different saccade size and fixation duration cutoff criteria may have a profound effect on the description of readers' eye movements during reading. For instance, setting the minimum fixation duration to either 50 ms or 100 ms will yield radically different descriptions of the saccade-fixation sequence depicted in Fig. 1. Cutoffs may also determine effect sizes and their reliability. In psycholinguistic studies, effect sizes that are not reliable when a 50 ms cutoff is used may be reliable when a 140 ms cutoff is used. Currently, there are no comparisons of the effectiveness of different cutoff values.

Binocular measurement

The prevailing view, according to which "both eyes pretty much move in synchrony with each other across the page" (Rayner and Pollatsek, 1989, p. 113), implies that binocular measurement should not create any additional measurement dilemmas. However, recent findings indicate that there can be marked saccadic and postsaccadic asymmetries in the movements of the eyes. A typical binocular asymmetry is shown in Fig. 3. Collewijn, Erkelens and Steinman (1988) provided a detailed description of the temporal and spatial dynamics of binocular coordination of horizontal saccades in a simple scanning task and found movements of the abducting (temporally moving) eye to be somewhat larger than corresponding movements of the adducting (nasally moving) eye.
Similar observations were reported by Heller and Radach (1995) for reading and other complex visual tasks. The magnitude of absolute amplitude differences between the eyes can be quite remarkable, with means in the order of 5% of total movement amplitude when large (10-12 letter) saccades are considered and of approximately 15% when small (2-3 letter) saccades are considered. During fixation, the initial disparity is reduced — but generally not offset — by a slow, relatively uniform convergence shift. When reading a line of text, the residual disparity at the end of many fixations can lead to an accumulated binocular difference in fixation position of up to several characters. The relative contribution of each eye to postsaccadic convergence is modulated by its landing position. In a simple scanning experiment both eyes converged symmetrically when they landed on the target (consisting of a parafoveally presented 0.4 degree wide rectangle). In case of under- or overshoot, however, it was
Fig. 3. Space-time diagram of eye movements recorded using an AmTech ET4 pupil reflection eye tracking system running at 500 Hz. Time (in ms) is plotted on the abscissa, from 0 to 400 ms. Reading direction is from bottom to top on the ordinate. The recording shows a disparity in the landing position of the regressive saccade made by the two eyes and an asymmetry in the subsequent slow vergence movement.
the eye that landed farther away from the target centre that performed the larger vergence movement, indicating that vergence is not reflexive, as commonly thought, but to a certain degree 'goal directed' (Radach, Heller and Jaschinski, 1996). Asymmetry in postsaccadic vergence indicates that fixation durations, when defined as the period of time during which the eyes are relatively stable, may differ for the two eyes. When fixations are short, instances can occur in which the fixation duration of one eye is slightly above the criterion value whereas the fixation duration of the other eye is slightly below. Other measurement dilemmas may occur. The two eyes may be fixated at different characters, requiring a choice as to which one of the two fixation locations defines the initial fixation position within a word, a measure that has been used in several studies of reading (see below). Instances may even occur in which the two eyes fixate different linguistic units, e.g., the line of regard of one eye may fall on the ending character of one unit (word, morpheme, syllable, or other) and the line of regard of the other eye may fall on the first character of the next unit.
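The interaction between a minimum fixation duration criterion and binocular recording can be illustrated with a minimal sketch. It is not a procedure taken from the studies cited above: the function names, the 50 ms default, and all sample durations are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def apply_cutoff(durations_ms: Sequence[float], min_ms: float = 50.0) -> List[float]:
    """Keep only fixations whose duration meets the minimum criterion."""
    return [d for d in durations_ms if d >= min_ms]


def binocular_disagreements(
    left_ms: Sequence[float], right_ms: Sequence[float], min_ms: float = 50.0
) -> List[int]:
    """Indices of fixations that pass the cutoff in one eye but not in the other."""
    if len(left_ms) != len(right_ms):
        raise ValueError("both eyes must contribute the same number of fixations")
    return [
        i
        for i, (left, right) in enumerate(zip(left_ms, right_ms))
        if (left >= min_ms) != (right >= min_ms)
    ]


# Hypothetical sequence: a brief 60 ms fixation followed by a longer one.
sequence = [60.0, 240.0]
print(apply_cutoff(sequence, min_ms=50.0))   # both fixations are kept
print(apply_cutoff(sequence, min_ms=100.0))  # only the long fixation is kept

# Hypothetical per-eye durations for the same four fixations.
left_eye = [212.0, 52.0, 180.0, 46.0]
right_eye = [205.0, 47.0, 186.0, 44.0]
print(binocular_disagreements(left_eye, right_eye))  # second fixation only
```

The sketch simply drops sub-criterion fixations. Merging them with a preceding or following fixation, the alternative mentioned in the discussion of cutoff values, would require an additional rule for choosing the neighbour.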
Using oculomotor viewing duration measures

After fixations and saccades have been identified they can be used to index cognitive processes. For the last fifteen years, this indexing has been guided by two processing assumptions, often referred to as the eye-mind and immediacy assumptions (Just and Carpenter, 1980). According to these assumptions, the location of a fixation coincides with the cognitive processing of text at that location, and an ensuing fixation duration is determined by the area's (e.g., word's) perceptual and cognitive analysis. According to the two assumptions, the viewing duration of an area of text, or some derivative of it, can be used to index its perceptual and linguistic analysis.

Methodological issues arise, however, when an effective viewing duration measure is to be defined. Readers may not move the eyes from word to word in the left-to-right direction (in left-to-right ordered text). Some words, notably short words, are skipped (i.e., not fixated) during sentence reading (e.g., Vitu et al., 1995). Conversely, some words, notably long words, receive more than one fixation. Sentence and passage reading often involves regressions (eye movements to previously read text) and multiple word readings. Furthermore, there is now accumulating evidence indicating that the linguistic processing of a word can be initiated before it is fixated (see Rayner and Pollatsek, 1989, for a review) and that the processing of a word can continue after it has been fixated (see Chapter 11).

In the following section, we will describe several oculomotor measures that have been reported in the literature. One or several of these measures are generally used as dependent variables in experimental studies and as criterion variables in nonexperimental regression studies (e.g., Just and Carpenter, 1980).
Particular attention will be given to the computation of (word) viewing durations, as processing time has been used as the primary indicator of linguistic processes in the cognitive sciences (the listing of measures is not exhaustive). To simplify the discussion, we will not consider the question of whether mean effects of experimental manipulations on viewing duration measures are produced by similar effects in all relevant cases or by combinations of unequal effects in subsets of the data (see McConkie, Zola and Wolverton, 1985, for a discussion of theoretical and methodological implications of this "frequency of effects problem"). We will further assume that the spatial properties of to-be-studied text are matched across conditions. Various types of scaling have been used when this matching is not possible, e.g., the viewing duration of a target area may be divided by the number of characters in the target area. We begin our discussion with a seeming 'non-event': instances in which the word of interest (also referred to as target word) is skipped.
(1) Skipping rate (fixation probability being the inverse, but less common, measure)

A target is considered skipped when neither the space preceding it nor any of its constituent letters receives a fixation during the initial right-directed reading of text. When computed, skipping rates generally constitute a supplementary measure that, when analysed as a function of the skipped target's visuo-spatial and linguistic properties, may provide a converging source of evidence for a particular hypothesis.

Virtually all studies have shown that target length and the launching distance of a saccade from the target are the main determinants of target skipping rate, with skipping being particularly common when the target is short (Vitu et al., 1995) and when the launch site is near the target's beginning letter (see, e.g., McConkie, Kerr and Dyre, 1994, who propose a 'skipping formula' to describe skipping rates mathematically). There is also considerable evidence showing that linguistic variables modulate target skipping. When matched for length, high-frequency words are more likely to be skipped than low-frequency words (Inhoff and Topolski, 1994), especially when words contain seven or fewer characters (Rayner, Sereno and Raney, 1996), and contextually constrained words are more often skipped than unconstrained words (Rayner and Well, 1996). The fact that lexical effects on word skipping are sometimes not found in analyses of large data corpuses (e.g., Radach in O'Regan et al., 1994) may be due to the less extreme variation of word frequency and contextual constraint in natural text material as compared with experimental sentences.
In an elegant meta-analysis of a large number of published studies, Brysbaert and Vitu (Chapter 6) show that although linguistic variables influence skipping, they account for only a relatively modest portion of the total variance; hence our suggestion that they constitute a supplementary source of evidence in the examination of cognitive processes. Skipped words are generally excluded from viewing duration analyses. However, attempts have been made to estimate their processing time. This was accomplished either by setting the fixation duration of skipped words to zero (Just and Carpenter, 1978; 1980), or by basing means only on subjects with viewing durations above a certain minimum (Carpenter and Just, 1983). Another technique is to substitute pretarget viewing durations for target viewing durations when the pretarget fixation is reasonably close (e.g., within three characters of the target, Folk and Morris, 1995), or to use the fixation nearest to the target as target viewing duration, irrespective of whether the fixation is to the right or left of the target (e.g., Garrod et al., 1990, who used a pre- or post-target window of four and one character spaces, respectively). On the one hand, the assignment of processing time to skipped words will increase the number of observations that can be used for the computation of target viewing durations. The advantage may be offset, however, by methodological shortcomings. Assignment of a zero processing duration to a target word (target
region) is likely to underestimate the demands of the target's linguistic analyses. Using the duration of a pre- or post-target fixation is also problematic, as it is likely to confound the target's linguistic analyses with the linguistic analysis of either pre- or post-target words. Currently, no data are available to determine the advantages and disadvantages of viewing duration substitution measures when target skipping occurred.

(2) Single fixation durations

Computation of target viewing durations is uncontroversial when the target is the recipient of a single fixation during the initial left-to-right sentence reading. Instances in which the target receives more than one fixation or instances in which the word is fixated after text to the right of it was viewed (following a regressive saccade) are excluded. The exclusion of instances in which the word is a recipient of more than one fixation from the single fixation duration measure rests on the assumption that qualitatively different processes control eye movements when a target receives a single fixation and when it receives multiple fixations (e.g., O'Regan, 1992). Single fixations that follow a regression are excluded because it remains unclear what type of information had been obtained prior to the initial skipping of the word.

Computation of single fixations may not yield a meaningful measure when target refixations are common. The exclusion of a large number of target viewings is likely to increase the variability and decrease the sensitivity of the measure. Furthermore, assumptions underlying the computation of single fixation durations remain controversial (see below).

(3) The durations of the first and second of two target fixations

Instances in which the target receives more than one fixation, which is common when long words are encountered, can be used to determine whether initial fixations on a target are functionally distinct from subsequent fixations.
Differential effects of linguistic target characteristics on the first fixation duration and on refixation durations would recommend the separation of instances in which the target receives a single fixation and of instances in which the target receives more than one fixation. Comparison of first and second fixation durations, as a function of the location of these fixations on a target word reveals a distinct tradeoff. Initial fixations that occupy beginning or ending word positions tend to be relatively short and fixations near the word centre are relatively long. Conversely, the duration of the second fixation is relatively short when the first had landed near the target centre and increases as first fixation position is shifted toward the target's beginning or ending letters (O'Regan, Pynte and Coeffe, 1986; O'Regan et al., 1994; Pynte, Kennedy and Murray, 1991). However, it remains unclear whether this tradeoff is due to the computation of functionally distinct linguistic representations during a target's initial fixation and its refixation.
Hyönä, Niemi and Underwood (1989) examined this issue by manipulating the degree to which a long word's beginning and ending letter sequences constrained the set of compatible candidate words (this was referred to as informativeness). Their results showed an effect of informativeness on the duration of the second but not of the first fixation during sentence reading. In contrast to this, informativeness determined the duration of initial fixations, especially when the eyes landed near the word's beginning, in Pynte, Kennedy, and Murray's (1991) study. Taken together, the two studies suggest that neither first nor second fixations are 'code specific'.

This conclusion is consistent with Rayner et al.'s recent (1996) findings. In their analyses, effects of word frequency were determined separately for the first and second of two target fixations (when the target received two fixations), and for single fixation durations (when the target received a single fixation). The comparison revealed virtually identical effect sizes, amounting to 29, 31, and 25 ms, in the first and second of two intratarget fixations and in the single fixation durations, respectively. These results indicate that the separation of single fixations from multiple target fixations may not be empirically justified. Still, this conclusion must be considered tentative, as word properties other than word frequency and 'informativeness' may have differential effects on the first and second of two target fixations.

(4) First fixation durations

The first fixation duration is the duration of the first fixation on a target word during first pass sentence reading, irrespective of whether the target word receives one or several fixations. It thus provides a measurement of word viewing durations whenever the target is a recipient of a fixation. The measure was first used by Inhoff (1984) and is now one of the most commonly reported linguistic measures.
It is sensitive to a range of linguistic computations including orthographic properties (Lima and Inhoff, 1985), phonological properties (Inhoff and Topolski, 1994; Pollatsek et al., 1992), lexical properties (Inhoff and Rayner, 1986; Rayner and Duffy, 1986), metaphorical status (Inhoff, Lima and Carroll, 1984) and contextual constraints (Inhoff, 1984). The measure has been criticised, however, for confounding instances in which the target receives a single fixation and instances in which it receives multiple fixations (O'Regan, 1990; 1992). For the criticism to be meritorious, it also has to be shown that single fixations and the first of several target fixations are sensitive to different types of linguistic computations during sentence reading, which does not appear to be the case (see above).

(5) Gaze durations

This measure was first used by Just and Carpenter (1978; 1980) and is perhaps still the most commonly reported index of cognitive processes. Computationally, gaze
durations are defined as the time spent viewing a target word until another word is fixated. Like first fixation durations, gaze durations include all instances in which the target was fixated. Also like first fixation durations, gaze durations have been shown to be sensitive to a wide range of pre-lexical, lexical, and post-lexical word properties.

One typical computation of gaze durations is to cumulate all the viewing durations on a word during its initial reading. Since the measure is to index cognitive processes, it could be argued that the cumulation should also include the time spent moving the eyes within a word, and perhaps even the time spent moving the eyes from the target word to the next word in the text. Currently, there are no data available that compare the diagnostic value of different types of gaze computations. It appears unlikely that the inclusion of movement time in the gaze measure will lead to notable changes in effect sizes, as movement time appears to constitute a small fraction of the time spent on a target word (Blanchard, 1985).

(6) Mean fixation durations

A mean fixation duration is defined as the mean duration of all fixations that occupy a target during its first pass reading. Mean fixation durations were computed in some of the earlier studies of reading where they were found to be sensitive to linguistic processes, yielding, for instance, shorter durations when a contextually constrained target was viewed than when an unconstrained target was viewed (e.g., Ehrlich and Rayner, 1981). Like other oculomotor measures, mean fixation durations confound instances of single and multiple target fixations. However, in contrast to first fixation durations and gaze durations, this confounding can lead to a serious distortion of experimental effects.
For instance, words that are difficult to recognise are likely to receive more than one fixation (e.g., O'Regan et al., 1994; Rayner, Sereno and Raney, 1996; Radach, Heller and Inhoff, 1997; Vitu et al., 1995); hence, large effects of difficulty on gaze durations. However, since initial fixation durations and refixation durations also tend to be shorter than single fixation durations, the mean of multiple fixations of difficult targets can be shorter than the mean of single fixations of easy targets.

(7) Total viewing durations

Total viewing durations are computed by cumulating all fixation durations on a target, irrespective of whether it is fixated during first pass sentence reading or not. Total target viewing durations were first used by Inhoff (1983) to determine effects of text structure. Recently, they have also been used to determine the linguistic processing of individual words (e.g., Daneman, Reingold and Davidson, 1995; Perea and Pollatsek, 1998).

Interpretation of the measure appears to be difficult. Perea and Pollatsek (1998) suggest that it may be sensitive to lexical analyses that may occur relatively late in
the word recognition process. According to Daneman et al. (1995), the measure is particularly sensitive to processes that occur after a word has been identified. Difficulties in the interpretation of the measure may be avoided by distinguishing between 'forward gaze durations', i.e., gazes that accrue during first pass reading, and 'regressive gaze durations', i.e., gazes that accrue during second and subsequent word readings (see also Carpenter and Daneman, 1981). First pass reading can then be related to the ease of the word's linguistic analysis, and regressive gazes can be related to another linguistic process, e.g., the context-dependent adjustment of word meaning (Carpenter and Daneman, 1981) or the re-assignment of syntactic structure (Rayner and Frazier, 1987; 1989).

(8) Repair time (or total repair time)

Repair time is another example of an oculomotor measure that has been used to study 'complex' linguistic processes that follow the accessing of word meanings. It is computed by cumulating all consecutive fixations in between the first and last fixation on a word, irrespective of whether they are first or second pass readings and irrespective of whether these fixations are on the target or not (Daneman et al., 1995; Daneman and Reingold, 1993; Daneman and Stainton, 1991). For instance, the total repair time on the false homophone 'hare' in the phrase 'she had her long hare permed and ....' would comprise fixations 1 to 7, given the following fixation sequence: the initial fixation of 'hare' (1) followed by 'long' (2), 'had' (3), 'hare' (4), 'long' (5), 'permed' (6), 'hare' (7), 'blowing' (8), and no further fixations on 'hare'. According to Daneman et al. (1995), repair time is particularly sensitive to post-recognition processes that are invoked to discern sentence meaning when a context-inconsistent word is encountered ('hare' in the example).

Consistency of experimental effects across measures

Effect sizes may also differ across viewing duration measures.
For instance, effects of a word's linguistic properties on total repair time may not be evident in first fixation or gaze durations. Since repair time includes the time spent re-reading the target and surrounding text, this discrepancy of effects may indicate that a particular linguistic property becomes relevant after the target's lexical representation had been activated (see also Chapter 3 for a discussion of discrepancies between viewing duration measures). In other instances, discrepancies between measures are more difficult to reconcile. Inhoff (1984) obtained different effects of contextual constraints on first fixation durations and gaze durations. Other studies reported effects of linguistic manipulations on first fixation durations but not on gaze durations (Inhoff and Rayner, 1986; Lima and Inhoff, 1985; Pollatsek et al., 1992) or the reversed pattern of effects with reliable effects in the gaze durations but not first fixation durations
(Balota, Pollatsek and Rayner, 1985; Kennison and Clifton, 1995; Sereno and Rayner, 1992). In view of Rayner et al.'s (1996) findings, and other recent results, effect sizes should generally be larger when gazes are measured than when first fixations are measured, as effect size should increase with the number of fixations on a fixated target. From this perspective, effects on first fixation durations in the absence of robust effects on gaze durations appear particularly problematic. However, the inclusion of intra-word refixations in the gaze measure is likely to increase the variability of this measure. Consequently, effects that are robust in the first fixation durations may not be robust in the gaze durations.
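As a minimal sketch, the viewing duration measures described in this section can be computed from a time-ordered list of fixations. The data layout (a `Fixation` record holding a word index and a duration), the function names, and all durations are assumptions made for this illustration. Only the order of fixated words in the 'hare' example is taken from the text. Movement time, blinks, and the space preceding the target are ignored for simplicity.

```python
from typing import List, NamedTuple, Optional


class Fixation(NamedTuple):
    word: int        # index of the fixated word within the sentence
    duration: float  # fixation duration in ms


def first_pass(fixations: List[Fixation], target: int) -> List[float]:
    """Durations of the first run of consecutive fixations on the target.

    Returns [] when the target is skipped, i.e. when a word to its right
    is fixated before the target itself receives a fixation.
    """
    run: List[float] = []
    for fix in fixations:
        if fix.word == target:
            run.append(fix.duration)
        elif run:
            break            # the eyes left the target: first pass is over
        elif fix.word > target:
            return []        # target skipped during initial reading
    return run


def first_fixation_duration(fixations: List[Fixation], target: int) -> Optional[float]:
    run = first_pass(fixations, target)
    return run[0] if run else None


def single_fixation_duration(fixations: List[Fixation], target: int) -> Optional[float]:
    run = first_pass(fixations, target)
    return run[0] if len(run) == 1 else None  # undefined when the target is refixated


def gaze_duration(fixations: List[Fixation], target: int) -> Optional[float]:
    run = first_pass(fixations, target)
    return sum(run) if run else None


def total_viewing_duration(fixations: List[Fixation], target: int) -> float:
    """Sum of all fixations on the target, first pass or not."""
    return sum(f.duration for f in fixations if f.word == target)


def repair_time(fixations: List[Fixation], target: int) -> Optional[float]:
    """Sum of all fixations from the first to the last fixation on the target,
    regardless of which word each intervening fixation falls on."""
    hits = [i for i, f in enumerate(fixations) if f.word == target]
    if not hits:
        return None
    return sum(f.duration for f in fixations[hits[0]:hits[-1] + 1])


# 'she had her long hare permed and ...': had=1, long=3, hare=4, permed=5.
# Index 8 for 'blowing' and every duration below are hypothetical.
HAD, LONG, HARE, PERMED, BLOWING = 1, 3, 4, 5, 8
trial = [
    Fixation(HARE, 250.0), Fixation(LONG, 200.0), Fixation(HAD, 180.0),
    Fixation(HARE, 300.0), Fixation(LONG, 150.0), Fixation(PERMED, 220.0),
    Fixation(HARE, 270.0), Fixation(BLOWING, 210.0),
]
print(gaze_duration(trial, HARE))           # 250.0
print(total_viewing_duration(trial, HARE))  # 820.0
print(repair_time(trial, HARE))             # 1570.0
```

With these hypothetical durations, first fixation, single fixation, and gaze duration coincide at 250 ms, whereas total viewing duration (820 ms) and repair time (1570 ms) diverge because they include re-reading. This is one way in which effects can appear in one measure but not in another.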
Movement-related measures
The reporting of viewing durations is often accompanied by the reporting of one or several of the following movement measures: the size of saccades that lead to a target fixation, the landing position(s) on the target, refixation rates, the size of the saccade that led to a target's refixation, the position of the refixation, and the size of the saccade leaving the target. A saccade to a target can be followed by a single fixation or by the first of several target fixations. Hence, similar to the viewing duration measures, different types of saccade size measure can be computed. Eye movements that lead to the re-reading of text can be measured and reported as regression rate, regression size, and regressive fixation location. Progressive and regressive saccades are usually reported in character units and landing position in terms of the fixated character position (for a specified word length). In spite of this plethora of potential measures, we confine our discussion to two measures: fixation locations and refixation rates (see Chapters 4, 6, 9 and 11 for a more detailed discussion of movement-related processes and measures).
Fixation position measures may be of diagnostic value for the study of cognitive processes. As noted before, there is now some consensus that eye movements resulting in target skipping are co-determined by the linguistic analysis of the skipped word, but the more general claim, that the planning of saccades to a subsequently fixated word is co-determined by linguistic analyses, has remained controversial. The original claim made by Underwood and co-workers (Underwood, Bloomfield and Clews, 1988; Underwood, Clews and Everatt, 1990) of an effect of parafoveally available semantic or lexical information on initial landing positions was contradicted by failures to replicate (e.g., Rayner and Morris, 1992; Radach et al., 1995).
On the other hand, there is now evidence for small effects on the orthographic level (Inhoff, 1989), indicating that very unusual letter clusters may lead to modulations of saccade amplitudes (Hyönä, 1995; for details, see Underwood, this volume). Inhoff, Briihl and Schwartz (1996; see also Hyönä et al., 1989) also obtained larger saccades and a right-shifted fixation location when
morphologically complex targets were fixated than when length-matched monomorphemic control words were fixated.
Some researchers have also recommended measurement of fixation locations because the linguistic analysis of a word may be a function of its initial fixation location (O'Regan and Jacobs, 1992). According to this view, any study of linguistic processes in which target viewing durations are not analysed as a function of target fixation location is likely to confound linguistic effects with fixation location effects. Even though the location of a fixation affects the duration of the ensuing fixation, there is currently no support for the view that this confounds linguistic processes. On the contrary, O'Regan et al. (1994) and Rayner, Sereno and Raney (1996), who determined the size of the word frequency effect as a function of fixation location, found that the two effects were completely additive; similarly, effects of morphological complexity on first fixation durations and gaze durations were completely independent of fixation location in our recent work (Inhoff, Briihl and Schwartz, 1996).
Refixation rates constitute a movement-related measure that has proven to be sensitive to cognitive variables. They are generally defined as the ratio of the frequency of multiple target fixations over the frequency of multiple and single target fixations during first pass reading. Refixations are of profound methodological importance, since the main constituent of linguistic gaze duration effects is the increase in refixation rate rather than a prolongation of individual fixation durations (Blanchard, 1989; O'Regan, 1990). Examination of refixation rates has yielded a number of important insights into oculomotor activity during reading. Specifically, refixation rates are relatively low when the initial fixation is near the target word centre (e.g., McConkie et al., 1989; O'Regan et al., 1994) and increase as the eyes move off the centre.
Refixation rates are also sensitive to the linguistic properties of fixated words, occurring more often when a low-frequency word is fixated than when a high-frequency word is fixated. Refixations may be related to first fixation durations. First fixation durations on a difficult item may, for instance, be shorter than first fixations on an easy item, but when this occurs, the difficult item is more likely to be refixated (De Graef, personal communication). McConkie et al. (1989) first described the relation between initial fixation position and refixation rate in terms of a U-shaped quadratic function, where word frequency influences the height of the curve but not its steepness. This dissociation of visuomotor and lexical influences on refixations has subsequently been confirmed by others (Vitu, O'Regan and Mittau, 1990; O'Regan et al., 1994; Rayner, Sereno and Raney, 1996). Radach and McConkie (Chapter 3) show that the positioning of refixations within words is guided by the same mechanisms as for initial fixations. Cognitive variables like word frequency have a substantial effect on the proportion of refixations within words but very little influence on refixation positions
(Hyönä, Niemi and Underwood, 1989; Radach, Inhoff and Heller, 1997). Differing results in Pynte, Kennedy and Murray (1991) and Pynte (1996) may be due to the fact that in these studies initial fixation position was imposed experimentally.
In view of our earlier discussion of binocular activity, it should be noted that vergence velocity may be co-determined by cognitive processes. Hendriks' (1996) recent results point to the possibility that vergence velocity during reading increases as the difficulty of a cognitive task increases.
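The definition of the refixation rate and the U-shaped relation between initial fixation position and refixation likelihood can be sketched in a few lines of Python. The first function implements the ratio given in the text (multiple target fixations over multiple plus single target fixations during first pass reading). The second is only a hypothetical quadratic with a 'height' and a 'steepness' parameter; the parameter values, letter-position coding and function names are assumptions chosen for illustration and are not the fitted values of McConkie et al. (1989).

```python
from typing import List


def refixation_rate(first_pass_counts: List[int]) -> float:
    """Proportion of fixated targets that received more than one fixation.

    first_pass_counts holds, per trial, the number of first pass fixations
    on the target. Skipped targets (count 0) do not enter the ratio.
    """
    fixated = [n for n in first_pass_counts if n >= 1]
    if not fixated:
        return 0.0
    multiple = sum(1 for n in fixated if n > 1)
    return multiple / len(fixated)


def refixation_probability(position: float, word_length: int,
                           height: float, steepness: float) -> float:
    """Hypothetical U-shaped quadratic: lowest at the word centre, rising
    with the squared distance of the initial fixation from the centre."""
    centre = (word_length + 1) / 2   # letter positions run from 1 to word_length
    p = height + steepness * (position - centre) ** 2
    return min(1.0, max(0.0, p))     # keep the value a valid probability


# Invented counts for eight trials: one skip, four single fixations,
# three multiple fixations, giving a rate of 3/7.
counts = [1, 2, 1, 0, 3, 1, 1, 2]
print(round(refixation_rate(counts), 3))

# Word frequency is assumed to shift the height of the curve only.
for position in range(1, 8):
    low = refixation_probability(position, 7, height=0.20, steepness=0.02)
    high = refixation_probability(position, 7, height=0.10, steepness=0.02)
    print(position, round(low, 2), round(high, 2))
```

In this sketch the two curves differ by the same amount at every letter position, which is one way of expressing the claim that word frequency influences the height of the function but not its steepness.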
The eye-mind and immediacy assumptions reconsidered
Following McConkie and Rayner (1975) and Rayner (1975), a large number of studies have shown that useful linguistic information is obtained from the fixated word and from the next (parafoveal) word in the text (see Rayner and Pollatsek, 1989, for a review). Hence, the linguistic analysis of a target does not commence with its fixation but before that. Processing of a target is thus spatially distributed, often making it difficult to ascertain the extent to which a word is analysed while it is fixated and before it is fixated. Though there have been some attempts to provide measurement solutions to this problem (McConkie et al., 1979), effects of prefixation analyses on target viewing durations are generally neglected, the justification being that the linguistic analysis of a fixated word is independent of the linguistic analysis of the next word in the text and that the duration of a target word's (parafoveal) analyses, i.e., its analysis prior to fixation, constitutes a constant (Morrison, 1984; Inhoff et al., 1989).
These assumptions may not be warranted, however. Recent results from the eye-movement laboratory at Binghamton (Inhoff, Briihl, and Starr, in preparation) indicate that viewing durations of a fixated target word can be affected by its semantic relationship to the next word in the text. In the critical experiment, readers fixated a word that was followed either by an unassociated word (e.g., fixation of 'mother's' followed by 'garden') or by an associated word ('mother's' followed by the fixation of 'father'). Under these conditions, fixation durations on the identical target words ('mother's') were shorter in the associated condition, presumably because of parafovea-to-fovea cross-talk. Kennedy (Chapter 7) also obtained parafoveal-to-foveal cross-talk in a reading-like word recognition paradigm when fixated and parafoveal words shared word-initial orthographic letter sequences.
Other results indicate that the duration of a target word's parafoveal analyses is variable, and that the extent to which a parafoveally visible target is analysed is determined by the ease of linguistic analysis during the pretarget fixations. For instance, readers obtained less useful linguistic information from a parafoveally visible target when a low-frequency word was fixated than when a high-frequency word was fixated (Henderson and Ferreira, 1990; Kennison and Clifton, 1995). Eye movements
during target viewing can thus be affected by the target's prior parafoveal analysis and by parafovea-to-fovea crosstalk when the target is subsequently fixated. Conversely, readers may seek linguistic information from a previously fixated word. In one recent study (Briihl and Inhoff, 1995), eye-movement-contingent display changes were used to control the parafoveal visibility of a seven-character target word so that it was either fully visible, partially visible, or completely masked. In all instances, the target was completely visible when it was either directly fixated or when the eyes moved to the right of it. Under these conditions, readers skipped approximately 10% of the targets, irrespective of whether a target had been fully visible, partially visible, or fully masked prior to its skipping. Furthermore, inspection of oculomotor activity during the reading of the posttarget sentence fragment indicated that readers rarely — if ever — moved the eyes back to target words, even when they had been partially or fully masked prior to their skipping. Nevertheless, they were able to report sentence content when asked to do so. These observations suggested to us that readers obtained visual information from the word to the left of fixation and that word skipping was not conditional on the prior identification of the next word in the text. This view was corroborated by results of a recently completed experiment (Inhoff, Radach and Starr, in preparation; see also Binder, Pollatsek and Rayner, in preparation). In our experiment, target sentences were used that contained a critical 3-, 4-, or 5-character word pair. The two words of a pair were identical except for their first letter, e.g., 'hat' and 'mat'. During reading, one member of the pair, e.g., 'hat', was visible to the right of fixation until it was either fixated or skipped. 
As soon as the eyes moved to the right of the critical word, an eye-movement-contingent display change was implemented that replaced the previously visible member of the pair, 'hat', with its alternate, 'mat'; i.e., the identity of the critical word was changed after it had been fixated. After sentence reading was completed, readers were asked to identify the critical word(s) by choosing among three visually similar candidates, e.g., 'hat', 'mat', 'cat'. If readers obtained useful information solely to the right of fixation, then they should always select the word that had been visible to the right of fixation, hat in the example. However, the corresponding decision was made on only 60% of the trials. Furthermore, if readers obtained useful information only when the target was either fixated or to the right of fixation, then selection of the two remaining alternatives, mat or cat, in the example, should be equally likely. However, this was not the case. They selected the word that was visible after the critical word had been fixated — or skipped — on approximately 30% of the trials; the word that had never been shown during sentence reading was selected on approximately 10% of the trials. Converging evidence comes from an extensive corpus analysis of saccade landing positions. Radach and Kempe (1993) have shown that the initial saccade landing position on a word is substantially shifted to the left if a preceding short word was
skipped. To account for this effect, Radach (1996) plotted distributions of fixation positions on the last three and the first three letters of two consecutive words. In the case of two medium-sized (5-7 letter) words the distribution was U-shaped with a minimum fixation frequency near the empty space between the two words. When the transition from a 3-letter word to a 5-7 letter word was plotted, the distribution was inverted U-shaped and did not show a local minimum. A possible explanation for this pattern of data is that specific parafoveal word combinations (e.g. short and long word, a parafoveal pattern likely to represent a noun phrase) can form a unified saccade target. Given a distribution of saccade landing positions over such a two-word target, readers may regularly obtain useful information to the left of the 'fixated' word within the target region. Other oculomotor effects have been reported that complicate the computation of target-specific oculomotor measures, be they first fixation durations, gaze durations, single fixation durations, regressive gaze durations, or some other measure. Fixation durations that precede the skipping of a word have been claimed to be longer than fixation durations that occur when the next word in the text is fixated (Pollatsek, Rayner and Balota, 1986; Hogaboam, 1983; see, however, McConkie, Kerr and Dyre, 1994, who did not obtain such an effect). If fixation durations that precede skipping were increased, then gaze durations and fixation durations for words that are followed by word skippings would overestimate processing demands. Conversely, spillover effects have been noted in the literature, where effects of the linguistic analysis of one word are found in the fixation duration of the following word (Balota et al., 1985). When this occurs, gaze durations and fixation durations will underestimate the demands of cognitive/linguistic analyses. 
A recent study (Perea and Pollatsek, 1998) reported a particularly striking example of spillover effects. Specifically, post-target fixation durations (but not target fixation durations) were longer when the previously fixated target word had an orthographic neighbour with a higher frequency of occurrence than when all of its orthographic neighbours had lower word frequency counts.
Use of eye-movement-contingent display changes
While eye movement measures provide distinct advantages over more traditional measures, they are also subject to one particular methodological disadvantage: the reader rather than the experimenter has control over the viewing of text. This disadvantage can be offset, however, by the use of eye-movement-contingent display changes, pioneered by McConkie and Rayner (1975) and Rayner (1975). In this procedure, the location of a fixation is determined and used to show an experimenter-defined segment of text. For instance, the method has been used to control the visibility of parafoveally available text by showing the reader the
currently fixated word and by masking all other words in the text. During reading, the masking of each word is removed as soon as it is fixated. These eye-movement-contingent display change techniques can also be used to control the visibility of fixated text (Rayner et al., 1981), the temporal availability of parafoveally visible or fixated text (Rayner et al., 1981; Rayner and Sereno, 1992), or fixation locations (for instance, by centering a fixated word around the current fixation location).
In spite of its methodological promise, the technique has been used in relatively few laboratories. In a well-known article, O'Regan (1990) argued that the change of text displays during fixations will disrupt oculomotor activity because eye-movement-contingent display changes generate noticeable flicker and because phosphor persistence of replaced characters decreases the contrast of subsequently shown characters. In response to these concerns, Inhoff et al. (1998) compared effects of eye-movement-contingent display changes as a function of display-change flicker frequencies. Display changes were implemented on a phosphor-free electroluminescent panel and on a conventional (phosphor-based) CRT. The results indicated that gaze durations, first fixation durations, and saccade size remained unaffected by flicker and phosphor persistence, affirming the methodological promise of the technique.
Conclusion
In spite of the increasing popularity and methodological promise of eye-movement measures, there are currently no measurement standards that define basic oculomotor events (fixations and saccades). Furthermore, processing assumptions that underlie the translation of oculomotor events into measures of cognitive processes are increasingly challenged. Nevertheless, there has been a distinct paucity of methodologically oriented studies, and it remains unclear whether some definitions of oculomotor events and some oculomotor measures are more effective than others.
The lack of research on methodological issues is of particular concern in view of the marketing of ready-to-use oculomotor data-processing programs. The danger exists that the 'measurement solution' offered by one of these programs becomes a de facto standard merely by virtue of its ease of use, its affordability, its marketing, or its use by established laboratories.
Acknowledgements
Preparation of this chapter was supported by Grant No 503870 from the NIMH and Grant No BMH1-CT94-1441 from the European Union under the BIOMED Programme. The authors are indebted to Dieter Heller for many helpful discussions and
to Alan Kennedy, Peter De Graef and Simon Liversedge for valuable comments on earlier drafts of the manuscript.
References
Balota, D.A., Pollatsek, A. and Rayner, K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17, 364-390.
Binder, K.F., Pollatsek, A. and Rayner, K. (in preparation). Extraction of information to the left of the fixated word in reading.
Blanchard, H.E. (1985). A comparison of some processing measures based on eye movements. Acta Psychologica, 58, 1-15.
Briihl, D. and Inhoff, A.W. (1995). Integrating information across fixations during reading: The use of orthographic bodies and of exterior letters. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 55-67.
Carpenter, P.A. and Daneman, M. (1981). Lexical retrieval and error recovery in reading: A model based on eye fixations. Journal of Verbal Learning and Verbal Behavior, 20, 137-160.
Carpenter, P.A. and Just, M.A. (1983). What your eyes do while your mind is reading. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes. New York: Academic Press, pp. 275-307.
Collewijn, H., Erkelens, C.J. and Steinman, R.M. (1988). Binocular co-ordination of human horizontal saccadic eye movements. Journal of Physiology, 404, 157-182.
Cunitz, R.J. and Steinman, R.M. (1969). Comparison of saccadic eye movements during fixation and reading. Vision Research, 9, 683-693.
Daneman, M. and Reingold, E. (1993). What eye fixations tell us about phonological recoding during reading. Canadian Journal of Experimental Psychology, 47, 153-178.
Daneman, M., Reingold, E.M., and Davidson, M. (1995). Time course of phonological activation during reading: Evidence from eye fixations. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 884-898.
Daneman, M. and Stainton, M. (1991). Phonological recoding in silent reading. Journal of Experimental Psychology: Learning, Memory and Cognition, 17, 618-632.
Deubel, H. and Bridgeman, B. (1995a). Fourth Purkinje image signals reveal eye-lens deviations and retinal image distortions during saccades. Vision Research, 35, 529-538.
Deubel, H. and Bridgeman, B. (1995b). Perceptual consequences of ocular lens overshoot during saccadic eye movements. Vision Research, 35, 2897-2902.
Ehrlich, K. and Rayner, K. (1981). Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20, 641-655.
Folk, J.R. and Morris, R. (1995). Multiple lexical codes in reading: Evidence from eye movements, naming time, and oral reading. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 1412-1429.
Garrod, S.O., O'Brien, E.J., Morris, R. and Rayner, K. (1990). Elaborative inferencing as an active or passive process. Journal of Experimental Psychology: Learning, Memory and Cognition, 16, 250-257.
Harris, C.M., Abramov, I. and Hainline, L. (1984). Instrument considerations in measuring fast eye movements. Behavior Research Methods, Instruments and Computers, 16, 341-350.
Heller, D. (1983). Problems of on-line processing of EOG-data in reading. In: R. Groner, C. Menz, D.F. Fisher and R.A. Monty (Eds.), Eye Movements and Psychological Functions: International Views. Hillsdale, NJ: Erlbaum, pp. 43-52.
Heller, D. and Radach, R. (1995). Binocular coordination in complex visual tasks. Perception, 24, Suppl., 72.
Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 417-429.
Hendriks, A. (1996). Vergence eye movements during fixations in reading. Acta Psychologica, 131-151.
Hogaboam, T.W. (1983). Reading patterns in eye movement data. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes. New York: Academic Press.
Hyönä, J., Niemi, P. and Underwood, G. (1989). Reading long words embedded in sentences: Informativeness of word halves affects eye movements. Journal of Experimental Psychology: Human Perception and Performance, 15, 142-152.
Hyönä, J. (1995). Do irregular letter combinations attract reader's attention? Evidence from fixation locations in words. Journal of Experimental Psychology: Human Perception and Performance, 21, 68-81.
Inhoff, A.W. (1983). Attentional strategies during the reading of short passages. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes. New York: Academic Press, pp. 181-192.
Inhoff, A.W. (1984). Two stages of word processing during eye fixations in the reading of prose. Journal of Verbal Learning and Verbal Behavior, 23, 612-624.
Inhoff, A.W., Lima, S.D. and Carroll, P.J. (1984). Effects of context on metaphor comprehension. Memory and Cognition, 12, 558-567.
Inhoff, A.W., Briihl, D. and Schwartz, J. (1996). Compound word effects differ in reading, on-line naming, and delayed naming tasks. Memory and Cognition, 24, 466-476.
Inhoff, A.W., Briihl, D., and Starr, M. (in preparation). Parafovea-to-fovea priming during eye fixations in reading.
Inhoff, A.W., Starr, M., Liu, W., and Wang, J. (in preparation). Eye-movement-contingent display changes are not compromised by flicker and phosphor persistence. Psychonomic Bulletin and Review.
Inhoff, A.W., Pollatsek, A., Posner, M.I. and Rayner, K. (1989). Covert attention and eye movements during reading. Quarterly Journal of Experimental Psychology, 41a, 63-89.
Inhoff, A.W. and Rayner, K. (1986). Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception and Psychophysics, 40, 431-439.
Inhoff, A.W. and Topolski, R. (1994). Use of phonological codes during eye fixations in reading and in on-line and delayed naming tasks. Journal of Memory and Language, 33, 689-713.
Inhoff, A.W., Radach, R., and Starr, M. (in preparation). Use of information to the left of fixation during reading.
Just, M.A. and Carpenter, P.A. (1978). Inference processing during reading: Reflections from eye fixations. In: J.W. Senders, D.F. Fisher and R.A. Monty (Eds.), Eye Movements and the Higher Psychological Functions. Hillsdale, NJ: Erlbaum.
Just, M.A. and Carpenter, P.A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.
Kennison, S.M. and Clifton, C. (1995). Determinants of parafoveal preview benefit in high and low working memory capacity readers: Implications for eye movement control. Journal of Experimental Psychology: Learning, Memory and Cognition, 21(1), 68-81.
Konieczny, L. (1996). Human sentence processing: a semantics-oriented parsing approach. IIG-Reports 3/96, Universität Freiburg, Institut für Informatik und Gesellschaft.
Kowler, E. and Steinman, R.M. (1980). Small saccades serve no useful purpose. Vision Research, 20, 273-276.
Lima, S.D. and Inhoff, A.W. (1985). Lexical access during eye fixations in reading: Effects of word initial letter sequence. Journal of Experimental Psychology: Human Perception and Performance, 13, 272-285.
McConkie, G.W. (1981). Evaluating and reporting data quality in eye movement research. Behavior Research Methods and Instrumentation, 13, 97-106.
McConkie, G.W. (1983). Eye Movements and Perception during Reading. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes. New York: Academic Press.
McConkie, G.W., Hogaboam, T.W., Wolverton, G.S., Zola, D. and Lucas, P.A. (1979). Toward the use of eye movements in the study of language processing. Discourse Processes, 2, 157-177.
McConkie, G.W., Kerr, P.W. and Dyre, B.P. (1994). What are 'normal' eye movements during reading: Toward a mathematical description. In: J. Ygge and G. Lennerstrand (Eds.), Eye Movements in Reading. Oxford: Elsevier, pp. 315-327.
McConkie, G.W. and Rayner, K. (1975). The span of the effective stimulus during reading. Perception and Psychophysics, 17, 578-586.
McConkie, G.W., Reddix, M.R., and Zola, D. (1992). Perception and cognition in reading: Where is the meeting point. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer, pp. 293-303.
McConkie, G.W., Wolverton, G.S. and Zola, D. (1984). Instrumentation considerations in research involving eye-movement contingent stimulus control. In: A.G. Gale and F. Johnson (Eds.), Theoretical and Applied Aspects of Eye Movement Research. Amsterdam: Elsevier/North Holland.
McConkie, G.W., Zola, D. and Wolverton, G.S. (1985). Estimating frequency and size of effects due to experimental manipulations in eye movement research. In: R. Groner, G.W. McConkie and C. Menz (Eds.), Eye Movements and Human Information Processing. Amsterdam: Elsevier, pp. 137-147.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682.
Mueller, P.U., Cavegn, D., d'Ydewalle, G. and Groner, R. (1993). A comparison of a new limbus tracker, corneal reflection technique, Purkinje eye tracking and electro-oculography. In: G. d'Ydewalle and J. Van Rensbergen (Eds.), Perception and Cognition: Advances in Eye Movement Research. Amsterdam: Elsevier/North Holland.
Nodine, C.F., Kundel, H.L., Toto, L.C. and Krupinski, E.A. (1992). Recording and analyzing eye-position data using a microcomputer workstation. Behavior Research Methods, Instruments and Computers, 24, 475-485.
O'Regan, J.K. (1990). Eye movements and reading. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes: Reviews of Oculomotor Research. Amsterdam: Elsevier.
O'Regan, J.K. (1992). Optimal viewing position in words and the strategy-tactics theory of eye movements in reading. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer.
O'Regan, K., Pynte, J. and Coeffe, C. (1986). Comment le regard explore un mot isolé (How does the eye inspect isolated words?). Bulletin de Psychologie, 39, 7-10.
O'Regan, J.K. and Jacobs, A.M. (1992). Optimal viewing position effects in word recognition: A challenge to current theory. Journal of Experimental Psychology: Human Perception and Performance, 18, 185-197.
O'Regan, J.K., Vitu, F., Radach, R., and Kerr, P.W. (1994). Effects of local processing and oculomotor factors in eye movement guidance in reading. In: J. Ygge and G. Lennerstrand (Eds.), Eye Movements and Reading. Oxford: Pergamon, pp. 329-348.
Perea, M. and Pollatsek, A. (in preparation). The effects of neighborhood frequency in reading and lexical decision. Journal of Experimental Psychology: Human Perception and Performance.
Pollatsek, A., Rayner, K. and Balota, D.A. (1986). Inferences about eye movement control from the perceptual span in reading. Perception and Psychophysics, 40, 123-130.
Pollatsek, A., Lesch, M., Morris, R. and Rayner, K. (1992). Phonological codes are used in integrating information across saccades in word identification and reading. Journal of Experimental Psychology: Human Perception and Performance, 18, 148-162.
Pynte, J. (1996). Lexical control of within-word eye movements. Journal of Experimental Psychology: Human Perception and Performance, 22, 958-969.
Pynte, J., Kennedy, A. and Murray, W.S. (1991). Within-word inspection strategies in continuous reading: Time course of perceptual, lexical and contextual processes. Journal of Experimental Psychology: Human Perception and Performance, 17, 458-470.
Radach, R. (1996). Blickbewegungen beim Lesen. Psychologische Aspekte der Determination von Fixationspositionen. (Eye movements in reading. Psychological aspects of fixation position control). Münster/New York: Waxmann.
Radach, R., Heller, D. and Jaschinski, W. (1996). Binocular coordination, fixation disparity and ocular dominance. Perception, 25, Suppl., 67.
Radach, R., Heller, D. and Inhoff, A. (1997). Blickbewegungen und kognitive Prozesse: Stand und Perspektiven (Eye movements and cognitive processes: current issues and developments). In: H. Mandl (Ed.), Bericht ueber den 40. Kongress der Deutschen Gesellschaft fuer Psychologie (Proceedings of the 40th Congress of the German Psychological Society). Goettingen: Hogrefe.
Radach, R. and Kempe, B. (1993). An individual analysis of initial fixation positions in reading. In: G. d'Ydewalle and J. van Rensbergen (Eds.), Perception and Cognition: Advances in Eye Movement Research. Amsterdam: Elsevier/North-Holland, pp. 213-225.
Radach, R., Krummenacher, J., Heller, D. and Hofmeister, J. (1995). Individual eye movement patterns in word recognition: perceptual and linguistic factors. In: J. Findlay et al. (Eds.), Eye Movement Research. Amsterdam: Elsevier/North-Holland.
Rayner, K. (1975). The perceptual span and peripheral cues in reading. Cognitive Psychology, 7, 65-81.
Rayner, K. and Duffy, S.A. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity and lexical ambiguity. Memory and Cognition, 14, 191-201.
Rayner, K. and Frazier, L. (1987). Parsing temporarily ambiguous complements. Quarterly Journal of Experimental Psychology, 39a, 657-673.
Rayner, K., Inhoff, A.W., Morrison, R., Slowiaczek, M.L. and Bertera, J.B. (1981). Masking of foveal and parafoveal vision during eye movements in reading. Journal of Experimental Psychology: Human Perception and Performance, 7, 167-179.
Rayner, K. and Morris, R. (1992). Eye movement control in reading: Evidence against semantic preprocessing. Journal of Experimental Psychology: Human Perception and Performance, 18, 163-172.
Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. Englewood Cliffs, NJ: Prentice Hall.
Rayner, K., Sereno, S.C. and Raney, G.E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22(5), 1188-1200.
Rayner, K. and Well, A. (1996). Effects of contextual constraints on eye movements in reading. Psychonomic Bulletin and Review, 3, 504-509.
Sereno, S.C., and Rayner, K. (1992). Fast priming during eye fixations in reading. Journal of Experimental Psychology: Human Perception and Performance, 18, 173-184.
Underwood, G., Bloomfield, R. and Clews, S. (1988). Information influences the pattern of eye fixations during sentence comprehension. Perception, 17, 267-278.
Underwood, G., Clews, S. and Everatt, J. (1990). How do readers know where to look next? Local information distributions influence eye fixations. Quarterly Journal of Experimental Psychology, 42a, 39-65.
Vitu, F., O'Regan, J.K. and Mittau, M. (1990). Optimal landing position in reading isolated words and continuous text. Perception and Psychophysics, 47, 583-600.
Vitu, F., O'Regan, J.K., Inhoff, A.W. and Topolski, R. (1995). Mindless reading: Eye movement characteristics are similar in scanning letter strings and reading text. Perception and Psychophysics, 57, 352-365.
CHAPTER 3
Eye Movements and Measures of Reading Time
Simon P. Liversedge, University of Durham
Kevin B. Paterson, University of Nottingham
and
Martin J. Pickering, University of Glasgow
Abstract
In this chapter, we consider the use of reading time measures that sum fixation durations in order to gain an understanding of the difficulty experienced when reading texts. We draw a distinction between two approaches to summing the duration of fixations. One approach is to sum the duration of fixations that are spatially contiguous in the text, meaning that the fixations neighbour each other in a specified region of space. The other approach is to sum fixations that are temporally contiguous, meaning that they occur in a sequence over a specified period of time. It is argued that both types of reading time measure are needed if the experimenter is to understand the time course of the influence of a linguistic variable on readers' processing of text. We first discuss a number of hypothetical eye movement records in order to illustrate the differential sensitivities of qualitatively different eye movement measures. We then report an eye movement experiment investigating how people process reduced relative clause sentences with and without the focus operator only in order to examine the utility of different reading time measures. The results showed that measures summing temporally contiguous fixations can make an important contribution to the experimenters' understanding of the precise pattern of eye movements which occur when a problem is encountered in the text.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
Over the last century researchers have monitored people's eye movements in order to investigate various aspects of vision. One area in which eye movement methodology has proved of particular benefit is that of reading research. Psychologists working in this area often conduct experiments in which they record the eye movements subjects make as they read text (usually from a computer screen). They then compute from the eye movement record how long people took to read different portions of the text and use this information to draw inferences about the underlying psychological processes. Importantly, there is an underlying assumption that, on the whole, there is a close correspondence between the pattern of eye movements made by a reader and the mental processes needed to understand the text that they are currently inspecting. Thus, the direction of gaze indicates (to a large extent at least) what part of the text is being currently processed, and the time taken to process the text is indicative of the ease with which processing occurred. These assumptions are warranted by the considerable evidence that the linguistic properties of text have a direct influence on the time it takes to read that text (see Rayner and Pollatsek, 1989, for a review).

However, there is not a perfect yoking of eye and mind. Research suggests that readers are able to at least partially process text that they have not yet encountered (see Balota and Rayner, 1991, for a review; also, other chapters in this volume). Furthermore, if a portion of text causes the reader difficulty, then this difficulty can "spill over" and affect processing of subsequent text (e.g. Ehrlich and Rayner, 1983).

One of the major advantages of eye movement methodology over alternative reading time measures (e.g. self-paced reading) is that the experimenter can separate those fixations that were made when a region was first read from fixations that were made later in the eye movement record. This is important as it allows experimenters to determine when a characteristic of the text first influenced processing and therefore permits them to make inferences about the time course of processing during text comprehension.

When text is read, the eye makes a sequence of fixations that are separated by saccadic movements. Studies have shown that information is extracted from the text during the fixation, but not during the saccades (again see Rayner and Pollatsek, 1989, for a review). Consequently, most (though not all) experimenters analysing eye movements treat the time spent fixating the text, and not the time spent making saccades, as a measure of reading time. Very often researchers sum fixations that readers make on portions of text according to several different algorithms. Thus, it is quite usual for researchers to consider a number of different reading time measures rather than considering only individual fixations.

In this chapter we will focus on the different types of measures that have been widely used in psycholinguistic experiments. Our principal claim will be that reading time measures may retain the property of either spatial or temporal contiguity (or both),
Fig. 1. Hypothetical eye movement record one.
and when analysing a reader's eye movement record it is important to consider measures which sum temporally contiguous fixations and also measures which sum spatially contiguous fixations in order to avoid failing to detect effects (Liversedge and Pickering, 1995; Liversedge, Pickering and Traxler, 1996). We will explain what we mean by spatial and temporal contiguity below and use hypothetical data to illustrate our point. We will then present data from an experiment conducted by Paterson, Liversedge and Underwood (1998) to demonstrate that simply using measures which retain the property of spatial contiguity results in a failure to detect certain effects.

Figure 1 shows a hypothetical eye movement record containing a sequence of nine fixations. Each of these fixations had a duration of between 196 and 320 milliseconds, and the sum of the durations of these fixations represents the overall amount of time spent reading the sentence. The location of each fixation is represented by a small circle and the lines between circles indicate the trajectories of saccades made between each successive fixation. The sentence is the same as one used by Paterson et al., and has been divided into six regions, each spanning one or two words of the sentence. As would be expected, the reader starts by fixating the first noun phrase of the sentence. Paterson et al. were principally interested in the time subjects spent reading the verb invited in Region Four as this is the verb that syntactically disambiguates the sentence and is likely to be the first point at which disruption might be detected. Therefore we will often refer to Region Four as the critical region.

In most eye movement experiments investigating reading, researchers construct experimental sentences which are designed to cause a reader difficulty as they are read, and counterpart control sentences which are usually very similar but are constructed such that they do not cause the reader problems.
The experimenter then compares reading times for regions of the experimental sentences with reading times for equivalent regions of control sentences. In this way, the experimenter is able to observe the degree of disruption to normal reading that the experimental manipulation induced. When measuring processing difficulty, researchers generally report three reading time measures: first fixation duration in a critical region, along with first pass reading time and total reading time for the critical region and other regions of interest (see Rayner et al., 1989, for a general discussion).

The first fixation duration in a particular region of text, as we might expect, is simply the time the reader spent initially fixating the region. This measure is generally taken to be the very earliest point at which we might expect to see an effect due to the experimental manipulation, as this is the first time the reader has directly fixated the region in which disruption to processing is anticipated. In Fig. 1, the first fixation duration in Region Four, the critical region, is 270 ms. It is a common finding that the duration of the first fixation is sensitive to processing difficulty experienced immediately on reading that word. For example, studies have shown that subjects have a longer first fixation for words that have a low frequency of occurrence in the language than words with a high frequency of occurrence, when word length is controlled (e.g. Inhoff and Rayner, 1986; Rayner and Duffy, 1986; Raney and Rayner, 1995). Similarly, readers have a longer first fixation for a word that disambiguates a sentence in favour of an initially dispreferred syntactic analysis, as compared to the same word in an unambiguous version of the sentence (e.g. Rayner, Carlson and Frazier, 1983; Murray and Liversedge, 1994).

However, the first fixation duration measure does not always provide an indication of initial processing difficulty. Quite often the region of text that is predicted to cause difficulty contains either a long word, or several words, and is likely to be fixated more than once during the first sweep of the eyes through the sentence.
In such a situation all of these fixations can contribute to initial processing, in which case a more sensitive measure may be first pass reading time. First pass reading time (or gaze duration, if the region for which the measure is computed contains a single word) is defined as the sum of all the fixations made in a region until the point of fixation leaves the region either to the left or to the right. From Fig. 1 it can be seen that the gaze duration for Region Four is 513 ms (obtained by summing 270 ms and 243 ms). If effects are found on measures of either first fixation duration or gaze duration in one condition relative to another, the experimenter will usually conclude that difficulty was experienced immediately on processing that region of text. Total reading time sums all fixations made within a region of text, including those fixations made when re-reading the region. The total reading time for Region Four in Fig. 1 is 748 ms, obtained by summing the first pass fixations in the region (i.e. 270 + 243 ms), and the duration of the fixation made on the region following a regressive saccade from Region Five (i.e. 235 ms). If an effect is observed for total reading time on a region, but not for earlier measures such as first fixation duration or first pass reading time, then this is generally taken as an indication of the manipulation having a relatively late effect on processing.
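The three measures defined above can be illustrated with a short sketch. It assumes a hypothetical data format in which each fixation is a (region number, duration in ms) pair, and it follows the definitions in the text literally, so it does not handle cases such as a region that is skipped on the first sweep. Only the three Region Four durations (270, 243 and 235 ms) are taken from the discussion of Fig. 1; the other values and all function names are invented for illustration.

```python
from typing import List, Tuple

# A fixation is (region number, duration in ms); regions are numbered left to right.
Fixation = Tuple[int, int]


def first_pass_fixations(fixations: List[Fixation], region: int) -> List[int]:
    """Durations from the first fixation in `region` until the eyes leave it."""
    durations: List[int] = []
    for reg, dur in fixations:
        if reg == region:
            durations.append(dur)
        elif durations:
            break  # the eyes left the region, to the left or to the right
    return durations


def first_fixation_duration(fixations: List[Fixation], region: int) -> int:
    """Duration of the first fixation in `region` (0 if never fixated)."""
    first_pass = first_pass_fixations(fixations, region)
    return first_pass[0] if first_pass else 0


def first_pass_time(fixations: List[Fixation], region: int) -> int:
    """Sum of the first pass fixations (gaze duration for a one-word region)."""
    return sum(first_pass_fixations(fixations, region))


def total_time(fixations: List[Fixation], region: int) -> int:
    """Sum of every fixation in `region`, including re-reading."""
    return sum(dur for reg, dur in fixations if reg == region)


# Hypothetical nine-fixation record; only the Region Four values come from the text.
record = [(1, 220), (2, 250), (3, 196), (4, 270), (4, 243),
          (5, 320), (4, 235), (5, 210), (6, 300)]

print(first_fixation_duration(record, 4))  # 270
print(first_pass_time(record, 4))          # 513
print(total_time(record, 4))               # 748
```

All three functions group fixations by region, which is the sense in which these measures are spatially contiguous.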
In addition to these reading time measures, researchers often provide a measure of the number of regressive saccades a reader makes from the critical region. This is usually done by computing the probability of a reader making a first pass regression (i.e. a regressive saccade that terminates the summation of fixations for first pass or gaze duration reading time). Such a measure is also usually taken to indicate that the reader is experiencing difficulty when processing the critical word. However, the probability of making a first pass regression does not provide an index of the time a reader spends on earlier portions of a sentence after making a regression. For instance, it is possible that a reader may regress to an earlier region of text an equal number of times under different conditions of an experiment but spend more time re-reading text under one condition than another. To a large extent, the reading time measures (as defined above) have been widely adopted as "industry standards" within the psycholinguistic community. However, twenty years ago there was far less standardisation of measures than there exists today. In fact, early studies frequently failed to provide unambiguous definitions of the measures which were used. In recent years, there has also been some debate concerning the relative merits of different measures of disruption to processing during reading (e.g. Kennedy et al., 1989; see also Altmann, Garnham and Dennis, 1994; Altmann, 1994; but see Rayner and Sereno, 1994a,b). However, it is important to note that all these studies conducted comparative analyses of different measures in an attempt to resolve questions regarding the interpretation of data with respect to an underlying psycholinguistic theory. By contrast, in this article we make a comparison between reading time measures in order to address a theoretical question concerning the properties of the measures themselves. 
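As a rough sketch of the regression measure described above, the probability of a first pass regression can be estimated as the proportion of trials on which the first pass through the critical region ended with a leftward saccade. The (region number, duration in ms) format, the trial data and the decision to exclude trials on which the region was never fixated are all assumptions made for this illustration.

```python
from typing import List, Optional, Tuple

Fixation = Tuple[int, int]  # (region number, duration in ms)


def first_pass_regression(fixations: List[Fixation], region: int) -> Optional[bool]:
    """True if the first pass on `region` ends with a saccade to an earlier region.

    Returns None when the region was never fixated on this trial.
    """
    entered = False
    for reg, _dur in fixations:
        if reg == region:
            entered = True
        elif entered:
            return reg < region  # first fixation after leaving the region
    return False if entered else None


def regression_probability(trials: List[List[Fixation]], region: int) -> float:
    """Proportion of trials (with the region fixated) showing a first pass regression."""
    outcomes = [first_pass_regression(trial, region) for trial in trials]
    valid = [outcome for outcome in outcomes if outcome is not None]
    return sum(valid) / len(valid) if valid else float("nan")


# Invented trials for one condition, with Region Four as the critical region.
trials = [
    [(3, 200), (4, 270), (5, 240)],                      # exits to the right
    [(3, 200), (4, 270), (2, 243), (5, 250)],            # first pass regression
    [(3, 210), (4, 260), (4, 230), (1, 300), (5, 220)],  # first pass regression
    [(3, 205), (5, 240)],                                # region skipped, excluded
    [(3, 205), (4, 250), (5, 240), (4, 235)],            # regression only after first pass
]

print(regression_probability(trials, 4))  # 0.5
```

Note that this value says nothing about how long the reader then spent re-reading earlier text, which is the limitation raised in the paragraph above.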
To our knowledge, there are few, if any, published articles to date that explicitly set out to do this.

Reporting a combination of measures, such as first fixation duration, first pass and total reading times, minimises the possibility that an experimenter may fail to detect an effect. However, it does not rule out this possibility (see Konieczny, Hemforth and Scheepers, 1997). The effectiveness of this approach critically depends upon what eye movement behaviour occurs when a reader detects a problem. Consider the options that are available to a reader (in terms of eye movements) when they encounter a word in a sentence which causes them difficulty. First, the reader could remain fixating the problematic word until their difficulty is resolved. Secondly, the reader could make a regressive saccade to permit them to re-read earlier parts of the text in an attempt to work out where they went wrong. Finally, the reader might make a rightwards saccade in order to read the next region of the sentence in the hope that this may help them resolve their difficulty.

Most discussion about the pattern of eye movements made on encountering processing difficulty during reading has been concerned with what happens when reading sentences that are difficult to parse. However, many linguistic phenomena other than garden path sentences induce processing difficulty, and since we wish to
Fig. 2. Hypothetical eye movement record two.
keep the discussion of reading time measures in this chapter at a more general level we will refrain from associating particular mental processes with specific patterns of eye movement behaviour. Instead, we will consider the differential sensitivities of each of the measures in relation to different patterns of eye movement behaviour that may occur as a result of experiencing processing difficulty. The simplest situation is where the reader makes a saccade to the right after detection of disruption. In such a situation, assuming that fixations in the critical region are not inflated due to detection of disruption, no measure of initial processing will detect an effect for that region as in such a situation there is no observable cost to processing. In such a situation any effects would presumably be detected on the subsequent region. Figure 2 shows a situation in which the reader has remained fixating the region that caused them difficulty. In this case the reader makes three successive fixations on the critical region. In such a situation it seems likely that all three measures described above would detect the disruption the reader experienced. Summing the fixations on the region would inflate the reading times for that region relative to reading times for the same region of the control sentences where presumably the reader would spend less time. It is important to note that under these circumstances the increased reading time may be due to both detection of the problem and reanalysis processes that permit recovery from the processing difficulty. It is the third scenario, in which a reader makes a leftward saccade upon encountering the problematic word, that we find most interesting. In such a situation, it is possible that upon reading the problematic word, the reader may make a long fixation, or even a series of fixations prior to making a regression. If this occurs, then as with the situation depicted in Fig. 
2, the reading time measures described above will detect the effect. However, in the situation where the reader detects a problem and makes an immediate regression, such measures will fail to detect disruption because there may be no difference in the first pass reading time for the critical region of experimental and control sentences. Indeed, there may even
Fig. 3. Hypothetical eye movement record three.
be a shorter first pass reading time for the critical region in the condition that introduced processing difficulty than in the control condition. Importantly, in such a situation, the reading time measures described above would not detect disruption to processing and researchers may fail to detect an effect. After a reader has made a regression, there are a large number of possible patterns of eye movements they may make. Below we will consider several of these possibilities and point out important differences between them. This discussion should illustrate that if a researcher considers only first fixation, first pass and total reading time measures, then they may be unable to make claims about when an experimental manipulation influenced processing after detection of a misanalysis. That is to say, these measures alone provide only limited information regarding the time course of processing. Consider Fig. 3 which depicts one of the most simple patterns of eye movements a subject could make after a regression from a problematic region. Having made a leftwards saccade from the critical region, the reader makes a single fixation on text in Region Two, prior to making a rightwards saccade in order to fixate as yet uninspected text. In such a situation first fixation duration, gaze duration and total reading time for the verb will be no different to the control sentences. The only way in which an experimenter may detect disruption to processing is by computing the total time for Region Two which will be slightly inflated due to the fixation of 243 ms. At this stage, two important points should be made. First, computing total reading times for each of the regions of a sentence should ensure that effects such as these are detected. They would only be missed if gaze duration and total reading times were computed for Region Four alone. 
Secondly, it might be argued that if a researcher computed the number of regressions a subject made from Region Four during the first pass, then they would also detect such an effect. However, our point is slightly more subtle than this. We do not dispute that the effects depicted in Fig. 3 could be detected without recourse to measures of reading time defined in novel ways. What we do claim, however, is that spatially
Fig. 4. Hypothetical eye movement record four.
contiguous measures of reading time and regressions do not necessarily provide an indication of the first point at which a variable has an influence on reading time after a subject has made a regression. To expand upon this point, consider Fig. 4.

The eye movement record in Fig. 4 shows that upon encountering the disambiguating word, the reader makes a long regressive saccade to the first noun phrase of the sentence and makes a series of fixations to re-read this portion of the sentence. After the fourth fixation during re-inspection (214 ms), the reader then makes a long saccade to Region Five and continues to read the sentence normally. This situation is interesting as, presumably, all four fixations in Region One reflect the processes involved in recovery after detection of misprocessing. As with the situation shown in Fig. 3, disruption of this type would be detected if the total reading times for each of the regions were computed. However, note that the total time measure would include fixations made during the first pass of the eyes through the sentence along with the fixations made after the regressive saccade from Region Four. That is to say, the total reading time measure is not a measure of recovery time alone, but instead is a mixture of initial processing time and recovery time. This is an important point, as will become clear if we consider the hypothetical data shown in Fig. 5.

Figure 5 shows a pattern of data where the reader makes a series of fixations, each in a different region, after reading the problematic word. In this situation, the problem of failing to detect an effect is exacerbated. The problem here is twofold in that not only are the fixations made during reanalysis grouped with fixations made during the initial analysis of the sentence, but also, because fixations made during reanalysis were not made within the same region, they will never be grouped together to provide a measure of recovery from disruption.
Thus, effects which might have occurred during reanalysis could be substantially weakened due to noise from first pass fixations included in the total time measure for a region and also due
Fig 5. Hypothetical eye movement record five.
to reinspection fixations being considered in isolation rather than being grouped together.

An important characteristic of the reading time measures discussed so far is that they sum spatially contiguous fixations. That is, the fixations which are summed for first pass reading time and total reading time all occur within the same region of a sentence. It is the property of spatial contiguity which can result in measures of this kind failing to provide an indication of the time course of a variable having an effect on processing during recovery, and possibly even failing to detect an effect, because in such a situation each subsequent fixation is not spatially contiguous with its predecessor. In order to circumvent this possibility, we require a measure which sums not spatially contiguous fixations, but temporally contiguous fixations. That is to say, we need a measure that groups fixations in terms of when they occur in relation to each other in time rather than in spatial location.

Two measures that retain the property of temporal rather than spatial contiguity are regression path reading time (Konieczny, 1996) and re-reading time¹. We define regression path reading time as the sum of all the fixations from the first fixation in a region up to but excluding the first fixation to the right of this region. This measure provides an index of the time a subject spent detecting a problem and then re-reading the text prior to fixating novel linguistic material. The regression path reading time for Region Four in Fig. 5 is 1337 ms (comprising the fixations of 270, 243, 214, 310 and 300 ms). Re-reading time is defined as the regression path reading time for a region less the first pass reading time for a region.
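The two temporally contiguous measures just defined can be sketched as follows. The (region number, duration in ms) format and the function names are assumptions for illustration. The five durations 270, 243, 214, 310 and 300 ms are those given for Fig. 5; the regions assigned to the three regressive fixations, and all other values, are invented.

```python
from typing import List, Tuple

Fixation = Tuple[int, int]  # (region number, duration in ms)


def first_pass_time(fixations: List[Fixation], region: int) -> int:
    """Sum of fixations in `region` from first entry until the eyes leave it."""
    total, entered = 0, False
    for reg, dur in fixations:
        if reg == region:
            total, entered = total + dur, True
        elif entered:
            break
    return total


def regression_path_time(fixations: List[Fixation], region: int) -> int:
    """Sum from the first fixation in `region` up to, but excluding,
    the first fixation to the right of the region."""
    total, started = 0, False
    for reg, dur in fixations:
        if not started:
            if reg == region:
                started, total = True, dur
        elif reg > region:
            break  # first fixation on text to the right ends the summation
        else:
            total += dur  # fixations in the region or anywhere to its left
    return total


def re_reading_time(fixations: List[Fixation], region: int) -> int:
    """Regression path reading time less first pass reading time."""
    return regression_path_time(fixations, region) - first_pass_time(fixations, region)


# Hypothetical record in the spirit of Fig. 5: after two fixations in Region Four
# the reader fixates three different earlier regions before moving rightwards.
record = [(1, 230), (2, 250), (3, 215), (4, 270), (4, 243),
          (3, 214), (1, 310), (2, 300), (5, 260), (6, 280)]

print(regression_path_time(record, 4))  # 1337
print(re_reading_time(record, 4))       # 824
```

Here the three regressive fixations fall in three different regions, so no spatially contiguous measure would ever sum them together, whereas re-reading time does.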
Re-reading time simply provides an index of the time a subject spent re-reading the text after encountering a problem, but before they make an eye movement to fixate words to the right of the problematic region. Note that these measures will usually group together the fixations a reader makes after they have encountered difficulty. Additionally, re-reading time does not include fixations made as the sentence is initially processed. By using measures which sum temporally contiguous fixations in parallel with measures such as first pass and total reading time which sum spatially contiguous fixations, we should ensure that we do not fail to detect subtle effects during recovery. Also, if recovery from disruption is an ongoing process, then by summing temporally contiguous fixations the experimenter might obtain a clearer picture of the nature of this process over time.

We stress that we are not advocating that measures summing temporally contiguous fixations will always be better at detecting effects which occur during recovery than measures which sum spatially contiguous fixations. Rather, we are suggesting that in some circumstances measures summing temporally contiguous fixations may be more sensitive to the time course of effects than measures summing spatially contiguous fixations. We therefore suggest that the two types of measure should be used together. This will maximise the possibility that an experimenter will detect any effects occurring as the reader reinspects text. It will also permit experimenters to examine the nature of eye movement behaviour that occurs after a reader experiences difficulty during sentence comprehension.

¹ A number of measures exist which are the same as or similar to regression path reading time. See, for example, Brysbaert and Mitchell, 1996; Clifton, Kennison and Albrecht, 1997; Liversedge, 1994; Konieczny et al., 1997.

Consider Fig. 6: in this scenario, upon encountering the disambiguating region, the reader makes a regressive saccade and a brief fixation on the preceding word before making a rightwards saccade to continue reading the sentence in its entirety. Only when they have inspected the whole of the sentence does the reader spend a substantial period of time re-reading portions of the sentence that preceded the disambiguating verb. This may be contrasted with the scenario depicted in Fig.
5, in which the reader spends a substantial amount of time re-reading portions of text preceding the disambiguating verb before inspecting the remainder of the sentence. The point is that the total reading times for Regions Two, Three and Four for the hypothetical eye movement records depicted in Figs. 5 and 6 will be the same, yet the patterns of eye movements made in the two scenarios are qualitatively different. It is our contention that such differences are likely to reflect differences in the processes occurring during recovery. Such differences would not be detected if the experimenter relied solely upon reading time measures which sum spatially contiguous fixations.

In advocating the use of additional measures, we are aware that a preponderance of statistically dependent measures can result in inflated Type I error rates, and the possibility that different measures could produce conflicting results. (We thank Alan Kennedy and Wayne Murray for bringing these points to our attention during the Chamonix workshop.) This is certainly true of the existing measures. First fixation, first pass reading time and total reading time are all statistically dependent measures, as a summation of the same fixations contributes to all three measures of reading time. Also, Altmann, Garnham and Dennis (1992) report two studies in which
Fig. 6. Hypothetical eye movement record six.
measures of first pass reading time and the probability of a reader making a first pass regression favoured competing accounts of initial sentence processing. Of the two new measures discussed in this paper, only the regression path reading time includes a sum of fixations that also contribute to other measures of reading time for a region. The re-reading time measure is statistically independent of first fixation, first pass and total reading times for the region of interest. Furthermore, because regression path and re-reading times sum temporally contiguous aspects of the eye movement record, any discrepancy between these measures and measures of spatially contiguous fixations will be informative about the pattern of eye movements that occurred.

Our central claim is that reading time measures which sum spatially contiguous fixations (first fixations, first pass reading times and total reading times) do not necessarily provide data to permit the experimenter to make strong claims about the time course of disruption to processing and possible recovery. We therefore advocate the use of two further reading time measures in psycholinguistic experiments: regression path reading time and re-reading time. We now demonstrate the utility of this approach by considering a recent experiment by Paterson, Liversedge and Underwood (1998).

Experiment
In this experiment, Paterson et al. monitored eye movements as subjects read a series of reduced and unreduced relative clause sentences. Reduced relative clause sentences like (1) are temporarily ambiguous between two syntactic analyses: a reduced relative clause reading in which the phrase allowed a party modifies the subject noun-phrase (i.e. the teenagers), and a simple active reading in which the
noun-phrase a party is a direct object argument of the verb allowed. Such sentences are disambiguated in favour of the reduced relative clause reading on encountering the verb invited. Unreduced relative clause sentences (e.g. 2) are unambiguous due to the inclusion of a relative pronoun and auxiliary verb (i.e. who were).

(1) The teenagers allowed a party invited a juggler straightaway.
(2) The teenagers who were allowed a party invited a juggler straightaway.

It is a well established finding that readers often experience processing difficulty when reading reduced relative clause sentences in isolation (e.g. Bever, 1970; Frazier and Rayner, 1982). Most accounts assume that readers initially adopt a simple active reading of the reduced relative clause sentence, and on encountering the disambiguating verb must reanalyse the sentence in terms of the dispreferred relative clause reading. Readers are said to have been 'garden-pathed' when they are forced to reanalyse the sentence in favour of an initially dispreferred syntactic analysis.

Our experiment tested a recent claim by Ni, Crain and Shankweiler (1996) that the inclusion of the focus operator only will guide the initial processing of reduced relative clause sentences such as (3), and so enable the reader to avoid the garden path that is normally experienced when reading such sentences.

(3) Only teenagers allowed a party invited a juggler straightaway.

Ni et al.'s claims are derived from the Referential theory of sentence processing (Crain and Steedman, 1985; Altmann and Steedman, 1988), according to which the nature of the referential context can guide initial processing decisions regarding structural ambiguities. Ni et al. propose that when readers process the subject noun-phrase only teenagers in sentences like (3), they construct a mental representation in which a focused set of teenagers is contrasted with some other set.
However, as no contrast set is explicitly mentioned in the preceding discourse (because the sentence is presented in isolation) the reader must infer one. Ni et al. argue that for reasons of parsimony, the reader will infer that there are two sets of teenagers, and must anticipate further modifying information, such as a relative clause, in order to specify the nature of the difference between these two sets. Furthermore, they claim that anticipating modifying information will influence the initial processing of sentences like (3), such that readers will preferentially adopt the reduced relative clause reading of the ambiguous phrase allowed a party, and will not be garden pathed when encountering the disambiguating verb. While not disputing that the referential properties of only will influence the processing of the sentence, in contrast with Ni et al., we predicted that only would not guide initial processing decisions (see Paterson, Liversedge and Underwood, 1998, for a full discussion). We anticipated that the reader would be garden-pathed immediately on encountering the disambiguating verb of a sentence like (3).
However, we hypothesised that the referential properties of only would facilitate reanalysis, so that readers would find it easier to recover from the garden path in (3) than in (1). We will report the duration of the first fixation and gaze duration as measures of initial processing that is spatially localised at the disambiguating verb.
Method
Subjects
Thirty-two students from the University of Nottingham participated in this experiment. All subjects had normal and uncorrected vision and were paid £4.
Materials and design
Subjects read a series of thirty-six sentences that were either temporarily ambiguous reduced relative clause sentences or unreduced relative clause sentences which were disambiguated by the inclusion of a relative pronoun and auxiliary verb (i.e. who were). The sentences began with one of two determiner types: either the definite article (the) or a focus operator (only). Examples of these sentences are given in sentences (4) to (7) below.
(4) The teenagers/ allowed a/ party/ invited/ a juggler/ straightaway.
(5) Only teenagers/ allowed a/ party/ invited/ a juggler/ straightaway.
(6) The teenagers who were/ allowed a/ party/ invited/ a juggler/ straightaway.
(7) Only teenagers who were/ allowed a/ party/ invited/ a juggler/ straightaway.
Prior to analysis the sentences were divided into six regions that spanned one or two words in the sentence. Region divisions are indicated by a slash in sentences (4) to (7). The critical region of interest was Region Four, which contained the disambiguating verb.
Apparatus and procedure
Eye movements were monitored using an SRI Dual-Purkinje Generation 5.5 eyetracker produced by Fourward Technologies. The tracker monitored subjects' gaze location every millisecond and the software sampled the tracker's output to establish the sequence of eye fixations and their start and finish times. Before the start of the experiment, subjects read an explanation of the eye-tracking procedure and a set of instructions.
They were instructed to read at their normal rate and to read to comprehend the sentences as well as they could. Subjects were then seated at the eye-tracker and placed on a bite-bar and under forehead restraint to minimise head movements. Subjects then completed a calibration procedure.
S.P. Liversedge, K.B. Paterson & M.J. Pickering
Before each trial, a fixation cross appeared near the upper-left corner of the screen. As soon as subjects fixated this cross, the computer displayed a target sentence, with the first character of this sentence replacing the fixation cross. This also served as an automatic calibration check, as the computer did not display the text until it detected a stable fixation on the cross. If subjects did not rapidly fixate the cross, the experimenter re-calibrated the eye-tracker. The experiment was conducted in two blocks, with a short intervening break while the experimenter set up the equipment for the second block. Subjects were calibrated at the beginning of both blocks, with other re-calibrations performed every eight materials to maintain a high level of accuracy. This meant that the eyetracker was calibrated a minimum of 10 times during the experiment. Once subjects had finished reading each sentence, they pressed a key, and the computer displayed a comprehension question. Comprehension questions followed all of the experimental and filler trials. Half of these questions had "yes" answers, and half had "no" answers. Subjects responded to the comprehension questions by fixating either the word "yes" or "no". These words were presented on the left and right hand sides of the screen below the comprehension question. Subjects' responses were recorded by the experimenter without feedback. The computer displayed each experimental list in a fixed Latin Square order, together with 32 fillers that were materials for an unrelated experiment, and an additional 11 items that appeared at the beginning of the two halves of the experiment, and following each of the breaks for re-calibration.
Results and discussion
Prior to analysis, an automatic procedure pooled short contiguous fixations.
Fixations of less than 80 ms were incorporated into larger fixations found within one character, and fixations of less than 40 ms that were not within three characters of another fixation were deleted. We removed those trials where either subjects failed to read the passage properly, or where there had been track loss. More specifically, those trials were removed in which a zero first pass reading time was recorded for two consecutive regions of text. This accounted for 3.0% of the data. The data were analysed for all six regions of text. As the disambiguating verb provided the point at which we expected to detect garden path effects, the reading time results for Region Four are reported below. We also report the total reading times for Regions Two and Three. The data were analysed using 2 (Determiner) x 2 (Sentence Structure) ANOVAs, treating subjects and items as random variables. The mean first fixation duration, gaze duration, total reading times, regression path reading times and re-reading times for the disambiguating verb, along with the total reading times for Region Two and Three are given in Table 1.
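The data-cleaning rules described above can be sketched in code. The following is a minimal illustration, not the software actually used in the experiment: the `Fixation` type, the function names, and the choice to consider only temporally adjacent fixations as candidates for merging are all assumptions made for the example. The 80 ms, 40 ms, one-character and three-character thresholds, and the rule about zero first pass times in two consecutive regions, come from the text.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    position: float  # horizontal position, in character units (assumed unit)
    duration: int    # fixation duration, in ms


def pool_fixations(fixations, merge_below=80, merge_within=1.0,
                   drop_below=40, drop_beyond=3.0):
    """Pool or delete short fixations (hypothetical helper).

    Assumption: only the temporally preceding and following fixations
    are considered as neighbours of a short fixation.
    """
    fixes = [Fixation(f.position, f.duration) for f in fixations]  # work on a copy
    keep = [True] * len(fixes)
    for i, fix in enumerate(fixes):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(fixes)]
        if fix.duration < merge_below:
            # Candidate hosts: larger, still-present neighbours within one character.
            hosts = [j for j in neighbours
                     if keep[j]
                     and fixes[j].duration > fix.duration
                     and abs(fixes[j].position - fix.position) <= merge_within]
            if hosts:
                host = max(hosts, key=lambda j: fixes[j].duration)
                fixes[host].duration += fix.duration
                keep[i] = False
                continue
        if fix.duration < drop_below:
            # Delete very short fixations with no neighbour within three characters.
            isolated = all(abs(fixes[j].position - fix.position) > drop_beyond
                           for j in neighbours)
            if isolated:
                keep[i] = False
    return [f for f, k in zip(fixes, keep) if k]


def has_track_loss(first_pass_times):
    """True if two consecutive regions both have a zero first pass time."""
    return any(a == 0 and b == 0
               for a, b in zip(first_pass_times, first_pass_times[1:]))


raw = [Fixation(10.0, 220), Fixation(10.5, 60),
       Fixation(18.0, 30), Fixation(25.0, 240)]
cleaned = pool_fixations(raw)
print(cleaned)
```

In this made-up record, the 60 ms fixation is absorbed by its 220 ms neighbour half a character away, and the isolated 30 ms fixation is deleted.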
Table 1
First fixation, gaze duration, total reading times, regression path reading times, and re-reading times for regions of reduced and unreduced relative clause sentences beginning with either the or only

                                                          The                    Only
Measure                                                   Reduced   Unreduced    Reduced   Unreduced
First fixation duration (ms) for disambiguating verb      219       192          191       173
Gaze duration (ms) for disambiguating verb                272       238          250       213
Total reading time (ms) for Region 2                      549       337          444       328
Total reading time (ms) for Region 3                      386       296          321       307
Total reading time (ms) for disambiguating verb           446       310          355       261
Regression path reading time (ms) for disambiguating verb 426       270          318       278
Re-reading time (ms) for disambiguating verb              155       31           68        65
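As a reading aid, the size of the garden path effect (reduced minus unreduced) can be computed separately for each determiner from the cell means in Table 1. The sketch below is only descriptive arithmetic on the reported means for the disambiguating verb; it is not a substitute for the ANOVAs reported in the text, and the dictionary layout and function name are illustrative assumptions.

```python
# Cell means (ms) for the disambiguating verb, copied from Table 1.
# Order: (the, reduced), (the, unreduced), (only, reduced), (only, unreduced)
TABLE_1 = {
    "first fixation duration": (219, 192, 191, 173),
    "gaze duration": (272, 238, 250, 213),
    "total reading time": (446, 310, 355, 261),
    "regression path reading time": (426, 270, 318, 278),
}


def garden_path_effects(cells):
    """Return (effect for 'the', effect for 'only', difference of effects)."""
    the_reduced, the_unreduced, only_reduced, only_unreduced = cells
    effect_the = the_reduced - the_unreduced
    effect_only = only_reduced - only_unreduced
    # The difference of the two effects is the interaction contrast.
    return effect_the, effect_only, effect_the - effect_only


for measure, cells in TABLE_1.items():
    effect_the, effect_only, contrast = garden_path_effects(cells)
    print(f"{measure}: the = {effect_the} ms, only = {effect_only} ms, "
          f"difference = {contrast} ms")
```

The difference between the two effects is small for first fixation and gaze duration (9 ms and -3 ms) and much larger for regression path reading time (116 ms).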
First fixation duration
There was a significant main effect of Sentence Structure (F1(1,31) = 6.5, p < 0.05, MSe = 2626; F2(1,35) = 4.4, p < 0.05, MSe = 3868), with a longer first fixation on the disambiguating verb when it was part of a reduced as compared to an unreduced relative clause sentence. There was also a significant main effect of Determiner (F1(1,31) = 14.0, p < 0.001, MSe = 1292; F2(1,35) = 11.6, p < 0.01, MSe = 1550), with a longer first fixation at this region for sentences beginning with the as compared to only. Yet, crucially, there was no interaction of Sentence Structure and Determiner (F < 1). The results demonstrated that subjects encountered a garden path on the first fixation on the disambiguating verb of reduced relative clause sentences, and that this effect occurred regardless of whether the sentence began with the or only. The garden path effect on the duration of the first fixation on the critical verb is in line with previous findings (Rayner, Carlson and Frazier, 1983; Murray and Liversedge, 1994). For present purposes, the main effect of Determiner is not important. However, the lack of an interaction demonstrates that the inclusion of only did not guide initial parsing of reduced relative clause sentences.
Gaze duration
There was a significant main effect of Sentence Structure (F1(1,31) = 11.0, p < 0.01, MSe = 3605; F2(1,35) = 6.3, p < 0.05, MSe = 6217), with a longer gaze duration on the disambiguating verb of reduced than unreduced sentences. There was also a significant main effect of Determiner (F1(1,31) = 5.6, p < 0.01, MSe = 3216; F2(1,35) = 7.6, p < 0.01, MSe = 2776), such that there was a longer gaze duration for sentences beginning with the compared with only. There was no interaction of Sentence Structure and Determiner (F < 1). The results were very similar to those of the first fixation duration, and showed that subjects were garden-pathed when reading reduced relative clause sentences despite the inclusion of the focus operator only. The first fixation and gaze duration results are clear. Subjects experienced processing difficulty on reading the disambiguating verb of reduced relative clause sentences, regardless of the inclusion of the focus operator. However, it was possible that the inclusion of only may have facilitated reanalysis for the reduced relative clause sentences at a later point in the reading process. If this was the case then we should observe an interaction between Sentence Structure and Determiner on the total time measure. We examined the total reading times for Regions 2 and 3 along with the disambiguating region.
Total reading time
At the disambiguating verb, there was a significant main effect of Sentence Structure (F1(1,31) = 38.2, p < 0.001, MSe = 11022; F2(1,35) = 31.8, p < 0.001, MSe = 14083), with a longer total reading time for the disambiguating verb of reduced as compared to unreduced sentences. There was also a significant main effect of Determiner (F1(1,31) = 13.2, p < 0.01, MSe = 11828; F2(1,35) = 27.8, p < 0.001, MSe = 6471), such that there was a longer total reading time for sentences beginning with the as compared to only.
However, there was no reliable interaction of Sentence Structure and Determiner (F1(1,31) = 1.8, p > 0.05, MSe = 7824; F2(1,35) = 1.8, p > 0.05, MSe = 8203). The total reading times for the disambiguating verb show the same pattern as the first fixation duration and gaze duration results. There was a longer total reading time for reduced than unreduced sentences, and a longer reading time for sentences beginning with the as compared to only. Yet, importantly, there was no interaction of Determiner and Sentence Structure of the kind that might be expected if only had facilitated recovery from the garden path effect experienced when reading reduced relative clause sentences. At Region Two, there was a significant main effect of Sentence Structure (F1(1,31) = 69.4, p < 0.001, MSe = 12477; F2(1,35) = 81.7, p < 0.001, MSe = 12075), with a longer total reading time for reduced than unreduced sentences. There was also a significant main effect of Determiner (F1(1,31) = 20.6, p < 0.001, MSe = 4993; F2(1,35) = 12.5, p < 0.001, MSe = 8541), with a longer total reading time for
sentences beginning with the than only. There was a significant interaction of Sentence Structure and Determiner (F1(1,31) = 9.0, p < 0.01, MSe = 8175; F2(1,35) = 11.3, p < 0.01, MSe = 7771). An analysis of simple effects showed that there was a longer total reading time for reduced as compared to unreduced sentences beginning with the (F1(1,31) = 88.3, p < 0.001, MSe = 8175; F2(1,35) = 106.9, p < 0.001, MSe = 7771), and for reduced as compared to unreduced sentences beginning with only (F1(1,31) = 26.6, p < 0.001, MSe = 8175; F2(1,35) = 31.3, p < 0.001, MSe = 7771). A similar pattern of effects occurred for total time on Region Three. There was a significant main effect of Sentence Structure (F1(1,31) = 13.4, p < 0.001, MSe = 6492; F2(1,35) = 12.8, p < 0.01, MSe = 7531), with a longer total reading time for reduced than unreduced sentences. There was an effect of Determiner that was significant by subjects and marginal by items (F1(1,31) = 5.0, p < 0.05, MSe = 4607; F2(1,35) = 3.8, p < 0.06, MSe = 6164), such that there was a longer total reading time for sentences beginning with the than only. Finally, there was a significant interaction of Sentence Structure and Determiner (F1(1,31) = 4.3, p < 0.05, MSe = 10462; F2(1,35) = 8.1, p < 0.01, MSe = 7007). Simple effects showed that there was a longer total reading time for reduced as compared to unreduced sentences beginning with the (F1(1,31) = 12.2, p < 0.01, MSe = 10462; F2(1,35) = 21.5, p < 0.001, MSe = 7007), but no difference between reduced and unreduced sentences beginning with only (F < 1). The results for Regions Two and Three produced the predicted interaction of Sentence Structure and Determiner. There was a longer total reading time for Regions Two and Three of reduced as compared to unreduced relative clause sentences beginning with the. There was also a longer total reading time for Region Two of reduced as compared to unreduced sentences beginning with only.
At this point we have evidence to suggest that while initial processing was not guided, reanalysis was facilitated by the inclusion of only in reduced relative clause sentences. More time was spent re-reading text contained in Regions Two and Three of reduced relative clause sentences beginning with the, and text contained in Region Two of the reduced relative clause sentences beginning with only, after encountering a problem at the disambiguating verb at Region Four. It seems likely that readers re-read these earlier portions of text in order to recover from the misanalysis experienced at the disambiguating verb, but these total reading times alone are difficult to interpret in two respects. It is not possible to determine whether, upon encountering the disambiguating verb, subjects immediately made a regressive saccade and spent considerable time re-fixating earlier portions of the text prior to reading as yet uninspected text, or whether they read beyond the disambiguating region before making a regressive saccade to re-read earlier portions of the text. This is exactly the point we highlighted in our discussion of the hypothetical eye movement records in Figs. 5 and 6. To remind the reader, regression path reading times indicate the time subjects spend reading the disambiguating region, and re-reading text prior to the disambiguating region, before inspecting text to the right. Also, re-reading times indicate the time subjects spent inspecting text immediately after making a regression from the disambiguating region, but before fixating text to the right of the disambiguating region. Using these measures we should be able to establish whether readers spent time re-reading the beginning of the sentence immediately upon encountering the disambiguating verb, or whether they returned to re-read earlier portions of the sentence after they had read the sentence in its entirety. If subjects did re-read immediately upon encountering the verb, then we should obtain a significant interaction between Sentence Structure and Determiner for both regression path and re-reading time measures. Alternatively, if subjects read the sentence in its entirety before re-reading the text, then there should be no interaction.
Regression path and re-reading time
The regression path measure showed a significant main effect of Sentence Structure (F1(1,31) = 27.1, p < 0.001, MSe = 11418; F2(1,35) = 22.2, p < 0.001, MSe = 14262), with a longer regression path reading time for reduced as compared to unreduced sentences. There was also a significant main effect of Determiner (F1(1,31) = 19.6, p < 0.001, MSe = 4077; F2(1,35) = 11.9, p < 0.01, MSe = 7363), such that there was a longer regression path reading time for sentences beginning with the than only. Finally, there was a significant interaction of Sentence Structure and Determiner (F1(1,31) = 10.8, p < 0.01, MSe = 10098; F2(1,35) = 12.3, p < 0.01, MSe = 9781). There was a significantly longer regression path reading time for reduced as compared to unreduced sentences beginning with the (F1(1,31) = 38.9, p < 0.001, MSe = 10098; F2(1,35) = 42.3, p < 0.001, MSe = 9781), but no difference in reading time for reduced and unreduced sentences beginning with only (F1(1,31) = 2.5, p > 0.05, MSe = 10098; F2(1,35) = 2.4, p > 0.05, MSe = 9781).
The re-reading time showed a significant main effect of Sentence Structure (F1(1,31) = 20.4, p < 0.001, MSe = 6252; F2(1,35) = 13.8, p < 0.001, MSe = 9661), such that more time was spent re-reading earlier portions of reduced relative clause sentences. There was a main effect of Determiner (F1(1,31) = 9.7, p < 0.01, MSe = 2241; F2(1,35) = 5.8, p < 0.05, MSe = 3945), with more time spent re-reading earlier portions of sentences beginning with the than only. There was also a significant interaction of Sentence Structure and Determiner (F1(1,31) = 8.3, p < 0.01, MSe = 13965; F2(1,35) = 13.9, p < 0.001, MSe = 9415). Simple effects showed there was a longer re-reading time for reduced as compared to unreduced sentences beginning with the (F1(1,31) = 17.4, p < 0.001, MSe = 13965; F2(1,35) = 28.0, p < 0.001, MSe = 9415). However, there was no difference in the re-reading time for reduced and unreduced sentences beginning with only (F < 1). The interaction between Sentence Structure and Determiner was significant for both these reading time measures, and we obtained the same pattern of results as for the total reading time for Region Three. This pattern of reading times suggests that
upon encountering the disambiguating region of reduced relative clause sentences containing the, subjects made a regressive saccade in order to spend more time re-reading preceding portions of the sentence than they did when they read reduced relative clause sentences containing only. The results show that recovery was initiated immediately upon detecting a problem at the disambiguating region, and furthermore, that during the recovery process readers spent time re-inspecting text they had already read prior to fixating as yet unread text. We suggest that for these garden path sentences subjects attempted structural reanalysis immediately upon encountering the syntactically disambiguating verb. The analysis of the temporally contiguous fixations made an important contribution to our understanding of the pattern of eye movements which occurred immediately after subjects read the disambiguating verb. Only by examining reading times which summed temporally contiguous fixations were we able to determine when during the course of processing subjects spent time re-reading the sentence. In this case re-reading occurred immediately upon a reader encountering the problem, rather than occurring after the sentence had been read in full.
Conclusion
The results of our experiment confirmed that the focus operator only did not guide initial parsing decisions when reduced relative clause sentences were read. The inclusion of only did, however, facilitate reanalysis after detection of the initial misanalysis. Our conclusion is that the inclusion of a focus operator does not cause a reader to anticipate modifying information and therefore the focus operator does not guide parsing decisions for reduced relative clause sentences. Turning to the reading time measures, we have provided an argument in favour of the use of reading time measures which sum temporally contiguous fixations.
We advocate the use of these measures in conjunction with existing reading time measures which sum spatially contiguous fixations. By using measures which sum spatially contiguous fixations and measures which sum temporally contiguous fixations, the experimenter is better able to determine the time course of the influence of linguistic variables on processing, and distinguish between qualitatively different types of eye movement behaviour which may occur when processing difficulty is experienced. This approach enabled a fuller understanding of the data obtained in the present experiment. We suggest that the use of these measures will prove valuable in the interpretation of results obtained in other eye movement experiments.
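The contrast between measures that sum spatially contiguous fixations and measures that sum temporally contiguous fixations can be made concrete with a short sketch. This is an illustration under stated assumptions rather than the software used for the analyses: the function name and the fixation record format are hypothetical, the fixation sequence is invented, and re-reading time is computed here as regression path time minus gaze duration, which is one plausible reading of the definitions given earlier. Regions skipped on first pass are not treated specially.

```python
def reading_time_measures(fixations, region):
    """Compute reading time measures for one region (hypothetical helper).

    fixations: list of (region_index, duration_ms) pairs in temporal order.
    Returns None if the region was never fixated.
    """
    first = next((i for i, (r, _) in enumerate(fixations) if r == region), None)
    if first is None:
        return None

    # Gaze duration: consecutive fixations on the region from first entry
    # until the eyes leave it in either direction (spatially contiguous).
    gaze = 0
    i = first
    while i < len(fixations) and fixations[i][0] == region:
        gaze += fixations[i][1]
        i += 1

    # Regression path time: all fixations from first entry until the eyes
    # first move to the right of the region (temporally contiguous).
    regression_path = 0
    j = first
    while j < len(fixations) and fixations[j][0] <= region:
        regression_path += fixations[j][1]
        j += 1

    return {
        "first_fixation": fixations[first][1],
        "gaze_duration": gaze,
        "total_time": sum(d for r, d in fixations if r == region),
        "regression_path": regression_path,
        # Assumption: re-reading time = regression path time - gaze duration.
        "re_reading": regression_path - gaze,
    }


# Invented record: the reader regresses from Region 4 to Regions 2 and 3
# before moving on to Region 5, then makes a late return to Region 4.
record = [(1, 200), (2, 250), (3, 180), (4, 220), (4, 60),
          (2, 300), (3, 150), (4, 190), (5, 210), (4, 100)]
measures = reading_time_measures(record, region=4)
print(measures)
```

In this invented record, total time on Region 4 (570 ms) cannot show when the re-reading happened, whereas regression path time (920 ms) and re-reading time (640 ms) show that most of it occurred before the eyes first moved past the region.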
Acknowledgements
Special thanks are due to Keith Edwards of the Psychology Department at the University of Glasgow who developed software to support the reading time measures described in this paper. We also thank Wayne Murray, Françoise Vitu and an anonymous reviewer for providing comments on an earlier version of this paper. This research was completed when the first author was a Research Fellow at the University of Nottingham.
References
Altmann, G.T.M. and Steedman, M. (1988). Interaction with context during human sentence processing. Cognition, 30, 191-238.
Altmann, G.T.M. (1994). Regression-contingent analyses of eye movements during sentence processing: Reply to Rayner and Sereno. Memory and Cognition, 22, 286-290.
Altmann, G.T.M., Garnham, A. and Dennis, Y. (1992). Avoiding the garden path: Eye movements in contexts. Journal of Memory and Language, 31, 685-712.
Balota, D.A. and Rayner, K. (1991). Word recognition processes in foveal and parafoveal vision: The range of influence of lexical variables. In: D. Besner and G.W. Humphreys (Eds.), Basic Processes in Reading. Hillsdale, NJ: LEA.
Bever, T.G. (1970). The cognitive basis for linguistic structures. In: J.R. Hayes (Ed.), Cognition and the Development of Language. New York: Wiley.
Brysbaert, M. and Mitchell, D.C. (1996). Modifier attachment in sentence parsing: Evidence from Dutch. Quarterly Journal of Experimental Psychology, 49, 664-695.
Clifton, C., Kennison, S.M. and Albrecht, J.E. (1997). Reading the words her, his, him: Implications for parsing principles based on frequency and on structure. Journal of Memory and Language, 36, 276-292.
Crain, S. and Steedman, M. (1985). On not being led up the garden path: The use of context by the psychological syntax processor. In: D.R. Dowty, L. Karttunen and A.M. Zwicky (Eds.), Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives. Cambridge: Cambridge University Press.
Ehrlich, K. and Rayner, K. (1983). Pronoun assignment and semantic integration during reading: Eye-movements and immediacy of processing. Journal of Verbal Learning and Verbal Behavior, 22, 75-87.
Ferreira, F. and Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 75-87.
Frazier, L. and Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the study of structurally ambiguous sentences. Cognitive Psychology, 14, 178-210.
Inhoff, A.W. and Rayner, K. (1986). Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception and Psychophysics, 40, 431-439.
Kennedy, A., Murray, W.S., Jennings, F. and Reid, C. (1989). Parsing complements: Comments on the generality of the principle of minimal attachment. Language and Cognitive Processes, 4, S151-S176.
Konieczny, L. (1996). Human sentence processing: a semantic oriented parsing approach. Freiburg: IIG Berichte 3/96.
Konieczny, L., Hemforth, B. and Scheepers, C. (1997). Head position and clause boundary effects in reanalysis. Paper presented at the 10th Annual CUNY Sentence Processing Conference, March, 1997, Santa Monica, CA, USA.
Liversedge, S.P. (1994). Referential context, relative clauses and syntactic parsing. Unpublished PhD thesis. University of Dundee.
Liversedge, S.P. and Pickering, M. (1995). A comparison of eye movement measures in language processing. Paper presented to the Eighth European Conference on Eye Movements, Derby, September 1995.
Liversedge, S.P., Pickering, M.J. and Traxler, M.J. (1996). A comparative analysis of qualitatively different eye movement measures. Poster presented at the Ninth Annual CUNY Conference, New York, March, 1996.
Murray, W.S. and Liversedge, S.P. (1994). Referential context effects on syntactic processing. In: C. Clifton Jnr., L. Frazier and K. Rayner (Eds.), Perspectives on syntactic sentence processing. Hillsdale, NJ: Erlbaum.
Ni, W., Crain, S. and Shankweiler, D. (1996). Sidestepping garden paths: The contribution of syntax, semantics and plausibility in resolving ambiguities. Language and Cognitive Processes, 11, 283-334.
Paterson, K.B., Liversedge, S.P. and Underwood, G. (1998). The influence of focus operators on syntactic processing of short relative clause sentences. Quarterly Journal of Experimental Psychology, in preparation.
Raney, G.E. and Rayner, K. (1995). Word frequency effects and eye movements during two readings of a text. Canadian Journal of Experimental Psychology, 49, 151-172.
Rayner, K. and Duffy, S.A. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory and Cognition, 14, 191-201.
Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. Englewood Cliffs, NJ: Prentice Hall.
Rayner, K. and Sereno, S.C. (1994a). Regressive eye movements and sentence parsing: On the use of regression-contingent analyses. Memory and Cognition, 22, 281-285.
Rayner, K. and Sereno, S.C. (1994b). Regression-contingent analyses: A reply to Altmann. Memory and Cognition, 22, 291-292.
Rayner, K., Carlson, M. and Frazier, L. (1983). The interaction of syntax and semantics during sentence processing: Eye movements in the analysis of semantically biased sentences. Journal of Verbal Learning and Verbal Behavior, 22, 358-374.
Rayner, K., Sereno, S.C., Morris, R.K., Schmauder, A.R. and Clifton, C. (1989). Eye movements and on-line language comprehension processes. Language and Cognitive Processes, 4, 21-50.
CHAPTER 4
Determinants of Fixation Positions in Words During Reading
Ralph Radach, Technical University of Aachen
George W. McConkie, University of Illinois at Urbana-Champaign
Abstract
This chapter begins by reviewing previous findings concerning where the eyes land (i.e., landing positions) in words following progressive saccades during reading. It then reports the results from an examination of landing positions of German readers in two previously unexamined situations: intraword saccades (refixations) and interword regressive saccades (regressions). No evidence was found to support the common distinction between intra- and inter-word progressive saccades; landing positions in refixations are continuous with those in interword progressive saccades. In contrast, interword regressive saccades do not show the normal linear relation between launch site and mean landing position in the word that is observed in other conditions. In all cases, eye movement control during reading appears to be word-based.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
Although the eye-movement pattern made while reading a passage is infinitely complex, at one level it can be represented simply as a series of individual decisions about when and where to move the eyes. The where decision, at least in reading languages having spaces between words, appears to be word-based: that is, each saccadic eye movement is intended to take the eyes to a specific word location (McConkie and Zola, 1984).1 If this is so, then understanding where the eyes actually land in the text requires two theories: first, a selection theory which indicates how one word rather than another is selected as the target of a saccade, and, second, a performance theory which indicates where the eyes land given the selection of a target word for a particular saccade. In this chapter we will deal primarily with the latter of these two issues: the landing positions of the eyes within words. McConkie, Kerr, Reddix and Zola (1988) conducted a quantitative analysis of the landing positions of the first saccades into a word (initial fixation positions), using a corpus of 40,000 fixations based on a sample of 66 readers. Included in the analysis were three factors of potential influence: word length, saccade launch distance and prior fixation duration. These authors concluded that early visual processes must parse the text into word-units, probably on the level of low spatial frequency objects that are separated by empty spaces. This provides candidates that could serve as the target of the next saccade. On some basis, which they did not investigate, a word-unit is selected and a saccade is launched to take the eyes to it. Where the eyes actually land (i.e., the distribution of saccade landing positions, or initial fixation positions) can be accounted for by a small set of basic visuomotor principles, which they attempted to explicate. McConkie et al.
(1988), like many others (see reviews in O'Regan 1990, 1992), assume that there is a functional target location near the word center to which the eyes are being sent. O'Regan (1990) suggests that this is a result of learning that the center of the word is an "optimal viewing position" for word identification. He suggests that the position in each word from which it can be most readily identified varies according to the information structure within the word, but that, on average, the word center is optimal. However, the actual distribution of landing positions in words of different lengths, often referred to as the "preferred viewing position" (Rayner, 1979), shows an inverted U shape, with a maximum somewhat to the left of
1 There are exceptions to this generalization. For example, Hofmeister (1997) has data indicating that landing positions following return sweeps (saccades to the next line of text) are distributed relative to the beginning of that line, without influence of word locations. Also, it has been proposed (Radach, 1996) that some saccades may be sent to word groups, particularly when a relatively long word is preceded by a short function word. Further research is necessary to explore this possibility.
the word center. Thus, on most fixations the eyes are not at the optimal location. McConkie et al. proposed that this is the result of two sources of error in the visuomotor system. A "saccadic range error" leads to a linear relation between saccade launch distance and the mean landing position, which we will refer to as the "landing position function". When saccades are launched from a distance of about seven letters relative to the word center (a figure close to the average saccade length in reading), the resulting normal distribution of landing positions has its maximum close to the center of the word. When saccades are launched from locations closer to the target word, landing sites are shifted to the right (overshoot); when they are launched from more distant locations, landing positions are shifted to the left (undershoot). The second source of error is a "random placement error" that causes the spread in landing positions and that leads to an increase in this spread or variance with distance of the center of the target word from the launch site.2 In summary, the "preferred viewing position" in a word is seen, not as a basic oculomotor phenomenon, but as the result of several combined factors of influence (McConkie et al., 1988; see also McConkie, Kerr, Grimes and Zola, 1990; McConkie, Kerr and Dyre, 1994). This chapter begins with a summary of factors that determine the locations of initial fixation positions on words following forward inter-word saccades during reading, as obtained from an extensive analysis of a corpus of German reading data (Radach, 1996). It then presents analyses of the same corpus, examining fixation positions following saccades that lead to refixations of words, and initial fixations on words following regressive inter-word saccades, to see if they show the same properties.
It ends with a discussion of theoretical issues regarding eye movement control during reading, arguing that the eyes are sent to a selected word (discrete decision) rather than a certain distance (continuous decision), and that where the eyes land in the word is determined by oculomotor error factors rather than by certain alternatives that have been recently proposed.
Methodology
Eye movement data were collected from four participants (German-speaking graduate students of physics) as they read a German translation of the first two parts of the book Gulliver's Travels (about 160 book pages).3 The text was presented in
2 Saccade target undershoot, saccadic range effect and increase in variation for more distant targets are common phenomena in the literature on basic oculomotor and other motor control processes (Becker, 1989; Kapoula and Robinson, 1986; Poulton, 1981).
3 The data used in this study were collected by the first author while on a Fulbright scholarship at the University of Illinois in Champaign/Urbana. The authors gratefully acknowledge the support of Gary Wolverton, Paul Kerr and John Grimes.
R. Radach & G. W. McConkie
screen pages of five to seven double-spaced lines of up to 72 ASCII characters each on a 15-inch VGA monitor in negative polarity. At a viewing distance of 80 cm each letter corresponded to approximately 0.25° of visual angle. Participants were instructed to read the text at their normal pace in order to comprehend the main ideas and to be able to answer questions at the end of each of 32 text segments. Eye movements were recorded using a Generation 5 Dual-Purkinje eye tracker with a sampling rate of 1000 Hz. The algorithm used for saccade identification is described in McConkie, Wolverton and Zola (1984), and details on the calibration routine are reported in McConkie (1981). After excluding blinks, cases of track loss and fixations outside the page, matrices of 47989, 47826, 59857 and 64226 valid saccade-fixation pairs were available for the four participants, respectively. Orthographic and lexical variables (e.g., letter frequency and word frequency measures) were either generated on the basis of the text itself (48334 words) or imported from the German CELEX corpus (Celex, 1995). The method employed in this study is quasi-experimental, involving the analysis of a complete set of normal reading data. The key technique is the method of orthogonal sampling (Kliegl, Olson and Davidson, 1983) where a number of variables is controlled (held constant) within a subset of data while a target variable is systematically investigated by comparing cases with different values of that variable. This method has certain weaknesses. For example, effects may sometimes be influenced by variables not considered in the sampling scheme or mediated by hidden interactions. The method also has a number of strengths. In cases where aspects of behavior being studied cannot be experimentally manipulated, such as where the eyes go during free viewing, this method makes it possible to separate out the effects of different variables on behavior.
Also, when the corpus includes very large data sets from different individuals, similarities and differences among participants can be investigated and problems commonly arising in group statistics can be avoided. This is true in particular for the present corpus, because it includes enough data to carry out a complete replication of the original study by McConkie et al. (n = 66) for each individual reader.

Where the eyes go when initially fixating a word

On the basis of the data corpus described above, we have replicated McConkie et al.'s analysis and extended it to include a broader range of variables (Radach, 1996; see also Radach and Kempe, 1993). In the following section we will concentrate on those aspects of our studies on initial fixation positions in words that are most relevant for ongoing discussions of theoretical issues and which also lay the ground for the analyses of refixations and regressions presented in the later part of the chapter.
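The orthogonal sampling method described in the Methodology can be illustrated with a minimal sketch. It is not the analysis code used for the corpus: the function name, the record fields (reader, launch, prior_fix_ms, landing) and the toy values are assumptions made for illustration. Control variables are held constant within each cell, and the outcome is then compared across quartiles of the target variable.

```python
from collections import defaultdict
from statistics import mean

def orthogonal_sample(records, control_keys, target_key, outcome_key, n_bins=4):
    """Hold the control variables constant within each cell, then compare the
    mean outcome across equal-sized bins (quartiles by default) of the target."""
    cells = defaultdict(list)
    for rec in records:
        cells[tuple(rec[k] for k in control_keys)].append(rec)

    summary = {}
    for cell, cases in cells.items():
        n = len(cases)
        if n < n_bins:
            continue  # not enough cases to fill every bin
        cases.sort(key=lambda r: r[target_key])
        summary[cell] = [
            mean(r[outcome_key] for r in cases[i * n // n_bins:(i + 1) * n // n_bins])
            for i in range(n_bins)
        ]
    return summary

# Toy records (made-up values, not corpus data): one reader, one launch site,
# eight fixations that differ in prior fixation duration.
toy = [
    {"reader": 1, "launch": -6, "prior_fix_ms": 150 + 20 * i, "landing": i}
    for i in range(8)
]
quartile_means = orthogonal_sample(toy, ("reader", "launch"), "prior_fix_ms", "landing")
print(quartile_means)
```

Each key of the returned dictionary is one combination of control values, and each value lists the mean outcome per quartile, so an effect of the target variable shows up as a trend across the list.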
Fixation positions in reading
As indicated above, we are convinced that initial saccades are being sent to specific words during reading. Furthermore, the control system seems to act as if a particular location in the word, near its center, were acting as the target for the saccade. At the same time, there is an "optimal viewing position" within a word from which the word is most easily identified (O'Regan, 1990). One way to identify this is to compute the probability of refixating the same word as a function of the initial landing position. The basic result is that the likelihood of making a second fixation on a word is minimal when the eyes first land close to the word center (O'Regan et al., 1984). The effect was first observed in normal reading by McConkie et al. (1989), who found that the U-shaped refixation curve can be well specified by a quadratic equation of the form:

Y = A + B(X - C)^2

where X is a fixation location within a word, Y is the proportion of fixations at that location that are followed by a refixation of the word, A is the vertical offset of the curve, B indicates the slope and C is the minimum point. This minimum point is the horizontal offset of the curve, the location where refixation frequency is minimal. The analysis of the German corpus data found considerable interindividual variability with respect to the actual distributions of landing positions (the "preferred viewing position"). For three participants, the landing positions vary around locations about halfway between word beginning and word center, whereas for one participant they are much closer to the word center. However, there is no comparable variation in the refixation curves. For all four individual participants and at all word lengths studied (5 to 9), the minimum of each refixation curve is within one letter position of the word center. Therefore, we consider the word center to be an estimate of an interindividually stable optimal viewing position.
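As an illustration of how the quadratic refixation curve Y = A + B(X - C)^2 might be estimated, the following sketch fits the expanded polynomial by ordinary least squares and converts the result to the offset form. The function name and the toy proportions are assumptions; the fitting procedure actually used by McConkie et al. (1989) is not specified here.

```python
def fit_refixation_curve(xs, ys):
    """Least-squares fit of Y = A + B*(X - C)**2; returns (A, B, C)."""
    # Fit the expanded form y = p0 + p1*x + p2*x**2 via the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]               # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]  # augmented matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    p0, p1, p2 = (m[i][3] / m[i][i] for i in range(3))
    # Convert to the offset form: C is the minimum point, A the value there.
    c = -p1 / (2 * p2)
    a = p0 - p2 * c ** 2
    return a, p2, c

# Toy data (made-up): refixation proportions at letter positions -3..3
# relative to the word center, generated from a known curve.
positions = list(range(-3, 4))
proportions = [0.10 + 0.02 * (x - 0.5) ** 2 for x in positions]
A, B, C = fit_refixation_curve(positions, proportions)
print(A, B, C)
```

With real data, the estimated C gives the location of minimal refixation frequency, which can then be compared with the word center for each reader and word length.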
The actual frequency distribution of saccade landing positions for a given word length (the "preferred viewing position") is Gaussian in shape and can be fit well by a normal curve. The data in this global distribution can be partitioned into launch site contingent landing position distributions. As an example, one such distribution would indicate landing positions within a seven-letter word following saccades that come from a distance of ten letters left of the word center. These distributions are also normally distributed and, most importantly, there is a linear relation between launch distance and mean landing position, the "landing position function". In the 206 particular combinations of word length and launch site considered in our analysis, the average shift of mean landing position with each increment in launch distance (the slope of the landing position function) is 0.35, 0.33, 0.55 and 0.38 letters for our four participants. In regression analyses, performed on the individual subject's mean landing positions as a function of launch distance for word lengths 4-15, more than 90% of the variance is accounted for by a simple linear model.
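The slope of the landing position function can be obtained from a simple linear regression of mean landing position on launch site. The sketch below is illustrative only: the function name and the toy means (generated with an assumed slope of 0.4 and a neutral launch distance of seven letters) are not taken from the corpus.

```python
from statistics import mean

def landing_position_function(launch_sites, mean_landings):
    """Ordinary least-squares line through (launch site, mean landing position).
    Returns (slope, intercept, r_squared)."""
    mx, my = mean(launch_sites), mean(mean_landings)
    sxx = sum((x - mx) ** 2 for x in launch_sites)
    syy = sum((y - my) ** 2 for y in mean_landings)
    sxy = sum((x - mx) * (y - my) for x, y in zip(launch_sites, mean_landings))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)

# Toy means (made-up): launch sites and landing positions are both given in
# letters relative to the word center; negative values lie to the left.
launch = [-10, -9, -8, -7, -6, -5, -4, -3]
landing = [0.4 * (x + 7) for x in launch]   # lands on the center from -7
slope, intercept, r2 = landing_position_function(launch, landing)
print(slope, intercept, r2)
```

In this toy example, saccades launched from closer than seven letters land to the right of the center and those launched from farther away land to its left, which is the pattern described in the text.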
[Figure 1. Ordinate: center-based landing position. Abscissa: launch site relative to word beginning, from -21 to -2.]
Fig. 1. Estimated mean landing positions of initial saccades as a function of launch site relative to the word center for 9-11 letter words. Each data point represents between 1054 and 133 observations. At about launch site -15 the peaks of the underlying landing site distributions start to move off the target word and the estimated means are more variable. For this data set the slope of the linear regression function is 0.53 (r2 = 0.96). Launch distances of 5 or less include cases where the saccade was launched from within the same word.
This key finding, which has been observed in several analyses of different sets of reading data in both English and German (see also Kerr, 1992; McConkie, Kerr and Dyre, 1994), is illustrated in Fig. 1. This figure was prepared in order to explore how far from the word center the linear landing position function is found. We have plotted the landing position function for words of length 9-11, based on pooled data from our four subjects. Data pooling was required in order to have a large enough sample size for the analysis. This word length range was selected for two reasons. First, words had to be relatively long to counter the tendency of landing site distributions to move off the left word boundary for very distant launch sites. Second, words had to be short enough to provide a sufficient number of observations. German text is well-suited to meet these constraints. As is apparent from Fig. 1, the mean landing positions become less stable at a launch distance of -15 due to reduced sample sizes, but even at -21 (n = 133) the linear trend is still present. We conclude from this analysis that for inter-word progressive saccades the landing position function is linear over the entire range of launch distances that is found in normal reading. Unlike McConkie et al. (1988), but in agreement with McConkie, Kerr and Dyre (1994), we also found a small effect of word length on mean landing positions for a given launch distance. The individual average increase (relative to the word beginning) in mean landing position was 0.16, 0.18, 0.14 and 0.11 letter positions for each one-letter increase in word length. This is a much smaller effect than the launch
position effect reported above. In linear regression analyses, performed on the mean word length contingent landing positions for each launch site and participant, only a modest portion of the variance can be accounted for (r2 > 0.34, see Radach, 1996 for further details). The effect can be interpreted as a "center of gravity" phenomenon (Findlay, 1982), modulating the launch position effect. Such an interpretation would follow O'Regan's (1990) suggestion that the landing position of saccades can be influenced by the presence of further elements in the critical visual configuration, in the present case additional letters belonging to the same target word (see also below our discussion of the relative importance of range-effect vs. "center of gravity" effect). A curious observation made during the analysis of the word length effect was that not only the length of the target word but also the length of the preceding word had an influence on initial landing positions. When further investigating this effect, we found that this was not caused by prior word length per se, but largely due to the fixation pattern on the preceding word (Radach and Kempe, 1993). With saccade launch position held constant, there is a substantial rightward shift in mean landing position in a word when the prior word received more than one fixation (effect size in the order of 0.5 letters) and a large leftward shift when the preceding word was skipped (effect size up to 1.5 letters). Radach (1996) suggests that the first effect may indicate that in the case of refixations it is more likely that a parafoveal word or letter cluster is being recognized and that the subsequent saccade is lengthened in a "skipping-like" fashion.
The second effect, a leftward shift of initial fixation position after skipping a short word, is more puzzling and has led Radach (1996) to consider the possibility that saccades may sometimes be aimed at units of two words, in which case a small function word is not "skipped" but remains unfixated because it is part of the larger two-word target unit (see Chapter 2 for a discussion of the related issue of information acquisition from locations left of the actual fixation position). A further variable of potential influence on saccade landing positions is the duration of the previous fixation, the 'latency' of the initial saccade. This variable is of particular interest, because it provides a link to temporal aspects of eye movement control that may well significantly modulate spatial saccade parameters. Several hypotheses on the issue are feasible. McConkie et al. (1988) and O'Regan (1990) have proposed that when a preceding fixation is long, the following initial fixation locations converge towards the optimal viewing position. An alternative hypothesis could be that, due to increased parafoveal preprocessing, there should generally be a rightward shift of fixation positions after longer fixations (Pollatsek, Rayner and Balota, 1986).4
4 Although Pollatsek et al. do not explicitly state this hypothesis, it might be derived from their more general proposal that "... there are places in text where more complete processing of the material fixated (and that just to the right of fixation) takes more time but then leads to longer saccades".
We studied the role of the preceding fixation duration for a total of 40 conditions obtained by orthogonal sampling within the German data corpus: five launch sites from -10 to -1 (number of letter positions to the left of the center of the target word, in two-letter increments), two word length ranges (5-6 vs. 7-9 letter words) and four participants. For each of these conditions, mean initial landing positions were compared for the condition-specific quartiles of prior fixation duration. These comparisons used non-parametric Kruskal-Wallis tests (equivalent to ANOVAs) and t-tests of the extreme quartiles in each condition. Interestingly, our analysis supported neither of these two hypotheses. Of the 12 conditions with significant effects of prior fixation duration on saccade landing positions, 11 showed an unexpected pattern. In these conditions landing positions were shifted to the right following shorter preceding fixations. This result is likely to be related to the fact that refixations tend to be of shorter duration as compared to single fixations (e.g. Kliegl, Olson and Davidson, 1983; Underwood, Clews and Everatt, 1990; O'Regan et al., 1994; Rayner, Sereno and Raney, 1996), and that saccades following refixations tend to be longer (see the fixation pattern effect described above). With these considerations in mind, one could suspect that the observed rightward shift after shorter fixation durations is equivalent to a rightward shift of landing position following longer gaze durations which typically result from cases that include more refixations. Indeed this hypothesis was confirmed as part of recent explorations into interrelations between spatial and temporal aspects of eye movement control (Radach and Heller, in preparation). An additional source of variation in initial fixation positions within words is the position of the target word within the current line of text.
Mean landing positions in words at the beginning of the line are substantially shifted to the right and those in words at the end of the line are shifted to the left. These are relatively large effects, producing up to one letter position difference in mean landing position, depending on word length and launch distance. This underscores the importance of line-level information (line length, line distance, layout) for spatial navigation through a page of text (Heller, 1982). A comprehensive model of eye guidance in reading will need to consider these aspects, especially the functional significance of fixations preceding and following return sweep saccades (Hofmeister, 1997). The variables discussed so far (with the exception of prior viewing duration) were all "low-level" variables operating on the oculomotor and/or perceptual level. It is now generally agreed that these explain most of the variance in initial saccade landing position distributions (see, e.g., Chapter 11 for a converging perspective). However, in recent years there has been considerable controversy around the question of whether parafoveally available cognitive (semantic, lexical or sublexical) information can also influence initial landing positions. One popular variation of this idea is the "parafoveal guidance hypothesis" stating that saccades go
further into words that have a less informative (more redundant) word beginning (Underwood, Bloomfield and Clews, 1988; Underwood, Clews and Everatt, 1990; but see Chapter 9 for a more cautious view). We have tested the original parafoveal guidance hypothesis by looking at the same 40 conditions (5 launch distances x 2 word length ranges x 4 participants) as described above. For each condition we compared mean landing position as a function of the quartiles of initial trigram 'informativeness' (i.e., number of word forms that share the same initial trigram). Among the 40 conditions there were only four significant differences and only two of these were in the direction predicted by the hypothesis. Two similar analyses were carried out with token trigram frequency and word frequency as the independent variables, again with negative results (for a similar corpus analysis, see Rayner, Sereno and Raney, 1996). It could be argued that in our corpus analyses we may not have given cognitive variables a fair chance to show their influence on saccade landing positions. However, when we recomputed the range of trigram frequencies used by Radach (1996) and Radach and Kempe (1993), and compared it to values given by other researchers (e.g. Liversedge and Underwood, Chapter 9) we found that the differences between our high and low trigram items are in the same range. Our results are in line with other failures to replicate word beginning informativeness and trigram frequency effects (Rayner and Morris, 1992; Radach et al., 1995). However, other studies exist that have found evidence for a small effect of orthographic manipulations on the mean landing position (e.g., Hyönä, 1995).
In summary, we have found that in the eye-movement data of four German readers the landing positions on words following progressive inter-word saccades are primarily determined by the locations from which the saccades originated, or launch sites, with smaller effects due to word length and position of the word on the line of text. In these data, neither the duration of the prior eye fixation, nor the informativeness of the initial trigram of the word affected these landing positions. The purpose of the study described below was to determine whether launch-site-contingent intraword and regressive saccades show landing position distributions having the same properties as those obtained from interword progressive saccades. This is of particular interest given the strong distinction often made between interword and intraword (refixation) saccades, and between progressive and regressive saccades. O'Regan's (1990, 1992) Strategy-Tactics theory assumes a general scanning routine containing two components: (1) the eyes are sent to successive words in the forward (rightward in English and German) direction, and (2) there is a within word tactic of repositioning the eyes (refixating) when their initial fixation position is at a non-optimal location. Averaged over words, the word center is considered to be the optimal location for their viewing. Given that interword and intraword saccades are assumed to be controlled on quite different bases, we might expect their landing position distributions to show different characteristics.
Similarly, Morrison's (1984) model of eye movement control during reading, as modified e.g. by Henderson and Ferreira (1990) and Pollatsek and Rayner (1990), makes a clear distinction between refixations and progressive interword saccades. The original version of this model included no mechanism for accounting for refixations; it has been necessary to suggest other mechanisms to account for these. Again, if refixations and interword progressive saccades are being generated by different mechanisms, it would be expected that they would show different characteristics. Finally, neither O'Regan's model nor Morrison's model include a clearly defined mechanism for generating interword regressive saccades; typically, they are explained as resulting from processing difficulties at higher cognitive levels that require a reconsideration of earlier text, thus resulting from quite a different set of circumstances than the 'normal' progressive interword saccades that are the basis for existing theories. If differences were found in the properties of the landing position distributions of refixations or regressions, as compared to progressive interword saccades, this would provide evidence for the psychological reality of the processing distinctions described above. On the other hand, if no differences are found, this would not eliminate the possibility of these distinctions, since it may be that the distinctions lie at some higher level, where selection of a saccade target is made, with the mechanism that produces the landing position distributions (the control system that gets the eyes to a selected target) being common to all of these processes. The present study examined our corpus of German reading eye movement data to determine whether the landing position distributions following interword progressive saccades, refixations, and interword regressive saccades do or do not show similar properties. 
It also examines the issue of whether eye movement control is discrete or graded, and considers two alternative hypotheses of how landing positions in selected words are determined.

Refixations vs. progressive interword saccades

In the literature on eye movement control in reading two alternative views on refixations have been proposed. According to the first view, refixations are initiated when cognitive processing is difficult in one or another respect. Such a processing difficulty can take several forms: it may be that lexical access is hampered or that due to cognitive overload at modules of syntactic or semantic analysis the oculomotor system is ordered to "slow down" and park the eyes on the current word for some additional time (for a review, see Rayner and Pollatsek 1989). To account for
refixations, Henderson and Ferreira (1990) supplemented Morrison's attention-based eye guidance theory with the notion of an oculomotor deadline, in which a refixation is initiated if attention has not shifted to a new word. Contrary to this prediction, the first of two fixations on words has been shown to be shorter than single fixations (Kliegl, Olson and Davidson, 1983; Underwood, Clews and Everatt, 1990; O'Regan et al., 1994; Rayner, Sereno and Raney, 1996; Radach, Heller and Inhoff, 1997). Pollatsek and Rayner (1990) discuss another possibility in relation to parallel interactive models of word recognition (e.g. Paap et al., 1982). They propose that the total "level of excitation" in the lexicon (including excitation from parafoveal preprocessing during prior fixations) may provide a cognitive base for refixation decisions (see also Chapter 11 for a new processing mechanism to account for refixations). According to the second view, refixations are based on visuomotor factors: deviations of the initial fixation position from the generally optimal viewing position result in an increase of refixation frequency (see above). In his "strategy and tactics theory" O'Regan (1990) proposes that the eyes, driven by a global, preprogrammed scanning routine, attempt to go to the "generally optimal" viewing position located close to the center of the word. If the eyes do not land at the optimal position, "lexical processing will begin, but may not be able to terminate, since information about certain letters in the word is lacking" (O'Regan 1990, p. 427). Therefore, a refixation will be initiated that will bring the eyes to the opposite end of the current word. "When the eye is on one side of the word, it goes to the other. When it is near the middle, it goes to either one or other end. In other words, the eye is not attempting to get to the optimal viewing position. Rather, it is attempting to spread its fixations evenly over the word." (O'Regan 1990, p. 427).
Although O'Regan (1990, 1992) does not make exact quantitative predictions with respect to landing positions of within-word saccades, it is evident that he proposes a qualitative difference between saccades across word boundaries (based on the "strategy") and within-word saccades (based on "within-word rescue tactics"). Such a difference should be evident when saccade landing positions are plotted as a function of launch site relative to the word (the landing position function), allowing a comparison between cases with launch sites lying to the left of the target word (initial fixations) and cases in which the eyes are launched from within the word (refixations). Specifically, there should be some type of discontinuity in mean landing position at the transition point from inter-word to intra-word launch sites. The left panel in Fig. 2 shows the frequency with which progressive intraword saccades (refixations) are launched from different letter positions of 9- and 11-letter
Fig. 2. Left panel: Frequency distributions of progressive refixation launch sites and refixation landing positions for 9-11 letter words. Please note that a refixation launch site is equivalent to the landing position of the prior initial fixation. The scale on the abscissa indicates letter positions relative to the center of the word. Right panel: Mean landing positions of saccades as a function of launch site. The abscissa indicates letter positions relative to the space immediately to the left of the word. Negative numbers indicate launch sites of initial progressive saccades going into the target word, whereas positive numbers indicate launch sites from within the target word (for these refixations observations are identical to those in the left panel). Pooled data for four participants. Notice that the curves appear continuous from progressive interword saccades to intraword (refixation) saccades.
words combined, relative to the center of the same word, and the frequency with which the eyes land at these locations following the refixation saccades.5 The great majority of progressive refixations are initiated from positions at or to the left of the center of the word. The right panel shows the landing position function on 8- to 12-letter target words following saccades launched from different locations (launch sites) relative to the space before the target words. This figure illustrates the linear landing position function that has been observed in previous studies. However, it also shows that this relation continues smoothly from interword to intraword progressive saccades. There is no discontinuity at the transition point between the two types of saccades, as one might anticipate from current theories distinguishing these categories. Thus, this analysis provides no evidence for a difference in the basis on which landing positions are determined for initial and refixation progressive saccades.
5 There are two features of the right panel of Fig. 2 that deserve comment. First, the vertical displacement of the curves from one another is primarily due to the fact that they are plotted relative to the space before the word (letter position 0 here), which was done to preserve continuity with Fig. 3. If plotted relative to the centers of the words, the curves are largely superimposed on each other. Second, the curves appear to diverge somewhat in the refixation range. This is primarily due to the fact that in this figure means of the observed landing positions are given as opposed to fitting a normal curve to the data as in Fig. 1. This has the effect of underestimating the central tendency of the distribution when it migrates toward the far end of the word, because saccades landing off the word are not included in the calculation. At any given letter position this error in estimation is greater for shorter words, hence causing a separation between curves for different word lengths, as observed here.
Regressive inter-word saccades

The frequencies of saccades made in a right to left direction (regressions) among our participants ranged from 16.7 to 36.1% (including only saccades that start and land on the same line of text), which is within the range typically observed among German readers. About 2/3 of these are inter-word saccades ranging from 10.6 to 29.6 percent of all saccades made by the four participants studied. Our goal in the following analysis was to determine whether the linear mean landing position function observed for data following progressive saccades is also observed in data for interword regressive saccades. The frequency of regressions varies considerably as a function of factors like text difficulty and reading instruction (Heller, 1982). Regressive saccades have recently received much attention in the psycholinguistic literature, having been shown to be related to language processing difficulties. Several regression-related eye movement measures can be considered: the frequency with which the fixation of a target word results in the initiation of a regression, the frequency with which a word is the recipient of a regression and the duration of fixations after regressive saccades. Regression-contingent analyses of syntactic parsing mechanisms have been discussed (Altmann, Garnham and Dennis, 1992; Rayner and Sereno, 1994) and complex eye movement measures involving regressions have been developed (e.g., Daneman, Reingold and Davidson, 1995; see also Chapter 3). However, there is currently only limited understanding of whether and how spatial information about word locations within a sentence is being acquired and used for the subsequent planning and execution of regressions (Kennedy, 1992). The question addressed here concerns the landing position function following these regressions and, particularly, whether it shows the same linear function of launch site as do progressive saccades.
An analysis of launch sites of regressive saccades in our corpus of reading data shows that most regressions originate from positions relatively close to the target word. Of all regressive saccades made within one line of text to words of length 5 to 10, 26.0% come from within the same word
(regressive refixations), 49.4% come from the immediately following word and only 24.6% from more distant locations. Figure 3 shows the landing position function for regressive saccades. The abscissa on this figure, which shows the launch site, indicates letter position relative to the space following the fixated word. Thus, negative numbers indicate refixation cases, where -1 means that the eyes were launched from the final letter in the word (-1 with respect to the space following the word), while positive numbers indicate launch sites to the right of the word, or interword regression cases. As can be seen in the figure, for every word length the refixation regressions (negative numbers on the abscissa) show the same linear relationship between launch site and the mean of the landing position distribution that progressive saccades show (see Figs. 1 and 2). The mean of the slopes for the different word lengths, for this part of the data, is 0.39. However, for interword regressions (positive numbers on the abscissa), the result is quite different. Here, for every word length, the curves go flat (mean slope of 0.06). For these interword regressions there is essentially no relationship between launch site and the mean of the landing position distribution; no matter from where the saccades are launched, the mean landing position in the word on which they land is the same. Thus, the mean landing position function shows a sharp bend in the region of the space before the word, which separates intraword from interword regressions.
Fig. 3. Mean landing positions of regressive saccades as a function of launch site. The abscissa is numbered relative to the space following the target word, with negative numbers indicating launch sites from within the word, and positive numbers indicating launch positions to the right of the word boundary. The ordinate, indicating mean landing position, is numbered with respect to the center of the word. Each graph represents between 3819 and 744 observations. Pooled data from four participants.
These results indicate that the control of the eyes in making interword regressions is functionally different in some way than it is in the other cases studied. McConkie et al. (1988), following Kapoula and Robinson (1986), attributed the linear relation between launch site and mean landing position found with progressive saccades to a range effect that is frequently observed in other muscular systems (Poulton, 1981): a tendency to overshoot near targets and undershoot far targets. Thus, it is assumed that a word string within a line of text, perceptually delimited by preceding and following spaces, functions as a stimulus unit or 'blob', with the eyes being drawn to the center of that unit when it is selected as the target for a saccade. The center serves as a functional target location for directing the saccade. From this perspective, it appears that in making interword regressive saccades, there is no range effect. Rather, the eyes go consistently, with some random error that produces the Gaussian distribution observed, to their target. A collateral observation from examining Fig. 3 is that the mean landing position following interword regressions is very near the center of the word, regardless of the word length or the launch site. Thus, the two variables that appear to have the greatest effect on mean landing position in other cases have little or no effect following these particular saccades. This has two implications. First, it provides further evidence that the center of the word is the functional target for saccades going to the word. Second, it indicates that the mechanism producing the range effect is not always operating, even in the reading task itself. It is curious that the range effect should occur in intraword regressions but not in interword regressions. It is not just the direction of the saccade that is critical, since intraword regressive saccades show its pattern.
Neither is it the distance that the saccade target lies into the periphery; the range effect is observed in progressive saccades of similar lengths to the interword regressions that do not show its pattern. There is clearly something different about saccade control with interword regressive saccades. Research is now needed to better understand the mechanism that gives rise to the range effect, and the conditions under which it operates. Finally, there is one more observation that indicates the distinctiveness of interword regressions. Radach, Heller and Hofmeister (1998) have data indicating that certain readers show a distinct peak of very short fixations (80-120 ms) in fixation duration frequency distributions that these investigators explain in terms of minimal latencies of saccades that have become unnecessary but could not be canceled soon enough (Becker, 1989; Morrison 1984). Interestingly, these very short fixations were never found before inter-word regressions, suggesting that progressive and refixation saccades are part of a mode of eye control that may automatically generate, but sometimes cancel, saccades. Apparently the decision to make a regression to a previous word is the result of a more deliberate mode of eye control.
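As a rough illustration of the range-effect account discussed in this section, the following Python sketch simulates landing positions around a word center. It is a minimal sketch under stated assumptions, not the authors' quantitative model: the function names, the neutral launch distance of 7 letters, and the placement error of 1.5 letters are hypothetical values chosen for illustration, while the slope of 0.35 is the approximate slope of the landing position function that this chapter reports. The standard deviation is held constant for simplicity, although the chapter notes that it grows with launch distance. Interword regressions are represented simply by switching the range effect off, so that the mean landing position stays at the word center.

```python
import random
import statistics

SLOPE = 0.35            # approximate slope of the landing position function reported in this chapter
NEUTRAL_DISTANCE = 7.0  # ASSUMED launch distance (letters) at which saccades land on the word center
PLACEMENT_SD = 1.5      # ASSUMED standard deviation of the random placement error (letters)


def mean_landing_position(launch_distance, range_effect=True):
    """Mean landing position in letters relative to the word center.

    Positive values lie right of center (overshoot), negative values left
    of center (undershoot). Without a range effect the mean is the center.
    """
    if not range_effect:
        return 0.0
    return SLOPE * (NEUTRAL_DISTANCE - launch_distance)


def simulate_landings(launch_distance, range_effect=True, n=5000, seed=0):
    """Sample landing positions: systematic mean plus Gaussian placement error."""
    rng = random.Random(seed)
    mu = mean_landing_position(launch_distance, range_effect)
    return [rng.gauss(mu, PLACEMENT_SD) for _ in range(n)]


for distance in (3.0, 7.0, 11.0):
    progressive = statistics.mean(simulate_landings(distance))
    regressive = statistics.mean(simulate_landings(distance, range_effect=False))
    print(f"launch distance {distance:4.1f}: progressive {progressive:+.2f}, "
          f"interword regression {regressive:+.2f}")
```

Under these assumptions, near launch sites produce means right of the center and far launch sites produce means left of it, whereas the simulated interword regressions stay centered at every launch distance.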
Continuous vs. discrete control of saccades
In the existing research literature on reading, a number of variables have been found to have a local effect on the lengths of saccades. As one example, there is a marked effect of word frequency on the likelihood of refixating the same word (e.g., McConkie et al., 1989; O'Regan et al., 1994; Rayner, Sereno and Raney, 1996). The effect is basically a vertical displacement of the refixation function, as depicted in Fig. 4 (left panel), which results from an increase in the amplitudes of saccades launched from the initial fixation within higher-frequency words as compared to saccades launched from lower-frequency words. In most cases the sizes of these saccade length effects are less than one letter position (e.g., 0.82 letters for the observations in Fig. 4, right panel). This can give the appearance of an eye movement control system that adjusts the lengths of saccades in a graded fashion, slightly increasing or decreasing the lengths of saccades according to the local processing requirements during reading.
The alternative to this position, and the one we espouse, is that the lengths of saccades are determined in a more discrete fashion, resulting primarily from which words are selected as targets of those saccades. In harmony with Morrison (1984), Rayner, Sereno and Raney (1996) and others, we propose that eye movement control is primarily a discrete process, with most variables on the perceptual and cognitive level influencing which word is selected as the target of the next saccade (footnote 6). Thus, the observed effect of a local variable on saccade length is interpreted as being probabilistic in nature, with that variable having an influence on some underlying attractiveness level of the words that serve as potential candidates as target for the next saccade. Since the eyes can be sent to only one of these candidates on a given saccade, a selection is made through some winner-take-all competition.
For many saccades, one candidate is so strong that small changes in candidate attractiveness have no effect on the outcome; for other saccades, where alternative words have attractiveness values that are sufficiently similar, such an influence can cause a different word to be selected than would otherwise occur. In such a case, the saccade length will change by several letter positions; in extreme cases, the direction may even change.
Within the framework just described, finding that a variable decreases the lengths of saccades by an average of 0.5 letter positions essentially indicates that the lengths of the great majority of saccades were not affected at all by the variable, even though an influence of the variable may have been present. For only a small proportion of the cases was the influence of the variable being studied great enough, and the existing selection likelihood structure of alternative words similar enough, for the effect of the variable to be realized in behavior. In these cases, a large change in saccade length occurred, sending the eyes to one word rather than another. Theoretical and methodological problems related to this "frequency of effects" problem in reading have been outlined by McConkie, Zola and Wolverton (1985).
Footnote 6. As noted earlier, some variables, such as those that affect the center of gravity in a word, may influence the landing position within the word, thus being a graded influence. We assume that variables of this type account for only a small part of the variance in saccade lengths in reading.
Fig. 4. Left panel: Frequency of refixating the same word as a function of the initial landing position within the word. Data for word length 8-12 from four participants were pooled. Negative numbers on the abscissa indicate initial landing positions left of the word center. The selection of 'infrequent' and 'frequent' words is based on a median split of word form frequency (Celex, 1995). Right panel: Landing positions of progressive saccades following an initial fixation position of -3 relative to the word center (with observations identical to those for -3 in the left panel). Negative numbers on the abscissa indicate landing positions within the same word (refixations); positive numbers indicate landing positions of progressive inter-word saccades.
An example of this type of influence can be seen in the case of the effect of the frequency of a fixated word on the length of the following saccade. The two local variables that appear to have the largest effect on the likelihood that a given word will be selected as the target of the next saccade are its location with respect to the currently fixated letter and its length (Kerr, 1992; McConkie, Kerr and Dyre, 1994; see also Chapter 6). To see other influences it is necessary to control for these two variables, which can be done by selecting cases from a large corpus of reading data as we have described above. Figure 4 (right panel) shows a frequency distribution of landing positions in all cases where the saccade's launch site was three letter positions left of the center of the currently fixated word (word length 8-12). As can be seen, the frequency distribution is markedly bimodal, suggesting two populations of saccades, one keeping the eyes on the currently fixated word (refixations) and the other taking the
eyes to a following word. The two curves are formed by taking a median split of the data based on the cultural frequency of the currently fixated word, including all words of our corpus that are listed in the German CELEX Corpus. Figure 4 shows that the frequency of the word does not simply shift the distribution left or right, as would happen if the effect were graded. Rather, word frequency influences how many of the saccades remain on the current word vs. go to the next, suggesting a discrete choice between alternative possible saccade targets. The curves for high- and low-frequency words both appear to be the sum of the same two underlying distributions, having similar modes; word frequency affects the number of cases in the two component distributions, reflecting the frequency with which the two alternative candidate words are selected as the target of the saccade.
When launch site and the lengths of immediately surrounding words are controlled in the manner illustrated above, the landing position distributions actually change shape as word length configurations change. Bimodal distributions of the type shown in Fig. 4 are frequently observed, suggesting that choices are being made among alternative potential saccade targets. This is consistent with an eye movement control mechanism that is discrete and probabilistic in operation, rather than one that adjusts saccade lengths in a graded fashion on the basis of local variables.
Some theoretical issues
We have argued that eye movement control during reading is a discrete process, involving the selection of a target word, and the somewhat error-prone process of moving the eyes to that word. In this section we discuss several issues related to the landing position distributions on words.
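The discrete-selection argument of the preceding section can be made concrete with a small simulation. The Python sketch below is only an illustration under stated assumptions: the two component means, the placement error, and the two selection probabilities are hypothetical numbers, not estimates from the corpus described in this chapter. It treats saccade length as a two-component mixture (refixation vs. next word) in which word frequency changes only the mixing proportion, and then asks what share of saccades must have switched target to yield the observed shift in mean length.

```python
import random
import statistics

# All numeric values below are illustrative ASSUMPTIONS, not corpus estimates.
REFIXATION_MEAN = 3.0   # assumed mean length (letters) of a saccade that stays on the word
NEXT_WORD_MEAN = 10.0   # assumed mean length (letters) of a saccade to the next word
PLACEMENT_SD = 1.2      # assumed random placement error (letters)


def simulate_saccade_lengths(p_next_word, n=20000, seed=1):
    """Draw saccade lengths from a two-component mixture.

    Each saccade first selects a target (the next word with probability
    p_next_word, otherwise the current word), then lands near that
    target's mean with Gaussian placement error.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n):
        target_mean = NEXT_WORD_MEAN if rng.random() < p_next_word else REFIXATION_MEAN
        lengths.append(rng.gauss(target_mean, PLACEMENT_SD))
    return lengths


def proportion_of_saccades_changed(mean_shift, mode_separation):
    """Share of saccades that must switch target to produce a given mean shift."""
    return mean_shift / mode_separation


low_freq = simulate_saccade_lengths(p_next_word=0.50)   # hypothetical low-frequency words
high_freq = simulate_saccade_lengths(p_next_word=0.62)  # hypothetical high-frequency words
mean_shift = statistics.mean(high_freq) - statistics.mean(low_freq)
share = proportion_of_saccades_changed(mean_shift, NEXT_WORD_MEAN - REFIXATION_MEAN)
print(f"mean shift: {mean_shift:.2f} letters; saccades changed: {share:.0%}")
```

With these assumed values, a shift of less than one letter in mean saccade length arises from roughly one saccade in eight changing its target, while the two modes of the distribution stay where they are.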
Launch site contingent landing position distributions are very well behaved, being Gaussian in shape, with a mean that is a linear function of the launch site (with certain exceptions), and a standard deviation that increases with launch distance (McConkie et al., 1988; see also McConkie et al., 1990; McConkie, Kerr and Dyre, 1994). We have attributed the linear landing position function to a saccadic range error. Vitu (1991a, 1991b) has argued for an alternative possibility. As we do, she assumes that a word is selected to be the target of the following saccade. However, she then proposes that the eyes go to the center of gravity of an attended region extending about seven letters from the beginning of the selected word. The blank space at the end of the attended word is thought to have sufficient influence on this center of gravity to account for changes in mean landing position; the farther the space lies to the right, the further to the right lies the center of gravity within this attended region. This theory can successfully predict that landing positions within longer words that begin at the same distance to the right of the launch site tend to lie further from the words' beginnings (Vitu, 1991a). However, it does not provide a basis
for predicting the landing position function: changes in mean landing position within words of the same length lying at different distances from the launch site. In contrast, in our analyses if landing position is measured relative to the centers of words, then word length has very little effect on the landing position distribution, whereas distance of the launch site from the target word has a large effect. Thus, we do not see center of gravity within an attended area as an alternative to the concept of the range effect for accounting for the strong, linear landing position function that exists in reading data. Rayner and Morris (1992) as well as Rayner, Sereno and Raney (1996) have suggested that the landing position within words is determined by which letters in the word have been previously identified, with the eyes tending to go beyond the identified letters, similar to the proposal made by McConkie (1979). This raises the question of how landing position distributions are affected by cognitive factors. In discussing this issue it is critical to distinguish between factors that influence the selection of the saccade target, and those that influence where the eyes land with respect to that word. Clearly, cognition does affect where the eyes go during reading; the question here is whether it influences where the eyes land with respect to a selected word, or only affects the selection process itself. The preprocessing hypothesis could be the basis for an alternative explanation of the range effect: namely, the farther the launch site is from a target word, the fewer the letters that are likely to have been identified from it, thus resulting in the eyes landing closer to the front of the word. However, two aspects of our data are not in harmony with this explanation. First, the slope of the landing position function tends to be in the range of 0.35. 
Thus, for every letter position farther from a word that a saccade to it is launched, the landing position in the word moves leftward by 0.35 letter positions. However, as distance of the eye from the target word is increased, the visibility of the word should fall off faster than this. Second, the landing position function continues to be linear for all launch site distances examined so far, whereas we would expect that a visibility-based effect would be negatively accelerated, especially at more distant launch sites where a word's visibility would be minimal (Rayner and Morris, 1981; Nazir, Heller and Sussmann, 1992).
There is another possible interpretation of the preprocessing hypothesis in the current context; namely, that the number of letters identified from a target word before sending the eyes to it might affect the variability that we have called the random placement error. This is the source of error that produces the Gaussian shape of the landing position distribution and is indexed by the standard deviation of that distribution. We assume that this error arises from a combination of sources: natural lack of precision in saccadic programming, especially with limited response times; error in the accuracy of the eyetracking device; and variation in the center of gravity within the word, which results from interword variation in the distribution of letters of different levels of density, reflected here as the number of pixels intensified in
creating each letter. However, it is also possible that there may be systematic variability in landing positions related to the number of letters peripherally identified from the target word, a factor that itself may vary from saccade to saccade, thus contributing to the spread of the distribution. This raises the question of whether any of the variability in these distributions is due to momentary variation in aspects of the reader's cognitive state. This must be a matter for further investigation, but we doubt that such influence will be found.
Conclusions
In accounting for eye behavior during reading, McConkie et al. (1988) argued for a word-based control system, with a sharp distinction between selecting a word to serve as the saccade target, and the process of getting the eyes to the target word. They developed the beginnings of a quantitative model to account for data concerning where the eyes land in words, or landing position distributions. This model has been further confirmed and extended by Radach and his co-workers (Radach and Kempe, 1993; Radach, 1996). The primary goal of the current investigation was to determine whether this model, which was originally developed to account for progressive inter-word saccades, can be extended to two other conditions: refixations and interword regressive saccades. The original model appears to account for refixations, which raises questions about the sharp distinction that is often made in theories of reading between progressive and refixation saccades. However, interword regressive saccades clearly show different landing position characteristics than do the other cases. Neither word length nor distance from launch site had much effect on the landing positions of these saccades, and the saccadic range effect, typically found with progressive saccades, was absent.
Interword regressions appear to be sent to the centers of words, with landing position distributions in which "optimal" and "preferred" viewing positions are very similar. Thus, the original model must be modified to account for regressive inter-word saccades, and further research is needed to understand the basis for the difference. Interestingly, intraword regressive saccades (refixations) did not show these unusual characteristics. Several issues were explored regarding the basis for eye movement control during reading. We have argued that eye movement control is discrete, based on the selection of word targets, rather than being a graded type of control, and have explored some of the implications of this position. In particular, we pointed out that even when a variable produces a significant effect on saccade length, it is likely that the lengths of relatively few of the saccades may have actually been changed. We also examined and argued against alternative positions to our model: specifically, that the landing position function may be due to center of gravity effects, and that the landing positions in selected words may be due to parafoveal letter processing rather than visuo-oculomotor factors.
In sum, we have argued that this one aspect of eye behavior during reading, namely, where the eyes go with respect to selected saccade target words, is the result of low-level visuo-oculomotor control factors, almost completely unaffected by higher cognitive processes. However, this should not be taken to indicate that we think there is no role for cognition in eye guidance. After all, the purpose of reading is to understand text and eye movements are made to serve this purpose. Elsewhere in this book (Chapters 6 and 11) it is argued that cognition plays a significant role in selecting words as saccade targets. We agree with much of what is said on the issue of "word skipping" in these chapters (see also McConkie, Kerr and Dyre, 1994). In this chapter we have presented evidence for a cognitive component, in the form of a word frequency effect, in the decision of whether to refixate or make an inter-word saccade. Finally, in the case of regressive inter-word saccades, the saccade parameters we have looked at suggest a control mode different from the low-level default routines. Although we have not investigated this in the present chapter, it is very likely that not only the large 're-inspection' regressions (Kennedy and Murray, 1987) but also many regressions coming from an adjacent word are based on cognitive grounds. The determination of landing positions within selected saccade targets, is highly regular and can be quantitatively described using models that account for most of the variance in the data. We suggest that investigators who obtain effects of cognitive factors on saccade landing positions consider the possibility that these factors are operating on the target selection stage rather than directly modulating saccade amplitude computation. Separating out these two sources of variance can be a difficult task. 
The predominant low-level determination of saccade landing positions within selected target words can be good news for researchers studying psycholinguistic processes: while there must be controls for this aspect of eye behavior, it does not appear to be a source of useful dependent variables, thus reducing somewhat the complexity of data analysis.
Acknowledgements
Preparation of this chapter was supported in part by grant No BMH1-CT94-1441 from the European Union under the BIOMED Programme. The authors are indebted to Dieter Heller, Albrecht Inhoff, Wayne Murray and two anonymous reviewers for helpful discussions and comments on earlier drafts of the manuscript.
References
Altmann, G., Garnham, A. and Dennis, Y. (1992). Avoiding the garden path: Eye movements in context. Journal of Memory and Language, 31, 685-712.
Becker, W. (1989). Metrics. In: R.H. Wurtz and M.E. Goldberg (Eds.), The Neurobiology of Saccadic Eye Movements. Amsterdam: Elsevier, pp. 13-61.
CELEX German database. Release D25. Computer software. Nijmegen: Centre for Lexical Information, 1995.
Daneman, M., Reingold, E.M. and Davidson, M. (1995). Time course of phonological activation during reading: Evidence from eye fixations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 884-898.
Findlay, J.M. (1982). Global processing for saccadic eye movements. Vision Research, 22, 1033-1045.
Heller, D. (1982). Eye movements in reading. In: R. Groner and P. Fraisse (Eds.), Cognition and Eye Movements. Berlin: Deutscher Verlag der Wissenschaften, pp. 139-154.
Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16 (3), 417-429.
Hofmeister, J. (1997). Über Korrektursakkaden beim Lesen und bei leseaehnlichen Aufgaben (On corrective saccades in reading and reading-like tasks). Unpublished doctoral dissertation, Technical University of Aachen.
Hyönä, J., Niemi, P. and Underwood, G. (1989). Reading long words embedded in sentences: Informativeness of word halves affects eye movements. Journal of Experimental Psychology: Human Perception and Performance, 15, 142-152.
Hyönä, J. (1995). Do irregular letter combinations attract readers' attention? Evidence from fixation locations in words. Journal of Experimental Psychology: Human Perception and Performance, 21, 68-81.
Inhoff, A.W. (1989). Parafoveal processing of words and saccade computation during eye fixations in reading. Journal of Experimental Psychology: Human Perception and Performance, 15, 544-555.
Kennedy, A. (1992). The spatial coding hypothesis. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer.
Kerr, P.W. (1992). Eye movement control during reading: The selection of where to send the eyes. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Kliegl, R., Olson, R.K. and Davidson, B.J. (1983). On problems of unconfounding perceptual and language processes. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes. New York: Academic Press, pp. 333-343.
Kapoula, Z. and Robinson, D.A. (1986). Saccadic undershoot is not inevitable: Saccades can be accurate. Vision Research, 26 (5), 735-743.
McConkie, G.W. (1979). On the role and control of eye movements in reading. In: P.A. Kolers, M.E. Wrolstad and H. Bouma (Eds.), Processing of Visible Language, Vol. I. New York: Plenum Press, pp. 37-48.
McConkie, G.W. (1981). Evaluating and reporting data quality in eye movement research. Behavior Research Methods and Instrumentation, 13, 97-106.
McConkie, G.W. and Zola, D. (1984). Eye movement control during reading: The effects of word units. In: W. Prinz and A.T. Sanders (Eds.), Cognition and Motor Processes. Berlin: Springer, pp. 63-74.
McConkie, G.W., Zola, D. and Wolverton, G.S. (1985). Estimating frequency and size of effects due to experimental manipulations in eye movement research. In: R. Groner, G.W. McConkie and C. Menz (Eds.), Eye Movements and Human Information Processing. Amsterdam: Elsevier, pp. 137-147.
McConkie, G.W., Grimes, J.M., Kerr, P.W. and Zola, D. (1990). Children's eye movements during reading. In: J.F. Stein (Ed.), Vision and Visual Dyslexia.
McConkie, G.W., Kerr, P.W. and Dyre, B.P. (1994). What are 'normal' eye movements during reading: Toward a mathematical description. In: J. Ygge and G. Lennerstrand (Eds.), Eye Movements in Reading. Oxford: Elsevier, pp. 315-327.
McConkie, G.W., Kerr, P.W., Reddix, M.D. and Zola, D. (1988). Eye movement control during reading: I. The location of initial eye fixation on words. Vision Research, 28, 1107-1118.
McConkie, G.W., Kerr, P.W., Reddix, M.D., Zola, D. and Jacobs, A.M. (1989). Eye movement control during reading: II. Frequency of refixating a word. Perception and Psychophysics, 46, 245-253.
McConkie, G.W., Wolverton, G.S. and Zola, D. (1984). Instrumentation considerations in research involving eye-movement contingent stimulus control. In: A.G. Gale and F. Johnson (Eds.), Theoretical and Applied Aspects of Eye Movement Research. Amsterdam: North-Holland.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682.
Nazir, T.A., Heller, D. and Sussmann, C. (1992). Letter visibility and word recognition: The optimal viewing position in printed words. Perception and Psychophysics, 52 (3), 315-328.
O'Regan, J.K. (1990). Eye movements and reading. In: E. Kowler (Ed.), Reviews of Oculomotor Research, Vol. 4. Eye Movements and Their Role in Visual and Cognitive Processes. Amsterdam: Elsevier.
O'Regan, J.K. (1992). Optimal viewing position in words and the strategy-tactics theory of eye movements in reading. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer Verlag.
O'Regan, J.K., Lévy-Schoen, A., Pynte, J. and Brugaillère, B. (1984). Convenient fixation location within isolated words of different length and structure. Journal of Experimental Psychology: Human Perception and Performance, 10, 250-257.
O'Regan, J.K., Vitu, F., Radach, R. and Kerr, P. (1994). Effects of local processing and oculomotor factors in eye movement guidance in reading. In: J. Ygge and G. Lennerstrand (Eds.), Eye Movements in Reading. New York: Pergamon Press.
Paap, K.R., Newsome, S.L., McDonald, J.E. and Schvaneveldt, R.W. (1982). An activation-verification model for letter and word recognition: The word-superiority effect. Psychological Review, 89, 573-594.
Pollatsek, A., Rayner, K. and Balota, D.A. (1986). Inferences about eye movement control from the perceptual span in reading. Perception and Psychophysics, 40, 123-130.
Pollatsek, A. and Rayner, K. (1990). Eye movements and lexical access in reading. In: D.A. Balota, G.B. Flores d'Arcais and K. Rayner (Eds.), Comprehension Processes in Reading. Hillsdale, NJ: Erlbaum, pp. 143-164.
Poulton, E.C. (1981). Human manual control. In: V.B. Brooks (Ed.), Handbook of Physiology, Sect. 1, Vol. II, Part 2. Bethesda: American Physiological Society, pp. 1337-1389.
Pynte, J., Kennedy, A. and Murray, W.S. (1991). Within-word inspection strategies in continuous reading: Time course of perceptual, lexical and contextual processes. Journal of Experimental Psychology: Human Perception and Performance, 17 (2), 458-470.
Radach, R. (1996). Blickbewegungen beim Lesen: Psychologische Aspekte der Determination von Fixationspositionen (Eye movements in reading: psychological aspects of the determination of fixation positions). Münster/New York: Waxmann.
Radach, R. and Kempe, V. (1993). An individual analysis of fixation positions in reading. In: G. d'Ydewalle and J. van Rensbergen (Eds.), Perception and Cognition. Advances in Eye Movement Research. Amsterdam: Elsevier/North-Holland.
Radach, R., Heller, D., Krummenacher, J. and Hofmeister, J. (1995). Individual eye movement patterns in word recognition: Perceptual and linguistic factors. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Processes, Mechanisms and Applications. Amsterdam: Elsevier/North-Holland.
Radach, R., Heller, D. and Inhoff, A. (1997). Blickbewegungen und kognitive Prozesse: Stand und Perspektiven (Eye movements and cognitive processes: current issues and developments). In: H. Mandl (Ed.), Bericht über den 40. Kongress der Deutschen Gesellschaft für Psychologie (Proceedings of the 40th Congress of the German Psychological Society). Hogrefe, Goettingen.
Radach, R. and Heller, D. Interrelations between spatial and temporal aspects of eye movement control in reading. Manuscript in preparation.
Rayner, K. (1979). Eye guidance in reading: Fixation locations within words. Perception, 8, 21-30.
Rayner, K. and Morris, R.E. (1981). Eye movements and identifying words in parafoveal vision. Bulletin of the Psychonomic Society, 17, 135-138.
Rayner, K. and Morris, R. (1992). Eye movement control in reading: Evidence against semantic preprocessing. Journal of Experimental Psychology: Human Perception and Performance, 18, 163-172.
Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. Boston: Prentice-Hall.
Rayner, K. and Sereno, S. (1994). Regressive eye movements and sentence parsing: On the use of regression-contingent analysis. Memory and Cognition, 22, 281-285.
Rayner, K., Sereno, S.C. and Raney, G.E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22 (5), 1188-1200.
Underwood, G., Bloomfield, R. and Clews, S. (1988). Information influences the pattern of fixation during sentence comprehension. Perception, 17, 267-278.
Underwood, G., Clews, S. and Everatt, J. (1990). How do readers know where to look next? Local information distributions influence eye fixations. Quarterly Journal of Experimental Psychology, 42, 39-65.
Vitu, F. (1991a). The existence of a center of gravity effect during reading. Vision Research, 31, 1289-1313.
Vitu, F. (1991b). Against the existence of a range effect during reading. Vision Research, 31, 2009-2015.
CHAPTER 5
About Regressive Saccades in Reading and Their Relation to Word Identification
Françoise Vitu
Université René Descartes
George W. McConkie
University of Illinois at Urbana-Champaign
and
David Zola
University of Illinois at Urbana-Champaign
Abstract
The purpose of the present study is to describe some of the conditions under which regressive saccades occur in reading for fifth-grade children. The results obtained replicate the prior finding that regressions are more likely to occur following larger progressive saccades (Andriessen and deVoogd, 1973; Lesevre, 1964), and additionally show that regressions are more frequent following a progressive saccade that skips a word. Further analyses indicate that when a word is skipped, the regression likelihood increases with the skipped word length and the distance from the word of the fixation that precedes or follows skipping. The regression probability also tends to be higher for low- than high-frequency skipped words. These results suggest that regressive saccades in reading are not just the result of a pre-determined oculomotor scanning strategy, but are related to the perceptibility of the words. However, the question of the extent to which regressions in reading reflect word processing problems remains open. Indeed, in cases where no word was skipped, several factors that supposedly affect the ease of processing words did not affect the likelihood of regressing, or had an effect opposite to what would be predicted.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
When reading a text, people's eyes move in saccades of variable sizes, which alternate with fixations of variable durations. The most common pattern is the eyes moving forward from one word to the next, with intermediate fixations lasting for about 250 ms. On some occasions, variations in this general pattern might be observed, such as the eyes skipping the next word, or making an additional fixation on the word before moving on to the next word(s). Sometimes, the eyes might also return to previously read portions of the text, making what are commonly called 'regressive' saccades, in contrast to the more frequent progressive saccades (see Lesevre, 1964; Buswell, 1920).
For more than a century, and particularly during the last thirty years, research has been conducted to determine the extent to which this variability in ocular behavior, and particularly that associated with progressive movements, relates to on-going processing of the encountered words and sentences. This research suggests that a single mechanism cannot be responsible for the variability of the ocular behavior in reading. Both higher level lexical and linguistic characteristics of the text and lower level visuomotor constraints and predetermined oculomotor scanning strategies contribute to this variability (for reviews, see Chapters 6 and 11; O'Regan, 1990; Vitu and O'Regan, 1995).
At the same time, previous research has put much less effort into the study of regressive saccades. Most theories of eye movements in reading, including recent ones, make only vague assumptions concerning the determinants of regressive saccades. According to several authors, regressive saccades are associated with high processing levels related to sentence and text comprehension (Bouma and deVoogd, 1974; Buswell, 1920; Hochberg, 1976; Just and Carpenter, 1980; Shebilske, 1975; Shebilske and Fisher, 1983).
If, in the process of reading, a given comprehension stage requires 'a review of previously read text to reencode it or process it to deeper levels', a regressive saccade is initiated (Just and Carpenter, 1980 (p. 337); see also, Carpenter and Just, 1977). This view is supported by several studies which indicate that the likelihood of making a regressive saccade is a function of several different factors related to the difficulty of the text. First, it depends on the global text difficulty, the order of the words, the presence or not of clarifying punctuation, the presence of a change in word meaning from sentence to sentence, the position of key words for text comprehension, and the tense in which the sentence is written (Bayle, 1942; Klein and Kurlowski, 1974; Shebilske and Fisher, 1983; Wanat, 1976). Second, regressive saccades often occur in semantically or syntactically ambiguous sentences where a given word may have different meanings or syntactic functions. In ambiguous sentences, the eyes often regress from the disambiguating region back to the ambiguous region or to the beginning of the sentence (see Rayner and Pollatsek, 1989, for a review). Thus, regressive saccades are related to some extent
Regressive saccades in reading
to semantic and syntactic word integration processes. According to Frazier and Rayner's (1982) garden path model of sentence processing, such regressive saccades occur when the ambiguous word has been incorrectly interpreted during the eyes' first encounter with it. All regressive saccades that occur in reading might not be a result of high processing levels related to sentence and text comprehension, but some might result from lower-level processes related to word identification and/or saccadic programming. First, there is not a perfect temporal matching between the on-going processing of the encountered words and the ocular activity; processing of a word sometimes lags behind the ocular activity (Schroyens et al., 1998; see also Chapters 2 and 11). The fixation and gaze durations on a word are affected not only by the word's lexical/linguistic properties, but also by those of the previously fixated word. This finding, referred to as 'spillover effect', suggests that the processing induced by a word might not be complete when the eyes move on to another word. Increasing the time the eyes spend on a subsequently fixated word might not be the only example of delayed processing effects. Another way to complete the processing of a previously fixated word might be to bring the eyes back to it. Thus, at least some regressive saccades in reading might result from processing difficulties at the lexical level, such as word recognition failures or failures to complete the processing induced by a word before the eyes moved on to another word (Bouma, 1978; Bouma and deVoogd, 1974; Pollatsek and Rayner, 1990; Shebilske, 1975). Finally, progressive saccades in reading are subject to inaccuracies in eye positioning, which result in unintended skipping of a word, or landing at variable locations in the words (see O'Regan, 1990; and Chapter 6). 
Regressive saccades might occur in response to such oculomotor aiming errors, bringing the eyes back to the intended location (Taylor, 1971).

As one example of lower-level influences, within-word regressive saccades are more likely to occur when the eyes initially land near the end of a word than when they land in the middle of the word (O'Regan and Levy-Schoen, 1987; McConkie et al., 1989; Rayner, Sereno, and Raney, 1996; Vitu, O'Regan and Mittau, 1990; see also Pynte, 1996). This result might support the notion that regressive saccades are initiated in response to word identification processes. Indeed, because of visual acuity limitations, word processing is more efficient when the eyes are near the center of the word, and an additional fixation is often required when the eyes do not initially land at that optimal location (Nazir, O'Regan and Jacobs, 1991; O'Regan, 1990). However, in subjects scanning meaningless letter strings where word and letter identification are not involved, the proportion of regressive saccades is very similar to that observed in normal reading (Vitu et al., 1995; see also Rayner and Fischer, 1996 for slightly different results). Furthermore, as in reading, there is a relationship between the eyes' initial location in a letter string and the occurrence of a regressive saccade within the string. This suggests that some within-word regressive
saccades in reading result from an oculomotor strategy (O'Regan, 1990; Vitu and O'Regan, 1995) rather than from the necessities of on-going processing associated with the fixated words. The strategy would consist of making an additional fixation on a word when the first fixation is mislocated.

It has also been noted that regressive saccades are more likely to occur following longer forward saccades (Andriessen and deVoogd, 1973; Lesevre, 1964). The basis for this relationship is not clear. A longer saccade is more likely to skip over a word, which may necessitate returning the eyes to that word later. Longer saccades are also less accurate, thus favoring the likelihood of ill-placed fixation locations. In addition, during the fixation prior to a long saccade, less visual information is available from the saccade target word, since it lies further into the visual periphery. Thus, regressive saccades after long forward saccades might result from oculomotor aiming errors, or word recognition failures. Finally, it is possible that these regressive saccades simply result from a strategy of moving backwards after making a large progressive saccade.

The above review yields four possible bases for making regressive saccades in reading: (1) high-level comprehension failure, (2) failure to fully identify a fixated or skipped word prior to moving the eyes, (3) mislocation of the eyes, resulting in a corrective saccade, and (4) a pre-determined oculomotor strategy that produces regressive saccades under certain conditions, such as when the prior saccade was large, or the eyes land near the end of a word. It should be noted that these are not exclusive of one another; more than one of these could be operating at the same or different times during reading.
The present study is an attempt to describe some of the conditions under which regressive saccades occur in normal reading in order to begin to determine to what extent each of the above explanations underlies the making of regressions. This was done through the analysis of a large corpus of eye movement data from fifth-grade children reading half a novel appropriate to their age. This corpus was selected because fifth-grade children make more regressions than do adults, while still showing patterns similar to those of adults (Vitu, McConkie, Kerr, and O'Regan, in preparation). For example, both fifth-grade children and adults are less likely to refixate a word following an initial fixation near its center than at its ends (the Refixation Optimal Viewing Position effect), and are less likely to refixate a high- than a low-frequency word. Fixation durations also present similar patterns for both fifth-grade children and adults as a function of both the initial fixation location and the word frequency. Since we did not have access to language variables or independent behavioral data that could be used to indicate comprehension failure, we were unable to test the high-level comprehension failure hypothesis. Instead, the analyses were directed at testing between explanations of regressions that are based on word processing factors and those that assume strictly oculomotor strategies. To accomplish this,
several factors were identified that would be expected, on the basis of past research, to affect word processing. These included visual factors which affect the visibility of the word's constituent letters (word length, distance of the word from the current fixation location, fixation location within the word), word lexical characteristics (word frequency), as well as eye movement related factors (whether or not a word had been refixated). If these word-processing-related variables are found to affect the likelihood of making regressive saccades, this stands as evidence against a strictly oculomotor strategy as a basis for producing regressions. If word lexical characteristics, such as the word frequency, affect the likelihood of regressing in appropriate ways, this clearly suggests that word processing difficulties may underlie regressions.

Method

Thirty fifth-grade children (approximately 12 years old) participated in the experiment which was run at the Center for the Study of Reading (University of Illinois at Urbana-Champaign). Subjects' eye movements were recorded as they read the first six chapters of 'Old Yeller', a children's novel, written by Frederick B. Gipson. Their eyes were monitored using an SRI Dual-Purkinje Image Eyetracker (Cornsweet and Crane, 1973), sampling eye position every millisecond. The text was presented on a computer screen, seven lines at a time in triple-spaced format. The reader advanced the page by pressing a button. Subjects were asked to read the story for meaning and were tested for comprehension after each chapter. A set of 395,415 eye fixations was obtained. For the following analyses, only the cases in which there was no blink or other signal irregularity present were considered. Furthermore, since within- and between-line saccades might result from different mechanisms, we restricted the present analysis to within-line regressive saccades.
All between-line saccades as well as the short regressive saccades that often occur after a return sweep were eliminated. Analyses that consisted of measuring the regression likelihood considered only cases where the prior saccade was progressive. In addition, both the origin word from which the progressive saccade was initiated and the destination word where the progressive saccade terminated had never previously been fixated. Finally, for the particular cases where the progressive saccade led the eyes to skip one or several words, only cases where the skipped word had never been fixated before being skipped were considered. After selection, the total number of saccades available for analysis was 76,561, with a total of 31,390 cases in which a word was skipped (3843, 8107, 11632, 5519, 1383, respectively for 1- to 5-letter skipped words), and 45,171 cases in which the saccade took the eyes to the immediately following word (1271, 4047, 11493, 11291, 6664, 5233, respectively for 1- to 6-letter origin words in non-skip cases).
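As an illustration only, the selection criteria described above can be restated as a filter over saccade records. The record layout and every field name below are hypothetical (the chapter does not describe the format of its data files); the sketch simply puts the stated criteria in executable form.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PriorSaccade:
    # All field names are hypothetical; they only mirror the criteria in the text.
    has_irregularity: bool                 # blink or other signal irregularity
    crosses_line: bool                     # between-line saccade
    follows_return_sweep: bool             # short saccade right after a return sweep
    is_progressive: bool                   # moves forward in the text
    origin_previously_fixated: bool        # origin word fixated on an earlier encounter
    destination_previously_fixated: bool   # destination word fixated earlier
    # One flag per skipped word; empty when no word was skipped.
    skipped_previously_fixated: List[bool] = field(default_factory=list)


def qualifies(s: PriorSaccade) -> bool:
    """Return True if the saccade meets all selection criteria listed in the Method."""
    if s.has_irregularity or s.crosses_line or s.follows_return_sweep:
        return False
    if not s.is_progressive:
        return False
    if s.origin_previously_fixated or s.destination_previously_fixated:
        return False
    # Every skipped word must never have been fixated before being skipped.
    return not any(s.skipped_previously_fixated)
```

Under these assumptions, a progressive saccade that skips a word already fixated on an earlier pass would be excluded, while one that skips a never-fixated word would be retained.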
For each dependent variable, means or proportions were calculated for each subject, and these were then averaged across subjects. In this way, the weight of individual subjects' contributions to the final values was not influenced by the number of fixations that qualified for a particular condition. Analyses of variance were run on means or proportions obtained from the different individual subjects.

Results

General characteristics

For the population being tested, 28% of all within-line saccades were regressive. The average size of within-line regressive and progressive saccades was -4.6 and 6.8 letters, respectively, and the average fixation duration was 252 ms. Forty percent of the progressive saccades skipped a word, and among the fixated words, 21% received at least two consecutive fixations during their first encounter (first pass). Figure 1a presents the frequency distribution of the lengths of within-line regressive saccades. Saccades of 1 to 3 characters in length were most frequent, accounting for 54% of all regressions, but some are much longer. The proportion of
Fig. 1. Distributions of the size of within-line regressive saccades when expressed in letters (1a) or words (1b).
very long regressive saccades (more than 14 characters) is small, being less than 3%. Figure 1b shows the lengths of these saccades in terms of words, with 0 being refixations of the same word. Refixations and regressions to the immediately preceding word account for 85% of all regressions. Further analyses show that 80% of regressive saccades are preceded by a progressive eye movement, and in these cases, regressions are more likely to occur after the previous word was skipped (49%) than after the previous word was fixated (39%) or after a refixation of the same word (12%).

Effects of prior saccade length and word skipping

Investigators (Andriessen and deVoogd, 1973; Lesevre, 1964) have reported that regressive saccades are more likely to occur following longer progressive saccades. Figure 2a presents the likelihood of regressing following saccades of different lengths in the current data set, and indicates that this same relationship holds in the present data, F(15,435) = 49.06, p < 0.0005. However, the rise in regression likelihood only begins with saccades of 6 letter positions; shorter saccades show very little relationship, though there is a small local maximum at 2 letter positions and minimum at 5. Since longer saccades are more likely to skip words, it is possible that it is actually the skipping of words that leads to regressions. Figure 2b shows the likelihood of regressing following saccades of different lengths in cases where a word is skipped or is not skipped. Ns for long saccades that do not skip words become quite low. The figure shows that, for most saccade lengths, the likelihood of regressing is indeed increased when a word is skipped, F(1,29) = 15.80, p < 0.0005. However, for both word skip and non-word-skip cases, there is still a strong relationship between saccade length and regression likelihood, F(12,348) = 45.54, p < 0.0005 and F(11,319) = 15.92, p < 0.0005, respectively, with a significant interaction, F(9,261) = 2.09, p < 0.05.
Thus, while skipping words does increase the likelihood of regressing, this is not the sole explanation for the prior saccade length effect.

Effects of skipped word length

It is possible that the perceptibility of a word skipped by a progressive saccade might affect the likelihood of regressing after skipping it, because this influences the likelihood of being able to identify it on either the fixation prior to, or the fixation following, the saccade. A first way to test this hypothesis is to look at the effects of the skipped word length on the regression likelihood. Indeed, as a result of visual acuity limitations, the perceptibility of a word depends on its length: the longer a word is, the fewer letters from the word are visible when it is located in
Fig. 2. Probability of making a regressive saccade after a progressive saccade of variable length (2a), and separately for cases where the progressive saccade skips a word or not (2b).
parafoveal vision, and the lower the identification probability for that word (Brysbaert, Vitu and Schroyens, 1996). Since longer words are less frequently skipped by children, the current analysis was restricted to words of length 1 to 5. In addition, to increase the Ns underlying the proportions, prior saccade lengths were grouped into bins of two letter positions. Figure 3 shows that skipped word length does indeed affect the likelihood of a regression occurring, when saccade length is controlled (between 6 and 14 letters), F(4,116) = 12.42, p < 0.0005. The regression likelihood is higher following skipping of 2- and 3-letter words (0.45 and 0.47, respectively) than following skipping of 1-letter words (0.35), and it is higher following skipping of 4- and 5-letter words (0.53 and 0.54, respectively) than following skipping of shorter words. The effect of skipped word length remains significant when only tested on 2- to 5-letter skipped words, F(3,87) = 3.12, p < 0.05. Finally, progressive saccades that skip words of all lengths studied still show a significant effect of saccade length (at the 0.0005 level), longer saccades being more frequently followed by regressions, F(4,116) = 14.33, F(5,145) = 14.34, F(6,174) = 30.51, F(6,174) = 33.70, F(6,174) = 8.61, for words of length 1-5. There was no interaction between saccade length and skipped word length, F(12,348) = 0.78, for prior saccade lengths between 6 and 14 letters.
[Figure 3. Horizontal axis: prior saccade length (letters), in bins from ]0,2] to ]16,20].]
Fig. 3. Probability of making a regressive saccade after a word is skipped with a progressive saccade, as a function of both the progressive saccade length and the skipped word length.
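The two computational steps used in these analyses, grouping prior saccade lengths into bins of two letter positions and averaging per-subject proportions rather than pooling all cases, can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the `(subject, length, regressed)` record layout are assumptions, and saccade lengths are taken to be whole letter positions of at least 1.

```python
from collections import defaultdict
from statistics import mean


def length_bin(length, width=2):
    """Return the bin ]lo, lo + width] that contains an integer saccade length >= 1."""
    lo = ((length - 1) // width) * width
    return (lo, lo + width)


def regression_likelihood(records):
    """Average per-subject regression proportions within each saccade-length bin.

    records: iterable of (subject, prior_saccade_length, followed_by_regression).
    Each subject contributes one proportion per bin, so subjects with many
    qualifying saccades do not outweigh subjects with few.
    """
    counts = defaultdict(lambda: [0, 0])  # (subject, bin) -> [regressions, total]
    for subject, length, regressed in records:
        cell = counts[(subject, length_bin(length))]
        cell[0] += int(regressed)
        cell[1] += 1

    per_bin = defaultdict(list)  # bin -> list of per-subject proportions
    for (subject, b), (n_regressions, n_total) in counts.items():
        per_bin[b].append(n_regressions / n_total)
    return {b: mean(proportions) for b, proportions in per_bin.items()}
```

For example, if one subject regresses after 1 of 4 saccades in a bin and another after 1 of 1, the averaged likelihood is (0.25 + 1.0) / 2 = 0.625, whereas pooling the five cases would give 0.4.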
Effect of fixation location with respect to a skipped word

Another factor that affects the perceptibility of a skipped word is the distance of the eyes from that word: the further the eyes from the word, the lower the probability of correctly identifying that word (Brysbaert et al., 1996). We therefore investigated whether the location of a skipped word, with respect to the position of the eyes prior to the progressive saccade that skipped it, affects the likelihood of regressing following that saccade. Figures 4a-d show the relationship between prior saccade size and regression likelihood following progressive saccades initiated from different distances (here called 'launch sites') to the left of a skipped word (of two to six letters, the data for 5- and 6-letter words being grouped together). In order to increase the Ns underlying the proportions, launch sites were grouped into bins of two letter positions. As Figs. 4a-d clearly indicate, regression likelihood increases with launch site when saccade length is controlled, but this applies only to words longer than 2 letters; data for 2-letter words show no difference between adjacent launch sites, or an effect opposite to what was predicted. Furthermore, there is a progression such that the longer the word is, the larger the effect of launch site is. While for 3-letter
words, the difference mainly occurs between cases where the eyes were launched from 0 to 2 letters from the space before the word, and cases where the eyes were launched from further away, for longer words, differences between the different adjacent launch sites start to emerge. To test whether the differences being observed were significant, pair comparisons were made for each word length between each two adjacent launch site intervals (the prior saccade length in a given comparison pair was held constant). The results obtained show that none of the comparisons made for two-letter skipped words are significant, Fs(1,29) < 3.45. In contrast, for all three other tested word lengths, there is a significant difference between launch site intervals [0,2] and [2,4], F(1,29) = 5.89, p < 0.05, F(1,29) = 5.56, p < 0.05, F(1,29) = 9.11, p < 0.005, respectively for 3-, 4-, and 5/6-letter skipped words, but no significant difference between other launch site intervals, Fs < 2.77. The interaction between launch site and prior saccade length was never significant, Fs < 2.24. In addition, for all three tested word lengths, saccade length does continue to produce a significant effect when launch site is controlled, except in a few cases where the analysis relies on a smaller N, F(2,58) = 3.08, p < 0.05, F(2,58) = 4.73, p < 0.05, F(2,58) = 8.52, p < 0.0005, F(2,58) = 1.88 (2-letter skipped words), F(2,58) = 5.59, p < 0.01, F(2,58) = 13.72, p < 0.0005, F(2,58) = 13.22, p < 0.0005, F(1,29) = 5.63, p < 0.05 (3-letter skipped words), and F(2,58) = 9.40, p < 0.0005, F(2,58) = 2.90, p < 0.05, F(2,58) = 1.15, F(1,29) = 0.32 (4-letter skipped words), F(1,29) = 1.75, F(1,29) = 0.003, F(1,29) = 0.008 (5- and 6-letter skipped words).

An interpretation of these data can be made by considering where the progressive saccade lands with respect to the skipped word. For instance, in Fig.
4b which presents data for 3-letter skipped words, the first data point of each curve represents data for cases where the eyes land on the space following the skipped word or the first letter of the immediately following word; the second data point represents data where the eyes land on the first, second or third letter of the immediately following word; the third data point represents data where the eyes land on the third, fourth or fifth letter of the immediately following word. Thus, it appears that the likelihood of regressing is depressed if the eyes land immediately following a skipped word, with this likelihood rising as this distance increases. These observations suggest that when a word is skipped, the likelihood of regressing is not just a simple function of the length of the progressive saccade that skipped the word. Rather, this likelihood is a function of the eyes' location preceding
Fig. 4. Probability of making a regressive saccade after a word is skipped with a progressive saccade, as a function of both the progressive saccade length and the launch site from which the progressive saccade is initiated, separately for 2- (4a), 3- (4b), 4- (4c), and 5- and 6-letter (4d) skipped words. Launch sites are measured relative to the space before the prior skipped word and are grouped in four intervals: [0,2], [2,4], [4,6], [6,8].
and following skipping. This pattern is compatible with a word perceptibility (or lexical) explanation of these regressions: regressions to a skipped word are needed if the launch site is further from it and the word is at least 3 letters long (resulting from less information being acquired from it), but not if the eyes land so close to it that it can then be identified on the fixation following the saccade. Thus, it appears that the saccade length effect on regression likelihood, at least in skip cases, may actually be the result of word-based processes taking place: the longer the saccade, the less likely a skipped word might be identified in parafoveal vision (before and/or after skipping), and the more likely a regressive saccade is initiated.

Effects of skipped word frequency

As shown in several studies, the perceptibility of a skipped word is not only a function of factors that affect the visibility of its constituent letters, but it is also a function of its frequency of occurrence in the language. Indeed, the less frequent a word is, the smaller the benefit that results from the possibility of seeing the word in parafoveal vision (Inhoff and Rayner, 1986; Vitu, 1991). To further test the lexical hypothesis, we therefore tested the likelihood of regressing following skipping of a word with a progressive saccade, as a function of the skipped word frequency of occurrence in the language. For this analysis, two classes of word frequency were defined using the American Heritage corpus (Carroll, Davies, and Richman, 1971), such that approximately the same number of cases were available in each class, and with words of moderate frequency excluded. High frequency words were defined as having a frequency greater than 196 occurrences per million; lower frequency as having a frequency less than 118 per million. This analysis was done across all preceding saccade size intervals but only for 3- and 4-letter skipped words.
The obtained median word frequencies were 8046 and 2087 for 3- and 4-letter words of high frequency, and 49 and 71 for 3- and 4-letter words of low-frequency. The number of data available for low-frequency words was much lower than that for high-frequency words, which resulted in a smaller range of prior saccade lengths for low-frequency words. Results show for 3-, and to a certain extent 4-letter words, that there is a tendency for regressive saccades to be more likely to occur after a low-frequency skipped word (see Figs. 5a-b). The effect of word frequency (tested only for prior saccade lengths between 6 and 10 letters) was globally significant, F(1,29) = 10.96, p < 0.005, as well as the interaction between word frequency and word length, F(1,29) = 11.40, p < 0.005. Further analyses reveal that the effect of word frequency was significant only for 3-letter words, F(1,29) = 20.65, p < 0.0005, but not for 4-letter words, F(1,29) = 1.47. No other interaction was significant. Since in the present analyses, the launch site prior to skipping was not controlled, the observed effects of word frequency on regression likelihood might have resulted
Fig. 5. Probability of making a regressive saccade after a 3- (a) or 4-letter word (b) is skipped with a progressive saccade, as a function of both the progressive saccade length and the skipped word frequency.
from the eyes being launched from different locations for high- and low-frequency words. To ensure that this was not the case, the distributions of launch sites prior to skipping were plotted for low- against high-frequency skipped words, separately for each saccade length interval and word length (3 and 4 letters). Results show that while the distributions for both word frequencies overlap in the cases of 4-letter skipped words, there is a systematic tendency for closer launch sites in the low-frequency condition for 3-letter words. Since closer launch sites result in fewer regressions (see Fig. 4b), the launch site might not be responsible for the fact that more regressions occur after low-frequency skipped words. Thus, the likelihood of making a regression following skipping of a word is not only a function of visual factors that affect the visibility of the word's constituent letters, but it depends also on the skipped word frequency (at least for 3-letter words). This result suggests that at least some regressive saccades are initiated when processing of a word is not complete before skipping of the word occurs. Since low-frequency words are less easily identified in parafovea than high-frequency words (Inhoff and Rayner, 1986; Vitu, 1991), they are therefore more likely to produce regressions in word skip cases.
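The two frequency classes used in this analysis can be written as a simple classification rule. The thresholds (greater than 196 and less than 118 occurrences per million) are those given in the text; the function and constant names are hypothetical, and the sketch assumes a word's frequency per million has already been looked up in the corpus.

```python
from typing import Optional

HIGH_FREQUENCY_ABOVE = 196  # occurrences per million, threshold stated in the text
LOW_FREQUENCY_BELOW = 118   # occurrences per million, threshold stated in the text


def frequency_class(per_million: float) -> Optional[str]:
    """Classify a word as 'high' or 'low' frequency, or None if it is excluded.

    Words of moderate frequency (between the two thresholds, inclusive)
    belong to neither class and are left out of the analysis.
    """
    if per_million > HIGH_FREQUENCY_ABOVE:
        return "high"
    if per_million < LOW_FREQUENCY_BELOW:
        return "low"
    return None
```

With this rule, the reported median frequencies of 8046 and 49 per million fall in the high and low classes respectively, and a word occurring 150 times per million is excluded.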
Effect of origin word length when no word is skipped

Most progressive saccades do not skip a word. We now ask whether the factors that affect the perceptibility of the word from which these saccades originate, which we will call the 'origin word,' influence the likelihood of regressing following the saccade. A first step consists of testing whether the likelihood of regressing varies as a function of the origin word length. Figure 6 shows the likelihood of making a regression following a saccade that originated from words of different lengths from 1 to 6 letters. It indicates a general tendency for more regressions to occur following saccades originating from shorter words, F(5,145) = 5.11, p < 0.0005, when prior saccade length is controlled (between 2 and 8 letters). Data for words of each length show significant effects of prior saccade length except 1- and 2-letter words, F(2,58) = 2.98, p < 0.10, F(3,87) = 0.41, F(4,116) = 3.09, p < 0.05, F(4,116) = 3.84, p < 0.01, F(3,87) = 19.89, p < 0.0005, F(4,116) = 5.56, p < 0.0005, respectively for 1- to 6-letter origin words.
[Figure 6. Horizontal axis: prior saccade length (letters), in bins from ]0,2] to ]16,20].]
Fig. 6. Probability of making a regressive saccade after a progressive saccade that does not skip a word, as a function of both the progressive saccade length and the length of the origin word from which the progressive saccade is initiated.
Effect of fixation location on the origin word

Previous research suggests that a word is more likely to be identified with a single fixation when the eyes are located near the center of the word (a position from which most letters of the word can be seen) than when they are located at its beginning or end (Brysbaert et al., 1996; Nazir et al., 1991; O'Regan, 1990). It might therefore be that regressive saccades are more likely to occur when the origin word was fixated only once with the eyes at a non-optimal location for word processing. To test this hypothesis, an analysis was conducted which consisted of measuring whether the eyes' position on the origin word, when no word was skipped, influenced the likelihood of regressing following a forward saccade, with saccade length held constant. This analysis was performed only on cases where the origin word was fixated with a single fixation. To ensure that the number of data available for each condition was large enough, fixation positions in the origin word were grouped into three classes: beginning (Letters 0 and 1 for 4-letter words, and Letters 0, 1, and 2 for 5- and 6-letter words), middle (Letters 2 and 3 for 4-letter words, Letter 3 for 5-letter words, and Letters 3 and 4 for 6-letter words), and end (Letter 4 for 4-letter words, Letters 4 and 5 for 5-letter words, and Letters 5 and 6 for 6-letter words). Results indicate a tendency for regressions to be more likely following fixations located near the end of the origin word than after fixations located near the beginning or middle of the word, but this applies mostly for 4-letter origin words; the effect of fixation position for longer words is not very consistent. When prior saccade length was controlled (between 4 and 8 letters), the effect of eye position in the origin word was globally significant, F(2,58) = 4.56, p < 0.05, but it was significant only for 4-letter words, F(2,58) = 7.90, p < 0.0005, F(2,58) = 2.59, p < 0.10, F(2,58) = 1.97, respectively for 4-, 5-, and 6-letter words.
Despite the present control of the eyes' location within the origin word, the origin word length again showed a pattern opposite to that predicted by the lexical hypothesis, the effect of word length being still significant, F(2,58) = 4.29, p < 0.05. None of the interactions were significant.

Effect of eye behavior on the origin word

An additional analysis examined whether making more than one fixation on the origin word when no word was skipped affected the likelihood of regressing following the progressive saccade originating from that word. Several studies indicate that a word is less likely to be identified if presented only during the initial fixation, this being the result of less letter information being extracted from the word (O'Regan, 1990). Thus, a regressive saccade might be more likely to occur following a word fixated with a single fixation than following a refixated word. Forward saccades originating from 4- to 6-letter words were divided into those following a single fixation on the origin word, and those following refixations of the word. With saccade length controlled, no effect of the number of origin word
fixations on the regression likelihood was found, F(1,29) = 2.38. The interaction between number of origin word fixations and prior saccade length was also not significant, F(2,58) = 1.85. These tests were made for prior saccade lengths between 4 and 10 letters. Thus, these analyses fail to support the prediction made by the lexical hypothesis that the likelihood of identifying the origin word has an effect on regressing following a progressive saccade from that word.

Effect of origin word frequency

Since the ease of processing associated with a word is a function of its frequency, an analysis of the effect of origin word frequency was carried out. This analysis, which was done separately for different prior saccade length intervals, used the same frequency categories as described earlier, 4- to 6-letter words, and only cases where the origin word received a single fixation. The median word frequencies were 1504, 781, and 457 for 4-, 5-, and 6-letter words of high frequency, and 70, 32, and 14 for 4-, 5-, and 6-letter words of low frequency. The results that are presented in Figs. 7a-c show, at least for 5- and 6-letter origin words, a tendency for regressions to be most likely following low-frequency origin words. Analyses of variance (made for prior saccade lengths between 4 and 8 letters) reveal a significant effect of word frequency, F(1,29) = 9.15, p < 0.01, with a marginally significant interaction with word length, F(1,29) = 2.39, p < 0.10. The word frequency effect was actually significant only for 6-letter origin words, F(1,29) = 0.47, F(1,29) = 2.29, F(1,29) = 5.75, p < 0.05, respectively for 4-, 5-, and 6-letter origin words. The three-way interaction between word frequency, word length, and prior saccade length was also significant, F(2,58) = 3.20, p < 0.05. To ensure that word frequency was not confounded with the eyes' position in the origin word, we compared the distributions of fixation positions in high- and low-frequency origin words, separately for the different prior saccade length intervals.
The results obtained show that, for most of the conditions, the distributions of fixation locations do not differ between high- and low-frequency words. The present results therefore provide only weak evidence in favor of the hypothesis that initiating a progressive saccade from less perceptible words results in more regressions following that saccade.
Fig. 7. Probability of making a regressive saccade after a progressive saccade that does not skip a word, as a function of both the progressive saccade length and the origin word frequency, for 4- (a), 5- (b), and 6-letter (c) origin words.
Regressive saccades in reading
Discussion
The goal of the above analyses has been to determine whether factors that are known to affect the processing of words (particularly the origin word and the skipped word) influence the likelihood of making a regression during reading. The factors investigated included visual factors that affect the visibility of the word's constituent letters, such as the word length and the location of the fixation on it (in the case of origin words) or to the left and right of it (in the case of skipped words), lexical characteristics of the word (frequency), and eye behavior on the origin word when there was no word skipped. The analyses started from the observation that longer progressive saccades are more frequently followed by a regression, a fact that was replicated in the current study. Further analyses showed that this effect depends to some extent on whether the progressive saccade skips a word. Indeed, regressions are more likely to occur in cases where a word is skipped than in cases where no word is skipped, and the effect of the progressive saccade length is accentuated for skipped cases. Given the fact that the presence of a skipped word affects the likelihood of regressing, it is reasonable to expect that the characteristics of the skipped word might also have an effect, with less perceptible words inducing more regressions. The results obtained are compatible, within certain limits, with this hypothesis. First, the likelihood of regressing is higher for longer words, and when word length is controlled, it tends to be higher for low- compared to high-frequency words. Second, the regression likelihood is a function of the eyes' location preceding and following word skipping. For words longer than 2 letters, the further the skipped word lies from the saccade's launch site, the greater the likelihood of a regression occurring after the saccade.
However, if the eyes land very near the end of the word, this reduces regressions, as if the skipped word could be identified in these cases without the need to return the eyes to it. These observations suggest that the ease of processing associated with a word that is located in parafoveal vision and is not fixated during the first eye pass may be controlling regression frequency. Indeed, as shown in prior studies, the visibility of a word decreases with distance into the periphery (Brysbaert et al., 1996), and the efficiency of processing a parafoveal word depends on its length and frequency in the language (Inhoff and Rayner, 1986; Vitu, 1991). However, most forward saccades do not skip words. In these cases, one might also expect that characteristics of the origin word, from which the forward saccade was launched, would affect the likelihood of regressing. The present results do not support this hypothesis. First, although the frequency of the origin word had an effect for at least one of the three tested word lengths (6-letter words), longer origin words led to fewer regressions rather than more, and there was no effect of whether the word received a refixation (a factor which is known to affect the number of
letters that can be extracted from the word). Second, it is only for one of the three tested word lengths that the location of the fixation on single-fixation origin words tends to affect the regression likelihood, with fixations located near the end of the origin word producing slightly more regressions. With respect to the different explanations for regressive saccades that were stated initially, it appears therefore that they are not just the result of an oculomotor strategy that elicits regressions after long saccades. The regressive saccades that occur after a word was skipped would depend in large part on the ongoing processing associated with those words. At the same time, several variables that would be expected to affect the ease of processing origin words in non-word-skip cases did not affect regression likelihood or had an effect opposite to that predicted. This might argue against a pure lexical hypothesis for the occurrence of regressions in reading. The current study is obviously only a beginning toward investigating this issue. However, it does support the need for further studies, indicating that certain word-related variables are indeed influencing the frequency of regressing.
An important limitation of the analyses reported above is that no distinction was made regarding the word to which regressions were sent: no distinction was made between a regression that refixates a word, goes to the previous word, or goes beyond the previous word. We accept the position that the control of saccades is word-based; that is, that each saccade is intended for a particular word (see Chapter 4). Given this, the current analyses are only a crude beginning at investigating the hypothesis that regressions are made in response to specific word processing difficulties. This problem particularly applies to the analysis of non-skip cases, which might have been biased by a very large proportion of within-word regressions.
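The three regression types that the analyses did not separate (a refixation within the same word, a return to the previous word, and a return beyond it) could be told apart with a classifier along the following lines. This is a minimal sketch, not the procedure used in the study: the function names `word_at` and `classify_saccade`, the coding of fixations as character positions in a line of text, and the convention that a blank space belongs to the word that follows it are all assumptions made for illustration.

```python
def word_at(text: str, pos: int) -> int:
    """Return the index (0-based) of the word containing character position pos.

    Assumption: words are separated by single spaces, and a space is
    assigned to the word that follows it.
    """
    return text[:pos + 1].count(" ")


def classify_saccade(text: str, launch_pos: int, landing_pos: int) -> str:
    """Classify a saccade from launch_pos to landing_pos (character positions)."""
    if landing_pos >= launch_pos:
        return "progressive"
    launch_word = word_at(text, launch_pos)
    landing_word = word_at(text, landing_pos)
    if landing_word == launch_word:
        return "intraword regression"
    if landing_word == launch_word - 1:
        return "regression to previous word"
    return "regression beyond previous word"


# Hypothetical line of text, for illustration only.
line = "the quick brown fox jumps"
for launch, landing in [(13, 11), (17, 12), (17, 5), (5, 12)]:
    print(launch, landing, classify_saccade(line, launch, landing))
```

Separating the categories in this way would allow the regression likelihood in non-skip cases to be recomputed on interword regressions only.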
In non-word-skip cases, the eyes are launched from closer to the destination word (on average 3 letters away from the beginning of the destination word) than in skip cases (7 letters away), which results in the eyes landing further into the destination word for non-skip (on average between the 2nd and 3rd letter) compared to skip cases (on the first letter) (see McConkie et al., 1988; Vitu et al., 1995 for similar results). At the same time, within-word regressive refixations are more likely to occur when the eyes land near the end of the word and in cases where the word is difficult to process (McConkie et al., 1989; O'Regan and Levy-Schoen, 1987; Rayner et al., 1996; Vitu, 1991; Vitu et al., 1990). It is therefore very likely that the regression likelihood measure was contaminated by a large number of regressions within the destination word, regressions which were produced in response to the difficulty of processing associated with only the destination word, and not the origin word. This would account for our failure to report consistent effects of word-related factors in non-skip cases. The need to distinguish between intraword and interword regressions is emphasized by Radach and McConkie (Chapter 4), who observe that the mean of the landing position distributions of intraword regressions tends to vary linearly with
Fig. 8. Distribution of landing sites in 4-letter words, following an interword regressive saccade initiated from different launch sites.
distance of the launch site with respect to the center of the word, whereas interword regressions do not. Figure 8 shows this same phenomenon in the current data set: the mean landing position for interword saccades going to the immediately preceding word is quite constant following saccades of very different lengths. Thus, intraword and interword regressions appear to be controlled on different bases.
Despite the limitations that apply to the analyses reported in this chapter, it remains unequivocal that regressions are more likely to occur following word skipping, and that in those instances, regressions reflect to some extent word-related processes. As shown above, regressions are more likely to occur for long and low-frequency skipped words; they are also more likely to occur when the critical word was further from the center of vision preceding and following skipping. These results suggest that a word might be skipped although it has not yet been identified in the parafovea. Thus, the size of progressive saccades would not always be appropriately defined to respond to the necessities of ongoing processing associated with the parafoveal words, a view which is compatible with several accounts of word skipping in reading (see Chapter 6). The present finding that word skipping is frequently followed by a regression also raises important questions for the use of eye movements as a tool to investigate
the cognitive processes involved in reading. In the study of sentence processing, several measures have been defined to track the effects of syntactic and/or semantic ambiguities (see Chapters 2 and 3). Among those are the regression likelihood and the first pass reading time. These measures might be subject to biases if no caution is taken as to whether word skipping occurred in some sentences, or in some conditions. First, the regression likelihood might be enhanced in one condition as a result of word skipping, a fact which might not be attributable to high-level processing influences, but rather to lower-level influences related to word identification or saccadic programming. This possibility is very likely at least in studies which compare the ocular behavior on sentences that differ by the presence or absence of one or two very short words before the critical word (see Chapter 3 for an example). Indeed, word skipping is much more frequent for shorter words (see Chapter 6). The first pass reading time measure might also be influenced by whether or not a word is skipped. Indeed, the time spent in a given region of text strongly depends on how many fixations occur in that region, a fact which again might have nothing to do with higher level influences (see Trueswell, Tanenhaus, and Kello, 1993, for a similar conclusion). In addition, as shown in Fig. 9, the duration of the fixation that precedes a regression is shorter in word-skip cases (x̄ = 235 ms) than in non-word-skip cases (x̄ = 245 ms) (see Pynte, 1996, for similar results). Fixation durations are also shorter before a regression than before a progressive saccade (and this particularly for word-skip cases). Thus, if a regressive saccade is initiated from the critical word, the first pass reading time for the region that contains the critical word will be shortened, and this particularly if the word preceding the critical word was
Fig. 9. Distribution of the duration of fixations that precede either a regressive or progressive saccade in word-skip and non-word-skip cases.
skipped. This suggests the necessity, when using measures that sum the time spent on several different words, to ensure that the eye movement pattern which precedes or follows the critical word is similar in the different conditions (see Chapter 3 for a similar conclusion). Finally, there remains the question of what is being extracted from the destination word during the fixation that follows word skipping and precedes a regression toward the skipped word. Is processing of the destination word initiated during that fixation, or is it delayed until processing of the skipped word has been resolved?
Acknowledgements
We would like to thank John Grimes who participated in the data collection for the present experiment, and Mark Brysbaert for his helpful comments on an earlier version of the manuscript. We would also like to thank R. Radach, K. Rayner and P. van Diepen for their helpful comments on the submitted version of the chapter. This work was supported by a NATO grant which was given to Françoise Vitu while she was a visiting researcher at the Beckman Institute, University of Illinois at Urbana-Champaign (1996-1997).
References
Andriessen, J.J. and DeVoogd, A.H. (1973). Analysis of eye movement patterns in silent reading. IPO Annual Progress Report, 8, 29-34.
Bayle, E. (1942). The nature and causes of regressive movements in reading. Journal of Experimental Education, 11, 16-36.
Bouma, H. (1978). Visual search and reading: Eye movements and functional visual field: A tutorial review. In: J. Requin (Ed.), Attention and Performance VII. Hillsdale, NJ: Erlbaum, pp. 115-145.
Bouma, H. and De Voogd, A.H. (1974). On the control of eye saccades in reading. Vision Research, 14, 273-284.
Brysbaert, M., Vitu, F. and Schroyens, W. (1996). The right field visual advantage and the optimal viewing position effect: On the relation between foveal and parafoveal word recognition. Neuropsychologia, 10 (3), 385-395.
Buswell, G.T. (1920). An experimental study of the eye-voice span in reading. Supplementary Educational Monographs, 17. Chicago: University of Chicago.
Carpenter, P.A. and Just, M.A. (1977). Integrative processes in comprehension. In: D. Laberge and S.J. Samuels (Eds.), Basic Processes in Reading: Perception and Comprehension. Hillsdale, NJ: Erlbaum.
Carroll, J.B., Davies, P. and Richman, B. (1971). The American Heritage Word Frequency Book. New York: Houghton Mifflin.
Cornsweet, T.N. and Crane, H.D. (1973). Accurate two-dimensional eye tracker using first and fourth Purkinje images. Journal of the Optical Society of America, 63, 6-13.
Frazier, L. and Rayner, K. (1982). Making and correcting errors during sentence comprehension: eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178-210.
Hochberg, J. (1976). Toward a speech-plan eye-movement model of reading. In: R.A. Monty and J.W. Senders (Eds.), Eye Movements and Psychological Processes. Hillsdale, NJ: Erlbaum.
Inhoff, A.W. and Rayner, K. (1986). Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception and Psychophysics, 40, 431-439.
Just, M.A. and Carpenter, P.A. (1980). A theory of reading: from eye fixations to comprehension. Psychological Review, 87, 329-354.
Klein, G.A. and Kurlowski, F. (1974). Effect of task demands on relationship between eye movements and sentence complexity. Perceptual and Motor Skills, 39, 463-466.
Lesèvre, N. (1964). Les mouvements oculaires d'exploration. Étude électro-oculographique comparée d'enfants normaux et d'enfants dyslexiques. Thèse, Paris.
McConkie, G.W., Kerr, P.W., Reddix, M.D., Zola, D. and Jacobs, A.M. (1989). Eye movement control during reading: II. Frequency of refixating a word. Perception and Psychophysics, 46, 245-253.
Nazir, T., O'Regan, J.K. and Jacobs, A.M. (1991). On words and their letters. Bulletin of the Psychonomic Society, 29, 171-174.
O'Regan, J.K. (1990). Eye movements and reading. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes. Amsterdam/Oxford: Elsevier, pp. 395-453.
O'Regan, J.K. and Levy-Schoen, A. (1987). Eye movement strategy and tactics in word recognition and reading. In: M. Coltheart (Ed.), Attention and Performance XII: The Psychology of Reading. Hillsdale, NJ: Erlbaum, pp. 363-383.
Pollatsek, A. and Rayner, K. (1990). Eye movements and lexical access in reading. In: D.A. Balota, G.B. Flores d'Arcais and K. Rayner (Eds.), Comprehension Processes in Reading. Hillsdale, NJ: Erlbaum, pp. 143-163.
Pynte, J. (1996). Lexical control of within-word eye movements. Journal of Experimental Psychology: Human Perception and Performance, 22 (4), 958-969.
Rayner, K. and Fischer, M.H. (1996). Mindless reading revisited: Eye movements during reading and scanning are different. Perception and Psychophysics, 58 (5), 734-747.
Rayner, K. and Pollatsek, S. (1989). The Psychology of Reading. London: Prentice-Hall.
Rayner, K., Sereno, S.C. and Raney, G.E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22 (5), 1188-1200.
Schroyens, W., Vitu, F., Brysbaert, M. and d'Ydewalle, G. (in press). Visual attention and eye-movement control during reading: The case of parafoveal processing. Submitted to Quarterly Journal of Experimental Psychology.
Shebilske, W. (1975). Reading eye movements from an information-processing point of view. In: D. Massaro (Ed.), Understanding Language. New York: Academic Press.
Shebilske, W.L. and Fisher, D.F. (1983). Eye movements and context effects during reading of extended discourse. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes.
Taylor, E. (1971). The dynamic activity of reading: A model of the process. Research Information Bulletin No 9. New York: Educational Developmental Laboratories.
Trueswell, J.C., Tanenhaus, M.K. and Kello, C. (1993). Verb-specific constraints in sentence processing: separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory and Cognition, 19 (3), 528-553.
Vitu, F. (1991). The influence of parafoveal preprocessing and linguistic context on the optimal landing position effect. Perception and Psychophysics, 50, 58-75.
Vitu, F. and O'Regan, J.K. (1995). A challenge to current theories of eye movements in reading. In: J. Findlay, R.W. Kentridge and R. Walker (Eds.), Eye Movement Research: Mechanisms, Processes and Applications. Amsterdam, Lausanne, NY, Oxford, Shannon, Tokyo: Elsevier, pp. 381-392.
Vitu, F., O'Regan, J.K. and Mittau, M. (1990). Optimal landing position in reading isolated words and continuous text. Perception and Psychophysics, 47 (6), 583-600.
Vitu, F., McConkie, G.W., Kerr, P.W. and O'Regan, J.K. Within-word eye behavior during reading: Effects of initial fixation location on the gaze duration and its components. In preparation.
Vitu, F., O'Regan, J.K., Inhoff, A.W. and Topolski, R. (1995). Mindless reading: Eye movement characteristics are similar in scanning strings and reading texts. Perception and Psychophysics, 57, 352-364.
Wanat, S.F. (1976). Language behind the eye: Some findings, speculations, and research strategies. In: R.A. Monty and J.W. Senders (Eds.), Eye Movements and Psychological Processes. Hillsdale, NJ: Erlbaum.
CHAPTER 6
Word Skipping: Implications for Theories of Eye Movement Control in Reading
Marc Brysbaert, University of Ghent
and Françoise Vitu, Université René Descartes
Abstract
Eye movements in reading are characterized by short periods of steadiness (fixations) followed by fast movements (saccades). Saccades are needed to bring new information into the centre of the visual field where acuity is best; fixations are required to recognize words. Assuming that the central (foveal) word is identified during a fixation, it is tempting to forward the hypothesis that eye movements in reading essentially consist of word-to-word movements. Unfortunately, such a simple sequence of motion is rarely observed in empirical data. Some words are fixated more than once, some are initially not fixated but immediately afterwards regressed to, and some are not fixated at all. Ever since the first measurements of eye movements in reading, researchers have been puzzled by this complicated pattern of activity and have suggested various explanations for it. In this chapter, we will focus on one aspect, namely the fact that more than one third of the words are initially skipped during reading. First, we will discuss the explanations offered by different authors, then we will examine the empirical evidence more closely, and, finally, we will present an alternative account of word skipping.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
M. Brysbaert & F. Vitu
Word skipping in different theories
The loose relation between eye movements and text layout made many of the first researchers believe that eye movements were controlled by an autonomous oculomotor control centre (e.g., Buswell, 1920; Erdman and Dodge, 1898; Huey, 1908). According to this view, saccade sizes were more or less constant and only changed as a function of the global difficulty of the materials being read. Variations in saccade size resulted from noise in the oculomotor system and adjustments to the difficulty of the text. As for word skipping, this implied that the probability of skipping a word depended on the overall easiness of the text but not on the easiness of the word itself.
The autonomous oculomotor control model remained the dominant model until the middle of the 1970s, although some refinements were added. For instance, Bouma and de Voogd (1974; see also Shebilske, 1975) attributed changes in saccade size to the limited capacities of an input buffer which could contain but a few activated word units. According to this view, it is only when the buffer gets full or empty that saccade sizes vary around their means. Other variants of the model speculated that variations in saccade size were determined by visual characteristics of the words, in particular the length of the upcoming, parafoveal, word. For example, having found that the length of inter-word saccades depended on the length of the parafoveal word, O'Regan (1980) hypothesized that the eyes were programmed to jump to the centre of the next word. Again, word skipping was not assumed to depend on the easiness of the word (but see O'Regan, 1979).
Gradually, the possibility of linguistic influences on the probability of word skipping began to be taken into account.
In his cognitive and peripheral search guidance theory, Hochberg (1975, 1976) still maintained that eye movements were primarily determined by pre-established scanning routines independent of the linguistic information extracted at each fixation, but these routines depended both on the readers' knowledge of the language constraints and on the task they set themselves when reading. If the task required subjects to pay attention to the letters and the spelling of the words without paying attention to the meaning (proofreading), readers were assumed to adopt a letter-by-letter or word-by-word scanning routine. If, however, the reading purpose was to extract meaning from text, then larger saccades were made in order to reduce the number of samples per line. In that case, the size of the saccades was controlled by peripheral search guidance mechanisms: At each fixation, the reader anticipated what he would find next on the basis of the meaning and the grammatical status of the words previously read and on the basis of global visual information extracted in parafoveal vision (such as the next word's length). Readers directed their eyes to the portions of text which appeared as the most informative to test the predictions that had been made. So, according to Hochberg, the size of a saccade depended on (i) text difficulty and redundancy
Word skipping
(which affect a word's predictability and the size of the perceptual span), and (ii) the reader's ability to extrapolate upcoming information and to process parafoveal stimuli. Similar reasoning can be found in Shebilske (1975). Saccade sizes were mainly determined by the capacities of the input buffer (see above), but in some instances the on-going linguistic processing could intervene. If both the overall meaning of the words read on previous fixations and the visual information extracted from the parafovea (such as the next word's length) made the next word highly predictable, then the automatic oculomotor program could be interrupted and the word skipped. This could occur independently of the state of the internal buffer. So, linguistically induced word skipping was possible when the word was highly predictable from prior context and the parafoveal visual information was compatible with this prediction.
In the late 1970s, the idea that word skipping did not happen at random but was determined by the probability of the parafoveal word being identified during the previous fixation was strongly promoted in a series of highly influential papers. McConkie (1979) hypothesized that saccade length was determined by the size of the perceptual span (see also Rayner, 1978). Saccades were tuned at each fixation in order to bring the eyes to the region of text which was not clearly visible from the current fixation location. Therefore, saccade sizes were determined by visual acuity limitations, but also by the ease of processing associated with the words to the right of the fixation location (i.e., their predictability and their frequency). Further support for the processing account of word skipping was provided by the different variants of the attentional theory of eye guidance in reading (Morrison, 1984; Pollatsek and Rayner, 1990; Henderson and Ferreira, 1990; see also Chapter 11 for the most recent developments).
According to this theory, during reading there is an interplay between attentional processes and eye movement control. As originally proposed by Morrison (1984; see also McConkie, 1979), the sequence of events within a fixation is as follows. Initially, attention is focused on the word in foveal vision. As soon as this word is identified, the attentional beam shifts toward the next word and this word starts to be processed. The shift of attention triggers the programming of a saccade, which is executed after a more or less fixed delay. If processing of the parafoveal word is completed some time before the end of the delay, word skipping is possible. Because the parafoveal word has been identified, the attentional beam again shifts to the next word, and the saccade to the first parafoveal word will be cancelled and replaced by a saccade to the second word. As the probability of parafoveal word identification before saccade execution depends on the easiness of the parafoveal word, the attentional theory predicts that all word skipping should be linguistically based. Although the attentional theory has dominated eye movement research during the last decade, the idea that word skipping may be controlled by non-linguistic
strategies has not completely vanished. For instance, Just and Carpenter (1980, 1987) argued that saccade sizes were independent of on-going processing requirements, and were preprogrammed so that the eyes fixated every word and landed between the beginning and the middle of a word. The hypothesis was that the information extracted from the parafovea was not very detailed, and served mainly to locate the target word for the next saccade. This information was assumed not to be clear enough for recognition of the parafoveal word to take place and to influence the length of the next saccade. Only in exceptional cases when the eyes were located at the very end of a word and when the next word was very short and frequent was it possible that parafoveal word recognition occurred and led to word skipping. So, according to Just and Carpenter, word skipping was largely independent of the easiness of the parafoveal word, and the variability in saccade sizes mainly resulted from visual and oculomotor factors. Just and Carpenter's view on landing sites is particularly surprising, given how high-level the rest of their theory on eye movement control in reading was. A very similar argument can be found in O'Regan's (1990) strategy-tactics theory. According to this theory, the inter-word behaviour of the eyes was mainly determined by a preset oculomotor strategy that depended on the global difficulty of the text and on task demands. O'Regan basically made a distinction between a careful reading strategy (i.e., going from word to word) and risky reading strategies (e.g., skipping every second word, only looking at long words). A further distinction was made between the saccade target (i.e., the middle or the optimal viewing position of a parafoveal word) and the actual landing position. According to O'Regan, visuomotor constraints considerably reduce the accuracy with which a saccade can be executed during reading. 
So, in the strategy-tactics theory, there were two main determinants of word skipping. Words were skipped because of the strategy (e.g., because they were too short) or because of an oculomotor error (the eyes jumped too far). In addition, O'Regan (1990) accepted linguistic influences if the previous fixation exceeded a certain duration. In this case, parafoveal word processing could interfere with the oculomotor strategy. Finally, it should be noted that McConkie in more recent publications (e.g., McConkie, Kerr and Dyre, 1994) returned to the idea of independent oculomotor strategies during reading. Having found that the probability of word skipping can be predicted rather well by equations that only involve word length and launch site (see below), he started to question the viability of linguistic control theories. To put it in his own words: "...we have briefly outlined the approach we are taking in our attempt to produce a mathematical model of the eye movements of normal, skilled readers. Our greatest surprise thus far has been to observe how much of the variance in the data can be accounted for with a relatively few parameters, and these often reflecting such low-level variables as word length, eye position in word and launch site." (McConkie et al., 1994, p. 325).
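The timing logic of the attentional account described earlier in this section (Morrison, 1984) can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the time needed to identify the parafoveal word is normally distributed, that the saccade is executed after a fixed delay, and that a programmed saccade can only be cancelled up to some margin before that delay expires. The function name `skip_probability` and all numerical values are hypothetical and are not taken from any of the cited papers.

```python
from statistics import NormalDist


def skip_probability(mean_ident_ms: float,
                     sd_ident_ms: float = 40.0,
                     saccade_delay_ms: float = 150.0,
                     cancel_margin_ms: float = 30.0) -> float:
    """Probability that the parafoveal word is skipped.

    Time is counted from the moment attention shifts to the parafoveal
    word, which is also when saccade programming starts. The word is
    skipped if it is identified before the cancellation deadline, so that
    the saccade can be replaced by one aimed at the following word.
    All default values are illustrative assumptions.
    """
    deadline_ms = saccade_delay_ms - cancel_margin_ms
    return NormalDist(mean_ident_ms, sd_ident_ms).cdf(deadline_ms)


# An easy (quickly identified) word versus a difficult one.
print(round(skip_probability(110.0), 2))
print(round(skip_probability(180.0), 2))
```

Under these assumptions the skipping probability falls as the mean identification time rises, which is the sense in which the attentional theory predicts that word skipping is linguistically based.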
Empirical data
In this section we will look at the empirical evidence upon which the above theories are based. We will deal successively with the landing position distribution, the effects of word length and launch site, and the contribution of language factors such as word frequency and the extent to which a word is constrained by the sentence context. Finally, we will try to determine the relative importance of visual and linguistic variables, in order to decide which should come first in an acceptable model.
Landing position distribution
Although the impact of oculomotor errors on eye movements in reading has been defended most strongly by O'Regan and colleagues (e.g., Coëffé and O'Regan, 1987; O'Regan, 1990), it was McConkie and his group who presented the most convincing empirical evidence for such an effect (McConkie et al., 1988, 1994; see also Radach and Kempe, 1993). A typical study involved a few participants reading a complete novel, so that analyses could be based on a large number of observations per person. One of the major analyses was the frequency distribution of initial fixations on a word as a function of the launch site (i.e., the letter position relative to the word centre from which the eye movement started) and the length of the target word. Figure 1 depicts the prototypical findings. There are two main effects. First, the distribution of landing positions is well captured by a Gaussian curve, and
Fig. 1. Frequency distributions of initial fixations on 7-letter words, following saccades launched from 5, 10, and 15 letter positions to the left of the center of the word (reprinted from McConkie et al., 1994).
second, the mean of the distribution is a function of the launch site. For each launch site one character position farther to the left, the mean of the landing position distribution moves leftward by about a third of a character position. McConkie attributes this to a range error, by which the system tends to overshoot near targets and undershoot far targets. So far, the landing position distribution has mainly been used by McConkie and colleagues to explain the distribution of initial fixations within words. However, as can be seen in Fig. 1, the pattern can easily be extended to the between-words case (i.e., word skipping), as has been asserted by O'Regan (see above). The strategy-tactics theory further draws attention to two other visuomotor constraints. The first refers to the fact that the landing error is likely to be larger after short fixation durations than after long fixation durations (Coëffé and O'Regan, 1987). The second points to the fact that eye movements to a target are influenced by the presence of other targets in the visual field (i.e., the global effect; Findlay, 1982; Vitu, 1991a). Given that words of a text are not presented in isolation, the claim is that the properties of the words behind the target word will influence the saccade size towards the target word. This is particularly true for short target words.
On the basis of the present evidence, it is beyond doubt that all comprehensive theories of word skipping should take the existence of involuntary word skipping due to oculomotor error into account (as well as the possibility that a word is involuntarily looked at due to a saccade undershoot). What is less sure, however, is whether all deviations from O'Regan's deterministic inter-word strategy can be explained by oculomotor errors. This question is especially relevant when one looks at the effects of word length and launch site on word skipping.
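The way in which a Gaussian landing position distribution with a launch-site-dependent mean leads to involuntary skipping can be made concrete with a short calculation. The sketch below takes only the slope of about one third from the text; the launch distance at which saccades are accurate on average (`accurate_distance`), the standard deviation, the assumption that the standard deviation is constant, and the function names are all hypothetical choices made for illustration, not McConkie's fitted values.

```python
from statistics import NormalDist

SLOPE = 1.0 / 3.0  # shift of the mean landing position per letter of launch distance


def mean_landing(launch_distance: float, accurate_distance: float = 7.0) -> float:
    """Mean landing position relative to the word centre (in letters).

    launch_distance is the distance of the launch site to the left of the
    word centre. Near launches give a positive mean (overshoot), far
    launches a negative mean (undershoot): the range error.
    """
    return SLOPE * (accurate_distance - launch_distance)


def overshoot_probability(word_length: int, launch_distance: float,
                          sd: float = 1.5, accurate_distance: float = 7.0) -> float:
    """Probability that the eyes land beyond the last letter of the word."""
    right_edge = word_length / 2.0  # word centred on 0
    landing = NormalDist(mean_landing(launch_distance, accurate_distance), sd)
    return 1.0 - landing.cdf(right_edge)


for length in (3, 5, 7):
    print(length, [round(overshoot_probability(length, d), 3) for d in (2, 6, 10)])
```

With these illustrative settings the overshoot probability is largest for short words launched from nearby, which is the qualitative pattern that an oculomotor error account of word skipping requires.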
Word length and launch site

One of the most conspicuous aspects of word skipping is that it happens more often with short words than with long words. For instance, Vitu et al. (1995) reported skipping probabilities of about 80% for one-letter words, 60% for three-letter words, 30% for five-letter words, and 10% for words of seven letters or longer. Interestingly, these figures were virtually the same whether words in real sentences were presented, or meaningless z-strings that had the same layout (cf. the issue of mindless reading; Vitu et al., 1995; Rayner and Fischer, 1996). Kerr (1992) was the first to note that the average word length effect hides another, equally strong, effect of launch site. Figure 2 shows some typical data. They are drawn from a study in which 24 participants read 120 sentences (see Brysbaert and Mitchell, 1996, Experiment 3 for further details), resulting in a total of more than 22,000 observations for words from two letters to nine letters. Skipping rate is plotted as a function of word length (2-9 letters) and launch site (1-15 letters; operationalized as the distance in letter positions from the blank space in front of the
Fig. 2. Word skipping probability as a function of word length and launch site in letter positions relative to the blank space in front of the word.
word). Although a two-letter word on the average was skipped on 69% of the cases, this figure ranged from 90% at launch site one (the last letter of the previous word) to some 50% at launch site 15 (when the eyes already had skipped one or two words). Similar data have been reported by Kerr (1992), Vitu et al. (1995), and Rayner, Sereno, and Raney (1996). In addition, Vitu et al. (1995) showed that the effect of launch site on skipping rate also applies for meaningless z-strings. It should be noted that the word length and the launch site effect can equally well be explained by theories that are based on autonomous oculomotor scanning strategies and theories that are based on linguistic control. According to the oculomotor control theories, short words and words close to the launch site are skipped more often because they lie in an area of high visibility (cf. acuity drops away from the fixation location) and/or because they are less easy to land upon (e.g., due to the landing position error and the global effect). According to the linguistic control theories, short words and words close to the launch site are skipped more often because they are more likely to be identified before saccade onset. This is again due to the drop of visibility outside the fixation position and, in the case of word length, to the fact that short words in general are easier than long words (e.g., they usually have a higher frequency). In order to decide between both types of theories, researchers have tried to disentangle length and processing difficulty by looking at skipping rates for carefully controlled stimulus words and sentences.
Language factors

The only way to find out whether parafoveal words are skipped because they were identified during the previous fixation is to examine skipping rate for stimulus materials that are identical except for the target word. Otherwise, it is impossible to disentangle the effects of the scanning strategy (which can be a function of the difficulty of the text up to the target word and of the length of the words in front of the target word) from those of language processing. In addition, a difference may be drawn between studies in which the processing load of the target word was a function of properties of the word itself, and studies where the processing load was a function of the extent to which the word was constrained by the previous words in the sentence. As not all models predict an equal effect size for both cases (e.g., Hochberg and Shebilske predict a larger effect of contextual constraints than of lexical properties), we will discuss them separately. We were able to locate seven studies in which skipping rates were compared for easy and difficult parafoveal words (see Table 1). In five of these studies, the variable that was manipulated was the frequency of the word. For instance, Rayner and Fischer (1996) compared sentences like "He invested his money to build a store and was soon bankrupt." with sentences like "He invested his money to build a wharf and was soon bankrupt". In the other two studies, the manipulation was whether the word was visible in parafoveal vision or not up to the moment the eyes crossed the blank space in front of the word. As can be seen, in all studies and for all word lengths, skipping rate in the easy condition was at least as high as in the difficult condition. This establishes beyond doubt that lexical variables do influence the probability of word skipping.
On the other hand, it should be noted that the overall difference in skipping rate is rather small (4%) and tends to be slightly larger for short words than for long words. Eight other studies looked at the effects of contextual constraints on word skipping (see Table 2). Contextual constraints are usually measured by examining how many participants fill in a particular word in a cloze task. For instance, given the sentence "The woman took the warm cake out of the ", participants are much more likely to fill in "oven" (93%) than "pantry" (3%) (Schwanenflugel, 1986). Again, in all studies we found, the difference was in the expected direction: predicted words were skipped more often than unpredicted words. The mean difference amounted to 9%. The largest effect (of 23%) was reported by Vonk (1984) who (in Dutch) compared sentences like "Mary was envious of Helen because she never looked so good", where the pronoun had no disambiguating value, with sentences like "Mary was envious of Albert because she never looked so good", where the pronoun did disambiguate (in an unexpected continuation of the sentence). Brysbaert and Vitu (1995) used the same materials, but compared sentences with an expected continuation "Mary was envious of Marc because he
Table 1. Skipping rate as a function of word characteristics

Study                                   Manipulation       Word length   S_easy   S_diff   Diff.
                                                           (letters)
Blanchard et al. (1989) (exp 1 + 2)     parafov. preview   1-3           0.56     0.43     0.13
                                                           4-5           0.20     0.10     0.10
                                                           6-10          0.06     0.04     0.02
Pollatsek et al. (1992)                 parafov. preview   3-8           0.12     0.10     0.02
Henderson and Ferreira (1993)           word frequency     4-7           0.18     0.18     0.01
Inhoff and Topolski (1994) (exp 2 + 3)  word frequency     4-7           0.27     0.18     0.09
Rayner et al. (1996)                    word frequency     5             0.20     0.14     0.06
                                                           6             0.19     0.16     0.03
                                                           7             0.13     0.12     0.01
                                                           8             0.09     0.08     0.01
                                                           9             0.06     0.06     0.00
                                                           10            0.08     0.07     0.01
Rayner and Raney (1996)                 word frequency     6-9           0.17     0.11     0.06
Rayner and Fischer (1996)               word frequency     5             0.18     0.08     0.10
                                                           6             0.12     0.05     0.07
                                                           7             0.05     0.01     0.04
                                                           8             0.02     0.02     0.00
                                                           9             0.03     0.02     0.01
Mean difference                                                                            0.04
always looked so good" and sentences with an unexpected continuation "Mary was envious of Marc because she never looked so good". They found a difference in skipping rate of 9%. Rayner and Well (1996) pointed out that the difference in contextual constraints between predicted and unpredicted words has to be rather large in order to obtain an effect on skipping rate. For example, in their first experiment, Ehrlich and Rayner (1981) had a difference between 93% and 15% continuations in the cloze task and
Table 2. Skipping rate as a function of contextual predictability

Study                                   Manipulation       Word length   S_easy   S_diff   Diff.
                                                           (letters)
Ehrlich and Rayner (1981) (exp 1 + 2)   context. constr.   5             0.41     0.35     0.06
Balota et al. (1985)                    context. constr.   4-8           0.11     0.02     0.09
Vonk (1984)                             pronoun pred.      3             0.40     0.17     0.23
Schustack et al. (1987)                 context. constr.   3-8           0.28     0.16     0.12
Hyönä (1993)                            context. constr.   7-10          0.04     0.00     0.04
Inhoff and Topolski (1994) (exp 1)      word consistency   4-7           0.13     0.07     0.06
Brysbaert and Vitu (1995)               pronoun pred.      3             0.49     0.40     0.09
Rayner and Well (1996)                  context. constr.   4-8           0.22     0.10     0.12
Mean difference                                                                            0.09
reported skipping rates of 49% vs. 38%. In their second experiment, however, continuations were only 60% and 0%, and no difference in skipping rate was found (32% in both cases). Similarly, Hyönä (1993) compared conditions of 65% and 32% continuation and found virtually no effect on word skipping (4% vs. 0%). To test the effect of context constraints directly, Rayner and Well (1996) compared three conditions: one in which the target word had been given by 86% of the raters in a cloze task, one in which the target word had been given by 41% of the raters, and one in which the target word had been given by only 4% of the raters. Skipping rates were respectively 22%, 12%, and 10%; that is, virtually no difference was found between the 41% and the 4% condition.

The relative importance of visual and linguistic variables

Thus far, we have shown evidence for oculomotor, visual, and linguistic influences on word skipping. However, it should be noted that much of this evidence is not fully conclusive. For instance, the data of the landing position distributions (Fig. 1) have been obtained by aggregating over different word frequencies (and other lexical variables) and different text fragments before the eyes reached the launch site. Similarly, the effects of word length and launch site (Fig. 2) are usually
reported without reference to the frequency of the words in the different cells. An exception to the latter can be found in Rayner et al. (1996) (Fig. 2) who looked at the effects of launch site and word frequency on the skipping rate for five- and six-letter words. They reported independent effects of word length and launch site, together with a frequency effect at close launch sites (up to three letter positions in front of the target word). However, even in this study, the text fragments on which the data were based may have been quite different in the various conditions (e.g., it is not unlikely that certain sequences of word lengths resulted in far launch sites and others in near launch sites, or that high-frequency and low-frequency words appeared in different sequences of word lengths). This unsatisfactory state of affairs has led theorists of eye movement control in reading to hold to their opposing positions. Much of the debate has centred around the presence or absence of statistically significant differences in skipping rate between easy and difficult conditions, and on imposing further constraints on what are good stimulus materials and empirical data (e.g., according to O'Regan, one should exclude cases of long fixation durations because in these exceptional instances the ongoing processing may interfere with the autonomous scanning strategy; unfortunately, O'Regan has never specified what counts as a "long" fixation duration). However, there may be a way out of the debate by capitalizing more on the data of Tables 1 and 2. In these tables, we have skipping rates for words that were presented in identical sentences and which were constructed in such a way as to maximize the difference in processing difficulty (e.g., by having a large difference in frequency or in contextual constraint). So for these stimuli, word length should not have an effect on skipping rate if eye movements are entirely under the control of the ongoing processing.
That is, for these studies we must not find that difficult three-letter words are skipped more often than easy five-letter words. The relative contribution of word length and processing difficulty can be determined by running a multiple regression analysis with both variables as predictors and skipping rate as dependent variable. This is all the easier because the predictors in Tables 1 and 2 are orthogonal. Although we could have taken the raw values of word length, we opted for an exponential transformation, which has the advantages that (i) larger weights are given to differences between short words than between long words, and (ii) the function asymptotes to 1 (for word length zero) and to 0 (for very long words). Ideally, processing ease of the words should also be part of the exponential function (in order to retain the advantage of the asymptotic values), but for reasons of clarity we introduced this factor as a separate variable in the equation. So, skipping rate is predicted as a function of exp(word length) and easiness of the word (operationalized as -0.5 for the difficult condition and +0.5 for the easy condition). There are 36 data cells in Table 1, and 16 in Table 2.
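The regression set-up just described can be sketched in a few lines of code. The sketch below fits a model of the form %skipping = 100 exp(-k × word length) + w × easiness to the 36 cells of Table 1. Two things are assumptions rather than the authors' procedure: word-length ranges are replaced by their midpoints, and the fit uses a plain grid search over k with a closed-form least-squares estimate of w, standing in for whatever nonlinear regression routine was actually used. The names `TABLE1`, `cells`, and `fit` are hypothetical.

```python
import math

# Table 1 cells: (word length, % skipped easy, % skipped difficult).
# ASSUMPTION: length ranges (e.g., 1-3) are replaced by their midpoints.
TABLE1 = [
    (2.0, 56, 43), (4.5, 20, 10), (8.0, 6, 4),   # Blanchard et al. (1989)
    (5.5, 12, 10),                               # Pollatsek et al. (1992)
    (5.5, 18, 18),                               # Henderson and Ferreira (1993)
    (5.5, 27, 18),                               # Inhoff and Topolski (1994)
    (5.0, 20, 14), (6.0, 19, 16), (7.0, 13, 12), # Rayner et al. (1996)
    (8.0, 9, 8), (9.0, 6, 6), (10.0, 8, 7),
    (7.5, 17, 11),                               # Rayner and Raney (1996)
    (5.0, 18, 8), (6.0, 12, 5), (7.0, 5, 1),     # Rayner and Fischer (1996)
    (8.0, 2, 2), (9.0, 3, 2),
]

def cells(table):
    """Expand each row into two (length, easiness, % skipped) observations."""
    out = []
    for length, easy, diff in table:
        out.append((length, +0.5, float(easy)))   # easy condition
        out.append((length, -0.5, float(diff)))   # difficult condition
    return out

def fit(table):
    """Grid search over k; for each k the best w follows by least squares."""
    data = cells(table)
    best = None
    for step in range(1, 1000):                   # k = 0.001 ... 0.999
        k = step / 1000.0
        # Residual after removing the exponential word-length component.
        resid = [(e, y - 100.0 * math.exp(-k * n)) for n, e, y in data]
        w = sum(e * r for e, r in resid) / sum(e * e for e, _ in resid)
        sse = sum((r - w * e) ** 2 for e, r in resid)
        if best is None or sse < best[0]:
            best = (sse, k, w)
    return best[1], best[2]

k, w = fit(TABLE1)

if __name__ == "__main__":
    print("decay k = %.3f, easiness weight w = %.2f" % (k, w))
```

Because the midpoint substitution is only an approximation of the word lengths the authors used, the estimates should be expected to land near, not exactly on, the published values.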
Fig. 3. Skipping rate as a function of word length and word difficulty (circle = easy condition; square = difficult condition). Empirical data from Table 1. Fitted curve based on nonlinear regression with exp(word length) and contextual constraint as predictors.
Figure 3 shows the outcome for the data of Table 1 (processing load due to word characteristics). Parameters were estimated by means of nonlinear regression analysis. The resulting equation is:

\[ \%\,\text{skipping} = 100\, e^{-0.36 \times \text{word length}} + 4.3 \times \text{processing load} \]

Both regression weights were significant (word length: t(34) = -29.4, p < 0.01; processing load: t(34) = 2.73, p < 0.01) and together accounted for 83% of variance, of which 79% was due to word length and 4% to processing load. Figure 4 shows the results for Table 2 (processing load due to context constraints). A problem here was that many studies did not give enough details to calculate the exact mean word length. In these cases, the average of the lengths reported was used as predictor (a better but more complex approach would have been to take the geometric mean). The resulting equation is:

\[ \%\,\text{skipping} = 100\, e^{-0.34 \times \text{word length}} + 10.0 \times \text{processing load} \]

The regression weight of word length was highly significant, the one of processing load reached significance only for a one-tailed test (word length: t(15) = -11.6, p < 0.01; processing load: t(15) = 1.86, p < 0.10, two-tailed). The regression accounted for 55% of variance, of which 44% was due to word length and 11% to processing load. The main conclusion from Figs. 3 and 4 is that even for studies which specifically looked at the effects of processing difficulty on skipping rate, word length was a
Fig. 4. Skipping rate as a function of word length and contextual constraint (circle = easy condition; square = difficult condition). Empirical data from Table 2. Fitted curve based on nonlinear regression with exp(word length) and contextual constraint as predictors.
more important predictor of skipping rate than processing load. That is, to predict how often a word was skipped, it is better to know how long the word was than to know whether it was visible in the parafovea, of high frequency, or highly constrained by the preceding context. Of additional interest is the finding that the regression weight of word length is virtually the same for studies in which word properties were manipulated (0.36) as for studies in which context constraints were manipulated (0.34). So, the empirical evidence strongly points to the conclusion that short words are skipped more often because they are short and not because they are easy. The only thing processing difficulty does is slightly modulate the basic underlying phenomenon.

Towards a new model of word skipping

In the previous section we have seen that word skipping is primarily determined by visual factors (such as the length of the upcoming word, and its distance from the launch site; Kerr, 1992). The impact of these factors is not secondary to the probability of recognizing a word with particular language characteristics at a particular distance in the parafovea, although linguistic factors may slightly change the a priori skipping probability. What we need, therefore, is a model of word skipping that is based on visual characteristics of the text. A problem at this point is that nearly all non-linguistic models of inter-word eye behaviour in reading are characterized by a deterministic scanning strategy. In older
models, this was a chain of more or less equidistant fixations; in more recent models, it basically is a word-to-word motion. The fact that eye movement patterns in reading rarely exhibit the simple characteristics predicted by these models has to be explained by the assumption of a (considerable) oculomotor error and occasional intrusions of the linguistic processor. It may be asked, however, whether the assumption of a non-probabilistic oculomotor strategy has not been the major weakness of the oculomotor theories, as it only enables fairly simplistic and coarse predictions about inter-word eye behaviour. At least, it is tempting to see the regular patterns of Figs. 1, 3, and 4 as the result of an underlying stochastic process. According to this view, most words are skipped not because of error in the landing position, but because the system did not "intend" to look at them.

Target selection based on the extended optimal viewing position effect

To have a stochastic, non-linguistic model of eye movement control in reading, we must be able to predict the probability of word identification as a function of word length and stimulus distance from the fixation location. This is what Brysbaert, Vitu, and Schroyens (1996) investigated. They presented words of different lengths at various positions relative to the fixation location and for different periods of time. Figure 5 depicts their main finding. It shows the probability of recognizing a
Fig. 5. Probability of word recognition as a function of presentation duration and word position relative to fixation location. Empirical data and best fitting Gaussian distributions (reprinted from Brysbaert et al., 1996).
five-letter word on different positions in foveal and parafoveal vision and for time intervals ranging from 14 to 70 ms. For each presentation duration, the probability of recognizing a word at a certain distance could be described reasonably well with a Gaussian distribution that had the mode shifted slightly to the left of the word centre. A similar pattern was found for three- and seven-letter words. Brysbaert et al. (1996) called this the Extended Optimal Viewing Position (EOVP) effect. Given that the probability of recognizing a word presented at a certain distance and for a given period of time is described reasonably well by a normal curve, this information can be used by the eye guidance system to estimate the chances of identifying the upcoming parafoveal words within the time period of an average fixation (200-220 ms), and to select the most appropriate parafoveal target word. The estimates are based on (i) the length of the word blobs and the distance of the word blobs from the fixation location, and (ii) the standard deviation of the Gaussian EOVP curve. The latter is a function of text difficulty and task demands (e.g., it will be larger for easy texts and for cursory reading). Although the estimates are rather crude, they give a relatively good idea of what the chances are to recognize an "average word" of a particular length and at a particular distance in the parafovea during an "average fixation duration". It is our contention that this educated guess, which becomes available rather soon in the fixation because it is based on coarse visual information, is used in combination with the ongoing text processing to decide whether or not the next parafoveal word can be skipped. So, a parafoveal word with an estimated probability of 90% recognition at the end of the current fixation can most of the time be skipped without implications for the ongoing text processing. Only in a few instances, when processing is hard, is it better not to skip the word.
In the long run, this will lead to something like 10% fixation and 90% skipping (so that the skipping rate equals the identification probability). If the above reasoning is correct, then a straightforward prediction follows: We should be able to describe parafoveal target word selection with the use of a simple inverted Gaussian distribution [i.e., $1 - \exp\left(-\left((\text{letter position} - \text{fixation location}) / \text{standard deviation}\right)^{2}\right)$], as shown in Fig. 6 (note that in this first approach, we disregard the shift of the EOVP curve to the left of foveal word centre). In Fig. 6, we depict the situation for a launch site on the middle of a three-letter word in front of a five-letter parafoveal word. There is a probability of p1 that the foveal three-letter word will be refixated, a probability of p2 - p1 that the parafoveal word will be fixated, and a probability of 1 - p2 that the parafoveal word will be skipped. Note that the function has only one parameter, the magnitude of the standard deviation. To test the plausibility of our model, we examined the first-pass forward saccades of Brysbaert and Mitchell (1996, Experiment 3; see above) and plotted the landing positions as a function of target word length and launch site. So, for each forward saccade we looked at the length of the words to the right of the launch position (up to 15 letter positions) and noted where the eyes landed relative to those words. The
Fig. 6. Graphical display of how an inverted EOVP curve can be used as a cumulative landing position distribution going from the launch site to several words in the right visual half-field.
Fig. 7. Fit of the mathematical model to empirical data from Brysbaert and Mitchell (1996, Experiment 3) as a function of word length and launch site. Left panel: 3-letter target words; middle panel: 5-letter target words; right panel: 7-letter target words. Lower left corner of each panel: probability of landing sites in front of the word; upper right corner of each panel: probability of landing sites behind the word; middle section: probability of landing sites on the word. Lines: empirical values; closed circles: predictions by the model.
data are plotted in Fig. 7 (horizontal lines) for target words of three, five, and seven letters. So, when a saccade was launched 15 letter positions in front of a three-letter target word, there was 88% chance that the eyes would land in front of the word, 7% chance that they would land on the word, and 5% chance that they would land behind the word. For a five-letter target word, these probabilities were respectively 91%, 7%, and 2%. For seven-letter words, they were 88%, 11%, and 1% (see Fig. 7, first column of each panel). We then searched for the best fitting EOVP curve, which happened to have a standard deviation of 10 letter positions (indicated by the black circles in Fig. 7). As can be seen, the empirical data were described remarkably well by a mathematical equation that has but one unknown parameter.

Fleshing out the model

In the previous paragraph, we have described the general outline of a stochastic nonlinguistic eye guidance strategy that can account to a high degree for the observed word skipping rates as a function of word length and launch site. In the present paragraph, we will try to outline some of the assumptions underlying the model. First, it is important to realize that the probabilities depicted in Fig. 7 do not agree with the distribution of landing positions. They represent the probability that the landing site will be in front of the target word, on the target word, or behind the target word. It is perfectly possible (and quite likely) that the underlying landing site distribution is not normal and unimodal but multimodal with a peak near the middle of each parafoveal word. Otherwise, if the data of Figure 7 represented the landing site distribution, our findings would have been more in line with a model that postulates a constant saccade size and a normally distributed oculomotor error. This is not to say that we reject oculomotor errors as a factor contributing to word skipping (see above).
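As a rough check on the one-parameter curve reported above, the following sketch evaluates the inverted EOVP function with a standard deviation of 10 letter positions. It rests on assumptions that are not spelled out in the text: the curve is taken to be 1 - exp(-(distance / sd)^2), and the target word is taken to span the region from `launch_site` to `launch_site + word_length` letters to the right of fixation. The function names are hypothetical.

```python
import math

def cumulative_selection(distance, sd=10.0):
    """Inverted EOVP curve read as a cumulative landing distribution.

    ASSUMED form: 1 - exp(-(distance / sd)^2), distance in letters
    to the right of the current fixation location.
    """
    return 1.0 - math.exp(-((distance / sd) ** 2))

def landing_probabilities(launch_site, word_length, sd=10.0):
    """Return (P in front of word, P on word, P behind word).

    launch_site: letters from fixation to the blank space in front
                 of the target word (ASSUMED to mark the word's left edge).
    """
    p_front = cumulative_selection(launch_site, sd)
    p_up_to_end = cumulative_selection(launch_site + word_length, sd)
    return p_front, p_up_to_end - p_front, 1.0 - p_up_to_end

if __name__ == "__main__":
    for n in (3, 5, 7):
        front, on, behind = landing_probabilities(15, n)
        print("%d letters: front %.2f, on %.2f, behind %.2f"
              % (n, front, on, behind))
```

For a launch site of 15 and a three-letter target this gives roughly 0.89, 0.07, and 0.04, close to the 88%, 7%, and 5% observed; the five- and seven-letter cases come out similarly close to the reported values.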
As a matter of fact, simulations indicate that the fit of our model increases significantly if a normally distributed landing error with a standard deviation of two letter positions is added to the model. Second, it cannot be denied that for the model to work, it is not enough to calculate the identification probabilities of parafoveal words. It is also necessary to "decide" which word to pick out on the basis of these probabilities. At present, it is not clear which processes are involved in the decision and where the decision is made (leaving aside the possibility of a "homunculus problem"). On the one hand, we could think of a version that closely resembles a random control model. According to this version, at the beginning of a fixation, a random number is generated (between 0 and 1) and the word selected that corresponds to this probability (e.g., if the number falls between p1 and p2 in Fig. 6, then the parafoveal five-letter word would be selected as target). On the other hand, one could think of a
version in which the selection depends on the amount of resources left open by the ongoing text processing. As suggested above, we tend to prefer the latter version. This implies that the decision is more likely to be taken by a system related to language processing than by an autonomous oculomotor control system. Given that we accept a decision based on language processing, it may be asked what the difference still is with prevailing linguistic control theories. According to these theories, a word is skipped because it was recognized during the previous fixation. According to our view, a word is skipped because the language system estimates chances high enough that it will be identified by the end of the current fixation or, at least, that bypassing the word will not hinder text understanding (e.g., because the text fragment is easy). So, the difference is essentially one between an evaluation of what has already been achieved and an educated guess of what will be achieved in the near future. We believe the latter forms the basis of eye guidance in reading, because an educated guess can be realized earlier in the time course of a fixation. In our view, attentional theories of eye movement control in reading have seriously underestimated the time it takes to identify a parafoveal word, although Rayner himself pointed to the fact that word identification times increase 90 ms per degree of eccentricity (Rayner and Morrison, 1981; see also Schiepers, 1980). This value should be added to the 60 ms stimulus transfer time from the eyes to the brain and the 100 ms needed for saccade programming (McConkie, 1983), so that it is virtually impossible for a parafoveal word beginning at an eccentricity of three letter positions to determine the size of the upcoming saccade, unless the fixation lasts considerably longer than 60 + 100 + 90 = 250 ms. 
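The timing argument can be written out as a small calculation. The sketch below simply adds the three components named in the text; the constant and function names are hypothetical, and treating three letter positions as roughly one degree of eccentricity is an assumption implied by the 90 ms term in the sum rather than something the text states outright.

```python
TRANSFER_MS = 60          # stimulus transfer time from the eyes to the brain
PROGRAMMING_MS = 100      # time needed for saccade programming
COST_MS_PER_DEGREE = 90   # extra identification time per degree of eccentricity

def min_fixation_ms(eccentricity_deg):
    """Lower bound on the fixation duration (ms) needed before an
    identified parafoveal word could still shape the upcoming saccade."""
    return TRANSFER_MS + PROGRAMMING_MS + COST_MS_PER_DEGREE * eccentricity_deg

if __name__ == "__main__":
    # ASSUMPTION: a word starting three letter positions out lies at
    # about one degree of eccentricity.
    print(min_fixation_ms(1.0))  # 250.0
```

The bound of 250 ms already exceeds the 200-220 ms average fixation mentioned earlier, which is the point of the argument.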
What is possible, however, is that the system responsible for the programming of the next saccade at a certain point in time (i.e., before the programming of the saccade starts) estimates the likelihood that the parafoveal word will be recognized within the next 100 ms. In principle, the system could use whatever information is available at that moment (Vitu, 1991b), but most of the time this will not be very detailed information about the word, but rather some crude measures such as word length and distance in the parafovea (see also below). At this point, it is necessary to mention explicitly a third assumption of our model, namely that the parafoveal word is processed in parallel with the foveal word, but with a delay (see Schroyens et al., in press, for a more detailed description of our account and some empirical evidence for it). There is little point in estimating the probability that a parafoveal word will be recognized at the end of the current fixation, if processing only starts when the foveal word has been identified (as assumed in current attention theories). Furthermore, if the parafoveal word is processed in parallel with the foveal word, the educated guess can gradually be updated as a function of the duration of the fixation. As shown above, the basic parameters that determine word skipping are word length and launch site. However, the effect of these parameters is to some extent modulated by the difficulty of the
parafoveal word. One way of conceiving this is to state that on a few occasions, the (easy) parafoveal word is recognized soon enough before the saccade is launched, so that the programmed saccade to this word can be cancelled and replaced by a saccade to the next word. This seems to be O'Regan's (1990) reasoning, when he claims that linguistic information can guide eye movements if fixation durations are exceptionally long. Another view is that the first educated target guess, based on visual factors, is constantly updated by incoming linguistic information about the parafoveal word until saccade programming begins. The advantage of the latter view is that the extra information need not be lexical identification, but can be any sub-lexical information (such as stimulus familiarity, fit of the word blob within the contextual expectations, frequency of the first bigram/trigram, and so on). The main purpose of this extra information is to improve the estimated probability that the parafoveal word will be recognized at the moment of saccade onset (which takes place some 100 ms later).

Some new predictions

There are three main predictions to be made on the basis of our model of word skipping. The first is that differences in inter-word eye behaviour as a function of text difficulty and expertise can be described by a change in the standard deviation of the Extended Optimal Viewing Position curve. This would mean that exactly the same mechanisms are used in different reading situations and that differences in eye behaviour are gradual rather than absolute (the latter is the case, for instance, in the distinction between a word-by-word strategy and a skip-every-second-word strategy). The second prediction is that because word skipping happens on the basis of partial information, in a non-negligible number of instances words will be skipped erroneously.
A word that on the basis of its length and distance looked like a safe bet to skip, may turn out to be more difficult and crucial for the understanding of the text than expected at first sight. In those cases, the skipping is likely to be followed by a regressive eye movement. Indeed, as documented in Chapter 5, the pattern of words initially skipped and immediately afterwards regressed to, occurs very regularly in reading and is probably too frequent to be ascribed to oculomotor error alone. In addition, many of these regressions are preceded by short fixations. For instance, in our eye movement data base of Brysbaert and Mitchell (see above), 35% of the fixation durations shorter than 200 ms were followed by a regression to the previous (skipped) word. This was more than the percentage of forward movements to the next word (23%) or to the second next word (27%). On the other hand, fixations of 300 ms and more had a chance of less than 15% to be followed by a regression to the previous word, and more than 30% to be succeeded by a forward movement to the next word or to the second next word. This finding clearly challenges the predictions
made by the attention theory (e.g., Rayner, 1995) that short fixations should be followed by a forward movement (because they are a sign that the parafoveal word was identified before the saccade started but after the execution deadline expired). The finding also presents problems for the "first-fixation-duration" measure often used in eye movement research, as the duration of a fixation on a target word followed by a regression to the previous skipped word, is unlikely to reveal much about the processing of the target word. The data on regressive eye movements after short fixation durations bring us to the third prediction of the model, namely that language influences on word skipping may be due to a shortening of an intended saccade rather than to a lengthening. According to most current models of eye movement control in reading, the initial target for a forward saccade is the next parafoveal word. So, any linguistic influence must be limited to a replacement of this saccade by a saccade to a more distant word. However, if the target of the initial saccade can be the second or even the third word in the parafovea, as our model claims, then linguistic influences can result in a cancelling of an intended skip and its replacement by a saccade to a less distant word. This would happen, for instance, when the eye guidance system originally estimated that a particular parafoveal word could safely be skipped, but in the time course of the fixation noticed that this was unlikely to be so (e.g., because the processing rate of the parafoveal word was slower than anticipated). Cancelling an intended skip may even be a more common language processing intervention than replacing a saccade by a saccade to a more distant word, as there is some evidence that shortening a saccade is easier than lengthening it (Becker, 1991, pp. 129-130). 
Conclusion
It is tempting to think of eye movements during reading as an activity which is completely regulated by either a dumb oculomotor strategy or by the ongoing text processing. In both cases, the pattern of fixations within a line of text seems needlessly complicated and chaotic, and requires the assumption of a large landing position error. Another approach is to see eye guidance as hypothesis generation on the basis of incomplete information. In this view, errors (and, hence, corrective movements) are an inherent part of the model and should exhibit properties related to the hypothesis generator. We have tried to show that at least word skipping can be described quite well within such a framework.
Acknowledgements
The collaboration leading to this text was made possible by grants from the Fonds voor Wetenschappelijk Onderzoek and the European Union (BIOMED BMHICT94-1441) to the first author, a grant from the Fyssen Foundation to the second author, and a Tournesol grant (T/94.046) fostering collaboration between France and Flanders. We would like to thank Keith Rayner, Ralph Radach, and an anonymous reviewer for helpful comments on an earlier draft.
References
Balota, D.A., Pollatsek, A. and Rayner, K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17, 364-390.
Becker, W. (1991). Saccades. In: R.H.S. Carpenter (Ed.), Vision and Visual Dysfunction, Vol. 8: Eye Movements. London: MacMillan, pp. 95-137.
Blanchard, H.E., Pollatsek, A. and Rayner, K. (1989). The acquisition of parafoveal word information in reading. Perception and Psychophysics, 46, 85-94.
Bouma, H. and De Voogd, A.H. (1974). On the control of eye saccades in reading. Vision Research, 14, 273-284.
Brysbaert, M. and Mitchell, D.C. (1996). Modifier attachment in sentence parsing: Evidence from Dutch. The Quarterly Journal of Experimental Psychology, 49A, 664-695.
Brysbaert, M. and Vitu, F. (1995). Word skipping: Its implications for theories of eye movements in reading. Paper presented at the Eighth European Conference on Eye Movements. Derby, UK.
Brysbaert, M., Vitu, F. and Schroyens, W. (1996). The right visual field advantage and the optimal viewing position effect: On the relation between foveal and parafoveal word recognition. Neuropsychology, 10, 385-395.
Buswell, G.T. (1920). An experimental study of eye-voice span in reading. Supplementary Educational Monographs, 17.
Coëffé, C. and O'Regan, J.K. (1987). Reducing the influence of nontarget stimuli on saccade accuracy: Predictability and latency effects. Vision Research, 27, 227-240.
Ehrlich, S.F. and Rayner, K. (1981). Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20, 641-655.
Erdmann, B. and Dodge, R. (1898). Psychologische Untersuchungen über das Lesen auf experimenteller Grundlage. Halle.
Findlay, J.M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033-1045.
Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 417-429.
Henderson, J.M. and Ferreira, F. (1993). Eye movement control during reading: Fixation measures reflect foveal but not parafoveal processing difficulty. Canadian Journal of Experimental Psychology, 47, 201-221.
Hochberg, J. (1975). On the control of eye saccades in reading. Vision Research, 15, 620.
Hochberg, J. (1976). Toward a speech-plan eye-movement model of reading. In R.A. Monty and J.W. Senders (Eds.), Eye movements and psychological processes. Hillsdale, NJ: Erlbaum.
Huey, E.B. (1908). The psychology and pedagogy of reading. New York: MacMillan.
Hyönä, J. (1993). Effects of thematic and lexical priming on readers' eye movements. Scandinavian Journal of Psychology, 34, 293-304.
Inhoff, A.W. and Topolski, R. (1994). Use of phonological codes during eye fixations in reading and in on-line and delayed naming tasks. Journal of Memory and Language, 33, 689-713.
Just, M.A. and Carpenter, P.A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.
Just, M.A. and Carpenter, P.A. (1987). The psychology of reading and language comprehension. Newtown, MA: Allyn and Bacon.
Kerr, P.W. (1992). Eye movement control during reading: The selection of where to send the eyes. Doctoral Dissertation, University of Illinois at Urbana-Champaign, Urbana, IL.
McConkie, G.W. (1979). On the role and control of eye movements in reading. In: P.A. Kolers, M.E. Wrolstad and H. Bouma (Eds.), Processing of Visible Language (Vol. 1, pp. 37-48). New York: Plenum.
McConkie, G.W. (1983). Eye movements and perception during reading. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes. New York: Academic Press, pp. 65-96.
McConkie, G.W., Kerr, P.W., Reddix, M.D. and Zola, D. (1988). Eye movement control during reading: I. The location of initial eye fixations. Vision Research, 28, 1107-1118.
McConkie, G.W., Kerr, P.W. and Dyre, B.P. (1994). What are 'normal' eye movements during reading: Toward a mathematical description. In: J. Ygge and G. Lennerstrand (Eds.), Eye Movements in Reading. Oxford: Elsevier, pp. 315-327.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682.
O'Regan, J.K. (1979). Eye guidance in reading: Evidence for the linguistic control hypothesis. Perception and Psychophysics, 25, 501-509.
O'Regan, J.K. (1980). The control of saccade size and fixation duration in reading: The limits of linguistic control. Perception and Psychophysics, 28, 112-117.
O'Regan, J.K. (1990). Eye movements and reading. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes. Amsterdam: Elsevier, pp. 395-453.
Pollatsek, A., Lesch, M., Morris, R.K. and Rayner, K. (1992). Phonological codes are used in the integration of information across saccades in word identification and reading. Journal of Experimental Psychology: Human Perception and Performance, 18, 148-162.
Pollatsek, A. and Rayner, K. (1990). Eye movements and lexical access in reading. In: D.A. Balota, G.B. Flores d'Arcais and K. Rayner (Eds.), Comprehension Processes in Reading. Hillsdale, NJ: Erlbaum, pp. 143-163.
Radach, R. and Kempe, V. (1993). An individual analysis of initial fixation positions in reading. In: G. d'Ydewalle and J. Van Rensbergen (Eds.), Perception and Cognition: Advances in Eye Research. Amsterdam: North-Holland, pp. 213-226.
Rayner, K. (1978). Eye movements in reading and information processing. Psychological Bulletin, 85, 618-660.
Rayner, K. (1995). Eye movements and cognitive processes in reading, visual search, and scene perception. In: J.M. Findlay, R.W. Kentridge and R. Walker (Eds.), Eye Movement Research: Mechanisms, Processes, and Applications. Amsterdam: North-Holland, pp. 3-22.
Rayner, K. and Fischer, M.H. (1996). Mindless reading revisited: Eye movements during reading and scanning are different. Perception and Psychophysics, 58, 734-747.
Rayner, K. and Morrison, R.E. (1981). Eye movements and identifying words in parafoveal vision. Bulletin of the Psychonomic Society, 17, 135-138.
Rayner, K. and Raney, G.E. (1996). Eye movement control in reading and visual search: Effects of word frequency. Psychonomic Bulletin and Review, 3, 245-248.
Rayner, K., Sereno, S.C. and Raney, G.E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22, 1188-1200.
Rayner, K. and Well, A.D. (1996). Effects of contextual constraint on eye movements in reading: A further examination. Psychonomic Bulletin and Review, 3, 504-509.
Schiepers, C. (1980). Response latency and accuracy in visual word recognition. Perception and Psychophysics, 27, 71-81.
Schroyens, W., Vitu, F., Brysbaert, M. and d'Ydewalle, G. (in press). Visual attention and eye-movement control during reading: The case of parafoveal processing. The Quarterly Journal of Experimental Psychology.
Schustack, M., Ehrlich, S.F. and Rayner, K. (1987). The complexity of contextual facilitation in reading: Local and global influences. Journal of Memory and Language, 26, 322-340.
Schwanenflugel, P.J. (1986). Completion norms for final words of sentences using a multiple production measure. Behavior Research Methods, Instruments and Computers, 18, 363-371.
Shebilske, W. (1975). Reading eye movements from an information-processing point of view. In: D. Massaro (Ed.), Understanding Language. New York: Academic Press, pp. 291-311.
Vitu, F. (1991a). The existence of a center of gravity effect during reading. Vision Research, 31, 1289-1313.
Vitu, F. (1991b). The influence of parafoveal preprocessing and linguistic context on the optimal landing position effect. Perception and Psychophysics, 50, 58-75.
Vitu, F., O'Regan, J.K., Inhoff, A.W. and Topolski, R. (1995). Mindless reading: Eye movement characteristics are similar in scanning letter strings and reading texts. Perception and Psychophysics, 57, 352-364.
Vonk, W. (1984). Eye movements during comprehension of pronouns. In: A.G. Gale and F. Johnson (Eds.), Theoretical and Applied Aspects of Eye Movement Research. Amsterdam: North-Holland.
CHAPTER 7
The Influence of Parafoveal Words on Foveal Inspection Time: Evidence for a Processing Trade-Off
Alan Kennedy
University of Dundee
Abstract
The influence of properties of a 'target' word, presented in the parafovea, on the time to process one of two possible foveal 'prompt' words was examined using measures of gaze and fixation duration. Gaze duration on the prompt showed a sensitivity to the length and word frequency of the to-be-fixated target. There were no effects of the token 'familiarity' of the target's initial letters. More detailed analyses revealed that the obtained frequency effect arose from the inclusion of prompt refixations in the data set. A measure restricted to those cases where the prompt was fixated only once showed a sensitivity to the length of the target and its 'informativeness', defined by the number of words sharing its initial letters, but not to its word frequency or its familiarity. Parafoveal influences were paradoxically inverted, shorter foveal inspection time being associated with long targets and with targets sharing initial letters with many other words. Such parafoveal-on-foveal effects are incompatible with models of reading in which attention is allocated sequentially to successive words. The data are, however, consistent with the proposition that foveal and parafoveal processing occurs in parallel, implicating some form of local process-monitoring taking place over a region larger than the word. It is suggested that saccades can be triggered by a mechanism sensitive to the rate at which sub-lexical parafoveal information can be acquired. Information secured during parafoveal preview is traded-off when a word is eventually inspected foveally. Two responses are given to the objection that the results might be specific to the artificial laboratory task used. First, the same pattern of effects is obtained in an analysis of the filler materials. Second, the effects of parafoveal length and informativeness found under laboratory conditions are also present in a large corpus of eye movement measures obtained under conditions of normal reading.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
In two recent papers, Henderson and Ferreira (1990; 1993) examined the way foveal and parafoveal information combine and interact to determine the timing of eye movements on successive words in text. The topic is of central importance to theories of eye movement control and has been raised in one way or another since the very earliest days of eye movement research. It takes the form of a deceptively simple question: when fixating a particular word, what information is available from words not yet fixated? In their first (1990) set of experiments, Henderson and Ferreira demonstrated that processing load at the current point of fixation (i.e., 'foveal difficulty') changes the effective perceptual span. The experimental procedure looked at the size of the parafoveal preview effect using the boundary technique introduced by Rayner (1975). The difficulty of the foveal word was manipulated in different experiments, either by changing word frequency or local syntactic load (adding or removing a complementiser in a garden path sentence). Both word frequency and syntactic difficulty manipulations produced similar results: the time taken to process a word in the parafovea, when it was eventually fixated, varied as a function of the difficulty of the foveal word. The outcome can be illustrated using the data from the word frequency manipulation (although it was very similar for the manipulation of syntactic difficulty). When the foveal word was of low frequency, the preview advantage was 3 ms. In contrast, with a high frequency foveal word there was a 13 ms preview advantage. The theoretical background for the study of this preview effect is provided by Morrison's (1984) parallel programming model of eye movement control in reading, which is possibly the most clearly articulated example of a theory emphasising on-line processes exerting linguistic control over eye movements in real time.
For Morrison, the when and where of eye movement control are influenced from moment to moment by linguistic properties of the word, or words, being processed. Initially, attention is focused on the foveal word n. When a certain criterion level of processing has been reached, attention shifts discretely towards word n + 1 and at the same time, while word n is still fixated, a saccade is programmed towards n + 1. Thus, for a period of time, the reader is fixating word n and processing word n + 1, the time employed by this processing being equal to the saccadic programming time. The reference to parallel programming in the model points to the possibility that a second saccade (to word n + 2) may be programmed before the saccade to n + 1 is initiated. Indeed, if parafoveal processing is particularly efficient, there may be time for two (or even more) attentional shifts, with the consequence that occasionally words may be skipped. The Morrison model is admirably explicit, but it faces a number of challenges. A particular problem arises from the notion of a 'criterion level' of information processing of the foveal word which triggers the attentional shift. If a constant
criterion is assumed, the model predicts a constant parafoveal preview effect. The amount of information collected from a word in the parafovea should not vary as a function of foveal processing load, for the simple reason that the attentional shift does not occur until the foveal criterion is exceeded. In this sense, the model is to be distinguished from one where parallel processing of foveal and parafoveal words takes place (McClelland and Mozer, 1986; Schroyens et al., 1996). The demonstration by Henderson and Ferreira (1990) that fixation on a difficult word reduces the amount of parafoveal information extracted poses a problem for the notion of a fixed perceptual (or attentional) span. Their suggested solution is to add to the model a maximum possible fixation time, or deadline, after which a saccade becomes inevitable regardless of the current level of processing. The model handles their data under this new assumption because of the way the saccadic deadline interacts with the parallel programming mechanism. If foveal processing is particularly difficult, the deadline will expire before attention has shifted to word n + 1. In this case, a saccade will be prepared, but with a target location within the current word (since that is where attention remains directed). It is worth noting, in the context of later discussion, that the model does not deal with the question of fixation location, but there is a strong presumption that these within-word refixations will invariably be in a left-to-right direction. The deadline assumption predicts a foveal-on-parafoveal interaction because the time spent shifting attention within the foveal word is effectively subtracted from the time attention can be allocated to word n + 1. Thus, if the saccadic deadline is exceeded, any pre-view advantage which might have accrued will be wiped out. It should be noted, however, that effects of foveal load on parafoveal preview are also consistent with a parallel processing model. 
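The contrast between a constant criterion and the added deadline can be made concrete with a toy simulation. This is only a sketch under stated assumptions: the function name, the 150 ms saccade programming time, the 300 ms deadline and the two foveal processing times are hypothetical values chosen for illustration, not parameters given by Morrison (1984) or Henderson and Ferreira (1990). The sketch also assumes that the deadline marks the moment at which saccade programming starts.

```python
def simulate_fixation(foveal_time_ms, program_time_ms=150, deadline_ms=None):
    """Toy sequential-attention model with an optional saccadic deadline.

    All parameter values are hypothetical and serve only as illustration.
    """
    if deadline_ms is None or foveal_time_ms <= deadline_ms:
        # Criterion reached in time: attention shifts to word n+1 and a
        # saccade to n+1 is programmed. Preview lasts as long as programming.
        return {
            "target": "n+1",
            "fixation_ms": foveal_time_ms + program_time_ms,
            "preview_ms": program_time_ms,
        }
    # Deadline expired first: attention is still on word n, so the saccade
    # is aimed within the current word and no preview is obtained.
    return {
        "target": "n (refixation)",
        "fixation_ms": deadline_ms + program_time_ms,
        "preview_ms": 0,
    }


easy, hard = 200, 400  # hypothetical foveal processing times (ms)

# Constant criterion, no deadline: preview is identical for both words.
no_deadline = [simulate_fixation(t)["preview_ms"] for t in (easy, hard)]

# With a 300 ms deadline, the difficult word loses its preview.
with_deadline = [
    simulate_fixation(t, deadline_ms=300)["preview_ms"] for t in (easy, hard)
]

print(no_deadline)    # [150, 150]
print(with_deadline)  # [150, 0]
```

In this simplified form the deadline produces an all-or-none loss of preview, which is the foveal-on-parafoveal interaction described above in its starkest version.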
If visual attention is to be thought of as a 'limited resource', allocating more to a foveal word will mean less available for processing in the parafovea.¹
¹ In fact, a contrary prediction can also be generated from this perspective: If the only relevant variable is the total time available, the more time spent inspecting a foveal word, the more time which can be allocated to parafoveal processing. See Schroyens et al. (1996) for a discussion.
Henderson and Ferreira (1993) present data which comment directly on the distinction between models in which attention is allocated in parallel and ones, like their modification of Morrison's, where saccade programming takes place in parallel, but attention is allocated in a strictly sequential fashion. They examined the way in which the difficulty of an as-yet unfixated word in the parafovea influenced processing time on a foveal word. If attention is allocated sequentially, properties of a parafoveal word will not influence foveal processing time, because an attentional shift only occurs when foveal processing is complete (or when the processing deadline expires). On the other hand, if processing occurs in parallel, parafoveal-on-foveal effects are predicted. Measuring eye movements and manipulating the frequency of successive words embedded in short sentences, they gauged the processing consequences for the currently-fixated word of varying the difficulty of words in the parafovea. The results showed no influence at all of parafoveal difficulty on foveal processing time (see also Carpenter and Just, 1983, for a similar finding). In fact, the time spent inspecting a foveal target was actually (nonsignificantly) shorter (244 vs. 252 ms) when the word in the parafovea was difficult (i.e., of low frequency compared to high). It is perhaps important to make clear that parafoveal words were processed during foveal inspection (that is, there was a preview advantage): the point is, parafoveal information was only extracted after the processing of the foveal word had been completed. Obviously, this outcome offers no support for the notion of parallel allocation of attention, but is perfectly compatible with the (modified) Morrison model. It is also compatible with the E-Z Reader Model discussed by Rayner, Reichle and Pollatsek (Chapter 11). This differs in significant ways from the original Morrison conception, in particular by postulating a semi-autonomous 'Motor Control' system controlled, not by attentional shifts, but by the completion of an early (pre-lexical) familiarity check on the inspected word. Nonetheless, despite these differences, the E-Z Reader Model postulates a process of sequential attention allocation, in this case, attention being shifted from word n to n + 1 when lexical access on n is achieved. Parafoveal-on-foveal effects are not predicted. The motivation for the work reported here rests on the claim that it is premature to attempt to resolve the nature of parafoveal-on-foveal interactions using studies of normal reading. A necessary first step is to examine the effects of a parafoveal target on foveal eye movement patterns and timing using experimental procedures with tighter control than is possible in reading itself.
I will consider two general arguments in support of this claim and then turn to more specific issues. First, the dynamics of eye movement control in reading are highly complex and present a major source of variability in measured processing time. The analysis of eye movement measures derived from the unconstrained inspection of text (as distinct from fixations on isolated words) can be particularly problematic when trying to interpret experimentally-induced variation in fixation duration or saccade extent (Trueswell, Tanenhaus and Kello, 1994). In reading, each word is inspected by an initial fixation at a particular position resulting from an 'entry saccade' of a given size, launched from a particular location in another word. As McConkie et al. have convincingly shown (1988, 1989), there are a number of complex interactions between launch site, initial landing site and saccade size. In particular, within-word refixations act to frustrate the acquisition of 'clean' measures of preview effects. As a second general observation it should be noted that some of the more robust and reproducible eye movement effects found with studies of individual words viewed in isolation (e.g., the effects of word frequency on fixation duration or the effect of initial landing position on gaze or refixation rate), become diluted or statistically
much less reliable when examined in the context of normal reading. Of course, it could be argued from this that experimental work on eye movements in reading should necessarily focus on that task. But it is equally reasonable to ask whether processing words in text actually engages different cognitive processes and if so, why? What precisely is it which dilutes laboratory-based effects? For example, why should relatively robust effects in a reading task vanish when the same words are scanned in a visual search task? These questions have become the source of a lively debate (Vitu et al., 1995; Rayner and Fischer, 1996), focusing on the contention that any laboratory procedure which does not demand lexical processing may punish the experimenter by delivering data of little relevance to the study of reading itself. A conservative stance on this question, however, must surely hold that the vast literature on lexical processing, almost wholly based on studies of relatively short words viewed in isolation at a fixed location, has provided fundamental insights into cognitive processes relevant to reading. These cannot easily be ignored. The present paper attempts to steer a middle course in the debate. It is obviously crucial to demonstrate how the dynamics of reading in practice modulate cognitive processes known to operate on isolated words, but a first step is to secure the necessary baseline data from a task less complex than reading itself, albeit demanding lexical processing. The more specific motivation for the present experiment relates to certain problematic aspects of the data provided by Henderson and Ferreira (1990, 1993). In the first place, the refixation rate on the foveal word in the 1993 studies was extremely low (effectively zero) and it was not measured in the 1990 experiments. A refixation rate of 10-16% might have been expected for the relatively short words they employed (Radach, 1996), and it is rather unclear why the obtained rate was so low.
In any case, the choice of materials appears to have been unfortunate in the context of a model where 'cancelled refixations' play such an important role (it will be recalled that whether an inter-word or intra-word saccade is triggered depends on a race between foveal processing load and a saccadic deadline). Even the 'difficult' items were, in absolute terms, apparently not very difficult to process. A further problem² relates to the variation in difficulty itself. In both the 1990 and 1993 studies this involved a manipulation of word frequency. Word frequency is a property of a word as a whole and there are numerous studies suggesting that sub-lexical encoding processes, in particular relating to word-initial letters, influence the control of eye movements (Lima and Inhoff, 1985; Pynte, Kennedy and Murray, 1991; Rayner, 1978; Rayner, McConkie and Zola, 1980). Thus, for the manipulation of difficulty chosen by Henderson and Ferreira to operate, it would have been necessary for parafoveal words to be identified. The circumstances under which this might generally occur are little understood and rather controversial (see Rayner and Sereno, 1994, for a review). There is a real possibility, therefore, that the parafoveal 'difficulty' manipulation was simply ineffective, in which case it would be premature, on the basis of the 1993 null effect, to conclude that parafoveal-on-foveal effects do not occur.
² This may be seen as more of a general than a specific problem, since the precise manipulation of word properties in continuous text is virtually impossible.
The experiment I shall be discussing was designed to examine parafoveal-on-foveal effects using a procedure with the level of control typical of single-word presentation methods, but allowing reading dynamics to influence the outcome (see Vitu, 1991, for a similar approach). The study made use of the following procedure. Following a fixation point, participants were confronted with a sequence of three words, each separated by a single space. The first word in the sequence, referred to as the 'prompt', was always either the word 'looks' or the word 'means'. The prior fixation point determined that the initial fixation on this was invariably on its third letter. The prompt determined the operation to be carried out on the other two displayed words, the first of which will be referred to here as the 'target'. When the prompt was the word 'looks', a yes-no decision was called for on the basis of the physical identity of the two adjacent target words; when the prompt was the word 'means', participants made a yes-no decision on the basis of the judged synonymy of the two targets. The decision demanded a sequence of fixations and saccades across the three words but, equally, maintained close control over initial fixation position.
Method
Participants
These were 20 student volunteers who were paid £4.00 to take part. Some had previous experience of psycholinguistic eye movement experiments, but none had participated in experiments involving the present procedure or materials. All had normal (uncorrected) vision and were naive as to the purposes of the experiment.
Apparatus
Eye movements were recorded from the right eye using a Dr Bouis Oculometer. This device presents an infra-red image of the eye to a dense array of photodetectors which computes the location of the pupil centre. Horizontal eye position was sampled every 5 ms and recorded for later analysis off-line. Viewing of the stimulus material was binocular. The Oculometer provided a near-linear output over the visual angle subtended by the displayed words (about 8° at most). The materials were presented using a high-resolution (8 × 16) monopitch font in negative polarity (i.e., white characters on a black background) on a Manitron display driven at a refresh rate of 100 Hz. At the viewing distance of 525 mm, 3 characters subtended approximately 1° of visual angle. Responses were made using light-weight micro-switches which minimised head and shoulder movements.
Materials and design
The experimental materials comprised a set of 96 five-letter and 96 nine-letter words drawn from the Kučera and Francis (1967) norms. For each word length, there were eight sets of twelve words, allocated randomly as the first target word to 'match' and 'mismatch' conditions in the 'looks' task. The second item in mismatch pairs was the same length and in the majority of cases had the same initial three letters. The eight sets of experimental items were selected to meet a criterion defining high and low values of word frequency; initial trigram frequency or 'informativeness' (that is, the number of words in the norms having that set of initial letters, regardless of length); and 'familiarity', or the token frequency of the initial three letters. Table 1 illustrates the composition of the experimental materials: for example, the word 'dense' in English, although relatively rare, shares its initial trigram with quite a large number of other words and is consequently relatively 'uninformative'. However, since none of these words is particularly frequent in the language, the trigram has low overall 'familiarity' (token frequency). By way of contrast, while the word 'arena' is rare, and few words share its initial letters, the trigram itself is very familiar. There were also 192 words allocated to the filler task (the 'means' decision), comprising four sets of 48 four-, five-, eight- and nine-letter words. Target words demanding a 'yes' decision in this task were reasonably close synonyms. The total set of 384 word triples³ was presented in a different randomised sequence for each participant.
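The two trigram measures defined above can be illustrated with a short sketch. It assumes a lexicon represented as a mapping from word to frequency per million; the function name and the toy frequencies below are invented for the example and are not taken from the Kučera and Francis norms.

```python
from collections import defaultdict


def trigram_measures(lexicon):
    """Return {initial trigram: (type count, summed token frequency)}.

    The type count corresponds to 'informativeness' as defined in the text
    (many words sharing a trigram = uninformative); the summed frequency
    corresponds to 'familiarity'.
    """
    types = defaultdict(int)
    tokens = defaultdict(int)
    for word, freq in lexicon.items():
        if len(word) < 3:
            continue  # words shorter than three letters have no initial trigram
        trigram = word[:3]
        types[trigram] += 1
        tokens[trigram] += freq
    return {t: (types[t], tokens[t]) for t in types}


# Hypothetical toy lexicon: frequencies are made up for illustration only.
toy_lexicon = {
    "dense": 10, "dentist": 12, "denial": 8, "dental": 4,
    "arena": 5, "area": 300,
}

measures = trigram_measures(toy_lexicon)
print(measures["den"])  # (4, 34)
print(measures["are"])  # (2, 305)
```

In this toy case 'den' is shared by more words but has a lower summed frequency than 'are', which mirrors the contrast drawn between 'dense' and 'arena'.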
It should be emphasised that the critical experimental words were all allocated to the 'looks' task and analysis (at least, initially) was restricted to the decisions made in that task. All the variables were manipulated within-subjects, the design thus having factors of Decision Type (yes-no), Word Length, Word Frequency, Trigram Frequency and Trigram Token Frequency.
³ The complete set of materials is available from the author.
Table 1
Examples of the materials used. These were words of High and Low Word Frequency (WF), Initial Trigram Frequency (TF) and Initial Trigram Token Frequency (TK). The Table shows, for each case, Short (five-letter) and Long (nine-letter) words. The three numbers associated with each exemplar provide the cell means for word frequency (per million), 'informativeness' (i.e., type initial trigram frequency indexed by the number of words in the lexicon sharing an initial trigram), and 'familiarity' (i.e., token initial trigram frequency indexed by the summed word frequency for the set of words sharing a particular initial trigram).

Condition         Short    WF    TF    TK     Long         WF    TF    TK
HIWF HITF HITK    state    465   155   3209   concerned    146   310   4041
HIWF HITF LOTK    doubt    103    51    399   executive     75    40    276
HIWF LOTF HITK    after    766    13   1588   building     105    19    406
HIWF LOTF LOTK    today    170     9    265   emotional    117    11    870
LOWF HITF HITK    shout      9   180   5209   precedent      8   326   4851
LOWF HITF LOTK    dense     10    59    290   diagnosis      8    62    331
LOWF LOTF HITK    arena      5    14   1899   makeshift      3    14   1734
LOWF LOTF LOTK    acute      8     6     27   hierarchy      8     6     40

Procedure
On arrival, participants were given a sheet of instructions for the experiment asking them to observe a fixed routine on each trial. This began with fixation of the marker. After 500 ms the fixation marker was replaced by the display of the three experimental words, each separated by a single space. Participants were asked to look at each in turn and make a decision by pressing either a right- or left-hand button. As noted above, the nature of the decision to be made, and hence the form of processing called for on the target words, was determined by the prompt word itself. In the 'looks' task, participants had to decide whether the two target words were physically identical; in the 'means' task, whether they meant the same thing. Participants were initially given practice on the procedure and once this had been fully mastered a dental composition bite bar was prepared and the calibration technique demonstrated. This required the fixation of five points distributed evenly across the horizontal axis of the screen at the point where the experimental items would be displayed. The calibration points were restricted to 60 character positions, a region less than the full screen width, to increase resolution. After 24 practice items, participants worked through the experimental materials with a calibration after each set of twelve items. The experiment lasted about one hour, with rest periods given at regular intervals. Fixation duration and position were computed off-line. The effective resolution of the eye-tracking equipment was better than 0.5 characters.
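The display geometry reported in the Apparatus and Procedure sections (525 mm viewing distance, about 3 characters per degree, a 60-character calibration region, resolution better than 0.5 characters) can be related by simple trigonometry. The sketch below is illustrative: the function names are hypothetical, and the conversion from characters to degrees uses a linear approximation that is an assumption of the example rather than a procedure described in the text.

```python
import math


def char_width_mm(viewing_distance_mm, chars_per_degree):
    """Physical width of one character, given how many fit in 1 degree."""
    one_degree_mm = 2 * viewing_distance_mm * math.tan(math.radians(0.5))
    return one_degree_mm / chars_per_degree


def chars_to_degrees(n_chars, chars_per_degree=3.0):
    """Linear (small-angle) conversion from character positions to degrees."""
    return n_chars / chars_per_degree


print(round(char_width_mm(525, 3), 2))   # 3.05 (mm per character)
print(chars_to_degrees(60))              # 20.0 (calibration region, degrees)
print(round(chars_to_degrees(0.5), 2))   # 0.17 (resolution, degrees)
print(round(chars_to_degrees(25), 1))    # 8.3  ('looks' + two nine-letter words + two spaces)
```

The last line is consistent with the statement that the displayed words subtended about 8° at most.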
Results and discussion
Gaze duration on the prompt
The first analysis was on the sum of all fixations on the prompt (gaze), up to the point of the first excursion of the eyes outside its boundary, invariably to the right. It is sensible to consider all cases, regardless of errors, since analysis was restricted to behaviour on the prompt and the first of the two targets (i.e., prior to the possible commission of an error). Error-rate on the task was, in any case, extremely low (<1%). The relevant data are shown in Table 2. For the purposes of analysis, the space between the prompt and the target word was treated as the first character of the target. Analysis of variance showed two relatively strong effects of properties of the to-be-fixated word, both somewhat counter-intuitive. First, overall foveal gaze duration was reliably shorter when the parafoveal target was long, F1(1,19) = 12.10, p = 0.003; F2(1,160) = 19.60, p < 0.001. As the table indicates, this main effect was entirely the result of low word frequency targets: the interaction between Target Length and Target Word Frequency was significant, F1(1,19) = 4.32, p = 0.05; F2(1,160) = 7.25, p = 0.008. There was no effect of Length for high frequency targets, but a highly significant effect for low frequency targets, F1(1,19) = 18.08, p = 0.007; F2(1,80) = 23.61, p < 0.001. There was a frequency effect for long targets, but in an unexpected direction: low frequency items were associated with shorter gaze durations on the prompt, F1(1,19) = 4.48, p = 0.05; F2(1,80) = 9.41, p = 0.003. The trend in the opposite direction for short targets was not significant, F1(1,19) = 2.81; F2(1,80) = 1.11. The data manifestly demonstrate that properties of the parafoveal target word can influence foveal gaze duration: indeed, they suggest that participants achieved, or were close to achieving, lexical access in the case of short words (albeit, the overall 10 ms 'frequency effect' was non-significant).
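The gaze measure defined at the start of this section (the sum of all fixations on the prompt up to the first excursion of the eyes outside its boundary) can be sketched as follows. The data layout, the function name and the example trial are assumptions made for illustration; the chapter does not describe the software used for the off-line computation.

```python
def gaze_duration(fixations, first_char, last_char):
    """Sum fixation durations on a word until the eyes first leave it.

    fixations: list of (character_position, duration_ms) in temporal order.
    first_char, last_char: inclusive character boundaries of the word.
    """
    total = 0
    entered = False
    for position, duration in fixations:
        inside = first_char <= position <= last_char
        if inside:
            entered = True
            total += duration
        elif entered:
            break  # the first excursion outside the word ends the gaze
    return total

# Hypothetical trial: the prompt occupies characters 1-5.
trial = [(3, 210), (4, 70), (8, 260), (3, 150)]
print(gaze_duration(trial, 1, 5))  # 280
```

The later return to character 3 is not counted, because gaze ends with the first excursion outside the word.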
It cannot be argued, however, that the obtained effects simply reflect parafoveal processing in the presence of a zero foveal load, since the more difficult parafoveal targets were associated with shorter, not longer, prompt gaze duration. Moreover, the statistically significant effects all derive from the single case in which the target was a long, low frequency, word. In the other three cases, processing time on the prompt was more or less the same. A plausible, if slightly tendentious, account of this pattern of effects is that the low rate of information extraction from difficult targets triggered an early saccade, in effect prematurely aborting fixation on the prompt.

Table 2. Gaze duration (ms) on the prompt word as a function of the Length and Word Frequency of the parafoveal target word.
                     High target word frequency   Low target word frequency   Mean
Short target word              283                          293               288
Long target word               280                          266               273

There was also a significant interaction between Target Word Frequency and Target Initial Trigram Frequency, F1(1,19) = 4.57, p = 0.044; F2(1,160) = 3.84, p = 0.049. These data are shown in Table 3. The pattern of results parallels that shown in Table 2, with a single mean significantly shorter than the other three. In this case the 'difficult' targets can be identified as low frequency words with common (i.e., 'uninformative') initial trigrams. The effect of Trigram Frequency was highly significant for low frequency words, F1(1,19) = 18.08, p = 0.007; F2(1,160) = 23.61, p < 0.001. In this case, when the initial letters of the target were relatively uninformative, an early saccade from the prompt was initiated. Equally, there was an effect of Word Frequency for targets with redundant initial trigrams, prompt gaze duration being shorter (not longer) for low frequency items, F1(1,19) = 4.50, p = 0.045; F2(1,160) = 5.00, p < 0.027. Again, the data are consistent with an early saccade triggered by difficult items in the parafovea. However, the crucial distinction here is that 'difficulty' relates to the rate of acquisition of sub-lexical information capable of disambiguating the target from other possible candidates. Neither the main effect of Target 'familiarity', nor any interactions involving that factor, approached significance. The surprising aspect of these data lies in their direction.
For example, time spent on the prompt word was shorter when the target was long and presumably more difficult to process; similarly, prompt gaze duration was shorter (at least, for low word frequency targets) when the parafoveal word shared the same initial letters with many other words. Although these 'inverted' effects are intriguing, their interpretation is complicated by an obvious possible artifact. If, for some reason, the prompt word was refixated, this would act to change the launch position for the saccade directed towards the target, making it more visible. The measure of gaze only permits firm conclusions to be drawn about parafoveal processing if all the saccades were launched from the same controlled position.

Table 3. Gaze duration (ms) on the prompt word as a function of the word frequency and frequency of the initial trigram of the parafoveal target word.
                          High target word frequency   Low target word frequency
High trigram frequency              282                          272
Low trigram frequency               281                          287

Table 4. Probability of refixating the prompt word as a function of the length and word frequency of the parafoveal target word. The size of refixating saccades (in character positions) is shown in brackets.
                     High target word frequency   Low target word frequency
Short target word          0.156 (0.243)                0.183 (0.287)
Long target word           0.167 (0.259)                0.132 (0.208)

The task was relatively simple to perform, and refixation within a five-letter prompt, presented many times, might appear unlikely, but there was, as Table 4 shows, an overall refixation rate (almost invariably involving right-going refixations) of around 16%. The probability data in Table 4 show a pattern of means similar to that found for the measure of gaze duration. That is, the condition associated with the shortest gaze (long, low frequency, targets) had the lowest prompt refixation rate. In fact, the Target Length x Target Word Frequency interaction failed to achieve significance in the analysis of variance, F1(1,19) = 2.11, p = 0.16; F2(1,160) = 4.22, p = 0.09, but sub-analyses are justified (obviously, to be treated with caution) in view of the significant effects found for the measure of gaze. In the case of low frequency targets, prompt refixation rate was significantly lower when the target was long, F1(1,19) = 5.10, p = 0.034; F2(1,160) = 8.65, p = 0.005. The Target Word Frequency effect was also significant for long words, F1(1,19) = 4.34, p = 0.049; F2(1,160) = 4.80, p = 0.03. When the target was of low frequency the prompt was significantly less likely to be refixated (the apparent trend in the opposite direction for short targets was not significant). This pattern of refixations accounts for much of the variation in gaze duration, with the same rather paradoxical outcome: low frequency targets, or long targets, were less, not more, likely to trigger a refixation of the prompt. Obviously, the obtained 'frequency effect' on gaze duration (i.e., shorter prompt inspection times associated with long, infrequent, parafoveal targets) arose primarily as a result of changes in refixation rate.
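Table 4 pairs each refixation probability with a bracketed saccade size. As a rough worked check, and assuming that the bracketed values are means over all trials (with trials lacking a refixation contributing zero movement), dividing the bracketed value by the probability gives the typical size of a shift on those trials where one occurred. The function name is hypothetical; the numbers are those of Table 4.

```python
# Table 4: (probability of refixating the prompt, mean signed extent in characters)
TABLE_4 = {
    ("short", "high"): (0.156, 0.243),
    ("short", "low"): (0.183, 0.287),
    ("long", "high"): (0.167, 0.259),
    ("long", "low"): (0.132, 0.208),
}

def size_given_refixation(probability, mean_extent_all_trials):
    """Mean shift on refixation trials, if all other trials contribute zero."""
    return mean_extent_all_trials / probability

for cell, (p, extent) in TABLE_4.items():
    print(cell, round(size_given_refixation(p, extent), 2))
```

Under that assumption every cell yields a value between 1.5 and 1.6 characters, so the differences between cells in the bracketed means would mainly reflect how often a refixation occurred rather than how large it was.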
Table 4 also shows the average signed extent of within-prompt saccades, with left-to-right movements (comprising virtually all refixations) coded as positive. The average shift in position was about a quarter of a character, although it is important to bear in mind that this arose from a mixture of zero movement (the 85% of steady fixations on the prompt) and the occasional shift of one or two characters. The point of importance is that, in this
case, the eyes were much closer to the target word prior to launching the primary saccade. The Length x Target Word Frequency interaction was significant in these data, F1(1,19) = 4.16, p = 0.05; F2(1,160) = 4.32, p = 0.037, and sub-analyses parallel to those for the probability data gave broadly similar results. Average within-prompt movement was smaller when parafoveal targets were long and infrequent. There was a significant effect of Length for low frequency targets, F1(1,19) = 4.12, p = 0.05; F2(1,160) = 6.81, p = 0.01, and a marginally significant effect of Frequency for long words, F1(1,19) = 3.12, p = 0.09; F2(1,160) = 3.40, p = 0.07.
It is now possible to draw some initial conclusions concerning the apparent influence on prompt gaze duration of properties of words not yet fixated. Three aspects of the data should be emphasised: (1) both lexical and sub-lexical properties of a parafoveal target influence prompt gaze duration; (2) these effects stem largely from modulation of prompt refixation rate, targets which were easier to process being associated with higher (not lower) rates of prompt refixation; (3) the data are consistent with a monitoring mechanism which triggers an early saccade if the rate of information acquisition from the parafovea is insufficiently rapid. They are not easily reconciled with the class of model which predicts an attentional shift and/or an inter-word saccade only after the foveal word has been identified (or after the expiry of a deadline). Rather, the data suggest that properties of the parafoveal target act to modulate the rate of intra-word refixation.
This poses severe problems for models which can only plausibly deal with within-word refixations by making them contingent on a foveal load high enough to exceed the saccade deadline, something which can safely be ruled out in the present task⁴. The data are equally incompatible with the E-Z Reader, in which intra-word saccades result from a similar mechanism (failure to complete the familiarity check on a foveated word in time to cancel the 'labile program'). Since changes in prompt refixation rate contribute so substantially to changes in gaze duration, it may be argued that the data shown in Tables 2 and 3 do not represent a pure examination of parafoveal-on-foveal effects. Refixations radically changed the visibility of the target and, indeed, in some cases the final point of fixation on the prompt would have been within a character of the target word itself. To meet this objection, separate analyses were carried out on the approximately 85% of cases where the prompt was processed in a single fixation (i.e., cases in which a fixation location centred on the prompt throughout its inspection completely controlled the launch position for the primary inter-word saccade). These 'Single Fixation Case' data provide an estimate of parafoveal influence, uncontaminated by variation in launch position.
4 Pollatsek and Rayner (1990) argue that refixations might be handled in the Morrison model without reference to a deadline. When processing text, integrative processes may contribute to local difficulty, causing a competition between a decision to stay and a decision to execute the next saccade. Interestingly from the point of view of the present paper, they also suggest that the level of sublexical excitation from neighbouring candidates may influence 'local difficulty'.
Gaze duration on the prompt (single-fixation cases)
Analysis of variance of cases where the prompt was processed in a single fixation showed a highly significant main effect of Target Word Length, again with shorter durations associated with long targets (Short = 266 ms; Long = 256 ms), F1(1,19) = 9.86, p = 0.006; F2(1,160) = 12.49, p < 0.001. The Target Length x Target Word Frequency interaction, which was significant in the analysis of gaze, failed to achieve significance, F1(1,19) = 2.47, p = 0.12; F2(1,160) = 2.74, p = 0.09. There was, however, a highly significant Decision Type ('match' vs. 'mis-match' items) x Length x Target Initial Trigram Frequency interaction, F1(1,19) = 15.67, p = 0.001; F2(1,160) = 4.38, p = 0.04. It should be emphasised that no effects of Decision Type even approached significance in the analysis of gaze duration: in particular, analyses of the data in Tables 2 and 3, restricted to 'match' decisions only, produced virtually identical results. Nonetheless, this three-way interaction calls into question conclusions concerning the possible effects of target word frequency on prompt gaze duration. Prompt refixation appeared sensitive to global physical properties of the display as a whole (i.e., the contrast between a visually congruent 'yes' condition and a visually incongruent 'no' condition in the 'looks' task). The possibility that the obtained effects might, in fact, be peculiarly conditioned by the 'looks' task, which focuses on physical identity, will be taken up in a later section. In the present context, it is clear that one final analysis is demanded, concentrating on 'match' cases.
That is, for this task, an uncontaminated estimate of parafoveal effects may be obtained from the condition in which a single saccade was made from the centre of the prompt towards a target configuration on the right involving two identical words.
Gaze duration on the prompt (single-fixation, 'match' cases)
Analysis of variance of prompt fixation duration showed no Length x Target Word Frequency interaction, but a highly significant interaction between Target Length and Target Initial Trigram Frequency, F1(1,19) = 15.68, p = 0.001; F2(1,80) = 6.89, p = 0.01. The form of the interaction, shown in Table 5, again arises from a single condition (associated with long, 'uninformative' or low-constraint parafoveal targets) where the mean fixation duration was significantly shorter than in the other three. The effect of Initial Trigram Frequency was significant for long targets, F1(1,19) = 10.95, p = 0.004; F2(1,80) = 11.01, p = 0.002, but not for short. For targets with high frequency initial trigrams, long targets induced shorter prompt fixation times, F1(1,19) = 22.85, p < 0.001; F2(1,80) = 6.44, p = 0.01. This outcome may be compared with that of Lima and Inhoff (1985), showing initial fixation duration on a low constraint word itself (i.e., in the terms adopted here, a foveal target) to be relatively short, although, of course, the present data relate to properties of an unfixated parafoveal word. For both types of short target word, and for long targets with highly informative initial letters, prompt fixation duration was roughly equal and relatively long. As noted above, this pattern of data suggests a mechanism in which the decision to make an early inter-word saccade is triggered when the rate of gain of word-initial information is particularly slow. Crucially, however, this arises from sub-lexical, rather than lexical, properties of the target. When the target is long and its initial letters are shared with many other words, prompt fixation duration is short. There was no main effect of Target Token Familiarity and no significant interactions involving that manipulation.

Table 5. Fixation duration (ms) on the prompt word as a function of the length and frequency of the initial trigram of the parafoveal target word. The data are for cases where the prompt word was processed with a single fixation (85% of cases) and for 'match' decisions only.
                     High initial trigram frequency   Low initial trigram frequency
Short target word                265                              267
Long target word                 244                              272

Interim summary
The final analyses, on cases with identical targets in parafoveal and peripheral vision, lead to the conclusion that there is no parafoveal-on-foveal effect of Word Frequency in the absence of prompt refixation. In this respect, the outcome replicates the null effect reported by Henderson and Ferreira (1993). Nonetheless, it is not possible to conclude in favour of 'attentional autonomy' because of the strong effects of parafoveal target informativeness. Some form of process-monitoring appears to be taking place with three possible decisions: (1) to continue processing ('STAY'); (2) to refixate the prompt ('SHIFT'); or (3) to execute a saccade to the target ('GO'). The length of the target word and the overall stimulus configuration influence both the STAY and SHIFT decisions. A complex target configuration, involving a mis-match, which is presumably only visible when the target is short, triggers a within-prompt SHIFT.
It is impossible to accommodate this outcome either with the expiry of a deadline based on foveal difficulty, as Henderson and Ferreira suggest, or with the completion of a familiarity check, as proposed in the E-Z Reader model. When the prompt received only a single fixation, the lack of
clear word frequency effects indicates that the target was not actually identified before the GO decision. Rather, the results point to the rate of acquisition of sub-lexical information (initial trigram frequency) from the parafovea as the major influence on processing time, with an early GO decision when the rate of gain of such information is low. It can, of course, be objected that the present task is a weak analogue of the normal reading situation the Morrison model was designed to treat, and this point will be considered in some detail in the General Discussion. For the present, it may simply be noted that the task does provide controlled conditions under which the relevant hypotheses can be examined. It was necessary to process the prompt word for the task decision to be made and, in fact, prompt fixation durations clearly fall in the normal range. There are aspects of the data which are not at all congenial to models of eye movement control which appeal to attentional autonomy. The difficulties focus on foveal refixation rate and on sub-lexical parafoveal influences exerted from the target. Regarding refixations, it is evident that the addition of a deadline such as that suggested by Henderson and Ferreira (1990) cannot account for an obtained 18% overall prompt refixation rate on a five-letter word. Even if it could, within-word refixation in the present task appears to be associated with successful, rather than unsuccessful, processing of a parafoveal word. Fixation behaviour on the prompt is sensitive to the rate of acquisition of information from the target, suggesting a control system monitoring the rate at which the target 'candidate pool' reduces. Below a criterion level of information acquisition, a GO decision takes the eyes rapidly and directly to the target. Above this criterion level (at least for this laboratory task) the target is processed while the prompt is fixated (STAY).
It remains an open question, however, what determines an intra-prompt SHIFT. Why should prompt refixation rate be lowest with the most difficult targets (e.g., long, infrequent words) when a processing advantage might be secured from being closer to the target? Similarly, what accounts for the fact that, in the case of short target words, there is a tendency for low frequency items to increase prompt refixation rate? The Morrison account (assuming this may be extended to a study involving single words) relates intra-word shifts to foveal load but, as noted above, any explanation appealing to the expiry of a deadline simply cannot be reconciled with the overall higher rate of refixation associated with easy (e.g., short) targets. An account more consistent with the obtained pattern of results simply suggests that if parafoveal information is available at all, a high priority is attached to securing it. That is, the cost of programming a saccade in the SHIFT decision can be set against the advantage of maintaining a negligible foveal load. Such a strategy would, of course, only be optimal if parafoveal processing gains could be traded against later, foveal, processing, and to address this crucial question it is necessary to examine behaviour on the target word itself.
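The three-way STAY / SHIFT / GO scheme discussed above can be caricatured as a threshold rule. This is only a toy sketch: the function name, the 0 to 1 rate scale and both criterion values are hypothetical, and in particular the second criterion is an assumption made for illustration, since what determines a SHIFT is left as an open question in the text.

```python
def monitor_decision(acquisition_rate, go_criterion=0.3, shift_criterion=0.7):
    """Toy STAY / SHIFT / GO rule for a fixation on the prompt.

    acquisition_rate: rate of information gain from the parafoveal target,
        in arbitrary units (hypothetical scale from 0 to 1).
    go_criterion: below this rate the eyes leave for the target (GO).
    shift_criterion: hypothetical; at or above this rate the prompt is
        refixated closer to the target (SHIFT).
    """
    if acquisition_rate < go_criterion:
        return "GO"
    if acquisition_rate >= shift_criterion:
        return "SHIFT"
    return "STAY"

for rate in (0.1, 0.5, 0.9):
    print(rate, monitor_decision(rate))
```

With these made-up criteria a slow rate of gain produces an early GO, an intermediate rate a STAY, and a high rate a SHIFT, which mirrors the observation that easier targets were associated with more, not fewer, prompt refixations.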
Fixations on the target
The analysis strategy with regard to the target word is complicated by the need to distinguish single and multiple prompt fixations and match and mis-match cases. Measurements of inspection time on the first target word are also potentially subject to contamination from properties of the second word. For these reasons, and to avoid unnecessarily complicating the discussion, the analyses which will be reported relate to the full data set, with reference to other analyses only in the case of significant differences⁵. Analyses of eye movements on the target were conducted with the limited objective of gaining further understanding of parafoveal-on-foveal effects. Measures were made of both first fixation duration and gaze duration on the target word (the latter almost invariably resulting from no more than two fixations). In the analysis of first fixation duration, there was only one clear effect: durations were very reliably shorter for long targets (267 ms) than for short targets (286 ms), F1(1,19) = 14.82, p = 0.001; F2(1,160) = 22.29, p < 0.00. The same highly significant relationship was also found for the 'Single Fixation' case and in an analysis restricted to 'match' decisions. The direction of this difference is not, in itself, unexpected and simply provides indirect confirmation of the observation that long words are unlikely to be processed with a single fixation and the fact that the first fixation, when two or more are made, tends to be relatively short (Vitu and O'Regan, 1995). It follows that the result can only be sensibly interpreted along with the data on gaze duration and target refixation probability, which are treated below. There was a tendency for first fixation duration to be longer on low frequency targets (HWF = 272 ms, LWF = 281 ms), but this narrowly failed to achieve significance by items, F1(1,19) = 4.86, p = 0.04; F2(1,160) = 3.03, p = 0.08.
Interestingly, the effect did not even approach significance in analyses of data derived from cases where the prompt was processed in a single fixation. Thus, the outcome is somewhat equivocal on the issue as to whether word frequency effects are found on first fixation duration with preview. Much seems to depend on whether the prompt was refixated. One explanation of this contrast with the results of Inhoff and Rayner (1986) might simply be that the experimental task is only likely to reveal modest effects of Word Frequency, given that the decision called for relates primarily to a physical match. An alternative account, consistent with the data on refixations, is that a processing trade-off is taking place, partial parafoveal identification of a target diluting any obtained foveal effect. Such a trade-off would be of great relevance in the on-going debate as to whether word frequency effects invariably occur on first fixation duration in normal reading, and the issue is considered further below. There were no other significant main effects or interactions in the analysis of target first fixation duration: in particular, there were no effects of either the 'informativeness' or the 'familiarity' of targets.
5 A complete data set for all measures on both the prompt and target words is provided in Appendix I.

Table 6. Gaze duration (ms) on the target word as a function of its word frequency and the frequency of its initial trigram.
                          High target word frequency   Low target word frequency
High trigram frequency              403                          390
Low trigram frequency               402                          426

Target gaze duration was very reliably longer (in all analyses) for long targets (442 ms) than for short targets (367 ms), F1(1,19) = 29.60, p < 0.001; F2(1,160) = 129.81, p < 0.001. There was also a robust effect of Word Frequency (LWF = 414 ms, HWF = 396 ms), F1(1,19) = 6.63, p = 0.018; F2(1,160) = 5.85, p = 0.016, although this difference failed to achieve significance for cases when only a single fixation had been made on the prompt (possibly because of a lower probability of target refixation if the prompt itself had been refixated). The most consistent effect (apart from the ubiquitous Length effect) in all analyses was a highly significant interaction between Target Word Frequency and Target Initial Trigram Frequency, F1(1,19) = 8.34, p = 0.009; F2(1,160) = 6.96, p = 0.009, for the full data set (see Table 6). For low word frequency targets, items with rare initial trigrams were subjected to relatively prolonged inspection, F1(1,19) = 10.40, p = 0.004; F2(1,80) = 10.24, p = 0.002. A parallel sensitivity to trigram frequency was not evident in the case of high word frequency targets. The outcome represents an important qualification to the suggestion by Lima and Inhoff (1985) of an overall relationship between gaze duration and informativeness: this appears to hold true only if the inspected word itself is rare. For targets with low frequency initial trigrams the effect of Target Word Frequency was significant, F1(1,19) = 7.42, p = 0.012; F2(1,80) = 4.11, p = 0.043. The rate of within-word refixation on the target can be used to answer the question as to whether gaze differences stemmed from longer or from more fixations, or from a combination of both. The overall probability of refixating a target was relatively high (about 0.4).
The vast majority were right-going, and analysis of these showed a strong effect of Target Length in all analyses (Short = 0.27; Long = 0.53), F1(1,19) = 39.42, p < 0.001; F2(1,160) = 317.92, p < 0.001, for the full data set. The crucial relationship between Target Word Frequency and Target Initial Trigram Frequency is shown in Table 7. For high word frequency targets, refixation rate was greater in targets with 'uninformative' initial trigrams, F1(1,19) = 4.18, p = 0.05; F2(1,80) = 4.02, p = 0.046. A modest apparent trend in the reverse direction in the case of low word frequency targets was not significant. As these means suggest, targets with uninformative initial letters were more likely to be refixated if they were high frequency words, F1(1,19) = 6.92, p = 0.016; F2(1,80) = 6.75, p = 0.011. It should be noted, however, that the overall Target Word Frequency x Target Initial Trigram Frequency interaction narrowly failed to achieve significance in the by-items analysis, F1(1,19) = 7.37, p = 0.013; F2(1,160) = 3.38, p = 0.06, and this interaction was non-significant in analyses of the 'single fixation' and 'match' cases. But, in any case, the data provide no support for the notion that the longer gaze durations associated with any of the manipulated properties of target words (e.g., low frequency; rare initial trigrams; or long items) arose from more fixations alone. Rather, there was a higher incidence of right-going refixations when the initially-fixated region of the target was uninformative. This suggests a form of spatial control, with a higher probability of the eyes shifting towards the other, possibly more informative, end of the word.

Table 7. Probability of right-going refixations on the target word as a function of its word frequency and initial trigram frequency.
                          High target word frequency   Low target word frequency
High trigram frequency             0.425                        0.372
Low trigram frequency              0.392                        0.398

In contrast to the analysis of right-going refixations, which showed no overall frequency effect, data on the probability of left-going refixations showed a main effect of Target Word Frequency (HWF = 0.029, LWF = 0.046), F1(1,19) = 5.59, p = 0.027; F2(1,80) = 7.25, p = 0.008. There was also a significant main effect of Initial Trigram Frequency (HTF = 0.029; LTF = 0.046), F1(1,19) = 4.16, p = 0.05; F2(1,80) = 5.97, p = 0.015 (the means are coincidentally identical to those for Word Frequency). Some caution should be exercised in interpreting these findings, because the absolute number of left-going refixations was very low and a proportion may, in fact, have been mislocated fixations on the space between the prompt and target, reflecting attempts to refixate the prompt.
The effects of Word Frequency simply confirm the observation that both right- and left-going refixation rate was somewhat higher for low frequency items. The effect of Initial Trigram Frequency is more interesting, since it offers further confirmation of a relationship between the direction of refixation and the spatial distribution of information in the word being inspected. As we have seen, there was an increased probability of right-going
refixations for targets with uninformative initial letters. The left-going refixation data show a parallel trend, with an increased probability for targets with informative initial letters. It is important to note that this directional selectivity cannot be accommodated in the Morrison Model or the subsequent modifications of it.
Saccade extent
Saccade size is a poor indicator of the intended landing position on the target in cases where prompt refixation acts to change launch position. Accordingly, analysis of the primary saccade-to-target was restricted to the 'single-fixation' case. The only significant influence on saccade extent was the length of the target word. Saccades directed towards 9-letter targets were 5.75 characters in extent and those towards 5-letter targets 5.08 characters, F1(1,19) = 33.94, p < 0.001; F2(1,160) = 169.64, p < 0.001. Given the stimulus configuration, it follows that short words were initially fixated around the second letter and long words around the third. No other effects approached significance.
The 'means' task
I will now attempt to address the objection that the experimental procedure could have induced task-specific strategies. Since the prompt was invariably either the word 'looks' or the word 'means', it is at least possible that 'deep processing' would be discouraged. For example, the two tasks can be discriminated on the basis of a single letter in the prompt. The Stroop Effect raises doubts as to whether a same-different word-judgement task could, in principle, be completed on the basis of 'minimal inspection' of this kind (see Murray, 1982, for an extended discussion of this issue), but in any case, the objection raises as many problems as it solves. This is because it implies that the task is formally equivalent to a situation where single (target) words are processed from an 'extremely sub-optimal' viewing position.
That is, considering the prompt as simply a redundant prefix, the eyes will typically be located six or seven letters to the left of the target's Optimal Viewing Position. But the whole pattern of results runs exactly counter to the expected outcome in such a situation, with several effects, and in particular refixation probability, taking an 'inverted' form. Notwithstanding these theoretical difficulties, it is possible to examine the objection by way of an analysis of behaviour on the 'means' task⁶. The materials employed in this task were filler words, not balanced with respect to any of the key experimental manipulations of word frequency, initial trigram 'informativeness' and 'familiarity'. However, a post-hoc division of the materials is obviously possible and can be used to test the generality of the pattern of results obtained for the experimental materials (i.e., based on the 'looks' task). At the very least, this will rule out an explanation in terms of the operation of a very task-specific strategy restricted to the case where a physical match is called for. Using median splits, the items used in the 'means' task were allocated to 'High' and 'Low' values with respect to measures of word frequency and the 'informativeness' and 'familiarity' of their initial trigram. To improve power in the analyses (since the procedure could not derive materials differing widely on the critical measures) values were then collapsed across measures of familiarity (i.e., token initial trigram frequency: it will be recalled there were no significant effects attributable to this factor in the 'looks' task). Analyses were carried out on prompt gaze duration. Since there were some important differences between 'all' and 'single fixation' cases, their analyses will be dealt with separately.
6 This analysis was suggested by Keith Rayner.
In the 'all cases' analysis, the most important determinant of gaze duration was the length of the to-be-fixated target. The same 'inverted' effect evident in the experimental data (i.e., the 'looks' task) was present, with foveal gaze significantly longer in the case of short parafoveal targets (Short Target = 323 ms, Long Target = 302 ms)⁷, F1(1,19) = 11.81, p = 0.003; F2(1,184) = 24.09, p < 0.001. The effect of Target Word Frequency was not significant by items, but the direction of the obtained difference was 'inverted', with high frequency targets associated with longer prompt gaze duration (HWF = 315 ms; LWF = 308 ms; F1(1,19) = 4.35, p = 0.04; F2(1,184) = 2.34, p = 0.12). It should be borne in mind that the filler materials included short, four-letter, words, increasing the probability of parafoveal identification.
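The median-split allocation described above can be sketched in a few lines. The filler words and frequency values are made up for illustration, the function name is hypothetical, and assigning items that fall exactly on the median to 'Low' is an arbitrary choice, since the chapter does not say how ties were handled.

```python
from statistics import median

def median_split(values):
    """Label each item 'High' or 'Low' relative to the median of its measure."""
    cut = median(values.values())
    # Items exactly at the median are labelled 'Low' (an arbitrary choice).
    return {item: ("High" if v > cut else "Low") for item, v in values.items()}

# Hypothetical filler words with made-up frequencies per million.
word_frequency = {"calm": 35, "quiet": 60, "serene": 2, "peaceful": 12}
print(median_split(word_frequency))
```

The same function could be applied separately to each measure (word frequency, type trigram frequency, token trigram frequency) to obtain the post-hoc factor levels.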
In the 'single fixation' analysis (i.e., when the prompt was processed in a single extended fixation), Target Length was also highly significant (Short Target = 298 ms, Long Target = 281 ms), F1(1,19) = 9.83, p = 0.005; F2(1,184) = 13.49, p < 0.001, and the 'inverted' effect of Target Word Frequency was much stronger (HWF = 295 ms; LWF = 284; F1(1,19) = 16.39, p = 0.001; F2(1,184) = 8.93, p = 0.004). Although the Length x Target Word Frequency interaction was not significant, F1(1,19) = 3.10, p = 0.09; F2(1,184) = 2.73, p < 0.1, it is interesting to examine the effects for the two lengths separately. There was no effect of Target Word Frequency for long targets (HWF = 283 ms; LWF = 279; F < 1), and a strong effect for short targets (HWF = 307 ms; LWF = 289; F1(1,19) = 10.68, p = 0.004; F2(1,92) = 7.72, p = 0.007). To repeat, this outcome was reliable for those cases where the prompt was not itself refixated. It is entirely consistent with the proposed process-monitoring account, parafoveal processing success being associated with relatively long prompt inspection time and an early GO decision triggered where the rate of information
7 Average times were considerably slower in the 'means' task.
acquisition was slow. Further support for this position can be found in parallel breakdown analyses of the effect of Target Initial Trigram Frequency for short and long targets. There was no effect for short targets (HTF = 281 ms; LTF = 282), but a marked tendency for long targets with 'uninformative' initial trigrams to be associated with shorter prompt inspection time (HTF = 278 ms; LTF = 284; F1(1,19) = 6.00, p = 0.03; F2(1,92) = 3.21, p = 0.07).
Normal reading
The analyses of the 'means' task are encouraging, but the question still remains as to whether this pattern of results would obtain in normal reading. There is ample evidence of global parafoveal influences in reading: for example, the studies of McConkie and Rayner (1975), Rayner and Bertera (1979) and Underwood and McConkie (1985) all show a deterioration in reading performance when parafoveal information is perturbed. But it is important to determine whether the curious pattern of trade-offs found in the present laboratory task manifests itself in normal reading, if only because this would set significant constraints on models of eye movement control. As a first attempt to answer this question, an approach was made to Ralph Radach in Aachen asking whether it would be possible to carry out appropriate constrained searches of his large corpus of eye movement data. It should, in principle, be possible to discover the influence on relatively high frequency short words (the equivalent of the experimental 'prompt') of the length, word frequency and trigram constraint of the succeeding word8. The corpus of eye movement recordings was derived from four German-speaking students as they read the first two parts of Gulliver's Travels. It comprises parameters of around 50,000 fixations and saccades for each participant, together with letter and word frequency information generated from the text itself and also from the German CELEX corpus (Celex, 1995).
A more detailed description can be found in Chapter 4 and Radach (1996). Cases were identified in which a word (n) was fixated once only and its fixation duration computed as a function of the length, word frequency and trigram informativeness of word n + 1. For data to be entered into the analyses, the following conditions were true:
1. Word n was 5-8 letters in length.
2. In cases where the length of n was 3 letters, the initial landing position was on the central letter; when n was 4 letters, initial landing position was one of the central letters; when greater than 4 letters, it was on letter position 3 or 4.
3. Word n frequency was greater than 50 per million.
4. Word n did not begin a line and was not followed by punctuation.
6. Launch position into Word n was not more than 15 characters to the left.
7. The saccade out of Word n was progressive.
8. No fixation lay to the right of Word n prior to its initial fixation (i.e., the data were not associated with re-reads).
8 The naivete of this proposition became apparent in the many weeks of data analysis, largely committed to making the question tractable. The analyses which follow were carried out in collaboration with Ralph Radach and I am deeply indebted to him for his work on the question and for his many insightful comments on the data.
Since word frequency and word length are highly correlated, the frequency data were computed as quartiles for each length separately. Consider first the influence of the length of Word n + 1 on Word n fixation duration. It will be recalled that the 'looks' task showed a strong 'inverted' effect, with shorter durations prior to the fixation of a long word in parafoveal vision (273 vs. 288 ms), and this was also evident in the control, 'means', task. Table 8 shows the results of the corpus analysis, averaged over the four participants.9 Whether Word n is relatively short or of medium length the data show more or less the same effects: a pronounced reduction in fixation duration as the length of Word n + 1 increases. The size of the effect overall is remarkably similar to the experimental data (274 vs. 280 ms for words of directly comparable length). Surprisingly, there appear to be no previously published data on this parafoveal effect. The well-known direct effect of word length on foveal fixation duration (in cases where only one fixation is made) is evident in the Word n data, with fixation durations 24 ms longer on longer words. However, there is little doubt that the length of n + 1 modulates this in the opposite direction. It is true that in the natural reading task there is a possible uncontrolled contingency which should be considered in interpreting this effect. Given the correlation between length and frequency, longer n + 1 words will be, on average, of lower frequency than the words which precede them.
But for this to account for the effect it would be necessary to postulate parafoveal word identification as the norm rather than the exception. The process monitoring account proposed for the laboratory task offers a more promising explanation: when the next event is 'difficult', parafoveal processing is slower and the incentive to make an early exit from Word n greater.10 Further investigation will be needed to unravel the mechanisms at work, but the evidence clearly suggests a high degree of parallel processing. Furthermore, the question as to how well the experimental data generalise to naturalistic reading has been answered positively, at least for the case of word length.
9 Separate analyses showed the same trend in the data of all four participants.
10 The 'word group' hypothesis (Radach, 1996) suggests another possibility. If some saccades are directed to word groups, a fixation on Word n could be construed as located well to the left of the 'unified Optimal Viewing Position' (see also O'Regan, 1990), in which case a low-level oculomotor routine to re-locate to a more appropriate position might be released (Footnote 1, Chapter 4).
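The selection conditions listed above for the corpus search can be expressed as a simple filter. The sketch below is an illustration only: the record layout and every field name are hypothetical and do not describe the actual structure of the Aachen corpus. The length bounds are left as parameters because the list gives 5-8 letters for Word n, whereas the landing-position condition and Table 8 also refer to words of 3 and 4 letters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Case:
    """One first-pass encounter with Word n (all field names are hypothetical)."""
    length: int                    # letters in Word n
    freq_per_million: float        # frequency of Word n
    n_fixations: int               # fixations received by Word n
    landing_pos: int               # 1-indexed letter position of the initial fixation
    begins_line: bool
    followed_by_punctuation: bool
    launch_distance: int           # characters to the left of Word n
    exit_progressive: bool         # outgoing saccade moves rightwards
    prior_fixation_to_right: bool  # True for re-reads


def landing_ok(length: int, pos: int) -> bool:
    """Landing-position rule: central letter(s) for 3-4 letters, else position 3 or 4."""
    if length == 3:
        return pos == 2
    if length == 4:
        return pos in (2, 3)
    if length > 4:
        return pos in (3, 4)
    return False


def include(c: Case, min_len: int = 3, max_len: int = 8) -> bool:
    """True if the case passes every listed condition (length bounds are assumptions)."""
    return (
        min_len <= c.length <= max_len
        and c.n_fixations == 1
        and landing_ok(c.length, c.landing_pos)
        and c.freq_per_million > 50
        and not c.begins_line
        and not c.followed_by_punctuation
        and 0 <= c.launch_distance <= 15
        and c.exit_progressive
        and not c.prior_fixation_to_right
    )


ok = Case(6, 120.0, 1, 3, False, False, 7, True, False)
print(include(ok))                               # passes every check
print(include(replace(ok, launch_distance=20)))  # launch site too far to the left
```

Writing the conditions as one conjunction makes it easy to see how quickly the usable cases shrink: each added constraint can only remove data, which is consistent with the modest cell sizes reported in the tables.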
Table 8 Fixation duration in ms (single fixation cases) on Word n as a function of the length of Word n + 1. Data are given for two lengths of Word n and three lengths of Word n + 1. The number of cases in the corpus is shown in brackets.

                   n + 1 (4 letters)   n + 1 (5-6 letters)   n + 1 (7-10 letters)
n (3-4 letters)    268 (1022)          262 (893)             244 (447)
n (5-8 letters)    287 (788)           280 (897)             274 (519)
Table 9 Fixation duration in ms on Word n as a function of the Word Frequency and Trigram Frequency of Word n + 1. Note, words with highly frequent initial trigrams are defined as 'uninformative' and words with rare initial trigrams as 'informative'. The data for n + 1 are average length-contingent quartiles for words 5-10 letters in length. Word n was 5-8 letters in length with a frequency greater than 50 per million. Numbers of cases are shown in brackets.

                     Very high    High         Low          Very low
Word frequency       268 (755)    277 (517)    279 (665)    286 (749)
Trigram frequency    268 (686)    275 (535)    282 (707)    284 (758)
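One way to read the 'length-contingent quartiles' named in the caption of Table 9 is sketched below: cases are grouped by the length of Word n + 1, ranked by the frequency of Word n + 1 within each length, cut into four equal-sized bands, and the mean fixation duration on Word n in each band is then averaged across lengths. This is an interpretation, not a description of the procedure actually used; the rank-based cut, the tuple layout and the durations in the example are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def length_contingent_quartile_means(cases):
    """cases: iterable of (n1_length, n1_frequency, duration_on_n_ms).

    Returns four values, index 0 = lowest-frequency quartile, each the
    average over lengths of the within-length quartile mean duration.
    """
    by_length = defaultdict(list)
    for length, freq, duration in cases:
        by_length[length].append((freq, duration))

    per_quartile = [[] for _ in range(4)]
    for rows in by_length.values():
        rows.sort(key=lambda row: row[0])          # ascending frequency
        size = len(rows)
        for q in range(4):
            band = rows[q * size // 4:(q + 1) * size // 4]
            if band:                               # very small groups may leave a band empty
                per_quartile[q].append(mean(d for _, d in band))

    return [mean(values) if values else None for values in per_quartile]


# Invented data: two lengths of Word n + 1, four cases each.
cases = [
    (5, 10, 290), (5, 20, 280), (5, 30, 270), (5, 40, 260),
    (7, 1, 300), (7, 2, 290), (7, 3, 280), (7, 4, 270),
]
quartile_means = length_contingent_quartile_means(cases)
print(quartile_means)
```

Because the bands are defined separately at each length, a seven-letter word can fall in the 'very high' band with a much lower absolute frequency than a five-letter word, which is the point of making the quartiles length-contingent.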
Turning now to the question of the influence of lexical and sub-lexical properties of words in the parafovea, separate quartiles were computed for the distributions of fixation durations on Word n for each length of Word n + 1. These were then averaged to provide a measure as a function of the word frequency of n + 1 (independent of length). The data are shown in the top line of Table 9. There is a pronounced frequency effect, with fixation duration on Word n some 18 ms shorter when n + 1 is a 'very frequent' (805 per million) rather than 'very infrequent' (20 per million) word. The outcome offers an intriguing gloss on the question as to whether frequency effects can be found in the durations of the first fixation on a word. Clearly, when reading continuous text, such effects can be found even before a word is fixated at all! However, a note of caution must be sounded. Although the experimental data showed no effects of the word frequency of Word n + 1, such trends as were evident ran in the opposite direction and to that extent the outcome here cannot be taken as confirming their generality. It should be borne in mind that although separate analyses at each word length were possible, the data set was not large enough to attempt a dissociated analysis of the word frequency and initial trigram informativeness. In view of this, a more appropriate test of the generality of the experimental data relates to the obtained highly significant effects of sub-lexical
properties of the target word. The relevant corpus data are shown in the second line of Table 9. Fixation duration on Word n is clearly influenced by the redundancy of the initial letters of Word n + 1. Parafoveal words with common initial trigrams (i.e., 'uninformative' words) lead to an average 10 ms decrease in processing time on Word n, compared to the case when n + 1 is highly redundant. The contrast (284 vs. 268 ms) is similar in size and direction to the comparable overall effect in the experimental data (270 vs. 255 ms for the 'looks' task).
General discussion
It may be concluded that information acquired from a parafoveal target influences eye movement behaviour on a foveal prompt word. This is incompatible with a processing model in which attention is allocated sequentially. In the laboratory task, where word frequency and trigram informativeness could be dissociated, the analysis of prompt gaze duration shows that parafoveal target word frequency may influence foveal reinspection rate. The effect is inverted, suggesting that in three of the four cases (crossing length and word frequency) processing of a parafoveal target progresses by a combination of extended fixations and shifts in position, to take the eyes closer to the as-yet unfixated target word. The strategy is not deployed in the case of the most difficult targets, where an early decision is made to abort prompt inspection. More directly, the data support the notion that sub-lexical properties of an unfixated parafoveal target influence prompt fixation duration when the prompt is processed with a single fixation. As Inhoff and Rayner (1986) suggest, readers make use of target-initial letters. It is not, however, the token familiarity of the stimulus, but its power to constrain identification of the target which exerts an influence. Furthermore, the influence is evident before the stimulus is fixated: it is not merely cashed later as a preview benefit.
In a recent paper, Kennison and Clifton (1995) examined the reverse of the situation discussed here, namely foveal-on-parafoveal interactions. They looked at eye movements as adjective-noun pairs were read, manipulating the four combinations of high and low word frequency of the two words. It was not the primary purpose of their experiment (and the means are not reported), but they report no effect of the second word's frequency on either fixation duration or gaze duration on the first word. This does not present a serious conflict with the present results for two reasons. First, as already argued, the manipulation of word frequency alone may be inadequate to produce parafoveal-on-foveal effects. Second, launch position on the first word in normal reading is necessarily uncontrolled. As Kennison and Clifton point out, refixations almost inevitably result in a confounding between the frequency of the first word and the visibility of the second. Since refixations are less
probable in high frequency words, their effect, somewhat paradoxically in normal reading, may be to make the next word harder to process. The importance of this observation for the present paper lies in their suggestion that information gained parafoveally is not lost, but may be traded off in later, foveal, inspection. As Kennison and Clifton remark, "the extent to which boundary word ("prompt" in the terminology of the present paper) and target word frequency effects can be observed on parafoveal preview benefit will largely be determined by whether the target word is viewed in parafoveal vision from the same distance across experimental conditions." (p. 78). The data reported here point to interacting processes combining information in parallel across the perceptual span: they are difficult to reconcile with the operation of a sequential, word-by-word, process of attention allocation. The results of Henderson and Ferreira (1990) demonstrate that if a foveal word is difficult there is a processing penalty when a succeeding word is encountered. The present results put the other side of the picture: a difficult parafoveal word may act in a parallel way, with a penalty reflected in processing at the current fixation point. It would be excessive to claim that the laboratory task employed was a direct analogue of normal reading, but the corpus analysis shows more or less the same pattern of effects. Mutual processing interactions across the perceptual span are present in both cases. One crucial question remains: the degree to which prompt and target processing might be traded off. Does an extended time spent processing the prompt reflect itself directly in shorter processing on the target? If the target word has been partly or wholly identified during prompt inspection, is this reflected in a lower processing load when it is directly inspected? Models of eye movement control have been relatively silent on this question.
It cannot be answered by the simple correlation between the fixation duration on the two words because of the large spurious positive correlation which would result from massive individual variation in processing time. It is possible, however, to adopt an analysis strategy with regard to the 'looks' task data, normalising individual prompt and target fixation durations using the average total duration on prompt and target combined as a baseline. Individual fixation durations were scaled in this way, restricting the analysis to the crucial 'single fixation match' case. A trade-off clearly predicts a negative correlation, with the null hypothesis of no trade-off predicting zero correlation. The obtained outcome is a strong negative correlation, r = -0.82. The significance of this was computed directly using the method of Random Data Permutation (Edgington, 1987) and produced a value of p < 0.001.11 This outcome strongly supports the idea of the parallel processing of two successive words, with processing gains or deficits inherited when a parafoveal word becomes a foveal target.
11 With 10,000 iterations. I am grateful to John Todman for suggesting this procedure.
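The logic of the scaling and of the random data permutation test can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually run: scaling is taken to mean dividing each duration by that participant's mean prompt-plus-target total, the test is one-tailed in the negative direction, and the six trials are invented so that the raw correlation is positive while the scaled correlation is negative.

```python
import random
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scale_by_participant(trials):
    """trials: (participant, prompt_ms, target_ms) tuples.

    Assumption: each duration is divided by that participant's mean
    prompt + target total, removing between-participant speed differences.
    """
    totals = defaultdict(list)
    for who, prompt, target in trials:
        totals[who].append(prompt + target)
    baseline = {who: mean(values) for who, values in totals.items()}
    prompts = [p / baseline[w] for w, p, _ in trials]
    targets = [t / baseline[w] for w, _, t in trials]
    return prompts, targets


def permutation_p(x, y, iterations=10_000, seed=0):
    """One-tailed p: share of random re-pairings with r at or below the observed r."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)
        if pearson_r(x, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (iterations + 1)


# Invented trials: participant B is twice as slow as participant A overall.
trials = [("A", 300, 200), ("A", 250, 250), ("A", 200, 300),
          ("B", 600, 400), ("B", 500, 500), ("B", 400, 600)]
raw_r = pearson_r([p for _, p, _ in trials], [t for _, _, t in trials])
prompts, targets = scale_by_participant(trials)
scaled_r, p_value = permutation_p(prompts, targets, iterations=2_000)
print(raw_r, scaled_r, p_value)
```

In the invented data the slower participant inflates both durations, so the raw correlation comes out positive even though every trial shows a trade-off; scaling removes that individual variation and exposes the negative relationship, which is the reason given in the text for not relying on the simple correlation.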
I will conclude by trying to relate these inter-word trade-offs to parallel, intraword, effects. It is well-known that if a word is fixated exactly twice, curves describing the relationship between the initial landing position and fixation duration for each of the two fixations form mirror images. For the first of two fixations, the function is an inverted-U (with longer durations when the initial landing is near the word-centre); for the second fixation it is U-shaped. The trade-off between these two is obvious: if the initial fixation duration is at a sub-optimal position it tends to be short and followed by a fixation, of longer duration, at a better location. However, the interpretation of these data is somewhat contentious. O'Regan (1990) in an early, but comprehensive, statement of his Strategy-Tactics model of eye movement control, suggested that initial landing at a sub-optimal location triggers an immediate oculomotor 'SHIFT' response. Thus, although information may be acquired during this initial fixation, local processing (e.g., driven by lexical or sub-lexical effects) cannot determine its duration. Such influences will only come into play during subsequent fixations. Since the U-shaped and inverted U-shaped curves are found in the 'two fixation' case when the stimulus under inspection consists of no more than simple letter strings, it seems likely that such an oculomotor tactic does indeed exist and under the right conditions can operate powerfully. Pynte, Kennedy and Murray (1991), for example, showed that even when all the information needed to identify a long word lay in its final letters, a forced fixation at that point almost invariably triggered a (pointless) left-going saccade. However, the question as to whether oculomotor processes alone determine the duration of the first of two fixations is an empirical one and the bulk of recent evidence suggests otherwise. O'Regan et al. 
(1994) and Rayner (1995), for example, provide data showing that word frequency clearly influences the duration of the first of two fixations on a word. Early lexical effects of this kind, taken together with the 'mirror image' functions discussed above, appear to open up the possibility of within-word processing trade-offs in normal reading. But there is a serious difficulty with the proposition that local processing strongly influences fixation duration on-line. This arises from consideration of the very large number of cases in normal reading where words are processed in a single fixation. It is reasonable to assume that words are identified in such circumstances and consequently the relationship between duration and initial landing position should be U-shaped. But, in fact, the data show exactly the reverse pattern, with a marked inverted-U relationship (O'Regan et al., 1994). There is something strongly counter-intuitive about this outcome. Less processing time is demanded at sub-optimal viewing positions. Refixation probability, for example, is higher when initial landing is near the beginning or near the end of a word, but when no refixation occurs, fixation duration is shorter, not longer, at these locations. O'Regan et al. suggest that 'for some reason' visuomotor constraints make it harder to programme saccades from the middle of a word, but this is unconvincing in view
of the low refixation rate at this point. The data presented here point to a more interesting explanation in terms of two quite distinct processing trade-offs between successive words. Short fixation duration on a word's initial letters reflects, in part, successful prior parafoveal processing. In contrast, short fixation duration on a word's final letters reflects unsuccessful parafoveal processing of the succeeding word. If this is correct, we have moved a step closer to an explanation of the contrast between 'isolated word' laboratory studies and 'normal reading' discussed in the introduction. A model of normal reading will need to take into account a high degree of parallel processing across the perceptual span.
Acknowledgements
I am indebted to John Henderson, Simon Liversedge, Wayne Murray, Joel Pynte, Ralph Radach, Keith Rayner, Françoise Vitu, Alan Wilkes and two anonymous reviewers for many helpful comments on earlier drafts of this Chapter (although several profoundly disagree with my interpretation of the data). Some of the results reported here were presented at AMLaP-95, Edinburgh, 1995. The work was supported in part by Grant No. BMHI-CT94-1441 from the European Union under the BIOMED Programme.
References
Carpenter, P.A. and Just, M.A. (1983). What your eyes do while your mind is reading. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes. New York: Academic Press, pp. 275-305.
CELEX German database. Release D25. Computer software. Nijmegen Centre for Lexical Information, 1995.
Edgington, E.S. (1987). Randomisation Tests. New York: Marcel Dekker.
Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory and Cognition, 16, 417-429.
Henderson, J.M. and Ferreira, F. (1993). Eye movement control during reading: Fixation measures reflect foveal but not parafoveal processing difficulty. Canadian Journal of Experimental Psychology, 47, 201-221.
Inhoff, A.W. and Rayner, K. (1986). Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception and Psychophysics, 40, 431-439.
Kennison, S.M. and Clifton, C. (1995). Determinants of parafoveal preview benefit in high and low working memory capacity readers: Implications for eye movement control. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 68-81.
Kučera, H. and Francis, W.N. (1967). Computational Analysis of Present-Day American English. Providence, RI: Brown University Press.
Lima, S.D. and Inhoff, A.W. (1985). Lexical access during eye fixations in reading: Effects of word-initial letter sequence. Journal of Experimental Psychology: Human Perception and Performance, 11, 272-285.
McConkie, G.W. and Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception and Psychophysics, 17, 578-586.
McConkie, G.W., Kerr, P.W., Reddix, M.D. and Zola, D. (1988). Eye movement control during reading: I. The location of initial eye fixations on words. Vision Research, 27, 227-240.
McConkie, G.W., Kerr, P.W., Reddix, M.D., Zola, D. and Jacobs, A.M. (1989). Eye movement control during reading: II. Frequency of refixating a word. Perception and Psychophysics, 46, 245-253.
McClelland, J.L. and Mozer, M.C. (1986). Perceptual interactions in two-word displays: familiarity and similarity effects. Journal of Experimental Psychology: Human Perception and Performance, 12, 18-35.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682.
Murray, W.S. (1982). Sentence matching: The influence of meaning and structure. Unpublished Ph.D. Thesis, Monash University, Australia.
O'Regan, J.K. (1990). Eye movements and reading. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes. Oxford: Elsevier.
O'Regan, J.K., Vitu, F., Radach, R. and Kerr, P.W. (1994). Effects of local processing and oculomotor factors in eye movement guidance in reading. In: J. Ygge and G. Lennerstrand (Eds.), Eye Movements in Reading. Oxford: Elsevier.
Pollatsek, A. and Rayner, K. (1990). Eye movements and lexical access in reading. In: D.A. Balota, G.B. Flores d'Arcais and K. Rayner (Eds.), Comprehension Processes in Reading. Hillsdale: Erlbaum, pp. 143-163.
Pynte, J., Kennedy, A. and Murray, W.S. (1991). Within-word inspection strategies in continuous reading: Time course of perceptual, lexical and contextual processes. Journal of Experimental Psychology: Human Perception and Performance, 17, 458-470.
Radach, R. (1996). Blickbewegungen beim Lesen: Psychologische Aspekte der Determination von Fixationspositionen (Eye Movements in Reading). Waxmann: Münster/New York.
Rayner, K. (1978). Eye movements in reading and information processing. Psychological Bulletin, 85, 618-660.
Rayner, K. and Bertera, J.H. (1979). Reading without a fovea. Science, 206, 468-469.
Rayner, K. (1995). Eye movements and cognitive processes in reading, visual search, and scene perception. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes and Applications. Amsterdam: North-Holland, pp. 3-21.
Rayner, K. and Fischer, M.H. (1996). Mindless reading revisited: Eye movements during reading and scanning are different. Perception and Psychophysics, 58, 734-747.
Rayner, K., McConkie, G.W. and Zola, D. (1980). Integrating information across eye movements. Cognitive Psychology, 12, 206-226.
Rayner, K. and Sereno, S. (1994). Regressive eye movements and sentence parsing: on the use of regression-contingent analysis. Memory and Cognition, 22, 281-285.
Schroyens, W., Vitu, F., Brysbaert, M. and d'Ydewalle, G. (1996). Visual attention and eye-movement control during reading: The case of parafoveal processing. Technical Report No. 195, Laboratory of Experimental Psychology, University of Leuven.
Trueswell, J.C., Tanenhaus, M.K. and Kello, C. (1993). Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 528-553.
Underwood, N.R. and McConkie, G.W. (1985). Perceptual span for letter discrimination during reading. Reading Research Quarterly, 20, 153-162.
Vitu, F. (1991). The influence of parafoveal pre-processing and linguistic context on the optimal landing position effect. Perception and Psychophysics, 50, 58-75.
Vitu, F. and O'Regan, J.K. (1995). A challenge to current theories of eye movements in reading. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes and Applications. Amsterdam: North-Holland, pp. 381-392.
Vitu, F., O'Regan, J.K., Inhoff, A.W. and Topolski, R. (1995). Mindless reading: Eye movement characteristics are similar in scanning letter strings and reading texts. Perception and Psychophysics, 57, 352-364.
Appendix (See overleaf). Gaze duration on the prompt (ms), probability of refixating the prompt, saccade extent into the target (character positions), first fixation duration on the target word (ms), probability of refixating the target and target gaze duration. The data are shown for 'match' decisions in the 'looks' task as a function of properties of the target word with cases shown separately where the prompt received a single fixation (S for 'single') and where more than one fixation was made (A for 'all').
Short targets
                               HWF                         LWF
                        HTF           LTF           HTF           LTF
                     HTKF   LTKF   HTKF   LTKF   HTKF   LTKF   HTKF   LTKF
Prompt Dur (A)        280    296    277    276    293    292    298    303
Prompt Dur (S)        254    279    260    262    262    266    278    268
Prompt Refix        0.172  0.173  0.183  0.093  0.190  0.190  0.244  0.163
Sac Extent (A)       5.14   5.38   5.05   5.04   5.07   5.01   4.74   4.80
Sac Extent (S)       5.17   5.48   5.22   5.07   5.11   5.14   5.04   4.83
First Fix (A)         232    233    239    243    246    241    252    255
First Fix (S)         231    230    239    244    246    241    251    248
Right Refix (A)     0.277  0.220  0.253  0.243  0.265  0.308  0.318  0.328
Right Refix (S)     0.343  0.253  0.253  0.260  0.332  0.370  0.323  0.350
Left Refix (A)      0.000  0.031  0.029  0.037  0.070  0.035  0.073  0.018
Left Refix (S)      0.000  0.029  0.013  0.008  0.065  0.020  0.038  0.013
Gaze (A)              358    345    341    356    375    344    372    395
Gaze (S)              374    359    336    353    383    356    374    392
Long targets
                               HWF                         LWF
                        HTF           LTF           HTF           LTF
                     HTKF   LTKF   HTKF   LTKF   HTKF   LTKF   HTKF   LTKF
Prompt Dur (A)        270    282    295    111    253    254    283    279
Prompt Dur (S)        241    262    275    271    244    229    271    271
Prompt Refix        0.161  0.152  0.206  0.200  0.120  0.184  0.121  0.113
Sac Extent (A)       5.63   5.77   5.87   5.83   5.75   5.76   5.60   5.66
Sac Extent (S)       5.72   5.85   5.98   5.79   5.75   5.75   5.79   5.71
First Fix (A)         215    223    237    225    230    224    232    228
First Fix (S)         213    219    241    227    227    225    233    224
Right Refix (A)     0.573  0.533  0.472  0.457  0.502  0.493  0.504  0.472
Right Refix (S)     0.597  0.578  0.478  0.509  0.527  0.556  0.471  0.497
Left Refix (A)      0.010  0.008  0.053  0.060  0.025  0.058  0.062  0.050
Left Refix (S)      0.010  0.000  0.035  0.052  0.030  0.065  0.099  0.053
Gaze (A)              445    437    423    400    439    427    456    463
Gaze (S)              465    441    402    407    437    442    466    458
CHAPTER 8
Parafoveal Pragmatics
Wayne S. Murray
University of Dundee
Abstract
The influence of pragmatic plausibility on eye movement parameters was examined in a study in which participants were required to make same/different judgements concerning vertically aligned sentence pairs. The results reported here extend those in Murray and Rowan (1998), addressing only the reading of the initial member of each sentence pair and focusing on the questions of the immediacy of the effects of pragmatic plausibility on eye movement control and the evidence for effects of plausibility derived from 'parafoveal' words before they are directly fixated. It is argued that the effects of plausibility on first fixation duration pose a major problem for 'strong' oculomotor models of eye movement control, which deny or seek to minimise the effects of linguistic processing on eye movement control. Further, the 'parafoveal' effects provide a challenge for 'processing models', such as that proposed by Morrison (1984) and its later revisions. While these results show strong effects of pragmatic plausibility on the when decision of eye movement control, there are no clear effects on where the eyes land. There is, however, some evidence for a lack of independence between the when and where decisions.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
While we no longer see arguments suggesting that there is complete decoupling between eye movement parameters and on-going cognitive processing, this certainly does not suggest that there is unanimous agreement about the nature of the relationship. As the chapters in this volume attest, there is considerable debate surrounding the question of what various eye movement measures might reflect. That last sentence is phrased very much from the perspective of a cognitive psychologist who wishes to use eye movement measures to provide a 'window' onto on-going processing, and that, for the most part, is the perspective I will adopt here. We can look at the question, however, from another viewpoint and ask the question "what factors influence the process which determines when a saccade will be launched and where it will be targeted?" Clearly, the two questions both relate to the same underlying mechanism, but it's tempting to speculate that such differences in perspective might underlie at least some of the controversies we see in the current literature relating, at least, to eye movements in reading. From the perspective of an 'eye movement researcher', it may seem pointless or misguided to search for 'high level' influences on fixation location or duration when 'low level' factors are capable of accounting for, say, 90% or more of the variance. From the perspective of, for example, a psycholinguist, however, it may be that this 90% is not particularly interesting. It represents the 'noise' that must be dealt with while searching for high level effects. Neither perspective is necessarily right or wrong, but it seems clear that they are likely to motivate quite different approaches to the question of what needs to be explained. I will not dwell here on the details of any of the eye movement models, since these are more than adequately covered in Chapters 4, 7, 9 and 11.
For present purposes, it is sufficient to point out that there appears to be a sort of continuum in what has been suggested. At the extreme 'low level' end there are the 'oculomotor' models of, for example, O'Regan and his colleagues (e.g. O'Regan, 1992). These suggest that in reading the targeting of eye movements is (largely) controlled by oculomotor factors related to word length and spacing. O'Regan's (1992) model denies any effect of higher level factors on either the targeting or the duration of the first fixation which lands on a word1. In the 'centre ground' we find models by, for example, Morrison (1984), Henderson and Ferreira (1993) and Rayner et al. (Chapter 11). These all suggest that (at least some) high level information taken in during the This is a slight oversimplification of the model. Chapter 11 gives a more precise account. The reader should also bear in mind that later versions of this model (e.g. O'Regan et al., 1994) acknowledge some earlier lexical influences. However, the spirit of this model, and of even more moderate oculomotor positions, such as that of Radach and McConkie (Chapter 4), is to propose that it is principally oculomotor factors which drive eye movement decisions.
Parafoveal pragmatics
current fixation will influence its duration and that there may be consequences of this for the targeting of subsequent saccades. They deny, however, that either the duration of the current fixation or the targeting of a saccade out of a word will be influenced by other than low level information derived parafoveally from a subsequent word. Finally, on the 'far right' (in left-to-right scanning languages at least) there have been claims that 'higher level' characteristics of the parafoveal to-be-fixated word can influence either the targeting of the saccade launched into that word or the duration of the fixation before it is launched (for examples and details, see Chapters 7 and 9). It seems clear that, at least from the perspective of a psycholinguist, a 'strong' oculomotor model is patently false. While low level configural information related to word length undoubtedly plays a large, sometimes overwhelming, role in determining fixation location and whether or not a refixation will be launched (see, for example, the consequences of landing on the right hand end of a long word in Pynte, Kennedy and Murray, 1991), there is now a large body of evidence showing that the duration of the initial fixation falling on a word will normally be modulated by that word's frequency of occurrence (for a summary, see Chapter 11). There is also substantial evidence that even higher level factors can influence initial fixation duration. A number of studies have demonstrated clear effects of syntactic structure on first fixation duration. Critically, some of these (e.g. Rayner, Carlson and Frazier, 1983; Liversedge, 1990; and Murray and Liversedge, 1994) have shown the effect on a fixation falling on exactly the same word in exactly the same local environment. Under such circumstances it is not possible to claim that there might have been some sort of 'contamination' of the effect related to differences in low-level configural information between the conditions.
Further, as will become apparent later in this chapter, even higher level factors influence first fixation duration. This leaves us with a situation where most researchers agree that information about these sorts of higher-level properties of the currently-fixated word can be extracted within around 100-150 ms (since average fixation duration is frequently around 250 ms and saccadic programming time is generally agreed to lie somewhere between 100 and 150 ms, e.g. McConkie et al., 1985) and this information used to modulate both the decision about when the eye should move and where the next fixation should be targeted (including at least decisions about whether or not to reinspect). The next controversy, however, centres around the question of the types of information which may be extracted from words which are not being directly fixated, and what role this information might play in the decisions about when and where to move the eyes. The question is usually phrased in terms of whether 'parafoveal' information can be used to direct eye movements. I will continue to use that terminology here, but it is worth pointing out that the terms 'fovea' and 'parafovea' carry very little weight in the context of reading. There is no clear anatomical dividing line between what is usually termed the fovea and the parafovea.
W.S.Murray
More importantly, the acuity function shows a steep, but very smooth fall off throughout the entire range from close to the centre of the fovea out into the periphery. To complicate the matter further, there is a near-perfect trade-off between effective acuity and letter size, such that it is possible to identify about the same number of letters in one fixation regardless of their size and how many of them consequently fall within the 'fovea' [2]. The point is that there is nothing 'special' about letters which either do or do not fall within the fovea. It is possible to have a situation where some letters falling within the fovea are not identifiable, or where it is easy to identify letters which fall well outside. At normal reading distances, the letter which is most directly fixated will be the one most clearly registered. Every other letter will be less clear, and increasingly so the further into the periphery it falls. We are therefore talking about a perceptual continuum. Whether this maps onto a functional continuum, in the context of eye movements in reading, is an empirical question. It may be the case that once functional acuity drops below a certain level, a different process takes over, or it may be that the functional influences are graded along with the acuity. In either case, it is clear that what we should really be talking about is the question of what the functional consequences are of information which falls more than a certain number of letter spaces away from the point of fixation. We certainly shouldn't assume that there is any magic dividing line between 'foveal' and 'parafoveal' information. I will not review here the debate between Rayner and colleagues and Underwood and colleagues concerning the influence of 'parafoveal information'. As I have said, the issues are more than adequately covered elsewhere in this volume and so far as I am concerned, it is an entirely empirical question.
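The geometry behind the point about letter size and the 'fovea' can be illustrated with a short calculation. The following is only a sketch: the foveal extent of 2 degrees, the letter width and the two viewing distances are illustrative assumptions, not values taken from this chapter. It shows that doubling the viewing distance roughly doubles the number of letters falling within a fixed angular extent.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (in degrees) subtended by an object of the given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def letters_within(extent_deg, letter_width_cm, distance_cm):
    """Approximate number of letters spanning a given angular extent."""
    return extent_deg / visual_angle_deg(letter_width_cm, distance_cm)

# Illustrative assumptions (not from the chapter):
FOVEA_DEG = 2.0         # assumed angular extent of the 'fovea'
LETTER_WIDTH_CM = 0.2   # assumed width of one printed letter
NEAR_CM, FAR_CM = 30.0, 60.0

near = letters_within(FOVEA_DEG, LETTER_WIDTH_CM, NEAR_CM)
far = letters_within(FOVEA_DEG, LETTER_WIDTH_CM, FAR_CM)
print(f"letters within assumed fovea at {NEAR_CM} cm: {near:.1f}")
print(f"letters within assumed fovea at {FAR_CM} cm: {far:.1f}")
print(f"ratio: {far / near:.2f}")
```

The calculation says nothing about how many letters can actually be identified; that is the empirical trade-off between acuity and letter size described in the text.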
I can see no principled reason to argue either for or against 'parafoveal effects'. Natural cynicism might have led me to doubt their existence, but cynicism can be overtaken by data.
[2] The most compelling demonstration of this simply involves taking a page of text, such as this, steadily fixating a particular letter while holding the page close and determining how much text is in clear vision. When the page is moved to twice the distance or more, you will find that approximately the same amount of text is still in clear vision, despite the fact that now twice as many letters fall on the fovea.
At this point, it is, however, worth pointing out that the view espoused by Rayner and colleagues appears to have, perhaps not logical inconsistencies, but at least what might be termed a lack of continuity. The model, from its earliest incarnation (Morrison, 1984) right through to the latest version (Chapter 11), has consistently proposed that information in the 'parafovea' can be processed, at least to the level of word recognition. It has, however, consistently denied that the nature of this information can influence either the duration of the current fixation or the targeting of the subsequent saccade. It is only if word recognition is completed that a targeting effect will occur,
with the identified word being skipped (or briefly fixated). No information from the parafovea is allowed to influence current fixation duration and only word length information is allowed to influence targeting [3]. This is despite the fact that even when the word is not fully identified, there is still seen to be some 'saving' resulting from the parafoveal 'pre-processing'. That is, information of various types is derived from words in the periphery and is used to initiate lexical identification processes. Information from this pre-processing is integrated with information derived when the word is directly fixated and the two together will determine when the current fixation will end (through lexical identification, or 'lexical familiarity' in the current model). We therefore have a situation where parafoveal information influences fixation duration, just so long as the fixation now happens to be targeted on that particular word. If the current point of regard does not correspond to the word from which information is being derived, then the only parts of that information which can influence an eye movement relate to its length or to the fact that it has been identified. This seems possible, if a little curious, but becomes an even more 'interesting' proposal in the light of various other claims originating from this group of researchers. Not least of these is the claim by Balota (1990) that there is no such thing as the 'magic moment' of word identification. Balota views the process as the continual extraction and increasing availability of word-related information over time, and this notion is central to some of the discussions about how contextual and parafoveal orthographic information might be integrated (see, e.g., Balota, Pollatsek and Rayner, 1985). Without a 'magic moment' some other identification criterion must be determined.
As mentioned above, the current model suggests that it is not lexical identification which plays a critical role in the triggering of eye movements, but a pre-lexical stage termed 'familiarity'. This is equated to something like 'candidate generation' (see, e.g., Becker, 1976; Norris, 1986; Paap et al., 1982), but could, of course, correspond to any point in the sort of continuum envisaged by Balota. It could correspond to some sort of 'process monitoring' status, an activation criterion or any of a range of possibilities. All of these, however, are pre-lexical (in the conventional sense) and will be based on the extraction of various sorts of information (orthographic, perhaps morphological, perhaps even phonological, as suggested by Pollatsek et al., 1992) from the 'parafoveal' word.
[3] But see Rayner, Sereno and Raney (1996) for an apparent contradiction when they suggest that the identification of letters in the periphery might influence targeting (p. 1189).
It would be possible to construct an eye movement system that ignored these sources of information and only reacted to whether a particular single criterion had been reached, but it seems difficult to conclude that this must be the case when we know that in long words, at least, the execution of eye movements is strongly conditioned by sub-lexical sources of 'information'. For example, Pynte et al. (1991) found immediate effects
of the 'lexical informativeness' of the beginnings and ends of 12 letter words on first fixation duration, probability of refixation and saccade length within the word.
Pragmatic plausibility and eye movements
It has been known for some considerable time that pragmatic plausibility influences the time taken to process a sentence (e.g. Forster and Ryder, 1971; Forster and Olbrei, 1973; Forster, 1974). But while its influence is clearly implicated in the resolution of prepositional phrase attachment ambiguities (e.g. "The cop shot the man with the gun" cf. "The cop shot the man with the hat") and it has therefore been involved in eye movement studies related to this ambiguity, there has until recent times been little systematic study of the direct effects of plausibility on eye movements. Some exceptions are recent studies by Fodor et al. (1996), Traxler and Pickering (1996), Pickering and Traxler (in press) and Murray and Rowan (1998). All show 'early' effects of plausibility on eye movement parameters: the time spent inspecting words in a sentence relates directly to how plausibly they continue it. This occurs not only in situations, such as the prepositional phrase ambiguities above, where plausibility might provide a cue to the correct syntactic analysis, but also in circumstances where there is no alternative analysis (Fodor et al.; Murray and Rowan). In Murray and Rowan (1998) we argue that this reflects a word-by-word incremental sentence processing system where the derivation of sentential representations influenced by pragmatic plausibility is both early and mandatory. We suggest that the fact that there are highly localised pragmatic plausibility effects, and that they appear to be both unrelated to the nature of the task and insensitive to repetition context, poses difficulties for certain types of both modularist and interactive sentence processing architectures.
For present purposes, however, these results are interesting insofar as they comment on the question of the types of information which can influence eye movement parameters and how rapidly they do so. Before describing the nature of the results, however, it is important to explain the experimental paradigm from which they were derived. This is a procedure which should minimise any possible influence of subject strategic factors and tap early, arguably mandatory, reading processes.
Delayed same/different sentence matching
When a participant is given two sentences to read and asked to make a simple decision about whether these are physically identical or not, the time taken to make that decision is influenced by both the syntactic and the semantic form of the sentence (e.g. Forster, 1979; Murray, 1982; Forster and Stevenson, 1987). This
happens even though both items are simultaneously available for inspection while the judgement is made and although they are, as shown below, vertically aligned on successive lines. It even happens when the first member of the pair has been presented by itself for sufficient time to complete its processing and then response time is taken from the onset of the second, identical, item immediately below it. This appears to always be true for 'same' items, such as the first pair shown below, and under some circumstances, at least, for different items, such as the second.
The monkey with the red nose ate a banana.
The monkey with the red nose ate a banana.

The priest with the green teeth delivered the sermon.
The priest with the green teeth delivered one sermon.
I will not attempt to reproduce here the detailed arguments about why this occurs. For present purposes it is sufficient to note that it apparently just happens to be the case that it is faster to derive higher level representations of the sentences and make fewer comparisons than to compare the items on word-by-word or letter-by-letter bases. Thus, for example, with a five word sentence, it would be possible to compare the items word-by-word with five individual comparisons. Alternatively, one could spend a little longer deriving higher level representations of the sentences and make fewer comparisons, perhaps only one. In practice, it appears that, at least for 'same' items, where the entire string needs to be compared, the time taken for the greater number of word-by-word comparisons outweighs the extra processing time required to derive sentence level representations for the two strings. Consequently, the sentence-based comparison process wins the 'race'. Importantly, though, such a comparison process should be based on the earliest available sentential representations and participants should be inhibited from completing any 'unnecessary' processing of the sentences. The reason that higher level effects are found with this task is that these comparison processes happen to be fastest. The development of more complicated, optional, sentence representations would inevitably be slower and highly unlikely to result in a simpler comparison process. The decision participants are required to make is, after all, completely unrelated to the nature of the sentence, or even the fact that it is a sentence.
Method
Materials
Twenty-four sets of experimental items were used. Within each set two factors were systematically manipulated. The first was the plausibility of the relationship between the initial noun phrase (NP) and the verb. The second was the plausibility of the combination of the verb with the subsequent NP. There were therefore, as shown
below, four versions of each experimental item, and these contained either a plausible (P) or an implausible (I) relationship between NP1 and the verb and either a plausible or an implausible relationship between the verb and NP2. The verb remained the same in all four versions and the manipulations of plausibility involved the substitution of alternative nouns in the first and the second NPs. The alternative nouns in the different versions were of exactly the same length and were closely matched for Kucera and Francis (1967) word frequency.
PP: The hunters stacked the bricks.
PI: The hunters stacked the tulips.
IP: The bishops stacked the bricks.
II: The bishops stacked the tulips.
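The 2 x 2 structure of an item set can be made concrete with a small sketch. The function name and data layout below are hypothetical and introduced only for illustration; the example nouns and verb are those of the item shown above. The sketch builds the four versions from a plausible and an implausible noun for each NP, and checks the length-matching constraint described in the text.

```python
from itertools import product

def build_versions(np1_nouns, verb, np2_nouns):
    """Build the four versions (PP, PI, IP, II) of one experimental item.

    np1_nouns and np2_nouns map 'P' (plausible) and 'I' (implausible)
    to the noun used in that condition. Hypothetical helper for illustration.
    """
    # The alternative nouns must be of exactly the same length.
    assert len(np1_nouns["P"]) == len(np1_nouns["I"])
    assert len(np2_nouns["P"]) == len(np2_nouns["I"])
    versions = {}
    for c1, c2 in product("PI", repeat=2):
        versions[c1 + c2] = f"The {np1_nouns[c1]} {verb} the {np2_nouns[c2]}."
    return versions

item = build_versions(
    {"P": "hunters", "I": "bishops"},
    "stacked",
    {"P": "bricks", "I": "tulips"},
)
for label, sentence in item.items():
    print(f"{label}: {sentence}")
```

Because the verb is shared and the nouns are length-matched, all four versions have identical word boundaries, which is what allows the same region definitions to be applied in every condition.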
The term 'plausibility' here is used in a relative sense. It will be apparent from the examples above and the full set of items contained in the appendix that the manipulation does not rely on highly stereotyped or expected relations and that this is a relatively subtle manipulation of plausibility; far more subtle in fact than the type of manipulation used by Traxler, Pickering and other authors who have shown effects of plausibility on eye movement parameters. However, while more subtle, the manipulation nonetheless results in significant differences in rated plausibility. A group of 10 judges, who did not otherwise take part in the experiment, assessed the plausibility of the experimental items on a seven point rating scale. Mean plausibility ratings were P/P: 5.3; P/I: 3.0; I/P: 3.0; I/I: 2.0. The differences in plausibility were consistent across items, with highly significant effects of both NP1-verb plausibility, F2(1,23) = 458.37, p < 0.001, and verb-NP2 plausibility, F2(1,23) = 373.18, p < 0.001. Four counterbalanced item files were used. Each contained only one version of the members of an experimental set, together with 18 simple declarative filler sentences and 12 practice items. Half of the items in each file were matched with a second sentence which was exactly the same. The other half contained one changed word of exactly the same length as the word it replaced. The changed word occurred with roughly equal frequency in all serial positions.
Procedure
The 24 participants were informed that their task was simply to decide whether pairs of sentences were identical or not. They were asked to read the first sentence, press a button to initiate the display of the second, and then decide as rapidly and as accurately as possible whether the two were identical. If the sentences differed, they were told, this would always be because one word did not match.
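The four counterbalanced item files described under Materials can be sketched as a simple rotation of conditions across item sets. This is an assumption made for illustration: the text states only that each file contained one version of each set, not how versions were allocated, so the rotation scheme and the function name below are hypothetical.

```python
from collections import Counter

CONDITIONS = ["PP", "PI", "IP", "II"]

def build_item_files(n_sets=24, n_files=4):
    """Return files[f][s]: the condition in which set s appears in file f.

    Assumed scheme (not stated in the text): conditions are rotated so that
    each set appears once per file and in a different version in every file.
    """
    return [
        [CONDITIONS[(s + f) % len(CONDITIONS)] for s in range(n_sets)]
        for f in range(n_files)
    ]

files = build_item_files()
for f, versions in enumerate(files, start=1):
    print(f"file {f}: {dict(Counter(versions))}")
```

Under this assumed rotation every file contains six items per condition, and across the four files every item set is seen in all four versions, so condition is not confounded with particular items or particular participants' files.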
Sentences were presented in lower case (except for initial capitals) in a monopitch font on single lines of a high resolution VDU display. At the beginning of each trial, a fixation
point appeared momentarily to the left of the first word of the first sentence. This disappeared and the first sentence appeared on the screen. When this had been read, participants pressed a button. This triggered the display of the second (comparison) sentence, vertically aligned, on the line immediately below. Both sentences then remained on the screen until the participant signalled their decision by pressing either a "yes" or a "no" button. Participants' head movements were constrained by use of a dental composition bite bar and chin rest and their eye movements monitored throughout using a 'Dr Bouis' infrared pupil-centre computation device, sampled with a 12 bit A-D at 5 ms intervals. Calibration of the equipment was carried out at the beginning of the experiment and after every three sentence pairs. Data were stored for off-line analysis. The calibration and clustering algorithms employed statistical procedures to maximise resolution for each participant on each trial and provided a resolution of better than one character position (mean resolution was around 0.8 character spaces). The accuracy of calibration was also verified off-line before items were entered into the data analysis. If there was any question regarding the accuracy of the calibration, the item was deleted from the analysis. The final analysis included data from more than 95% of the trials.
Results and discussion
The overall results of the experiment are reported in Murray and Rowan (1998). In summary, they show localised effects of both NP1-verb and verb-NP2 plausibility on the reading of both the first and the second (comparison) sentence. Some aspects of the results, however, bear repetition here, since they directly relate to the controversies concerning the factors which control eye movements. Some analyses not reported in Murray and Rowan will also be presented. All of the results reported here are derived from the reading of the initial sentence in each comparison pair.
That is, they relate only to the reading of the first sentence, up to the point at which the participant presses the button to signal they have finished reading it and triggers the display of the second, comparison, item on the line below. These results therefore reflect only the participants' initial encounter with each sentence and are unrelated to either the nature of the comparison process or to the decision ('same' or 'different') which will eventually be made. Participants made a correct judgement on 93.9% of experimental trials, but since the measures reported here relate only to the reading of the first sentence, no trials were excluded from the analysis on the basis of an inaccuracy of comparison or judgement which would follow after the presentation of the second member of the pair. For analysis purposes, the sentences were considered to contain three regions: the initial NP, the verb and NP2. Fixations falling on the spaces between these regions were assigned to the region on the right.
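The region scheme just described can be sketched as follows. The function names are hypothetical and character positions are assumed to be zero-based indices into the sentence string; the only rule taken from the text is that a fixation on the space between two regions belongs to the region on its right.

```python
def region_bounds(np1, verb, np2):
    """Return (name, start, end) half-open character ranges for the three regions.

    Each region after the first starts at the space preceding it, so that
    fixations on inter-region spaces are assigned to the region on the right.
    """
    verb_start = len(np1)                   # index of the space after NP1
    np2_start = verb_start + 1 + len(verb)  # index of the space after the verb
    end = np2_start + 1 + len(np2)
    return [("NP1", 0, verb_start), ("verb", verb_start, np2_start), ("NP2", np2_start, end)]

def assign_region(char_pos, bounds):
    """Map a fixated character position to a region name (None if outside)."""
    for name, start, stop in bounds:
        if start <= char_pos < stop:
            return name
    return None

bounds = region_bounds("The hunters", "stacked", "the bricks.")
for pos in (10, 11, 18, 19):
    print(pos, assign_region(pos, bounds))
```

In this example position 10 is the final letter of "hunters" and position 11 is the following space, so the two positions fall in different regions even though they are adjacent.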
'Foveal' effects
Initial fixation durations on the verb were significantly longer when the preceding NP was an implausible subject (271 ms) than when it was a plausible subject (254 ms), F1(1,20) = 13.18, p < 0.005; F2(1,20) = 8.76, p < 0.01 [4]. Thus, for example, the first fixation on the verb "stacked" was longer when it had been preceded by the NP "The bishops", than when preceded by "The hunters". At this point in the sentence there was no effect of NP2 plausibility or interaction between NP1 and NP2 plausibility (all Fs < 1.32). However, one word (and usually one fixation) later, the duration of the first fixation in NP2 was no longer influenced by NP1-verb plausibility (F1 and F2 < 1), but was longer when the fixation fell on an NP that was an implausible object of the verb, such as "the tulips" (301 ms), than when it fell on a plausible object NP, such as "the bricks" (278 ms), F1(1,20) = 5.49, p < 0.05; F2(1,20) = 10.00, p < 0.005. There was no interaction between the effects of NP1 and NP2 plausibility. These are clear effects on first fixation duration related not to word length, or even to a 'high level' factor, like frequency (since the verb was identical in the two conditions and the words in the second NP were matched for both length and frequency). Nor can they be related to the location of the first fixation falling within either of these regions, since this did not vary with the experimental conditions (Fs < 1). It seems clear that the duration of these fixations is modulated by on-going semantic processing related to the meaning of the sentence. It is somewhat surprising that sentence meaning can exert such a rapid effect on eye movement parameters, but it should be remembered that these were all very simple sentences and would have posed little load on syntactic parsing. Semantic effects might not always manifest themselves in the eye movement record quite so rapidly.
However, the fact that they do here, and with such reliability, provides a major challenge to 'oculomotor models', or at least the subset of these which seek to minimise the effects of on-going linguistic processing on eye movement control. It is not possible to 'escape' from this challenge by suggesting that the effect might be in some way related to low-level configural information: word length was identical across all conditions and at least one of these effects occurs on an absolutely identical word in the two conditions (the verb).
[4] In this and all subsequent analyses which rely on the presence of a fixation in one or more of these regions, cases where the region was not directly fixated were treated as missing data. It does not seem sensible to consider the duration of a fixation and to include zero as a value when it did not occur, or to talk about saccade size into a region if, in fact, the saccade went somewhere else. A zero gaze duration, on the other hand, is sensible. Fortunately, region skipping was a relatively infrequent event. The highest frequency (5.7%) occurred on the verb. Verb skipping did not vary systematically with experimental condition, χ²(1) = 0.290.
The only way out would appear to be to suggest that these fixations were 'abnormally' long and that this
allowed higher level factors to exert an effect. But while fixation durations on the second NP are perhaps a little longer than average, they are not markedly so, and those falling on the verb are well within the normal range. It seems far more parsimonious to suggest that these are normal fixations which have been lengthened in the context of more demanding higher level processing. The allowable duration for a 'normal' fixation is going to become remarkably short if these are to be excluded by the definition. Clearly, 'oculomotor' theorists are right in asserting that word length and the location of a fixation within the word play a major role in determining fixation duration. However, it is apparent that a range of 'higher level' factors, such as word frequency, syntactic and semantic form, also exert potent effects. A sceptic might wish to suggest that these effects are not in fact due to higher level factors such as pragmatic plausibility, but can instead be related to word or syntactic level factors. For instance, it might be argued that the effect of NP1-verb plausibility can be laid at the door of lexical semantics, since many of the 'implausible' NPs contained non-human agents. This, however, is to misunderstand the nature of the effects. There is nothing whatsoever about non-human nouns that restricts their taking an agent role. It is only when they are combined with particular verbs that the agentive role becomes less plausible. There is nothing to suggest that non-human nouns are in any way intrinsically less identifiable than frequency-matched human nouns, and certainly nothing to suggest that such an effect carries over to the following word.
But, in any case, such an argument becomes considerably less plausible in the context of reliable effects of NP2 plausibility clearly unrelated to lexical-semantic properties of the nouns, and the observation that, despite the split of implausible agents into 13 human and 11 non-human, the effects on rated plausibility and on fixation duration clearly generalise well across the items tested. I will return to this issue further, below, when discussing 'parafoveal' effects. For the moment, however, it is worth noting that the effects also cannot, of course, relate to syntactic processing differences other than those which arise as a consequence of the perceived plausibility of a particular syntactic analysis. The possible structural analyses are identical across the members of each matched pair.
'Parafoveal' effects
One intriguing result reported by Murray and Rowan is that not only did the plausibility of the combination of the first NP with the verb influence inspection time on the verb, it also influenced first pass reading time on the NP before the verb had been directly fixated. That is, readers were sensitive to how the currently fixated information would combine with that provided by a word which had not yet been inspected. In order to determine whether this was in fact a 'parafoveal' effect,
Table 1. Effects of boundary position on First Pass reading time and Last Fixation Duration in the initial noun phrase (ms)

Boundary               0     -2     -3     -4     -5
First pass NP1
  Plausible          482    439    414    369    315
  Implausible        512    457    423    364    316
  Difference          30     18     11     -5      1
Last fixation NP1
  Plausible          248    247    243    238    228
  Implausible        279    271    260    244    229
  Difference          31     24     17      6      1
Murray and Rowan set about localising it within the initial NP. Clearly, if the effect was distributed throughout the majority of fixations making up the first pass reading time in the zone, it is unlikely that it could be attributed to parafoveal preview of the verb. In the event, they discovered that it was attributable only to the duration of the last fixation falling in the initial NP. The duration of this fixation increased significantly from 248 to 279 ms as a consequence of the plausibility of the following verb, F1(1,20) = 35.73, p < 0.001; F2(1,20) = 10.90, p < 0.005. They further argued that this is clearly a parafoveal effect, since moving the analysis boundary between NP1 and the verb three character spaces to the left resulted in no remaining effect of NP1-verb plausibility on first pass times in NP1, F1 = 0.41; F2 = 0.46, but a continuing effect on first fixation durations to the right of the new boundary, F1(1,20) = 17.27, p < 0.001; F2(1,20) = 8.88, p < 0.01. While the above results appear to present a pretty convincing case for a 'parafoveal' effect related to pragmatic plausibility, we can consider the question in a more systematic way by examining the effects of moving the NP1-verb 'boundary' through a range of values. Table 1 shows the effect of moving this boundary on both the first pass reading time and the duration of the last fixation in the initial NP. A 'zero' boundary position falls immediately after the noun, with fixations on the subsequent space assigned to the following verb. Boundary positions with negative values are to the left of this: -2 is two characters further left, with the last two letters
Table 2. Effects of boundary position on First Fixation Duration in the second (verb) zone (ms)

Boundary                  0     -2     -3     -4     -5
First fixation duration
  Plausible             254    252    250    247    247
  Implausible           271    268    271    271    272
  Difference             17     16     21     24     25
of the noun now not counted as part of the initial zone; -3 is three characters to the left etc. It is clear from these results that the plausibility effects on both first pass reading time and last fixation duration systematically diminish as the boundary is moved further leftward. As mentioned above, the first pass effect is not significant with a boundary moved three characters to the left, but the last fixation duration effect does survive this move, F1(1,20) = 8.49, p < 0.01; F2(1,20) = 5.08, p < 0.05. However, when the boundary is moved one more character to the left, to -4, neither the first pass nor the last fixation duration effect is sustained (F1 = 0.17, F2 = 0.40 and F1 = 1.55, F2 = 0.94, respectively). And, clearly, with a boundary five characters to the left, there is no remaining trace of either effect. It is also apparent from Table 1 that the magnitude of the first pass effect in the initial NP rests entirely on the duration of the last fixation in this zone. Both diminish equivalently as the boundary is moved leftwards. The same, however, is not true of the duration of the first fixation which falls to the right of the boundary, on, or slightly to the left of, the beginning of the verb. These first fixation durations are shown in Table 2. It is readily apparent that as the boundary moves up to five characters to the left, the plausibility effect on this fixation does not diminish. In fact, it is remarkably constant and does not vary significantly with boundary position, F1(4,80) = 1.14, p > 0.3; F2(4,80) = 0.78. It appears, therefore, that fixations falling more than four or five character spaces to the left of the verb are not influenced by its plausibility, but that fixations which fall either on the verb or on the last few characters of the preceding word are influenced to an equivalent extent by the plausibility of the verb.
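The logic of the moving-boundary analysis can be sketched in a few lines. This is a simplified illustration rather than the analysis software actually used: the function name, the zero-based character positions and the fixation record in the example are all hypothetical, and first pass is approximated as the run of fixations preceding the first fixation at or beyond the boundary.

```python
def measures_at_boundary(fixations, noun_end, offset=0):
    """Compute zone measures for one trial at a given boundary position.

    fixations: list of (char_pos, duration_ms) in temporal order.
    noun_end:  index of the space immediately after the noun (boundary 0).
    offset:    0, -2, -3, ... moves the boundary that many characters left.
    A measure is None (missing data) when no fixation qualifies.
    """
    boundary = noun_end + offset
    first_pass = []
    first_fix_right = None
    for pos, dur in fixations:
        if pos >= boundary:          # first fixation at or beyond the boundary
            first_fix_right = dur
            break
        first_pass.append(dur)
    return {
        "first_pass_np1": sum(first_pass),
        "last_fix_np1": first_pass[-1] if first_pass else None,
        "first_fix_right": first_fix_right,
    }

# Hypothetical trial on "The hunters stacked ...": the noun ends at index 10.
trial = [(3, 220), (9, 260), (14, 250)]
at_zero = measures_at_boundary(trial, noun_end=11, offset=0)
at_minus3 = measures_at_boundary(trial, noun_end=11, offset=-3)
print(at_zero)
print(at_minus3)
```

In the hypothetical trial, the fixation on the ninth character position counts as the last fixation in NP1 with the boundary at zero, but becomes the first fixation to the right of the boundary once it is moved three characters leftward. This reassignment is what allows the analysis to separate fixations on the final letters of the noun from those falling earlier in the phrase.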
Clearly, this suggests that the verb is being identified and processed not only when it is directly fixated, but also when a fixation falls in the latter part of the preceding word. It is not, however, the case that these 'parafoveal' effects are critically related to a situation in which the verb will be skipped over and not directly fixated. The verb
was directly fixated on 94% of occasions, and an analysis of last fixation duration in NP1, limited to cases where the next fixation falls directly on the verb, continues to show a robust effect of NP1-verb plausibility with mean durations of 250 ms and 274 ms for plausible and implausible items respectively, F1(1,20) = 23.70, p < 0.001; F2(1,20) = 8.02, p < 0.01. Contrary to the claim by Henderson and Ferreira (1993), it appears that fixation measures can reflect the processing difficulty of a word in 'parafoveal vision'. It seems to have been over-simplistic to conclude that it is only the processing of a directly-fixated word which can influence the decision about when to move the eyes.
Finally, if we consider whether these 'parafoveal' effects can instead be attributed to some property of the initial noun, it is apparent that this is extremely unlikely. In general terms, it seems clear that any property of these nouns which had a direct effect on fixation duration would be expected to influence the word's inspection pretty well regardless of the exact location of the fixation. It certainly would not be anticipated that lexical properties would influence fixation duration only for those instances where the fixation fell on the last few characters of the word (recall that it is only these fixations which show any indication of a difference in duration). However, despite the implausibility of the argument, we can nevertheless consider whether there is any lexical property differing in the plausible and implausible conditions which might have such an effect. Clearly, it cannot be length or frequency, since these were matched. There are no systematic morphological differences between the nouns. The only possibility would appear to be the aforementioned difference in the number of non-human entities occurring in the two conditions.
Setting aside, for the moment, both the fact that the participant's task was completely unrelated to meaning (so that strategies based on meaning are very unlikely) and the reliability of the effects shown across the entire set of items tested, we can investigate this possibility directly by completing an analysis involving only the 13 item pairs containing human agents in both conditions. This analysis in fact shows a significant and numerically larger effect of plausibility on last fixation duration: mean fixation duration for plausible nouns was 248 ms and for implausible, 290 ms, F2(1,12) = 10.44, p < 0.01. The conclusion is obvious: the effect does not depend on the presence of non-human entities. Nor, of course, can it be the case that the effect is related to low level orthographic factors such as trigram frequency at the beginning of the 'parafoveal' word (see, e.g., Chapter 7), since this word (the verb) was identical in the two conditions. There therefore appears to be no property of the words involved, other than their combinatorial plausibility, which could give rise to the effects on fixation duration.
If we turn now to the where decision, it is clear, as previously mentioned, that with this manipulation at least, there was no effect on landing position on the verb or in the final noun phrase. Pragmatic plausibility did not significantly influence where the eyes landed in either of these regions. There was, however, an intriguing trend in
the saccade length measure for the saccade which left the initial NP and landed on the verb. When the verb was plausible, the average length of this saccade was 8.37 characters; when the verb was implausible, the length of the saccade was 8.66 characters. This difference, however, failed to achieve statistical significance, F1(1,20) = 1.96, p = 0.17; F2(1,20) = 1.84, p = 0.19. It is interesting, nonetheless, that it is in the direction of larger saccades following longer final fixations in NP1. To determine whether there was in fact a relationship between the duration of this fixation (which it will be recalled was influenced by the plausibility of the following verb) and the saccade which was launched from it, a correlational analysis was performed. This involved correlating duration of the fixation and saccade size for the full participants-by-items data matrix. To avoid introducing inter-subject variability into the analysis, all fixation durations and saccade sizes were 'normalised' by subtracting that participant's mean on each of the measures. The resulting analysis showed a moderate, but significant, positive correlation between duration and extent, r = 0.112, F(1,460) = 5.80, p = 0.016. While this correlation does not account for a large percentage of the variance, it seems that the when and where decisions of eye movement control are not completely independent. There appears to be some tendency to 'trade-off' current inspection time against saccade size. Clearly, this result too suggests that the uptake of 'parafoveal' information is being monitored by the system which is responsible for launching the next saccade.
General discussion
These data (in concert with the results from many other studies) do not support 'strong' oculomotor theories, which deny or minimise the direct effect of linguistic variables on the current fixation.
Not only does it appear to be possible to extract information about word identity within the necessary 100-150 ms needed to influence fixation duration, but the consequences of this word identity for both syntactic and semantic processing can be evaluated and, at least under some circumstances, influence the duration of the fixation. The coupling between eye movements and on-going cognitive processes appears to be even tighter than many would have envisaged even a few years ago. There has undoubtedly been a 'softening' in the views of both oculomotor and 'processing model' theorists in recent years, with a convergence towards a middle ground where it is acknowledged that while oculomotor factors exert a major influence, there are at least some early effects of linguistic factors on eye movement control (e.g. O'Regan et al., 1994). Nonetheless, the speed, magnitude and nature of the higher-level effects reported here seem likely to reach beyond the bounds of what many oculomotor theorists would have been prepared to predict. It appears, however, that these effects are not susceptible to counter-explanation in terms of lower level factors.
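The participant-centred correlation between final fixation duration and outgoing saccade length reported in the Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the function names and the example rows are hypothetical, and the conversion from r to F uses the standard identity F(1, df) = r²·df / (1 − r²).

```python
from collections import defaultdict
from math import sqrt


def centre_by_participant(rows):
    """rows: iterable of (participant, duration_ms, saccade_chars).

    Subtracts each participant's own mean from both measures, so that
    differences between participants do not contribute to the correlation.
    """
    by_participant = defaultdict(list)
    for pid, duration, saccade in rows:
        by_participant[pid].append((duration, saccade))
    centred = []
    for observations in by_participant.values():
        mean_d = sum(d for d, _ in observations) / len(observations)
        mean_s = sum(s for _, s in observations) / len(observations)
        centred.extend((d - mean_d, s - mean_s) for d, s in observations)
    return centred


def pearson_r(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


def f_from_r(r, df_error):
    """F(1, df_error) statistic equivalent to a correlation r."""
    return (r * r * df_error) / (1 - r * r)


# Hypothetical data: two participants, two trials each.
rows = [("p1", 200, 6), ("p1", 300, 10), ("p2", 400, 7), ("p2", 500, 9)]
print(round(pearson_r(centre_by_participant(rows)), 3))
print(round(f_from_r(0.112, 460), 2))
```

Applying `f_from_r` to the rounded value r = 0.112 with 460 error degrees of freedom gives a value close to the reported F of 5.80; the small discrepancy is what rounding of r would produce.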
The findings from this study are especially significant since the task employed is one which would be expected to, and has been shown to (e.g. Crain and Fodor, 1987; Forster and Stevenson, 1987; Freedman and Forster, 1985), engage only the most basic and fundamental reading processes. It is not the case that participants in the study needed to understand what the sentence was about, or remember its content, in order to complete the task. The results reported here relate only to their initial scanning of the first sentence presented by itself on the screen, but immediately following this, both this sentence and the comparison item were freely available for inspection as participants made their decision. Under these circumstances there can be little argument that the results are somehow a consequence of 'experiment-specific strategies'. Rather, it appears that even high level information about sentence meaning directly, and very rapidly, influences basic reading dynamics.
While these data clearly support what Rayner et al. refer to as 'processing models', there are aspects of the results which do not accord with some of the details proposed by Morrison (1984) and later variants of this model. Specifically, it appears to be over-simplistic to conclude that there is any necessary functional distinction between information picked up from a word when it falls directly under the point of fixation and when it does not. There was no evidence, in this study at least, of any fundamental distinction between information about a word picked up by direct inspection, as compared to when the point of regard fell up to five character spaces to the left. It would be unwise to assume that information related to word identity can always be ascertained from such a distance. It may be the case that the nature of the task imposed a relatively light 'foveal' processing load (Henderson and Ferreira, 1993) and that this enabled greater uptake of 'parafoveal' information.
But, where this information can be obtained, it appears to have immediate consequences. The span of apprehension is clearly not limited by word boundaries, and it seems that any information fed into the system during a fixation is, in principle, capable of influencing eye movement decisions.
Acknowledgements
I would like to acknowledge the many helpful comments made on this work and on an earlier version of the chapter by Marc Brysbaert, Chuck Clifton, Ralph Radach, and two anonymous reviewers. The work was supported in part by Grant No BMHI-CT94-1441 from the European Union under the BIOMED Programme.
References
Balota, D.A. (1990). The role of meaning in word recognition. In: D.A. Balota, G.B. Flores d'Arcais and K. Rayner (Eds.), Comprehension Processes in Reading. Erlbaum.
Balota, D.A., Pollatsek, A. and Rayner, K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17, 364-390.
Becker, C.A. (1976). Allocation of attention during visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 2, 555-566.
Crain, S. and Fodor, J.D. (1987). Sentence matching and overgeneration. Cognition, 26, 123-170.
Fodor, J.D., Ni, W., Crain, S. and Shankweiler, D. (1996). Tasks and timing in the perception of linguistic anomaly. Journal of Psycholinguistic Research, 25, 25-57.
Forster, K.I. (1974). The role of semantic hypotheses in sentence processing. In: F. Bresson and J. Mehler (Eds.), Current Problems in Psycholinguistics. Paris: Editions du CNRS.
Forster, K.I. (1979). Levels of processing and the structure of the language processor. In: W.E. Cooper and E.C.T. Walker (Eds.), Sentence Processing: Psycholinguistic studies presented to Merrill Garrett. Hillsdale, NJ: Erlbaum, pp. 27-81.
Forster, K.I. and Olbrei, I. (1973). Semantic heuristics and syntactic analysis. Cognition, 2, 319-347.
Forster, K.I. and Ryder, L.A. (1971). Perceiving the structure and meaning of sentences. Journal of Verbal Learning and Verbal Behavior, 10, 285-296.
Forster, K.I. and Stevenson, B.J. (1987). Sentence matching and well-formedness. Cognition, 26, 171-186.
Freedman, S.E. and Forster, K.I. (1985). The psychological status of overgenerated sentences. Cognition, 19, 101-132.
Henderson, J.M. and Ferreira, F. (1993). Eye movement control during reading: Fixation measures foveal but not parafoveal processing difficulty. Canadian Journal of Experimental Psychology, 47, 201-221.
Kucera, H. and Francis, W.N. (1967). Computational Analysis of Present-Day American English. Providence, RI: Brown University Press.
Liversedge, S.P. (1990). Does semantics influence syntax? Unpublished honours dissertation, University of Dundee.
McConkie, G.W., Underwood, N.R., Zola, D. and Wolverton, G.S. (1985). Some temporal characteristics of processing during reading. Journal of Experimental Psychology: Human Perception and Performance, 11, 168-186.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682.
Murray, W.S. (1982). Sentence matching: The influence of meaning and structure. Unpublished doctoral dissertation, Monash University.
Murray, W.S. and Liversedge, S.P. (1994). Referential context effects on syntactic processing. In: C. Clifton, L. Frazier and K. Rayner (Eds.), Perspectives on Sentence Processing. Hillsdale, NJ: Erlbaum, pp. 359-388.
Murray, W.S. and Rowan, M. (1998). Early, mandatory pragmatic processing. Journal of Psycholinguistic Research, Special Issue, 21, 1-22.
Norris, D.G. (1986). Word recognition: Context effects without priming. Cognition, 22, 93-136.
O'Regan, J.K. (1992). Optimal viewing position in words and the strategy-tactics theory of eye movements in reading. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer Verlag, pp. 333-354.
O'Regan, J.K., Vitu, F., Radach, R. and Kerr, P.W. (1994). Effects of local processing and oculomotor factors in eye movement guidance in reading. In: J. Ygge and G. Lennerstrand (Eds.), Eye Movements in Reading. Oxford: Pergamon, pp. 329-348.
Paap, K.R., Newsome, S., McDonald, J.E. and Schvaneveldt, R.W. (1982). An activation-verification model for letter and word recognition: The word superiority effect. Psychological Review, 89, 573-594.
Pickering, M.J. and Traxler, M.J. (1998). Plausibility and recovery from garden paths: An eye-tracking study. Journal of Experimental Psychology: Learning, Memory and Cognition, in press.
Pollatsek, A., Lesch, M., Morris, R.K. and Rayner, K. (1992). Phonological codes are used in integrating information across saccades in word identification and reading. Journal of Experimental Psychology: Human Perception and Performance, 18, 148-162.
Pynte, J., Kennedy, A. and Murray, W.S. (1991). Within-word inspection strategies in continuous reading: The time-course of perceptual, lexical and contextual processes. Journal of Experimental Psychology: Human Perception and Performance, 17, 458-470.
Rayner, K., Carlson, M. and Frazier, L. (1983). The interaction of syntax and semantics during sentence processing: Eye movements in the analysis of semantically biased sentences. Journal of Verbal Learning and Verbal Behavior, 22, 358-374.
Rayner, K., Sereno, S.C. and Raney, G.E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22, 1188-1200.
Traxler, M.J. and Pickering, M.J. (1996). Plausibility and the processing of unbounded dependencies: An eye-tracking study. Journal of Memory and Language, 35, 454-475.
Appendix: Experimental materials
The savages/uranium smacked the child/money
The carpenter/ambulance cleared his throat/finger
The fugitive/pedigree groomed the horses/forest
The constable/islanders answered the telephone/orchestra
The tutor/trout delivered his sermon/bubble
The robbers/newborn rehearsed the part/meal
The hostess/charity weighed the carcass/sleeves
The lecturer/princess delivered the packages/wardrobe
The burglars/doorbell amused the crowd/grass
The comedians/lyricists picked the lock/lift
The labourer/bacteria stalked the tiger/algae
The housewife/alligator loaded the rifle/chair
The scientist/guerrilla arrested the criminals/ambulance
The policeman/therapist tested the chemicals/saxophone
The tailor/Libyan took the tickets/scenery
The bishop/knight carried the luggage/buttons
The guard/saint cooked the meal/beer
The hunters/bishops stacked the bricks/tulips
The servant/lawyers locked the gates/drums
The porter/rebels heard the organ/dolls
The vicar/beast corrected his pupil/giant
The soldiers/treasury were tipped by the customers/musicians
The politician/expedition wore the costumes/cylinder
The butcher/witches welcomed the guests/horses
CHAPTER 9
Foveal Processing Load and Landing Position Effects in Reading
Simon P. Liversedge, University of Durham
and
Geoffrey Underwood, University of Nottingham
Abstract
In this chapter we describe two eye tracking experiments which investigated whether orthographic information may be extracted from the parafovea and used to guide the eye towards infrequent letter strings at the beginning of seven-letter target words. We also investigated whether the ease with which the preceding word was processed influenced the degree to which the point of fixation was attracted to an infrequent letter string. In the first experiment we manipulated whether a category word prior to the target word referred to an antecedent noun phrase which was either a typical or an atypical instance of that category. In the second experiment we used a possessive pronoun to refer to an antecedent noun phrase with a stereotypical gender which was either congruous or incongruous with the gender of the pronoun. Reading times during the first pass were not influenced by the two manipulations of foveal load. Additionally, there was no effect of the frequency of the initial trigram on where within the target word a reader initially fixated. However, the foveal load manipulations did cause differences on other eye movement measures, indicating that the manipulations were not entirely ineffective. Exploratory analyses in Experiment Two suggested that if subjects experience light foveal processing load, orthographic information in the parafovea may influence their landing position on the following word.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
Much of our insight into the processes involved in reading has come from studies which have used eye movement recording techniques. Such techniques provide a relatively natural, non-intrusive, 'on-line' method for studying reading, and allow inferences to be made concerning the perceptual and cognitive processes involved in reading (Rayner and Pollatsek, 1989; Rayner et al., 1989; see also Chapter 3). In recent years much progress has been made in understanding how the eyes are controlled as we read. We now know that there are a large number of characteristics of text, both linguistic and visual, that influence where and for how long a reader spends fixating a portion of text as they read. More importantly, a number of these factors have been incorporated into models of eye movement control (e.g. Morrison, 1984; and more recently Reichle, Pollatsek, Fisher and Rayner, 1998) and these provide quite a good account of eye movement behaviour during reading.
Importantly, however, these models remain underspecified in two areas. First, they make no predictions concerning regressive saccades required for re-reading text that has already been processed. Such re-reading often occurs due to a failure in higher order processing, for example, a syntactic misanalysis. Upon detection of such a misanalysis, ongoing eye movement control processes are interrupted in order that the sentence may be re-read to permit the reader to reanalyse it. However, to date, such reanalysis procedures are not sufficiently well understood for them to be incorporated into a model of eye movement control. The second area of underspecification concerns the exact location of a fixation within a word. The models provide no account of where a reader will fixate within a word. Again, this is probably because the factors which influence where a reader fixates within text are not yet sufficiently well understood for them to be incorporated into a model of eye movement control during reading.
In this chapter we will describe two experiments we conducted to investigate the factors that influence where a reader's point of fixation will land on a word. Although this issue is not yet fully understood, a substantial amount of research has been conducted within this area. Indeed, a considerable number of experiments have been conducted to investigate whether a reader will fixate a string of letters within a word possessing certain orthographic or morphemic properties (Everatt and Underwood, 1992; Hyönä, 1995; Hyönä, Niemi and Underwood, 1989; Hyönä and Pollatsek, 1997; Inhoff, Brühl and Schwartz, 1996; Rayner and Morris, 1992; Underwood, Clews and Everatt, 1990; Underwood, Hyönä and Niemi, 1987). These studies have produced a number of conflicting findings. Some researchers have obtained results indicating that the point of fixation is drawn towards strings of letters within a word, while others have not. However, there is now a growing body of evidence to suggest that under certain circumstances certain orthographic and morphemic characteristics of a word do influence where the point of fixation will
land on that word. The issue now seems to be one of identifying the exact orthographic and morphemic characteristics along with the precise conditions which cause such effects to occur.
Basic findings and terminology
When discussing experiments investigating landing positions, it is important to distinguish between two qualitatively different types of processing: foveal and parafoveal processing. The fovea is that part of the retina that extends approximately 2° across the fixation point. Beyond the fovea, the parafovea extends a further 8° and the area which is beyond this is termed the periphery of vision (cf. Balota and Rayner, 1991). The eye movements we make during reading allow text to be viewed with the area of the retina where visual acuity is sharpest. The terms foveal and parafoveal processing are often used in the context of landing position experiments and in such circumstances psychologists term the fixated word the foveal word and the words to the right of the fixated word the parafoveal words. In this chapter we will adhere to this use of the terminology.
The extraction of orthographic information from parafoveal words
A series of studies by Underwood, Hyönä and their colleagues (Everatt and Underwood, 1992; Hyönä, 1995; Hyönä, Niemi and Underwood, 1989; Underwood, Clews and Everatt, 1990; Underwood, Hyönä and Niemi, 1987) have investigated whether there is a sensitivity to the orthographic frequency of letter strings within parafoveal words (i.e., the number of words which appear in the norms which are of a particular length and have a particular initial trigram). They have also investigated whether such orthographic information affects where the eye fixates on those words. Underwood, Hyönä and Niemi (1987) found that readers were more likely to fixate an infrequently occurring letter string than a frequently occurring letter string at the beginning of a word. They termed this effect the landing position effect.
More recently, Hyönä (1995) has suggested that these results may be explained in terms of an Attraction hypothesis. Hyönä suggests that unusual (i.e., infrequent) letter strings attract the point of fixation as people read because they represent a less familiar stimulus in the parafovea. Hyönä provided evidence in favour of this claim by showing that a highly infrequent letter cluster at the beginning of a ten or eleven character word caused people to land close to the beginning of the word. However, Underwood, Bloomfield and Clews (1988) found that infrequent letter clusters did not attract the point of fixation. Furthermore, studies by Underwood and his colleagues which found a landing position effect have come under criticism from
Rayner and Morris (1992). Rayner and Morris argued that such higher level processing of parafoveal words would require complex processing and decision processes inconsistent with the time constraints typical in normal reading. They further argued that the eyetracking system used by Underwood and his colleagues was unreliable considering the small size of the effect obtained. They attempted to replicate the findings of Underwood, Clews and Everatt (1990) using the same stimuli, but found no difference in the landing position on the target word as a function of word type. Rayner and Morris concluded that "low level visual information (primarily word length) is the primary determinant of the initial landing position on a word in reading" (p. 170).
Underwood, Clews and Everatt (1990) acknowledged that the landing position effect is small and has emerged only as a trend in some previous experiments (Underwood, Bloomfield and Clews, 1989; Hyönä, Niemi and Underwood, 1989, Experiment 3). They suggested that it is possible that readers do not always make use of orthographic information from parafoveal vision in the guidance of their eyes and that the landing position may be determined by parafoveal processing in some cases, and by rightwards movement insensitive to orthographic information in others.
A factor that has been shown to influence the amount of preview benefit gained from a parafoveal word is foveal processing difficulty (Henderson and Ferreira, 1990). Henderson and Ferreira presented subjects with sentences and independently manipulated both the difficulty of the foveal word and the availability of parafoveal information through the use of the boundary technique (Rayner, 1975). In studies employing the boundary technique, as the point of fixation passes an invisible boundary in the text, the linguistic information to the right of the boundary is changed in some way.
In Henderson and Ferreira's study, the preview of the parafoveal word was either visually similar or dissimilar to the word that replaced it when the boundary was transgressed. Foveal difficulty was manipulated lexically as a function of the frequency of the foveal word. They found that when the foveal word had a high frequency, a parafoveal preview of the next word was more beneficial than when the foveal word had a low frequency. In a second experiment they manipulated the syntactic difficulty of the text prior to the parafoveal word and again demonstrated a reduced parafoveal preview benefit when foveal processing load was high. They concluded that the perceptual span, that is, the area of effective vision during reading, is variable and attentionally constrained, being shorter when foveal processing load is high and longer when foveal processing load is low.
Henderson and Ferreira (1990, 1993; see also Rayner, 1986, and Reichle et al., 1997) proposed a model, the sequential attention model, explaining the relation between covert visuo-spatial attention and eye-movement control, based on Morrison's (1984) model. The model proposed that at the beginning of a fixation on a new word, attention is allocated to that word. When processing on that word is complete, attention is redirected to a new word allowing a higher level analysis of that word.
This relocation of attention is the signal to move the eyes, and an eye-movement is programmed, taking as its new location the word to which attention is directed. A saccade then brings the eyes to the attended word but this follows an eye-movement programming latency. Therefore, there is a preview benefit derived from the parafoveal word as a function of the latency between the shift of attention and the saccade to that word. To account for their finding that foveal processing difficulty decreased parafoveal preview benefit, Henderson and Ferreira added a programming deadline assumption to the model. If processing of the fixated word is difficult then a programming deadline may be reached before attention has shifted to the next word. In this case an eye-movement is programmed prior to the shift of attention. If processing on the fixated word is subsequently finished, attention shifts to the next word, but the latency between this shift and the next eye-movement will be reduced, thus reducing attentive parafoveal processing and so the preview benefit. This model suggests that if foveal processing load is high, then the ability to detect orthographically infrequent strings of letters in parafoveal words will be reduced. This in turn suggests that the landing position effect should be more apparent in situations where foveal processing load is low than when it is high. In our studies we attempted to determine two things. First, whether the point of fixation was attracted to an infrequent letter string at the beginning of a word. We therefore used target words with either an initial trigram which occurred frequently for seven letter words, or an initial trigram which occurred relatively infrequently for such words. Secondly, whether sensitivity to orthographic information within the parafovea is modulated by foveal processing difficulty.
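The programming-deadline account summarised above can be made concrete with a toy calculation. The sketch below is a simplification under stated assumptions, not Henderson and Ferreira's own formulation: it treats foveal processing time, the deadline and the saccade programming latency as fixed numbers, and the function name, parameter names and millisecond values are all hypothetical.

```python
def preview_time(processing_ms, deadline_ms, programming_ms):
    """Time (ms) that attention rests on the parafoveal word before the saccade.

    processing_ms:  time needed to finish processing the fixated word
    deadline_ms:    programming deadline measured from fixation onset
    programming_ms: latency between starting a saccade programme and the saccade
    """
    if processing_ms <= deadline_ms:
        # Low foveal load: the attention shift itself triggers programming,
        # so the preview lasts for the whole programming latency.
        return programming_ms
    # High foveal load: programming starts at the deadline, before attention
    # has shifted, so the preview is whatever time remains once processing ends.
    saccade_onset = deadline_ms + programming_ms
    return max(0, saccade_onset - processing_ms)


# Hypothetical values: a 250 ms deadline and a 150 ms programming latency.
for processing in (150, 320, 450):
    print(processing, preview_time(processing, 250, 150))
```

With these illustrative numbers the preview shrinks from 150 ms to 80 ms and then to nothing as foveal processing time increases, which is the pattern that leads to the prediction of a weaker landing position effect under high foveal load.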
Experiment One: Imposing a foveal load with category typicality
In Experiment One we required a means of manipulating processing load on the word prior to a target word with a frequent/infrequent initial trigram. There are many factors which have been demonstrated to influence foveal processing difficulty. However, for this study the means of inducing such difficulty must possess two important characteristics. First, foveal processing difficulty must be induced by a small localised area of the sentence (preferably a word). Secondly, whilst the tool must induce sufficient processing difficulty to potentially influence parafoveal processing, it must not cause subjects to make a regressive saccade (otherwise it will be impossible to observe the influence of attentional demand on the position of the subsequent fixation on the word to the right). In addition, it is desirable to ensure that the sentence fragment immediately preceding the target word has similar content under each of the four conditions to ensure that the characteristics of the target word could be the only cause of any landing position effect which might occur.
One phenomenon which induces foveal processing difficulty and which has been shown to be localised to a word, is the process of anaphoric assignment (Garrod and Sanford, 1977; Duffy and Rayner, 1990). In a whole-sentence self-paced reading study Garrod and Sanford (1977) presented pairs of sentences to subjects. The first sentence contained an instance of a category (e.g., robin) and the second sentence contained the category noun (e.g., bird). Garrod and Sanford manipulated the typicality of the instance: on half the occasions the instance was typical (e.g., robin), and on half the occasions it was atypical (e.g., ostrich). They found that the category sentences took longer to read when the sentences contained an atypical instance than when they contained a typical instance.
While the study by Garrod and Sanford indicated processing difficulty for category words following atypical instances, their measure of reading time was not sufficiently fine-grained to allow us to be sure that processing difficulty occurred when subjects first read the category noun. However, more recently, Duffy and Rayner (1990) conducted an experiment using eye tracking methodology and sentences similar to those used by Garrod and Sanford (see also Rayner, Raney and Pollatsek, 1995). They found that gaze durations on the category noun and the portion of the sentence immediately after the category noun were longer when the instance in the preceding text was atypical than when it was typical. However, this difference only occurred when the antecedent was close to the anaphor in the text.
We therefore constructed pairs of sentences like (1)-(4) below containing either a typical or an atypical instance of a category word which appeared in the second of the sentences. The category word immediately preceded a word which either contained an infrequent initial trigram (e.g., irksome) or a frequent initial trigram (e.g., trivial).
1. The man hated to watch cricket. He found the sport irksome and boring. (Typical Infrequent)
2. The man hated to watch hurling. He found the sport irksome and boring. (Atypical Infrequent)
3. The man hated to watch cricket. He found the sport trivial and boring. (Typical Frequent)
4. The man hated to watch hurling. He found the sport trivial and boring. (Atypical Frequent)
We predicted that subjects should spend longer reading the category noun after an atypical instance than after a typical instance. We also predicted that this effect
should interact with the landing position on the following word such that when foveal processing load was light, subjects should fixate closer to the beginning of a word with an infrequent initial trigram than to the beginning of a word with a frequent initial trigram. Conversely, when foveal processing load was high, we anticipated no difference in landing position on the target word.
Method
Subjects
Thirty-two subjects from the University of Nottingham participated in the experiment.
Materials
A prescreen experiment was conducted to obtain typicality ratings for the instances of each category used in the experiment. An analysis of variance (ANOVA) showed subjects gave higher typicality ratings to the typical instances than the atypical instances: F1(U3) = 147.026, p = 0.0001, MSe = 0.320; F2(1,35) = 267.360, p = 0.0001, MSe = 0.462. Using these instances and categories, four files of thirty-six experimental sentence pairs, along with forty intermixed filler sentence pairs, were constructed. Each file contained nine items from each condition, such that only one form of each material appeared in each list. The experimental sentence pairs contained an instance of a category in the first sentence. This was either a typical instance or an atypical instance. The category was given in the second sentence. A seven letter word (the target word) immediately followed the category in the second sentence and either contained an infrequently occurring initial trigram, or a frequently occurring initial trigram (i.e., a manipulation of Type Frequency, see Chapter 7). These words were matched for length and syntactic class. The words with an infrequently occurring initial trigram had a first bigram frequency, a second bigram frequency, and a first trigram frequency of less than or equal to 9 in a sample of 20,000 words.
The words with a frequently occurring trigram had a first bigram frequency, a second bigram frequency, and a first trigram frequency of greater than or equal to 20 in a sample of 20,000 words (Mayzner and Tresselt, 1965; Mayzner, Tresselt and Wolin, 1965). The sentence pairs were presented one above the other on a computer screen with a blank line between them, thereby minimising the possibility that readers could detect the category word in the second sentence whilst still reading the first sentence.
Apparatus
Subjects' eye movements were monitored using an SRI Dual Purkinje Generation 5.5 eyetracker produced by Fourward Technologies. The eyetracker has an angular resolution of 10 min of arc. Subjects used both eyes to read, but the tracker monitored only the right eye. Materials were presented on a VDU at a distance of 70 cm from
subjects' eyes. The VDU displayed four characters per degree of visual angle (cf. Rayner and Morris, 1992). The tracker monitored subjects' gaze location every millisecond and the software sampled the tracker's output to establish the sequence of eye fixations and their start and finish times.
Procedure
A bite bar and a head restraint were used in order to minimise head movements during the experiment. The eye-tracking system was then calibrated. Once calibration was complete, the sentence pairs were presented one at a time on the screen. Four practice sentence pairs were displayed first, followed by a mixture of experimental and filler sentence pairs. The subject pressed a key to indicate that they had read the sentence pair and on a proportion of trials received a question to ensure comprehension. The experiment lasted approximately 45 minutes.
Results
For the analysis of the results, the experimental sentences were divided into four regions, indicated by the slashes as follows:
The man hated to watch cricket. He found the/ sport/ irksome/ and boring.
We computed first pass reading times for Regions One and Two. We defined first pass reading time as the sum of all fixations from the first fixation in a region until the point of fixation exited the region to either the left or the right. For Region Two, in which we anticipated differences in reading time due to anaphoric processing, we also considered the first fixation duration and the duration of the last fixation in Region Two prior to direct fixation of Region Three. In addition to the reading time measures, we considered the landing position in Region Three. Trials where tracker loss occurred, and trials on which Region One, Regions Two and Three, and Region Four had zero first pass reading times, were excluded from the reading time analyses. This procedure removed 6% of the data. A 2(Typicality) x 2(Initial Trigram) ANOVA was carried out for these measures across both subjects (F1) and items (F2).
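The first pass reading time definition given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis software: fixations are assumed to arrive as (region, duration) pairs in temporal order, regions are numbered left to right, and a region that is passed over before being fixated is assumed to score zero. The function name and data layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical data layout: (region_number, duration_ms), in temporal order.
Fixation = Tuple[int, float]


def first_pass_time(fixations: List[Fixation], region: int) -> float:
    """Sum fixation durations from the first fixation in `region`
    until the eye leaves it to the left or to the right.

    Returns 0.0 if the region was skipped (a later region was fixated
    first) or never fixated; this zero-coding is an assumption.
    """
    total = 0.0
    entered = False
    for reg, dur in fixations:
        if reg == region:
            total += dur
            entered = True
        elif entered:
            break  # the eye has exited the region: first pass is over
        elif reg > region:
            return 0.0  # region was skipped on first pass
    return total


def per_character(time_ms: float, n_chars: int) -> float:
    """Convert a reading time to ms per character (ms/C)."""
    return time_ms / n_chars
```

For example, with the sequence `[(1, 200.0), (1, 180.0), (2, 240.0), (3, 220.0), (2, 150.0)]`, the later regression into Region Two does not count towards its first pass time.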
The mean reading times for Regions One and Two and the landing positions and saccade lengths in Region Three are shown in Table 1. For the first pass reading times in Region One, a main effect of Typicality was observed: F1(1,31) = 12.657, p = 0.0012, MSe = 9.299; F2(1,35) = 16.062, p = 0.0003, MSe = 9.751. As expected in this region, there was no main effect of Initial Trigram (F1, F2 < 1) and no interaction between Initial Trigram and Typicality (F1, F2 < 1). Subjects' first pass reading times in Region One were longer when the region contained an atypical rather than a typical instance, even though at this point in the sentence the category word had not yet been read.
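As an aside on the F1 and F2 statistics reported here: for a factor with two levels in a fully repeated-measures design, the main-effect F with (1, n - 1) degrees of freedom equals the square of a paired t computed on each unit's marginal means. The sketch below illustrates that relationship only; it is not the authors' procedure, and the function name and example values are invented. For F1 the units would be subjects (means taken over items); for F2 they would be items (means taken over subjects).

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def two_level_main_effect(level_a: List[float],
                          level_b: List[float]) -> Tuple[float, Tuple[int, int]]:
    """Return (F, (df1, df2)) for a two-level within-unit factor.

    level_a[i] and level_b[i] are the marginal means of unit i
    (a subject for F1, an item for F2) at the two levels.
    Uses the identity F(1, n - 1) = t**2 for a paired t-test.
    """
    diffs = [a - b for a, b in zip(level_a, level_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)


# Invented per-subject means (ms/C) for atypical vs. typical instances.
atypical = [39.0, 40.0, 38.0, 41.0]
typical = [37.0, 38.0, 37.0, 38.0]
F, df = two_level_main_effect(atypical, typical)
```

With 32 subjects this gives the (1,31) degrees of freedom reported for F1, and with 36 items the (1,35) reported for F2.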
Table 1
Mean first pass and first fixation times for Regions One and Two, and landing positions for Region Three, under the four conditions of Experiment One

                                                       Typical                  Atypical
Measure                                                Infrequent  Frequent     Infrequent  Frequent
Mean first pass reading time for Region One (ms/C)     37.3        37.1         39.2        39.0
Mean first fixation duration in Region Two (ms)        240.9       227.5        230.1       232.9
Mean first pass reading time for Region Two (ms/C)     26.9        27.9         27.2        27.2
Mean last fixation duration in Region Two (ms)         228.5       228.3        224.9       224.2
Mean landing position in Region Three (Chars)          4.99        4.91         4.70        4.67
There was no main effect of Typicality or of Initial Trigram on the duration of the first fixation in Region Two (F1, F2 < 1). There was also no interaction between the two (F1(1,31) = 1.566, p = 0.2202, MSe = 1348.470; F2(1,35) = 1.102, p = 0.3011, MSe = 2294.340). Similarly, for first pass reading time in Region Two, there was no main effect of Typicality or of Initial Trigram and no interaction between the two (all F < 1). Finally, an analysis of the duration of the fixation prior to direct fixation of Region Three showed no main effect of Typicality, no main effect of Initial Trigram and no interaction (all F < 1). Clearly, effects of typicality were observed in Region One, but no such effects were apparent for the reading time analyses of Region Two.
For the landing position analyses, additional trials were removed from the data set. Those trials where subjects did not fixate Region Three during first pass reading and those trials where subjects skipped the category noun were excluded from the analysis. This procedure removed a total of 20.1% of the data. We counted the space before the word as landing position 1. Analyses of variance showed no main effect of Initial Trigram (F1, F2 < 1). However, the main effect of Typicality approached significance by subjects and by items: F1(1,31) = 3.787, p = 0.0608, MSe = 0.589; F2(1,35) = 3.545, p = 0.068, MSe = 0.424. No interaction was found (F1, F2 < 1). These results suggest that while the reading times in Region Two were insensitive to Typicality effects, there was at least the suggestion that landing position on the following word was affected. The nature of the target word's initial trigram did not modulate this effect.
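The landing position convention used above (the space before the word counts as position 1, so the first letter is position 2) can be made concrete with a small sketch. It assumes, purely for illustration, that each fixation has already been mapped to a zero-based character index in the displayed line; the function name is hypothetical.

```python
def landing_position(fixation_index: int, word_start_index: int) -> int:
    """Landing position within a word, where the space immediately
    before the word is position 1 and the first letter is position 2.

    Both arguments are zero-based character indices in the line;
    `word_start_index` is the index of the word's first letter.
    """
    return fixation_index - word_start_index + 2


line = "He found the sport irksome and boring."
target_start = line.index("irksome")  # index of the first letter of the target
```

Under this convention a fixation on the space before "irksome" scores 1, and a fixation on its fourth letter scores 5.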
Discussion In this experiment, the influence of foveal processing difficulty on the occurrence of the landing position effect in reading was examined. It was predicted that a category name would take longer to process when its antecedent was an atypical, rather than a typical, exemplar of that category. It was also predicted that when the category word was easy to process, the point of fixation would land closer to the beginning of a target word with an infrequently occurring initial trigram compared to a target word with a frequently occurring initial trigram. Following Henderson and Ferreira's (1990,1993) sequential attention model, and Hyona's (1995) attraction hypothesis, it was anticipated that increased foveal load would reduce the amount of information extracted from the parafovea and so reduce the extent to which a fixation was attracted by an infrequent letter string at the beginning of a word. While we observed no effect of typicality for the category noun, typicality did influence first pass reading times for Region One and also the landing position in Region Three. Clearly, the typicality of the instance did affect how easily subjects found sentences to process. However, the lack of an effect of typicality on reading times for Region Two was somewhat surprising as such a result is in disagreement with the findings of Duffy and Rayner (1990). The effect of typicality found in Region One may be explained in three ways. First, perhaps subjects started reading the first sentence and somehow, either with or without direct fixation, detected the category word in the second sentence at this point. We feel that this possibility is extremely unlikely because the sentences were spaced a blank line apart thereby reducing the possibility that subjects could have detected the category word without direct fixation. Furthermore, the first pass reading times for Region One were quite long (mean = 38.2 ms/C). 
However, this would not have been the case if subjects had directly fixated Region Two because such a saccade would have transgressed a Region One boundary, thereby cutting short first pass reading times for that region.
A second possibility is that when the instance word in the first sentence is encountered, information concerning its category becomes available automatically from its lexical representation. For example, the representation of the word cricket may contain information concerning membership of the category sport and hence such information may become available automatically. If this is the case, then the atypicality of an instance could cause the increased reading times without subjects necessarily reading the category noun. Hence, the longer reading times may have been due to the category of the instance becoming available when the instance itself was lexically accessed.
Alternatively, because typicality and frequency are highly correlated, the differences in reading time observed over Region One may arise due to the lower frequency of the atypical instances compared to the typical instances. However, frequency effects are usually short lived and it would be a little surprising
to see frequency effects spill over several words downstream. Whatever the reason for the difference, it is clear that reading times for Region One were affected by the typicality of the instance prior to subjects encountering the category noun. Whilst typicality did not influence reading times for Region Two, there was an unpredicted marginal effect on the landing position on the target word following the category noun. The initial fixation on the target word was closer to the beginning when the instance was atypical than when it was typical. This finding may offer support to Henderson and Ferreira's view that when readers experience a light foveal processing load they are able to extract more information from the parafovea than when foveal processing load is heavy. If processing of the fixated word is difficult, then parafoveal preview benefit will be reduced. When a reader makes an eye movement they may fixate a position in a word that coincides with the point at which they need to obtain further information about that word. When foveal processing load is light, readers may process the first few letters of the subsequent word and therefore when they fixate that word they may fixate farther into it, to a point where they can gain new information. In contrast, when foveal processing is heavy the reader may fixate closer to the beginning of the following word because less information would have been extracted from the subsequent word. The lack of an effect of typicality on reading times of the category words in the present study is unlikely to be due to the typical and atypical instances used not being perceived as typical and atypical by the subjects. Pre-screening, together with the typicality effect on first pass reading times for Region One and on the landing position on Region Three argue against this explanation. However, in our materials the category word was not always part of a simple definite noun phrase whereas in the Duffy and Rayner study it was. 
Some of our items contained demonstrative noun phrases and some were adjectival definite noun phrases. Hence, it is possible that the referential effects we observed were slightly delayed until the eye left the category noun to the right at which point referential processing affected fixation locations on the target word. The landing position in Region Three was uninfluenced by the nature of the initial trigram of the target word with no observable landing position effect. This result suggests that under these experimental conditions, the point of fixation was not attracted towards the infrequent letter strings. The results of this experiment were not entirely as anticipated. The manipulation of foveal load was not localised to Region Two as we expected and although we found some evidence suggesting a landing position effect, it was due to the manipulation of typicality rather than to the nature of the initial trigram of the target word. We therefore ran a second experiment to see if we observed similar effects when we kept the characteristics of the target words the same, but changed the manipulation of foveal processing load.
Experiment Two: Imposing a foveal load with gender role typicality In Experiment Two, we required an alternative means of manipulating foveal processing load prior to the reader directly fixating the target word. Once more we require that this manipulation is localised to a short region of a sentence but does not cause the reader so much disruption that they have to make a regressive saccade in order to re-read the sentence. We also require that content differences in the sentence fragment up to the target word are minimised. Kerr and Underwood (1984) reported an experiment in which they manipulated gender of a pronoun so that it was either congruous or incongruous with the stereotypical gender of an antecedent noun phrase in the preceding sentence (see also Carreiras, Garnham, Oakhill and Cain, 1996). Kerr and Underwood constructed passages with three sentences, the first of which contained a noun phrase such as the surgeon, which has a strong stereotypical gender associated with it. The third sentence contained a pronoun which referred to the antecedent noun phrase and either matched or mismatched its stereotypical gender. Kerr and Underwood found that subjects spent less time initially fixating the pronoun when it matched the stereotypical gender of the antecedent noun phrase than when it did not. Such a manipulation would appear to be ideal for Experiment Two. This would be particularly so if a possessive pronoun was used, as her and his have the same number of letters, thereby minimising differences in the region before the target word. We therefore incorporated this manipulation in Experiment Two. As before, we attempted to determine whether a reader's point of fixation landed closer to the beginning of a word when it had an infrequent initial trigram compared to a frequent initial trigram. We also tested whether such a landing position effect was modulated by the ease with which the preceding word was processed. Sentences like (5)-(8) below were constructed. 
The sentences contained a possessive pronoun which was either congruous or incongruous with the stereotypical gender of an antecedent noun phrase in the preceding sentence. The pronoun immediately preceded a word which contained either an infrequent initial trigram (e.g. abysmal) or a frequent initial trigram (e.g. bermuda).
5. The football coach frequently wore outrageous clothing. Due to their bright colours his abysmal shorts looked ridiculous. (Congruous Infrequent)
6. The football coach frequently wore outrageous clothing. Due to their bright colours her abysmal shorts looked ridiculous. (Incongruous Infrequent)
7. The football coach frequently wore outrageous clothing. Due to their bright colours his bermuda shorts looked ridiculous. (Congruous Frequent)
8. The football coach frequently wore outrageous clothing. Due to their bright colours her bermuda shorts looked ridiculous. (Incongruous Frequent)
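The 2 x 2 crossing illustrated in sentences (5)-(8), and the requirement that each file show only one form of each material, can be sketched as follows. The rotation rule used here (a simple Latin-square shift of condition by item and file number) is an assumption made for illustration; the chapter states only how many items per condition each file contained, not how they were assigned. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Condition = Tuple[str, str]  # (gender congruity, initial trigram)

CONDITIONS: List[Condition] = [
    ("congruous", "infrequent"),
    ("incongruous", "infrequent"),
    ("congruous", "frequent"),
    ("incongruous", "frequent"),
]

# Example item from the text: pronoun and target word for each factor level.
PRONOUN = {"congruous": "his", "incongruous": "her"}
TARGET = {"infrequent": "abysmal", "frequent": "bermuda"}


def build_second_sentence(condition: Condition) -> str:
    """Fill the example frame with the pronoun and target for one condition."""
    congruity, trigram = condition
    return (f"Due to their bright colours {PRONOUN[congruity]} "
            f"{TARGET[trigram]} shorts looked ridiculous.")


def make_files(n_items: int, n_files: int = 4) -> Dict[int, List[Condition]]:
    """Assumed Latin-square rotation: item i in file f gets condition (i + f) mod 4."""
    return {f: [CONDITIONS[(i + f) % len(CONDITIONS)] for i in range(n_items)]
            for f in range(n_files)}


files = make_files(n_items=24)
per_condition = Counter(files[0])  # items per condition in the first file
```

With 24 items this rotation yields six items per condition in every file, and each item appears once in each condition across the four files.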
The predictions in Experiment Two were the same as those for Experiment One. We anticipated an interaction between the congruity of the gender match and the nature of the target word. More specifically, when the stereotypical gender of the antecedent noun phrase matched that of the pronoun we predicted a landing position effect on the target word with initial fixations landing closer to the beginning of target words with an infrequent initial trigram compared to target words with a frequent initial trigram. However, when the stereotypical gender of the antecedent noun phrase did not match that of the pronoun we expected no difference in landing positions on the target word. Method Subjects Thirty-six subjects from the University of Nottingham participated in the experiment. Materials Four files of twenty four experimental sentence pairs, along with forty intermixed filler sentence pairs were constructed. Each file contained six items from each condition, such that only one form of each material appeared in each list. The experimental sentence pairs contained an antecedent noun phrase in the first sentence with a strong stereotypical gender cue associated with it. In the second sentence a possessive pronoun which either matched or mismatched the stereotypical gender of its antecedent immediately preceded the target word. The target words had the same characteristics as those used in Experiment One. Apparatus and procedure The apparatus and procedure were identical to that of Experiment One. Results The experimental sentences were divided into four regions, indicated by the slashes as follows: The football coach frequently wore outrageous clothing. Due to their bright colours/ his/ abysmal/ shorts looked ridiculous. We computed first pass reading times for Regions One and Two and the landing position in Region Three. The reading time definitions remained the same as for
Table 2
Mean first pass and first fixation times for Regions One and Two, and landing positions for Region Three, in the four conditions of Experiment Two

                                                       Congruous                Incongruous
Measure                                                Infrequent  Frequent     Infrequent  Frequent
Mean first pass reading time for Region One (ms)       2617.3      2728.0       2644.3      2667.4
Mean first fixation duration in Region Two (ms)        91.8        88.9         100.1       86.6
Mean first pass reading time for Region Two (ms/C)     97.3        94.7         107.2       91.0
Mean landing position in Region Three (Chars)          3.96        4.08         4.23        3.93
Experiment One. A 2(Gender Congruity) x 2(Initial Trigram) ANOVA was carried out across subjects (F1) and items (F2). The mean reading times for Regions One and Two and the landing positions in Region Three are shown in Table 2. For the first pass reading times in Region One, there was no effect of Gender Congruity (F1, F2 < 1). There was also no effect of Initial Trigram (F1 < 1; F2(1,23) = 1.358, p > 0.05, MSe = 68764.7) and no interaction between Initial Trigram and Gender Congruity (F1, F2 < 1). Somewhat surprisingly, there was also no effect of Gender Congruity or of Initial Trigram on the duration of the first fixation on the pronoun. There was also no interaction between the two (all F < 1.1). Similarly, for first pass reading time in Region Two, there was no main effect of Initial Trigram (F1(1,35) = 1.295, p > 0.05, MSe = 2469.6; F2(1,23) = 1.058, p > 0.05, MSe = 2913.4), no main effect of Gender Congruity (F1, F2 < 1) and no interaction between the two (F1, F2 < 1). Clearly, we did not obtain the effects of Gender Congruity on the pronoun as anticipated. However, the mean first fixation durations and first pass reading times for Region Two were quite short. This was due to subjects skipping the pronoun on a large proportion (59.3%) of trials. We therefore repeated the ANOVAs on the data, replacing the zero first fixation and first pass reading times with the global mean, but obtained very similar patterns of effects to those obtained with the zeros included. Consequently, we recomputed the first fixation and first pass reading times for a redefined Region Two which included the word preceding the pronoun. Our rationale for the redefinition of Region Two was that if subjects were skipping the pronoun, then they were probably processing it when they were fixating the word to its left. Therefore, on those occasions when subjects skipped the
Table 3
Mean first pass and first fixation times for the two-word Region Two in the four conditions of Experiment Two

                                                                 Congruous                Incongruous
Measure                                                          Infrequent  Frequent     Infrequent  Frequent
Mean first fixation duration in the Extended Region Two (ms)     194.9       203.2        187.6       202.8
Mean first pass reading time for the Extended Region Two (ms/C)  259.1       271.2        268.1       275.6
pronoun, the fixation(s) which could possibly show an effect of Gender Congruity were not being included in our first set of analyses for the one-word Region Two and we may therefore have missed an effect. The mean first fixation durations and first pass reading times for the redefined Region Two are given in Table 3. Despite the fact that redefining Region Two reduced the number of skipped trials (to 11.6%), the results were very similar to those obtained when only the pronoun was included. There was no effect of Gender Congruity or of Initial Trigram on the duration of the first fixation on the region. There was also no interaction between the two (all F < 1.1). Similarly, for first pass reading time in Region Two, there was no main effect of Initial Trigram, no main effect of Gender Congruity and no interaction between the two (all F < 1; see Footnote 1).
Footnote 1. Although the sentences were necessarily different in the target region and were therefore not controlled for sense-semantic plausibility, we did compute total reading times to check whether there was an effect of gender congruity at all. Interestingly, these analyses showed a highly significant effect of gender congruity on Region Two (F1(1,36) = 10.8, p < 0.01, MSe = 7213.0; F2(1,24) = 10.3, p < 0.01, MSe = 4219.4), a marginal effect on Region Three (F1(1,36) = 2.8, p < 0.11, MSe = 23865; F2(1,24) = 5.2, p < 0.05, MSe = 9033.0), and a significant effect on Region Four (F1(1,36) = 6.58, p < 0.05, MSe = 98827.9; F2(1,24) = 5.84, p < 0.05, MSe = 66424.9). While we interpret these findings with caution, the implication is that in this experiment, stereotypical gender information associated with an antecedent noun phrase had an effect on possessive pronoun resolution, but this occurred relatively late during sentence comprehension.
For the landing position analyses, we excluded those trials where subjects did not fixate Region Three during first pass reading and those trials where subjects skipped the enlarged two-word Region Two. As before, the space before the word was landing position 1. Analyses of variance showed no main effect of Initial Trigram (F1, F2 < 1), no main effect of Gender Congruity (F1, F2 < 1)
and no significant interaction between the two (F1(1,35) = 1.910, p > 0.05, MSe = 0.851; F2(1,23) = 1.077, p > 0.05, MSe = 0.343). As with the findings of Experiment One, these results suggest that the frequency of a word's initial trigram did not influence the location of a reader's initial fixation on the word.
We also conducted some exploratory analyses, in response to our failed attempt to cause a variation in foveal load. To reiterate, following the appearance of a gender-associated noun phrase (e.g., football coach) a subsequent congruous stereotypical possessive pronoun (his) was intended to induce a low foveal load, thereby allowing greater parafoveal processing compared with when a non-stereotypical pronoun (her) appeared. As we have seen, this manipulation had minimal effects on our measures of early processing difficulty. Gender Congruity had no effect upon the duration of the first fixation on the pronoun, or upon the first pass reading time, for analyses in which the region on and around the pronoun was defined in different ways. We therefore conducted an analysis of a sub-sample of the data obtained in the experiment. Within each of the four conditions of the experiment (congruency of pronoun vs. frequency of the initial trigram of the critical word) we selected the single sentence in which the reader gave the least visual attention to the redefined Region Two (i.e. the shortest gaze duration), and the single sentence receiving the greatest amount of attention in Region Two (i.e. the longest gaze duration). Selecting a subset of the data in this manner prevented us from computing item analyses, and therefore only subjects effects are reported below. The landing position on the critical word was obtained for these four conditions for each subject. The mean landing positions are given in Table 4 and are shown in Fig. 1.
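The selection step just described can be sketched as below. This is a minimal illustration, assuming each trial is stored as a dictionary with hypothetical keys for subject, condition, gaze duration on the redefined Region Two, and landing position on the critical word; how ties or missing trials were handled is not stated in the text and is not addressed here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Trial = Dict[str, object]  # hypothetical keys: subject, condition, gaze_ms, landing


def select_extreme_trials(trials: List[Trial]) -> Dict[Tuple[object, object], Dict[str, float]]:
    """For each subject x condition cell, keep the landing position from the
    trial with the shortest gaze duration on Region Two ("least") and from
    the trial with the longest gaze duration ("greatest")."""
    cells: Dict[Tuple[object, object], List[Trial]] = defaultdict(list)
    for trial in trials:
        cells[(trial["subject"], trial["condition"])].append(trial)

    selected = {}
    for key, cell in cells.items():
        shortest = min(cell, key=lambda t: t["gaze_ms"])
        longest = max(cell, key=lambda t: t["gaze_ms"])
        selected[key] = {"least": shortest["landing"],
                         "greatest": longest["landing"]}
    return selected


# Invented example: one subject, one condition, three trials.
example = [
    {"subject": 1, "condition": "congruous-infrequent", "gaze_ms": 210.0, "landing": 3.0},
    {"subject": 1, "condition": "congruous-infrequent", "gaze_ms": 450.0, "landing": 5.0},
    {"subject": 1, "condition": "congruous-infrequent", "gaze_ms": 300.0, "landing": 4.0},
]
picked = select_extreme_trials(example)
```

Because only one sentence per cell survives at each extreme, items are no longer balanced across cells, which is why only by-subjects statistics can be computed on the result.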
An ANOVA applied to these landing positions revealed no main effect of Gender Congruity (F < 1) and no main effect of Initial Trigram (F < 1), in line with results from the original analyses of the complete data matrix. There was a main effect of gaze duration on Region Two (F(1,35) = 9.412, p < 0.01, MSe = 25.087), with the first fixation on the critical word being further into that word (4.66 characters) when Region Two received more attention than when it received less attention (4.07 characters). Importantly, Initial Trigram interacted with Visual Attention (F(1,35) = 7.158, p < 0.05, MSe = 23.920; see Fig. 1). Pairwise comparisons showed that when foveal attentional demand was low in Region Two, there was a difference between the landing position on critical words containing frequent and infrequent trigrams (p < 0.05). Subjects fixated farther into words containing frequent trigrams than words containing infrequent trigrams. However, when foveal attentional demand was high, the reader fixated approximately the same position in the word regardless of whether the initial trigram of the word was frequent or infrequent (p > 0.05).
While the analyses described above provide an interesting pattern of data, they must be interpreted with care. In order for us to make the comparison of landing positions for sentences in which foveal processing demand was high or low, it was
Table 4
Mean saccade lengths into Region Three in the four conditions of Experiment Two

                                                                                       Congruous                Incongruous
Measure                                                                                Infrequent  Frequent     Infrequent  Frequent
Least attention to Region Two: mean saccade length from Region Two (character spaces)     10.53       10.17        10.53       9.00
Greatest attention to Region Two: mean saccade length from Region Two (character spaces)  9.69        9.69         9.56        10.11
Fig. 1. Mean landing positions in Region Three for the selected trials for the four conditions of Experiment Two.
necessary to discard a large proportion of the trials contributing to the two conditions. It is quite possible that some of the sentences might have been particularly easy to understand, and others particularly difficult in the high and low foveal load conditions. It seems most unlikely that the sentences that we constructed were equally difficult to read. If it is the case that some sentences caused long reading times because they were intrinsically difficult, and others cause short reading times because they were intrinsically easy, then the sentences will not be evenly represented in the analysis reported above. In such a situation the landing position effect could have been caused by specific sentences contributing to the cells containing the trials in which subjects experienced least difficulty, and other sentences contributing to the trials in which subjects experienced most difficulty. In other words, we would not be able to say that the effect generalised across the target words used in the study to any extent. To check whether this was a problem we compared the number of times that individual sentences appeared in these two cells. If region two of certain sentences always receives short fixations, and region two of other sentences always received longer fixations, then we should expect a negative correlation between the frequencies with which individual sentences appear in the two cells. The correlation emerging from this comparison was not reliable (r(24) = +0.134, p > 0.1), suggesting that there was no systematic bias in the sentences entered into the comparison of least and greatest attention in Region Two. A second possible problem with the analysis of landing positions in Region Three is that the differences were a product not of selected landing positions, but of differences in the location of the previous fixation. 
If saccadic movements were of a constant amplitude, then the landing position effect would be attributable to the position from which the eye movement was launched. Accordingly, we conducted an analysis of the length of a saccade into the region, using the same factors as were used in the analysis of landing positions. The mean saccade lengths are presented in Table 4, and are represented as character spaces from the landing position in Region Three. An analysis of variance found no main effects of the amount of attention given to Region Two, of Gender Congruity, or of the Initial Trigram of the critical word in Region Three. There were no interactions. These results suggest that the difference in landing position could be due to differences in launch position. Discussion The results of Experiment Two are quite interesting. The failure to find effects of Gender Congruity on first pass reading time measures may be explained in two ways. Either readers do not make an anaphoric assignment immediately upon encountering a possessive pronoun. If this was the case, then we would not anticipate that they would detect any gender incongruity until later in the sentence.
Alternatively, it is possible that readers make an anaphoric assignment, but stereotypical gender information associated with the antecedent noun phrase is not made available until the point of fixation has left the pronoun. If either of these interpretations are correct, then they are at odds with the findings of Kerr and Underwood (1984). Given the failure to find effects of gender congruity, we are again unable to make assertions regarding whether sensitivity to orthographic information in the parafovea was modulated by foveal processing demand. Consistent with Experiment One, we found no main effect of initial trigram frequency. Again, the basic analyses showed that readers landed on about the same position in target words with both frequent and infrequent initial trigrams. The exploratory analyses produced a rather more interesting pattern of effects, however. Although we must interpret the data with care, it does appear that when we consider only those trials in which there was a low foveal processing load, we do obtain a pattern of landing position effects suggesting that a reader's point of fixation may be attracted to an infrequent initial trigram. Since the same effects did not occur when foveal processing load was high, the exploratory analyses suggest that sensitivity to orthographic information in the parafovea may be modulated by foveal processing load. Such data provide support for Henderson and Ferreira (1990). Conclusion To conclude, we conducted two studies to determine whether the point of fixation was attracted to an infrequent letter string at the beginning of a word when it was first fixated. We also investigated whether the degree to which orthographic information was extracted from the parafovea is modulated by foveal processing demands. In both experiments the manipulation of foveal processing load did not influence first pass reading time measures. 
We were therefore unable to provide a rigorous test of whether foveal load modulates the extraction of orthographic information from the parafovea. The data of both experiments provide further evidence to suggest that landing position effects are not robust. However, the exploratory analyses of Experiment Two do at least suggest that under a light foveal processing load the reader's point of fixation may be attracted to orthographically infrequent strings of letters at the beginnings of words in the parafovea.
Acknowledgements
The authors would like to thank Rachel Huck for assistance with the collection of the data for Experiment One. We would also like to thank Alan Kennedy and John Everatt for helpful comments on an earlier draft of this paper and the members of the
University of Nottingham Language and Cognitive Processes group for useful discussions about the results of these experiments.
References
Carreiras, M., Garnham, A., Oakhill, J. and Cain, K. (1996). The use of stereotypical gender information in constructing a mental model: Evidence from English and Spanish. Quarterly Journal of Experimental Psychology, 49A, 639-663.
Duffy, S.A. and Rayner, K. (1990). Eye movements and anaphor resolution: Effects of antecedent typicality and distance. Language and Speech, 33, 103-119.
Everatt, J. and Underwood, G. (1992). Parafoveal guidance and priming effects during reading: A special case of the mind being ahead of the eyes. Consciousness and Cognition, 1, 186-197.
Garrod, S. and Sanford, A. (1977). Interpreting anaphoric relations: The integration of semantic information while reading. Journal of Verbal Learning and Verbal Behavior, 16, 77-90.
Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory and Cognition, 16, 417-429.
Henderson, J.M. and Ferreira, F. (1993). Eye movement control during reading: Fixation measures reflect foveal but not parafoveal processing difficulty. Canadian Journal of Experimental Psychology, 47, 201-221.
Hyona, J. (1995). Do irregular letter combinations attract readers' attention? Evidence from fixation locations in words. Journal of Experimental Psychology: Human Perception and Performance, 21, 68-81.
Hyona, J. and Pollatsek, A. (In press). The role of component morphemes on eye fixations when reading Finnish compound words.
Hyona, J., Niemi, P. and Underwood, G. (1989). Reading long words embedded in sentences: Informativeness of word parts affects eye movements. Journal of Experimental Psychology: Human Perception and Performance, 15, 142-152.
Inhoff, A.W., Brühl, D. and Schwartz, J. (1996). Compound word naming in reading, online naming and delayed naming tasks. Memory and Cognition, 24(4), 466-476.
Inhoff, A.W., Pollatsek, A., Posner, M.I. and Rayner, K. (1989). Covert attention and eye movements in reading. Quarterly Journal of Experimental Psychology, 41A, 63-89.
Kennedy, A. (1998). The influence of parafoveal words on foveal inspection time: Evidence of a processing trade off. In: G. Underwood, Eye Guidance While Reading and Watching Dynamic Scenes. Oxford: Elsevier.
Kerr, J.S. and Underwood, G. (1984). Fixation time of anaphoric pronouns decreases with congruity of reference. In: A.G. Gale and F. Johnson (Eds.), Theoretical and Applied Aspects of Eye Movement Research. Amsterdam: Elsevier/North-Holland.
Mayzner, M.S. and Tresselt, M.E. (1965). Tables of single letter and bigram frequency counts for various word length and letter position combinations. Psychonomic Monograph Supplement, 1(2), 13-32.
Landing position effects
221
Mayzner, M.S., Tresselt, M.E. and Wolin, B.R. (1965). Tables of trigram frequency counts for various word length and letter position combinations. Psychonomic Monograph Supplement, 1,3,33-78. McConkie, G.W. and Rayner, K. (1976). Asymmetry of the perceptual span in reading. Bulletin of the Psychonomic Society, 8, 365-368. Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682. Pollatsek, A., Bolozky, S., Well, A.D. and Rayner, K. (1981). Asymmetries in the perceptual span for Israeli readers. Brain and Language, 14, 174-180. Rayner, K., Inhoff, A.W., Morrison, R.E., Slowiaczek, M.C. and Bertera, J.H. (1981). Masking of foveal and parafoveal vision during eye fixations in reading. Journal of Experimental Psychology: Human Perception and Performance, 4, 529-544. Rayner, K. and Morris, R.K. (1992). Eye movement control in reading: Evidence against semantic preprocessing. Journal of Experimental Psychology: Human Perception and Performance, 18, 163-172. Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. Englewood Cliffs, NJ: Prentice Hall. Rayner, K., Raney, G.E. and Pollatsek, A. (1995). Eye movements and discourse processing. In: R.F. Lorch and E.J. O'Brien (Eds.), Sources of Coherence in Reading. Hillsdale, NJ: Erlbaum. Rayner, K., Sereno, S.C., Morris, R.K., Schmauder, A.R. and Clifton, C. (1989). Eye movements and on-line comprehension processes. Language and Cognitive Processes, 4, (3/4), SI, 21^9. Rayner, K., Well, A.D. and Pollatsek, A. (1980). Asymmetry of the effective visual field in reading. Perception and Psychophysics, 27, 537-544. Reichle, E.D., Pollatsek, A., Fisher, D.L. and Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105, 125-157. Underwood, G., Bloomfield, R. and Clews, S. (1989). 
Information influences the pattern of eye fixations during sentence comprehension. Perception, 17, 267-278. Underwood, G., Clews, S. and Everatt, J. (1990). How do readers know where to look next? Local information distributions influence eye fixations. Quarterly Journal of Experimental Psychology, 42A, 39-65. Underwood, G., Hyona, J. and Niemi, P. (1987). Scanning patterns on individual words during the comprehension of sentences. In: J.K. O'Regan and A. Levy-Schoen (Eds.) Eye Movements: From Physiology to Cognition. Amsterdam: North-Holland.
CHAPTER 10
Individual Differences in Reading and Eye Movement Control
John Everatt, Mark F. Bradshaw and Paul B. Hibbard
University of Surrey
Abstract
As with other chapters in this book, this chapter emphasizes factors influencing eye movement control. It also discusses such control from the viewpoint of the process of reading and hypothesized linguistic influences on saccadic movements and fixations. However, it differs from other chapters by considering the effects of reading ability upon eye movement control. In terms of linguistic influences, the chapter presents evidence for initial fixations within words being affected by the informativeness/distinctiveness of sections within those words, evidence which also indicates that this effect is not influenced by semantic aspects of the text. In terms of reading ability, the chapter considers evidence for a relationship between the initial fixation location effect and reading ability in able readers. It further investigates such relationships by comparing eye movement control within reading disabled and matched control subjects. These data suggest that a number of individuals diagnosed as dyslexic show less influence of informativeness/distinctiveness ahead of fixation, and this is related to lower levels of reading ability within these subjects. The chapter concludes with a discussion of potential explanations for this relationship between reading deficits and the initial fixation location effect. The evidence presented indicates that the initial fixation location effect can be found within a wide range of different situations and, to some extent, is affected by variables commonly found to influence reading performance. In terms of the two factors which the referenced series of studies investigated, the initial fixation location effect seems to be more affected by the skill of the reader (at least within those with reading problems) than the semantic context within which informative beginning/ending words appear.
General introduction to the initial fixation location effect
A large proportion of this book has concerned the processes that guide saccadic movements through a text, that is, the processes guiding the eyes from one fixation to the next. This chapter concentrates on one potential influence upon this process, that of within-word letter sequences, or within-word informativeness. There is now substantial evidence indicating that the informativeness or distinctiveness of sequences of letters within words can influence the movements of the eyes within that word. For example, O'Regan et al. (1984) presented French subjects with words with unequal distributions of information. Informative parts of words were identified as the first or last six letters of the word, with these sequences of letters uniquely identifying those words within the reader's lexicon. Analogous words can be found in the English language. For example, in the word 'moralistic' the initial five letters are highly distinctive (there are very few other words that begin with this sequence of letters), but the final five letters are less rare. In contrast, a word such as 'supervisor' contains a common initial sequence of letters, but a distinctive final sequence. O'Regan et al. (1984) presented such words to subjects with the initial fixation imposed on certain letters within the word. If the initial fixation was positioned at the beginning of a word, words with informative beginnings were characterized by a long fixation in their first half, while words with informative endings showed a shorter fixation at the beginning, a saccade towards the end of the word, and a longer fixation at this informative ending. Subsequent studies by O'Regan and Levy-Schoen (1987) provided evidence that the 'convenient viewing position' was shifted slightly to the right in words with informative endings.
The convenient viewing position was a term used by O'Regan (1981) to refer to the position within a word that, if fixated, produced shorter overall gaze durations. This was assumed to be because less processing was required at this position. Within words with informative beginnings, gaze durations were shortest around the letter just to the left of centre, while for words with informative endings the gaze durations were shortest at a position further to the right. Similarly, O'Regan and Jacobs (1992) argued that there is an 'optimum viewing location' within a word and presented evidence that word recognition processes are more efficient if the eyes are located on this point. Such findings suggest that there is a location which can be identified within a word which will lead to the optimal processing of that word and that this position is influenced by the distinctiveness of letter sequences within the word. This view also presents the possibility that if this position is not fixated then reading will not be optimum, and may therefore lead to less efficient reading processes — i.e., the process of locating this optimum viewing location may be related to variations in reading ability. However, both of these terms may be distinguished from the 'preferred viewing location' (the point where the eyes actually land in a word; see
Rayner, 1979), suggesting that factors related to the optimal processing of a word are not the only influence on eye movements through a text.
The evidence presented so far suggests that within-word fixation locations can be influenced by within-word informativeness; however, research by Underwood and colleagues suggests that within-word informativeness can influence saccadic movements from beyond a word boundary. In the initial studies of this effect, Underwood, Hyönä and Niemi (1987) presented Finnish readers with words containing unequal distributions of information (distinctive beginnings or endings) within short passages which the subjects read for comprehension. As in the studies of O'Regan and colleagues, the locations of fixations within these words were influenced by these regions of informativeness. However, the critical finding within the Underwood et al. (1987) study was that the initial fixation within a word was influenced by the region of informativeness: on average, it fell 2.33 character places to the left of centre for words with informative beginnings, compared to 1.68 character places for words with informative endings. Similar initial fixation location effects were also reported by Hyönä, Niemi and Underwood (1989) and Hyönä (1995) with Finnish readers, indicating that letter sequences within unfixated words can influence the position of subsequent fixations within those words. The controversy surrounding this effect is discussed elsewhere in this book. Rather than repeating this discussion, the present chapter considers evidence concerning one possible explanation for this effect, that related to the semantic pre-processing of unfixated words, and then goes on to contrast this with an alternative explanation in terms of orthographic structure.
Semantic influences of the fixation location effect
One of the explanations by Underwood et al.
(1987) for the initial fixation location effect suggested that the morphemic composition of an unfixated word was identified by parafoveal processes and used in the guidance of a subsequent saccade. This has been understood to suggest that a saccadic movement into a target word is influenced by the semantic pre-processing of that target word; although this viewpoint has been challenged (see Rayner and Morris, 1992). The experiments discussed in the first half of this chapter (two of which have been published elsewhere: Underwood, Clews and Everatt, 1990; Everatt and Underwood, 1992), therefore, investigated semantic influences on the initial fixation location effect. The rationale is that if we can show influences of semantic features on the initial fixation location effect then this would provide supporting evidence for the underlying cause of this effect being some form of semantic pre-processing. The following studies incorporated two sets of multi-syllabic words (nine or more letters). The first set comprised words with distinctive or informative beginnings, while the second set possessed distinctive/informative endings. Such words
were obtained via pilot studies in which groups of subjects, different from those involved in the reported studies, but taken from the same college populations, were required to guess a word when only the first or last five letters were provided. The percentage of correct guesses varied due to the distribution of discriminating information (or redundancy) within the words (see Underwood et al., 1987, and Everatt and Underwood, 1992, for a description of the type of pilot study used to obtain these words). Words with distinctive beginnings were those where the subjects were correct more than 89% of the time when given the first five letters of the word, but less than 11% of the time when given the last five letters of the word. Words with distinctive endings were those where the subjects were correct less than 11% of the time when given the first five letters of the word, but more than 89% of the time when given the last five letters of the word. (The procedures for selecting words mean that the findings presented here could equally be considered in terms of redundancy of initial or final sequences of letters.) These words were then checked using forward and backward crossword dictionaries to ensure that they contained distinctive beginnings and endings (i.e., that there were no, or very few, other words with the same beginning/ending designated as distinctive).
Each of the studies reported used a computer to present the informative beginning/ending words within short passages which the subject was requested to read. Informative beginning/ending words were positioned such that they did not appear at the beginning or end of a line (to avoid influences of regressions to the next line of text), or at the beginning or end of a sentence (to avoid influences of wrap-up processes); there were also no punctuation marks around these words (since punctuation may affect the pattern of fixations on these words).
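The selection rule described above can be stated compactly in code. The following is a minimal sketch rather than the procedure as the authors implemented it: the function name `classify_word`, the data layout and the example guess rates are hypothetical, and only the 89% and 11% cut-offs are taken from the text.

```python
from typing import Optional

HIGH_CUTOFF = 89.0  # percent correct guesses; cut-off taken from the text
LOW_CUTOFF = 11.0   # percent correct guesses; cut-off taken from the text


def classify_word(first_five_pct: float, last_five_pct: float) -> Optional[str]:
    """Classify a word from pilot guessing rates (hypothetical helper).

    first_five_pct: percent of subjects who guessed the word from its first five letters.
    last_five_pct: percent of subjects who guessed the word from its last five letters.
    Returns None when the word meets neither criterion and would not be used.
    """
    if first_five_pct > HIGH_CUTOFF and last_five_pct < LOW_CUTOFF:
        return "informative beginning"
    if first_five_pct < LOW_CUTOFF and last_five_pct > HIGH_CUTOFF:
        return "informative ending"
    return None


# Illustrative, made-up guess rates (not data from the pilot studies).
pilot = {
    "word_a": (95.0, 5.0),
    "word_b": (4.0, 92.0),
    "word_c": (60.0, 40.0),
}
labels = {word: classify_word(first, last) for word, (first, last) in pilot.items()}
```

Under these assumptions a word is retained only when its two guess rates fall on opposite sides of both cut-offs, so a word that is moderately guessable from either half (such as `word_c` here) is excluded from both sets.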
Subjects were informed that they would be asked questions regarding the passages they were reading at various points during the study. Any subject who could not answer these questions was deemed as having not read the passages, and their data were dropped from subsequent analyses. Separate sets of passages and related questions were also incorporated into the study to allow subjects to practice the procedures and to ensure accurate eye movement monitoring. Underwood et al. (1990) presented English subjects with sentences to read for comprehension, within which were positioned words with unequal distributions of information. The important manipulation for present discussion was that the prior context of sentences within which informative beginning and informative ending words were placed was held constant. Even with this manipulation, words with informative beginnings showed initial fixations closer to their beginning than words with informative endings, indicating that the initial fixation location effect is not due to the prior context of the sentence within which the word is embedded. There was also no effect of word type on the length of the saccade into the target word, suggesting that saccade length did not produce the fixation location effect; although further studies of this potential influence would be appropriate (see, e.g., Kennison
Table 1
Examples of the preceding sentences and informative beginning/ending words, and the location of the first fixation within the informative beginning/ending words. Locations are given in number of character spaces from the word centre; negative values indicate spaces to the left of centre. Data are based on those given in Underwood et al. (1990).

Preceding sentence         Informative beginning   Informative ending
He miscalculated the       trajectory              multiplier
The students knew their    meddlesome              discomfort
First fixation location    -1.73                   -0.35
Table 2
Examples of the contexts used to prime informative beginning and informative ending words; the target words are marked with asterisks.

Sentence prime:    It was a very strict school. Even though the schoolboy was only a few minutes late, he knew he would be sent to see the *headmaster* straight away.
Sentence neutral:  As the *headmaster* was leaving the school...
Related word:      We would have to sneak past the teachers and *headmaster* to buy ...
Neutral word:      We would have to sneak past the secretary and *headmaster* to buy ...
and Clifton, 1995; and Chapter 4 by Radach and McConkie). Table 1 presents examples of the stimuli used in Underwood et al. (1990).
Although these findings suggest that the initial fixation location effect can occur independent of prior sentence context, they are uninformative about the influence of semantics on this effect. Therefore, in two subsequent studies (one published: Everatt and Underwood, 1992) we manipulated the context within which informative beginning and ending target words were embedded. This was achieved in two ways (see Table 2). In the first (involving 18 subjects), the general context of the sentence leading to the informative beginning/ending words either primed these target words or was neutral. In the second (36 subjects), the informative beginning/ending words were preceded by a word with which they were either semantically related or unrelated; the remainder of the sentence context being held constant. (Words were counterbalanced across context and subjects to avoid repeated viewings of the same target words.) In both studies a statistically reliable initial fixation
Table 3
The average location (and standard deviations in brackets) of the first fixation within the informative beginning/ending words for the different types of preceding context. Fixation locations are given in number of character spaces from the word centre; negative values indicate spaces to the left of centre. (The top two rows of data are based on those presented in Everatt and Underwood, 1992.)

Preceding context    Informative beginning   Informative ending
Sentence prime       -0.81 (1.25)            -0.11 (1.42)
Sentence neutral     -0.39 (1.08)             0.28 (1.33)
Related word         -1.07 (1.35)            -0.07 (1.17)
Neutral word         -1.10 (1.15)             0.17 (1.43)
location effect was found (main effects of word type being significant in the first, F(1,17) = 10.68, p = 0.005, MSe = 0.8, and in the second, F(1,35) = 28.22, p = 0.001, MSe = 1.63); however, in neither case was this affected by prior semantic context (interactions being non-significant, F < 1 for both studies); the initial fixation location effect was comparable with priming and non-priming prior context (see Table 3).
The data presented suggest two conclusions: first, that the initial fixation location effect is robust to different manipulations of context prior to the target word; and second, that there is little evidence for it being influenced by that prior context. Given the rationale that the semantic content of sentences prior to a target word would influence the semantic pre-processing of that target word, such findings seem inconsistent with the view that the initial fixation location effect is produced by semantic pre-processing. Additionally, although these data cannot be taken as conclusive evidence against the possibility that the morphemic composition of a word is identified prior to its fixation and that this is the cause of the initial fixation location effect, a final study in this series suggests that this account is less likely than an alternative account proposed by Underwood et al. (1987). This alternative viewpoint suggests that rather than morphemic components within words influencing eye guidance, the letter sequences themselves lead to the initial fixation location effect, whether or not those letter sequences form a morphemic component. The two views differ therefore in terms of the level of the parafoveal influence on eye guidance through a text, the morphemic account suggesting that aspects of the meaning of a word will play an important role in eye guidance, whereas the alternative graphemic explanation suggests that distinctive letter sequences, independent of morphology, will guide saccades.
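The comparability of the effect across contexts can be illustrated with simple arithmetic on the means in Table 3. The sketch below assumes, for illustration only, that the size of the initial fixation location effect is expressed as the informative-ending mean minus the informative-beginning mean; the function name `effect_size` is hypothetical, and the calculation describes the means only and says nothing about statistical reliability, which rests on the analyses reported in the text.

```python
# Mean first-fixation locations from Table 3, in character spaces from the
# word centre: (informative beginning, informative ending).
table3 = {
    "Sentence prime": (-0.81, -0.11),
    "Sentence neutral": (-0.39, 0.28),
    "Related word": (-1.07, -0.07),
    "Neutral word": (-1.10, 0.17),
}


def effect_size(beginning_mean: float, ending_mean: float) -> float:
    """Difference between ending and beginning means (assumed definition)."""
    return round(ending_mean - beginning_mean, 2)


effects = {context: effect_size(b, e) for context, (b, e) in table3.items()}
```

With these values the difference is 0.70 character spaces after a priming sentence and 0.67 after a neutral sentence, and 1.00 after a related word against 1.27 after a neutral word. In every context the initial fixation lands further to the left in words with informative beginnings.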
Table 4
The average location (and standard deviations in brackets) of the first fixation within the informative beginning/ending words and for words with initial grapheme distinctiveness and final root morpheme. Fixation locations are given in number of character spaces from the word centre; negative values indicate spaces to the left of centre.

Informative beginning   Informative ending   Grapheme/morpheme
-1.47 (1.28)            -0.11 (1.07)         -1.15 (1.41)
We therefore conducted a fourth study involving two sets of long, multisyllabic words which contained informative beginnings or endings, similar to those reported in the studies above. However, we included a third set of words which varied in terms of the type of informativeness present in their initial and final sequences of letters. Words such as 'strawberry', 'drawbridge', 'alarmclock', etc., contain an initial letter sequence which is distinctive in terms of its graphemes, whereas the second half of the word is more informative in terms of its root meaning: e.g., berry being more informative as to the meaning of the word strawberry than straw. As in previous studies, all three sets of words were placed in passages which the subjects were required to read for comprehension; questions being asked of the 16 subjects to ensure reading. Similarly, target words were positioned as close to the centre of the screen as possible, and avoided punctuation marks. An initial fixation location effect was found within the data (F(2,30) = 4.57, p = 0.02, MSe = 1.76). Within the third category of words used, this was influenced by the distinctiveness of its graphemes, rather than its root morphemic composition (see Table 4). Both informative beginning words and words with distinctive initial graphemes and final root morphemes differed from informative ending words (p < 0.05 in both cases, based on Fisher post-hoc comparisons) but these did not statistically differ from each other. These findings, and those reported above, suggest that the initial fixation location effect is produced by distinctive letter strings within an unfixated word, independent of whether or not these letter strings form a morpheme (see Hyönä, 1995, for similar findings).
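The fixation-location measure used in Tables 1, 3 and 4 expresses each initial fixation as a signed distance from the word centre. The sketch below shows one way such a score could be computed. It assumes that letters are numbered from 1 and that a fixation is assigned to the centre of the fixated letter, so that the centre of a ten-letter word lies between its fifth and sixth letters. The function name and this convention are illustrative assumptions; the chapter does not specify how eye position was converted to character spaces.

```python
def offset_from_centre(fixated_letter: int, word_length: int) -> float:
    """Signed distance, in character spaces, from the word centre.

    Negative values indicate positions to the left of centre, matching the
    sign convention in the table captions. Letter numbering from 1 is an
    assumption made for this sketch.
    """
    if not 1 <= fixated_letter <= word_length:
        raise ValueError("fixation must fall within the word")
    centre = (word_length + 1) / 2
    return fixated_letter - centre


# Made-up initial fixations on letters 4, 5 and 6 of a ten-letter word.
word = "strawberry"
offsets = [offset_from_centre(letter, len(word)) for letter in (4, 5, 6)]
mean_offset = sum(offsets) / len(offsets)
```

Averaging such scores over the target words of one type gives a mean location per condition of the kind tabulated above; under this convention a more negative mean indicates initial fixations nearer the beginning of the word.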
Individual differences in the fixation location effect
A second feature of the initial fixation location effect considered in this chapter is its relationship with reading ability. O'Regan and Jacobs (1992) proposed that fixating a certain location within a word produces optimal processing of that word. If this position is not fixated, reading will not be optimum. Although O'Regan and Jacobs
discussed this from the viewpoint of factors which influence lexical access, it is also possible that less-than-optimum reading caused by failing to fixate the optimal viewing location may be related to reading ability. We might therefore predict that individuals with superior reading ability should show more evidence of the initial fixation location effect. The degree to which each subject evidenced an initial fixation location effect was assessed via tasks identical to those described in the previous section of this chapter (see Everatt and Underwood, 1994). Subjects (36 in total) were required to read short passages within which informative beginning/ending words were positioned. Again, care was taken to avoid punctuation marks and the beginning/ending of lines. The location of initial fixations within these words will be reported. Passage reading was also assessed following the same method of requiring the subjects to answer questions about the passages, and the total time taken to read the passages was used as an indicator of overall reading speed, one of the measures of reading ability used within the study. Measures of text comprehension and single-word processing were also used to indicate individual differences in reading ability. Comprehension was measured via a variation of Form Y of the GAPADOL reading comprehension test (McLeod and Anderson, 1973) comprising the last two stories ("Brains as Machines" and "Is there Life on Mars?"), and requiring the subjects to provide single words which could be used to fill the gaps within the passages from which single words had been removed. Correct completion of the passages necessitates the comprehension of the text around the gaps, and provides a measure of comprehension (see Everatt and Underwood, 1994, for a discussion of the use of this measure). Single-word processing was assessed via a lexical decision task. One hundred and eighty words were selected from the Shapiro and Palermo (1968) norms. 
Each word contained three to six letters, and half were converted to pronounceable nonwords (e.g., the word "large" was changed to "larpe", and the word "web" was changed to "wab"). Nonwords contained, on average, the same number of letters and syllables as words. The words used to obtain the nonwords also had, on average, word frequencies similar to those of the word stimuli. Words/nonwords were presented by computer, subjects being required manually to indicate whether the letter string formed a word or nonword. The times taken to make correct word responses were recorded by computer and used as an indication of single word processing; nonword data are not reported here (see Everatt and Underwood, 1994), and the small number of incorrect responses negated their use. Correlations indicate little relationship between the initial fixation location effect and reading ability (see Table 5); correlations with comprehension, reading speed and single word processing being around 0.2 to 0.3 for informative beginning words, but around 0.1 to 0.2 with informative ending words. However, the larger correlations with informative beginning words suggest a relationship in the opposite
Table 5
Correlations between initial fixation position within words with informative beginnings or endings, and reading speed (passage reading time), single word processing (LDT: words) and reading comprehension. (Data based on that reported in Everatt and Underwood, 1994.)

                       Informative beginnings   Informative endings
Passage reading time   -0.28                    -0.18
LDT: words             -0.19                    -0.09
Comprehension           0.29                     0.09
direction to that expected; i.e., more negative scores (indicating fixations closer to the more informative beginning of the word) are related to slower reading speeds and lower gap comprehension scores.
Our own view of the initial fixation location effect is that it is an aid to lexical access (see also O'Regan and Jacobs, 1992). If the most informative parts of a word can be located by an eye movement, then lexical access should be easier. Therefore, we expected those who show more information seeking behaviour to be more able readers due to their improved lexical access procedures. Finding little or no relationship between the initial fixation location effect and measures of text comprehension, single word decision times and text processing speeds does not support this view, and suggests that the process of locating informative parts of yet to be fixated words is not related to reading ability within this population.
Similarly, although there is evidence that the ability to process information beyond the level of the fixated word varies with developing reading skill in children (e.g., Rayner, 1986), there is scant evidence that the ability to extract information further into peripheral vision is related to increased reading ability in normal adult readers. Jackson and McClelland (1975, 1979) presented adult subjects with the task of identifying letters at varying degrees of separation within central and peripheral vision. Performance in this task was unrelated to comprehension ability and reading speed. The size of the field within which a reader can extract information (often termed the perceptual span) does not appear to predict reading ability in normal adult readers. There is also little evidence to suggest differences in the size of the perceptual span between good and poor reading children (Underwood and Zola, 1986; but see Levinson, 1989). Poor reading ability does not appear to be related to under-developed, small perceptual spans.
Additionally, Kennison and Clifton (1995) presented data suggesting that poorer readers benefit as much from parafoveal inspection of a word as more able readers. Here reading ability was determined by reading span, a measure which was related to differences in sentence reading times (those with lower spans were slower
readers), and has been shown to be related to reading comprehension ability (Daneman and Carpenter, 1980). Although Kennison and Clifton's study did not directly assess reading ability, their findings are consistent with the data reported by Everatt and Underwood (1994) if one considers that preview benefits are measuring similar variables to the initial fixation location effect. Based upon the above findings, it seems reasonable to conclude that processing non-fixated information (i.e., factors beyond the boundaries of the fixated word) will tell us little about variations in reading ability within a normally achieving population of readers. However, there is some evidence that a potentially different population of readers (those diagnosed as dyslexic) may show a different pattern of results. A number of researchers have presented evidence that dyslexics show evidence of abnormal processing of information outside of foveal vision. For example, Geiger and Lettvin (1987) found that dyslexics were more successful, compared to control subjects, at identifying letters presented in the periphery of vision; a finding which has subsequently been replicated (Geiger, Lettvin and Zegarra-Moran, 1992; Perry et al., 1989), and extended to include increased peripheral identification of colour (Dautrich, 1993; Grosser and Spafford, 1989). These findings suggest that the dyslexic's processing of parafoveal or peripheral information is abnormal, and that they may show greater preview benefits than non-dyslexic readers; however, such effects may be restricted to certain dyslexic individuals (see Rayner et al., 1989). Other findings, however, are less consistent with the enhanced peripheral processing interpretation. For example, Bouma and Legein (1977), and Klein et al. 
(1990) found little evidence for increased peripheral processing of information within dyslexics, whereas Goolkasian and King (1990) and Slaghuis and Pinkus (1993) found enhanced performance within dyslexics compared with controls only in those conditions which used embedded letters or briefly presented masked items respectively. Hence, preview effects may be determined by presentation conditions. Additionally, Solman and May (1990) found that dyslexics were poor indicators of the position of an item presented to peripheral vision, and Raymond (1995) has indicated that dyslexics performed less well than controls in tasks related to peripheral motion perception. These findings suggest poorer peripheral processing by dyslexics. Further research is obviously necessary to clarify these differences, but the findings suggest that conclusions regarding the processing of information beyond the centre of fixation may not apply to readers diagnosed as dyslexic, and differences related to the optimal viewing position have been found between normally reading children and those undergoing treatment for reading problems (Brysbaert and Meyers, 1993). The following study therefore compared the initial fixation location effect of dyslexics and non-dyslexics.
Developmental dyslexia
The term developmental dyslexia (we will use the term dyslexia) describes individuals defined primarily as those experiencing difficulties in acquiring literacy skills. Such difficulties with learning to read and poor spelling/writing ability are the main symptoms associated with dyslexia; however, dyslexics also experience problems with certain aspects of mathematics (Miles and Miles, 1992), and present abnormalities within certain perceptual and cognitive tasks (Miles, 1993; Thomson, 1990; Willows, Kruk and Corcos, 1993). Dyslexia is mainly considered as a childhood problem (e.g., see Fawcett and Nicolson, 1994); however, dyslexia-related problems often extend into adulthood (Bruck, 1993; Everatt, 1997; McLaughlin, Fitzgibbon and Young, 1994; Miles, 1993). Despite adult dyslexics often improving in reading ability to near normal adult levels, they often show continued poor spelling/writing performance and many of the perceptual and/or cognitive deficits found in childhood. The relevance for present purposes is that adult compensated dyslexics can be tested on their reading performance with the same passages as adult non-dyslexics without large numbers being rejected because they failed to reach the criteria for ensuring that they have read the passages.
We therefore compared a group of adult dyslexics with a group of adult non-dyslexics on a passage reading task similar to those used in the previous studies outlined in this chapter. In total, 25 dyslexic and 23 non-dyslexic subjects were tested, but three of the dyslexic subjects and one non-dyslexic were excluded from the final analyses. The three dyslexic subjects were rejected due to their habit of fixating the end of a line of text and producing a series of leftward saccades to the beginning of the line, from which they then produced a series of rightward saccades. This reverse pattern meant that in many cases the initial fixation on a target word followed a leftward saccade.
The single non-dyslexic subject was rejected due to poor scores on the comprehension and spelling measures used within the study (see below). Subjects were asked to read six passages and to answer questions after each passage. Half of these passages contained words with unequal distributions of information, and the position of the initial fixation within these words was assessed. Questions relating to these passages (three questions per passage) were at a level to ensure reading of the passages, as in the previous studies outlined above. Any subject failing to answer more than one of these questions was rejected from the subsequent analyses. The other three passages were followed by a larger number (ten per passage) of more detailed questions and were selected to provide a distribution of scores indicative of comprehension ability (the use of these passages being based on a series of pilot studies on similar adult dyslexic and non-dyslexic subjects). Subjects were informed that different numbers of questions would be asked about the passages, and were given a practice passage and set of questions.
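As a rough illustration of how the measure derived from these passages might be scored, the Python sketch below computes an initial fixation location effect for each subject and then a group mean and standard deviation. The definition used (mean initial fixation position within informative beginning words minus that within informative ending words, with a more negative score indicating a larger effect) follows the caption of Table 6. The subject labels, the data values, the coordinate convention and the function name are all hypothetical, and this is not the analysis code used in the study.

```python
from statistics import mean, stdev


def fixation_location_effect(beginning_positions, ending_positions):
    """Mean initial fixation position within informative-beginning words
    minus the mean position within informative-ending words.
    A more negative value indicates a larger effect."""
    return mean(beginning_positions) - mean(ending_positions)


# Hypothetical data: initial fixation positions (in character spaces,
# relative to an arbitrary reference point within the word).
subjects = {
    "s01": {"beginning": [-1.5, -1.0, -0.5], "ending": [0.0, 0.5, -0.5]},
    "s02": {"beginning": [-0.5, 0.0, 0.5], "ending": [0.0, -0.5, 0.5]},
    "s03": {"beginning": [-2.0, -1.0, -1.5], "ending": [-0.5, 0.0, -1.0]},
}

# One effect score per subject, then summary statistics for the group.
effects = {
    subject_id: fixation_location_effect(data["beginning"], data["ending"])
    for subject_id, data in subjects.items()
}
group_mean = mean(list(effects.values()))
group_sd = stdev(list(effects.values()))

print(effects)
print(round(group_mean, 2), round(group_sd, 2))
```

In this made-up sample, two subjects show an effect and one shows none. A group with this kind of mixture would have a small mean effect and a comparatively large standard deviation.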
J. Everatt, M.F. Bradshaw & P.B. Hibbard
In addition to the above eye movement monitored task, subjects were assessed on (i) gap reading comprehension (the same variation on the GAPADOL test described above), (ii) single word and non-word naming (measures being the time to name 24 words or non-words), (iii) spelling ability (number of spelling errors from 85 words), and (iv) Raven's progressive matrices (number correct out of 36 items). Comparisons between the two groups on each of the measures can be found in Table 6. These indicated differences in spelling (F(1,42) = 62.52, p < 0.0001, MSe = 158.82), non-word naming ability (F(1,42) = 13.9, p = 0.0005, MSe = 30.75), and both measures of reading comprehension (passage comprehension: F(1,42) = 5.48, p = 0.02, MSe = 35.86; gap comprehension: F(1,42) = 19.16, p < 0.0001, MSe = 12.83), but no difference between the groups in terms of single word naming (F < 1) and progressive matrices (F(1,42) = 2.4, p = 0.13, MSe = 29.5). Differences were also apparent in the initial fixation location effect; although there was little evidence of a difference between the groups in terms of the position of the initial fixation within informative beginning or ending words (F < 1 in both cases), there was some evidence of a difference between the groups in terms of the degree of the fixation location effect (F(1,42) = 3.9, p = 0.05, MSe = 2.9) (see Table 6). As a group, the dyslexics seem to show little effect of informative regions within a to-be-fixated word. However, they present a larger degree of variability in this measure, indicative of some showing an initial fixation location effect and others not.
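The group comparisons above each have 1 and 42 degrees of freedom, which matches 22 subjects per group (25 dyslexics less three exclusions, 23 non-dyslexics less one). As a minimal sketch, assuming a one-way analysis of variance with two independent groups, the F ratio and the error mean square can be approximated from the means and standard deviations in Table 6. The function name is hypothetical and this is not the analysis code used in the study. Because the table values are rounded, the result will only approximate the reported statistics.

```python
def f_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """One-way ANOVA for two independent groups, computed from
    group means, standard deviations and group sizes.
    Returns (F, error mean square, error degrees of freedom)."""
    df_error = n_a + n_b - 2
    # Error mean square: pooled within-group variance.
    ms_error = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df_error
    # Between-group mean square (1 degree of freedom for two groups).
    grand_mean = (n_a * mean_a + n_b * mean_b) / (n_a + n_b)
    ms_between = (n_a * (mean_a - grand_mean) ** 2
                  + n_b * (mean_b - grand_mean) ** 2)
    return ms_between / ms_error, ms_error, df_error


# Spelling errors from Table 6: dyslexic 47.3 (15.4), non-dyslexic 17.2 (8.9).
# Group sizes of 22 are inferred from the exclusions and the 42 error df.
f_spelling, mse_spelling, df_spelling = f_from_summary(
    47.3, 15.4, 22, 17.2, 8.9, 22
)
print(round(f_spelling, 2), round(mse_spelling, 2), df_spelling)
```

For the spelling measure this gives an F of about 63 and an error mean square of about 158, close to the reported F(1,42) = 62.52 and MSe = 158.82. The small discrepancy is what rounding in the tabled means and standard deviations would produce.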
Correlations between the initial fixation location effect and the two reading comprehension measures indicate little relationship between these variables for normal readers (-0.01 for the passage reading measure and 0.13 for gap comprehension), but suggest that those dyslexics with higher reading comprehension scores show larger effects of unequal distributions of information in words ahead of fixation (-0.43 and -0.36 for passage reading and gap comprehension measures, respectively). There are a number of possible explanations for these differences between dyslexics and non-dyslexics. Evidence indicating differences between dyslexics and non-dyslexics in terms of the processing of information outside of the centre of fixation (e.g., Geiger and Lettvin, 1987) suggests the possibility that dyslexics may not be processing the to-be-fixated word to the same extent as the normal reader. This is consistent with explanations of the initial fixation location effect expressed in Chapter 9, if one assumes that foveal word processing is more resource demanding for the dyslexic than the non-dyslexic, leaving the dyslexic reader with fewer resources available for parafoveal word processing; a finding analogous to that for less experienced readers and the size of the perceptual span (see above: Rayner, 1986). The view that word reading is more resource demanding within the dyslexic is also consistent with recent evidence comparing single versus dual task performance in dyslexics and non-dyslexics (Nicolson and Fawcett, 1990). Such evidence has been used to argue that the dyslexic individual has fewer resources available to perform many common, everyday tasks, including word reading, possibly due to an
Table 6
Mean scores (with standard deviations in brackets) for dyslexic and non-dyslexic subjects on the different measures: (i) the position of the initial fixation within informative beginning and ending words, and the initial fixation location effect, which was calculated by subtracting the position of the first fixation within informative ending words from the position of the first fixation within informative beginning words (a more negative score indicating a larger effect of an information area within the word); (ii) passage reading comprehension, total score out of 30; (iii) gap comprehension; (iv) single word and non-word naming (both in seconds); (v) spelling ability (number of errors); and (vi) Raven's progressive matrices

Measures                           Dyslexic        Non-dyslexic
Informative beginning words        -0.90 (1.44)    -1.20 (1.22)
Informative ending words           -0.39 (1.53)    -0.10 (1.08)
Initial fixation location effect   -0.08 (2.23)    -1.09 (0.96)
Passage reading comprehension      12.5 (6.2)      16.7 (5.8)
Gap comprehension                  22.4 (3.9)      27.1 (3.2)
Word naming                        24.6 (5.8)      23.9 (4.3)
Non-word naming                    33.8 (6.7)      27.5 (4.1)
Spelling                           47.3 (15.4)     17.2 (8.9)
Raven's matrices                   23.4 (6.1)      25.9 (4.7)
automaticity deficit. A lack of automaticity within reading is reminiscent of findings of a reverse Stroop effect within the same population of adult dyslexics tested in the present chapter (see Everatt, 1997), with such abnormal interference effects possibly indicating a reduction of automatic word reading compared to colour naming or poorer control of the resources involved in word reading versus colour naming (see Everatt, Warner, Miles and Thomson, 1997). We are in the process of performing a series of further studies of the performance of dyslexics and non-dyslexics in these tasks to investigate the relationship between measures of automatic processing and the initial fixation location effect. An alternative explanation is that the dyslexic may be processing words to the right of fixation as much as non-dyslexics, but that saccades into those words are not as accurate as those of non-dyslexics. For example, the research of Stein and colleagues has suggested that dyslexics show abnormalities consistent with problems in precise eye movement control (see Stein, 1993, 1996; Stein and Walsh, 1997). This viewpoint has subsequently been linked to the proposal that dyslexia is caused by a visual deficit, a view which may link the eye movement control explanation of the
differences between dyslexics and non-dyslexics in the initial fixation location effect with an explanation in terms of parafoveal/peripheral processing deficits within dyslexics. As a syndrome, dyslexia is often viewed from the perspective of a language disability, and hence as a function of the perception, storage, and/or production of sound (e.g., Liberman et al., 1974; Snowling, 1995; Stanovich, 1988; Vellutino, 1979). However, visual perceptual abnormalities are also associated with dyslexia, many of which are considered to be related to deficits within the transient (magnocellular) visual pathway. The basis of the transient (magnocellular) deficit viewpoint is that the visual system comprises two interactive pathways: the transient or magnocellular (M) pathway, and the sustained or parvocellular (P) pathway. Although the transient/sustained distinction derives from relatively older neuroanatomical studies of the visual system of the cat, and the magnocellular/parvocellular distinction derives from more recent studies of primates, a commonly held view is that the functions of these systems are analogous, and comparable with those of the human visual system (e.g., see Breitmeyer, 1993). Consistent with this proposal, we will therefore treat as synonymous the transient and magnocellular (M) systems, and the sustained and parvocellular (P) systems. Anatomically, the two systems are most clearly distinguished in the lateral geniculate nucleus (the LGN), though M and P cells are also viewed as distinct at the level of the retina (e.g., Bassi and Lehmkuhle, 1990): P cells comprise the larger number of ganglion cells (approximately 80%), particularly within the central/foveal region of the eye, with M ganglion cells being more evenly distributed across the retina. From the retina, both pathways lead to the LGN, which is situated between the retina and cortex, and comprises six layers of cells, two forming the M pathway, four the P pathway.
Both pathways project to the primary visual cortex, from where they separate, the P pathway moving on to temporal cortex regions, the M pathway to the parietal cortex. These neuro-anatomical differences have also led to the pathways being termed temporal (P) and parietal (M) systems. The M and P systems are also considered distinct in terms of the processing they are designed for. The P pathway seems to respond to slowly changing (low temporal frequency) information, to more detailed stimuli (i.e., higher spatial frequencies), and to colour (it seems to distinguish patches of different hue which have the same luminance). On the other hand, the M system seems to be more efficient with information of lower spatial but higher temporal frequencies; it seems to be more sensitive to gross detail and moving stimuli, and relatively insensitive to colour. Psychophysical and neurological evidence has been presented for the view that reading disabled individuals have an abnormally functioning transient system. The psychophysical data indicate that, compared to matched controls, reading disabled individuals show:
(i) poor contrast sensitivity for low spatial frequency stimuli (Martin and Lovegrove, 1984; Evans, Drasdo and Richards, 1994), which is consistent with deficiencies in the pathway responsible for processing low spatial frequency information;
(ii) poor flicker sensitivity at high temporal frequencies (Martin and Lovegrove, 1987), which is consistent with the view that the M pathway is more responsive to high temporal frequencies;
(iii) greater visual persistence for higher spatial frequency information (Badcock and Lovegrove, 1981; Slaghuis and Lovegrove, 1984), which is consistent with one view of an M on P inhibitory influence (e.g., Breitmeyer, 1993) suggesting that the M pathway is not inhibiting the responses of the high spatial frequency P pathway; and
(iv) poorer motion perception (Cornelissen et al., 1995; Eden et al., 1996), again consistent with the hypothesized view of the functions of the M pathway.
Neurological data which favour an M pathway viewpoint suggest cellular migration abnormalities within the M pathway of the dyslexic (Livingstone et al., 1991), and less activity within area V5 of dyslexic individuals during a motion detection task (Eden et al., 1996). Area V5 has been proposed as playing an important role in motion perception, and forms part of the parietal pathway of the M system. Evidence for the M pathway deficit viewpoint is therefore quite compelling; however, a major problem prevents it from becoming a more widely accepted causal factor in dyslexia theorizing. Despite the evidence for differences between reading disabled and reading able subjects in terms of the functions, and even the anatomy, of the M pathway, it is hard to see why deficits within a pathway which is responsible for processing high temporal, low spatial frequency images should lead to poor reading ability. Written words are often thought of as highly detailed visual stimuli, and certainly do not move; why should an M pathway deficit lead to poor reading ability?
Two main theories have been proposed, which argue either that deficits within the M pathway lead to problems in precise eye movement control, or that the M pathway is responsible for specific encoding operations vital for efficient word encoding. For example, Breitmeyer (1980, 1993) has proposed that the two visual pathways mutually inhibit each other, with the normally functioning visual system involving a transient (M) system which inhibits the sustained (P) system at the initiation of a saccadic movement. Information recently processed by the P system is therefore removed, leaving the visual system free to process the next stimulus. A deficit within the M pathway would mean that the previously fixated word may still be within the P pathway, leading to interference with the newly fixated word and thereby poorer text reading ability; a proposal consistent with the data for increased visual persistence of higher spatial frequency information (see above). Also consistent with this
viewpoint, Slaghuis and Lovegrove (1984) found that if a high frequency flicker mask (which may reduce M pathway sensitivity) was used in conjunction with a visual persistence procedure, then control (non-dyslexic) subjects also showed increased persistence at low spatial frequencies. A potential problem with this viewpoint is that Burr, Morrone and Ross (1994) have presented evidence suggesting that within the normally functioning visual system, the M pathway is less efficient during saccadic movements, whereas the operations of the P pathway seem to be relatively unaffected by the movements of the eyes, and may even be enhanced. Such evidence contradicts the view of Breitmeyer (1980, 1993) that the M pathway inhibits the functions of the P pathway during an eye movement. An alternative viewpoint has therefore been proposed for the effects of a deficient M pathway on reading ability. There are a number of interrelated variants on this view (see Chase, 1996; Williams, Brannan and Lartigue, 1987; Williams and LeCluyse, 1990), but all assume that the M pathway provides some form of initial analysis of information which is then built upon by the P pathway. Hence, in word reading, the M pathway may provide basic, more global information about a word which is then supplemented by the operations of the P pathway; though further research is needed to identify the exact nature of this basic/global information (see Chase, 1996). Williams and LeCluyse (1990) go on to argue that the former operations of the M pathway may be performed preattentively and used to 'direct the sustained subsystem to particularly salient areas' (p. 112). Such viewpoints are consistent with findings such as the initial fixation location effect, though further research is required to substantiate this connection. Finally, an association has also been made between measures of M pathway processes and precise control of the eyes.
Evans, Drasdo and Richards (1996) found that dyslexic subjects showed deficits in tasks involving operations of the M pathway, such as flicker threshold, and binocular stability, and that these measures were related. This, and the hypothesized dominance of M projections to the posterior parietal cortex, which is involved in normal eye movement control, have led some researchers to argue that the deficits proposed within the M pathway of the dyslexic lead to visual processing deficits and poor eye movement control (see Stein and Walsh, 1997). This leads to the possibility that the abnormalities in the initial fixation location effect found within our adult dyslexics may be due to a combination of processing deficits outside the centre of fixation and poor control of the movements of the eyes to those locations. We are therefore conducting a second series of experiments comparing the precise eye movement control of adult dyslexics in reading (involving the initial fixation location effect) and non-reading tasks with measures of motion perception.
Acknowledgements
Part of the research reported in this paper was supported by an MRC grant awarded to the first two authors. We are indebted to Keith Rayner, two other anonymous reviewers, and the book editors for comments on an earlier version of this chapter.
References
Badcock, D. and Lovegrove, W.J. (1981). The effects of contrast, stimulus duration and spatial frequency on visible persistence in normal and specifically disabled readers. Journal of Experimental Psychology: Human Perception and Performance, 7, 496-505.
Bassi, C.J. and Lehmkuhle, S. (1990). Clinical implications of parallel visual pathways. Journal of the American Optometric Association, 61, 98-110.
Bouma, H. and Legein, Ch.P. (1977). Foveal and parafoveal recognition of letters and words by dyslexics and average readers. Neuropsychologia, 15, 69-80.
Breitmeyer, B.G. (1980). Unmasking visual masking: A look at the why behind the veil of how. Psychological Review, 87, 52-69.
Breitmeyer, B.G. (1993). Sustained (P) and transient (M) channels in vision: A review and implications for reading. In: D.M. Willows, R.S. Kruk and E. Corcos (Eds.), Visual Processes in Reading and Reading Disabilities. Hillsdale, NJ: Erlbaum.
Bruck, M. (1993). Word recognition and component phonological processing skills of adults with childhood diagnosis of dyslexia. Developmental Review, 13, 258-268.
Brysbaert, M. and Meyers, C. (1993). The optimal viewing position for children with normal and with poor reading abilities. In: S.F. Wright and R. Groner (Eds.), Facets of Dyslexia and its Remediation. North Holland: Elsevier.
Burr, D.C., Morrone, M.C. and Ross, J. (1994). Selective suppression of the magnocellular visual pathway during saccadic eye movements. Nature, 371, 511-513.
Chase, C.H. (1996). A visual deficit model of developmental dyslexia. In: C.H. Chase, G.D. Rosen and G.F. Sherman (Eds.), Developmental Dyslexia: Neural, Cognitive and Genetic Mechanisms. Baltimore, MD: York Press.
Cornelissen, P., Richardson, A., Mason, A., Fowler, S. and Stein, J. (1995). Contrast sensitivity and coherent motion detection measured at photopic luminance levels in dyslexics and controls. Vision Research, 35, 1483-1494.
Daneman, M. and Carpenter, P.A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466.
Dautrich, B.R. (1993). Visual perceptual differences in the dyslexic reader: Evidence of greater visual peripheral sensitivity to color and letter stimuli. Perceptual and Motor Skills, 76, 755-764.
Eden, G.F., VanMeter, J.W., Rumsey, J.M., Maisog, J.M., Woods, R.P. and Zeffiro, T.A. (1996). Abnormal processing of visual motion in dyslexia revealed by functional brain imaging. Nature, 382, 66-69.
Evans, B.J.W., Drasdo, N. and Richards, I.L. (1994). An investigation of some sensory and refractive visual factors in dyslexia. Vision Research, 34, 1913-1926.
Evans, B.J.W., Drasdo, N. and Richards, I.L. (1996). Dyslexia: The link with visual deficits. Ophthalmic and Physiological Optics, 16, 3-10.
Everatt, J. (1997). The abilities and disabilities associated with adult developmental dyslexia. Journal of Research in Reading, 20, 13-21.
Everatt, J. and Underwood, G. (1992). Parafoveal guidance and priming effects during reading: A special case of the mind being ahead of the eyes. Consciousness and Cognition, 1, 186-197.
Everatt, J. and Underwood, G. (1994). Individual differences in reading subprocesses. Language and Speech, 37, 283-297.
Everatt, J., Warner, J., Miles, T.R. and Thomson, M.E. (1997). The incidence of Stroop interference in dyslexia. Dyslexia: An International Journal of Research and Practice, 3, 222-228.
Fawcett, A. and Nicolson, R. (1994). Dyslexia in Children. New York: Harvester-Wheatsheaf.
Geiger, G. and Lettvin, J.Y. (1987). Peripheral vision in persons with dyslexia. New England Journal of Medicine, 316, 1238-1243.
Geiger, G., Lettvin, J.Y. and Zegarra-Moran, O. (1992). Task determined strategies of visual process. Cognitive Brain Research, 1, 39-52.
Goolkasian, P. and King, J. (1990). Letter identification and lateral masking in dyslexic and average readers. American Journal of Psychology, 103, 519-538.
Grosser, G.S. and Spafford, C.S. (1989). Perceptual evidence for an anomalous distribution of rods and cones in the retinas of dyslexics: A new hypothesis. Perceptual and Motor Skills, 68, 683-698.
Hyönä, J. (1995). Do irregular letter combinations attract readers' attention? Evidence from fixation locations in words. Journal of Experimental Psychology: Human Perception and Performance, 21, 68-81.
Hyönä, J., Niemi, P. and Underwood, G. (1989). Reading long words embedded in sentences: Informativeness of word parts affects eye movements. Journal of Experimental Psychology: Human Perception and Performance, 15, 142-152.
Jackson, M. and McClelland, J. (1975). Sensory and cognitive determinants of reading speed. Journal of Verbal Learning and Verbal Behavior, 14, 565-574.
Jackson, M. and McClelland, J. (1979). Processing determinants of reading speed. Journal of Experimental Psychology: General, 108, 151-181.
Kennison, S.M. and Clifton, C. (1995). Determinants of parafoveal preview benefit in high and low working memory capacity readers: Implications for eye movement control. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 68-81.
Klein, R., Berry, G., Brians, K., D'Entremont, B. and Farmer, M. (1990). Letter identification declines with increasing retinal eccentricity at the same rate for normal and dyslexic readers. Perception and Psychophysics, 47, 601-606.
Levinson, H.N. (1989). Abnormal optokinetic and perceptual span parameters in cerebellar-vestibular dysfunction and learning disability or dyslexia. Perceptual and Motor Skills, 68, 35-54.
Liberman, I.Y., Shankweiler, D., Fischer, F.W. and Carter, B. (1974). Explicit syllable and phoneme segmentation in the young child. Journal of Experimental Child Psychology, 18, 201-212.
Livingstone, M.S., Rosen, G.D., Drislane, F.W. and Galaburda, A.M. (1991). Physiological and anatomical evidence for a magnocellular deficit in developmental dyslexia. Proceedings of the National Academy of Sciences, USA, 88, 7941-7947.
Martin, F. and Lovegrove, W.J. (1984). The effects of field size and luminance on contrast sensitivity differences between specifically reading disabled children and normal children. Neuropsychologia, 22, 73-77.
Martin, F. and Lovegrove, W.J. (1987). Flicker contrast sensitivity in normal and specifically disabled readers. Perception, 16, 215-221.
McLeod, J. and Anderson, J. (1973). GAPADOL Reading Comprehension, Form Y. London: Heinemann Educational Books.
McLoughlin, D., Fitzgibbon, G. and Young, V. (1994). Adult Dyslexia: Assessment, Counselling and Training. London: Whurr.
Miles, T.R. (1993). Dyslexia: The Pattern of Difficulties (second edition). London: Whurr.
Miles, T.R. and Miles, E. (1992). Dyslexia and Mathematics. London: Routledge.
Nicolson, R.I. and Fawcett, A.J. (1994). Reaction-times and dyslexia. Quarterly Journal of Experimental Psychology, 47A, 29-48.
O'Regan, J.K. (1981). The convenient viewing position hypothesis. In: D.F. Fisher, R.A. Monty and J.W. Senders (Eds.), Eye Movements: Cognition and Visual Perception. Hillsdale, NJ: LEA.
O'Regan, J.K. and Jacobs, A.M. (1992). Optimal viewing position effect in word recognition: A challenge to current theory. Journal of Experimental Psychology: Human Perception and Performance, 18, 185-197.
O'Regan, J.K. and Levy-Schoen, A. (1987). Eye-movement strategy and tactics in word recognition and reading. In: M. Coltheart (Ed.), Attention and Performance XII: The Psychology of Reading. Hove: LEA.
O'Regan, J.K., Levy-Schoen, A., Pynte, J. and Brugaillere, B. (1984). Convenient fixation location within isolated words of different length and structures. Journal of Experimental Psychology: Human Perception and Performance, 10, 250-257.
Perry, A.R., Dember, W.N., Warm, J.S. and Sacks, J.G. (1989). Letter identification in normal and dyslexic children: A verification. Bulletin of the Psychonomic Society, 27, 445-448.
Raymond, J. (1995). Talk given to the Department of Psychology, University of Wales, Bangor.
Rayner, K. (1979). Eye guidance in reading: Fixation locations within words. Perception, 8, 21-30.
Rayner, K. (1986). Eye movements and the perceptual span: Evidence for dyslexic typology. In: G.Th. Pavlidis and D.F. Fisher (Eds.), Dyslexia: Its Neuropsychology and Treatment. New York: Wiley.
Rayner, K. and Morris, R.K. (1992). Eye movement control in reading: Evidence against semantic preprocessing. Journal of Experimental Psychology: Human Perception and Performance, 18, 163-172.
Rayner, K., Murphy, L.A., Henderson, J.M. and Pollatsek, A. (1989). Selective attentional dyslexia. Cognitive Neuropsychology, 6, 357-378.
Shapiro, S.I. and Palermo, D.S. (1968). An atlas of normative free association data. Psychonomic Monograph Supplements, 2, 219-250.
Slaghuis, W.L. and Lovegrove, W.J. (1984). Flicker masking of spatial frequency dependent visible persistence and specific reading disability. Perception, 13, 527-534.
Slaghuis, W.L. and Pinkus, S.Z. (1993). Visual backward masking in central and peripheral vision in late-adolescent dyslexics. Clinical Vision Sciences, 8, 187-199.
Snowling, M. (1995). Phonological processing and developmental dyslexia. Journal of Research in Reading, 18, 132-138.
Solman, R.T. and May, J.G. (1990). Spatial localization discrepancies: A visual deficiency in poor readers. American Journal of Psychology, 103, 243-263.
Stanovich, K.E. (1988). Explaining the differences between the dyslexic and the garden-variety poor reader: The phonological-core variable-difference model. Journal of Learning Disabilities, 21, 590-612.
Stein, J. (1993). Visuospatial perception in disabled readers. In: D.M. Willows, R.S. Kruk and E. Corcos (Eds.), Visual Processes in Reading and Reading Disabilities. Hillsdale, NJ: Erlbaum.
Stein, J. (1996). Visual system and reading. In: C.H. Chase, G.D. Rosen and G.F. Sherman (Eds.), Developmental Dyslexia: Neural, Cognitive and Genetic Mechanisms. Baltimore, MD: York Press.
Stein, J. and Walsh, V. (1997). To see but not to read: The magnocellular theory of dyslexia. Trends in Neurosciences, 20, 147-152.
Thomson, M.E. (1990). Developmental Dyslexia. London: Whurr.
Underwood, G., Clews, S. and Everatt, J. (1990). How do readers know where to look next? Local information distributions influence eye fixations. Quarterly Journal of Experimental Psychology, 42A, 39-65.
Underwood, G., Hyönä, J. and Niemi, P. (1987). Scanning patterns on individual words during the comprehension of sentences. In: J.K. O'Regan and A. Levy-Schoen (Eds.), Eye Movements: From Physiology to Cognition. Amsterdam: North-Holland.
Underwood, N.R. and Zola, D. (1986). The span of letter recognition of good and poor readers. Reading Research Quarterly, 21, 6-19.
Vellutino, F. (1979). Dyslexia: Research and Theory. Cambridge, MA: MIT Press.
Williams, M.C., Brannan, J.R. and Lartigue, E.K. (1987). Visual search in good and poor readers. Clinical Vision Sciences, 1, 367-371.
Williams, M.C. and LeCluyse, K. (1990). Perceptual consequences of a temporal processing deficit in reading disabled children. Journal of the American Optometric Association, 61, 111-121.
Willows, D.M., Kruk, R.S. and Corcos, E. (Eds.) (1993). Visual Processes in Reading and Reading Disabilities. Hillsdale, NJ: Erlbaum.
CHAPTER 11
Eye Movement Control in Reading: An Overview and Model
Keith Rayner, Erik D. Reichle and Alexander Pollatsek
University of Massachusetts
Abstract
In this chapter, we review experiments dealing with eye movement control in reading, evaluating the factors that control the decisions about (a) where the eyes move and (b) when the eyes move. We briefly review models attempting to account for aspects of these data and then outline a computational model that we have implemented. This model provides a good fit to the eye movement data at the level of predicting which words are fixated and the individual fixation times on words.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
K. Rayner, E.D. Reichle & A. Pollatsek
Introduction
There has recently been considerable debate about the characteristics of a model of eye movement control in reading. Two general categories of models have been proposed: (a) those that assign lexical processing or other ongoing comprehension processes a major role in influencing eye movements, versus (b) those that maintain that eye movements are mainly controlled by oculomotor factors and are only indirectly related to ongoing language processing. The first category, which we will refer to as processing models, includes a model proposed by Morrison (1984), with various modifications (Henderson and Ferreira, 1990; Kennison and Clifton, 1995; Pollatsek and Rayner, 1990; Sereno, 1992), as well as one proposed by Just and Carpenter (1980). The second category, which we will refer to as oculomotor models, includes the strategy-tactics model (O'Regan, 1990, 1992), as well as proposals of Kowler and Anton (1987) and McConkie et al. (1989). In this chapter, we will first briefly review what we take to be the primary empirical facts about eye movement control in reading. We will then discuss the models proposed by Morrison and by O'Regan. We will argue that a major limitation of both models is that they are qualitative, verbal descriptions that lack sufficient power to be precisely tested. We will then briefly describe some recent attempts to provide more quantitative models of eye movement control. The remainder of the chapter will then focus on a quantitative model of eye movement control (Reichle et al., 1998) that we recently developed.
Empirical data
Understanding how eye movements are controlled in reading is important in devising a model of skilled reading (see Rayner and Pollatsek, 1989). Thus, it is not surprising that there has been considerable interest in this topic. Twenty years ago, many studies focused on the extent to which eye movements were controlled on a moment-to-moment basis (Bouma and deVoogd, 1974; Hochberg, 1975; O'Regan, 1979; Pollatsek and Rayner, 1982; Rayner and McConkie, 1976; Rayner and Pollatsek, 1981). The general consensus that emerged was that where readers look next (fixation location) and when they move to a new location (fixation duration) are relatively independent processes, but they are both on-line decisions (Rayner and Pollatsek, 1987). In this chapter, our primary focus is on the decision regarding when to move the eyes. However, we will first review research dealing with the decision about where to move the eyes. One important feature that distinguishes the processing models from the oculomotor models is the stance they take on the relationship between these two decisions. Processing models view the where and when decisions as
Eye movement control
relatively independent. In particular, they assume that the decision of when to move the eyes is primarily affected by linguistic variables and that fixation durations reflect on-line cognitive processing of language, whereas the decision of where to fixate (especially where within a word to fixate) is largely determined by low-level visual computations. In contrast, oculomotor models posit that lower-level oculomotor or visuomotor factors are the primary determinant of both the when and where decisions. For example, where the eyes are fixated in a particular word is viewed as an important factor affecting both how long the eyes remain fixated and where the eyes will go next. The oculomotor models do not necessarily posit that the where and when decisions are made at the same time; usually, the when decision is thought to depend on the outcome of the where decision, and it is in this sense that they are highly dependent. We will discuss these two classes of models in greater detail after presenting the research findings concerning the where and when decisions.
Where to fixate next
As the above characterization of the models indicates, there seems to be relatively widespread agreement that low-level visual information obtained on the prior fixation (much of it in parafoveal vision) is an important factor determining where to fixate next during reading. Specifically, most theories view word boundaries, defined by the spaces surrounding the fixated word and the next word in the text, as important visual information guiding eye movements. One piece of evidence for this is that when this low-level visual information is not available (e.g., when spaces between words are removed), readers move their eyes a shorter distance than when such information is available (McConkie and Rayner, 1975; Morris, Rayner and Pollatsek, 1990; Pollatsek and Rayner, 1982; Rayner and Bertera, 1979; Rayner, Fischer and Pollatsek, 1998; Rayner and Pollatsek, 1981).
Epelboim, Booth and Steinman (1994), however, recently challenged the assertion that we just made — that space information is important in guiding the eyes — and argued that the primary reason that unspaced text interferes with reading is that removal of spaces interferes with word identification. Their argument is largely based on the assumption that if readers normally relied on space information to guide the eyes during reading, they should be virtually helpless when spaces are removed, and the pattern of saccades should be altered drastically. Instead, they claim that the pattern of saccades is only minimally altered and that reading unspaced text is surprisingly easy. (However, removal of spaces does cut reading speed in half for most readers.) We agree with part of their claim — that word identification is interfered with when spaces are absent (Pollatsek and Rayner, 1982; Rayner and Pollatsek, 1996) and that one has to be careful about inferring that removing spaces is only interfering with eye guidance. However, recent experiments from our lab (Rayner et al., 1998) and data we review below make it clear that where
K. Rayner, E.D. Reichle & A. Pollatsek
the eyes move is guided by space information. For example, the length of the parafoveal word has been shown to strongly influence where the eyes initially fixate on a word (the initial landing position) and hence the length of the saccade into that word (Blanchard, Pollatsek and Rayner, 1989; O'Regan, 1979, 1980, 1981; Rayner, 1979).
More generally, oculomotor theorists have viewed the initial landing position on a word as perhaps the most important variable mediating reading. In their view, (a) a major oculomotor goal of the reader is to land on the middle of each word, and (b) the resultant oculomotor behavior on that word is chiefly determined by the success of the initial eye movement in achieving the goal of landing in the middle. The data on initial landing positions are reasonably consistent with the first part of the claim. Initial landing positions are quite systematic: readers tend to fixate about halfway between the beginning and middle of words (Dunn-Rankin, 1978; McConkie et al., 1988; O'Regan, 1981; Radach and Kempe, 1993; Rayner, 1979; Rayner and Fischer, 1996; Vitu et al., 1995; Vitu, O'Regan and Mittau, 1990). Rayner (1979) called this prototypical location the preferred viewing location. It should be pointed out, however, that the preferred viewing location is a mean: there is considerable variability in the initial landing position, and the histograms of initial landing positions tend to look like truncated Gaussian distributions; moreover, readers fixate on the spaces between words about 10% of the time. O'Regan and Levy-Schoen (1987) subsequently distinguished between the preferred viewing location and what is now called the optimal viewing position, which they posited was the center of the word. Presumably, the optimal viewing position is optimal because of visual acuity considerations (i.e., letters are harder to identify the further they are from fixation).
Hence, fixating in the middle of a word may be "best" because it minimizes the maximum distance a letter in the word would be from fixation. McConkie et al. (1988) presented a detailed analysis of a large corpus of data (see also Radach and Kempe, 1993; Rayner, Sereno and Raney, 1996) that indicates that the discrepancy between the preferred and optimal viewing positions may be largely explained by the distance between the center of a target word and the launch site (i.e., the location of the prior fixation). That is, the further the launch site is from the center of a word, the further to the left the mean landing position is. In addition, the further the launch site is from the center of the target word, the more variability there is in the landing position. This pattern is consistent with the hypothesis that the intended target is the center of the word, as much of the motor programming literature indicates that actual motor movements tend to undershoot the target, with both the amount of undershoot and random variability increasing, the larger the movement. The data reviewed so far are consistent with the hypothesis that the center of a word is the target for the initial fixation, but they don't compel the conclusion that the middle of a word is, in fact, the optimal viewing position. Extensive research
efforts have examined the consequences of making fixations at locations other than this optimal viewing position (McConkie et al., 1989; Nazir, 1993; O'Regan et al., 1984; Vitu, 1991; Vitu et al., 1990) and for words in isolation, two general effects have been found, which have been referred to (see Rayner et al., 1996) as refixation and processing-cost effects. The refixation effect is that the further the eyes are from the optimal viewing position, the more likely it is that a refixation will be made on the word. The processing-cost effect is that there is a cost in fixation time on the word associated with initial fixation locations other than the optimal viewing position. Presumably, both effects occur because the less optimal the initial viewing position, the less information from a word can be extracted from the initial fixation, and hence (a) the greater the need to fixate it again, and (b) the more time needed to process it. (We will defer discussion of the processing cost effect until the next section, because it is largely relevant to the when rather than the where decision.) The reason that these effects have been chiefly studied in isolation is that the initial landing position can be experimentally manipulated (by calculating where a subject's fixation is and then presenting the position of a target word contingent on that fixation point). These experiments thus clearly establish that differences in initial landing position are causing these effects in these circumstances. What is less clear, however, is the importance of these findings for reading text. Both effects become quite attenuated when words in text have been examined, although the refixation effect still appears to be significant (Vitu et al., 1990). We will defer for the moment discussing why these effects attenuate in silent reading as the refixation effect still is significant in reading and indicates a second way in which low-level visual information guides the eyes. 
That is, from the evidence discussed earlier, it appears that most interword saccades are calculated to be directed to the middle of a word. The refixation effect suggests that refixations on a word may also be targeted to some place on a word without direction from cognitive processes. O'Regan (1992) posits that the target is the end of the word farthest from the current fixation, and there is some evidence that many refixations tend to be near the ends of words (Hyona and Pollatsek, 1998; Pynte, 1996; Rayner et al., 1996). However, many refixations also tend to be directed to the middle of a word as well. The refixation data raise a problem for a strong form of the oculomotor view. That is, what is a sensible strategy for refixating? If one has obtained information from the initial part of a word, then it makes sense to refixate near the end. However, if one has not obtained any useful information on the first fixation, then it makes sense to fixate in the middle. But if either strategy can be adopted, how does the reader decide where to refixate? Presumably, this would be on the basis of some cognitive operation. To this point, the data seem to suggest that where a reader's eyes land is completely guided by calculations of where words are and how long they are (presumably indicated by the locations of spaces) and the content of words appears to be irrelevant. In fact, Vitu et al. (1995) essentially attempted to demonstrate this
by examining eye movements when readers scanned nonsense "text" (in which every letter in a coherent text was replaced with a "z" but where the spaces between words were preserved). They found that global aspects of the eye movement record (such as frequency distributions of fixation durations and saccade lengths) were quite similar to those when the same readers read text. They argued, on the basis of these data, that the similarity of the patterns suggested that predetermined oculomotor strategies were an important factor in eye movement control in reading. The problem with these analyses, however, is that they are too gross, and do not attempt to examine whether linguistic variables play a part. In a follow-up experiment, Rayner and Fischer (1996) replicated their findings, but showed that there were important differences between the two situations (reading text and scanning z-strings). There are several consistent findings about where the eyes land in real text that would not be observed in z-string text. One is that words that are low frequency in the language tend to be refixated more than high frequency words, even when word length is controlled (Rayner and Fischer, 1996; Rayner et al., 1996). Moreover, this finding holds even when initial landing position is controlled (Rayner et al., 1998). Thus, it appears that the decision of whether to refixate a word or to move on is clearly controlled by cognitive variables. Another is that the frequency with which a word is skipped is dependent on both its frequency in the language and on how predictable it is from the prior text (Balota, Pollatsek and Rayner, 1985; Ehrlich and Rayner, 1981; Rayner and Well, 1996). Thus, it appears that the decision of whether to fixate a word is also dependent on cognitive variables in addition to low level variables. To summarize, the data on where the eyes land seem to argue for the following factors. 
First, cognitive variables appear to be influencing gross decisions, such as which word to fixate (e.g., "Do I refixate this word?", "Do I skip this word?"), but the details of where to fixate on words appear to largely be left to lower level processors. The initial fixation location appears to be primarily determined by a low-level oculomotor strategy — fixating the center of a word (though there is bias and random error in carrying out the motor program). In addition, there is some evidence that the targeting of refixations may also be made on a similar basis. However, we need to stress that all the evidence indicates that there is considerable inaccuracy in programming saccadic eye movements. As a result, it is hard to pinpoint the causes of any particular eye movement. For example, there are undoubtedly words skipped because the eyes overshot their target; this would most commonly occur for short words (Blanchard et al., 1989; Rayner, 1979; Vitu et al., 1995). Likewise, there are undoubtedly words refixated because the eyes undershot their target; this would most commonly occur for long words (Hyona and Pollatsek, 1998; Rayner and Morris, 1992). (For a review and an alternative account of word skipping, see Chapter 6). A final issue for which the data are less clear is whether cognitive variables influence where on a word the eyes land. One question is whether semantic
variables influence the initial landing position on a word. Several studies have attempted to get at this issue by using long words (10 or more letters) and varying whether the beginning or the end of a word is "informative" (i.e., the word part is low frequency in the language) or "redundant" (i.e., the word part is high frequency in the language). Some of these studies have suggested that the eyes move farther into a word when the informative portion is located at the end rather than the beginning of the word (Everatt and Underwood, 1992; Hyona, Niemi and Underwood, 1989; Underwood, Bloomfield and Clews, 1988; Underwood, Clews and Everatt, 1990). However, neither Rayner and Morris (1992) nor Hyona (1995) replicated the effect. Moreover, such effects, even if replicable, do not necessarily reflect semantic preprocessing: they could be due to either orthographic or lexical factors. Indeed, there is a suggestion that some type of orthographic processing of the initial letters of words may influence where readers initially fixate (Beauvillain, Dore and Baudouin, 1996; Hyona, 1995).
A second question is whether refixation locations are influenced by cognitive variables. Hyona and Pollatsek (1998) demonstrated that the length of the initial morpheme in long compound words influenced the location of the second fixation on a word even when the total length of the word was controlled. They also demonstrated that the frequency of the initial morpheme influenced the pattern of refixations on the word. Thus, it appears that even decisions about where to fixate on a word are influenced by cognitive variables. Whether these influences are limited to very long words in languages that have productive systems for compounding (e.g., Finnish) is still not clear, however.
When to move the eyes
As indicated earlier, the question of what determines when we move our eyes is central to the distinction between models of eye movement control in reading.
Because part of the motivation for the oculomotor models is that cognitive operations are relatively slow and thus unlikely to play a major role in deciding when the eyes move, we will first review some data arguing that this is not necessarily so. One paradigm that has been used to address the issue of how long it takes to encode the visual information involves presenting a visual mask at different intervals during reading (Ishida and Ikeda, 1989; Rayner et al., 1981; Slowiaczek and Rayner, 1987). These studies demonstrated that if text is exposed for longer than 50-70 ms on each fixation before being masked, reading proceeds quite normally. If the mask occurs earlier, however, reading is disrupted. This finding of course does not imply that words are identified within the first 50-70 ms of a fixation; it merely means that sufficient visual information can be extracted within the first 50-70 ms of a fixation so that cognitive operations can proceed normally even when all visual information is removed after that. However, when the visual information during a fixation is changed after the first 50-70 ms, there is disruption (Blanchard et al., 1984).
The above evidence indicates that the completion of early stages of visual processing is quite rapid. There is another reason to expect that full word identification may be completed quite rapidly after a word is initially fixated: word identification often begins before the word is fixated. We have termed this phenomenon preview benefit, and it occurs both in situations where words are presented in isolation (Rayner, McConkie and Ehrlich, 1978; Rayner, McConkie and Zola, 1980) and in reading (Balota et al., 1985; Rayner et al., 1982). In isolated situations, this has usually been assessed by examining the time to name a word, and the findings are that word naming time can be speeded by up to 20-60 ms by seeing a preview of a word before it is fixated. Moreover, preview of a word that is either orthographically similar or phonologically identical to the target word also produces preview benefit (Pollatsek et al., 1992; Rayner et al., 1978); this indicates that preview benefit is not merely a result of full processing of the parafoveal word prior to fixating it and is instead largely some sort of integration of the processing of the word in the parafovea with processing of the word when it is later fixated. (We will come back to preview benefit in reading after we discuss the data on when decisions.) Above, we argued that it is not implausible that cognitive operations are fast enough to guide the decision of when to move the eyes. Perhaps more importantly, a considerable body of data has accumulated over the past twenty years which demonstrates that various lexical, syntactic, and discourse factors influence fixation times on words (for recent reviews, see Rayner, 1995; Rayner and Sereno, 1994). There is evidence that the fixation time on a word is affected on-line by the following psycholinguistic variables: word frequency, lexical ambiguity, semantic relationship, contextual constraint, syntactic complexity, anaphora, and coreference. 
The most common measure involved in establishing the above relationship is gaze duration, which is the sum of the fixation durations on a word before a saccade is made off the word. This is clearly a composite measure, which is a function both of the durations of individual fixations and the number of fixations on a word. However, a second measure, first fixation duration, which is the duration of the initial fixation on a word, is also influenced by virtually all of the above variables. (Both measures are conditional on the word being fixated.) Thus, it appears that linguistic variables are involved in the decision about when to move the eyes. However, others (see O'Regan, 1990, 1992) have argued that lower level oculomotor factors are primarily responsible for controlling eye movements and hence fixation durations. As we have indicated earlier, one of the pieces of evidence for this assertion is that the probability of making a refixation is influenced by the initial landing position. As the number of refixations influences the mean gaze duration on a word, this suggests that oculomotor factors also influence fixation times on a word. More directly, Vitu et al. (1990) found that there was about a 20 ms cost in the gaze duration on a word for each letter that the initial
landing position was from the optimal viewing point. However, this estimate came from examining words in isolation, and when words were examined in silent reading, there was no significant difference in gaze duration as a function of initial landing position (although there did appear to be a small residual cost for initial landing positions distant from the optimal viewing position).
To summarize, it has been shown that cognitive variables indexing psycholinguistic processing of the text have definite effects both on the time spent processing a word and the duration of individual fixations. This indicates that at least some cognitive operations are fast enough to influence the decision of when to move the eyes. Moreover, a parafoveal preview of a target word reduces fixation times (both gaze durations and first fixation durations) on the word; thus one factor helping to speed these cognitive operations relative to the beginning of the initial fixation on a word is that processing for many words starts before they are fixated. There are some data, however, that indicate that lower level factors, such as the initial landing position, influence the duration of fixations. We will discuss many of these issues further in the next section.
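The two fixation-time measures used throughout this section can be made concrete with a short sketch. The code below is only an illustration under stated assumptions: the `Fixation` record, the word-index encoding, and the function name `first_pass_measures` are hypothetical and do not come from any of the studies cited. It treats the first encounter with a word as its first pass, and it returns nothing for a word that was never fixated, since both measures are conditional on the word being fixated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fixation:
    word: int      # index of the fixated word (hypothetical encoding)
    duration: int  # fixation duration in ms


def first_pass_measures(
    fixations: List[Fixation], target: int
) -> Optional[Tuple[int, int]]:
    """Return (first fixation duration, gaze duration) for `target`.

    Gaze duration sums consecutive fixations on the word, starting with
    the first one, until a saccade leaves the word. Returns None when
    the word is never fixated (i.e., it was skipped).
    """
    for i, fix in enumerate(fixations):
        if fix.word == target:
            first = fix.duration
            gaze = 0
            for later in fixations[i:]:
                if later.word != target:
                    break  # a saccade has been made off the word
                gaze += later.duration
            return first, gaze
    return None


# Made-up fixation record: word 1 is fixated twice, left, then revisited.
record = [
    Fixation(0, 220),
    Fixation(1, 180),
    Fixation(1, 150),
    Fixation(2, 240),
    Fixation(1, 200),  # later return to word 1; not part of gaze duration
]
measures_word1 = first_pass_measures(record, 1)   # (180, 330)
measures_word3 = first_pass_measures(record, 3)   # None: never fixated
```

The example shows why gaze duration is a composite measure: the value 330 ms reflects both the durations of the individual fixations and the fact that the word was refixated once.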
Models of eye movement control
A process model: Morrison
In Morrison's (1984) model, at the beginning of each fixation, eye location and covert visual attention are oriented to the same location: the foveal word (word n). After foveal processing has reached some criterial level (such as some stage of lexical access), attention shifts to word n + 1 during the fixation. This shift of attention allows processing of word n + 1 to begin and also signals the eye movement system to prepare a motor program for an eye movement to word n + 1. Once the motor program is completed, it is executed and the eyes then make a saccade to that word. Because there is a lag between the attention shift and saccade execution due to programming latency, information continues to accumulate from word n + 1 before it is directly fixated. If word n + 1 is identified quickly, attention shifts again to word n + 2 before the eye movement is fully programmed. In this case, the eyes saccade to word n + 2, skipping word n + 1. Because a later program has cancelled an earlier one, the duration of the fixation prior to a skip is inflated compared to when the next word is not skipped (see Hogaboam, 1983; Pollatsek, Rayner and Balota, 1986). If the motor program to word n + 1 is more advanced, however, there will be either (1) a short fixation on word n + 1 followed by a longer fixation of word n + 2 or (2) a fixation located at an intermediate position between words n + 1 and n + 2. The model can thus explain two rather puzzling aspects of eye movement behavior in reading: (1) the fact that there are fixations that are much
shorter than the 175-200 ms saccade latency in simple oculomotor tasks (Rayner et al., 1983) and (2) unusual landing positions (e.g., the space between words).
One problem with Morrison's model is that there is no explanation for why words are sometimes refixated. That is, if lexical access is the trigger for attention shifts and hence eye movements, words should never be refixated. Some recent modifications of the model (e.g., Henderson and Ferreira, 1990; Sereno, 1992) incorporate a deadline for programming a saccade; if lexical processing has not reached a criterion level by the deadline, attention does not shift from the current word and it may be refixated. (We will indicate another possible explanation for refixations below.) Two other problems are that the model (a) cannot explain why there are regressions, and (b) does not explain how "higher-order" psycholinguistic processes, such as anaphora, can influence eye movements. It also doesn't attempt to explain where people fixate on words.
An oculomotor model: O'Regan
According to O'Regan's (1990, 1992; O'Regan and Levy-Schoen, 1987) strategy-tactics model, the initial landing position in a word chiefly determines how long the reader will remain fixated and where the next fixation is made. O'Regan proposed that readers adopt a global strategy (e.g., careful or risky reading) that coarsely influences fixation time and saccade length. He also proposed that readers implement local, within-word tactics that are based on lower-level nonlexical information available early in a fixation. It is the operation and control of these within-word tactics that are most relevant to the issue of when to move the eyes. If the initial landing position is optimal (near the word's middle), there will be a single fixation. However, if the initial landing position is in a non-optimal position, a refixation will generally occur and the fixation time for this refixation is short and unaffected by any linguistic variable.
Moreover, the probability that a word will be refixated does not depend on its lexical status, but on lower level visual factors such as the landing position in the word. Linguistic factors are thought to be slow and thus only can influence (a) the duration of single long fixations (presumably 300 ms or longer, see O'Regan, 1992), or (b) the second of two fixations in a refixated word. Thus, fixation times according to this scheme are mainly determined by oculomotor constraints. As our discussion of the eye movement data indicates, there are some problems with the model. Most specifically, the claim about linguistic effects being limited to long single fixations or second fixations on a word is wrong: (a) the first (of two or more) fixations on low frequency words are longer than on high frequency words (Rayner et al., 1989, 1996); and (b) frequency effects on fixation durations do not only show up in the upper tails of the distributions. However, the model does point to certain ways in which lower-level oculomotor variables influence eye movements. As
we indicated earlier, however, the fact that most of these effects are substantially weaker in reading than when people examine individual words suggests that they are less important in reading than the model would suggest. We think there are several possible reasons for why these oculomotor variables are less important in reading than in more controlled studies. First, we think that attentional strategies may be very different in the two situations. In the more controlled studies, people are asked to make (and maintain) a fixation to a particular location prior to presentation of the target word. This could produce a narrowing of the attentional focus around fixation that is quite atypical of reading. Second, there is no parafoveal preview information in such studies. This would slow down lexical processing and make oculomotor factors more likely to predominate. More generally, if the precise location of eye fixations (e.g., the initial landing position on a word) were as important for reading as posited by these models, it would seem unlikely that they would be as variable as they are.
Which model is right?
Although we are clearly in greater sympathy with the process model approach, it should be obvious that neither type of model has a monopoly on truth. First, neither type of model is adequate to explain all of the data. Morrison's model, for example, simply does not attempt to explain where readers land on words. On the other hand, as we have argued above, O'Regan's model gives an extremely unsatisfactory account of how linguistic variables affect reading. At present, neither model gives an entirely satisfactory account of the details of fixations on words. For example, Vitu and O'Regan (1995) presented data which they argued are inconsistent with the modification of Morrison's model proposed by Henderson and Ferreira (1990), which claims that refixations on words occur because a deadline is reached and lexical access has not occurred yet.
Such a model predicts that the first of multiple fixations should be longer than single fixations because the latter should be the result of eye movements programmed prior to the deadline. Instead, Vitu and O'Regan found that the duration of a single fixation is longer than the first of two fixations (see also Rayner et al., 1996). On the other hand, there are word frequency effects that are independent of where the reader initially fixated on the word (Rayner et al., 1996, 1998), which is inconsistent with predictions of the oculomotor model. In addition, both models ignore syntactic effects (e.g., "garden path" effects) or discourse-level effects (e.g., resolution of anaphora) on eye movements in reading. More critically, we would like to argue that neither type of model is sufficiently precise for comprehensive testing. In the remainder of this chapter, we will focus on recent endeavors in our laboratory to provide a more formal model of eye movement control. Prior to doing so, we will briefly mention other such attempts.
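The prediction attributed above to the deadline account can be illustrated with a deliberately simplified sketch. Everything numerical here is an assumption made for illustration only: the 150 ms deadline, the 100 ms programming time, and the function name `first_fixation` are hypothetical, and the rule is a bare-bones reading of the deadline idea as characterized in this chapter, not an implementation of Henderson and Ferreira's (1990) proposal.

```python
from typing import List, Tuple


def first_fixation(
    access_time_ms: float,
    deadline_ms: float = 150.0,   # hypothetical deadline
    program_ms: float = 100.0,    # hypothetical saccade programming time
) -> Tuple[str, float]:
    """Classify the first fixation on a word under a simple deadline rule.

    If lexical processing reaches criterion before the deadline, a saccade
    to the next word is programmed at that moment (single fixation).
    Otherwise a refixation is programmed when the deadline expires.
    """
    if access_time_ms <= deadline_ms:
        return "single", access_time_ms + program_ms
    return "first_of_multiple", deadline_ms + program_ms


# Made-up lexical processing times (ms) for five words.
access_times: List[float] = [80, 110, 140, 170, 200]
results = [first_fixation(t) for t in access_times]

singles = [d for kind, d in results if kind == "single"]
firsts = [d for kind, d in results if kind == "first_of_multiple"]

mean_single = sum(singles) / len(singles)   # 210.0
mean_first = sum(firsts) / len(firsts)      # 250.0
```

Under this rule no single fixation can outlast the first of multiple fixations, because a single fixation is by definition one whose saccade was triggered before the deadline. The finding reported by Vitu and O'Regan (1995), that single fixations are the longer ones, is therefore the opposite of what the rule produces.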
Attempts to produce more quantitative models
In addition to the qualitative models described above, there have been some recent attempts to produce more quantitative models of the characteristics of eye movements during reading. For example, models by Legge, Klitz and Tan (1997) and Suppes (1990) have focused on low level aspects of reading and have not concerned themselves with the duration of fixations. The two models, however, are quite different. Legge et al.'s model, called Mr. Chips, attempts to explain the details of where readers fixate cognitively, assuming an intelligent guiding mechanism that is controlled by lexical access and the details of which letters can be processed due to acuity limitations and other considerations. In contrast, Suppes assumes rather "dumb" mechanisms and focuses on the stochastic properties of the variability of saccadic eye movements in reading. One other model that deserves mention is that of Thibadeau, Just and Carpenter (1982), which is a formal production system that provides a more quantitative account of the Just and Carpenter (1980) model. Like its predecessor, it focuses on a composite gaze duration measure and ignores important details like the probability of refixations and of word skipping.
These models are all reasonable attempts to explain part of the eye movement record. However, our model goes a significant step beyond them by trying to account simultaneously for the details of individual fixation durations and the location of individual fixations (at the level of which word is fixated). To the best of our knowledge, there are no extant models which successfully account for eye movements in reading at both of these levels. However, we should make clear that our model has two clear deficiencies. First, it does not attempt to explain where on a word a reader fixates.
This is because, at present, it is completely a process model; however, we think that adding oculomotor stages to handle the where question in more detail is not a conceptually difficult next step, although it would complicate the formal modeling. Second, we do not attempt to explain how syntactic and discourse level processes influence eye movements. We have two major reasons for this. First, we think that this would be too difficult given that there are no theories of syntactic or discourse-level processing that are even close to providing a precise enough theory to incorporate into a model of eye movements. Second, we think it is not an unreasonable hypothesis that these variables may intervene to control eye movements primarily when the reader in fact runs into trouble (such as in "garden path" sentences) and needs to suspend normal processing until the problem is repaired (usually involving regressions and/or very long fixations). Thus, we think that our model, which assumes that word identification is the "engine" that drives the eyes forward may explain a good deal of the eye movement record in reading.
The E-Z Reader Model
In the remainder of this chapter, we provide an overview of the E-Z Reader model. More detail on the underlying mathematical equations and formal modelling is provided in Reichle et al. (1998). E-Z Reader¹ is similar to Morrison's (1984) model with two notable exceptions. First, E-Z Reader decouples the signal to shift attention from the signal to program a saccade². Second, E-Z Reader is better specified in that it has been implemented as a computer simulation program. The basic goal of E-Z Reader is to predict (a) the probability a word is fixated, (b) the number of times it is fixated, and (c) the duration of individual fixations on it.
There are five basic processes in E-Z Reader: (1) a familiarity check on a word; (2) completion of lexical access; (3) an early, labile stage of saccadic programming which can be canceled by subsequent saccadic programming; (4) a later, nonlabile, stage of saccadic programming; and (5) the actual saccadic eye movement. Figure 1 is a schematic diagram showing how these processes interact to control eye movements during reading. The first pair of processes correspond to different stages of lexical access, and thus can be conceptualized as products of a single cognitive module (Fodor, 1983) that is responsible for word identification. The first process, the familiarity check on a word, signals a point in time when lexical access is imminent, and thereby cues the system mediating eye movements to begin planning a saccade. The second process, the completion of lexical access, corresponds to a stage where a word's identity has been determined and this information can be passed on to other systems. As in Morrison's (1984) model, the completion of lexical access causes attention to shift to the next word.
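The interplay of the first four processes can be sketched as a small timing calculation. This is a minimal sketch under loudly stated assumptions, not the simulation reported in Reichle et al. (1998): the function name `next_saccade`, the stage durations (150 ms labile, 50 ms nonlabile), and the rule that a program still in its labile stage is replaced by a program to word n + 2 once the familiarity check on word n + 1 finishes are all illustrative choices, the last of these borrowed from the cancellation logic described earlier for Morrison's model.

```python
from typing import Tuple


def next_saccade(
    fam_n: float,             # ms to finish the familiarity check on word n
    lex_n: float,             # ms to finish lexical access on word n
    fam_next: float,          # ms the familiarity check on word n + 1 takes
                              # once attention has shifted to it
    labile: float = 150.0,    # hypothetical labile-stage duration
    nonlabile: float = 50.0,  # hypothetical nonlabile-stage duration
) -> Tuple[int, float]:
    """Return (words moved forward, saccade onset time in ms).

    Times are measured from the start of the fixation on word n.
    """
    # (1) The familiarity check starts the labile program toward n + 1.
    labile_end = fam_n + labile
    # (2) Completion of lexical access shifts attention to word n + 1.
    attention_shift = lex_n
    # Parafoveal familiarity check on n + 1 begins after the shift.
    fam_next_done = attention_shift + fam_next

    if fam_next_done < labile_end:
        # The first program is still labile, so it is cancelled and a
        # new program toward n + 2 starts: word n + 1 is skipped.
        return 2, fam_next_done + labile + nonlabile
    # Otherwise the first program runs through the nonlabile stage.
    return 1, labile_end + nonlabile


# Made-up timings: a slow versus a fast parafoveal familiarity check.
no_skip = next_saccade(fam_n=100.0, lex_n=150.0, fam_next=150.0)
skip = next_saccade(fam_n=100.0, lex_n=150.0, fam_next=60.0)
```

With these illustrative numbers the saccade leaves word n at 300 ms when word n + 1 is fixated and at 410 ms when it is skipped, which mirrors the inflated fixation before a skip noted in the discussion of Morrison's model. The point of the sketch is the decoupling: saccade programming is keyed to the familiarity check, while the attention shift is keyed to the completion of lexical access.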
However, as already mentioned, our model differs from Morrison's in that E-Z Reader decouples covert shifts of attention from saccadic programming; whereas the familiarity check initiates the programming of a saccade, the completion of lexical access causes attention to shift to the next word. Although the division of lexical access into two discrete stages is partly a modeling convenience, it is not without precedent. For instance, the distinction
1 Our previous modelling efforts (see Reichle et al., 1998) were directed towards finding the minimal set of assumptions that could account for eye movement control in reading. As a result, our final model, E-Z Reader 5, was preceded by several approximations. In this chapter, we will ignore the earlier versions of the model; "E-Z Reader" therefore refers to E-Z Reader 5.
2 We realize that the decision to decouple covert shifts of spatial attention from eye movements is somewhat controversial. However, most of the studies that have shown tight coupling between the two (specifically, that covert attention cannot be deployed to places other than where a saccade is programmed to) use paradigms in which exogenous signals are driving the eyes. Moreover, a recent study by Stelmach, Campsall and Herdman (1997) has found independence between attention shifting and saccades even in these paradigms.
Fig. 1. Schematic diagram of how the component processes of E-Z Reader influence eye movements and covert attention and how attentional processes in turn influence cognitive operations.
between (a) a rapid feeling of familiarity and (b) a slower retrieval of information from memory is consistent with previous empirical (Atkinson and Juola, 1973) and theoretical work (Hintzman, 1988; Gillund and Shiffrin, 1984). In addition, there have been several two-stage "verification" models of lexical access (e.g., Paap et al., 1982; Van Orden, 1987) that would be reasonably consistent with our model. In E-Z Reader, the time necessary to complete the familiarity check (in the absence of top-down influences) is assumed to be a linear function of the logarithm of word frequency (as tabulated by Francis and Kucera, 1982)3. The additional time necessary for the completion of lexical access is assumed to be a constant multiple of the familiarity check time. Both process durations are also assumed to be affected by top-down influences, captured in a predictability value derived from off-line data collected from subjects. The times required for both the familiarity check and the completion of lexical access are assumed to be reduced (multiplicatively)
3 In our modeling, to minimize the number of parameters, we did not distinguish between frequency and word length effects. Thus "frequency effects" in our model are really a combination of frequency and word length effects because the two are highly correlated in our sample of text as in printed English in general. Similarly, the assumption that the duration of the familiarity check stage is a constant fraction of the duration of lexical access (in the absence of top-down influences) is to minimize free parameters. For a complete explanation of the mathematical equations and details of the formal modeling, see Reichle et al. (1998).
by predictability4. Finally, two additional parameters control the familiarity check and completion of lexical access process durations so that the rate of processing becomes slower as eccentricity (i.e., the distance between the word being processed and fixation) increases. The eccentricity assumption was added to increase the psychological plausibility of the model because (a) retinal acuity rapidly decreases as distance from the fovea increases, and (b) words are identified more rapidly in the fovea than the parafovea (Rayner and Morrison, 1981). The three remaining processes (i.e., the labile and nonlabile stages of saccadic programming and the actual saccades) can be viewed as components of a single module that programs and executes eye movements. The division between labile and nonlabile stages of saccadic programming was motivated by Morrison's model, which in turn was motivated by Becker and Jürgens (1979). Their data indicated that the computations that are necessary to move the eyes to a target location can be canceled or modified if a new target location is presented early during programming, but cannot be interrupted if a new location is presented late in programming. As already mentioned, the programming of an interword saccade is initiated in E-Z Reader when the familiarity check on the previous word has been completed. And, consistent with Becker and Jürgens (1979), the early, labile stage of programming is canceled whenever the labile program for another saccade is initiated (as in Morrison's model). This process is illustrated schematically in Fig. 2. In the first example, the labile program to move the eyes to word n + 1 completes before the program to move the eyes to word n + 2 is initiated, so that both saccades are executed. (This is the mechanism that could produce very short fixation durations.)
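The cancellation rule just described can be illustrated with a minimal sketch. It is not the authors' simulation program: the function name `executed_saccades`, the list-of-pairs timeline, and both stage durations are assumptions made purely for illustration. A saccade program is treated as canceled when the next program starts while it is still in its labile stage.

```python
# Hypothetical stage durations in ms; these numbers are placeholders,
# not values taken from the model.
LABILE_MS = 150.0
NONLABILE_MS = 50.0


def executed_saccades(program_starts):
    """Return (execution_time_ms, target) for every saccade that is executed.

    program_starts is a list of (start_time_ms, target) pairs sorted by
    start time. A program is canceled if the following program starts
    before its labile stage has finished.
    """
    executed = []
    for i, (start, target) in enumerate(program_starts):
        labile_end = start + LABILE_MS
        has_next = i + 1 < len(program_starts)
        canceled = has_next and program_starts[i + 1][0] < labile_end
        if not canceled:
            # The nonlabile stage cannot be interrupted, so the saccade runs.
            executed.append((labile_end + NONLABILE_MS, target))
    return executed


# Second program starts after the first labile stage: both saccades run.
print(executed_saccades([(0.0, "n+1"), (200.0, "n+2")]))
# [(200.0, 'n+1'), (400.0, 'n+2')]

# Second program starts during the first labile stage: word n+1 is skipped.
print(executed_saccades([(0.0, "n+1"), (100.0, "n+2")]))
# [(300.0, 'n+2')]
```

Only the relative timing of the two program onsets decides whether the intermediate word is fixated or skipped.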
In contrast, in the second example, the labile program to word n + 1 is canceled when the program to word n + 2 is initiated; consequently, only the saccade to word n + 2 is executed, and word n + 1 is skipped. The latter sequence of events (i.e., the cancellation of one labile program by another) allows E-Z Reader to generate skips. In the case of intraword saccades (or refixations), the signals that initiate and terminate the labile stage of programming are somewhat different: The labile program is initiated at the moment that a word has been fixated and it will produce a refixation unless it is canceled by the completion of the word's familiarity check (i.e., the event that signals the eye movement system to plan an interword saccade). This assumption is consistent with the following "dumb" default strategy: Fixate each word from more than one viewing location unless the word's familiarity indicates that a refixation is unnecessary. (This assumption is not very different from that of O'Regan.) Finally, as Fig. 2 indicates, the nonlabile stage of saccadic
4 In the model, the impact of predictability on the familiarity check and lexical completion stages was somewhat different. This was handled with a single multiplicative parameter that attenuated the predictability effect for the familiarity check stage. The rationale for this was that top-down processes should affect early stages of lexical access less than later stages.
Fig. 2. Schematic diagram showing the relationship between labile and nonlabile stages of saccadic programming.
programming immediately follows the labile stage, and a saccade is executed once the nonlabile stage has completed. For simplicity, the durations of the labile and nonlabile stages of saccadic programming and of the actual saccadic movements were set equal to fixed values. E-Z Reader was applied to a corpus of data collected by Schilling, Rayner and Chumbley (1998). In their study, subjects read 48 sentences while their eye movements were monitored. Each sentence was 8-14 words in length. Eight values were tabulated for each word in the corpus5: (1) the natural frequency of occurrence (from Francis and Kucera, 1982); (2) the predictability (obtained from a rating study); (3) the mean gaze duration; (4) the mean first fixation duration; (5) the mean single fixation duration (i.e., the mean duration of fixations when there was exactly one fixation on a word); (6) the proportion of times a word was skipped; (7) the proportion of times a word was fixated once; and (8) the proportion of times a word was fixated twice. The mean fixation durations and proportion of times skipped, fixated once, and fixated twice were then averaged within five frequency classes of words to produce 30 means. E-Z Reader was fitted to these data by conducting
5 The first and last words in each sentence were excluded from the analyses because (a) the first word was initially fixated by a reading-irrelevant movement from a fixation cross, and (b) the fixation on the last word coincided with a button push. Sentences with interword regressions were also excluded since they generally reflect difficulty with higher-order processing (Ehrlich and Rayner, 1983; Frazier and Rayner, 1982) and are, therefore, beyond the scope of E-Z Reader. However, intraword regressions were included in the data.
Table 1
Observed and predicted values of gaze durations, and individual fixations, probability of skipping, making a single fixation, and making two fixations for five frequency classes of words

Freq.   Frequency      Mean      Gaze durations (ms)    First fixation durations (ms)  Single fixation durations (ms)
class   range          freq.     Obs.   Pred. (E-Z)     Obs.   Pred. (E-Z)             Obs.   Pred. (E-Z)
1       1-10           3         293    291             248    251                     265    274
2       11-100         45        272    271             234    253                     249    263
3       101-1,000      347       256    257             228    246                     243    252
4       1,001-10,000   4,889     234    226             223    223                     235    224
5       10,001+        40,700    214    211             208    210                     216    210

Freq.   Frequency      Mean      Probability of skipping    Probability of making a single fixation    Probability of making two fixations
class   range          freq.     Obs.   Pred. (E-Z)         Obs.   Pred. (E-Z)                         Obs.   Pred. (E-Z)
1       1-10           3         0.10   0.09                0.68   0.73                                0.20   0.17
2       11-100         45        0.13   0.16                0.70   0.76                                0.16   0.07
3       101-1,000      347       0.22   0.27                0.68   0.68                                0.10   0.04
4       1,001-10,000   4,889     0.55   0.49                0.44   0.50                                0.02   0.01
5       10,001+        40,700    0.67   0.68                0.32   0.32                                0.01   0.00
multiple grid-searches of the parameter space to determine the best values for the model parameters. The data and simulation results using the best-fitting parameter values are presented in Table 1. As can be seen in Table 1, there is a close correspondence between the observed and predicted means for the five frequency classes of words. The most serious discrepancies are: (a) the predicted first fixation and single fixation durations for the lower frequency words are a bit long; (b) there is some non-monotonicity in the predicted first fixation durations, with the predicted mean for Frequency Class 2 being larger than that for Frequency Class 1; and (c) the percentage of refixations for Frequency Classes 2 and 3 is underpredicted. The cause of the non-monotonicity is
Fig. 3. Observed and predicted frequency distributions of gaze durations. Each point represents the proportion of gaze durations in a frequency class that occurred within a given 50-ms interval (e.g., the points above the abscissa labeled "100" represent the proportion of gaze durations between 50 and 100 ms).
complex because first fixation durations are complex: They are a mixture of single fixation durations and the durations of first fixations that are followed by refixations. As can be seen in Table 1, the locus of the problem is not the predicted single fixation durations because they are monotonic and reasonably close to the observed values. Instead, the non-monotonicity stems from the fact that the model generates the right number of refixations for Frequency Class 1 but tends to underpredict the number of refixations for Frequency Classes 2 and 3. Additional simulations indicated that this anomaly can be rectified by softening the assumption of equal process durations for the labile stages of interword and intraword saccade programming.
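The discrepancies discussed above can be checked directly against the values in Table 1. The short sketch below does so; the helper names are illustrative, and the lists simply transcribe, for Frequency Classes 1 to 5, the predicted first fixation and single fixation durations and the observed and predicted probabilities of making two fixations.

```python
# Values transcribed from Table 1, ordered by frequency class 1..5.
first_fix_pred = [251, 253, 246, 223, 210]      # ms
single_fix_pred = [274, 263, 252, 224, 210]     # ms
two_fix_obs = [0.20, 0.16, 0.10, 0.02, 0.01]    # probability of two fixations
two_fix_pred = [0.17, 0.07, 0.04, 0.01, 0.00]


def is_monotonic_decreasing(values):
    """True if each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


def underprediction(observed, predicted):
    """Observed minus predicted, rounded to two decimals."""
    return [round(o - p, 2) for o, p in zip(observed, predicted)]


print(is_monotonic_decreasing(first_fix_pred))    # False (class 2 > class 1)
print(is_monotonic_decreasing(single_fix_pred))   # True
print(underprediction(two_fix_obs, two_fix_pred))
# [0.03, 0.09, 0.06, 0.01, 0.01]
```

The largest shortfalls in predicted refixations fall in Frequency Classes 2 and 3, in line with the account given above.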
Additional simulations
Pattern of variability
The above results indicate that E-Z Reader predicts the mean behavior of readers quite well. However, one can often be misled about a model's utility and/or psychological validity by simply evaluating aggregate properties (Hintzman, 1991). Consequently, we decided to examine the patterns of variability predicted by the model. Figure 3 shows histograms of gaze durations that were observed by Schilling et al. (1998) and those predicted by E-Z Reader6. In both cases, each datum represents the gaze duration on a single word for a single subject. As can be seen in Fig. 3, the absolute ranges and shapes of the observed distributions are in reasonably close agreement with those predicted by E-Z Reader, with the main discrepancy being that the observed distributions are less variable than those predicted by the model.
Frequency effects
The 48 sentences were used by Schilling et al. (1998) to examine word frequency effects during reading. Half contained high-frequency target words (over 46 per million, mean = 141; Francis and Kucera, 1982) and half contained low-frequency
6 Similar histograms were constructed for the observed and predicted first fixation durations, which also show a close correspondence (see Reichle et al., 1998).
target words (less than 4 per million, mean = 2). One obvious question, therefore, is whether E-Z Reader predicts the frequency effects reported by Schilling et al. on their target words. For the gaze durations, the observed means for the high- and low-frequency target words were 248 ms and 298 ms, respectively, resulting in a 50 ms frequency effect. E-Z Reader predicted means for the high- and low-frequency target words of 260 ms and 298 ms, respectively, giving a 38 ms frequency effect. Thus, the model accurately predicts both the absolute values for the means of the high- and low-frequency words, and the frequency effect7. We were also interested in whether the model could predict frequency effects on spillover (i.e., increased durations of fixations immediately subsequent to fixating a target word). Unfortunately, the corpus did not allow a particularly good empirical test because the high- and low-frequency words were in different sentences. As a result, we substituted the mean values for Schilling et al.'s (1998) high- and low-frequency target words (141 and 2 per million, respectively) into each of the 48 designated target positions, so that predicted effects of frequency on fixation times on the target word and on spillover durations could be calculated uncontaminated by differences in sentence frame. The mean frequency effect on gaze duration on the target words was close to that reported in the previous section: 35 ms. The mean spillover frequency effect (predicted increase in the gaze duration on word n + 1) was 22 ms, which is a bit smaller than the values observed in prior studies (which range from 30 to 50 ms), but not unreasonable. E-Z Reader predicts a spillover effect because, as the frequency of word n decreases, the time required to complete lexical access on the word increases, thereby reducing any preview benefit on word n + 1 that might otherwise occur while word n is being fixated.
Less preview benefit on word n + 1 translates into increased processing times and hence longer fixation durations.
Preview benefit
We simulated the preview benefit from parafoveal processing on the 48 target words used by Schilling et al. (1998). In the simulated control condition (i.e., normal parafoveal preview), the target word was left unchanged in the parafovea (i.e., the rate of processing was moderated by the eccentricity parameters, as in previous simulations). In the simulated experimental condition (i.e., no parafoveal preview), lexical processing of the target words began when they were fixated. The predicted preview benefit (i.e., the difference between the two conditions) on the gaze duration on the target words was 40 ms, which corresponds well to observed values in prior studies that range from 40 to 60 ms.
7 We also examined frequency effects on first fixation and single fixation durations and found close correspondences between the E-Z Reader predictions and the results reported by Schilling et al. (1998).
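The reasoning behind the spillover and preview benefit predictions can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not the E-Z Reader implementation: every numeric value is a hypothetical placeholder rather than a fitted parameter, the natural logarithm is assumed, the two eccentricity parameters are collapsed into a single parafoveal slowdown factor, and the labile and nonlabile stages are lumped into one fixed saccade latency. Because the saccade program starts at the end of the familiarity check on word n while attention shifts only when lexical access is complete, the time available for previewing word n + 1 is the saccade latency minus the lexical completion time.

```python
import math

# All values are hypothetical placeholders chosen for illustration only.
INTERCEPT_MS = 250.0        # familiarity check time at a frequency of 1 per million
SLOPE_MS = 15.0             # ms saved per natural-log unit of frequency
LEXICAL_MULTIPLE = 0.5      # completion time as a multiple of the familiarity check
PRED_ATTENUATION = 0.5      # weaker predictability effect on the familiarity check
PARAFOVEAL_SLOWDOWN = 2.0   # assumed rate penalty when word n + 1 is processed from word n
SACCADE_LATENCY_MS = 200.0  # labile plus nonlabile programming, treated as one fixed value


def familiarity_check_ms(freq, predictability=0.0):
    """Linear in log frequency, reduced multiplicatively by predictability."""
    base = INTERCEPT_MS - SLOPE_MS * math.log(freq)
    return base * (1.0 - PRED_ATTENUATION * predictability)


def lexical_completion_ms(freq, predictability=0.0):
    """Additional time needed after the familiarity check."""
    base = LEXICAL_MULTIPLE * (INTERCEPT_MS - SLOPE_MS * math.log(freq))
    return base * (1.0 - predictability)


def preview_time_ms(freq_n):
    """Time attention rests on word n + 1 before the eyes leave word n."""
    return max(0.0, SACCADE_LATENCY_MS - lexical_completion_ms(freq_n))


def time_left_on_next_ms(freq_n, freq_next, preview=True):
    """Familiarity check time still required once word n + 1 is fixated."""
    needed = familiarity_check_ms(freq_next)
    if not preview:
        return needed
    done_in_parafovea = preview_time_ms(freq_n) / PARAFOVEAL_SLOWDOWN
    return max(0.0, needed - done_in_parafovea)


# Spillover: a low-frequency word n (2 per million) leaves less preview time
# than a high-frequency word n (141 per million).
spillover = time_left_on_next_ms(2, 141) - time_left_on_next_ms(141, 141)
# Preview benefit: no-preview condition minus normal-preview condition.
benefit = time_left_on_next_ms(141, 141, preview=False) - time_left_on_next_ms(141, 141)
print(spillover > 0, benefit > 0)
```

The magnitudes produced by this sketch depend entirely on the placeholder values and should not be compared with the simulated effects reported in the text; only the direction of the two effects is meaningful.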
Conclusions
We think that E-Z Reader is a step forward in understanding eye movements. In our view, it is likely to be the simplest model that can explain much of the lawfulness in the eye movement record. As with any model, it focuses on certain aspects of the data and not others. Thus, like Morrison, we did not attempt to explain where readers fixated within words. Although we believe that it would not be difficult to graft a module (based on McConkie et al.'s equations) that would simulate initial landing positions reasonably well, we think it would be quite difficult to evaluate the success of such an endeavor (i.e., evaluating whether the predictions would be sensitive to changes in the assumptions about cognitive processes, as opposed to low-level visual processes). In addition, as indicated above, the model is incomplete (as are all the models) in ignoring the demonstrated effects of higher-order processes on reading and interword regressions. Even within this more limited scope, our model is clearly far from perfect. Among other things, we suspect that our assumptions about the causes of refixations are too simple. For example, the morphemic effects obtained by Hyönä and Pollatsek (1998) could not be predicted by the model. In addition, our assumptions about how word frequency and predictability affect lexical processing are clearly just zero-order approximations to the truth. However, we believe it is an important tool that helps us to make sense of the eye movement record.
Acknowledgements
Preparation of this chapter was supported by Grant HD 26765 from the National Institutes of Health. The first author was supported by a Research Scientist Award from the National Institute of Mental Health (MH 01255) and the second author was supported by a Traineeship from the National Institute of Mental Health (MH 16745).
References
Atkinson, R.C. and Juola, J.F. (1973). Factors influencing speed and accuracy of word recognition. In: S. Kornblum (Ed.), Attention and Performance IV. New York: Academic Press, pp. 583-612.
Balota, D.A., Pollatsek, A. and Rayner, K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17, 364-390.
Beauvillain, C., Dore, K. and Baudouin, V. (1996). The 'center of gravity' of words: Evidence for an effect of the word-initial letters. Vision Research, 36, 589-604.
Becker, W. and Jürgens, R. (1979). Analysis of the saccadic system by means of double step stimuli. Vision Research, 19, 967-983.
Blanchard, H.E., McConkie, G.W., Zola, D. and Wolverton, G.S. (1984). The time course of visual information utilization during fixations in reading. Journal of Experimental Psychology: Human Perception and Performance, 10, 75-89.
Blanchard, H.E., Pollatsek, A. and Rayner, K. (1989). The acquisition of parafoveal word information in reading. Perception and Psychophysics, 46, 85-94.
Bouma, H. and deVoogd, A.M. (1974). On the control of eye saccades in reading. Vision Research, 14, 273-284.
Dunn-Rankin, P. (1978). The visual characteristics of words. Scientific American, 238, 122-130.
Ehrlich, K. and Rayner, K. (1983). Pronoun assignment and semantic integration during reading: Eye movements and immediacy of processing. Journal of Verbal Learning and Verbal Behavior, 22, 75-87.
Ehrlich, S.F. and Rayner, K. (1981). Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20, 641-655.
Epelboim, J., Booth, J.R. and Steinman, R.M. (1994). Reading unspaced text: Implications for theories of reading eye movements. Vision Research, 34, 1735-1766.
Everatt, J. and Underwood, G. (1992). Parafoveal guidance and priming effects during reading: A special case of the mind being ahead of the eyes. Consciousness and Cognition, 1, 186-197.
Fodor, J.A. (1983). Modularity of mind. Cambridge, MA: MIT Press.
Francis, W.N. and Kucera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston, MA: Houghton Mifflin.
Frazier, L. and Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178-210.
Gillund, G. and Shiffrin, R.M. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1-67.
Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 417-429.
Hintzman, D.L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528-551.
Hochberg, J. (1975). On the control of eye saccades in reading. Vision Research, 15, 620.
Hogoboam, T.W. (1983). Reading patterns in eye movements. In: K. Rayner (Ed.), Eye Movements in Reading: Perceptual and Language Processes. New York: Academic Press, pp. 309-332.
Hyönä, J. (1995). Do irregular letter combinations attract readers' attention? Evidence from fixation locations in words. Journal of Experimental Psychology: Human Perception and Performance, 21, 68-81.
Hyönä, J., Niemi, P. and Underwood, G. (1989). Reading long words embedded in sentences: Informativeness of word halves affects eye movements. Journal of Experimental Psychology: Human Perception and Performance, 15, 142-152.
Hyönä, J. and Pollatsek, A. (1998). The role of component morphemes on eye fixations when reading Finnish compound words. Journal of Experimental Psychology: Human Perception and Performance, in press.
Ishida, T. and Ikeda, M. (1989). Temporal properties of information extraction in reading studied by a text-mask replacement technique. Journal of the Optical Society of America A: Optics and Image Science, 6, 1624-1632.
Just, M.A. and Carpenter, P.A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.
Kennison, S.M. and Clifton, C. (1995). Determinants of parafoveal preview benefit in high and low working memory capacity readers: Implications for eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 68-81.
Kowler, E. and Anton, S. (1987). Reading twisted text: Implications for the role of saccades. Vision Research, 27, 45-60.
Legge, G.E., Klitz, T.S. and Tjan, B.S. (1997). Mr. Chips: An ideal-observer model of reading. Psychological Review, 104, 524-553.
McConkie, G.W., Kerr, P.W., Reddix, M.D. and Zola, D. (1988). Eye movement control during reading: I. The location of initial eye fixations in words. Vision Research, 28, 1107-1118.
McConkie, G.W., Kerr, P.W., Reddix, M.D., Zola, D. and Jacobs, A.M. (1989). Eye movement control during reading: II. Frequency of refixating a word. Perception and Psychophysics, 46, 245-253.
McConkie, G.W. and Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception and Psychophysics, 17, 578-586.
Morris, R.K., Rayner, K. and Pollatsek, A. (1990). Eye movement guidance in reading: The role of parafoveal letter and space information. Journal of Experimental Psychology: Human Perception and Performance, 16, 268-281.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682.
Nazir, T.A. (1993). On the relation between the optimal and the preferred viewing position in words during reading. In: G. d'Ydewalle and J. Van Rensbergen (Eds.), Perception and Cognition: Advances in Eye Movement Research. Amsterdam: North Holland, pp. 171-180.
O'Regan, J.K. (1979). Eye guidance in reading: Evidence for linguistic control hypothesis. Perception and Psychophysics, 25, 501-509.
O'Regan, J.K. (1980). The control of saccade size and fixation duration in reading: The limits of linguistic control. Perception and Psychophysics, 28, 112-117.
O'Regan, J.K. (1981). The convenient viewing position hypothesis. In: D.F. Fisher, R.A. Monty and J.W. Senders (Eds.), Eye Movements: Cognition and Visual Perception. Hillsdale, NJ: Erlbaum, pp. 289-298.
O'Regan, J.K. (1990). Eye movements and reading. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes. Amsterdam: Elsevier, pp. 395-453.
O'Regan, J.K. (1992). Optimal viewing position in words and the strategy-tactics theory of eye movements in reading. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer-Verlag, pp. 333-354.
O'Regan, J.K. and Levy-Schoen, A. (1987). Eye movement strategy and tactics in word recognition and reading. In: M. Coltheart (Ed.), Attention and Performance XII: The Psychology of Reading. London: Erlbaum, pp. 363-383.
O'Regan, J.K., Levy-Schoen, A., Pynte, J. and Brugaillere, B. (1984). Convenient fixation location within isolated words of different length and structure. Journal of Experimental Psychology: Human Perception and Performance, 10, 250-257.
Paap, K.R., Newsome, S.L., McDonald, J.E. and Schvaneveldt, R.W. (1982). An activation-verification model for letter and word recognition: The word superiority effect. Psychological Review, 89, 573-594.
Pollatsek, A., Lesch, M., Morris, R.K. and Rayner, K. (1992). Phonological codes are used in integrating information across saccades in word identification and reading. Journal of Experimental Psychology: Human Perception and Performance, 18, 148-162.
Pollatsek, A. and Rayner, K. (1982). Eye movement control in reading: The role of word boundaries. Journal of Experimental Psychology: Human Perception and Performance, 8, 817-833.
Pollatsek, A. and Rayner, K. (1990). Eye movements and lexical access in reading. In: D.A. Balota, G.B. Flores d'Arcais and K. Rayner (Eds.), Comprehension Processes in Reading. Hillsdale: Erlbaum, pp. 143-164.
Pollatsek, A., Rayner, K. and Balota, D.A. (1986). Inferences about eye movement control from the perceptual span in reading. Perception and Psychophysics, 40, 123-130.
Pynte, J. (1996). Lexical control of within-word eye movements. Journal of Experimental Psychology: Human Perception and Performance, 22, 958-969.
Radach, R. and Kempe, V. (1993). An individual analysis of initial fixation positions in reading. In: G. d'Ydewalle and J. Van Rensbergen (Eds.), Perception and Cognition: Advances in Eye Movement Research. Amsterdam: North Holland, pp. 213-226.
Rayner, K. (1979). Eye guidance in reading: Fixation locations within words. Perception, 8, 21-30.
Rayner, K. (1995). Eye movements and cognitive processes in reading, visual search, and scene perception. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes and Applications. Amsterdam: North Holland, pp. 3-22.
Rayner, K. and Bertera, J.H. (1979). Reading without a fovea. Science, 206, 468-469.
Rayner, K. and Fischer, M.H. (1996). Mindless reading revisited: Eye movements during reading and scanning are different. Perception and Psychophysics, 58, 734-747.
Rayner, K., Fischer, M.H. and Pollatsek, A. (1998). Unspaced text interferes with both word identification and eye movement control. Vision Research, 38, 1129-1144.
Rayner, K., Inhoff, A.W., Morrison, R., Slowiaczek, M.L. and Bertera, J.H. (1981). Masking of foveal and parafoveal vision during eye fixations in reading. Journal of Experimental Psychology: Human Perception and Performance, 7, 167-179.
Rayner, K. and McConkie, G.W. (1976). What guides a reader's eye movements. Vision Research, 16, 829-837.
Rayner, K., McConkie, G.W. and Ehrlich, S.F. (1978). Eye movements and integrating information across fixations. Journal of Experimental Psychology: Human Perception and Performance, 4, 529-544.
Rayner, K., McConkie, G.W. and Zola, D. (1980). Integrating information across eye movements. Cognitive Psychology, 12, 206-226.
Rayner, K. and Morris, R.K. (1992). Eye movement control in reading: Evidence against semantic preprocessing. Journal of Experimental Psychology: Human Perception and Performance, 18, 163-172.
Rayner, K. and Morrison, R.M. (1981). Eye movements and identifying words in parafoveal vision. Bulletin of the Psychonomic Society, 17, 135-138.
Rayner, K. and Pollatsek, A. (1981). Eye movement control during reading: Evidence for direct control. Quarterly Journal of Experimental Psychology, 33A, 351-373.
Rayner, K. and Pollatsek, A. (1987). Eye movements in reading: A tutorial review. In: M. Coltheart (Ed.), Attention and Performance XII: The Psychology of Reading. London: Erlbaum, pp. 327-362.
Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. Englewood Cliffs, NJ: Prentice Hall.
Rayner, K. and Pollatsek, A. (1996). Reading unspaced text is not easy: Comments on the implications of Epelboim et al.'s study for models of eye movement control in reading. Vision Research, 36, 461-470.
Rayner, K. and Sereno, S.C. (1994). Eye movements in reading: Psycholinguistic studies. In: M.A. Gernsbacher (Ed.), Handbook of Psycholinguistics. New York: Academic Press, pp. 57-82.
Rayner, K., Sereno, S.C., Morris, R.K., Schmauder, A.R. and Clifton, C. (1989). Eye movements and on-line language comprehension processes. Language and Cognitive Processes, 4 (Special issue), 21-49.
Rayner, K., Sereno, S.C. and Raney, G.E. (1996). Eye movement control in reading: A comparison of two types of models. Journal of Experimental Psychology: Human Perception and Performance, 22, 1188-1200.
Rayner, K., Slowiaczek, M.L., Clifton, C. and Bertera, J.H. (1983). Latency of sequential eye movements: Implications for reading. Journal of Experimental Psychology: Human Perception and Performance, 9, 912-922.
Rayner, K. and Well, A.D. (1996). Effects of contextual constraint on eye movements in reading: A further examination. Psychonomic Bulletin and Review, 3, 504-509.
Rayner, K., Well, A.D., Pollatsek, A. and Bertera, J.H. (1982). The availability of useful information to the right of fixation in reading. Perception and Psychophysics, 31, 537-550.
Reichle, E., Pollatsek, A., Fisher, D.L. and Rayner, K. (1998). Towards a model of eye movement control in reading. Psychological Review, 105, 125-157.
Schilling, H.E.H., Rayner, K. and Chumbley, J.I. (1998). Comparing naming, lexical decision, and eye fixation times: Word frequency effects and individual differences. Memory and Cognition, in press.
Sereno, S.C. (1992). Early lexical effects when fixating a word in reading. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer-Verlag, pp. 304-316.
Slowiaczek, M.L. and Rayner, K. (1987). Sequential masking during eye fixations in reading. Bulletin of the Psychonomic Society, 25, 175-178.
Stelmach, L.B., Campsall, J.M. and Herdman, C.M. (1997). Attention and ocular movements. Journal of Experimental Psychology: Human Perception and Performance, 23, 823-844.
Suppes, P. (1990). Eye-movement models for arithmetic and reading performance. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes. Amsterdam: Elsevier, pp. 455-477.
Thibadeau, R., Just, M.A. and Carpenter, P.A. (1982). A model of the time course and content of human reading. Cognitive Science, 6, 101-155.
Underwood, G., Bloomfield, R. and Clews, S. (1988). Information influences the pattern of eye fixations during sentence comprehension. Perception, 17, 267-278.
Underwood, G., Clews, S. and Everatt, J. (1990). How do readers know where to look next? Local information distributions influence eye fixations. Quarterly Journal of Experimental Psychology, 42A, 39-65.
Van Orden, G.C. (1987). A rows is a rose: Spelling, sound, and reading. Memory and Cognition, 15, 181-198.
Vitu, F. (1991). The influence of parafoveal processing and linguistic context on the optimal landing position effect. Perception and Psychophysics, 50, 58-75.
Vitu, F. and O'Regan, J.K. (1995). A challenge to current theories of eye movements in reading. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes, and Applications. Amsterdam: North Holland, pp. 381-393.
Vitu, F., O'Regan, J.K., Inhoff, A.W. and Topolski, R. (1995). Mindless reading: Eye movement characteristics are similar in scanning letter strings and reading text. Perception and Psychophysics, 57, 352-364.
Vitu, F., O'Regan, J.K. and Mittau, M. (1990). Optimal landing position in reading isolated words and continuous text. Perception and Psychophysics, 47, 583-600.
CHAPTER 12
Eye Movements During Scene Viewing: An Overview
John M. Henderson and Andrew Hollingworth
Michigan State University
Abstract How do the semantic and visual characteristics of local scene regions influence the placement and duration of eye fixations during scene viewing? First, we review research on eye movement behaviour during scene viewing, focusing particularly on the influence of semantic information on eye movement behaviour. Second, we identify a number of factors that may influence eye movement behaviour in scenes, and suggest directions for future research. Finally, we propose a descriptive model of eye movement control in complex scenes.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Overview In this chapter our goal is to provide an overview of eye movement patterns during scene viewing. There are at least three important reasons to understand eye movements in scene viewing. First, eye movements are critical for the efficient and timely acquisition of visual information during complex visual-cognitive tasks, and the manner in which eye movements are controlled to service information acquisition is a critical question. More generally, the interaction between vision, cognition, and eye movement control can be seen as a scientifically tractable testing ground for theories of the interaction between input, central, and output systems (Henderson, 1996). The vast majority of our current knowledge of eye movement control in complex visual-cognitive tasks derives from studies of reading, but a complete theory will require generalization to other ecologically valid tasks like scene viewing. Second, how we acquire, represent, and store information about the visual environment is a critical question in the study of perception and cognition. The tradition in the study of scene perception (and in perception and visual cognition generally) has been to study performance in tasks that use static, briefly presented images as stimuli. However, vision is a dynamic process in which representations are built up over time from multiple eye fixations. The study of eye movement patterns during scene viewing contributes to an understanding of how information in the visual environment is dynamically acquired and represented. Finally, eye movement data provide an unobtrusive, online measure of visual and cognitive information processing. In order to capitalize on this measure, it will be necessary to develop a more complete understanding of the manner in which visual-cognitive processing is reflected by eye movement behaviour. This chapter is divided into three sections. 
First, we briefly review the literature on eye movement behaviour during scene viewing, with particular emphasis on where the eyes tend to fixate in a scene, and how long they tend to stay at a particular location. Our focus here is on static scenes. Reports of recent investigations of eye movements during the viewing of dynamic scenes can be found in Chapters 17-19. Second, we identify some largely unexplored factors that may affect the placement and duration of eye fixations in a scene. Finally, we offer a tentative descriptive model of eye movement control during scene viewing. Review of eye movements during scene viewing Eye movement behaviour during scene viewing can be divided into two relatively discrete temporal phases, fixations, or periods of time when the point of regard is relatively (though not perfectly) still, and saccades, or periods of time when the eyes are rotating at a relatively rapid rate to reorient the point of regard from one spatial
position to another. Useful pattern information is acquired during the fixations, with little useful pattern information taken in during the saccades due to a combination of visual masking and central suppression (Matin, 1974). During fixations, the quality of the information acquired falls off rapidly and continuously from the center of the point of regard (fixation position) due to the optical properties of the eyes and the neural structure of the retina and visual cortex, with the highest quality visual information acquired from the spatial area immediately surrounding that point. Two important issues for understanding eye movement control during scene viewing are where the fixation position tends to be centered during scene viewing, and how long the fixation position tends to remain centered at a particular location in a scene. We will address these issues of fixation position and fixation duration next.

Where do viewers look in a scene?

Effects of general region informativeness on fixation position

The first systematic exploration of fixation positions in scenes was reported by Buswell (1935), who asked 200 participants to look at 55 pictures of different types of artwork under a variety of viewing instructions. An important result was that fixation positions were found to be highly regular and related to the information in the pictures. For example, viewers tended to concentrate their fixations on the people rather than on background regions when examining Sunday on the Island of La Grande-Jatte by Georges Seurat. These data thus provided some of the earliest evidence that eye movement patterns during complex scene perception are related to the information in the scene, and by extension, to perceptual and cognitive processing of the scene. Buswell concluded that "Eye movements are unconscious adjustments to the demands of attention during a visual experience. 
The underlying assumption in this study is that in a visual experience the center of fixation of the eyes is the center of attention at a given time." (Buswell, 1935, pp. 9-10). Buswell's finding that informative scene regions tend to receive more fixations has been replicated many times. In the first study to explore this relationship analytically, Mackworth and Morandi (1967) divided each of two colour photographs into 64 square regions, and a group of participants then rated the informativeness of each region based on how easy it would be to recognize on another occasion. A new group of viewers then examined the pictures to decide which one of the two they preferred. Fixation density (the number of discrete fixations) in each of the 64 regions in each scene was found to be related to the informativeness rating of the region, with regions rated more informative receiving more fixations. Regions that received low informativeness ratings were often not fixated at all, suggesting that the scenes were filtered by peripheral vision and that uninformative regions could be rejected as potential fixation sites based on peripheral information alone. Mackworth
and Morandi (1967) also found that viewers were as likely to place their fixations on informative regions in the first two seconds of scene viewing as in other two-second intervals, providing evidence for relatively early, peripherally-based scene analysis. The two pictures used by Mackworth and Morandi (1967) were visually simple: One depicted a pair of eyes within a hooded mask, and the other was a coastal map. Using images of more complex scenes taken predominantly from the Thematic Apperception Test, Antes (1974) provided additional evidence that region informativeness affects fixation position. Like Mackworth and Morandi (1967), Antes (1974) asked one group of viewers to rate each scene region according to the degree to which it contributed to the total amount of information conveyed by the whole picture. A different group of viewers then examined the scenes while their eye movements were recorded. Their task was to decide which scene they preferred. There were two main results relevant to fixation position. First, the density of fixations in a scene region was highly correlated with that region's informativeness, with regions rated more informative receiving more fixations, replicating Mackworth and Morandi (1967). Second, the first fixation position selected by a viewer (following the experimenter-induced initial fixation position at the center of the scene) tended to be within an informative region of a scene, suggesting rapid control of fixation position by scene characteristics. In summary, the studies reviewed in this section suggest that the positions of individual fixations in a scene, including the position of the fixation after the first saccade, are determined in part by the informativeness of scene regions, with more fixations being directed to more informative regions. 
However, because region informativeness was determined by experimenter intuition (Buswell, 1935; Yarbus, 1967) or by viewer ratings (Antes, 1974; Mackworth and Morandi, 1967), visual and semantic informativeness were probably correlated in these studies. Therefore, it is not possible to determine whether there is an independent effect of semantic informativeness (i.e., the meaning of a region) beyond visual informativeness (i.e., the presence of discontinuity in texture, colour, luminance, and depth) on the positions of fixations in a scene. This issue is important because it is related to the question of whether fixation positions reflect cognitive operations as well as perceptual processes during scene viewing. If so, then semantically informative regions should be more likely to receive fixations during scene viewing, holding visual informativeness constant. We turn to this issue next.

Effects of semantic informativeness on initial fixation positions

In perhaps the first study to investigate the influence of semantic informativeness on fixation location, Loftus and Mackworth (1978) presented viewers with line drawings of scenes in which a manipulated target object was either high or low in semantic informativeness. Semantic informativeness was defined as the degree to
which an object was predictable within the scene, with the logic that an object unlikely to be found in a scene is more informative than an object likely to be found there. Importantly, visual informativeness was controlled by exchanging objects across scenes. For example, a farm scene could contain either a tractor (low informativeness) or an octopus (high informativeness). An underwater scene contained the same two objects, so that the semantic informativeness of the target objects was reversed. The two target objects occupied the same position in each scene. Participants viewed the scenes for four seconds each in preparation for a later recognition test. There were two main findings with respect to fixation location. First, viewers tended to fixate the inconsistent objects earlier during the course of scene viewing. Second, and more interestingly, viewers were more likely to fixate the semantically informative objects immediately following the first saccade within the scene. Because the distance of the saccade to the target objects averaged 6.5-8° of visual angle, these data suggest that viewers could determine in a single fixation the semantic informativeness of an object based on peripheral information, and that semantic informativeness could then exert an immediate effect on eye movement control. De Graef, Christiaens and d'Ydewalle (1990) investigated the influence of semantic informativeness on eye movement patterns during scene viewing using a visual search task: Viewers searched line drawings of scenes for object-like figures that were not associated with any identifiable real-world object ("non-objects"). Using the same manipulation as had Loftus and Mackworth (1978), pre-specified target objects were placed in the scenes, and these objects were either semantically consistent or inconsistent (referred to by De Graef et al. as probability violations) with the scene. 
(Other types of violations were used as well, but we will focus on the semantic consistency here.) In contrast to Loftus and Mackworth (1978), De Graef et al. (1990) found no evidence that semantically inconsistent objects were fixated earlier than consistent objects. In fact, when De Graef et al. (1990) plotted the cumulative proportion of targets fixated as a function of informativeness, they found that viewers were no more likely to fixate the inconsistent than the consistent objects for the first 8 fixations. Our examination of this cumulative probability distribution (De Graef et al., 1990, Fig. 2) suggests to us that after the first 8 fixations in a scene, there was even some tendency for viewers to fixate the consistent objects sooner than the inconsistent objects. Clearly, these data do not support the view that the eyes are immediately drawn to semantically informative objects. We recently conducted two new experiments to provide additional evidence concerning the role of semantic informativeness on eye movement patterns during scene perception (Henderson, Weeks and Hollingworth, 1999). In the first experiment, we attempted to replicate and extend Loftus and Mackworth (1978). We constructed 24 line drawings of real-world scenes generated from photographs (De
Graef et al., 1990). Semantically uninformative (consistent) target objects were drawn independently for each scene. Pairs of objects were then inserted into two yoked scenes to create two scenes in which the objects were informative and two in which the objects were uninformative, as shown in Fig. 1. The two target objects in a pair were always placed in the same location in a given scene so that the distance from the initial fixation point and lateral masking from surrounding contours would be controlled. During the experiment, we asked viewers to look at the scenes in preparation for a later memory test (which was, in fact, never given). The viewers were shown each of the 24 scenes once, half containing the informative target object for that scene and half containing the uninformative target object. Whether a given scene contained the informative or uninformative object was counterbalanced across viewers. In contrast to Loftus and Mackworth (1978) but similar to De Graef et al. (1990), we found that viewers were no more likely to fixate the more informative target object than the less informative object early during scene viewing. First, viewers were no more likely to fixate the informative than the uninformative target after the first saccade in the scene, fixating the target immediately on about 10% of the trials in both conditions. Viewers were also no more likely to fixate the informative target after two saccades, fixating both types of objects after the first or second saccade in about 20% of the trials. Second, viewers initially landed on a target object after an average of about 11 fixations in the scene regardless of the semantic informativeness of the object. Third, the magnitude of the initial saccade to the target object was about 3°, and there was no evidence that these saccades were longer to the informative targets. These data suggest that the eyes are not initially driven by peripheral semantic analysis of individual objects. 
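The two measures of early fixation placement used in this comparison (the ordinal number of the fixation that first lands on the target, and the cumulative proportion of targets fixated by a given fixation) can be illustrated with a minimal sketch. The region labels, the trial data, and the function names below are hypothetical and are not taken from any of the studies discussed; each trial is assumed to be an ordered list of the regions fixated after the initial, experimenter-induced fixation.

```python
from typing import List, Optional, Sequence


def first_landing_index(fixated_regions: Sequence[str], target: str) -> Optional[int]:
    """Return the 1-based ordinal number of the first fixation on `target`.

    Returns None if the target was never fixated in this trial.
    """
    for ordinal, region in enumerate(fixated_regions, start=1):
        if region == target:
            return ordinal
    return None


def cumulative_proportion_fixated(
    trials: Sequence[Sequence[str]], target: str, max_fixations: int
) -> List[float]:
    """For n = 1..max_fixations, proportion of trials with the target fixated by fixation n."""
    landings = [first_landing_index(trial, target) for trial in trials]
    return [
        sum(1 for idx in landings if idx is not None and idx <= n) / len(trials)
        for n in range(1, max_fixations + 1)
    ]


# Hypothetical trials: each list is the sequence of regions fixated in one scene.
trials = [
    ["background", "target", "background"],
    ["background", "background", "background", "target"],
    ["background", "background"],  # target never fixated
    ["target"],
]

curve = cumulative_proportion_fixated(trials, "target", max_fixations=4)
print(curve)  # [0.25, 0.5, 0.5, 0.75]
```

Computing one such curve per condition (informative versus uninformative target) allows the two curves to be compared at each ordinal fixation; an immediate attraction of the eyes to informative objects would appear as a separation between the curves at the first one or two fixations.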
In a second experiment, we introduced a visual search task to provide additional evidence concerning the relationship between semantic informativeness and initial fixation placement. During each trial, viewers were provided the name of a target object and then shown a line drawing of a scene. The viewer's task was to determine as quickly as possible whether the target object was present in the scene. Because of the instructions, viewers should have been highly motivated to find the targets as quickly as possible. If semantically informative objects can draw the eyes from peripheral regions of the scene, informative objects should be found more quickly than uninformative objects. As in the first experiment, however, viewers were no more likely to fixate the informative than the uninformative target after the first saccade in the scene. Instead, uninformative targets were fixated sooner (after about 3 fixations) than informative targets (after about 3.5 fixations). This finding presumably resulted from the fact that the positions of the uninformative objects were more constrained by the scenes, and so they were easier to find. For example, a blender in a kitchen is likely to appear on a counter-top rather than on the floor or elsewhere in the scene. A blender in a farmyard, by comparison, might appear just about
Fig. 1. Pairs of objects inserted into two yoked scenes (bar and laboratory) to create two scenes in which the objects were informative and two scenes in which the objects were uninformative.
anywhere, and would thus be more difficult to find. Finally, it is of interest that viewers moved their eyes to the targets more quickly in the second experiment (after about 3 fixations) than in the first (after about 11 fixations), suggesting that they could use peripheral visual information to guide their search. Even so, there was no evidence in either experiment that the eyes were drawn to semantically informative objects. In summary, four experiments have examined the effects of semantic informativeness on initial fixation placement. Of these, one experiment has shown that the eyes are drawn to inconsistent objects (Loftus and Mackworth, 1978), while three have shown that they are not (one experiment reported by De Graef et al., 1990, and two experiments reported by Henderson et al., 1999). Why might Loftus and Mackworth (1978) have found that viewers' initial fixations were drawn to semantically informative objects? One possible explanation is simply that the Loftus and Mackworth (1978) result was due to statistical error. This explanation seems possible given the relatively low spatial and temporal resolution of the eyetracking equipment that was available at the time of that study. If we assume that the Loftus and Mackworth result was not due to statistical error, there are at least three other potential explanations for the inconsistency across studies. First, semantic informativeness and visual informativeness may have been correlated in the Loftus and Mackworth experiment (De Graef et al., 1990; Rayner and Pollatsek, 1992). This problem might arise if, for example, the consistent target objects were initially drawn in the scenes, and then the target objects were swapped across scenes. If this were true, then the effect of semantic informativeness on initial fixations may actually have been due to visual factors. 
While we cannot say for certain whether this was a problem in the Loftus and Mackworth experiment, we do know that it was not a problem in our study: All scenes were created in the same way, and target objects were drawn independently of the scene backgrounds. Second, it could be that the scenes used by Henderson et al. (1999) and by De Graef et al. (1990) were visually more complex than those used by Loftus and Mackworth (1978). For example, if there were fewer contours in the Loftus and Mackworth scenes, then there may have been less lateral masking of individual objects, and so it may have been easier for viewers to semantically analyse peripheral objects. A third and related possibility is that the difference in results across studies may have been due to a difference in the size of the scenes used across studies. Larger scenes might lead to greater peripheral semantic analysis because the objects in the scenes would potentially be larger. In our study, the scenes subtended 10x14.5° while the Loftus and Mackworth (1978) scenes subtended 20x30°. Contrary to this hypothesis, however, De Graef et al. (1990) used scenes that subtended 20x30°, but as discussed above, they observed no influence of peripheral object semantics on early fixation placement.
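Scene sizes and saccade distances in this comparison are expressed in degrees of visual angle, which depend on both the physical size of the display and the viewing distance. The following sketch shows the standard geometric conversion. The viewing distance and display sizes in the usage lines are hypothetical, since the studies discussed here are compared only in terms of the angles subtended.

```python
import math


def visual_angle_deg(extent: float, viewing_distance: float) -> float:
    """Visual angle (degrees) subtended by an extent centred on the line of sight.

    `extent` and `viewing_distance` must be in the same units (e.g. cm).
    """
    return math.degrees(2.0 * math.atan(extent / (2.0 * viewing_distance)))


def extent_for_angle(angle_deg: float, viewing_distance: float) -> float:
    """Physical extent needed to subtend `angle_deg` at `viewing_distance`."""
    return 2.0 * viewing_distance * math.tan(math.radians(angle_deg) / 2.0)


# Hypothetical set-up: a viewer seated 57 cm from the display, a distance at
# which 1 cm on the display subtends very nearly 1 degree.
distance_cm = 57.0
print(round(visual_angle_deg(1.0, distance_cm), 2))

# Display width (cm) needed at that distance for a 14.5 degree and a 30 degree scene.
print(extent_for_angle(14.5, distance_cm))
print(extent_for_angle(30.0, distance_cm))
```

Because the same picture subtends a different angle at a different viewing distance, comparisons of scene size across studies are meaningful only when expressed in degrees, as they are in the text.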
There is an additional point that leads us to believe that the Loftus and Mackworth (1978) result was anomalous. Loftus and Mackworth (1978) observed an average saccadic amplitude of over 7° in their study. This average is roughly twice as large as the average saccadic amplitude typically observed in scene viewing experiments. For example, viewers in both of our experiments moved their eyes to the target objects from about 3-4° away, and very few saccades were in the 6-8° range (Henderson et al., 1999). (We report further evidence concerning distributions of saccadic amplitudes during scene viewing below.) The smaller saccadic amplitudes observed in our study were not due to the size of our scenes. Antes (1974) presented scenes that subtended 20x20° and observed average saccadic amplitudes in the same range as we did. Saida and Ikeda (1979) had participants view 14.4x18.8° pictures in preparation for a later memory test. In their control condition in which the entire scene was visible throughout the trial, the modal saccade length was under 2° and very few saccades were greater than 4°. Shiori and Ikeda (1989) reported that the median saccade size in a non-degraded viewing condition of their study was about 3° in 15x15° pictures, with 75% of all saccades between about 1.5 and 5.5° (estimated from Shiori and Ikeda, 1989, Fig. 10). Van Diepen, De Graef and d'Ydewalle (1995) found average saccadic amplitudes of about 3.4° when viewers searched for "non-objects" in line drawings of scenes that subtended 16x12°. (We ignore here conditions in the Saida and Ikeda (1979), Shiori and Ikeda (1989), and van Diepen et al. (1995) studies in which the amount of the scene that was visible during each fixation was manipulated using a window or mask that moved contingent on eye position; see Chapter 15 for information on these manipulations.) 
Overall, then, the saccadic amplitudes observed by Loftus and Mackworth (1978) appear to be anomalous given the remainder of the picture viewing literature.

Effects of semantic informativeness on fixation density

Fixation density can be defined as the number of discrete fixations within a given region. As reviewed above, viewers tend to cluster their fixations within informative regions of a scene (Antes, 1974; Buswell, 1935; Mackworth and Morandi, 1967; Yarbus, 1967). An examination of the figures presented by Buswell (1935) and Yarbus (1967) suggests that these clusters are not entirely determined by visual factors, but instead that viewers tend to concentrate their fixations on regions that are semantically interesting. Other evidence for an influence of scene semantics on fixation density comes from the manipulation of viewing instructions by Yarbus (1967). Yarbus found that when looking at a picture of I.E. Repin's An Unexpected Visitor, viewers tended to concentrate their fixations on the people in the picture and particularly on their faces when they were attempting to determine the ages of the people, but tended to distribute their fixations more widely over the scene when they were attempting to estimate the material circumstances of the family.
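Fixation density as defined here (the number of discrete fixations within a region) can be computed by dividing the picture into a grid of regions, as Mackworth and Morandi (1967) did with 64 square regions, and counting the fixations that fall in each. The sketch below is illustrative only: the pixel coordinate system, the image dimensions, and the fixation positions are assumptions introduced for the example.

```python
from collections import Counter
from typing import Iterable, Tuple


def fixation_density(
    fixations: Iterable[Tuple[float, float]],
    width: float,
    height: float,
    n_cols: int = 8,
    n_rows: int = 8,
) -> Counter:
    """Count fixations in each cell of an n_rows x n_cols grid over the image.

    `fixations` holds (x, y) positions with the origin at the top-left corner.
    Returns a Counter keyed by (row, col); cells never fixated are absent
    from the Counter and read as 0.
    """
    counts: Counter = Counter()
    for x, y in fixations:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore fixations that fall outside the picture
        col = int(x * n_cols / width)
        row = int(y * n_rows / height)
        counts[(row, col)] += 1
    return counts


# Hypothetical fixation positions on an 800 x 600 pixel picture.
fixations = [(10, 10), (12, 15), (300, 310), (790, 590), (-5, 20)]
density = fixation_density(fixations, width=800, height=600)

print(density[(0, 0)])                  # 2 fixations in the top-left region
print(sum(density.values()))            # 4 fixations fell inside the picture
print(8 * 8 - len(density))             # 61 regions were never fixated
```

The resulting counts per region can then be related to an informativeness rating for the same region, which is the form of analysis described for the rating studies above.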
Fixation densities in a scene region can be influenced both by the number of fixations made within that region each time it is examined (including the first time), and by the number of times viewers look back to that region. The figures presented by Buswell and Yarbus provide some qualitative evidence that both the number of initial fixations and the number of looks back to a scene region are affected by the informativeness of the region. There is also quantitative evidence supporting these conclusions. First, we have shown that the number of fixations viewers make in a region when that region is first fixated is affected by scene semantics (Henderson et al., 1999). In addition, there are two studies that provide quantitative evidence that viewers tend to return their gaze to semantically informative regions over the course of scene viewing (Loftus and Mackworth, 1978; Henderson et al., 1999). In our study, we found that viewers looked to informative objects about 3.3 times and to uninformative objects about 2.6 times on average over the course of 15 seconds of scene viewing. In contrast to the results reported by Loftus and Mackworth (1978) and Henderson et al. (1999), Friedman (1979) found no effect of informativeness (likelihood) on the number of discrete looks to an object from a position beyond that object (Friedman and Liebelt, 1981). In that study, Friedman (1979) used a correlational approach to investigate the relationship between semantic consistency and eye movement patterns. Participants viewed line drawings of real-world scenes in preparation for a memory test in which "they would have to later be able to distinguish between the original pictures and new pictures in which, for example, only a small detail on one object would be different." Each scene contained objects that had been rated for their likelihood within the scene by a separate group of participants. 
A likely explanation for the lack of effect of semantic informativeness in the Friedman (1979) study is that the overall manipulation of informativeness was relatively weak; objects ranged continuously from very likely to somewhat likely in the scenes, with no truly unlikely objects. In our study (Henderson et al., 1999) as well as that of Loftus and Mackworth (1978), when a scene contained a semantically inconsistent object, that object was highly anomalous in the scene. Thus, the effect of semantic informativeness on fixation density was probably easier to detect in these latter studies. Summary In summary, the results of the past scene viewing studies indicate that the positions of fixations within a scene are non-random, with fixations clustering on informative scene regions (Antes, 1974; Buswell, 1935; Henderson et al., 1999; Mackworth and Morandi, 1967; Yarbus, 1967). However, the specific effect of semantic informativeness beyond that of visual informativeness on fixation position is less clear. Loftus and Mackworth (1978) observed that viewers tended immediately to fixate semantically informative objects, but neither De Graef et al. (1990) nor Henderson
et al. (1999) were able to replicate this effect. At the same time, both Loftus and Mackworth (1978) and Henderson et al. (1999) observed that viewers tended to look back more often to semantically informative than to uninformative scene regions, while Friedman (1979) did not observe this effect. How long do viewers look at different scene regions? While initial studies of eye movement patterns during scene viewing did not report viewing time measures (Antes, 1974; Buswell, 1935; Mackworth and Morandi, 1967; Yarbus, 1967), later research provides good evidence that the amount of time viewers fixate a scene region is dependent on the informativeness of that region. At a macro level of analysis, the total time that a region is fixated in the course of scene viewing (the sum of the durations of all fixations in that region) is correlated with the number of fixations in that region. Because, as discussed in the preceding section, fixation density is higher for visually and semantically informative scene regions, total viewing time spent on those regions also tends to be longer. At a micro level of analysis, one can ask whether the durations of individual fixations and temporally contiguous clusters of fixations in a region (rather than the sum of all fixations) are also affected by region informativeness. Several commonly used micro-level measures of fixation time include first fixation duration (the duration of the initial fixation in a region), first pass gaze duration (the sum of all fixations from first entry to first exit in a region), and second pass gaze duration (the sum of all fixations from second entry to second exit in a region). In a recent series of experiments, van Diepen and colleagues have manipulated the quality of the visual information available during each fixation using a moving mask paradigm (see Chapter 15). Viewers searched for non-objects in real-world scenes, and the image at fixation was normal or was degraded. 
Image degradation was manipulated by reducing the contrast or overlaying a noise mask on the fixated region. When the image was degraded beginning at the onset of fixation, first fixation durations were longer than in a control condition, suggesting that the duration of the initial fixation is controlled, at least in part, by the acquisition of visual information from the fixated region. The van Diepen et al. study (Chapter 15) is the only direct exploration of the influence of visual factors on fixation duration during scene viewing that we are aware of, and there is currently no direct data concerning whether first fixation durations or gaze durations in a scene are affected by other correlates of visual informativeness such as contour density or contrast. The effects of semantic informativeness on micro measures of fixation time during scene viewing have been studied more extensively. Loftus and Mackworth (1978) found that first pass gaze durations were longer for semantically informative (i.e., inconsistent) objects. Friedman (1979) similarly showed that first pass gaze duration on an object was correlated with the rated likelihood of that object in the
scene, with longer gaze durations on objects that were less likely to be found in a particular scene.1 Using the non-object counting task, De Graef et al. (1990) also found that first pass gaze durations were longer for semantically inconsistent objects, though this difference appeared only in the later stages of scene viewing. Finally, Henderson et al. (1999) found that first pass gaze duration and second pass gaze duration, as well as total fixation duration, were longer for semantically informative than uninformative objects. The influence of semantic informativeness on the duration of the very first fixation on an object is less clear. De Graef et al. (1990) found that overall, first fixation durations on an object did not differ as a function of semantic informativeness. However, when first fixation duration was examined as a function of fixation moment (whether an object was fixated during the first or second half of all the fixations on the scene within which it appeared), first fixation durations on objects that were first encountered relatively late during scene exploration (following the median number of total fixations) were shorter on semantically uninformative (consistent) objects. We have recently analysed the first fixation duration data from our study (Henderson et al., 1999). Overall, we did not observe an effect of semantic informativeness, with mean first fixation durations of 317 ms in the informative condition and 314 ms in the uninformative condition, F < 1. In a subsequent analysis, we used a median split to divide the data into first fixations that occurred during the first versus second half of scene exploration, and again found no effect of semantic informativeness for either fixation moment. It appears that if they exist, effects of region semantics on first fixation durations during scene viewing are fragile. 
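The viewing time measures defined in this section (first fixation duration, first pass gaze duration, second pass gaze duration, and total fixation time), together with the number of separate looks to a region, can all be derived from a chronologically ordered fixation record. The following is a minimal sketch under the assumption that each fixation has already been assigned a region label and a duration in milliseconds; the labels, durations, and function name are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Optional, Sequence, Tuple


def gaze_measures(
    fixations: Sequence[Tuple[str, int]], region: str
) -> Optional[Dict[str, Optional[int]]]:
    """Compute viewing time measures for `region`.

    `fixations` is a chronologically ordered sequence of (region_label, duration_ms).
    Returns None if the region was never fixated.
    """
    # Collapse temporally contiguous fixations on the same region into passes
    # (one pass = all fixations from an entry into the region to the next exit).
    passes: List[List[int]] = [
        [duration for _, duration in run]
        for label, run in groupby(fixations, key=lambda fixation: fixation[0])
        if label == region
    ]
    if not passes:
        return None
    return {
        "first_fixation_duration": passes[0][0],
        "first_pass_gaze_duration": sum(passes[0]),
        "second_pass_gaze_duration": sum(passes[1]) if len(passes) > 1 else None,
        "total_fixation_time": sum(sum(p) for p in passes),
        "number_of_looks": len(passes),
        "fixation_count": sum(len(p) for p in passes),
    }


# Hypothetical fixation record for one viewer on one scene.
record = [
    ("sky", 200),
    ("octopus", 310),
    ("octopus", 250),
    ("barn", 220),
    ("octopus", 280),
    ("sky", 190),
]

measures = gaze_measures(record, "octopus")
print(measures["first_fixation_duration"])    # 310
print(measures["first_pass_gaze_duration"])   # 560
print(measures["second_pass_gaze_duration"])  # 280
print(measures["total_fixation_time"])        # 840
```

Separating the measures in this way makes clear why an effect can appear in first pass gaze duration without appearing in first fixation duration: the former also reflects any additional fixations made before the eyes leave the region.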
Factors that may influence eye movement patterns during scene viewing

While there has been reasonable consistency in the eye movement patterns that have been observed across scene viewing studies, there are also some notable differences, as discussed above. It is often difficult to determine the cause of these differences because there are a number of potentially important factors that vary from study to study. These factors include image size, viewing task, viewing time per scene, image content, and image type. Table 1 summarizes the values of these factors used in the studies reviewed above. These factors could each produce main effects and could also interact with each other in complex ways to influence dependent measures of eye movement behaviour such as saccadic amplitudes, fixation positions, and fixation durations.

1 We note that both Loftus and Mackworth and Friedman called their measures duration of the first fixation, though the measures are equivalent to what has commonly been called gaze duration. Their eyetracking equipment did not have the spatial resolution to allow these investigators to examine the true first fixation duration.
Table 1 Summary of methods for eye movement scene studies

| Study | Image size | Viewing task | Viewing time per scene | Image type/content |
| --- | --- | --- | --- | --- |
| Buswell (1935) | varied | generally choose which images are pleasing | self-paced | colour paintings and images of other works of art |
| Mackworth and Morandi (1967) | 16x16° | decide which image preferred | 10 s | colour photographs of a mask and a coastline |
| Yarbus (1967) | varied | varied | varied; up to 30 min | colour paintings and images of other works of art |
| Antes (1974) | no more than 20° | decide which image preferred | 20 s | monochrome shaded drawings (mostly from TAT test) |
| Loftus and Mackworth (1978) | 20x30° | prepare for a later recognition memory test | 4 s | black and white line drawings of real-world environments |
| Friedman (1979) | 20x30° | prepare for memory test in which a small detail of one object may have changed | 30 s | black and white line drawings (with some shading) of real-world environments |
| Friedman and Liebelt (1981) | 20x30° | prepare for memory test in which a small detail of one object may have changed | 30 s | black and white line drawings (with some shading) of real-world environments |
| De Graef, Christiaens and d'Ydewalle (1990) | 20x30° | count non-objects | 8 s | black and white line drawings of real-world environments |
| Henderson, Weeks and Hollingworth (1999) Exp. 1 | 10x14.5° | prepare for memory test in which a small detail of one object may have changed | 15 s | black and white line drawings of real-world environments |
| Henderson, Weeks and Hollingworth (1999) Exp. 2 | 10x14.5° | search for a pre-specified target object | until response | black and white line drawings of real-world environments |
| Henderson and Hollingworth (1997) | 10x14.5° | prepare for memory test in which a small detail of one object may have changed | 15 s | black and white line drawings of real-world environments |
Image size
One example of how variation in one factor can make interpretation across studies difficult arises in the case of the effect of the semantic informativeness of a scene region on the amplitude of a saccade to that region. As discussed above, it is possible that the amount of the visual field subtended by a depicted scene affects saccadic amplitudes, and that the influence of semantics on amplitude is mediated by this factor. While our review above led us to conclude that mean saccadic amplitudes range between about 2 and 4° despite scene size when the scene subtends between 10 and 20°, there are no published studies that were designed to directly examine saccadic amplitudes as a function of scene size. It is possible that cross-experiment comparisons are misleading because other factors have not been held constant, and that saccadic amplitudes do scale with scene size. Only studies designed to directly test these possibilities will be able to answer this question.
Viewing task
Another important variable in scene viewing is the task given to the viewer. Buswell (1935) and Yarbus (1967) both presented evidence that viewers place their fixations in a scene differently depending on the viewing task. However, these studies were descriptive in that the conclusions were based on a qualitative analysis of the viewing behaviour of particular individuals on specific scenes under differing viewing instructions. In the study described above, we compared eye movement patterns in a memory preparation task and a visual search task (Henderson et al., 1999). This is, to our knowledge, the only study that has held the nature of the stimulus image constant and quantitatively examined the influence of viewing task on eye movement patterns. As discussed above, our results showed that participants
made fewer fixations in a scene prior to fixation on a particular object when they were searching for that object than when they were trying to memorize the scene. We must point out, however, that in this study the comparison of eye movement patterns across tasks was accompanied by variations in other factors. For example, different groups of viewers took part in each task, and the amount of time viewers were allowed to look at the scenes differed in the two viewing tasks, with 15 seconds of viewing time in the memory task and self-termination of the scenes in the visual search task.
Viewing time per scene
Scene viewing time is another potentially important factor in determining viewing patterns over scenes. In the studies reviewed above, scene viewing time has ranged from a minimum of 4 seconds per scene (Loftus and Mackworth, 1978) to a maximum of 30 minutes per scene (Yarbus, 1967), though one has to wonder at the patience of the viewer in the latter case. Other studies have used scene presentation durations that fall between these extremes, as shown in Table 1. In Buswell's study, scene viewing time was determined by the viewer (Buswell, 1935), and there were large individual differences in the length of time viewers wanted to look at the scenes. Thus, it is not clear what the appropriate duration for scene presentation should be. In addition, there has been some evidence that viewing patterns change over time. For example, Friedman (1979) found that the difference in gaze durations on high and low probability objects decreased from 342 ms on first entry (first pass gaze duration) to 78 ms on the third and higher re-entries. This change appeared to be due to a decrease in gaze durations on each entry for low probability objects, but not for high probability objects. Given that viewing patterns and eye movement measures may change over the course of scene viewing, these measures may also change depending on how long viewers are allowed to look at a picture.
There are currently no data available on this topic.
Image content
The content of the images presented to viewers in eye movement studies has varied markedly, as can be seen in Table 1. At one extreme, Buswell (1935) and Yarbus (1967) obviated the problem of image content by using a wide variety of types of scenes, while at the other extreme, Mackworth and Morandi (1967) presented only two images, one of a hooded face, and the other of an aerial view of a coastline. Both of these latter images contained large areas of uniform background. It is not clear what effect the use of such a restricted set of images has on viewers' eye movements. It may be that scenes with different content (e.g., outdoor versus indoor, large-scale spaces versus small-scale spaces) produce systematic effects on eye movement patterns. Currently, this is another unexplored issue.
Image type
The final potentially important factor that we will discuss here is the manner in which a scene is depicted. As shown in Table 1, scene depiction has varied from line drawings (e.g., Friedman, 1979; Henderson et al., 1999; Loftus and Mackworth, 1978) to monochrome shaded drawings (Antes, 1974) to colour paintings (Buswell, 1935) and colour photographs (Mackworth and Morandi, 1967). All of the work that has so far been conducted to examine the influence of semantic informativeness on eye movement patterns has used line drawings as stimuli (De Graef et al., 1990; Friedman, 1979; Henderson et al., 1999; Loftus and Mackworth, 1978). It is not yet clear to what extent the results generated from one image type will generalize to other image types. Furthermore, it will ultimately be important to determine whether the results that are derived from images that depict real-world scenes generalize to the visual world itself, that is, to the situation in which the viewer is looking at the actual visual environment. The introduction of viable head-mounted eyetracking equipment in the last few years should help to encourage the exploration of this latter issue.
In order to begin to get a feel for the influence of image type on eye movement patterns, we recently conducted a study in which we contrasted viewing behaviour on line drawings, colour photographs, and computer-rendered 3-D colour images of real-world scenes (Henderson and Hollingworth, 1997). Eight viewers examined each of 30 scenes for 15 seconds each. The viewing instructions were the same as those used by Henderson et al. (1999): Viewers were told that after they had viewed all of the scenes, they would be given a memory test in which they would have to discriminate the test scenes from new scenes in which only a small detail of a single object might be changed.
Ten exemplars of each of three image types (colour photographs, line drawings, and computer-rendered 2-D images from 3-D models) were presented. All images depicted common real-world scenes. The colour photographs and line drawings depicted the same 10 categories of scenes, while the rendered images depicted rooms in a house. All of the images were viewed by the same set of 8 participants. The images were presented in a random order determined individually for each participant, so that participants would be less likely to develop different viewing strategies for different image types. The three image types were presented on the same SVGA display system at the same resolution (800x600 pixels) and visual angle (10x14.5°). Eye movement data were collected using a Fourward Technologies Generation 5.5 dual-Purkinje image eyetracker. Further details of our general method can be found in Henderson et al. (1999). The eye movement data were analysed using analysis software developed in our laboratory (see Henderson et al., 1999).
Colour Plate 1a shows the eye movement pattern of one viewer on a colour photograph of a kitchen, and Colour Plate 1b shows the pattern for that same viewer on a line drawing depicting a similar scene. In the figure, the green dots represent fixations, the red numbers indicate the ordinal
Colour Plate 1a. Viewing pattern for one participant viewing a photograph of a kitchen. Green dots represent discrete fixations, ordinally numbered in red. Green lines represent saccadic vectors.
Colour Plate 1b. Viewing pattern for the same participant viewing a line drawing of a kitchen. Green dots represent discrete fixations, ordinally numbered in red. Green lines represent saccadic vectors.
number of the associated fixation, and the green lines represent (straightened) saccade vectors. As can be seen in the figure, this viewer tended to distribute her fixations over a relatively large area of the scene, with more fixations concentrated on the more distant counter top where there were many objects than on the closer but empty counter top. Colour Plate 2 presents a contour plot of total fixation time summed across the eight viewers for the kitchen photograph (Plate 2a) and line drawing (Plate 2b). In this figure, cooler colours represent less total fixation time, and hotter colours represent more fixation time, with the colours ordered from dark blue through bright red. This figure illustrates that the viewers as a group spent the majority of their time fixating the informative regions of the scenes. A particularly striking example of this tendency can be seen by comparing the total fixation times (Colour Plate 2) on the closer counter top for the photograph and line drawing. In the colour photograph where the close counter was empty, very little fixation time was spent in that region of the scene. In contrast, in the line drawing the close counter contained a rolling pin, and fixation time clearly was devoted to that object. This contrast points out very nicely how fixation time is directed to informative scene regions. In a quantitative analysis of these data, we found small but reliable differences in eye movement parameters as a function of image type. First, viewers made an average of 36.5 fixations in each scene. They tended to fixate the photographs reliably fewer times (34.8) than the line drawings (36.8) or rendered scenes (37.9). Second, the duration of each fixation was on average 327 ms. Offsetting the reduced number of fixations on the photographs, the duration of the average fixation in the scene was reliably longer for photographs (336 ms) than for line drawings (324 ms) or rendered scenes (321 ms). 
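The way in which longer fixations offset fewer fixations can be checked with simple arithmetic on the means reported above. The sketch below multiplies mean fixation count by mean fixation duration for each image type. This product of means only approximates the mean summed fixation time per scene, since the per-scene values are not available here, and the variable and function names are illustrative assumptions.

```python
# Means reported in the text: (fixations per scene, mean fixation duration in ms).
conditions = {
    "photographs": (34.8, 336),
    "line drawings": (36.8, 324),
    "rendered scenes": (37.9, 321),
}


def approx_fixation_time_s(n_fixations, mean_duration_ms):
    """Approximate summed fixation time per scene, in seconds."""
    return n_fixations * mean_duration_ms / 1000.0


approx_time = {
    name: round(approx_fixation_time_s(n, d), 2)
    for name, (n, d) in conditions.items()
}

# Unweighted means across the three image types, for comparison with the
# overall values given in the text (36.5 fixations, 327 ms).
mean_count = round(sum(n for n, _ in conditions.values()) / len(conditions), 1)
mean_duration = round(sum(d for _, d in conditions.values()) / len(conditions), 1)

for name, seconds in approx_time.items():
    print(f"{name}: about {seconds} s of fixation per 15 s scene")
print(mean_count, mean_duration)
```

On these figures the three image types differ by less than half a second of approximate fixation time per 15 s presentation, which is consistent with the description of the differences as small.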
Given the instructions to prepare for a memory test, it is possible that fixations were longer on the photographs because more visual information was available to commit to memory in photographs than in the other, more schematic stimuli. Finally, the mean saccade length was 2.4° and did not differ as a function of image type. Despite the small differences in eye movement parameters across image type, the consistency of the viewing patterns is quite striking, as can be seen in Colour Plates 1 and 2. Also, the specific nature of the fixations and saccades in the different scene types was very similar. For example, Fig. 2 shows the frequency distributions of the fixation durations (top panel) and saccadic amplitudes (bottom panel) as a function of image type for all participants. As can be seen in the figure, fixation duration and saccadic amplitude distributions were remarkably similar for the different image types, with modal fixation durations of about 220 ms and modal saccadic amplitudes of about 0.5°.
For the purposes of comparison, we have also plotted the data from a reading study in Fig. 2. In this study, conducted by Fernanda Ferreira and Melissa Johnson, 36 participants each read 20 paragraphs of text for comprehension. The text was
Colour Plate 2a. Contour plot of fixation times summed across eight viewers for a photograph of a kitchen. The scene was originally presented in colour. On this image, cooler colours represent less total fixation time, and hotter colours represent more fixation time, with the colours ordered: dark blue, light blue, dark green, light green, yellow, red.
Colour Plate 2b. Contour plot of fixation times summed across eight viewers for a line drawing of a kitchen. Cooler colours represent less total fixation time, and hotter colours represent more fixation time, with the colours ordered: dark blue, light blue, dark green, light green, yellow, red.
Fig. 2. Frequency distributions of the fixation durations (top panel) and saccadic amplitudes (bottom panel) for participants viewing scenes (line drawings, colour photographs, and colour 3D renderings), and reading text.
presented in graphics mode on the same display system, and subtended the same visual angle as the scenes we used as stimuli. Importantly, the eye movement data were collected using the same eyetracking system, and were analysed using the same software parameters for determining the onset of a fixation and saccade as we used in the scene study. To our knowledge, this is the first report of a direct comparison of eye movement behaviour in reading and image viewing. As can be seen in the top panel of Fig. 2, the modal fixation duration in reading and scene
viewing was the same. However, fixation duration was considerably less variable for reading, with fewer fixations lasting longer than 340 ms. The greater number of longer fixations in scene viewing appears to account for the common finding that mean fixation duration is longer in scene viewing than in reading. The bottom panel of Fig. 2 shows that the modal saccadic amplitude is longer but less variable in reading than in scene viewing. A generalization that can be extracted from Fig. 2 is that both fixations and saccades are more variable in scene viewing than in reading. Of course, these comparisons must be viewed with some caution; a task that emphasizes memory (as used in the scene viewing study) may differ in important ways from a task that emphasizes comprehension (as used in the reading study).
A saliency map framework for eye movement control in scene viewing
In this final section we want to outline a model of eye movement control in scene viewing (Henderson, 1992; Henderson et al., 1999). This framework is, at this point, descriptive rather than quantitative, but we believe that it is specific enough to generate new predictions. The framework is also couched in such a way that it could be computationally modelled, and several of the proposed components have been modelled as independent modules for other purposes (e.g., Mahoney and Ullman, 1988). The framework is also in the spirit of the computational model proposed by Reichle et al. (1997; see also Chapter 11). The saliency map framework expands on ideas originally discussed by Henderson (1992), which in turn extended the model of eye movement control in reading proposed by Morrison (1984) and elaborated upon by Henderson and Ferreira (1990, 1993) and Rayner and Pollatsek (1989; see Chapter 11). The framework is meant to account for fixation placement and fixation duration for those fixations that are directed in the service of visual analysis and cognitive processing.
The framework is not meant to account for more fine-grained eye movement behaviour such as micro-saccades and ocular drift, and also ignores other oculomotor phenomena like the global effect (Findlay, 1982) and the optimal viewing position effect (O'Regan, 1992; Vitu et al., 1995), though of course we do not deny their existence. In the saliency map framework, a representation of potential saccade targets is generated from an early parse of the scene into regions of potential interest and a background that is relatively undifferentiated. This initial parse is derived from a fast early analysis of the low frequency information available during an initial fixation in the scene. The positions of the regions of potential interest are coded in a representation of visual space and are assigned a saliency weight. The combination of spatial position and saliency weight is the saliency map (Mahoney and Ullman, 1988). Initially, region salience is determined by visual factors such as luminance, contrast, texture, colour, contour density, and so on, because this is the only information that is available about each region. Salience may also initially be
modified by top-down factors such as the viewer's task, but only if the task can be based on these visual factors. For example, the salience of scene regions that are the same shape as a search target (e.g., rectangular) would be increased, leading to relatively efficient search, as found by Henderson et al. (1999). Further, a global semantic analysis of the scene could contribute to search by constraining the likely position of semantically consistent targets. According to the saliency map framework, the visual information acquisition system follows two simple rules: (1) Allocate visual-spatial attention to the scene region with the highest saliency weight (Koch and Ullman, 1985), and (2) Try to keep the eyes fixated on the attended scene region (Henderson, 1992; Henderson and Ferreira, 1990; Henderson and Ferreira, 1993). Because initially saliency weights are determined only by visual factors, initial attention allocation and initial fixation placement will be determined by visual rather than semantic characteristics of the scene. When the eyes are in fixation, the amount of time they remain stationary will primarily be determined by the amount of time needed to complete perceptual and cognitive analysis of that region. Once processing is complete, the saliency weight for that region will be reduced and attention will be released. Attention is then reallocated to the region that now has the highest saliency weight, and the eyes are programmed to move to that region (Henderson, 1992; Henderson, Pollatsek and Rayner, 1989). 
If perceptual and/or cognitive analysis of the currently fixated region is taking too long given the present fixation position (i.e., the rate of information acquisition is too low), then visual-spatial attention will be reallocated within the current region to optimize information acquisition, and a refixation will be programmed to the new locus of attention (Henderson, 1993; see also McConkie, 1979, for a similar explanation of refixations in reading). Selecting a sub-region within a region is assumed to be based on constructing a saliency map at a finer scale of resolution. The reallocation of attention within a scene region accounts for the finding that scene regions that are difficult to analyse are more likely to receive refixations (Henderson et al., 1999). Refixations may also be programmed based on oculo-motor factors alone (Henderson, 1993; O'Regan, 1992). In the saliency map framework, initial movements of the eyes during scene viewing should be controlled by stimulus features rather than by cognitive features. However, as individual scene regions are fixated and cognitively analysed, saliency weights will be modified to reflect the relative cognitive interest of those regions. In other words, we assume that the source of the saliency weight for a given scene region will change from primarily visual to primarily cognitive interest as regions are fixated and understood. As scene viewing and understanding progresses, region salience will become heavily determined by factors such as semantic informativeness. The eyes will then be more likely to be sent to regions of cognitive salience rather than drawn by regions of visual salience, leading to greater fixation density and total fixation time on semantically interesting objects and scene regions.
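Although the framework is descriptive, its two rules and the shift from visual to cognitive salience can be sketched as a toy simulation. Everything specific in the sketch is an assumption made for illustration: the region names, the numerical weights, the analysis times, and in particular the rule that a region's weight after each fixation equals its cognitive interest multiplied by a decay factor raised to the number of fixations it has received. It is not the authors' implementation and it omits refixation within a region at a finer scale.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    visual_salience: float     # available from the initial low-frequency parse
    cognitive_interest: float  # known only after the region has been analysed
    analysis_ms: int           # hypothetical time needed to analyse the region
    weight: float = 0.0        # current saliency weight in the map
    visits: int = 0            # number of fixations received so far


def simulate_scanpath(regions, n_fixations, decay=0.5):
    """Return a list of (region name, fixation duration in ms)."""
    # Initially, weights are determined by visual factors only.
    for region in regions:
        region.weight = region.visual_salience
        region.visits = 0
    path = []
    for _ in range(n_fixations):
        # Rule 1: allocate attention to the region with the highest weight.
        target = max(regions, key=lambda r: r.weight)
        # Rule 2: the eyes follow attention; fixation lasts as long as
        # analysis of the region requires.
        path.append((target.name, target.analysis_ms))
        # After analysis the weight reflects cognitive interest and is reduced
        # so that attention is released (assumed geometric decay).
        target.visits += 1
        target.weight = target.cognitive_interest * decay ** target.visits
    return path


# Hypothetical scene: a bright but uninformative lamp, a visually modest but
# semantically informative object, and a plain wall.
scene = [
    Region("lamp", visual_salience=0.9, cognitive_interest=0.2, analysis_ms=250),
    Region("octopus", visual_salience=0.5, cognitive_interest=1.0, analysis_ms=400),
    Region("wall", visual_salience=0.3, cognitive_interest=0.1, analysis_ms=200),
]
scanpath = simulate_scanpath(scene, n_fixations=5)
print(scanpath)
```

In this toy run the first fixation goes to the visually salient lamp, after which the semantically informative object attracts the largest share of fixations, mirroring the predicted shift from visual to cognitive control of fixation placement.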
In contrast to initial fixation placement, the amount of time the eyes initially remain fixated in a region, and the number of initial refixations in that region, should be affected by semantic aspects of the region right from the first time the region is fixated, because these aspects of eye movement behaviour are determined by the amount of time required to complete cognitive analysis of that region. In other words, the length of time the eyes remain in a region is controlled primarily by the needs of perceptual and cognitive analysis of the region. In addition, to the extent that additional looks back to a region are needed for additional cognitive analysis of that region, fixation times during these additional looks should also be influenced by the same factors that influence initial fixation times.
Conclusion
There had been something of a hiatus in the exploration of eye movements during scene viewing following the studies that were conducted in the 1960s and '70s. Now, after 20 years of relative inactivity, there has been a resurgence of interest in this topic, as exemplified by many of the chapters in this volume. We see this renewed interest as positive and necessary: while a great deal has been learned about eye movement behaviour during scene viewing, there are still a large number of unresolved questions. Ultimately, answers to these questions will provide a more complete understanding of the interface between perception and action, will contribute to our knowledge of scene perception, and will allow eye movement monitoring to fulfill its promise as a noninvasive, on-line measure of visual-cognitive processing.
Acknowledgements
We would like to thank Fernanda Ferreira for her lively discussions of the issues raised here, and several anonymous reviewers for their comments. The work described in this chapter was supported by grants from the U.S. Army Research Office and the National Science Foundation to John M. Henderson, and by a National Science Foundation graduate fellowship to Andrew Hollingworth. The contents of this article are those of the authors and should not be construed as an official Department of the Army position, policy, or decision.
References
Antes, J.R. (1974). The time course of picture viewing. Journal of Experimental Psychology, 103, 62-70.
Buswell, G.T. (1935). How people look at pictures. Chicago: University of Chicago Press.
De Graef, P., Christiaens, D. and d'Ydewalle, G. (1990). Perceptual effects of scene context on object identification. Psychological Research, 52, 317-329.
Findlay, J.M. (1982). Global processing for saccadic eye movements. Vision Research, 22, 1033-1045.
Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 108, 316-355.
Friedman, A. and Liebelt, L.S. (1981). On the time course of viewing pictures with a view towards remembering. In: D.F. Fisher, R.A. Monty and J.W. Senders (Eds.), Eye Movements: Cognition and Visual Perception. Hillsdale, NJ: Erlbaum.
Henderson, J.M. (1992). Visual attention and eye movement control during reading and picture viewing. In: K. Rayner (Ed.), Eye Movements and Visual Cognition. New York: Springer-Verlag.
Henderson, J.M. (1993). Eye movement control during visual object processing: Effects of initial fixation position and semantic constraint. Canadian Journal of Experimental Psychology, 47, 79-98.
Henderson, J.M. (1996). Visual attention and the attention-action interface. In: K. Aikens (Ed.), Perception: Vancouver Studies in Cognitive Science (Vol V). Oxford: Oxford University Press.
Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 417-429.
Henderson, J.M. and Ferreira, F. (1993). Eye movement control in reading: Fixation measures reflect foveal but not parafoveal processing difficulty. Canadian Journal of Experimental Psychology, 47 (Special Issue), 201-221.
Henderson, J.M. and Hollingworth, A. (1997). Eye movements during viewing of line drawings, color photographs, and computer renderings of natural scenes. Unpublished data.
Henderson, J.M., Pollatsek, A. and Rayner, K. (1989). Covert visual attention and extrafoveal information use during object identification. Perception and Psychophysics, 45, 196-208.
Henderson, J.M., Weeks, P.A., Jr. and Hollingworth, A. (1999). Eye movements during scene viewing: Effects of semantic consistency. Journal of Experimental Psychology: Human Perception and Performance, in press.
Koch, C. and Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219-227.
Loftus, G.R. and Mackworth, N.H. (1978). Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4, 565-572.
Mackworth, N.H. and Morandi, A.J. (1967). The gaze selects informative details within pictures. Perception and Psychophysics, 2, 547-552.
Mahoney, J.V. and Ullman, S. (1988). Image chunking: Defining spatial building blocks for scene analysis. In: Z. Pylyshyn (Ed.), Computational Processes in Human Vision: An Interdisciplinary Perspective. Norwood, NJ: Ablex.
Matin, E. (1974). Saccadic suppression: A review and an analysis. Psychological Bulletin, 81, 899-917.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667-682.
McConkie, G.W. (1979). On the role and control of eye movements in reading. In: P.A. Kolers, M.E. Wrolstad and H. Bouma (Eds.), Processing of Visible Language, Vol. 1. New York: Plenum Press, pp. 37-48.
O'Regan, J.K. (1992). Optimal viewing position in words and the strategy-tactics theory of eye movements in reading. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer-Verlag, pp. 333-354.
Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. Englewood Cliffs, NJ: Prentice-Hall.
Rayner, K. and Pollatsek, A. (1992). Eye movements and scene perception. Canadian Journal of Psychology, 46 (Special Issue), 342-376.
Saida, S. and Ikeda, M. (1979). Useful visual field size for pattern perception. Perception and Psychophysics, 25, 119-125.
Shiori, S. and Ikeda, M. (1989). Useful resolution for picture perception as a function of eccentricity. Perception, 18, 347-361.
van Diepen, P.M.J., De Graef, P. and d'Ydewalle, G. (1995). Chronometry of foveal information extraction during scene perception. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes, and Applications. Amsterdam: Elsevier.
Vitu, F., O'Regan, J.K., Inhoff, A.W. and Topolski, R. (1995). Mindless reading: Eye-movement characteristics are similar in scanning strings and reading texts. Perception and Psychophysics, 57, 352-364.
Yarbus, A.L. (1967). Eye Movements and Vision. New York: Plenum Press.
CHAPTER 13
Eye Guidance and Visual Search
John M. Findlay and Iain D. Gilchrist
University of Durham
Abstract
The observer in a visual search task is required to look for a specified target over an extended region of the visual field. Eye movements during such a task direct the high resolution region of central vision to different locations. A dominant tradition in work on visual search has given little consideration to eye movement control but instead has used the concept of covert visual attentional movements (redirecting attention without moving the eyes). In this chapter, we argue that this emphasis is misguided. We analyse the results of search experiments to demonstrate that, when the eyes are free to move, no additional covert attentional scanning occurs. We show that, unless instructions explicitly prevent eye movements, subjects in a search task show a natural propensity to move their eyes, even in situations where it would be more efficient not to do so. We suggest that the reason for this preference is that in naturally occurring search situations, eye movements form the most effective way of sampling the visual field.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
The topic of visual search brings together converging streams of interest from a wide range of vision specialists. Measurement of search speed has offered an attractive alternative to threshold measures for the investigation of the sensory processes of early vision. Results from search tasks underpin much current thinking about visual attention and visual cognition. The area of search offers a link between experimental laboratories and real world visual tasks such as product inspection. There is a large and diverse literature on the subject within psychology and allied disciplines, but a surprisingly small proportion of this has been concerned with eye guidance during visual search, the topic of the current chapter. How has such a paradoxical situation arisen when the process of looking at different potential locations would appear to be an essential feature of searching? One significant reason relates to the massive interest in covert visual attention over the last three decades following the pioneering ideas of Neisser (1967). Visual attention can be directed covertly to different locations in the visual field while the eyes themselves are maintained in a fixed position. Redirection of attention in this way gives clear benefits when the speed and accuracy of visual processing at the attended location are measured. An important review of this work (Posner, 1980) occurred in the same year that Treisman and Gelade (1980) introduced their well known feature integration theory of search in which serial scanning by covert attention played an important role.
In this chapter we shall develop an argument which emphasizes the vital role of eye movements in visual search. We first present a brief review of feature integration theory emphasizing two of its key assumptions.
Following this we discuss some recent work addressing the control of eye movements in visual search, which we argue should lead to a re-evaluation of the importance of these overt movements and a corresponding reduction in the emphasis on covert attentional scanning.
Feature integration theory
The theory of Treisman and Gelade (1980) distinguished between parallel and serial processes in search. Certain search tasks seem easy and effortless, for example detecting a red item in a display in which no other red item is present. Other search tasks seem more arduous and time consuming. Treisman and Gelade showed how these distinctions could be made experimentally precise using accurate measurements of the time needed to carry out the search. Typically, an experimental subject is presented with a display consisting of a number of non-target (distractor) items. Some displays also contain a target and the task of the subject is to make a present/absent decision depending on the presence or absence of the target. A
Eye guidance and visual search
search function is then plotted, measuring the way in which the time required for the search varies as the number of distractor items is varied. The easy tasks showed a flat search function with no increase in search time when the distractor number was increased. Typically, flat search functions occur when the target is distinguishable from the distractors on the basis of a single feature, such as its colour. In such tasks search appears to be carried out in parallel across the whole visual field and the target 'pops out' effortlessly. In contrast, other tasks showed an approximately linear increase of search time as the number of distractors was increased. Linear increases are typically observed when the target cannot be distinguished from the distractors using any single feature but only in the way in which two or more features are combined, for example searching for a red horizontal line in a display for which the distractors include red vertical lines and green horizontal lines. In such a task, the target is defined not by a single feature but by the conjunction of features. Treisman argued that the linear increase came about because encoding of a feature conjunction is an operation which demands attention and the search involved a serial process of covert attentional scanning whereby each item in turn had to be separately examined. Her theory is often referred to as feature integration theory. Feature integration theory contains a number of assumptions and two in particular will be addressed in the following argument. The first assumption is that of spatial homogeneity of processing. All display elements, at no matter what eccentricity they are presented, are assumed to be equivalent and all parts of the display are assumed to be equally accessible to the attentional process.
Although this attentional process affords the temporary functional differentiation of a localised region for feature integration, the neuro-anatomical structural differentiation of foveal and peripheral visual regions of vision is not incorporated into the theory. The second assumption concerns rapid attentional movements. It is assumed that the focus of attention can be deployed readily and rapidly amongst the display elements. These rapid attention shifts are held to be covert, i.e. they occur at an internal processing level and are not directly observable. Overt eye movements play no role in this account of the search process. The theory renders them superfluous since attention can be deployed more rapidly with the covert process. Explicit justification of the neglect of overt eye movements has appeared in some discussions. Thus, it has been claimed that the parallel and serial search functions characteristic of feature integration theory can be obtained in search tasks when eye movements are eliminated, either with explicit instructions (Klein and Farrell, 1989) or by using brief exposures (Treisman and Gormican, 1988). It should be noted, however, that high error rates occurred in both these studies. Feature integration theory is currently held to be a useful heuristic starting point for theoretical treatments of visual search although it is widely recognised to be
over-simplified (e.g. Treisman and Sato, 1990). Searches for more sophisticated accounts have taken various forms. Duncan and Humphreys (1989) drew attention to the importance of the homogeneity or otherwise of the distractor set. Wolfe has pointed out the weakness of the assumption that search must be either serial or parallel and developed a model of the way in which interactions between serial and parallel processes could occur (Wolfe, 1994; Wolfe, Cave and Franzel, 1989). Pashler (1987) argued that parallel search for feature conjunctions could occur over a limited number of items. However, with a few exceptions that will be noted, there has been no consideration given to the two assumptions described above of spatial homogeneity and of rapid attentional movement. We wish to challenge both these assumptions and present a view of visual search which assigns a considerably more significant role to overt eye movements.
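The search function described above, and the per-item scanning rate that feature integration theory reads from its slope, can be illustrated with a short sketch. This is a minimal example under stated assumptions: the function name `search_slope` and the reaction times are hypothetical illustrations, not data from any study cited here, and an ordinary least-squares line is only one way of summarising a search function.

```python
def search_slope(set_sizes, mean_rts_ms):
    """Fit a straight line to a search function by ordinary least squares.

    Returns (slope in ms per item, intercept in ms).
    """
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, mean_rts_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mean search times (ms) at four display sizes.
set_sizes = [4, 8, 16, 32]
feature_rts = [450, 450, 450, 450]        # flat function: 'pop-out'
conjunction_rts = [600, 800, 1200, 2000]  # linear increase with set size

feature_slope, _ = search_slope(set_sizes, feature_rts)
conjunction_slope, conjunction_intercept = search_slope(set_sizes, conjunction_rts)
print(f"feature search slope: {feature_slope:.1f} ms/item")
print(f"conjunction search slope: {conjunction_slope:.1f} ms/item")
```

On a serial account, a non-zero slope is interpreted as the time taken by covert attention to examine each item; a slope near zero is taken to indicate parallel processing across the display.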
The inhomogeneity of the visual system
Our immediate apprehension of the pictorial qualities of vision has led to a view, often held implicitly rather than explicitly, that the entire retinal image is analysed by the visual system to construct some form of extended scene representation. It is increasingly recognised that this view is fallacious (Churchland, Ramachandran and Sejnowski, 1994; Grimes, 1996). The visual system is organised so that exquisite detail processing occurs in the central foveal region. Processing ability shows a systematic decline outside this region. There are two factors which contribute to the decline, illustrated in Fig. 1. Measures of visual acuity and contrast sensitivity show a steady deterioration as a target is presented more peripherally (Rovamo, Virsu and Näsänen, 1978; Strasburger, Harvey and Rentschler, 1991; Wertheim, 1894). This decline is mainly attributable to the properties of the retino-cortical mapping function (but see Strasburger, Rentschler and Harvey, 1994). In addition to this decline, which is found with isolated stimuli, an additional factor appears when more than one stimulus is present in neighbouring regions of the peripheral visual field. Such stimuli act in a reciprocally interfering manner to produce a further decline in recognition, as illustrated in the study by Bouma (1970, 1978) shown in Fig. 1. This effect, often termed lateral masking, is generally assumed to be structural although some evidence indicates it may be partly influenced by covert attention direction (Geiger and Lettvin, 1986). A recent thorough study (Toet and Levi, 1992) has indicated that individuals differ considerably in the width of the region over which lateral masking occurs and in some cases these regions cover a substantial part of the visual field. A consequence is that, in some situations, a target will not be discriminable unless it is within a restricted region centred on the foveal axis.
This restricted area has been variously termed the conspicuity area (Engel, 1971, 1974), functional
Fig. 1. (a) The decline of visual resolution in the visual periphery. The points show the (reciprocal of) the minimum resolvable letter height in degrees at different locations in peripheral vision. Data replotted from Anstis (1974) with kind permission from Elsevier Sciences Ltd., The Boulevard, Langford Lane, Kidlington, OX5 1GB, UK. (b) Data showing the influence of lateral masking on peripheral detectability. The open circles show the decline in the ability to identify individually presented letters presented briefly in the visual periphery. The filled circles show the decline when the letter is presented simultaneously with two redundant flanking crosses. Data replotted from Bouma (1970) with permission of Macmillan Magazines Limited, Porters South, Crinan St., London N1 9XW, UK. Axis label: eccentricity (degrees).
visual field (Ikeda and Takeuchi, 1975), useful field of view (Bouma, 1978) or visual lobe (Courtney and Chan, 1986). Its size depends on a large number of visual parameters characterising the target and non-target elements in the situation. The area is largely determined by structural factors, although centrally determined factors also play a role. Thus the size of the area shrinks if the subject has an additional central visual task (Ikeda and Takeuchi, 1975) and the area is elongated in the direction of the visual field to which the subject is directing attention (Engel, 1971). If the conspicuity area is smaller than the part of the visual field where the target might occur, there will be occasions on which the target falls outside this. It will then be essential to use eye movements to bring the target within the conspicuity area. Multiple fixations will normally be necessary to perform the search. It might be expected that the size of eye saccades during such visual searches will relate to the size of the conspicuity area although only rarely (Miura, 1990; Widdel, 1983) has the connection been tested in a free search task over an extended display. Some detailed studies of the relationship have occurred in the restricted case of systematic left to right scanning of text-like material (Jacobs, 1986, 1991; Prinz, 1984; Prinz, Nattkempfer and Ullman, 1992; Rayner and Fisher, 1987). One conclusion to emerge is that saccade sizes are generally smaller than expected on the basis of optimum positioning of successive conspicuity areas across the visual display. In other situations, the search target is conspicuous enough to be detected anywhere in the search field. Feature integration theorists argue that in these cases, covert attention can substitute for eye movements and take over the role of attentional scanning. How might this assumption be tested?
When eye movements are permitted in search tasks with conspicuous targets, the number of saccades increases as the number of display elements increases (Binello, Mannan and Ruddock, 1995; Zelinsky and Sheinberg, 1995, 1997), although the former number is always considerably smaller than the latter. Is it the case that in addition to these overt shifts brought about by eye movements, a larger number of covert attention shifts are also occurring with more than one covert shift during each eye fixation? We are sceptical about this suggestion and will argue conversely that when overt eye movements are occurring, no additional covert shifts of attention also take place. The next section reviews work on covert attentional scanning.
Covert attention
Covert attentional processes are readily demonstrated in experimental paradigms (Egeth and Yantis, 1997; Posner, 1980). When the eyes are held stationary but attention is directed covertly to a peripheral location, processing benefits are found to occur in the form of faster responses to, and improved visual discrimination
abilities for, visual targets at the attended location. Conversely, material at unattended locations is subject to processing costs, showing slowed responses and impaired discrimination. The allocation of covert attention can be achieved voluntarily, with instructions to attend to a peripheral location, or automatically, when some visual stimulation occurs at the location immediately preceding the test material. Both forms of attentional allocation generate quite similar patterns of costs and benefits (Müller and Findlay, 1988; Müller and Rabbitt, 1989). While the existence of covert attention is not doubted, many other issues relating to the phenomenon are still controversial. It has not proved easy to specify the relationship between this form of attention and overt eye movements. Two extreme positions can be identified. On the one hand, the covert and overt systems could be entirely independent. Alternatively, the systems might be tightly coupled so that covert attention shifts would involve the preparation to move the eyes but without the release of the motor activity. Intermediate positions between these extremes can also be envisaged. The independence position has been strongly advocated by Klein (1980; Klein, Kingstone and Pontefract, 1992). His arguments depend on results from tasks in which subjects are required to prepare both a saccade and a manual response. He finds no consistent decrease in manual detection latency (the measure of covert attention) at the target location for the saccade. However, one problem with these experiments arises from the introduction of an extra discrimination task to identify the command signal for the required response. Both oculomotor and manual response times are prolonged as a consequence and the results may be unrepresentative of normal attentional functioning. Evidence from tasks not involving such a discrimination component suggests more interdependence.
Shepherd, Findlay and Hockey (1986) required their subjects to initiate a voluntary saccade while at the same time carrying out a reaction time task to assess covert orienting. The critical condition was when a different location to the saccade target was designated (by prior instructions and by stimulus probability manipulation) as the locus for covert orienting. Under these circumstances, the shortest reaction times were nevertheless associated with the location to which the saccade was directed, rather than that to which covert attention was directed. This result indicates that it is not possible to attend to one location and simultaneously move the eye to a different location. A recent replication (Hoffman and Subramaniam, 1995) confirmed the result and showed that it was not dependent on the continuation of the stimulus after the eye movement. This offers support to the alternative suggestion of a tight linkage between the two forms of attention. The most fully formulated proposal of this nature is that by Rizzolatti, who has proposed a pre-motor or oculomotor readiness theory (Rizzolatti et al., 1987; Rizzolatti, Riggio and Sheliga, 1994). In this theory, covert attention uses some of the processes involved in programming an eye movement but the final
motor stage is not completed. A somewhat similar position has been taken by Henderson (1993) who argues that overt eye movements are always necessarily preceded by covert attentional shifts. Further discussion of these issues can be found in Findlay and Walker (1996). A rather different question asks why two evidently similar systems should be in co-existence. Overt movements of the eyes both move visual attention to a new location and also cause the region of the retina with highest resolution to be directed to that location. This latter function is not achieved by covert attentional movements. Can an alternative advantage for the covert form of movement be discovered which is not shared by overt movements? An obvious possibility is that covert attentional shifts can be made more rapidly than eye movements. Evidence about the speed at which covert attention can be shifted has not yet resulted in a clear cut story (Egeth and Yantis, 1997). A peripherally presented cue improves detection of a subsequent target at the same location (Müller and Rabbitt, 1989) with substantial cueing advantages occurring at cue lead times as short as 50 ms. However, it is not certain whether this is a true indication of the speed of attentional movement (Tassinari et al., 1994) nor whether a similar figure is possible for sequential attentional scanning. Feature integration theory postulates a serial attentional scan and thus an indirect estimate of attentional scanning rate can be made from the search function. For example, as all items are assumed to be scanned on target absent trials, the slope of the search function on these trials gives an estimation of the scanning rate. Typically speeds of 50-60 ms per item are derived (Treisman and Gelade, 1980). A different approach, initiated by Bergen and Julesz (1983) has required subjects to take in several items from a multi-item display when these items are cued in rapid sequence. This has come up with figures around 30 ms/item.
The reasonable correspondence of these figures appears convincing but may be illusory. The estimated speed derived from search experiments would be increased if item grouping (Treisman and Gormican, 1988; Wolfe, 1994) plays a significant role in the process or if some degree of parallel processing took place (Pashler, 1987) rather than the strictly sequential, item by item, scanning. Theoretical analyses have shown how a parallel process can, in principle, mimic a sequential process in producing a linear search function (Müller, Humphreys and Donnelly, 1994; Townsend, 1971). The figures derived from rapid sequence experiments are also capable of alternative explanations (Egeth and Yantis, 1997). Other recent empirical work favours a slower rate of covert attentional scanning. Duncan, Ward and Shapiro (1994) presented two spatially separated items in rapid sequence and measured the interfering effects of the first item on the second. Interference reached a maximum with a time difference (stimulus onset asynchrony) of about 200 ms. Duncan et al. concluded that covert attention showed a comparable dwell time to that of eye movements themselves. In the following sections, we analyse data from search
experiments where eye movements are encouraged and find no evidence for the involvement of rapid covert scanning.
Search with eye movements
Relatively few studies have addressed the relationship between eye movements and the search process. Williams (1966) examined search in large displays whose elements varied in colour, size and shape. The target was uniquely identified by a small indicator numeral, only visible when the target was fixated. Williams studied the effects of giving advance information of various target characteristics (colour, size and shape), either singly or in combination. Information about target colour resulted in faster search and most eye movements prior to finding the target landed on non-targets which had the same colour. Pre-specification of size or shape was much less effective, both in speeding the search and constraining the eye scanning. Neither was pre-specification of more than one feature differentially useful. In some ways, these results can be seen as a precursor of the ideas of feature integration. Viviani and Swensson (1982) required subjects to locate a star-shaped target amidst 15 disk-shaped distractors located between 4.1 and 12.7° eccentricity. Targets at small eccentricities were generally located with a single saccade in contrast to those at larger eccentricities where single saccades to the target occurred much more rarely. Viviani and Swensson also noted that quite frequently saccades landed in empty space between target and distractors. This may be a consequence of a well established effect whereby saccades are influenced by visual stimulation over an extended spatial region (Findlay, 1997; Findlay and Gilchrist, 1997; Vitu, 1991). In a subsequent tightly argued, although rather pessimistic, review article, Viviani (1990) urges great caution in interpreting eye movement measures to derive conclusions about cognitive mechanisms. It may be noted that many of his strictures would apply even more forcefully to the construction of theoretical models using only overall search times as a measure.
In a recent research programme (Findlay, 1995, 1997; Findlay and Gilchrist, 1997; Gilchrist, Findlay and Heywood, in press), we have studied the factors affecting the generation of the first saccade in a visual search task. The displays were of the type shown in Fig. 2, consisting of targets of size approximately 1.2° presented at eccentricities of 5.7 and 10.2°. Subjects were asked to move their eyes to the target as rapidly as possible. Saccades were accepted as on target if their direction was within 15° of the target direction and their amplitude was within 2.8° of the target eccentricity. Subjects in this experiment were able to produce an accurate saccade with short latency (ca 230 ms) on over 80% of trials to a target defined by shape. In the majority of the remaining cases, the first saccades were only slightly outside the
Fig. 2. Simple feature search (shape) and feature conjunction search (colour and shape) used in the study by Findlay (1997). The displays were presented with the subject initially fixating centrally and free viewing was allowed. The subjects were instructed to locate the target with their eyes as rapidly as possible.
on-target limit. There is a marked contrast between this achievement of accurately directed saccades to targets defined by shape and the inability to use shape information in the experiment of Williams (1966). The difference is probably attributable to the less cluttered displays we used, designed to minimize lateral masking. We then examined a feature conjunction task, using displays of the type shown in Fig. 2b in which red and green display elements were alternated and subjects were required to find a target specified by both shape and colour. Feature integration theory predicts that a serial attention process would be needed to locate such a feature conjunction target. We thus anticipated that subjects would not be able to locate such a target accurately with a single saccade. One subject indeed performed only slightly above chance level. However three other subjects showed a surprisingly good target location ability. When the target was located within the inner ring of eight elements, between 60 and 70% of first saccades were directed to the target. The mean latency of these saccades was around 230 ms, comparable to that of the simple feature search. Targets in the outer ring generally led to a saccade to a distractor in the inner ring (mean latency 227 ms), but in some cases (20-40% for
different individuals), an accurate on-target saccade occurred with a slightly prolonged latency (250 ms). These results show that retinal position in relation to the fovea needs to be considered much more carefully than has been customary in analyses of visual search. This conclusion has also been noted in other recent pieces of work (Carrasco et al., 1995; Carrasco and Frieder, 1997; Palmer, Ames and Lindsey, 1993). A careful analysis was made of the destination of saccades which did not land on the target. If these saccades had been directed to a non-target distractor arbitrarily, then an estimated 34% would have landed on distractors sharing neither colour nor shape feature with the target. The actual proportions of such cases were much lower (ranging from 5 to 20% across subjects with an average of 12%). In contrast, the number landing on distractors sharing one of the target features was correspondingly increased (for two of the subjects, the shape feature seemed to draw the most eye movements and for the other two the colour feature). This result contrasts with one recently reported by Zelinsky (1996) who also analysed first saccades in a search task which landed on distractors. Zelinsky found that shared features with the target contributed only a small amount to the determination of these saccades. We attribute the difference to the type of display used by Zelinsky in this experiment in which targets and distractors could be at different eccentricities. Proximity to the fovea exerts a powerful influence on saccade choice between multiple targets (Findlay, 1980) and the data shown by Zelinsky suggest that saccades were probably going to a distractor that was close to the fovea. It may be that this foveal proximity effect was strong enough to swamp any effect of shared target features. These results begin to call into question the role of covert attention in the search process. 
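The criterion used in this research programme for accepting a first saccade as on target (direction within 15° of the target direction, amplitude within 2.8° of the target eccentricity) can be sketched in code. This is an illustrative sketch, not the authors' analysis software: the function name `is_on_target`, the use of Cartesian coordinates in degrees of visual angle relative to the initial fixation point, and the treatment of the limits as inclusive are all assumptions made here.

```python
import math


def is_on_target(saccade_xy, target_xy,
                 max_direction_error_deg=15.0,
                 max_amplitude_error_deg=2.8):
    """Return True if a saccade endpoint meets both on-target limits.

    Both points are (x, y) positions in degrees of visual angle,
    measured from the initial central fixation point (an assumption).
    """
    sx, sy = saccade_xy
    tx, ty = target_xy
    saccade_amplitude = math.hypot(sx, sy)
    target_eccentricity = math.hypot(tx, ty)
    saccade_direction = math.degrees(math.atan2(sy, sx))
    target_direction = math.degrees(math.atan2(ty, tx))
    # Wrap the angular difference into [-180, 180) before taking its size.
    direction_error = abs((saccade_direction - target_direction + 180.0)
                          % 360.0 - 180.0)
    amplitude_error = abs(saccade_amplitude - target_eccentricity)
    return (direction_error <= max_direction_error_deg
            and amplitude_error <= max_amplitude_error_deg)


# Target on the inner ring (5.7 deg eccentricity), directly to the right.
target = (5.7, 0.0)
print(is_on_target((5.0, 0.5), target))   # small direction and amplitude error
print(is_on_target((2.0, 0.0), target))   # falls well short of the target
```

A saccade that lands in empty space between display elements would typically fail one of the two limits, which is why a separate analysis of off-target landing positions is needed.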
Two alternative models might be suggested of the process leading to selection of the saccade target (Findlay, 1997). A rapid covert attention scan might take place in a strictly serial item-by-item way with several covert attentional movements during the initial fixation. Some form of deadline compulsion process would need to be invoked to account for the erroneous cases where saccades are made to a distractor rather than to the target. Alternatively, covert attention might play no role, with a purely parallel process leading to saccade destination selection. Under this suggestion, the saccade would be directed to the point of highest salience in some hypothetical 'salience map'. The results offer little support for the first suggestion. If a covert attentional scan formed part of the target selection stage in the feature conjunction task, then a longer latency would be expected than in the simple feature search. In fact the average latency in the two tasks was almost identical. A constraint on modelling is that the selection process must occur within the 250 ms saccade latency period which is of course much faster than the 400-500 ms response time found typically in search tasks requiring manual reactions. Additionally, part of the latency period is taken up with retinal and oculomotor processes. A further comparison is between the latency
Fig. 3. The face search task used by Brown, Huey and Findlay (in press). The task was to locate the face when presented with three jumbled face distractors. The four display elements could be located either on the oblique axes as shown here, or on the four principal axes. Reproduced with permission of Pion Limited, 207 Brondesbury Park, London NW2 5JN.
of eye saccades to isolated single targets (approximately 200 ms under conditions similar to that of the conjunction search task used here). The number of locations at which targets in the feature conjunction task were correctly located was about six on average. If the extra latency delay results from a covert attention scan which must scan six locations, these results point to a very high speed of covert attentional scanning (8 ms per item). Such rates seem too fast to be acceptable and it must be concluded that the second model appears to offer a more plausible alternative. Such an account in terms of parallel processing is also common in discussions of visual search from the viewpoint of neuroscience (Desimone and Duncan, 1995; Duncan, 1996; Schall, 1995).
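The arithmetic behind the scanning-rate estimate can be written out explicitly. The sketch below assumes that the figure is obtained by dividing the extra latency of the conjunction search saccade, relative to a saccade to an isolated target, by the number of locations scanned; the function name is hypothetical, and the 250 ms value is taken from the saccade latency period mentioned in the text rather than from a stated calculation.

```python
def implied_scan_rate_ms(search_latency_ms, single_target_latency_ms, n_locations):
    """Time per item if all the extra latency were spent on a serial covert scan."""
    extra_latency_ms = search_latency_ms - single_target_latency_ms
    return extra_latency_ms / n_locations


# Values from the text: about 250 ms for the conjunction search saccade,
# about 200 ms for a saccade to an isolated target, about six locations.
rate = implied_scan_rate_ms(250.0, 200.0, 6)
print(f"implied covert scanning rate: {rate:.1f} ms per item")
```

Under these assumptions the implied rate is a little over 8 ms per item, several times faster than the 30 to 60 ms per item figures derived from search functions and rapid sequence experiments, which is the basis for rejecting the serial account here.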
A further search task we have studied reinforces the conclusion. Brown, Huey and Findlay (1997) investigated people's ability to move their eyes to a human face target when presented with scrambled face distractors closely matched on visual characteristics (Fig. 3). Naive subjects made short latency eye movements which went to distractors as frequently as to the targets. With a few hours' practice, subjects learned to delay the release of their saccade and directed a much higher proportion of saccades to the face target. An interesting additional finding was that practice was considerably more effective with upright faces than with inverted faces. These results could have come about in two ways. The times involved are such that, in contrast with the feature conjunction case discussed above, it now seems more plausible to suggest that subjects make a covert attentional scan to locate the target before moving the eyes. Saccades after practice had latencies around 500 ms and the successful search could be accounted for on the basis of a prior serial covert scanning process operating at a rate around 50-100 ms/item, a rate consistent with claims for covert attentional scans. It should be pointed out, however, that the data could also be explained by a purely parallel process in which increasing the time available increased the discriminability. Findlay and Gilchrist (1997) have found a speed-accuracy trade-off in a simple feature search task with eye movements showing that, even in a simple search situation permitting parallel processing, evidence for search selection cumulates over time. Although the strategy of delaying the saccadic eye movement could be learned quite easily, no subject adopted this as the normal approach to the task prior to the training. The subjects seemed rather to experience a compulsion to move the eyes quickly. The following section offers a proposal about why it might be the case that subjects adopted this ineffective strategy.
Attention, visual search and visual ecology
The results described in the previous section may be summarised as follows. In our search tasks, we asked subjects to locate the target with their eyes. In some of the tasks, a high proportion of first saccades were accurately directed to the target. This was true for both simple feature searches and for feature conjunction searches although care was taken to use displays which avoided the problems of lateral masking. In these tasks, latencies of the first, target directed, saccade were usually less than 300 ms. For more difficult search tasks, such as the face detection task, it was not possible to locate the target with a single short latency saccade although it was possible to delay the release of the saccade and move the eyes directly to the target. Our analysis of these results concluded that when subjects are encouraged to move their eyes, they do not, in general, make a sequence of prior covert attentional movements in order to select the target. Only in the case of the face search task was
there possible evidence of such a serial use of covert attention. Even in that case, an alternative account of the data could be given. In the face search task the displays were far removed from the cluttered visual scenes characteristic of the way in which search is typically used in everyday life. In this task, the display elements were artificially arranged to be equidistant from the fovea, and to be widely separated in order to reduce lateral masking. Any relaxation of these conditions would increase the difficulty of carrying out whatever search selection process operated in peripheral vision during covert attentional scanning. We reach the conclusion that covert attention in search can only be useful in tasks which possess a peculiar and unusual kind of difficulty. This could explain why subjects are so ready to use a 'shoot first, think later' strategy in relation to the generation of saccadic eye movements during search. As we have shown, an appreciable degree of visual analysis can occur (in parallel over the near visual field) during the typical 200 ms programming time of an eye movement. Situations in which advantageous additional processing can occur when eye movements are delayed would appear to be ecologically rare. Thus for the visual search tasks encountered in everyday life, moving the eyes in order to scan successive locations with the foveal high resolution area will almost always be the most efficient procedure. The readiness of subjects to move their eyes in the laboratory plausibly reflects a carry over effect from their experience in such real-life situations.
The function of covert attention
We suggest that a reappraisal of the role of covert attention in vision is in order. Visual search does not provide a rationale for the existence of the covert attentional mechanism. Search situations in which the use of covert attention is advantageous are artificial and unusual.
Most search tasks will be served better with overt eye scanning, guiding the eye as well as possible from the information which is being processed in parallel over the central regions of the visual field. Additional processes would be needed to ensure that scanning was distributed widely and effectively (Klein, 1988). We felt able to reject with some confidence the possibility that a fast covert scan of attention takes place during the fixations in visual search. This leads to the question of what alternative role might be assigned to covert attentional mechanisms in vision. One possibility is that it allows the scanning of features within an item (Baylis and Driver, 1993) rather than between items. However it should be noted that Blanchard et al. (1984) found no support for the suggestion that a covert attentional scan was taking place during a fixation in reading. An alternative suggestion is that covert attention assists the selection of a target for the control of vergence eye movements during fixation (Erkelens and Collewijn, 1991).
References
Anstis, S.M. (1974). A chart demonstrating variations in acuity with retinal position. Vision Research, 14, 589-592.
Baylis, G.C. and Driver, J. (1993). Visual attention and objects: evidence for hierarchical coding of location. Journal of Experimental Psychology: Human Perception and Performance, 19, 451-470.
Bergen, J.R. and Julesz, B. (1983). Parallel versus serial visual processing in rapid pattern discrimination. Nature, 343, 696-698.
Binello, A., Mannan, S. and Ruddock, K.H. (1995). The characteristics of eye movements made during visual search with multi-element stimuli. Spatial Vision, 9, 343-362.
Blanchard, H.E., McConkie, G.W., Zola, D. and Wolverton, G.S. (1984). Time course of information utilization during fixations in reading. Journal of Experimental Psychology: Human Perception and Performance, 10, 75-89.
Bouma, H. (1970). Interaction effects in parafoveal letter recognition. Nature, 226, 177-178.
Bouma, H. (1978). Visual search and reading: eye movements and functional visual field. A tutorial review. In: J. Requin (Ed.), Attention and Performance VII. Hillsdale, NJ: Erlbaum, pp. 115-147.
Brown, V., Huey, D. and Findlay, J.M. (1997). Face detection in peripheral vision. Do faces pop out? Perception, 26, 1555-1570.
Carrasco, M., Evert, D.L., Chang, I. and Katz, S.M. (1995). The eccentricity effect: target eccentricity affects performance on conjunction searches. Perception and Psychophysics, 57, 1241-1261.
Carrasco, M. and Frieder, K.S. (1997). Cortical magnification neutralizes the eccentricity effect in visual search. Vision Research, 37, 63-82.
Courtney, A.J. and Chan, H.S. (1986). Visual lobe dimensions and search performance for targets on a competing homogeneous background. Perception and Psychophysics, 40, 39-44.
Churchland, P.S., Ramachandran, V.S. and Sejnowski, T.J. (1994). A critique of pure vision. In: C. Koch and J.L. Davis (Eds.), Large Scale Neuronal Theories of the Brain. Cambridge MA: MIT Press, pp. 23-60.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective attention. Annual Review of Neuroscience, 18, 193-222.
Duncan, J. (1996). Cooperating brain systems in perception and action. In: T. Inui and J.L. McClelland (Eds.), Attention and Performance XVI. Cambridge MA: MIT Press, pp. 549-578.
Duncan, J. and Humphreys, G.W. (1989). Visual search and stimulus similarity. Psychological Review, 96, 433-458.
Duncan, J., Ward, R. and Shapiro, K. (1994). Direct measurement of attention dwell time in human vision. Nature, 369, 313-315.
Egeth, H.E. and Yantis, S. (1997). Visual attention: control, representation, and time course. Annual Review of Psychology, 48, 269-297.
Engel, G.R. (1971). Visual conspicuity, directed attention and retinal locus. Vision Research, 11, 563-576.
Engel, G.R. (1974). Visual conspicuity and selective background interference in eccentric vision. Vision Research, 14, 459-471.
Erkelens, C.J. and Collewijn, H. (1991). Control of vergence: gating among disparity inputs by voluntary target selection. Experimental Brain Research, 87, 671-678.
Findlay, J.M. (1980). The visual stimulus for saccadic eye movements in human observers. Perception, 9, 7-20.
Findlay, J.M. (1995). Visual search: eye movements and peripheral vision. Optometry and Vision Science, 72, 461-466.
Findlay, J.M. (1997). Saccade target selection in visual search. Vision Research, 37, 617-631.
Findlay, J.M. and Gilchrist, I.D. (1997). Spatial scale and saccade programming. Perception, 26, 1159-1167.
Findlay, J.M. and Walker, R. (1996). Visual attention and saccadic eye movements in normal human subjects and in patients with unilateral neglect. In: W. Zangemeister, H.S. Stiehl and C. Freska (Eds.), Visual Attention and Cognition. Advances in Psychology, Vol. 116. Amsterdam: North-Holland, pp. 95-114.
Geiger, G. and Lettvin, J. (1986). Enhancing the perception of form in peripheral vision. Perception, 15, 119-130.
Gilchrist, I.D., Findlay, J.M. and Heywood, C.A. (in press). Surface and edge information for spatial integration: a saccadic-selection task. Visual Cognition.
Grimes, J. (1996). On the failure to detect changes in scenes across saccades. In: K. Akins (Ed.), Perception. New York: Oxford University Press, pp. 89-110.
Henderson, J.M. (1993). Visual attention and saccadic eye movements. In: G. d'Ydewalle and J. Van Rensbergen (Eds.), Perception and Cognition. Advances in Eye Movement Research. Amsterdam: Elsevier, pp. 37-50.
Hoffman, J.E. and Subramaniam, B. (1995). Saccadic eye movements and visual selective attention. Perception and Psychophysics, 57, 787-795.
Ikeda, M. and Takeuchi, R. (1975). Influence of foveal load on the functional visual field. Perception and Psychophysics, 18, 255-260.
Jacobs, A.M. (1986). Eye movement control in visual search: how direct is visual span control? Perception and Psychophysics, 39, 47-58.
Jacobs, A.M. (1991). Eye movements in visual search: a test of the limited cognitive effort hypothesis and an analysis of the search operating characteristic. In: R. Schmid and D. Zambarbieri (Eds.), Oculomotor Control and Cognitive Processes. Normal and Pathological Aspects. Amsterdam: North-Holland, pp. 397-410.
Klein, R. (1980). Does oculomotor readiness mediate cognitive control of visual attention? In: R.S. Nickerson (Ed.), Attention and Performance VIII. Hillsdale NJ: Erlbaum, pp. 259-276.
Klein, R. (1988). Inhibitory tagging facilitates visual search. Nature, 324, 430-431.
Klein, R. and Farrell, M. (1989). Search performance without eye movements. Perception and Psychophysics, 46, 476-482.
Klein, R., Kingstone, A. and Pontefract, A. (1992). Orienting of visual attention. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer-Verlag, pp. 46-65.
Miura, T. (1990). Active function of eye movement and useful field of view in a realistic setting. In: R. Groner, G. d'Ydewalle and R. Parham (Eds.), From Eye to Mind: Information Acquisition in Perception, Search and Reading. Amsterdam: North-Holland, pp. 119-127.
Müller, H.J. and Findlay, J.M. (1988). The effect of spatial attention on peripheral discrimination thresholds in single and multiple element displays. Acta Psychologica, 69, 129-155.
Müller, H.J., Humphreys, G.W. and Donnelly, N. (1994). Search via Recursive Rejection (SERR): visual search for single and dual form conjunction targets. Journal of Experimental Psychology: Human Perception and Performance, 20, 235-258.
Müller, H.J. and Rabbitt, P.M.A. (1989). Reflexive and voluntary orienting of visual attention: time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315-330.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Palmer, J., Ames, C.T. and Lindsey, D.T. (1993). Measuring the effect of attention on simple visual search. Journal of Experimental Psychology: Human Perception and Performance, 19, 108-130.
Pashler, H. (1987). Detecting conjunction of color and form: re-assessing the serial search hypothesis. Perception and Psychophysics, 41, 191-201.
Posner, M.I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Prinz, W. (1984). Attention and sensitivity in visual search. Psychological Research, 45, 355-366.
Prinz, W., Nattkempfer, D. and Ullman, T. (1992). Moment to moment control of saccadic eye movements: evidence from continuous search. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer-Verlag, pp. 108-129.
Rayner, K. and Fisher, D.L. (1987). Letter processing during eye fixations in visual search. Perception and Psychophysics, 42, 87-100.
Rizzolatti, G., Riggio, L., Dascola, I. and Umiltà, C. (1987). Reorienting attention across the horizontal and vertical meridians: evidence in favor of a premotor theory of attention. Neuropsychologia, 25, 31-40.
Rizzolatti, G., Riggio, L. and Sheliga, B.M. (1994). Space and selective attention. In: C. Umiltà and M. Moscovitch (Eds.), Attention and Performance XV. Cambridge MA: MIT Press, pp. 231-265.
Rovamo, J., Virsu, V. and Näsänen, R. (1978). Cortical magnification factor predicts the photopic contrast sensitivity of peripheral vision. Nature, 271, 54-55.
Schall, J.D. (1995). Neural basis of saccade target selection. Reviews in the Neurosciences, 6, 63-85.
Shepherd, M., Findlay, J.M. and Hockey, G.R.J. (1986). The relationship between eye movements and spatial attention. Quarterly Journal of Experimental Psychology, 38A, 475-491.
Strasburger, H., Harvey, L.O. and Rentschler, I. (1991). Contrast thresholds for identification of numeric forms in direct and eccentric view. Perception and Psychophysics, 49, 495-508.
Strasburger, H., Rentschler, I. and Harvey, L.O. (1994). Cortical magnification factor fails to predict visual recognition. European Journal of Neuroscience, 6, 1583-1588.
Tassinari, G., Aglioti, S., Chelazzi, L., Peru, A. and Berlucchi, G. (1994). Do peripheral non-informative cues induce early facilitation of target detection? Vision Research, 34, 179-189.
Toet, A. and Levi, D.M. (1992). Spatial interaction zones in the parafovea. Vision Research, 32, 1349-1357.
Townsend, J.T. (1971). A note on the identifiability of parallel and serial processes. Perception and Psychophysics, 10, 161-163.
Treisman, A. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97-136.
Treisman, A. and Gormican, S. (1988). Feature analysis in early vision: evidence from search asymmetry. Psychological Review, 95, 15-48.
Treisman, A. and Sato, S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human Perception and Performance, 16, 459-478.
Vitu, F. (1991). The existence of a centre of gravity effect during reading. Vision Research, 31, 1289-1313.
Viviani, P. (1990). Eye movements in visual search. Cognitive, perceptual and motor control aspects. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes. Amsterdam: Elsevier, pp. 353-393.
Viviani, P. and Swensson, R.G. (1982). Saccadic eye movements to peripherally discriminated visual targets. Journal of Experimental Psychology: Human Perception and Performance, 8, 113-126.
Wertheim, T. (1894). Über die indirekte Sehschärfe. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 7, 121-187.
Widdel, H. (1981). A method for measuring the visual lobe area. In: R. Groner, C. Menz, D.F. Fisher and R.A. Monty (Eds.), Eye Movements and Psychological Functions: International Views. Hillsdale NJ: Erlbaum, pp. 73-83.
Williams, L.G. (1966). The effect of target specification on objects fixated during visual search. Perception and Psychophysics, 1, 315-318.
Wolfe, J.M. (1994). Guided search 2.0: a revised model of visual search. Psychonomic Bulletin and Review, 1, 202-228.
Wolfe, J.M., Cave, K.R. and Franzel, S.L. (1989). Guided search: an alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419-433.
Zelinsky, G. (1996). Using eye saccades to assess the selectivity of search movements. Vision Research, 36, 2177-2187.
Zelinsky, G. and Sheinberg, D. (1995). Why some search tasks take longer than others: using eye movements to redefine reaction times. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes and Applications. Amsterdam: North-Holland, pp. 325-336.
Zelinsky, G.J. and Sheinberg, D.L. (1997). Eye movements during parallel-serial visual search. Journal of Experimental Psychology: Human Perception and Performance, 23, 244-262.
CHAPTER 14
Prefixational Object Perception in Scenes: Objects Popping Out of Schemas
Peter De Graef
University of Leuven
Abstract
Semantic influences of context on the ease of object identification in real-world scenes are commonly accepted, but when eye movements are taken into account the unanimity dwindles. The question is whether object-in-scene semantics only come into play when the object is foveated or whether they already have an impact during extrafoveal, prefixational object processing, and if so, whether semantic consistency or inconsistency would enhance extrafoveal processing. A theoretical framework (mismatch theory) is borrowed from reading and word recognition to support the hypothesis that both consistency and inconsistency may facilitate extrafoveal processing. Two earlier studies of context-sensitive object identification in scenes are reanalysed to provide an initial test of the validity of the theoretical framework. Analysis of context effects on gaze shift frequency, gaze shift destination and gaze shift latencies suggests that in the earliest stages of scene exploration scene-inconsistent objects are more salient saccade targets. However, this does not appear to be a pop-out phenomenon based on attentional capture by schema-inconsistent objects, but rather reflects a smaller useful field of view for such objects. If attention is selectively captured at the onset of scene exploration, it appears to be by schema-consistent rather than inconsistent objects.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
When comparing reviews of eye-movement research on real-world scene perception, there is unanimity on one conclusion: objects fixated in a scene in which they are likely to appear are easier to identify than the same objects fixated in an implausible context (Boyce and Pollatsek, 1992a; De Graef, 1992; Henderson, 1992a; Rayner and Pollatsek, 1992). This unanimity is hardly surprising given the repeated finding that plausible objects exhibit shorter first-fixation durations (De Graef, Christiaens and d'Ydewalle, 1990), shorter gaze durations (Antes and Penland, 1981; Friedman, 1979; Loftus and Mackworth, 1978), and shorter naming latencies (Boyce and Pollatsek, 1992b) than implausible objects. As will be documented below, however, less unanimity exists with respect to plausibility effects on the identifiability of objects that are not being fixated. Logically, this implies extrafoveally located objects that have not yet been fixated (i.e., prefixational) or objects the eye has already left behind (i.e., postfixational). In the remainder of the discussion, I will focus on the former category because, to the best of my knowledge, no empirical data are available for the latter.
Prefixational perceptibility: pop-out versus schema-driven?
Empirical data on this issue are not abundant but this has not prevented the emergence of strong claims about the impact of context on the extrafoveal perceptibility of prefixational objects. A recent example is Christie and Klein's (1995) reference to "the well-established phenomenon of unexpected items popping out in natural scenes (Loftus and Mackworth, 1978)" (p. 550). In citing Loftus and Mackworth, Christie and Klein follow a long tradition in which this paper is cited to support the claim that objects that do not belong in a given real-world environment stand out and immediately and inexorably draw the viewer's attention and gaze (e.g., Johnston et al., 1990; Marks, McFalls and Hopkinson, 1992; Pezdek et al., 1989).
Among researchers of scene perception, however, this claim has received little support. Chapter 12 by Henderson and Hollingworth provides an exhaustive overview of the criticisms, so I will limit myself to three points pertinent to the present chapter. First, the Loftus and Mackworth finding that an implausible object in a scene is fixated earlier on in scene exploration than a plausible object at the same location in the same scene has frequently been criticized for a possible confound between implausibility of an object and its visual dissimilarity from the rest of the scene (e.g., Rayner and Pollatsek, 1992; Henderson, 1992a). Second, subsequent eye-movement studies failed to find a fixation precedence for implausible over plausible objects either when they are presented together, at different positions in the same scene
(Friedman and Liebelt, 1981) or when presented separately at the same position in separate copies of the same scene (De Graef, Christiaens and d'Ydewalle, 1990; Henderson, Weeks and Hollingworth, in press). Finally, when extrafoveal target objects need to be discriminated in briefly flashed scenes, performance is consistently better for plausible objects, which appears to indicate the opposite of implausible pop-out (Biederman, Mezzanotte and Rabinowitz, 1982; Boyce, Pollatsek and Rayner, 1989; De Graef and d'Ydewalle, 1995). Based on the ambiguous evidence for attention/gaze capture and on the superior extrafoveal discriminability of plausible objects, several authors have suggested that object plausibility increases the useful field of view (Antes and Penland, 1981; Biederman et al., 1981; Friedman, 1979). In other words, the perceptibility of prefixational objects in real-world scenes is enhanced when they belong in the scene. Following Friedman (1979), the main mechanism underlying this enhancement is the activation of a scene-specific schema stipulating the gross features of the objects likely to be present in the scene. For instance, activation of a kitchen schema could sensitize the visual system towards detection of a refrigerator, defined as a large, upright, shiny and white, brick-shaped object. Wolfe and Bennett (1997) recently argued that features of this type (with the possible exception of shape) can be preattentively segmented from a scene and correctly assigned to an object. According to Friedman (1979), the mere detection of such a schema-specified bundle of features may in itself be sufficient to trigger object identification. Alternatively, the feature-bundle may constitute a salient target for an attention shift followed by enhanced data-driven processing at the attended location (Antes and Kristjanson, 1993; Antes and Penland, 1981). 
Either way, prefixational perceptibility will be greater than that of objects which are not part of the activated schema and whose diagnostic features need to be extracted and bound in a foveal analysis of high-resolution detail. From the above, it should be clear that two quite different views are being advocated with respect to the context-sensitivity of prefixational object perception in real-world scenes: schema-driven facilitation of context-consistent objects on the one hand, and perceptual pop-out of context-inconsistent objects on the other hand. Surprisingly, these two views are primarily held as convictions and no systematic attempts have been made either to reconcile them or unequivocally decide between them. This chapter does not pretend to put an end to this state of affairs. Rather, I would like to set the stage for future experimental investigation of this neglected issue in scene research. Specifically, I will indicate some potentially relevant hypotheses developed in research on word recognition and reading. As a preliminary test of their applicability to prefixational object perception, they will be pitted against previously unanalyzed eye-movement data which I collected in an ongoing series of studies aimed at establishing the boundary conditions under which object fixations in scenes exhibit context effects.
A mismatch theory of prefixational perceptibility: pop-out and schema-driven
In an extended series of experiments, Johnston and colleagues presented viewers with brief flashes (33-400 ms) of four-item arrays containing either words (Johnston et al., 1990; Johnston, Hawley and Farnham, 1993) or nonsense symbol-strings (Hawley, Johnston and Farnham, 1994). Following a mask, viewers were given a probe word or string and were asked to indicate where it had been located in the array. Since the arrays subtended about 6 × 4° and exposure durations allowed for one eye movement at most, perceptual processing in this task had to be primarily prefixational, which explains the potential relevance of these studies to our current topic of discussion. Analysis of the percentage of correct localizations showed three basic effects. First, the baseline effect, that is, superior localization in all-familiar arrays (which repeatedly combined the same items) than in all-novel arrays (which consisted of previously unseen items). Second, novel pop-out, defined by better localization of single novel objects in otherwise familiar arrays (i.e., one-novel arrays) than for the same objects in all-novel arrays. Third, familiar sink-in, reflected in lower accuracy for familiar objects in a one-novel array than in all-familiar arrays. In order to explain these findings coherently, Johnston and Hawley (1994) argue that the effects show that schema-driven perception and pop-out need not be mutually exclusive. In fact, they consider it to be an essential characteristic of the adaptive mind to refrain from superfluous data-driven processing of predictable input and at the same time remain alert for novel input. According to Johnston and Hawley's mismatch theory, this is achieved by an automatic learning process through which repeated co-occurrence of the same items in an array fuses the items into a unitized array. This process has three consequences for future encounters with the unitized arrays.
Specifically, during the initial glance parallel processing of array items will very rapidly produce (1) increased concept-driven processing of items that match expectations, (2) suppressed data-driven processing of the same expected items, and (3) enhanced data-driven processing of items that do not match expectations. At first sight, this computational model of perceptual facilitation and inhibition in experimentally learned arrays covers a great deal of the available data for experientially learned scenes. First, increased concept-driven processing of expected inputs explains why plausible objects require shorter fixation and naming times in extended scene viewing and exhibit superior prefixational detection and forced choice recognition in briefly flashed scenes. Moreover, the limitation of the hypothesized processing effects to unitized arrays would explain why no plausible advantage is found when object arrays are used instead of full scenes (Antes and Penland, 1981; Antes, Penland and Metzger, 1981; Biederman et al., 1988; De Graef, 1990; Henderson, Pollatsek and Rayner, 1987). Second, suppressed data-driven processing of expected items would explain inferior memory for the presence, location and featural details
of plausible objects in real-world scenes (Friedman, 1979; Henderson and Hollingworth, 1997; Pezdek et al., 1989). Finally, enhanced data-driven processing of unexpected items could account for a fixation precedence for single implausible objects in scenes (Loftus and Mackworth, 1978) under the assumption that attention/gaze shifts to a particular location are a consequence rather than the cause of heightened input processing at that location (Dark, Vochatzer and Van Voorhis, 1996; Hoffmann, 1987; Johnston and Hawley, 1994). The disappearance of the implausible fixation precedence in non-unitized arrays of episodically related objects (Antes and Penland, 1981; De Graef, 1990) is also consistent with mismatch theory. In spite of its intuitive appeal as a comprehensive account of plausibility effects in scene perception, mismatch theory still does not seem to resolve the controversy surrounding prefixational object perceptibility. The theory appears to predict earlier fixation and superior extrafoveal discrimination of implausible objects, and neither prediction is confirmed. However, the solution to this problem may lie in a recent exchange about the reliability of novel pop-out in word arrays. Specifically, Christie and Klein (1996) criticized the work by Johnston and colleagues for failing to establish replicable and unconfounded within-array novel pop-out, that is, better localization of the novel than of the familiar objects in one-novel arrays. Johnston and Schwarting (1996) conceded that within-array novel pop-out is a somewhat elusive effect because its emergence depends on the relative magnitude of all three perceptual processing effects defined above. Specifically, if concept-driven facilitation of expected items is too large to be outweighed by suppression of expected-input processing and enhancement of unexpected-input processing, there will still be a perceptibility advantage for the expected inputs in one-novel arrays.
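The dependence of within-array pop-out on the relative size of the three processing effects can be illustrated with a small numerical sketch. It is not part of mismatch theory as published: the additive combination rule, the arbitrary units, the names (`ProcessingEffects`, `expected_item`, `novel_singleton`) and the assumption that an item in an all-novel array simply receives baseline processing are all hypothetical choices made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingEffects:
    """Hypothetical effect sizes in arbitrary perceptibility units."""
    baseline: int              # perceptibility without any expectation effects
    concept_facilitation: int  # (1) concept-driven boost for expected items
    data_suppression: int      # (2) data-driven suppression of expected items
    data_enhancement: int      # (3) data-driven boost for unexpected items


def expected_item(e: ProcessingEffects) -> int:
    """Expected (familiar or plausible) item in a one-novel array."""
    # Assumption: the effects combine additively.
    return e.baseline + e.concept_facilitation - e.data_suppression


def novel_singleton(e: ProcessingEffects) -> int:
    """Unexpected item in a one-novel array."""
    return e.baseline + e.data_enhancement


def novel_in_all_novel(e: ProcessingEffects) -> int:
    """Same item in an all-novel array (assumed to get baseline processing)."""
    return e.baseline


def within_array_popout(e: ProcessingEffects) -> bool:
    """Novel item beats its familiar companions in the same array."""
    return novel_singleton(e) > expected_item(e)


def between_array_popout(e: ProcessingEffects) -> bool:
    """Novel singleton beats the same item shown in an all-novel array."""
    return novel_singleton(e) > novel_in_all_novel(e)


# Illustrative values only: a strongly and a weakly facilitating context.
strong_schema = ProcessingEffects(50, 30, 10, 15)
weak_schema = ProcessingEffects(50, 10, 10, 15)

for name, effects in (("strong", strong_schema), ("weak", weak_schema)):
    print(name, within_array_popout(effects), between_array_popout(effects))
```

Under these made-up values the strongly facilitating context hides the within-array advantage of the novel item while the between-array comparison still reveals it, which is the pattern the verbal argument describes.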
Hence, Johnston and Schwarting argue, between-array novel pop-out is a better diagnostic. If unexpected inputs in one-novel arrays do draw the focus of data-driven processing, then perceptibility of the novel singleton should be superior to that of the same object in an all-novel array. When applied to scene perception, the within-array vs. between-array distinction may resolve the schema vs. pop-out debate. Specifically, it suggests that only in scenes with low to moderate conceptual facilitation of the plausible objects in them may one observe fixation precedence and superior extrafoveal discrimination of the implausible intruder relative to its plausible companions in the scene. In scenes with a strong conceptual facilitation of plausible objects, a perceptual advantage for the implausible object may disappear or reverse: either the extrafoveal detection of schema-specified object features allows superior prefixational identification of plausible objects, or it more effectively segments the plausible object from its background and thus provides a more salient target for an attention shift. Hence, the repeated failures to replicate the Loftus and Mackworth (1978) finding of an implausible-object fixation precedence need not rule out that object implausibility increases attentional saliency. Instead, the easier segmentation of
plausible objects from the densely packed scenes used in the later studies (De Graef et al., 1990; Friedman and Liebelt, 1981; Henderson et al., 1997) may have swamped an implausible-object pop-out effect which did surface in the sparsely populated scenes that were used by Loftus and Mackworth (1978).
Testing a mismatch theory of prefixational object perceptibility
Future tests of the mismatch account of scene perception will require two levels of research. A first level is to establish the three between-array effects: (1) the baseline effect, i.e., better perceptibility of plausible objects in a scene populated with plausible companion objects (all-plausible scene) than in a scene with only implausible and episodically unrelated companions (all-implausible scene); (2) familiar sink-in, i.e., better perceptibility of plausible objects in an all-plausible scene than in a scene containing an implausible singleton in a company of plausible objects (one-implausible scene); and (3) novel pop-out, i.e., better perceptibility of the implausible singleton in a one-implausible scene than in an all-implausible scene. A second level at which the theory needs to be tested is that of the perceptual processing hypotheses advanced by Johnston and colleagues: do plausible objects exhibit enhanced concept-driven as well as suppressed data-driven processing and is enhancement of data-driven processing limited to implausible objects? It is at this level that I want to introduce some data which appear to question the validity of mismatch theory. Christie and Klein (1995) presented subjects with arrays of two extrafoveally located letter strings, one a regular word, the other an unpronounceable nonword. After a brief exposure of 100, 200, or 400 ms, one of the strings was shifted up or down and the subject was to report the direction of the shift as quickly and accurately as possible.
The important finding in this study was that shift detection for the regular words was superior to that for non-words, but only for the 100 and 200 ms exposures. Christie and Klein conclude that this shows an initial period of enhanced data-driven processing of familiar items caused by a rapid capture of attention. Further support for processing precedence for familiar items can be found in the work of Hoffmann (1987), who demonstrated that in arrays of extrafoveally located objects, the pictures with fast basic-level categorization times (approx. 485 ms) invariably drew more attention than the pictures with slower categorization times (approx. 510 ms). Similarly, Dark et al. (1996) reported that in 100 ms exposures of two-word arrays, attention was captured by the extrafoveal word that was semantically related to a central prime presented just prior to array-exposure. These data indicate that the prevailing activation level of a lexical or conceptual node strongly influences the perceptual system's reactivity to any evidence in the visual field for that node's referent. Applied to the domain of prefixational object
perceptibility this suggests that not the implausible but the plausible objects should initially capture attention and data-driven processing because their conceptual representations receive a surplus activation from context.
Initial attention capture: evidence from the wiggle paradigm
In order to determine to what extent scene-context effects on object identification are caused by inter-object priming and/or object-in-scene plausibility, De Graef, De Troy and d'Ydewalle (1992) conducted a study in which viewers were asked to search for non-objects embedded in a black-on-white line drawing of a real-world scene. While participants could freely explore the scene, we did attempt to steer their first saccade from a designated prime object to a designated target object by means of a technique borrowed from Boyce and Pollatsek (1992b). As shown in Fig. 1, each trial started with a fixation cross on a blank CRT-screen. When participants fixated the cross for at least 200 ms, as registered by a Generation 5.5 dual-Purkinje-image eye tracker, the stimulus control program automatically initiated an 8-s exposure of a scene. Stimuli always contained an object at the former location of the fixation cross (the prime object) as well as a peripherally located object which rapidly moved up and down after 160 ms of scene exposure (the target object). By introducing this target wiggle during fixation of the prime we hoped to elicit an automatic orienting response from prime to target. While we were primarily interested in target fixation parameters as a function of the target's relation to the background and the prime, this study now provides us with some data on prefixational target perceptibility. Specifically, because the wiggle started at a 160 ms delay from scene onset we can test the notion of an initial processing capture by the conceptually activated scene-consistent objects. Following Christie and Klein's (1995) logic we can assume that if plausible objects initially capture attention, an early wiggle of a plausible target should be easier to detect. Because our participants were not informed about the wiggle an implicit measure of wiggle detection needed to be computed.
For this purpose, I selected all trials on which the wiggle occurred during the first scene fixation, which in this experiment always fell on the prime object. On these trials, two groups of indexes of ocular reactivity were computed. The first group pertained to the saliency of the wiggled object as a saccade target and included three measures: (1) the proportion of trials on which the viewer's gaze moved directly from the prime to the target (direct hits); (2) the proportion of trials on which the target was left unfixated (skips); and (3) the number of intervening fixations on the remaining trials, where the gaze did not shift directly from prime to target (delayed hits). The second group measured the degree to which the target wiggle interrupted ongoing prime processing. This was operationalized by computing fixation parameters for the prime: first fixation duration, first gaze duration (the sum of consecutive fixation durations before the eye first leaves the prime), and first-pass refixations (the number of consecutive fixations in the first gaze).

Fig. 1. Course of a trial in the wiggle paradigm. Following a fixation of at least 200 ms on a fixation cross, an 8-s scene exposure is automatically initiated, resulting in fixation on the prime object (circled for expository purposes). After a 160 ms delay, a 120 ms wiggle is started in which a designated target object moves up and down twice with an amplitude of 4 min of arc (arrow illustrates motion). The wiggle is intended to elicit a saccade from prime to wiggled target (example of ideal saccade superimposed on scene).

The data relevant to the present discussion were collected from 12 viewers for two stimulus conditions: one in which the wiggled target was plausible, as were all the accompanying objects, and one in which it was an implausible singleton. As illustrated in Fig. 2, stimuli were constructed by inserting each of 20 target objects in two different contexts (one plausible, one implausible) at approximately the same distance from the initial scene fixation. Thus, targets wiggled at an average eccentricity of 7.5°, which did not differ between plausible and implausible targets, as verified in a targets × target plausibility analysis of eccentricities (F(1,19) = 0.27). A more detailed description of the stimuli can be found in De Graef et al. (1992).

Fig. 2. Example of plausible (right panel) and implausible (left panel) conditions for the target rolling pin. Scene exploration for these stimuli would start at the computer in the implausible office background, and at the blender in the plausible kitchen background.

Proportions of direct hits and skips were analyzed in a subjects × target plausibility repeated-measures ANOVA. As can be seen in Table 1, target plausibility did not affect the proportion of direct hits, F(1,11) = 0.1. There was a tendency to skip the plausible targets more often, but it was not reliable, F(1,11) = 2.67, p = 0.13, MSe = 0.009. In addition, a subjects × target plausibility repeated-measures ANOVA on delayed hits only showed no reliable effect on the number of fixations required to complete a gaze shift from prime to target, F(1,11) = 0.04. Prime fixations were analyzed in a subjects × target plausibility × gaze shift type (direct hit vs. skip vs. delayed hit) repeated-measures ANOVA. Table 2 shows that when an implausible target was wiggled extrafoveally, the primes received reliably shorter first gazes (F(1,11) = 11.15, p < 0.007, MSe = 63,789) and fewer fixations in the first gaze (F(1,11) = 11.89, p < 0.006, MSe = 0.386).
This effect was qualified by a target plausibility × gaze shift type interaction (F(2,19) = 6.14, p < 0.009, MSe = 34,085, for gaze durations; F(2,19) = 3.05, p < 0.071, MSe = 0.342, for first-pass refixations). Specifically, prime fixations preceding a direct gaze shift towards the
Table 1. Effects of wiggled target's plausibility on direct target hits, target skips, and number of intervening fixations preceding delayed target hits (wiggle SOA = 160 ms)

                            Target
                            Plausible    Implausible
Direct target hits (%)      35.0         35.8
Target skips (%)            17.5         11.1
Prime-target delays         3.57         3.41
Table 2. Ongoing prime fixations as a function of wiggled target's plausibility and type of subsequent gaze shift (SOA = 160 ms)

                            Plausible target                       Implausible target
Gaze shift type             Direct hits  Skips  Delayed hits       Direct hits  Skips  Delayed hits
First fixation (ms)         392          351    348                375          377    321
First gaze (ms)             428          612    503                426          402    376
First-pass refixations      1.22         1.69   1.60               1.18         1.16   1.31
target were unaffected by the target's plausibility, while prime fixations preceding a delayed hit or a skip of the target were longer when the target was plausible. While the more frequent and earlier fixation of implausible objects indicates that they are more salient saccade targets than plausible objects, there is no evidence that this is caused by a greater noticeability of the implausible-target wiggle which in turn is the result of an immediate capture of attention. To the contrary, target plausibility had no effect on the frequency and speed of those gaze shifts which are most likely to be directly elicited by the wiggle, that is, the direct hits. That target plausibility did affect prime fixations preceding a delayed target hit or a target skip is consistent with the notion that plausible targets are easier to process extrafoveally. The underlying rationale is that the latency of a gaze shift away from the prime is determined by the rate at which foveal prime and extrafoveal target information are acquired: As long as this rate does not drop below a minimum criterion, the gaze will remain on the prime. Because such a long prime gaze is
partially caused by extensive extrafoveal target processing, the need for foveal target analysis is reduced, resulting in a subsequent delay or even cancellation of target fixation. Note that this argument is entirely consistent with explanations of similar data patterns in reading and word recognition. Specifically, Kennedy (Chapter 7) reports that gaze durations on a foveal word are affected by the length and initial trigram frequency of a simultaneously present parafoveal word. Kennedy interprets this as evidence for a model of eye-movement control in which the rate of foveal and parafoveal information intake is a determinant of when the eye will move. Further support for a relation between long gaze durations and extensive extrafoveal processing is found in reading, where the skipping rate of words was found to increase when they were highly predictable from the preceding sentence (Balota, Pollatsek and Rayner, 1985; Rayner and Well, 1996). This skipping effect was attributed to extrafoveal identification of predictable words because fixations on the preceding word were much longer, presumably due to the additional extrafoveal processing and the subsequent reprogramming of the next saccade to go beyond the identified word.

So far, the wiggle paradigm has yielded indications of a smaller useful field of view for implausible targets, resulting in a greater need for foveal analysis. No evidence was found of a greater responsiveness to implausible-target wiggles caused by a stronger capture of attention. However, average speed and frequency of direct gaze shifts to the wiggling target may not be the best measure of wiggle responsiveness, because direct hits can result from two categories of gaze shifts: reactive shifts exogenously controlled by the wiggle, and active shifts endogenously controlled by the visual system's need for foveal analysis.
Clearly, the former type would be most informative with respect to wiggle responsiveness: objects that more frequently capture attention prior to the wiggle should elicit a greater proportion of truly reactive shifts towards the wiggle. One way to distinguish between active and reactive gaze shifts is to look at time-locked effects of the wiggle on the distribution of single prime fixations. Previous research showed a disruptive effect of an abrupt visual onset on the duration of the ongoing fixation with a minimum delay of about 90 ms (Blanchard et al., 1984; McConkie et al., 1985) and a maximum impact after about 120 ms (van Diepen, De Graef and d'Ydewalle, 1995). Given its constant timing, this effect on eye movements can be interpreted as a reflex-like orienting response to the onset. Assuming that the wiggle is a kind of abrupt visual onset, this means that the distribution of all single prime fixations of at least 160 ms should show a time-locked rise starting at wiggle SOA + 90 (= 250 ms) and peaking at wiggle SOA + 120 (= 280 ms). Such a peak in the distribution of gaze shift latencies would indicate a sudden increase in the likelihood of a reflexive shift away from the prime in response to the target wiggle. Figure 3 shows the distributions as a function of
Fig. 3. Relative frequency distribution of gaze shift latencies away from the prime as a function of the wiggled target's plausibility. All distributions are based on single-fixation prime gazes of at least 160 ms. Bin-size is 20 ms and the graphs plot the midpoints of the bins. Dark bands indicate hypothetical location for reflexive (240-320), fast voluntary (340-400), and delayed voluntary (420-460) gaze shift distributions.
target plausibility. As expected, both distributions show a sudden rise in the 240-260 ms bin (i.e., wiggle SOA + 90 ms) which peaks in the 280-300 ms bin (i.e., wiggle SOA +120 ms). The peak is somewhat larger for the implausible targets, which would be consistent with the notion of attentional capture by implausible objects. Much more striking, however, is the plausibility effect in the remainder of the distributions. While both distributions show a second peak between 340 and 400 ms, only the plausible-target distribution has a third peak between 420 and 460 ms. The multimodality of the distributions suggests that perhaps they are a multinomial mixture of three different underlying distributions, centered in the shaded areas in Fig. 3. Yantis, Meyer and Smith (1991) argue that a statistical test of this hypothesis requires pure samples of the underlying basis distributions, which in turn requires a theory of the different processes giving rise to the different distributions. Only then can one set up the experimental conditions under which pure samples of each distribution are most likely to be obtained. Obviously, this is beyond the scope of the present chapter which merely reanalyses earlier data. At this stage, I can only speculate about the origin of the observed distributions. The basis for this speculation is an inspection of the locations targeted by the gaze shifts. Specifically, I first computed the distribution of direct prime-to-target shifts along the gaze shift latency continuum. As can be seen in Fig. 4, direct hits (0 intermediate fixations) occurred most frequently following a gaze shift in the 340-400 ms region. A further delay of a direct gaze shift to the target was less likely
Fig. 4. Relative frequency distribution of latencies to shift gaze from prime to target. Interval 0 distributions plot direct prime-to-target shifts as a function of target plausibility, interval 0+1 distributions do the same for prime-to-target shifts with maximum 1 intervening fixation. Construction of distributions and time bands is identical to Fig. 3.
if the target was implausible. For plausible targets, however, there was a second large concentration of direct hits following gaze shifts with latencies between 420 and 460 ms. Surprisingly, there were relatively few direct hits following gaze shifts in the 240-320 ms region which presumably includes reflexive orienting responses to the target wiggle. A visual inspection of scanpaths suggested, however, that this may be a matter of accuracy: Fast saccades away from the prime appeared to frequently just miss the target and were then followed by a quick corrective saccade landing on the target. When we include these 'corrected' hits (1 intermediate
Fig. 5. Relative frequency distribution of gaze shift latencies away from the prime as a function of the wiggled target's plausibility and the destination of the gaze shift. Top panel plots gaze shifts that landed on the target after maximum 1 intervening fixation, bottom panel plots all other gaze shifts. Construction of distributions and time bands is identical to Fig. 3.
fixation), the likelihood of a reactive hit becomes comparable to that of later hits, demonstrating that the faster gaze shifts were indeed less accurate. Figure 5 (top panel) plots the resulting distributions of direct + corrected hits as a function of target plausibility and compares them with the distributions of gaze shifts to non-target locations (bottom panel). The first thing to note is that the greater proportion of reflexive shifts in the overall implausible-target distribution of Fig. 3 reflects more frequent gaze shifts to both target and non-target locations in that time band. This could mean that,
regardless of where attention is allocated, peripheral motion of an implausible object is always a more noticeable event, perhaps because of its presumed featural dissimilarity from the scene (Rayner and Pollatsek, 1992). Consequently, the distributions provide no strong evidence of meaning-based attentional capture by implausible objects, particularly when one takes into account the slightly greater density of plausible-target directed shifts at the very beginning of the reflexive time band (i.e., at wiggle SOA + 90 = 250 ms). What the distributions do show is that gaze shifts in the 340-400 ms time band are preferentially directed towards the target: regardless of target plausibility, the proportion of target-directed shifts peaks in this region, while that of non-target shifts remains constant or drops. Apparently, target saliency was at its maximum at this point, and the timing coincides with that of voluntary saccades measured for stimuli identical to the line drawings used in the present experiments. Specifically, for viewers that were freely exploring these scenes for purposes of memorization or object search, Henderson et al. (in press) reported unimodal fixation duration distributions peaking at 210-220 ms. Thus, the 370 ms peak in the present distributions can be interpreted as a population of voluntary saccades towards objects that require further foveal analysis after their presence has been signalled by their motion 210 ms earlier (i.e., at the wiggle SOA of 160 ms). Finally, for plausible targets only, gaze shifts to both targets and other locations show a pronounced frequency increase in the 420-460 ms range. The effect is strongest for shifts to non-target locations, which show a steep recovery after a period of infrequency due to greater target saliency.
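The latency distributions discussed here are relative frequency histograms with 20-ms bins plotted at their midpoints and restricted to prime gazes of at least the wiggle SOA. The following Python fragment is only a minimal sketch of that construction; the function name, the half-open bin convention, and the example latencies are illustrative assumptions and are not taken from the original analysis.

```python
from collections import Counter
from typing import Dict, Iterable


def latency_histogram(latencies_ms: Iterable[float],
                      min_latency_ms: float = 160,
                      bin_ms: int = 20) -> Dict[float, float]:
    """Relative frequency of gaze shift latencies per bin, keyed by bin midpoint.

    Assumptions (not from the source): bins are half-open intervals
    [start, start + bin_ms) aligned on multiples of bin_ms, and latencies
    shorter than min_latency_ms (the wiggle SOA) are discarded.
    """
    kept = [t for t in latencies_ms if t >= min_latency_ms]
    # Map every latency to the lower edge of its bin, then count per bin.
    counts = Counter((t // bin_ms) * bin_ms for t in kept)
    total = len(kept)
    return {start + bin_ms / 2: n / total for start, n in sorted(counts.items())}


# Hypothetical latencies, for illustration only (not data from the chapter).
example = latency_histogram([250, 255, 290, 370, 150])
```

With these made-up values the 150 ms latency is dropped, and the remaining four fall into the 240-260, 280-300 and 360-380 ms bins. Comparing two such histograms, one per plausibility condition, is the kind of visual inspection on which the present argument rests.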
Combined with the earlier finding of long fixation times preceding delayed target hits and target skips (Table 2) this suggests that the third peak is the product of a period of extensive extrafoveal attending to plausible targets, resulting in delayed saccades. Summarizing the above, I would like to speculate that the distributions in Fig. 3 are a mixture of three distinct populations of gaze shifts: (1) in the 240-320 ms region, involuntary, wiggle-evoked, and inaccurate gaze shifts; (2) in the 340-400 ms region, fast, voluntary, target-directed and accurate gaze shifts, prompted by an insufficient rate of extrafoveal target processing; (3) in the 420-460 ms region, slow, voluntary gaze shifts targeting or deliberately skipping peripherally located plausible target objects, delayed because of a high rate of context-supported extrafoveal target processing. As noted before, this hypothetical multinomial mixture of component distributions needs to be confirmed statistically in future research. Pertinent to the present discussion will be the question whether the three mixing proportions are reliably affected by the plausibility of the wiggled target. It seems safe to predict that, relative to plausible targets, implausible targets will prove to elicit more fast voluntary and fewer delayed voluntary gaze shifts, thus indicating that an implausible object constitutes a more salient target for a voluntary saccade because it is more difficult to process extrafoveally. Predictions are less clear with
respect to context effects on the proportion of samples from the reflexive gaze shift population. On the one hand, the reflexive time band displayed a greater overall ocular reactivity to implausible-target wiggles. On the other hand, at the very onset of the reflexive band, gaze shifts specifically directed at the wiggling object were more likely when it was plausible. One way of reconciling these observations is to assume that attention was more likely to be captured by plausible targets at the time of the wiggle, but that the wiggle itself was a more noticeable and disruptive peripheral event when the wiggling object was implausible. The latter would be consistent with the notion of greater featural dissimilarity between implausible objects and their inconsistent environment, while rapid attentional capture by plausible objects would be in line with Christie and Klein's (1995) findings with an explicit motion detection paradigm.

To examine the viability of this interpretation, I analyzed data from a second wiggle study. Again, this study was conducted to examine context effects on the identification of fixated objects and thus provides no direct tests of the hypothesised context effects on prefixational, attentional capture. However, the study did involve two changes from the first experiment which may be informative. First, if attention was indeed initially captured by plausible targets, then this advantage should increase as the wiggle SOA is reduced from 160 ms to 140 ms. The capture effect Christie and Klein (1995) found for familiar targets was strong at 100 ms motion SOA but virtually disappeared by 200 ms, so the 160 ms SOA wiggle in the first study may only have revealed the tail of the plausible capture effect. Second, if implausible-target motion is more noticeable in peripheral vision, then this advantage should be reduced by bringing the moving object closer to the fixation point.
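Both wiggle studies rely on the same trial-level indexes: direct hits, skips, delayed hits, and the fixation parameters for the prime. The Python fragment below is a minimal sketch of how such indexes could be derived from a fixation sequence. The `Fixation` record, the region labels, and the function names are hypothetical and serve only to illustrate the definitions given in the text; they do not describe the software used in the original studies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fixation:
    region: str        # hypothetical labels: "prime", "target", or "other"
    duration_ms: int


def summarize_trial(fixations: List[Fixation]) -> Dict[str, object]:
    """Derive the gaze shift type and the prime fixation parameters of one trial."""
    if not fixations or fixations[0].region != "prime":
        raise ValueError("the first scene fixation is assumed to fall on the prime")

    # First gaze: the run of consecutive prime fixations starting at scene onset.
    n_first_gaze = 0
    for fix in fixations:
        if fix.region != "prime":
            break
        n_first_gaze += 1
    first_gaze = fixations[:n_first_gaze]

    # Fixations made between leaving the prime and first landing on the target.
    # (Assumption: a later return to the prime counts as an intervening fixation.)
    intervening: Optional[int] = None
    for i, fix in enumerate(fixations[n_first_gaze:]):
        if fix.region == "target":
            intervening = i
            break

    if intervening is None:
        shift_type = "skip"          # target left unfixated
    elif intervening == 0:
        shift_type = "direct_hit"    # gaze moved directly from prime to target
    else:
        shift_type = "delayed_hit"   # target reached after intervening fixations

    return {
        "shift_type": shift_type,
        "intervening_fixations": intervening,
        "first_fixation_ms": first_gaze[0].duration_ms,
        "first_gaze_ms": sum(f.duration_ms for f in first_gaze),
        # The text defines first-pass refixations as the number of
        # consecutive fixations in the first gaze.
        "first_pass_refixations": n_first_gaze,
    }


def proportion(trials: List[List[Fixation]], shift_type: str) -> float:
    """Proportion of trials showing the given gaze shift type."""
    hits = sum(summarize_trial(t)["shift_type"] == shift_type for t in trials)
    return hits / len(trials)
```

Applied per viewer and per condition, `proportion` would yield the kind of percentages reported in the tables, and the per-trial summaries would feed the analyses of prime fixations. The sketch ignores the selection of trials on which the wiggle occurred during the first scene fixation, which would have to be done beforehand.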
In this second study, we systematically manipulated the distance between the wiggled target and the preceding prime. Each of 16 targets could appear in a 6-s exposure of a plausible or implausible scene, but orthogonal to this they could also be near to or far from the point of fixation when they wiggled. This manipulation of target eccentricity was achieved by simply creating two versions of each plausible and implausible scene: one with the target at about 3° from the prime and one with the target at about 8° from the prime. A targets × target plausibility × target eccentricity analysis of prime-target distances revealed F(1,15) < 1 for all effects involving target plausibility. Eye-movement data were collected from 12 new participants and, with the exceptions noted above, experimental procedure and stimuli were identical to those of the first study. Table 3 presents the means obtained in a subjects × target plausibility × target eccentricity repeated-measures ANOVA of all trials on which the wiggle occurred during the first scene fixation. The proportion of trials on which the eye saccaded directly to the wiggled target was affected by target plausibility and target eccentricity: implausible targets produced more captures (F(1,11) = 6.74, p < 0.03, MSe = 0.007), and so did near targets (F(1,11) = 11.25, p < 0.007, MSe = 0.008). The
Table 3. Effects of wiggled target's plausibility and eccentricity on direct target hits, target skips, and number of intervening fixations preceding delayed target hits (wiggle SOA = 140 ms)

                            Near target                 Far target
                            Plausible    Implausible    Plausible    Implausible
Direct target hits (%)      62.6         64.6           49.5         60.3
Target skips (%)            11.2         4.0            6.6          4.1
Prime-target delays         4.01         2.84           3.07         1.81
plausibility effect appeared to be stronger for far targets, but this interaction was not reliable (F(1,11) = 1.2, p < 0.3, MSe = 0.019). Plausible targets were also marginally more likely to be left unfixated (F(1,11) = 4.41, p < 0.06, MSe = 0.006), an effect which appeared to be larger for near targets, but this interaction was not reliable (F(1,11) = 2.1, p < 0.18, MSe = 0.003). In addition, a subjects × target plausibility × target eccentricity repeated-measures ANOVA on delayed target hits revealed that it took more fixations to complete the prime-to-target gaze shift when the target was plausible (F(1,11) = 4.83, p < 0.05, MSe = 11.38) or when the target was near to the initial scene fixation (F(1,11) = 4.15, p < 0.06, MSe = 8.64). All these data are consistent with the interpretation that the need for foveal analysis of a given object determines its saliency as a saccade target, and that the need for foveal analysis is greater for implausible and more eccentric objects.

Prime fixations that were ongoing at the time of the wiggle were entered in a subjects × target plausibility × target eccentricity × gaze shift destination (target vs. other) ANOVA. The latter variable divides the observations as a function of where the eye shifted after leaving the prime: target-directed shifts with a maximum of 1 fixation in the prime-target interval, and shifts directed away from the target. The analysis showed no reliable effects involving target plausibility, although Table 4 reveals that the pattern for far targets was identical to that observed in Table 2: prime fixations preceding a target-directed gaze shift were unaffected by the target's plausibility, while prime fixations preceding a delayed hit or a skip of the target were longer when the target was plausible. Earlier, I interpreted this pattern as suggesting context-supported extended extrafoveal processing of plausible objects, delaying or cancelling the need for a saccade towards these objects.
That the pattern was absent for near targets is consistent with the idea that context provides little additional benefit when extrafoveal preview quality is high (Henderson, 1992b). Judging from a comparison of Tables 1 and 3, reducing the wiggle SOA by 20 ms did not decrease but increased the greater saliency of implausible targets as reflected in the higher frequency and shorter delays with which they were targeted by
Table 4. Ongoing prime fixations as a function of wiggled target's plausibility, target eccentricity, and destination of gaze shift (SOA = 140 ms)

                            Plausible target        Implausible target
Gaze shift destination      Target     Other        Target     Other
Near targets
  First fixation (ms)       340        297          330        300
  First gaze (ms)           378        343          359        365
  First-pass refixations    1.17       1.16         1.13       1.34
Far targets
  First fixation (ms)       356        341          335        347
  First gaze (ms)           377        406          375        367
  First-pass refixations    1.14       1.25         1.18       1.08
saccades. Based on Table 2, I argued that this saliency does not result from faster attentional capture by implausible objects but from a greater need for foveal analysis, and this interpretation is again supported by the similar pattern in Table 4. When one looks at the distributions of gaze shift latencies in this second study, this conclusion is corroborated further. Figure 6 plots gaze shift distributions as a function of target plausibility and eccentricity. The top panels are based on all gaze shifts; the bottom panels exclude gaze shifts that were not directed at the target, but these only amounted to 20% of the data, which explains the similarity between top and bottom panels. For far targets, the data replicate the findings from the first study (Fig. 3): three gaze shift populations for plausible targets and only two for implausible targets, where the delayed-voluntary peak is virtually absent. Note, however, that, with the exception of the fast-voluntary distributions at 370 ms, the timing of the peaks has changed. For plausible targets, the reflexive peak shifted forward by 20 ms, as was to be expected given the reduction of the wiggle SOA by 20 ms. In contrast, for implausible targets the reflexive peak shifted backward and moved outside of the reflexive time band (i.e., wiggle SOA + 120 ± 40 ms). Finally, the delayed-voluntary peak also shifted forward by 20 ms, moving outside of the time band hypothesised from the first study. This could indicate that, like the reflexive gaze shifts, the delayed voluntary shifts are also time-locked. Specifically, one could argue that if attention is summoned 20 ms earlier
Fig. 6. Relative frequency distribution of gaze shift latencies away from the prime as a function of the wiggled target's plausibility and eccentricity. All distributions are based on single-fixation prime gazes of at least 140 ms. Bin-size is 20 ms and the graphs plot the midpoints of the bins. Dark bands indicate hypothetical location for reflexive (220-300), fast voluntary (340-400), and delayed voluntary (420-460) gaze shift distributions. Top panels plot all gaze shifts regardless of destination, bottom panels only plot gaze shifts that landed on the target after maximum 1 intervening fixation.
by the wiggle, then extrafoveal target processing can begin 20 ms earlier and will end 20 ms earlier. Following this rationale, the constant, fast voluntary shifts at 370 ms are not locked to the wiggle which runs counter to the earlier suggestion that they follow the wiggle onset with a saccadic latency that is modal for free exploration of the particular stimuli used in these studies. Obviously, a definitive settlement of these timing issues will require more research using a control distribution of gaze shifts in the absence of a wiggle. It is quite clear, however, that speeding up the wiggle resulted in a separation of the reflexive distributions for plausible and
implausible targets. Gaze shifts to plausible targets speeded up in synchrony with the reduction in wiggle SOA, supporting the hypothesis that attention was already captured by the plausible target at the onset of its wiggle. Gaze shifts to implausible targets slowed down, suggesting that the earlier wiggle now preceded the allocation of attention to the target. Thus, the data indicate that, with the reduced SOA, the wiggle now occurs within an initial time interval during which plausible objects have attentional precedence over implausible objects. This interpretation is strengthened by inspection of the near-target distributions, where plausible targets display a similar advantage. Also note that the delayed-voluntary peak in these distributions is considerably smaller than for the far targets. This probably reflects the fact that identification of near targets in parafoveal vision requires much less surplus processing time than target identification in peripheral vision. Finally, the comparison of near and far targets shows that the slowed, reflexive peak for the implausible targets is much smaller for the near targets. This is consistent with the hypothesis that a greater overall reactivity to implausible-target wiggles reflects a greater peripheral noticeability based on featural target-background dissimilarity.
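The timing argument in this section rests on simple arithmetic: the reflexive rise is expected at wiggle SOA + 90 ms, the reflexive peak at wiggle SOA + 120 ms, and the reflexive time band is taken as the peak ± 40 ms, whereas the fast and delayed voluntary bands were kept at 340-400 ms and 420-460 ms in both studies. The Python sketch below merely restates that arithmetic; the function names, the inclusive band limits, and the "unclassified" label are assumptions made for illustration.

```python
from typing import Tuple

# Bands that were held constant across both studies (ms after scene onset).
FIXED_BANDS = {
    "fast voluntary": (340, 400),
    "delayed voluntary": (420, 460),
}


def reflexive_onset(soa_ms: int, min_delay_ms: int = 90) -> int:
    """Earliest expected wiggle-evoked rise: wiggle SOA + 90 ms."""
    return soa_ms + min_delay_ms


def reflexive_band(soa_ms: int, peak_delay_ms: int = 120,
                   half_width_ms: int = 40) -> Tuple[int, int]:
    """Reflexive time band: (wiggle SOA + 120) ± 40 ms."""
    peak = soa_ms + peak_delay_ms
    return (peak - half_width_ms, peak + half_width_ms)


def label_latency(latency_ms: float, soa_ms: int) -> str:
    """Assign a gaze shift latency to a hypothesised band (limits inclusive)."""
    low, high = reflexive_band(soa_ms)
    if low <= latency_ms <= high:
        return "reflexive"
    for name, (low, high) in FIXED_BANDS.items():
        if low <= latency_ms <= high:
            return name
    return "unclassified"


# The two wiggle SOAs used in the studies.
bands = {soa: reflexive_band(soa) for soa in (160, 140)}
```

For the two SOAs this gives reflexive bands of 240-320 ms and 220-300 ms, matching the bands marked in Figs. 3 and 6. A peak that is time-locked to the wiggle should move with the band when the SOA changes, whereas a peak that stays put or drifts out of the band, as described above for implausible targets, would end up outside it.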
Conclusions

When research on object perception in real-world scenes is referenced in other domains, this is usually done to illustrate the existence of semantic context effects on stimulus identification and on stimulus selection. I started this chapter by pointing out that scene researchers agree on the first effect (although see Henderson and Hollingworth, 1997, for a recent vote of disagreement), but are not convinced with respect to the second. Some authors see no role for context as a determinant of where attention and the eye will go, and others disagree as to what the direction of the effect is: semantically plausible objects might be detected across a greater extent of the visual field and thus receive processing preference, or alternatively, semantically implausible objects might pop out of the scene and therefore attract attention. While scene research had a long-standing theory on why plausible objects should be more perceptible in peripheral vision (i.e., schema-driven perception), no such explanation existed for implausible pop-out, which appeared to be supported exclusively by a single, influential observation of a fixation precedence for objects that do not belong in a scene. Nevertheless, implausible pop-out was not classified as a U.F.O. (unreplicable freak observation). This should probably be attributed to the intuition that implausible pop-out would be good from an evolutionary perspective: slowed detection of the unexpected tiger on the kitchen counter would not contribute to the success of the human species. In support of implausible pop-out, however, Johnston and colleagues recently outlined and tested a computational model of visual
input processing in which competing influences of schema-driven perception and novel pop-out are detailed and reconciled. Based on this theory, scene research can now test for specific patterns of effects as well as processing characteristics which would be diagnostic for schema-driven perception and/or implausible pop-out. In a first attempt to contribute to this enterprise, I presented some data relevant to the hypothesis that implausible objects are more likely than plausible objects to capture the processing focus early on in scene exploration. If this were true, features or events at an implausible target location should be detected more quickly. Specifically, rapid and repeated movement of a peripherally located object should elicit faster and/or more frequent gaze shifts if the wiggled object was implausible and therefore was more likely to be at the center of ongoing data-driven processing. While objects that did not belong proved to be more salient saccade targets, a detailed analysis of gaze shift latencies and destinations revealed that after 160 ms of scene processing there was no unequivocal processing precedence for these objects: plausible-target wiggles also elicited reflexive orienting responses, and the saliency of implausible objects appeared to be caused by a greater difficulty in processing them in extrafoveal vision. It is not clear at this point how the larger useful field of view for plausible objects should be understood. One possibility is that top-down activation from context combines with bottom-up activation from extrafoveal processing and thus increases the diagnosticity of extrafoveal features because context has narrowed down the possible identities of the extrafoveal target. This seems to be the view held in explanations of context effects on word skipping in reading (e.g., Rayner and Morris, 1992; Rayner and Well, 1996).
Another possibility is that contextual activation directly enhances the quantity and speed at which extrafoveal target features are encoded from plausible objects in the image. Consistent with this notion is the claim that conceptually activated objects exert a very early pull on attention (e.g., Christie and Klein, 1995; Hoffmann, 1987), thus enhancing perceptual sensitivity at the attended location (e.g., Downing, 1988; Hawkins et al., 1990). This claim was supported in a between-experiment comparison of wiggle detectability at 160 and 140 ms SOA. At the shorter SOA, plausible-target wiggles were more likely than implausible-target wiggles to elicit reflexive orienting responses, and the effect decreased at the longer SOA. Naturally, a within-experiment parametric variation of wiggle SOA is needed to unambiguously establish the suggested interaction between processing interval and processing precedence. Given the fine-grained analysis that appears to be required to detect the signs of processing precedence, this does not promise to be an easy task. However, if we want to improve our understanding of the blend of concept-driven and data-driven processing in real-world scene perception, we will have to allow for the possibility that the exact dosages may vary over time, both within a single fixation and across the whole extent of scene exploration.
Acknowledgements

The writing of this chapter was supported by GOA-grants 93/1 and 98/1 from the Research Council of the University of Leuven, and by research grant G.2058.94 from the Fund for Scientific Research-Flanders (F.W.O.-Vlaanderen). I also want to thank Gery d'Ydewalle, John Findlay and an anonymous reviewer for their helpful comments on an earlier draft.
References
Antes, J.R. and Kristjanson, A.F. (1993). Effects of capacity demands on picture viewing. Perception and Psychophysics, 54, 808-813.
Antes, J.R. and Penland, J.G. (1981). Picture context effects on eye movement patterns. In: D.F. Fisher, R.A. Monty and J.W. Senders (Eds.), Eye Movements: Cognition and Visual Perception. Hillsdale: Erlbaum, pp. 157-170.
Antes, J.R., Penland, J.G. and Metzger, R.L. (1981). Processing global information in briefly presented pictures. Psychological Research, 43, 277-292.
Balota, D.A., Pollatsek, A. and Rayner, K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17, 364-390.
Biederman, I., Blickle, T.W., Teitelbaum, R.C. and Klatsky, G.J. (1988). Object search in nonscene displays. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 456-467.
Biederman, I., Mezzanotte, R.J. and Rabinowitz, J.C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14, 143-177.
Biederman, I., Mezzanotte, R.J., Rabinowitz, J.C., Francolini, C.M. and Plude, D. (1981). Detecting the unexpected in photointerpretation. Human Factors, 23, 153-154.
Blanchard, H.E., McConkie, G.W., Zola, D. and Wolverton, G.S. (1984). Time course of visual information utilization during fixations in reading. Journal of Experimental Psychology: Human Perception and Performance, 10, 75-89.
Boyce, S.J. and Pollatsek, A. (1992a). An exploration of the effects of scene context on object identification. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer Verlag, pp. 227-242.
Boyce, S.J. and Pollatsek, A. (1992b). Identification of objects in scenes: The role of scene background in object naming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 531-543.
Boyce, S.J., Pollatsek, A. and Rayner, K. (1989). Effect of background information on object identification. Journal of Experimental Psychology: Human Perception and Performance, 15, 556-566.
Christie, J. and Klein, R. (1995). Familiarity and attention: Does what we know affect what we notice? Memory and Cognition, 23, 547-550.
Christie, J. and Klein, R.M. (1996). Assessing the evidence for novel popout. Journal of Experimental Psychology: General, 125, 201-207.
Dark, V.J., Vochatzer, K.G. and VanVoorhis, B.A. (1996). Semantic and spatial components of selective attention. Journal of Experimental Psychology: Human Perception and Performance, 22, 63-81.
De Graef, P. (1990). Episodic priming and object probability effects. Unpublished Master's thesis, Department of Psychology, University of Massachusetts, Amherst.
De Graef, P. (1992). Scene-context effects and models of real-world perception. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer Verlag, pp. 243-259.
De Graef, P., Christiaens, D. and d'Ydewalle, G. (1990). Perceptual effects of scene context on object identification. Psychological Research, 52, 317-329.
De Graef, P., De Troy, A. and d'Ydewalle, G. (1992). Local and global contextual constraints on the identification of objects in scenes. Canadian Journal of Psychology, 46, 489-508.
De Graef, P. and d'Ydewalle, G. (1995). Speeded object verification in real-world scenes: Perceptual, decisional, and attentional components (Psyc. Rep. No. 170). Leuven, Belgium: University of Leuven, Laboratory of Experimental Psychology.
Downing, C.J. (1988). Expectancy and visual-spatial attention: Effects on perceptual quality. Journal of Experimental Psychology: Human Perception and Performance, 14, 188-202.
Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 108, 316-355.
Friedman, A. and Liebelt, L.S. (1981). On the time course of viewing pictures with a view towards remembering. In: D.F. Fisher, R.A. Monty and J.W. Senders (Eds.), Eye Movements: Cognition and Visual Perception. Hillsdale: Erlbaum, pp. 137-155.
Hawkins, H.L., Hillyard, S.A., Luck, S.J., Mouloua, M., Downing, C.J. and Woodward, D.P. (1990). Visual attention modulates signal detectability. Journal of Experimental Psychology: Human Perception and Performance, 16, 802-811.
Hawley, K.J., Johnston, W.A. and Farnham, J.M. (1994). Novel popout with nonsense strings: Effects of predictability of string length and spatial location. Perception and Psychophysics, 55, 261-268.
Henderson, J.M. (1992a). Object identification in context: The visual processing of natural scenes. Canadian Journal of Psychology, 46, 319-341.
Henderson, J.M. (1992b). Identifying objects across saccades: Effects of extrafoveal preview and flanker object context. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 521-530.
Henderson, J.M. and Hollingworth, A. (1997). Object perception is functionally isolated from scene meaning: Evidence from a change detection paradigm (Tech. Rep. 1). East Lansing: Michigan State University, Eye Movement Laboratory.
Henderson, J.M., Pollatsek, A. and Rayner, K. (1987). The effects of foveal priming and extrafoveal preview on object identification. Journal of Experimental Psychology: Human Perception and Performance, 13, 449-463.
Henderson, J.M., Weeks, P.A. Jr. and Hollingworth, A. (in press). The effects of semantic consistency on eye movements during complex scene viewing. Journal of Experimental Psychology: Human Perception and Performance.
Hoffmann, J. (1987). Semantic control of selective attention. Psychological Research, 49, 123-129.
Johnston, W.A. and Hawley, K.J. (1994). Perceptual inhibition of expected inputs: The key that opens closed minds. Psychonomic Bulletin and Review, 1, 56-72.
Johnston, W.A., Hawley, K.J. and Farnham, J.M. (1993). Novel popout: Empirical boundaries and tentative theory. Journal of Experimental Psychology: Human Perception and Performance, 19, 140-153.
Johnston, W.A., Hawley, K.J., Plewe, S.H., Elliott, J.M.G. and DeWitt, M.J. (1990). Attention capture by novel stimuli. Journal of Experimental Psychology: General, 119, 397-411.
Johnston, W.A. and Schwarting, I.S. (1996). Reassessing the evidence for novel popout. Journal of Experimental Psychology: General, 125, 208-212.
Loftus, G.R. and Mackworth, N.H. (1978). Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4, 565-572.
Marks, W., McFalls, E.L. and Hopkinson, P. (1992). Encoding pictures in scene context: Does task demand influence effects of encoding congruity? Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 192-198.
McConkie, G.W., Underwood, N.R., Zola, D. and Wolverton, G.S. (1985). Some temporal characteristics of processing during reading. Journal of Experimental Psychology: Human Perception and Performance, 11, 168-186.
Pezdek, K., Whetstone, T., Reynolds, K., Ashkari, N. and Dougherty, T. (1989). Memory for real-world scenes: The role of consistency with schema expectation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 587-595.
Rayner, K. and Pollatsek, A. (1992). Eye movements and scene perception. Canadian Journal of Psychology, 46, 342-376.
Rayner, K. and Morris, R.K. (1992). Eye movement control in reading: Evidence against semantic preprocessing. Journal of Experimental Psychology: Human Perception and Performance, 18, 163-172.
Rayner, K. and Well, A.D. (1996). Effects of contextual constraint on eye movements in reading. Psychonomic Bulletin and Review, 3, 504-509.
van Diepen, P.M.J., De Graef, P. and d'Ydewalle, G. (1995). Chronometry of foveal information extraction during scene perception. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes and Applications. Amsterdam: North-Holland, pp. 349-362.
Wolfe, J.M. and Bennett, S.C. (1997). Preattentive object files: Shapeless bundles of basic features. Vision Research, 37, 25-43.
CHAPTER 15
Functional Division of the Visual Field: Moving Masks and Moving Windows
Paul M.J. van Diepen, Martien Wampers and Géry d'Ydewalle
University of Leuven
Abstract
Moving mask and moving window paradigms manipulate the visible region and appearance of visual stimuli as a function of gaze direction. In addition, elapsed time since the onset of an ongoing fixation can be used to further control stimulus properties. For instance, a picture is presented in which the central image quality is degraded, but only during the initial part of each fixation. These paradigms are employed to study the spatio-temporal aspects of information acquisition during reading and scene perception. The research reviewed in this chapter indicates that fixation of words and objects generally is a prerequisite for accurate identification. Foveal information seems to be acquired early during fixations. Peripheral information is apparently utilized during the later part of fixations, mainly to select a future fixation location. Global scene characteristics can be obtained easily from a low resolution image, while fine image details enhance object localization.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
Human eyes do not have the same resolution and sensitivity at every position of the retina (see, e.g., De Valois and De Valois, 1988); fine details can only be discriminated efficiently by the fovea, the central part of the retina. At higher eccentricities, perceptual acuity decreases rapidly, and only coarse-grained information can be resolved. Consequently, peripheral vision is not particularly suited for object identification or word recognition, as these processes rely on detailed image analyses. Eye movements, or saccades, are required to project image areas onto the fovea. Saccades are not made to random image locations: most fixations occur on interesting parts of the stimulus, such as words in a text or objects in a scene (Buswell, 1935; Yarbus, 1967; see also Chapter 12). Evidently, extrafoveal information can be used successfully to guide saccades to informative stimulus locations. Furthermore, scene identity can be determined using peripheral vision alone. Boyce, Pollatsek and Rayner (1989) showed that object identification is facilitated by a consistent scene, even when that scene was presented briefly and did not contain any foveal information that could lead to scene identification. Van Diepen et al. (1994) demonstrated that scenes could be identified accurately in a 160 ms tachistoscopic presentation, even when the central part of the scene was missing. A number of studies showed that an extrafoveal stimulus can be processed to a certain degree, facilitating identification on a subsequent fixation (the so-called preview benefit; e.g., Henderson, 1992a; Henderson, Pollatsek and Rayner, 1989; Pollatsek, Rayner and Collins, 1984). Apparently, fine detail discriminations require foveal processing. Extrafoveal information is used to obtain global image characteristics, to guide saccades, and to preprocess saccade targets. Do all these processes take place during each fixation? 
Are they processed in parallel, and if not, which is the preferred part of a fixation for each process? How long does each process take? What is the nature of information required for each process? In this chapter we will discuss moving mask and moving window paradigms that study the above issues by disentangling foveal and peripheral vision. First, a brief review of the history of moving mask and moving window paradigms will be given, including some applications in reading research. Then an overview of moving mask and moving window experiments in scene perception is presented. Recent display change techniques involved in these experiments are discussed in the appendix.

Moving mask and moving window paradigms
The so-called moving mask and moving window paradigms have commonly been used to study the functional division of the visual field during reading. Both
paradigms require eye movement contingent display changes. In the moving mask paradigm, foveal information is masked after a preset delay from the onset of each new fixation, in order to study temporal aspects of foveal information extraction. For example, at the beginning of each fixation, all the text is visible. Later during the fixation, the foveal text is replaced by a mask (e.g., a row of Xs). As soon as the eyes move again, the mask is removed and the original text restored. In the moving window studies, extrafoveal rather than foveal information is masked. By manipulating the size of the window containing unmasked information, the dimensions of the useful field can be estimated. In most moving window experiments, no window onset delay is inserted and extrafoveal information is masked immediately after each saccade. A predecessor of the moving window paradigm was simply a mask with a hole in it which was moved over a hard-copy text (Poulton, 1962). The first computerized moving window technique was developed by McConkie and Rayner (1975). By manipulating the number of available letters, they found that word-length information could be obtained from 12 to 15 positions to the right of the fixated letter. Letter- and word-shape information, however, could only be obtained from up to 10 or fewer characters to the right of the fixation position. In a later experiment, McConkie and Rayner (1976) demonstrated that the perceptual span during reading is in fact asymmetric, and no visual information is used more than four letters to the left of the fixated letter. Rayner et al. (1981) employed moving masks and moving windows in three reading experiments. In Experiment 1, a moving window of manipulated size was used to estimate the perceptual span during reading. With a window size of 29 characters (i.e., 14 characters to the left and right of the fixated letter), performance did not differ from the normal reading situation, where all text was visible. 
Hence, the perceptual span during reading was estimated to extend up to 14 letter positions to the right of the fixation position, replicating earlier findings. Asymmetry of the perceptual span was not tested. In Experiment 2, a moving mask of manipulated size was centred on the fixation position, from the beginning of each fixation. Even the smallest, one-letter mask substantially decreased the reading rate, while with masks of seven characters or more, many words could not be identified. With masks of 13 characters or more, reading became virtually impossible. From the first two experiments it can be concluded that foveal vision is required for the semantic identification of words, but that extrafoveal information is utilized as well, for other than semantic purposes. In Experiment 3, a combination of a moving mask and moving window was used. The window limited availability of text to either 7 or 17 characters centred on the fixation position. After a preset delay from the beginning of each fixation, foveal information was masked as well, rendering the entire line masked. The mask delay ranged from 10 to 150 ms. In a window-only condition, no foveal mask appeared. In
mask-only conditions, no moving window was used, and foveal information was masked after a delay ranging from 0 to 150 ms, following the fixation onset. In full-line conditions, all text was visible at the beginning of fixations, and the entire text line was masked after a preset delay, ranging from 10 to 150 ms. Finally, a control condition was included where no masking occurred. Masking of foveal information only affected reading rate and eye movement parameters when the mask delay was less than 50 ms. For shorter mask delays, reading rates decreased, while fixation durations and the number of forward fixations increased. Foveal information evidently could be extracted within the first 50 ms of fixations. When peripheral information was limited by a moving window, reading performance was worse compared with the control condition. Full-line masking affected reading performance compared with the control condition, even at the longest 150 ms delay. These results indicate that extrafoveal information is necessary longer, or only later during fixations, compared with foveal information. The findings of Rayner et al. (1981) are compatible with a simple two-phase model of information processing: apparently, during the first 50 ms of each fixation, foveal information is extracted for word identification. Consequently, foveal masking after the initial 50 ms does not affect reading performance, since foveal information is no longer required. During the later part of fixations, extrafoveal information is used to select, and preprocess, a new saccade target. Accordingly, a moving window or a full-line mask slows down reading, even later during fixations.
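The display logic of the combined paradigm can be illustrated with a minimal sketch, written here in Python for a single line of text. It is not the software used by Rayner et al. (1981); the function name `render_line`, its parameters, the use of "X" as mask character, and the choice to leave spaces unmasked are all assumptions made for illustration. The window restricts visible text to a number of characters centred on the fixated letter, and the foveal mask is switched on once the elapsed fixation time reaches the preset delay.

```python
def render_line(text, fix_idx, elapsed_ms, window_size=None,
                mask_size=0, mask_delay_ms=None, mask_char="X"):
    """Return the line as it would be displayed at one moment of a fixation.

    text          -- the full line of text
    fix_idx       -- index of the currently fixated character
    elapsed_ms    -- time since the onset of the current fixation
    window_size   -- number of visible characters centred on fix_idx
                     (None means no moving window: the whole line is visible)
    mask_size     -- number of characters covered by the foveal mask
    mask_delay_ms -- delay after fixation onset at which the foveal mask
                     appears (None means no foveal mask)
    Spaces are left intact (an assumption of this sketch).
    """
    win_half = None if window_size is None else window_size // 2
    mask_half = mask_size // 2
    mask_on = (mask_size > 0 and mask_delay_ms is not None
               and elapsed_ms >= mask_delay_ms)
    shown = []
    for i, ch in enumerate(text):
        dist = abs(i - fix_idx)
        outside_window = win_half is not None and dist > win_half
        under_mask = mask_on and dist <= mask_half
        if ch != " " and (outside_window or under_mask):
            shown.append(mask_char)
        else:
            shown.append(ch)
    return "".join(shown)


if __name__ == "__main__":
    line = "the quick brown fox"
    # 7-character window plus a foveal mask that appears after 50 ms.
    for t in (0, 60):
        print(t, render_line(line, 6, t, window_size=7,
                             mask_size=7, mask_delay_ms=50))
    # Mask-only condition: 3-character foveal mask from fixation onset.
    print(render_line(line, 6, 0, mask_size=3, mask_delay_ms=0))
```

In an actual experiment this function would be called again whenever the eye tracker reports a new fixation position or the delay expires; that timing loop is omitted here.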
Moving masks and moving windows in scene perception research
Eye contingent display change techniques
Application of moving mask and moving window paradigms in scene perception research has been rather limited. This was mainly due to the complexity of the stimuli involved, and the low predictability of the size and direction of eye movements, compared with reading situations. Moving a mask across a text can be achieved rapidly by changing a small number of ASCII codes, representing the letters involved. In scene perception research, eye contingent display changes usually require a change of high resolution, and possibly full-colour, graphics. Modifying thousands of pixels in the computer's video memory would simply take too much time. Other techniques have been proposed, but most approaches limited the shape and content of the mask or window. Usually, the stimulus inside or outside of a square window is rendered black or white, and image resolution often is poor. Recently, however, several new techniques have been developed to deal with the above limitations (see Appendix to this chapter), enabling moving mask and moving window paradigms in experiments that require advanced eye contingent
manipulations of high resolution graphical images. They already have been used in some of the experiments discussed below.

Foveal information acquisition
Saccades centre a selected part of an image on the fovea, to enable a detailed analysis of the local stimulus information. This will normally result in identification of the fixated stimulus. The studies described in this section examine whether foveation is mandatory for identification, and estimate the duration of foveal information acquisition.
Henderson et al. (1997) studied the effect of an artificial scotoma on object identification. Stimuli consisted of linear arrays of objects, rather than complete scenes. In Experiment 1, subjects viewed 96 arrays of four line drawings of real-world objects. Object size was 1.5x1.5° on average, and the centres of adjacent objects were approximately 2.4° apart. After inspection, subjects self-terminated the array presentation by pressing a response button, upon which a probe word was displayed that correctly named one of the objects on half of the trials. Subjects had to decide if the probe had been contained in the array. During the presentation of the object arrays, eye movements were registered to remove the fixated object (central scotoma condition), or the object to the right of the fixated object (offset scotoma condition). In a control condition, no objects were removed. Accuracy of the response to the probe was high in all conditions and did not differ statistically between conditions. Foveation of objects apparently is not a prerequisite for object identification. The number of object region entries was similar in the offset scotoma and control condition, but increased in the central scotoma condition. Gaze duration (the sum of all fixation durations between the first entry and first exit in an object region), and the number of fixations per gaze decreased in the central scotoma condition, compared with the other conditions. 
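As an illustration of how the region-based measures named above can be derived from a fixation record, the following Python sketch computes gaze duration (first entry to first exit), fixations per gaze, total fixation time and number of region entries. The data format, a time-ordered list of (region label, duration) pairs, and all function names are assumptions made for this example and are not taken from Henderson et al. (1997).

```python
def first_pass_gaze(fixations, region):
    """Gaze duration and fixation count from first entry to first exit.

    fixations -- time-ordered list of (region_label, duration_ms) pairs
    region    -- label of the object region of interest
    """
    gaze_ms = 0
    n_fix = 0
    entered = False
    for label, duration in fixations:
        if label == region:
            entered = True
            gaze_ms += duration
            n_fix += 1
        elif entered:
            break  # first exit from the region ends the gaze
    return gaze_ms, n_fix


def total_fixation_time(fixations, region):
    """Sum of all fixation durations in the region, including re-entries."""
    return sum(duration for label, duration in fixations if label == region)


def region_entries(fixations, region):
    """Number of times the eyes entered the region from elsewhere."""
    entries = 0
    previous = None
    for label, _ in fixations:
        if label == region and previous != region:
            entries += 1
        previous = label
    return entries


if __name__ == "__main__":
    # Hypothetical record: region B is entered twice.
    record = [("A", 200), ("B", 250), ("B", 180), ("C", 220), ("B", 300)]
    print(first_pass_gaze(record, "B"))
    print(total_fixation_time(record, "B"))
    print(region_entries(record, "B"))
```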
Total fixation time and total fixation count per object region increased in the offset scotoma condition, and even more in the central scotoma condition, compared with the control level. Hence, while neither scotoma condition affected accuracy, eye movement parameters showed significant deviations from the control condition. In the central scotoma condition, subjects presumably had to resort to parafoveal processing of object information. Accuracy probably remained high because objects were only 2.4° apart, and subjects inspected the arrays 43% longer than in the control condition (personal communication, May 1997). Surprisingly, however, fixation durations barely increased in the central scotoma condition: the mean fixation duration increased by only 9 ms compared with the control condition, as opposed to a 24-ms increase in the offset scotoma condition. Maintaining a fixation on a non-informative area, while attending a parafoveal object, is apparently difficult. A saccade to the attended area seems to be
inevitable. This was also reflected in the gaze measures. In the central scotoma condition, the probability of an intra-object saccade was smaller compared with the other conditions, as indicated by the small number of fixations per gaze. These observations are consistent with Henderson's sequential attention model (1992b; see also Chapter 12), which postulates that the eyes go where attention is allocated. A premature saccade to a location that was being parafoveally processed would remove the object on that location. Hence a subsequent saccade away from the object would have to be performed to continue the parafoveal identification process. Interestingly, subjects did not develop the strategy of fixating between two objects, even though in this way a clearer parafoveal view of one of the objects could be obtained. Landing position distributions showed that, irrespective of condition, fixations occurred close to the centre of objects. Apparently, saccades need a visual goal toward which they can be programmed. Increased gaze durations with the offset scotoma were explained by Henderson et al. (1997) in terms of a reduced preview benefit, due to the loss of extrafoveal information. The increase of mean fixation durations obviously reflects the same effect. An alternative explanation could be that increased fixation durations in the offset scotoma condition reflected difficulty with the saccade target selection. Subjects started inspecting the arrays on the left side, and no near object was visible to the right of the fixated object. Therefore, saccades to the right were probably more difficult to direct. Experiment 2 eliminated possible impairments to the saccade target selection process by replacing objects with a placeholder. Experiment 2 was a replication of Experiment 1, but instead of erasing objects with the background colour they were replaced by a circle with a plus sign at the centre. 
Its purpose was to rule out an oculomotor explanation of the results of Experiment 1. Results similar to those of Experiment 1 were obtained, indicating that the absence of foveal or parafoveal stimulation in the previous experiment did not cause the difficulty in maintaining a fixation or in selecting a saccade target. Henderson et al. (1997) concluded that foveal processing is beneficial but not necessary for object identification. Furthermore, the observed difficulty of maintaining a fixation while attending an extrafoveal target suggests that the foveal processing superiority cannot be attributed to higher foveal acuity alone. Presumably, a functional predisposition of the fovea exists to identify visual information.
Van Diepen, De Graef and d'Ydewalle (1995) examined the process of foveal information extraction during scene perception. Line drawings of real-life scenes were inspected in the context of a search task (for an example of our line drawings, see Colour Plate Ib in Chapter 12). Subjects had to count the number of non-objects in each scene as accurately and quickly as possible. They self-terminated the scene presentation upon completion, after which they could give their answer. Eye movements were monitored to align an ovoid noise mask with the fixation position, using the moving overlay technique (van Diepen, De Graef and Van Rensbergen,
1994; see also Appendix). Either a small (1.5x1.0°) or a large (2.5x1.7°) ovoid noise mask covered foveal information after a preset delay from the beginning of each fixation, ranging from 15 to 120 ms. Stimuli subtended 16x12°. The large noise mask completely covered most objects in the scenes, while the small mask only covered the central part of objects. In a control condition, no masking occurred. It was predicted that foveal masking would only disrupt scene perception at short mask delays. When foveal masking occurred later during fixations, a normal level of performance was expected. Scene inspection time was taken as a global measure of task performance. Fixation durations were presumed to reflect processing difficulty at a local level. Scene inspection times and fixation durations were expected to increase with decreasing mask delays. The expectations with respect to the scene inspection times were confirmed: The 15-ms mask delay significantly increased scene inspection times compared with the control level, while masking after the initial 45-75 ms of fixations hardly affected performance. Apparently, sufficient foveal information could be extracted within the first 45-75 ms of a fixation. Fixation durations exhibited an unexpected pattern: compared with the control condition, fixation durations increased by an average 16 ms in all masking conditions. The increase was hardly affected by the mask size or mask delay. Fixation duration distributions revealed a time-locked effect of the appearance of the mask, relative to the delay. It was concluded that the sudden change of foveal information captured attention, and consequently sustained fixation durations. As in the Henderson et al. (1997) experiments, subjects did not develop a strategy to fixate near objects instead of on them, although this would make parafoveal processing of the object possible. 
Interestingly, the number of intra-object saccades decreased, even though following each saccade at least a 15-ms snapshot of the object would be available. As a result, mean saccadic amplitudes were longer in all masking conditions, regardless of the mask delay, compared with the control condition. Apparently, a saccade target is selected only later during a fixation, excluding the information-poor masked area. A possible reason for the absence of an effect of early masking on fixation durations (other than the attraction of attention) could be that foveal information was completely removed. Maintaining a fixation while the noise mask was present therefore would not result in additional information regarding the identity of the masked object.
A follow-up study (van Diepen, 1998) reports four experiments where foveal information was degraded, without being completely removed. Subjects inspected line drawings in the context of a non-object search task and self-terminated the scene presentation when they were ready. Stimuli subtended 16x12°. An electronic video switcher (van Diepen, 1997; see also Appendix) was used to degrade the image within an ovoid window, centred on the fixation position.
In Experiment 1, image contrast within a 3.0x2.3° ovoid foveal window was decreased by a factor of seven, after a change delay of 15, 40, or 65 ms from the beginning of each fixation. In a control condition, foveal contrast was not decreased. Degradation of foveal contrast was expected to slow down rather than stop information extraction. Accordingly, scene inspection times as well as fixation durations were expected to increase when the contrast change delay decreased. With the longest, 65-ms delay, no effect of the contrast manipulation was expected, since sufficient foveal information would have been extracted by that time. Surprisingly, the contrast manipulation hardly affected performance. Scene inspection time only increased by an average 300 ms in the 15-ms condition, compared with the control condition. Fixation durations did increase significantly, by 21 ms on average, when foveal information was degraded. However, they were hardly affected by the change delay. Again, the sudden foveal change alone presumably captured attention, and thereby sustained fixations. Two explanations were supplied for the small detrimental effect of the contrast manipulation. First, stimuli were high contrast line drawings, with black contours on a white background. The contrast manipulation changed the black contours to light-grey. Although contrast decrement has been demonstrated to slow down reaction times and fixation durations (e.g., Loftus et al., 1992), the contrast of light-grey lines on a white background might still have been sufficiently high to allow fast object identification. Second, even with the shortest change delay, an average 15 ms of normal foveal contrast was available at the beginning of each fixation. A sensory memory representation of the initial full-contrast image supposedly compensated for the later decrease of foveal contrast. In van Diepen, De Graef and d'Ydewalle (1995), the noise mask potentially erased such a representation. 
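To see why a sevenfold contrast reduction turns black contours light-grey, consider a small worked sketch in Python. It assumes, purely for illustration, an 8-bit grey scale (0 = black, 255 = white) and defines contrast as the difference between a pixel and the white background; the chapter does not state how contrast was defined or implemented in the video switcher, and the function name `reduce_contrast` is hypothetical.

```python
def reduce_contrast(value, background=255, factor=7.0):
    """Shrink the difference between a pixel and the background by `factor`.

    value      -- original grey level (assumed 0..255, 0 = black)
    background -- grey level of the background (assumed white = 255)
    factor     -- contrast reduction factor (seven in Experiment 1)
    """
    return round(background + (value - background) / factor)


if __name__ == "__main__":
    # A black contour pixel moves most of the way towards white.
    print(reduce_contrast(0))
    # Background pixels are unchanged.
    print(reduce_contrast(255))
```

Under these assumptions a black contour (0) becomes a light grey of about 219 on the 0 to 255 scale, which still differs visibly from the white background.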
Both the foveal noise mask and the foveal contrast degradation managed to increase fixation durations, due to the sudden change of the foveal stimulus during an ongoing fixation. Evaluation of the effect of the degradation of the foveal image on fixation durations therefore required a control condition that induced a foveal image change as well, without affecting the stimulus quality. In Experiment 2, an alternative control condition was tested that was expected to have the above characteristics: a red ellipse that outlined the window area. Both inside and outside the ellipse, the undegraded scene was presented. The appearance of the ellipse was compared with a foveal noise mask, foveal contrast degradation, and with the no-change condition that served as control in previous experiments. The window size was 2.5x1.9°. Only the shortest possible change delay of 15 ms was used. With respect to scene inspection times, similar results were obtained in the ellipse and the no-change condition, while longer inspection times were observed for the noise mask and contrast degradation. Fixation durations however, increased by about 19 ms in the low-contrast, the noise-mask, as well as in the ellipse condition, compared
with the no-change condition. Fixation distributions of the change conditions all exhibited a time-locked effect of the foveal change. These results indicate that the appearance of the red ellipse was able to capture attention in the same way as the noise mask and the contrast degradation, but that it did not degrade the foveal stimulus quality. Therefore, the ellipse condition served as control condition in subsequent experiments. Note that, contrary to the previous experiment, contrast degradation substantially increased scene inspection times, compared with the no-change condition. Unfortunately, detailed quantitative analyses were compromised by the fact that some of the subjects had used a speed-accuracy trade-off to deal with difficult conditions, taking less time to inspect scenes and consequently making more errors. Instructions to the subjects in the following experiments emphasized the importance of accuracy. Furthermore, feedback on the subject's answer was introduced following every trial. In this way we hoped to obtain an equally high level of accuracy over conditions for all subjects.
Foveal contrast degradation did not have a clear effect on scene perception in the prior experiments. Therefore, another way to degrade foveal information was introduced in Experiment 3. By replacing 40% of the pixels within a foveal window by random pixels, a partial noise mask was created that presumably would degrade both foveal information and its sensory memory representation. Because foveal information was masked only partially, maintaining a fixation would be helpful to obtain additional foveal information, even when the partial noise mask was present. Experiment 3 compared four types of changes within a foveal 2.5x1.9° ovoid window: partial masking by noise, complete masking by noise, contrast degradation, and the appearance of a red ellipse outlining the window. 
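The partial noise mask described above can be sketched as follows. This Python example is an illustrative software analogue, not the hardware video switcher used in the experiments: it treats the image as a list of rows of binary pixels (1 = white, 0 = black), approximates the ovoid window by an ellipse given in pixel units, and replaces a fixed proportion of the pixels inside it by random values once the change delay has elapsed. The function names, the binary pixel format and the pixel-based window size are all assumptions.

```python
import random


def in_ellipse(x, y, cx, cy, half_w, half_h):
    """True if pixel (x, y) lies inside the ellipse centred on (cx, cy)."""
    return ((x - cx) / half_w) ** 2 + ((y - cy) / half_h) ** 2 <= 1.0


def apply_partial_noise(image, cx, cy, half_w, half_h,
                        proportion=0.4, rng=None):
    """Return a copy of `image` with part of the foveal window replaced by noise.

    image      -- list of rows, each a list of 0/1 pixel values
    cx, cy     -- fixation position in pixel coordinates
    proportion -- share of window pixels replaced (40% in Experiment 3)
    rng        -- optional random.Random instance for reproducibility
    """
    if rng is None:
        rng = random.Random()
    inside = [(x, y)
              for y, row in enumerate(image)
              for x in range(len(row))
              if in_ellipse(x, y, cx, cy, half_w, half_h)]
    chosen = rng.sample(inside, round(proportion * len(inside)))
    out = [row[:] for row in image]
    for x, y in chosen:
        # A replaced pixel may by chance receive its original value.
        out[y][x] = rng.randint(0, 1)
    return out


def displayed_image(image, elapsed_ms, delay_ms, **window):
    """Show the intact image before the change delay, the degraded one after."""
    if elapsed_ms < delay_ms:
        return image
    return apply_partial_noise(image, **window)


if __name__ == "__main__":
    blank = [[1] * 20 for _ in range(12)]
    shown = displayed_image(blank, elapsed_ms=20, delay_ms=15,
                            cx=10, cy=6, half_w=6, half_h=4,
                            rng=random.Random(1))
    for row in shown:
        print("".join("#" if p == 0 else "." for p in row))
```

Setting `proportion=1.0` gives an analogue of the complete noise mask, so the same function covers two of the four window types compared in Experiment 3.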
Two change delays were employed: a short 15-ms delay, and a long 100-ms delay from the onset of each fixation. It was predicted that later during a fixation, foveal degradation would have ceased to affect scene perception. Hence, for all window types, comparable scene inspection times and fixation durations were expected with the long change delay. In the short-delay condition, however, foveal degradation should disrupt visual processing. As in van Diepen, De Graef and d'Ydewalle (1995), longer scene inspection times were expected with the complete noise mask in the short-delay condition, compared with the long delay. Fixation durations, however, would not be affected more than in the red-ellipse control condition. Conversely, the partial noise mask was expected to increase fixation durations in the short-delay condition, but to have a smaller effect on scene inspection times. Foveal contrast degradation presumably would show effects similar to, but smaller than, those of the partial noise mask. All expectations were confirmed. As before, complete masking by noise after the short change delay disrupted foveal information extraction, requiring more fixations to identify most objects, which resulted in longer scene inspection times. Contrast degradation and partial masking by noise increased fixation durations, rather than scene inspection times, when the short change delay was used. Fixations were
sustained, instead of repeated. Fixation durations and scene inspection times hardly differed as a function of change delay in the ellipse condition. With the long change delay, all window types showed comparable scene inspection times and fixation durations. The results of Experiment 3 confirmed the thesis that foveal information can be extracted within the initial part of a fixation. When foveal information was degraded before the end of the foveal acquisition process, fixations either were sustained, or terminated and repeated, depending on the stimulus quality of the degraded image.

In Experiment 4, a new computer algorithm was used that enabled foveal image manipulations from the very beginning of each fixation (see Appendix). Experiment 3 was replicated, with three window types and two change delays. Information within a foveal window was either decreased in contrast, partially masked by noise, or outlined by a red ellipse. The change delay was 0 ms (no-delay condition), or 50 ms (delay condition). Predictions were similar to those of the previous experiment. In the no-delay condition, no sensory memory representation of the normal-contrast foveal image could be formed. Accordingly, an exaggerated effect of the contrast manipulation was expected, compared with the previous experiment. It was found that both partial masking and contrast degradation indeed resulted in longer scene inspection times and fixation durations in the no-delay condition, compared with the delay condition. As expected, no such effect was present in the ellipse condition. In the delay condition, no substantial differences in scene inspection times or fixation durations were observed among the three window types, although a slight increase for both measures was present in the partial noise condition. These results again confirm the idea that the beginning of a fixation is used to acquire foveal information.
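The logic of the change-delay manipulation can be summarized in a few lines. This is an idealized sketch: the function names and the 250-ms fixation duration are hypothetical, and the lag of the eye tracker and display is ignored.

```python
def version_in_window(ms_since_onset, change_delay_ms):
    """Which stimulus version is shown inside the foveal window at a given moment."""
    return "normal" if ms_since_onset < change_delay_ms else "changed"

def undegraded_exposure_ms(fixation_duration_ms, change_delay_ms):
    """Time within one fixation during which the normal foveal image was visible."""
    return min(fixation_duration_ms, change_delay_ms)

# Change delays used in Experiments 3 and 4; the 250-ms fixation is hypothetical.
for delay in (0, 15, 50, 100):
    print(delay, version_in_window(10, delay), undegraded_exposure_ms(250, delay))
```

With a delay of 0 ms the normal foveal image is never visible during the fixation, which is the situation that the no-delay condition of Experiment 4 was designed to create.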
Perceptual span of vision and useful peripheral resolution

Because the more eccentric parts of the retina are incapable of resolving fine image detail, due to decreased perceptual acuity, it is commonly assumed that only coarse peripheral information is used to direct saccades, and to preprocess stimuli at future fixation positions. The following studies manipulated the content and availability of peripheral information, to determine the perceptual span and the useful amount of peripheral detail during picture viewing.

Saida and Ikeda (1979) used an electronic effect circuit that combined the camera image of a picture with that of a square generated on an x-y oscilloscope. The square could be moved rapidly, and was contingent on the current eye position. The resulting image showed a rectangular part of the picture, centred on the fixation position, while outside of the rectangle, the display was blanked. During a study phase, subjects inspected 80 pictures (line drawings) in preparation for a memory task. The size of the rectangle was manipulated, ranging from 3.3x3.3° to a non-restricting size that served as the control condition. Furthermore, viewing time
(0.5-20 s) and display size (10.2x13.3° and 14.4x18.8°, the latter displaying the same stimuli at a greater magnification) were manipulated. In a subsequent test phase, subjects had to indicate which of 160 pictures had been presented during the study phase. During the test phase, no image manipulations occurred. The study phase-test phase sequence was repeated ten times for each subject, with different pictures in different conditions. It was found that with a rectangle of half the display size, performance in the test phase was comparable to the control condition, regardless of the absolute display size. This indicates that the useful field of vision is not of absolute size, but that it depends on the stimulus size. Note, however, that in this experiment, stimulus size was confounded with stimulus density (i.e., the size and number of picture details per visual degree). Consequently, acuity limitations and increased information load may have reduced the perceptual span for the small display size. Related to this issue, Shioiri and Ikeda (1989) distinguished useful and available resolution. Available resolution was defined as the smallest size of details that could be discriminated, whereas useful resolution was the fineness of detail actually required to achieve a normal level of performance. Useful resolution at a given eccentricity is by definition equal to, or poorer than, the available resolution. Available resolution as a function of eccentricity was estimated by measuring the discriminability of a gap in Landolt's rings. A moving window paradigm was used to determine the useful resolution for picture perception as a function of eccentricity. Within a square window, centred on the fixation position, the normal picture was presented, whereas outside of the window, noise was added to the picture. This was achieved by replacing random parts of the picture outside the window by white pixels, using a specially made video montage circuit. 
Subjects had to inspect line drawings in preparation for a recognition task. As in the Saida and Ikeda (1979) experiment, sessions consisted of an 80-picture study phase, followed by a 160-picture test phase. Three subjects each participated in forty sessions. Window size (ranging from 2.7x2.7° to a non-restricting size), level of degradation (60-100%), and exposure duration (0.25-16 s) were manipulated. The display size was 15x15° in all conditions. For each window size, the maximum degradation level was determined that did not decrease performance compared with the non-restricted viewing condition. It was found that useful resolution decreased rapidly with increasing eccentricity, but that the available resolution decreased much more slowly. Peripheral vision during picture memorization apparently requires only coarse, low resolution information, even though finer details are available. For the largest, 10.6x10.6° window size, no difference in performance was observed between any of the degradation conditions and non-restricted viewing. This replicates the Saida and Ikeda (1979) finding that the span of perception during picture memorization subtends half the stimulus size.
Van Diepen, Wampers and d'Ydewalle (1995) used a moving window paradigm to investigate which spatial frequencies are used preferentially in the extrafoveal visual field during scene exploration. Subjects explored 3D rendered grey-level drawings of real-world scenes, looking for non-objects. Scenes subtended 16x12°. Eye movements were measured to position an ovoid window at the fixation position. The window size was 6.0x4.6°, capturing approximately 11% of the entire scene surface. Inside the window (i.e., foveally) the normal scene was presented in all conditions. An electronic video switcher (van Diepen, 1997; see also Appendix) was used to manipulate the information outside the window. In two experimental conditions, either the high or the low spatial frequencies of the drawing appeared outside the window. In the control condition, the normal scene was presented both inside and outside the window. As could be expected, performance deteriorated in the two experimental conditions compared with the control condition. The time needed to inspect the scenes was longer, and saccadic amplitudes were smaller in the experimental conditions than in the control condition. A comparison of the two experimental conditions revealed a clear benefit for the high frequency periphery. It was concluded that high frequency peripheral information was more useful than low frequency peripheral information in the context of an object search task. The visual system may not be very well suited to identifying fine peripheral detail, but the presence of high frequency peripheral information apparently could be detected, serving as a cue to object locations. Peripheral object selection becomes difficult when only low frequency information is present in the periphery.

Similar results were obtained by McConkie and Loschky (1997). Subjects viewed grey-level pictures of real-life situations in an object search task (Experiment 1) and a picture memorization task (Experiment 2).
Information outside of a gaze-contingent circular window was degraded using wavelet decomposition. Three window sizes and three degradation levels were employed. In a control condition, pictures were undegraded. In both experiments, performance was affected by the window size, but not by the degradation level. With the largest window, capturing more than 20% of the display area, performance was close to normal. The perceptual span thus appears to be smaller with complex photographic pictures than with simple line drawings (Saida and Ikeda, 1979; Shioiri and Ikeda, 1989). Probably, stimulus density is the moderating factor. For smaller windows, saccadic amplitudes were smaller and scene inspection times (Experiment 1), as well as fixation durations (Experiment 2), were longer, compared with the control condition. It is difficult to compare wavelet decomposition directly with the Fourier filtering used by van Diepen, Wampers and d'Ydewalle (1995). Still, both degradation by wavelet decomposition and low-pass Fourier filtering remove fine details from pictures and drawings, which decreased performance in both studies.
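The proportion of a scene covered by a window is easily checked. The sketch below assumes that the ovoid window of van Diepen, Wampers and d'Ydewalle (1995) can be treated as an ellipse whose full axes are the quoted 6.0x4.6°; the function name is ours.

```python
import math

def ellipse_area_fraction(window_w_deg, window_h_deg, scene_w_deg, scene_h_deg):
    """Fraction of a rectangular scene covered by an elliptical window."""
    window_area = math.pi * (window_w_deg / 2) * (window_h_deg / 2)
    return window_area / (scene_w_deg * scene_h_deg)

fraction = ellipse_area_fraction(6.0, 4.6, 16.0, 12.0)
print(f"{100 * fraction:.1f}% of the scene")  # prints "11.3% of the scene"
```

Under this assumption the result agrees with the reported value of approximately 11%.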
The van Diepen, Wampers and d'Ydewalle (1995) and McConkie and Loschky (1997) results appear to be inconsistent with a study by Groner and Groner (1996). They presented meaningless pictures for which subjects had to invent a suitable name. Pictures contained only high or low frequency information. In a control condition, undegraded pictures were presented. Saccadic amplitudes were similar for undegraded and low frequency pictures, but smaller for pictures that contained only high spatial frequencies. It was concluded that low frequency information is preferred to high frequency information to direct saccades during picture viewing. Furthermore, Shioiri and Ikeda (1989, see above) found that during picture memorization, the useful peripheral resolution was much lower than the available resolution. We suspect that the incompatibility of the above results is due to a difference in task demands. Both Shioiri and Ikeda (1989) and McConkie and Loschky (1997, Experiment 2) used a memorization task, but memory for scene identity was tested in the first study, whereas memory for picture details was tested in the latter. Van Diepen, Wampers and d'Ydewalle (1995) and McConkie and Loschky (Experiment 1) employed an object search task. Low frequency information might be preferred to process global image characteristics, that are important for memorization of picture identity (Shioiri and Ikeda, 1989) or picture naming (Groner and Groner, 1996). High frequency information might, however, enhance object localization, improving both the search task and detailed picture memorization. Indeed, Schyns and Oliva (1996) showed that the visual system can switch between different frequency ranges in a flexible way as a function of task demands. Van Diepen and Wampers (1996) tried to determine which kind of peripheral information is preferentially used during the initial part of a fixation. Subjects explored full-colour 3D rendered drawings of real-world scenes in a non-object search task. 
An electronic video switcher (van Diepen, 1997; see also Appendix) was used to degrade information outside of a 3.5x2.6° ovoid foveal window, during the first 150 ms of fixations. Within the window, the undegraded scene was presented. After the initial 150 ms of a fixation, the undegraded scene was presented on the entire 16x12° display. The peripheral degradations included low-pass, band-pass and high-pass filtering, and blanking (replacing the periphery by the average scene colour). In the control condition, the normal scene was presented throughout the fixation. Performance with degraded peripheral information was significantly worse compared with the control condition. However, no difference was found among the degradation types. These results were interpreted to reflect the impact of a change of peripheral information during an ongoing fixation. Since complete absence of peripheral information in the blanking condition did not result in a lower performance compared with conditions where filtered peripheral information was available, the preliminary conclusion was that peripheral information is utilized only later during fixations.
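Peripheral degradation of this kind can be approximated in software. The sketch below is an assumption-laden illustration, not the procedure used in the studies: it applies an ideal (sharp-edged) Fourier filter, expresses the cut-off in cycles per pixel with an arbitrary value of 0.05, uses a random array as a stand-in scene, and combines the images with a plain array operation instead of a video switcher.

```python
import numpy as np

def fourier_filter(image, low=0.0, high=np.inf):
    """Keep only spatial frequencies f (cycles/pixel) with low <= f < high."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.hypot(fy, fx)                    # radial spatial frequency
    keep = (radius >= low) & (radius < high)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * keep))

def window_composite(normal, degraded, window):
    """Normal image inside the window, degraded image outside it."""
    return np.where(window, normal, degraded)

rng = np.random.default_rng(1)
scene = rng.random((96, 128))                    # stand-in for a grey-level scene
low = fourier_filter(scene, high=0.05)           # low-pass periphery
high = fourier_filter(scene, low=0.05)           # high-pass periphery
blank = np.full_like(scene, scene.mean())        # blanking: average scene level

rows, cols = np.ogrid[:96, :128]
window = ((rows - 48) / 10) ** 2 + ((cols - 64) / 14) ** 2 <= 1.0
display = window_composite(scene, low, window)
```

A band-pass version follows from giving both `low` and `high`. Because the low-pass and high-pass filters in this sketch are complementary, their outputs sum to the original image.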
Conclusions

Moving mask and moving window paradigms reveal interesting properties of visual perception in relatively naturalistic viewing tasks. Foveation is beneficial, and sometimes essential, for accurate word and object identification. Foveal information can be acquired during the initial part of fixations, while peripheral information seems to be utilized only later. Attending an extrafoveal location automatically tends to result in a saccade to that location. Furthermore, saccades are rarely programmed to information-poor areas, even when it would be strategic to do so. The perceptual span of vision during scene perception ranged from 20 to 50% of the display in the reviewed studies, depending on the stimulus complexity. Coarse peripheral information is preferred to obtain global image characteristics, while object localization benefits most from a high resolution peripheral image.

Acknowledgements

This chapter was written while the first author (P.v.D.) visited the Psycholinguistics and Visual Cognition Laboratory at Michigan State University, East Lansing, Michigan. The visit was supported by the Fund for Scientific Research - Flanders (Belgium). Scene perception research in the Laboratory of Experimental Psychology, University of Leuven, Belgium was supported by the Programma on Interuniversity Poles of Attraction contract No. 31 and the G.O.A. Convention No. 93/1. The authors wish to thank Andrew Hollingworth, Iain Gilchrist, and two anonymous reviewers for their comments on earlier drafts. Ongoing research can be followed at URL: http://www.psy.kuleuven.ac.be/~paul/pvd.html.

References

Boyce, S.J., Pollatsek, A. and Rayner, K. (1989). Effects of background information on object identification. Journal of Experimental Psychology: Human Perception and Performance, 15, 556-566.
Buswell, G.T. (1935). How People Look at Pictures. Chicago: University of Chicago Press.
De Valois, R.L. and De Valois, K.K. (1988). Spatial Vision. New York: Oxford University Press.
Groner, R. and Groner, M.T. (1996, September). The effect of spatial frequency filtering on eye movement parameters. Paper presented at the Workshop on Spatial Scale Interactions, Durham, UK.
Henderson, J.M. (1992a). Identifying objects across saccades: Effects of extrafoveal preview and flanker object context. Journal of Experimental Psychology: Learning, Memory and Cognition, 18, 521-530.
Henderson, J.M. (1992b). Visual attention and eye movement control during reading and picture viewing. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer-Verlag, pp. 260-283.
Henderson, J.M., McClure, K.K., Pierce, S. and Schrock, G. (1997). Object identification without foveal vision: Evidence from an artificial scotoma paradigm. Perception and Psychophysics, 59, 323-346.
Henderson, J.M., Pollatsek, A. and Rayner, K. (1989). Covert visual attention and extrafoveal information use during object identification. Perception and Psychophysics, 45, 196-208.
Loftus, G.R., Kaufman, L., Nishimoto, T. and Ruthruff, E. (1992). Effects of visual degradation on eye-fixation durations, perceptual processing, and long-term visual memory. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer-Verlag, pp. 203-226.
McConkie, G.W. and Loschky, L. (1997). Human performance with a gaze-linked multiresolutional display. Proceedings of the Army Research Laboratory Advanced Displays and Interactive Displays Federated Laboratory First Symposium, ARL Adelphi Laboratory, Adelphi, MD, 28-29 January 1997.
McConkie, G.W. and Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception and Psychophysics, 17, 578-586.
McConkie, G.W. and Rayner, K. (1976). Asymmetry of the perceptual span in reading. Bulletin of the Psychonomic Society, 8, 365-368.
Pollatsek, A., Rayner, K. and Collins, W.E. (1984). Integrating pictorial information across eye movements. Journal of Experimental Psychology: General, 113, 426-442.
Poulton, E.C. (1962). Peripheral vision, refractoriness, and eye movements in fast oral reading. British Journal of Psychology, 53, 409-419.
Rayner, K., Inhoff, A.W., Morrison, R.E., Slowiaczek, M.L. and Bertera, J.H. (1981). Masking of foveal and parafoveal vision during eye fixations in reading. Journal of Experimental Psychology: Human Perception and Performance, 7, 167-179.
Saida, S. and Ikeda, M. (1979). Useful visual field size for pattern perception. Perception and Psychophysics, 25, 119-125.
Schyns, P.G. and Oliva, A. (1996, September). Flexible task-dependent scale encodings of complex visual stimuli. Paper presented at the Workshop on Spatial Scale Interactions, Durham, UK.
Shioiri, S. and Ikeda, M. (1989). Useful resolution for picture perception as a function of eccentricity. Perception, 18, 347-361.
van Diepen, P.M.J. (1997). A pixel-resolution video switcher for eye contingent display changes. Spatial Vision, 10, 335-344.
van Diepen, P.M.J. (1998). Foveal stimulus degradation during scene perception. Submitted for publication.
van Diepen, P.M.J., De Graef, P. and d'Ydewalle, G. (1995). Chronometry of foveal information extraction during scene perception. In: J.M. Findlay, R. Walker and R.W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes and Applications. Amsterdam: Elsevier, pp. 349-362.
van Diepen, P.M.J., De Graef, P., Lamote, C. and Van Wijnendaele, I. (1994). The role of central and peripheral image cues in scene recognition. Perception, 23 (Suppl.), 113.
van Diepen, P.M.J., De Graef, P. and Van Rensbergen, J. (1994). On-line control of moving masks and windows on a complex background using the ATVista Videographics Adapter. Behavior Research Methods, Instruments and Computers, 26, 454-460.
van Diepen, P.M.J. and Wampers, M. (1996). Scene exploration with Fourier filtered peripheral information. Manuscript submitted for publication.
van Diepen, P.M.J., Wampers, M. and d'Ydewalle, G. (1995). The use of coarse and fine peripheral information during scene perception. Paper presented at the Eighth European Conference on Eye Movements, Derby, UK.
Yarbus, A.L. (1967). Eye Movements and Vision. New York: Plenum Press.
Appendix: Recent Eye Contingent Display Change Techniques

The Moving Overlay Technique was developed in our laboratory to enable moving mask and moving window experiments with high resolution graphical stimuli (van Diepen, De Graef and Van Rensbergen, 1994). A high-end graphics board was used which has the capability to mix an external video signal with an internally stored image. In the overlay display mode of the graphics board, one alpha bit per pixel is stored, together with the image information. For each pixel, the alpha bit specifies whether the internally stored or the externally generated image has to be displayed on the corresponding pixel position on the display. Normally, this ability will be used, for example, to superimpose subtitles on prerecorded images, supplied by an external VCR. However, instead of text, a mask of arbitrary size and content can be stored in the video memory. Alpha bits of pixels that belong to the mask are set to display the internal image (i.e., the mask), while all other alpha bits select the external image. The internally stored image can be repositioned very quickly, relative to the display origin, hence enabling a moving mask. The position of the externally supplied image, however, will not change. In other words, the internal image is like a transparent sheet, with a mask drawn on it, which is put on top of the external image. By moving the sheet, the mask moves across the external image.

Instead of a VCR, a second graphics board can be used to generate the external image. The undegraded stimulus is loaded on the second graphics board, and the combined image of the two boards will display the mask, superimposed on the undegraded stimulus.

A moving window can be realized as follows: a mask is loaded in the entire video memory of the first graphics board, except for a central window area. Alpha bits within the window are set to select the external picture, whereas all others select the internal mask.
The image is like an opaque sheet, with a hole in the centre of it, that can be moved across the stimulus. By aligning the window centre with the eye position, foveally the undegraded stimulus is visible, while the remaining part of the display is masked.
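The alpha-bit selection can be mimicked in software. The following sketch is a simulation under our own assumptions (array sizes, names, and an all-black external image); the technique itself performs the mixing in graphics hardware.

```python
import numpy as np

def overlay(external, internal, alpha, offset):
    """Mix one display frame from an external image and a shifted internal plane.

    external -- (H, W) image from the external source; it never moves
    internal -- (h, w) internally stored pixels (the mask content)
    alpha    -- (h, w) booleans; True selects the internal pixel
    offset   -- (row, col) display position of the internal plane's top-left corner
    """
    H, W = external.shape
    h, w = internal.shape
    out = external.copy()
    r0, c0 = offset
    # Clip the internal plane to the part that falls on the display.
    top, left = max(r0, 0), max(c0, 0)
    bottom, right = min(r0 + h, H), min(c0 + w, W)
    if top >= bottom or left >= right:
        return out
    sub_alpha = alpha[top - r0:bottom - r0, left - c0:right - c0]
    sub_pixels = internal[top - r0:bottom - r0, left - c0:right - c0]
    region = out[top:bottom, left:right]         # view into the output frame
    region[sub_alpha] = sub_pixels[sub_alpha]
    return out

rng = np.random.default_rng(0)
stimulus = np.zeros((100, 150), dtype=np.uint8)              # external image
mask = rng.integers(1, 256, size=(20, 20), dtype=np.uint8)   # internal noise mask
alpha = np.ones(mask.shape, dtype=bool)                      # every mask pixel shown
frame_a = overlay(stimulus, mask, alpha, (30, 60))           # eye at about (40, 70)
frame_b = overlay(stimulus, mask, alpha, (50, 90))           # eye at about (60, 100)
```

In this simulation a moving window is obtained by making the internal plane large and setting `alpha` to False inside a central hole. Note that the mask pixels travel with the alpha bits, so the mask looks identical at every position.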
The content and shape of the masked area are defined by a high resolution, grey-level or full-colour, graphical image. A simple noise mask that completely occludes the foveal part of the stimulus is possible, but so are more sophisticated masks. For example, a pattern mask could be created, superimposing curves on the stimulus, while the stimulus remains visible between the curves. In this way, a selected stimulus area can be degraded, without being completely occluded.

Two limitations still restrict the applications of the moving overlay technique. First, the use of the video inputs of the first graphics board limits the display refresh frequency to 60 Hz interlaced. Repositioning the mask therefore can take up to two refresh cycles (i.e., 33.3 ms). Van Diepen, De Graef and Van Rensbergen (1994) argued, however, that displaying every other line of a mask would sufficiently degrade image information for most masking purposes. This takes at most one refresh cycle of 16.7 ms. After one additional refresh cycle, all mask lines are displayed.

A second, more conceptual limitation is that alpha bits are stored together with the pixel information of the mask image. Consequently, when the alpha bits are repositioned relative to the screen origin, so is the mask. Therefore, the mask will have the same appearance for every fixation position. For most moving mask and moving window experiments this is no problem. But suppose that the foveal stimulus information has to be replaced by an adapted version of the same stimulus (e.g., a low contrast version). Then, for each new fixation position, the "mask" content has to be updated to display the appropriate part of the adapted stimulus for that specific position. Even though it is possible to write-protect the alpha bits of the first graphics board, reloading the adapted stimulus to match the new fixation coordinates would take too much time.
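The two timing figures for the moving overlay technique follow directly from the 60 Hz refresh frequency. Writing $T_{\text{cycle}}$ (our notation) for the duration of one refresh cycle:

```latex
T_{\text{cycle}} = \frac{1}{60\,\text{Hz}} \approx 16.7\,\text{ms},
\qquad
2\,T_{\text{cycle}} = \frac{2}{60\,\text{Hz}} \approx 33.3\,\text{ms}
```

In interlaced mode each cycle draws only every other line, which is why a complete mask needs two cycles while a half-line mask is available after one.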
Another eye contingent display change technique was developed, to deal with the limitations of the moving overlay technique (van Diepen, 1997). The set-up consists of three synchronized graphics boards, connected to a specially built electronic video switcher. The video switcher selects one of two graphics boards for output to the stimulus display. The selection is controlled by the binary level of a key signal. A switch from one graphics board to the other takes about 15 ns. For most graphics boards, this switching time is shorter than the duration of one pixel. Hence, by supplying the appropriate key signal, the video switcher will output any combination of the two input graphics boards. The third graphics board supplies the key signal. The three graphics boards are synchronized, so that all output signals correspond to the same pixel position. The first graphics board contains the normal version of a stimulus, while the second board outputs an adapted version of the same stimulus. The third graphics board contains only black and white pixels. Black pixels correspond to a low-level key signal for the video switcher, causing the normal stimulus version to be selected for output to the display. Conversely, white pixels select the adapted stimulus version. A moving window is realized, for
example, by storing the image of a white oval on a black background in the video memory of the third graphics board. By repositioning this image relative to the display origin, the oval can be moved to any display position. On the display, the adapted stimulus will appear within the window selected by the white oval, while outside of the window, the normal stimulus is displayed. Strictly speaking, this is not a moving window set-up in the classical sense. However, henceforth we will use the term "moving window" in a broader sense, to refer to all eye contingent display changes that manipulate the stimulus appearance inside or outside of a window aligned with the eye position. A moving mask is a special case of a moving window, one that masks all foveal information.

Although the graphics boards that we used were factory designed to synchronize to interlaced video signals, a simple hardware modification made it possible to synchronize in a 60 Hz non-interlaced screen refresh mode, displaying 756x486 full-colour pixels. A window 100 pixels high can be moved and fully displayed in 4.5 to 18 ms, depending on the position of the electron beam at the moment of the repositioning command.

The above technique stores the window definition and its content separately. Accordingly, when the window is repositioned, overlapping areas of the old window and the new window will display the same image before and after the position change. Suppose, for example, that the within-window stimulus is a noise mask (i.e., the entire image of the second graphics board is loaded with random pixels). When the window is shifted one pixel position, the displayed image will change, but only at the window borders. The advantage of this feature is that foveal masking can be achieved from the very beginning of each fixation.
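The behaviour of the video switcher can be sketched as a per-pixel selection among three arrays, one for each graphics board. The code is a software analogy only; the array sizes, the oval dimensions, and the function names are assumptions, and the actual device switches analogue video signals.

```python
import numpy as np

def switch(normal, adapted, key):
    """Per-pixel selection: white key pixels (True) pick the adapted stimulus."""
    return np.where(key, adapted, normal)

def oval_key(shape, center, axes):
    """Key image: a white (True) oval on a black (False) background."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return ((rows - center[0]) / axes[0]) ** 2 + ((cols - center[1]) / axes[1]) ** 2 <= 1.0

rng = np.random.default_rng(0)
normal = np.zeros((100, 150), dtype=np.uint8)                        # first board
adapted = rng.integers(1, 256, size=normal.shape, dtype=np.uint8)    # second board: noise
key_a = oval_key(normal.shape, (50, 70), (12, 16))                   # third board
key_b = oval_key(normal.shape, (50, 71), (12, 16))                   # oval moved one pixel

frame_a = switch(normal, adapted, key_a)
frame_b = switch(normal, adapted, key_b)
changed = frame_a != frame_b
```

Because the noise stays fixed on its own board while only the key image moves, the pixels that differ between the two frames in this sketch are exactly those on which the two keys disagree, that is, a thin rim at the window border.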
To achieve masking from fixation onset, the window position has to be updated continuously during a saccade, ending with a small final position correction at the beginning of the following fixation. The small change to the window border at the start of a fixation is hardly noticeable. A high speed eye tracker is required to implement this algorithm successfully. Earlier moving mask techniques (e.g., the moving overlay technique) would shift the mask content together with the mask position, hence inducing noticeable motion of the entire foveal noise pattern. Other kinds of foveal as well as peripheral degradation can also be accomplished without delay from the fixation onset, using the above algorithm.

Another advanced display change technique was developed recently at the Beckman Institute, Illinois (McConkie and Loschky, 1997), using a Viewgraphics display controller. It has a 1 Gbyte display memory that can store thousands of video pages. Each page is loaded with a unique stimulus + window combination. As soon as new fixation coordinates are determined, the page is selected whose window position is closest to the fixation position. The switch between video pages can be accomplished almost immediately, resulting in a minimal time lag between the onset of a fixation and the appearance of the corresponding stimulus + window
combination. Since all stimulus + window combinations are prepared beforehand, their appearance is limited only by the imagination of the experimenter. In contrast to other moving window techniques, this feature allows a gradual spatial transition from the within-window version of the stimulus to the version outside of the window. Other techniques will always show a sharp window border, which to some extent could attract subjects' attention. The necessity of assembling all stimulus + window combinations prior to the experiment is at the same time a major limitation of the technique. If all available pages are used for one experimental trial, a horizontal and vertical window position resolution of twelve pixels can be accomplished. In that case, for every fixation, a stimulus + window combination is available with a window centre that deviates no more than about 8.5 pixel distances from the actual fixation position. Better position accuracy could be achieved by further expanding the display memory, but given the size of windows that are generally employed, the 8.5 pixel position accuracy is fully acceptable. Thousands of stimulus versions, however, take too much time to load between trials, not to mention the huge amount of disk space required to store all stimulus versions for a complete experiment. McConkie and Loschky (1997) therefore loaded only 330 stimulus + window combinations of each picture, allowing for a 35 pixel horizontal and a 34 pixel vertical window position resolution. Consequently, the maximum distance from a fixation position to the window centre was approximately 24 pixel units, which was about one sixth of the diameter of the smallest window employed in the experiment. In this way, stimulus + window combinations for nine experimental trials could be loaded prior to the experiment, together with six additional undegraded pictures for control trials.
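The position accuracies quoted for the page-switching technique can be reproduced with a short calculation. The sketch assumes that the pre-stored window centres lie on a regular grid starting at the display origin; the function names and this grid layout are ours.

```python
import math

def max_deviation(step_x, step_y):
    """Largest distance from any fixation to the nearest window centre on the grid."""
    return math.hypot(step_x / 2, step_y / 2)

def nearest_page(fix_x, fix_y, step_x, step_y):
    """Grid indices (column, row) of the window centre closest to the fixation."""
    return round(fix_x / step_x), round(fix_y / step_y)

print(f"{max_deviation(12, 12):.1f}")  # prints "8.5"
print(f"{max_deviation(35, 34):.1f}")  # prints "24.4"
```

The worst case occurs when a fixation falls midway between four neighbouring window centres, which gives half the diagonal of one grid cell. A practical implementation would also have to clamp the indices to the range of pages actually stored.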
CHAPTER 16
Film Perception: The Processing of Film Cuts
Gery d'Ydewalle, Geert Desmet and Johan Van Rensbergen
University of Leuven
Abstract In film, three levels of editing errors in sequencing successive shots can be distinguished. First-order editing errors refer either to small displacements of the camera position or to small changes of the image size, disturbing the perception of apparent movement and leading to the impression of jumping. Second-order editing errors follow from a reversal of the camera position, leading to a change of the left-right position of the main actors (or objects) and a complete change of the background. With third-order editing errors, the linear sequence of actions in the narrative story is not obeyed. The present experiment shows that there is an increase of eye movements from 200 to 400 ms following both second- and third-order editing errors. Such an increase is not obtained after a first-order editing error, suggesting that the increase of eye movements after second- and third-order editing errors is due to postperceptual, cognitive effects.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
Watching a movie is an enjoyable and seemingly effortless activity. We do not feel that much processing is going on to produce a smooth perception of a large sequence of discontinuous events. In the story of the movie, the chronologically linear sequence of events is often systematically changed, as for example when flashbacks are occasionally inserted. Moreover, a film is constructed from a large number of shots, each shot typically lasting only 10 s. For example, we analysed a 10-min segment of the middle part of a recent movie, The Remains of the Day: there were 71 shots, giving an average duration of 8.45 s per shot. We define here a shot as a single run of the camera, while a cut is the transition between the end of one shot and the beginning of the next one. Films are, on average, about one and a half hours long; this implies that there are usually more than 500 cuts. Despite the large number of discontinuities, the perception of a movie is typically experienced as a continuous one.
Despite the pervasive discontinuities in the sequencing of films, their perceptual processing has not received much research attention. However, for several decades, there has been a long-standing debate among film-makers about the principles of film editing. The classical viewpoint defends the Formal Editing Principle or the so-called Hollywood concept of editing (see Bretz, 1962; Mascelli, 1965; Reisz, 1953/1968; Wurtzel, 1983). The principle implies that successive shots should try to minimize perceptually disruptive transitions. The rules which are derived from the Principle can be standardized, independently of the content of the movie. The modern viewpoint (see Godard, 1966; Wurtzel, 1983), on the one hand, stresses the consistency of the narrative structure (the Narrative Editing Principle); the consistency will perceptually overrule the disturbing effects in the transition between successive shots.
On the other hand, the modern viewpoint also stresses the importance of deliberately inserting editing errors, in order occasionally to highlight critical parts of the story.
Are shot transitions indeed perceptually disruptive? In order to tackle the problem empirically, we need to distinguish several types of transitions or cuts. In d'Ydewalle and Vanderbeeken (1990), we distinguished three levels of transition types (which we call editing "errors", following the terminology of the classical viewpoint). Editing errors of the first order refer either to small displacements of the camera position or to small changes of the image size, disturbing the perception of apparent movement and leading to the impression of jumping. Editing rules of the second order are based on the ability of the perceiver to construe a spatial-cognitive schema of the displayed scene, and an editing error implies a switch in the left-right location of the objects in the scene. We illustrate two types of editing errors of the second order: the static and dynamic reversed-angle shots. A static reversed-angle shot involves the transition between shots taken on the two sides of the gaze axis. This occurs when a displayed actor looks to the
right in the first shot, but due to the change of the camera position, to the left in the second shot. When more than one object is in view, this results in a change in relative position of the displayed objects where left and right positions are being switched; similarly, the background is also completely different. In order to restore the perceptual continuity, the perceiver will have to rotate the scene. In the case of a dynamic reversed-angle shot, the object (or actor) leaves the screen in a first shot, moving from the right to the left of the screen. Direction constancy on the screen requires the same direction of movement in the next shot, again from right to left; therefore, the object (or actor) is expected to enter the screen at the next shot, again from the right to the left. A dynamic reversed-angle shot is present when the direction constancy is violated.
Third-order editing rules are meant to reinforce the narrative continuity of the story. Hochberg and Brooks (1978) call the cognitive activities involved in following the story "visual momentum". These higher-order cognitive processes, at the level of discourse processing, are assumed to be slower than the lower-level perceptual processes involved in the first- and second-order editing transitions.
In two pilot experiments, d'Ydewalle and Vanderbeeken (1990) attempted to validate the distinction between the three levels of editing errors. In some versions of the movie, various jump cuts (first-order editing errors) and reversed-angle shots (second-order editing errors) were inserted; the order of the various parts of the movie was either scrambled (third-order editing error) or not. In order to assess the effect of the first- and second-order editing errors, the eye movements of the perceivers were recorded during two intervals: 0-200 ms and 200-400 ms after the cut.
In some conditions, the participants were also required to respond manually as quickly as possible whenever a transition from one shot to another one was detected: it was assumed that a narrative film with its more demanding cognitive processes will slow down the reaction times as compared with the reaction times when the narrative structure does not exist (scrambled version). While Experiment 1 of d'Ydewalle and Vanderbeeken (1990) showed faster manual responses when the story was scrambled, the data on the reaction times were less clear in Experiment 2: the reaction-time measurement is possibly too slow a measure for the on-going cognitive processes. Globally, the editing errors decreased the extent of eye movements but there were more eye movements following second-order than first-order editing errors during the second time interval. Accordingly, we interpreted the findings as evidence for a sequence of two steps in the processing of the editing error. There is first an immediate (i.e., during the first time interval) effect of the editing errors, narrowing (or focusing) the attention to the changed parts of the screen. Thereafter, second-order editing errors increase the eye movements, revealing some postperceptual confusion of the participants by the reversed-angle shots and/or their likely attempts to rotate the picture into the right axis.
The present experiment was primarily meant to improve methodologically the two pilot experiments in d'Ydewalle and Vanderbeeken (1990) and to confirm the findings. The manual reaction-time task was no longer used; as for the first- and second-order editing errors, effects of the third-order editing errors were also assessed by the analysis of the eye movements. Following the Narrative Editing Principle, we expected fewer eye movements at the cuts when the narrative continuity is preserved; for example, lack of narrative continuity will lead the viewer to explore more the less-informative parts of the scene, or the viewer may try to re-establish the underlying discourse structure. In order to improve the reliability of the data, there were more participants. Finally, we extended the analysis of the eye movements to three additional time intervals, starting with 200 ms before the cut, and going to 800 ms after the cut: when first- or second-order editing errors occur, we expect fewer eye movements to occur immediately after the cut, but following second-order editing errors, an increase is predicted at the 200-400 ms interval.
Method
Subjects
Ninety-one first- and second-year students from various Departments at the University of Leuven, Belgium, volunteered to participate in the experiment. From this pool, 15 participants were excluded. Participants wearing glasses or lenses were excluded. A few participants had an eye abnormality of one kind or another, and they were also excluded. The remaining group of participants (38 male and 38 female) was divided into two conditions as a function of their experience and acquaintance with film editing. The data of a postexperimental questionnaire allowed us to form a first condition: in this condition, all participants were totally unaware of, and were uninformed about, the movie editing. All participants of the second condition were well informed about the movie editing, as they were at various stages involved in the preparation and construction of the films for the experiment. Both conditions had an equal number of participants.
Materials and equipment
The movie which we constructed did not include a sound track and was presented in black and white. It involved six people displaying various actions during a walk: its narrative structure is essentially a simple linear one (i.e., one action following another one), about 7 min long, and with 50 shots (i.e., 49 cuts). It can be divided into nine segments as a function of the simultaneous presence of one or more actors. The segments can be further divided into two subsegments depending on the performed action of the actor(s).
Four versions of the movie were constructed as a function of the 2 × 2 design of the experiment: with versus without narrative continuity, and with versus without editing errors. One version showed narrative continuity without editing errors. In the two versions without narrative continuity, the subsegments were rearranged randomly with the restriction that the transition from one subsegment to the next one was perceptually not disruptive. In the two versions including editing errors, we inserted three jump cuts, five static reversed-angle shots, and five dynamic reversed-angle shots. The three types of editing errors were more or less equally distributed over the whole movie.
The movies were shown on a BARCO monitor with a screen of 40 × 50 cm, standing at about 150 cm from the ground. The participants were seated at a distance of about 150 cm from the screen, subtending a visual angle of approximately 15 × 19°. We used an eye movement registration system (DEBIC 90), which is based on the corneal reflection-pupil centre method. It has a sampling rate of 50 Hz, so every 20 ms the X and Y value of the participant's point of regard in the visual field as well as the pupil diameter and a time code are stored on a computer. The equipment needs to be calibrated for each participant, and this takes approximately 2 min. The results of the eye movement recording can be seen on screen (a crosshair moving constantly over the scene of the film), and they were taped. The values of the X and Y coordinates of the crosshair on the screen were stored on disk (256 × 256 lines). The entire analysis of the eye movements was done with the values of these coordinates. It cannot be emphasized enough that the nature of the movie excludes any analysis in terms of eye fixations and saccadic eye movements.
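As a rough check on the viewing geometry and sampling figures just given, the following sketch recomputes them. It assumes a flat screen viewed head-on and centred on the line of sight, which the text does not state; the function names are hypothetical.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent of size_cm viewed
    head-on and centred, from distance_cm (assumed viewing geometry)."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def sample_interval_ms(rate_hz: float) -> float:
    """Time between successive samples, in milliseconds."""
    return 1000.0 / rate_hz


# Screen of 40 x 50 cm viewed from about 150 cm
print(round(visual_angle_deg(40, 150), 1), round(visual_angle_deg(50, 150), 1))
# 50 Hz recording: interval between samples, and samples per 200 ms
print(sample_interval_ms(50), 200 / sample_interval_ms(50))
```

Under these assumptions the two screen dimensions subtend roughly 15° and 19°, and a 50 Hz recording yields one sample every 20 ms, that is, ten samples in any 200 ms stretch of film.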
As the movie displays a considerable number of actions and movements, the eye movement pattern of the participants in the study involved not only fixations and saccadic movements but also pursuit movements which are intermixed between some fixations and saccadic movements. Therefore, we carried out the analyses directly on the raw data of the X and Y coordinates which are sampled every 20 ms.
Results
Five time blocks were defined around each cut at which an editing error occurred in the versions with editing errors: 200 to 0 ms before the cut, and 0-200 ms, 200-400 ms, 400-600 ms, and 600-800 ms after the cut. For each block, we took as measure of the variance of eye movements the standard deviations of the values on the X and Y coordinates (the horizontal and vertical movements). The standard deviations were first subjected to a multivariate analysis of variance, with the standard deviations on the X and Y coordinates as dependent variables. The independent variables were the nature of the participants (naive or informed), the narrative continuity, and the presence or absence of editing errors as between-subjects variables; the type of editing errors as well as the
five time blocks were within-subjects variables. This was followed by univariate analyses, separately on the standard deviations in the X and Y coordinate. A wealth of significant main and interaction effects was obtained. As the findings from the univariate analyses converge with the ones from the multivariate analysis, we restrict the present presentation to the separate findings on the X and Y coordinate. If main or interaction effects are involved in a higher-order interaction effect, only the higher-order interaction is described.
We first present the findings on the horizontal X coordinate. Narrative continuity interacts significantly with the five time blocks, F(4,212) = 3.767, MSe = 9.637, p < 0.006. Figure 1 shows a larger standard deviation at the 200-400 ms interval in the absence of a narrative continuity than when there is a narrative continuity. The a posteriori tests confirm this pattern: the peak in the figure differs significantly (at least p < 0.05) from all other means. The introduction of editing errors also increases the standard deviation at the 200-400 ms interval. Figure 2 displays the averages involved in the significant interaction effect between the presence or absence of editing errors and the five time blocks, F(4,212) = 5.968, MSe = 9.637, p < 0.0002. The a posteriori tests again confirm this pattern: the peak in the figure differs significantly (at least p < 0.05) from all other means. Not all types of editing errors increase the standard deviations, as evidenced in the significant interaction effect between the presence or absence and the type of editing errors, F(2,106) = 3.827, MSe = 11.570, p < 0.03. From Fig. 3, it appears that both types of reversed-angle shots (static and dynamic) increase the standard deviation but this does not happen with jump cuts.
In both cases, the reversed-angle shots increase the standard deviation significantly (a posteriori Tukey tests), while the difference with jump cuts is opposite but not significant.
To summarize the findings on the X coordinate, all cuts clearly increase the standard deviation at the 200-400 ms interval following the cut. This increase is even larger when the narrative continuity is disturbed and when reversed-angle shots are inserted. Jump cuts do not affect the eye movements.
As to the findings on the Y coordinate, it should first be pointed out that there is much more noise in the data. After removal of all variances due to the main and interaction effects, the residual mean square error is 7.887 and 11.805 for the data in the X and Y coordinates, respectively. All significant main and interaction effects are involved in one significant second-order interaction effect, involving the presence or absence of editing errors, the nature of the editing errors, and the five time blocks, F(8,424) = 2.600, MSe = 11.805, p < 0.01. Figures 4 (conditions with editing errors) and 5 (conditions without editing errors) display the averages involved in the significant interaction. We performed a posteriori Tukey tests for the differences between conditions with and without editing errors, separately for each type of editing error at each time
Fig. 1. Standard deviation (SD) on the X coordinate, as a function of narrative continuity and time blocks.
Fig. 2. Standard deviation (SD) on the X coordinate, as a function of the presence of editing errors and time blocks.
Fig. 3. Standard deviation (SD) on the X coordinate, as a function of the nature of the editing errors.
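The dispersion measure plotted in these figures can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code: the data layout, the function name `block_dispersion`, the half-open block boundaries and the use of the sample standard deviation (n - 1 in the denominator) are all assumptions, since the chapter specifies only that standard deviations of the raw X and Y samples were taken in five 200 ms blocks around each cut.

```python
import statistics

BLOCK_MS = 200
# Block start times relative to the cut: one block before, four after.
BLOCK_STARTS = [-200, 0, 200, 400, 600]


def block_dispersion(samples, cut_time_ms):
    """Return {block_start: (sd_x, sd_y)} for one cut.

    samples is a list of (time_ms, x, y) gaze samples (hypothetical
    layout). Each block covers [start, start + 200) ms relative to the
    cut; blocks with fewer than two samples are skipped.
    """
    result = {}
    for start in BLOCK_STARTS:
        lo = cut_time_ms + start
        hi = lo + BLOCK_MS
        xs = [x for t, x, _ in samples if lo <= t < hi]
        ys = [y for t, _, y in samples if lo <= t < hi]
        if len(xs) >= 2:
            # Sample SD is an assumption; the chapter does not say which
            # estimator was used.
            result[start] = (statistics.stdev(xs), statistics.stdev(ys))
    return result


# Synthetic demo (hypothetical data): gaze is steady except for
# back-and-forth horizontal movement 200-400 ms after a cut at 1000 ms.
demo = []
for t in range(800, 1800, 20):  # one sample every 20 ms (50 Hz)
    moving = 1200 <= t < 1400
    x = 100 + (40 if moving and (t // 20) % 2 else 0)
    demo.append((t, x, 120))

for start, (sd_x, sd_y) in block_dispersion(demo, cut_time_ms=1000).items():
    print(f"{start:+5d} ms: SD_x = {sd_x:.2f}, SD_y = {sd_y:.2f}")
```

Working on the raw coordinates in this way avoids classifying samples into fixations, saccades and pursuit movements, which the text argues is not feasible with moving film material.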
interval. Only two differences are significant. The first one is rather a mystery: at the interval preceding the cut, jump cuts produce larger standard deviations. We cannot explain why the jump cut could affect the eye movements before the cut occurs. In the condition with editing errors the jump cut could not be anticipated; moreover, the shot preceding the jump cut was exactly the same in the conditions with and without editing errors. The second significant difference is produced by the dynamic reversed-angle shots, again at the 200-400 ms interval. It is further worthy of note in Figs. 4 and 5 that the peak at the 200-400 ms interval is obtained with the static reversed-angle shots but it also emerged in the absence of those shots (conditions without editing errors).
Discussion
The findings confirm the validity of the distinction between the proposed three levels of editing errors. Narrative discontinuity increased the eye movements at the 200-400 ms interval following a cut. Additionally, second-order editing errors also increased the eye movements at the same time interval; this occurred independently
Fig. 4. Standard deviation (SD) on the Y coordinate, as a function of the nature of the editing errors and time blocks (conditions with editing errors).
Fig. 5. Standard deviation (SD) on the Y coordinate, as a function of the nature of the editing errors and time blocks (conditions without editing errors).
from the effects of the narrative discontinuity as no significant interaction was obtained between the narrative structure (continuous versus discontinuous) and the two lower-order editing manipulations.
The eye movements were apparently uninfluenced by the first-order editing errors, the jump cuts. The displacement of the displayed actors was large enough to make any apparent motion illusion between consecutive shots unlikely; in other words, the jumps were clearly visible. Notwithstanding, such jumps did not redirect the eyes to other positions on the screen. This is in agreement with the findings of d'Ydewalle and Vanderbeeken (1990), who showed larger standard deviations with second-order than with first-order editing errors. Here, it should be noted that d'Ydewalle and Vanderbeeken (1990) also carried out all the analyses on the standard deviations of the X and Y values; unfortunately, the paper continuously refers incorrectly to analyses on the variances of the X and Y values but this reporting error affects neither the nature of the findings nor the subsequent discussion.
In d'Ydewalle and Vanderbeeken (1990), we proposed a two-stage sequence of processing following an editing error: first, a focusing of the eyes leading to less variance in the eye movements; after 200 ms following the presentation of a reversed-angle shot, the mind tries to restore the continuity of the perceptual space by an imagery rotation of the scene, leading to more variance in the eye movements. The present study confirms the second stage. However, we do not find evidence for the first focusing stage: the standard deviation of the eye coordinates at 0-200 ms is not significantly smaller after an editing error than without an editing error, and it is also not smaller than the standard deviation preceding the cut (at -200 to 0 ms). All cuts, with and without editing errors, produced a larger standard deviation at 200-400 ms following a cut.
However, the standard deviation becomes even larger with second- and third-order editing errors. It is worthwhile mentioning that using informed participants did not change the nature of the findings; the nature of the participants did not produce main and interaction effects, suggesting that the obtained effects are not affected by the information about editing and the nature of the movie.
While the experiment nicely shows dissociative effects of the various types of cuts, a few limitations need to be mentioned. First, the setting of the movie was a rather restricted one (a walking party with several actions); a typical movie covers of course a large number of settings. Second, the eye movements suggest that something happens after the cuts but we do not know yet what is going on in the perceptual and cognitive system of the perceiver. Our current research program tries to unravel more precisely the perceptual and cognitive processes in restoring the continuity between two consecutive shots. For example, reversed-angle shots imply a reversal of the left-right position of the main actors on the screen and a complete change of the background. In an unpublished study from our laboratory, we manipulated orthogonally the relative position of the
actors and the background; it then appears that the changed background is left largely unnoticed while the change of relative position of the actors is disturbing. In another experiment, Germeys (1997) looked at the visual areas which are fixated by the eye following a cut. Fine-gradient analyses confirm fast eye movements, back and forth, between informative parts of the display, following a second-order editing error. The back and forth movements are clearly intended to restore the perceptual continuity in the processing of the screen action(s).
Acknowledgements
This research is supported by a G.O.A.-grant from the Flemish Government of Belgium, Convention No. 93/1. The findings were presented at the 8th European Conference on Eye Movements, Derby, United Kingdom, 6-9 September 1995. The authors thank Filip Germeys for pointing out the reporting error in d'Ydewalle and Vanderbeeken (1990), as indicated in the present paper.
References
Bretz, R. (1962). Techniques of Television Production. New York: McGraw-Hill.
d'Ydewalle, G. and Vanderbeeken, M. (1990). Perceptual and cognitive processing of editing rules in film. In: R. Groner, G. d'Ydewalle and R. Parham (Eds.), From Eye to Mind: Information Acquisition in Perception, Search, and Reading. Amsterdam: Elsevier/North-Holland, pp. 129-139.
Germeys, F. (1997). De verwerking van filmmontages: Oogbewegingen bij het bekijken van montagefouten [The processing of film editing: Eye movements while watching editing errors]. Unpublished licence thesis, University of Leuven, Belgium.
Godard, J.-L. (1966). Montage, mon beau souci. Cahiers du Cinema in English, 3, 44-45.
Hochberg, J. and Brooks, V. (1978). The perception of motion pictures. In: E.G. Carterette and M.P. Friedman (Eds.), Handbook of Perception, Vol. 10: Perceptual Ecology. London: Academic Press, pp. 259-304.
Mascelli, J.V. (1965). The Five Cs of Cinematography: Motion Picture Techniques Simplified. Hollywood, CA: Cine/Graphic Publications.
Reisz, K. (1953/1968). The Technique of Film Editing. New York: Communication Art Books.
Wurtzel, A. (1983). Television Production. New York: McGraw-Hill.
CHAPTER 17
Visual Search of Dynamic Scenes: Event Types and the Role of Experience in Viewing Driving Situations
Peter R. Chapman and Geoffrey Underwood
University of Nottingham
Abstract
The cognitive processes involved in driving a vehicle remain poorly understood, partly because widely different strategies can underlie highly similar observable behaviour. Eye movement recording and analysis provide important techniques for understanding the nature of the driving task and are important for developing driver training strategies and accident countermeasures. There are, however, surprisingly few widely replicated findings from this field of research. Moreover, it is difficult to generalise from existing research to the kinds of dangerous situations which actually cause road accidents. The study reported in this chapter records the eye movements of relatively large groups of both novice and experienced drivers while watching videos of dangerous situations. Records of visual behaviour can thus be aggregated over groups of drivers watching identical situations. The largest overall difference between the groups was that novices had generally longer fixation durations than experienced drivers in this task. It is argued that this reflects the additional time required by novices to process information in the visual scene. Important differences were also found between the types of situation used in the study. Rural situations, even those chosen to be particularly dangerous, generally evoked fewer responses from subjects and longer fixation durations. In urban films both groups of subjects reported many more dangerous events and had shorter mean fixation durations. These results are in line with the previous literature but are surprising when compared with a moment by moment analysis of the subjects' behaviours. Here it is clear that dangerous events generally evoke long fixation durations. This result demonstrates the dangers in averaging eye movement data over periods of time in dynamic scenes.
We thus argue that to fully understand the subtlety of such data, and to draw realistic conclusions about the cognitive process underlying observable behaviour, it is necessary to develop a detailed understanding of the moment by moment 'syntax' of driving situations.
"New drivers tend to look for long periods at only one aspect of the traffic scene and this often results in steering errors and a lack of information on other aspects of the road and traffic situation. An active visual road scan will not only provide more information at an earlier stage, but will also give more time to respond.
"The eye movements of experienced drivers are very rapid, moving quickly from one point of interest to another, checking and re-checking areas of risk." (Miller and Stacey, 1995, p. 204).
The above quotation taken from The Driving Instructor's Handbook provides standard information to British driving instructors about the importance of visual search and the development of eye movement patterns with increasing driving experience. Their basic advice is that new drivers should aim to develop a pattern of visual search which is characterised by many short fixations widely spread across the visual scene. This advice is in broad agreement with that provided by six American driver education publications reviewed by Zwahlen (1993); however, it raises a number of practical and theoretical questions, for example: (i) is there good experimental evidence to support the general patterns of differences between novice and experienced drivers which they describe? (ii) are differences between drivers consistent across all types of road environment, or do they depend on the precise characteristics of the ongoing situation? (iii) are these differences present only in the earliest stages of learning to drive, or may they also be related to differences in accident liability later in one's driving career? (iv) how can we define a "point of interest" or "areas of risk" in everyday driving situations? This chapter attempts to answer these questions by reviewing some of the relevant literature and describing a study in which we have explored the eye movements of drivers watching films of dangerous situations.
The study focuses particularly on accident liability after passing the driving test, comparing a large group of newly qualified drivers known to be at high risk of accident on the road with a group of experienced drivers known to be at much lower risk. First we will describe some of the existing research on drivers' eye movements with particular reference to changes with experience and the judgement of danger in driving scenes.
Eye movements and driving
It is widely accepted that deficiencies in visual attention are responsible for a large proportion of road traffic accidents (Sabey and Taylor, 1980). An understanding of the visual search strategies of drivers is thus extremely important, and much research has been conducted in this area. Although there are clear problems with the assumption that records of eye movements fully describe the distribution of visual attention, these provide the best source of data available from naturalistic studies. At
their simplest, drivers' fixation patterns on straight roads can be described as concentrating on a point near to the focus of expansion (the point in the visual field in front of the driver where objects appear stationary) with occasional excursions to items of road furniture and road edge markers (Helander and Soderberg, 1972; Mourant and Rockwell, 1970; Shinar, McDowell and Rockwell, 1977). This reliance on the focus of expansion in the scene is assumed to be because it provides precise directional information to the driver and is the location near to which future traffic hazards are likely to be first visible. Increasing the complexity of the visual scene (by adding vehicles, road furniture, or irrelevant signing) increases the number of eye movements made and decreases the mean fixation durations on individual objects (Erikson and Horberg, 1980; Luoma, 1986; Miura, 1990; Robinson et al., 1972; Rutley and Mace, 1968). This seems to be a natural response to having more objects available in the visual field to look at; it is not clear whether decreases in fixation durations mean that objects are processed less completely or that redundant fixation time is simply reduced. Cohen (1981) found that subjects viewing slides in the laboratory adopt longer fixation durations than those actually driving a vehicle in the same situations. He suggests that this is largely a consequence of the lack of time pressure in the laboratory and argues that on the road subjects adopt more task-relevant strategies and pick up more information per unit time. It may thus be that long fixation durations in situations with low visual complexity tell us little about the information that is being extracted from the scene and more about the low subjective workload imposed in such situations. The eye movement patterns become slightly more complex when the driver is required to negotiate a curve. 
Drivers generally adjust their fixation locations to maximise their sight distance and provide information about the future curvature of the road (Helander and Soderberg, 1972; Shinar, McDowell and Rockwell, 1977). In many cases this means focusing on the tangent point made by the driver's line of sight ahead to the inside of the curve (Land and Lee, 1994), though information about lane position from closer to the vehicle's current position also seems to be necessary for accurate curve following (Land and Horwood, 1995; McLean and Hoffmann, 1971, 1973). Once again it should be noted that although extended fixations on the tangent point are frequently observed, these may be optional strategies adopted in low workload situations, a possibility which is supported by large individual differences in the number of off-road features which drivers choose to fixate (Land and Lee, 1994). The above patterns of eye movements may be to some degree simply determined by the nature of the visual scene rather than representing complex learned strategies of information acquisition. It is thus of theoretical interest to see whether these patterns change as a function of traffic experience. This is also an area of great applied interest. In Britain a driver in their first year of driving since passing the test
has been estimated to be 69% more likely to be involved in an accident than one in their second year of driving (Forsyth, Maycock and Sexton, 1995). Clearly such calculations are dangerous because of the confounding of experience with changes in age and exposure (Brown, 1982); however, careful modelling of the effect suggests that it can be mostly attributed to changes in traffic experience (Maycock, Lockwood and Lester, 1991), with a 38% reduction in accident risk over the first year for a 17 year old being solely attributable to the increased experience (Forsyth et al., 1995). Clearly, then, changes in visual search as a function of traffic experience may be directly related to accident involvement and are thus of great practical importance.
Effects of traffic experience
The research of Mourant and Rockwell (1972) on the relationship between experience and eye movements in driving is widely cited as demonstrating differences between experienced and novice drivers in their visual search, specifically that novices concentrate their search in a smaller area, closer to the front of the vehicle than experienced drivers do (e.g. Evans, 1991). However, before accepting this intuitively appealing result it is important to appreciate some of the limitations of Mourant and Rockwell's research. The study used only ten subjects, four experienced drivers and six novices, and the novices had no more than 15 minutes previous driving experience before taking part in the first phase of the study. Although the novices were found to scan less widely in the horizontal axis than the experienced drivers, training of these novice drivers appeared to make their horizontal scanning behaviour less rather than more like that of experienced drivers. Mourant and Rockwell (1970) also found that substantial reductions in the spread of visual search could be achieved simply by having drivers repeat the same route three times.
The reported difference in vertical gaze location (novices looking closer to the front of the vehicle) was only actually significant (p < 0.05) in one of the nine driving subtasks described in the study. Mourant and Rockwell (1972) also noted a tendency for pursuit movements to decrease in frequency as a result of training. While these reported differences in visual search between novice and experienced drivers remain plausible and interesting, the evidence for their ubiquity across varying levels of experience and traffic situations is still relatively slight. The steering behaviour of drivers has been found to change quite markedly throughout the course of learning to drive (Smiley, Reid and Fraser, 1980), suggesting that novice drivers may be using different control strategies to those adopted by more experienced drivers and this could clearly require the acquisition of different visual information (cf. Land and Horwood, 1995). Cohen and Studach (1977) have directly explored differences in eye movements in curve negotiation as a function of traffic experience. They compared four inexperienced drivers with five experienced
ones; experience here was defined in terms of total mileage driven — all subjects had been driving for at least a year at the time of testing. They found large differences in fixation durations and movement amplitudes for experienced drivers depending on the direction of the curve being driven, but no significant differences for their inexperienced drivers. On curves to the right experienced drivers had significantly shorter fixation durations than inexperienced ones, while on curves to the left the experienced drivers had significantly greater movement amplitudes. While the differences between curve directions are difficult to interpret, these results could be seen as evidence for experienced drivers adopting wider search strategies and requiring less time to process information at specific locations. Another small study has also suggested that novice or inexperienced drivers may make less use of their mirrors (Mourant and Donohue, 1977), though there was no evidence for differences in fixation durations on the mirrors in this study (see also Mourant and Rockwell, 1972). Many of the above differences may be in large part caused by basic car control skill limitations during the earliest stages of learning to drive. From a perspective of reducing accident liability these stages are of relatively little interest. For example Forsyth et al. (1995) found accident rates while learning to drive to be less than 0.6% compared with the 18% accident rate for drivers in the first year after passing the test. Though there are numerous reasons for this difference (e.g., the fact that learners are accompanied when driving, the small distances that they cover, and the types of road environments they experience during learning), the statistical fact remains that it is newly qualified drivers rather than learners who pose the greatest challenge to road accident researchers. 
Arguably the current British driving test is relatively good at preventing drivers with grossly inadequate control skills from entering the driving population (Forsyth, 1992); it may be harder to judge the adequacy of a candidate's visual search strategies. The types of novice-experienced differences that would thus be particularly interesting are those caused by the experienced driver's increased knowledge of the road environment. It is conceivable that substantial traffic experience would allow drivers to predict the locations of potential hazards and modify their search strategies accordingly. In support of this hypothesis, Theeuwes (1996; Theeuwes and Hagenzieker, 1993) has demonstrated the importance of top-down processes in the perception of traffic scenes. In these experiments experienced drivers were found to direct their eye movements towards areas where relevant information was likely to be located, demonstrating considerable impairment in performance when stimuli appeared in unpredictable locations. These results may explain differences in the on-road eye movements of novice and experienced drivers reported by Underwood, Crundall and Chapman (1997; see also Chapter 18). In this study experienced drivers were found to have increased horizontal search and decreased fixation durations relative to novices in demanding dual-carriageway driving. These results can be interpreted in terms of
experienced drivers having greater knowledge about the potential locations for threat-related information. However, in on-road driving there is always the danger that what is being measured is some form of interaction with the driver's level of control skills (e.g., novices finding lane maintenance difficult at high speeds) or the type of situation that the driver actually gets into (novice drivers creating dangerous situations by virtue of poor driving). These problems can best be resolved by laboratory studies of driver behaviour. Miltenburg and Kuiken (1990) avoided many of the problems with on-road studies by having drivers watch video recordings of six common traffic situations and recording their eye movements. Their study had the additional advantage of using a relatively large number of subjects, an issue which is particularly important when conclusions are to be drawn about group differences in driving experience. They tested 47 subjects split into four groups on the basis of their driving experience, ranging from novice drivers with less than one year of driving experience, to very experienced drivers with more than five years of experience and more than 100,000 km driven in the previous year. They found that for one of their scenes, crossing an intersection, experienced and very experienced drivers fixated more briefly than inexperienced drivers who in turn fixated more briefly than novice drivers. They found no evidence supporting two other experimental hypotheses; that novice drivers might fixate closer to the front of the car, or that experienced drivers might fixate relevant objects sooner. They did find a number of differences between their groups in a post-hoc analysis, particularly noting that novice and inexperienced drivers spent longer fixating near to the vehicle when the film showed the car negotiating a bend. 
Other differences were present but did not change across the four groups in a way that suggested a relationship with traffic experience. However, Miltenburg and Kuiken's study was plagued by missing fixation data and many apparently interesting effects proved to be created by replacing missing data with mean values. They also made no attempt to differentiate theoretically between the six traffic scenes that they used; before results from any such research are likely to prove generalisable it will be necessary to have a better understanding of the types of stimuli that are being used.

Judgement of dangerous traffic scenes

One way to understand more about traffic scenes is to explore drivers' judgements of them on a variety of dimensions. Riemersma (1988) had subjects rate a series of 28 road scenes (presented as slides) on 24 different scales. Factor analysis of these data revealed a first factor relating to safety (involving scales such as the safe speed to drive in the situation, the probability of slow moving vehicles being present, and the clarity of the visual scene), and three further factors relating to the urban/rural distinction, the nature of the road boundary, and the presence of clear road markings.
Groeger and Chapman (1992, 1996) performed similar studies using short videos of driving situations; once again a factor analysis revealed danger as the first factor (in this case accounting for 44% of the variance), with a second factor identified as the difficulty of driving in the situation. In these studies significant differences were found not only between filmed situations, but also between drivers. Groups of drivers differing in age and driving experience were found to use the scales differently, with older drivers concentrating particularly on the dangers in the scene. Groeger and Clegg (1995) pursued this result by examining the development of such a factor structure over the course of the first 18 months of driving experience. Their findings suggest that an understanding of difficulty and danger in traffic scenes develops as a function of driving experience, particularly as measured by the total mileage recorded since passing the driving test, rather than as a simple function of time. A developmental understanding of danger in the traffic scene has frequently been proposed as a major factor in novice road accidents (Brown and Groeger, 1988). The importance of danger in the judgement of traffic scenes is emphasised by a number of studies looking for a relationship between people's abilities to detect hazards in driving scenes and their accident involvement (Pelz and Krupat, 1974; Quimby et al., 1986; Quimby and Watts, 1981). In such research it is assumed that the time taken to detect hazards in driving scenes and make a manual response represents the kind of safety margin that would be available to the driver on the road (cf. Miller and Stacey, 1995). Two recent reviews of the literature on individual accident liability have both concluded that such hazard perception abilities represent the most promising perceptual or cognitive predictors of road traffic accident involvement (Elander, West and French, 1993; Lester, 1991).
McKenna and Crick (1994) additionally report that hazard perception ability improves with the transition from novice to experienced driver status and further improves with the transition to expert driver as a result of specific training techniques. Such research causes us to anticipate that novice and experienced drivers may differ considerably in their visual search strategies in dangerous situations, and moreover that these differences may be important in predicting and possibly reducing accident involvement.

Novice and experienced drivers viewing dangerous traffic scenes

Studying the eye movements of drivers in dangerous road situations raises a number of practical and ethical difficulties. Even when everyday road situations are used there are significant problems attempting to research eye movements during actual driving. Real traffic situations will always differ from one subject to the next, and the subjects have control over aspects of the situation likely themselves to interact with eye movement measures (i.e., control use, speed, lane position, following distances). These problems can be reduced by exploring behaviour in a driving
simulator but there are important methodological problems even here. A number of studies have suggested that while eye movement patterns in the laboratory may be a reasonable reflection of behaviour on the road (e.g., Hughes and Cole, 1986a, 1986b), dynamic rather than static scenes are required (Cohen, 1981), and the fidelity of reproduction of the environment is extremely important (Staplin, 1995). This latter point is of particular importance when hazardous scenes are to be used. The anticipation of danger in real driving situations may depend on quite subtle cues which are difficult to identify and produce accurately in a driving simulator. There are considerable advantages to using filmed driving scenes in the manner of Miltenburg and Kuiken (1990) even though vehicle control is not required in such circumstances. Cohen's (1981) study and our own earlier discussion also suggest that a degree of time pressure is necessary if subjects are to adopt realistic visual search strategies when watching driving scenes in the laboratory. In the study we describe below this is created by requiring novice and experienced drivers to identify hazards in a film as quickly as possible while their eye movements are recorded. We have attempted to avoid Miltenburg and Kuiken's (1990) problems with missing data by using a sufficiently large sample that subjects with missing data can be excluded from overall analyses. The novice subjects are specifically chosen to be those at greatest accident risk on British roads, drivers who are in their first year of driving after passing their test.

Method

Subjects

The 112 subjects in this experiment were selected from those attending our laboratory as part of a larger study exploring the skills of newly qualified drivers.
The drivers for this experiment were in one of two categories: Novices who were all tested within their first year of holding a full British driving licence, and Experienced drivers who had held a licence for between 5 and 10 years at the time of testing. There were 83 Novices (49 male and 34 female) and 29 Experienced drivers (13 male and 16 female). Subjects ranged in age from 17 to 44, with the mean age of Novices being 21 and the mean age of Experienced drivers being 27. Towards the end of their first year of driving (for Novices) or at an equivalent time for Experienced subjects, all but eight of the participants filled in a questionnaire to assess the types of driving they had done and any accidents they had been involved in. This data is briefly presented to further describe the nature of our Novice and Experienced groups. The Experienced drivers drove an average of 14,024 miles in the period, while Novice drivers drove only an average of 4,513 miles. In this period the Experienced drivers reported having had a mean of 0.07 accidents while the Novice drivers reported a mean of 0.51 accidents. This corresponded to 37% of the Novice drivers being accident involved (including 9% in two accidents, 3% in three accidents) and just 7% of the Experienced drivers despite their higher mileages (none of the Experienced drivers reported involvement in more than one accident). These figures are in good agreement with those that would be predicted from national statistics (Maycock et al., 1991) and suggest that our sample is representative of normal British Novice and Experienced drivers. It also confirms that our Novice drivers are at considerably elevated risk of accident involvement in their everyday driving.

Stimuli/procedure

Subjects watched a series of 13 short film clips lasting between 22 and 64 seconds each. The films showed driving situations recorded from the driver's point of view and were designed to contain a number of potentially dangerous events such as bicycles pulling out suddenly, pedestrians emerging from behind parked cars or the driver having to overtake a horse on a narrow country lane. Films contained between one and three such events each, such that a total of 20 defined dangerous events occurred over the 13 films. The films included a wide variety of different road types and traffic participants. The films were presented as MPEG video on a computer monitor 1 m from the subject, subtending approximately 15° of visual angle. Eye movements were monitored using an SRI Dual Purkinje Generation 5.5 eye-tracker produced by Fourward Technologies. Calibration was monitored during the presentation of films and an opportunity for recalibration was available before the presentation of each new film. Head movements were reduced by using a head restraint and chin cup. Subjects were tested individually in a session lasting approximately 20 minutes.
They were informed that they should watch the films as if they were the driver of the car and press a response button as soon as possible if they saw an event coming up that would require them to brake or take some form of evasive action. The order of presentation of the films was randomly chosen for each subject. Eye movement data was recorded directly onto the computer used to present the video stimuli. This computer also recorded the timing of response button pressing by the subject.

Results

Overall analyses

Two general measures of button responses were calculated for all subjects, firstly the total number of responses per hazard per film (allowing for the fact that some films contained multiple hazards), and secondly a reaction time measure calculated as the time between the onset of each dangerous event and the next button press.
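The two button-response measures just described can be illustrated with a minimal sketch. The function names, the use of seconds as the time unit, and the example times are all hypothetical; the chapter does not describe how presses that never follow an event were handled, so the sketch simply returns None in that case.

```python
from bisect import bisect_left


def reaction_times(hazard_onsets, press_times):
    """Delay from each hazard onset to the next button press (None if no press follows)."""
    presses = sorted(press_times)
    delays = []
    for onset in hazard_onsets:
        i = bisect_left(presses, onset)  # index of first press at or after the onset
        delays.append(presses[i] - onset if i < len(presses) else None)
    return delays


def responses_per_hazard(hazard_onsets, press_times):
    """Total button presses in a film divided by its number of defined hazards."""
    return len(press_times) / len(hazard_onsets)


# Hypothetical film with three hazards (times in seconds from film onset).
onsets = [10.0, 22.0, 25.0]
presses = [10.8, 21.0, 25.5]
rts = reaction_times(onsets, presses)
rate = responses_per_hazard(onsets, presses)
```

Note that under this definition a press made in anticipation of an event (here at 21.0 s, before the 22.0 s onset) does not count towards that event's reaction time, so the second hazard is credited with the later press.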
Although both these measures differed significantly from film to film, F(12, 1320) = 13.03 and 6.24, both p < 0.01, neither of these overall measures differed significantly between the Experienced and Novice drivers [F(1, 110) = 0.60 and 0.30], nor were there any significant interactions between group (Experienced versus Novice) and film [F(12, 1320) = 0.76 and 0.80]. Five overall measures of eye fixation patterns were then calculated: the mean fixation duration, the mean horizontal and vertical fixation locations, and the variance in horizontal and vertical fixation locations over the individual film. Eye movement data from dynamic scenes require special criteria to be used in the determination of fixations since virtually all objects on the screen are in motion for much of the time. A fixation was declared to be in progress when the point of gaze remained within an area of 0.25° for a period of 50 ms. No distinction was made between fixations and pursuit tracking movements; all data sequences that fulfilled the above criterion were defined as fixations, while data at all remaining times were rejected from further analysis. Data were also rejected where there was any question about the quality of the eye movement recording. The data from 27 subjects were regarded as sufficiently poor to be excluded completely from the analysis, leaving 85 subjects for the eye movement analyses, 26 Experienced drivers and 59 Novices. The most common reason for totally excluding eye movement data from the analysis was difficulty in recording caused by subjects wearing glasses or contact lenses. Our Experienced sample were specifically recruited for having uncorrected vision and we thus lost much less data from such subjects. A further 39 subjects had data excluded for particular films.
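The fixation criterion above (gaze remaining within 0.25° for 50 ms) can be sketched as a simple dispersion-threshold pass over the gaze samples. This is an illustration under stated assumptions rather than the procedure actually used: the "area of 0.25°" is interpreted here as a bounding box of at most 0.25° in each axis, the sampling rate of the example is invented, and the function name is hypothetical.

```python
def detect_fixations(t_ms, x_deg, y_deg, max_extent_deg=0.25, min_duration_ms=50):
    """Return (start_ms, end_ms, mean_x, mean_y) for each run of gaze samples whose
    bounding box stays within max_extent_deg for at least min_duration_ms."""
    fixations = []
    n = len(t_ms)
    start = 0
    while start < n:
        end = start
        lo_x = hi_x = x_deg[start]
        lo_y = hi_y = y_deg[start]
        # Grow the window while the next sample keeps the bounding box small enough.
        while end + 1 < n:
            nx, ny = x_deg[end + 1], y_deg[end + 1]
            new_lo_x, new_hi_x = min(lo_x, nx), max(hi_x, nx)
            new_lo_y, new_hi_y = min(lo_y, ny), max(hi_y, ny)
            if new_hi_x - new_lo_x > max_extent_deg or new_hi_y - new_lo_y > max_extent_deg:
                break
            lo_x, hi_x, lo_y, hi_y = new_lo_x, new_hi_x, new_lo_y, new_hi_y
            end += 1
        if t_ms[end] - t_ms[start] >= min_duration_ms:
            count = end - start + 1
            mean_x = sum(x_deg[start:end + 1]) / count
            mean_y = sum(y_deg[start:end + 1]) / count
            fixations.append((t_ms[start], t_ms[end], mean_x, mean_y))
            start = end + 1  # continue after the accepted fixation
        else:
            start += 1  # sample rejected; try the next starting point
    return fixations


# Hypothetical 100 Hz recording: a stable run, a 20 ms blip, then another stable run.
t = list(range(0, 180, 10))
x = [1.00, 1.02, 1.01, 0.99, 1.00, 1.03, 1.01, 1.00] + [5.0] * 3 \
    + [3.00, 3.01, 3.02, 3.00, 2.99, 3.00, 3.01]
y = [0.0] * 8 + [1.0] * 3 + [0.0] * 7
fixations = detect_fixations(t, x, y)
```

As in the text, samples that never form part of a qualifying run are simply dropped, and slow pursuit that stays inside the window is treated as a fixation. Timestamps are kept as integer milliseconds so that the 50 ms comparison is exact.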
Overall analyses of variance thus used only the 46 subjects with complete data for all stimuli; however, the pattern of results is the same as that obtained with all 85 subjects included by substitution of missing values with means, or by performing individual comparisons on a film by film basis. Fixation durations differed reliably between Novice and Experienced drivers, F(1, 44) = 4.86, p < 0.05, with Novices having reliably longer fixation durations (447.5 ms versus 399.7 ms). There was a main effect of film, F(12, 528) = 16.20, p < 0.01, but no significant interaction between the two [F(12, 528) = 0.82]. All of the remaining four eye movement measures showed significant differences between the 13 films, but no main effects of subject group or interactions. Table 1 shows details of the main measures for the 13 different film clips. On viewing the clips it is immediately apparent that the substantial differences between clips in gaze angles and variances can be attributed largely to gross physical characteristics of the scenes. Thus the clips with particularly large horizontal gaze variances show a vehicle negotiating sharp bends and those with the largest vertical gaze variances show the vehicle passing over the brow of hills. While these results may be of some interest in themselves, the lack of any effect of experience suggests that they may be very general characteristics of the information distribution in our films rather than the result of complex search strategies. Of greater interest are the
Table 1
Summary data for the 13 film clips, with averages of each measure for rural, suburban and urban films

Film no.  Road         Mean responses  Mean fixation  Mean horizontal  Mean vertical   Horizontal     Vertical
          environment  per sec         duration (ms)  gaze angle (°)   gaze angle (°)  gaze variance  gaze variance
1         Rural        0.048           433.4           2.01            -0.88           6.78           0.57
2         Rural        0.037           435.2           0.77            -0.93           8.04           0.96
3         Urban        0.066           386.7           0.62            -0.711          3.30           0.86
4         Suburb       0.068           427.3          -0.07            -0.18           6.72           0.22
5         Urban        0.097           406.1          -0.08            -0.56           6.00           0.66
6         Urban        0.075           402.2           0.31            -0.33           5.93           0.48
7         Suburb       0.074           406.8           0.29            -0.39           5.97           0.60
8         Suburb       0.046           401.1          -1.30            -0.26           4.85           0.42
9         Suburb       0.062           415.3          -2.08            -0.38           5.22           0.58
10        Urban        0.079           415.0           1.02            -0.92           3.88           0.37
11        Rural        0.063           458.6           2.41            -0.75           4.19           0.49
12        Rural        0.045           481.2          -0.19            -0.42           2.29           0.36
13        Rural        0.079           489.3          -1.39            -0.41           3.10           0.61

Means     Rural        0.048           452.1           1.25            -0.75           5.33           0.60
          Suburb       0.062           412.6          -0.79            -0.30           5.69           0.45
          Urban        0.079           402.5           0.47            -0.63           7.28           0.59
differences between clips in mean fixation durations. We followed Riemersma (1988) in performing multidimensional scaling to further explore the nature of our stimuli related to this attribute. Based on the fixation durations of all subjects a (Euclidean) distance measure between the experimental films was computed. Multidimensional scaling produced a two-dimensional solution with 10.8% stress (r squared = 0.96). It can be seen from Fig. 1 that this solution can be characterised in terms of the transition of the road environments from rural, to suburban, to urban (Dimension 1); the second dimension is harder to interpret but may relate to the number of button presses per second within each road environment. The means in
Fig. 1. Derived stimulus configuration for the 13 film clips using mean fixation durations to calculate a proximity matrix. Stimulus labels are as given in Table 1.
Table 1 thus show the anticipated relationship between road environment, fixation durations, and button presses, with the a priori higher demand environments being characterised by more button responses and shorter fixation durations.

Differences within films

While the above analyses show an interesting difference between groups on mean fixation durations, and differences between films in most measures, it is clear a priori that averaging eye movement measures over 40 second films obscures as much as it reveals. To provide a record of dynamic changes in fixations over the time course of each film we thus averaged fixation data for the 85 subjects with good eye movement recording on a moment by moment basis. Four measures were calculated every 100 ms for as long as fixations were in progress: mean vertical and horizontal gaze locations, the duration of the ongoing fixation, and the saccade distance made to the onset of the current fixation. These measures were then averaged across each group of subjects, additionally allowing the calculation of the mean number of button presses per subject per second at any moment and the variance across subjects in horizontal and vertical gaze angles (note that this is a very different concept to the within subject variances calculated in the previous section). As an initial assessment of relationships between these measures, correlations were calculated between the above seven measures for each of the thirteen films. Table 2 shows the average (using Fisher's z transformations) correlations between these measures across the 13 films.
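The moment by moment sampling described above can be sketched as follows. The sketch assumes each subject's data are already a list of fixations as (start_ms, end_ms, x, y) tuples; the function names, the tuple layout and the example values are hypothetical, and the saccade distance is taken as the straight-line distance from the previous fixation in the list, which is an assumption about how the measure was derived.

```python
from math import hypot


def sample_fixations(fixations, film_ms, step_ms=100):
    """Map each 100 ms tick that falls inside a fixation to
    (x, y, duration_ms of that fixation, preceding saccade distance or None)."""
    samples = {}
    for i, (start, end, x, y) in enumerate(fixations):
        if i > 0:
            _, _, prev_x, prev_y = fixations[i - 1]
            saccade = hypot(x - prev_x, y - prev_y)
        else:
            saccade = None  # no earlier fixation to measure from
        first_tick = -(-start // step_ms) * step_ms  # round start up to a tick
        for tick in range(first_tick, min(end, film_ms) + 1, step_ms):
            samples[tick] = (x, y, end - start, saccade)
    return samples


def group_mean_duration(all_samples, tick):
    """Mean ongoing-fixation duration over subjects who are fixating at this tick."""
    durations = [s[tick][2] for s in all_samples if tick in s]
    return sum(durations) / len(durations) if durations else None


# Two hypothetical subjects watching a 700 ms stretch of film.
subject_a = sample_fixations([(0, 250, 0.0, 0.0), (300, 650, 3.0, 4.0)], film_ms=700)
subject_b = sample_fixations([(50, 450, 1.0, 1.0)], film_ms=700)
mean_at_300 = group_mean_duration([subject_a, subject_b], 300)
```

Ticks at which a subject has no fixation in progress contribute nothing for that subject, so the number of subjects behind each group mean can vary from moment to moment.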
Table 2
Correlations between variables calculated ten times a second for each of the films and averaged (using Fisher's z transformation) across the 13 films

                           Mean       Mean      Mean        Mean        Horizontal  Vertical  Mean
                           responses  fixation  horizontal  vertical    gaze        gaze      saccade
                           per sec    duration  gaze angle  gaze angle  variance    variance  distance
Mean responses             1.000
Mean fixation duration     0.457      1.000
Horizontal gaze angle     -0.062      0.004     1.000
Vertical gaze angle       -0.053     -0.181     0.141       1.000
Horizontal gaze variance  -0.211     -0.358    -0.002      -0.072       1.000
Vertical gaze variance    -0.170     -0.186     0.024      -0.157       0.501       1.000
Mean saccade distance     -0.162     -0.483    -0.020       0.182       0.693       0.325     1.000
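The averaging used for Table 2 can be illustrated with a short sketch of Fisher's z transformation: each per-film correlation is converted with the inverse hyperbolic tangent, the transformed values are averaged, and the mean is converted back. The function name and the three example correlations are hypothetical and are not values from the study.

```python
import math


def fisher_average(correlations):
    """Average correlation coefficients via Fisher's z: z = atanh(r), r = tanh(mean z)."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical correlations between the same two measures in three films.
per_film = [0.30, 0.50, 0.60]
averaged = fisher_average(per_film)
simple_mean = sum(per_film) / len(per_film)
```

Because the transformation stretches values near plus or minus one, the Fisher average of unequal positive correlations is slightly larger than their simple mean. The diagonal entries of 1.000 cannot be transformed (atanh is undefined at 1) and are simply the correlation of each measure with itself.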
Many of the correlations in Table 2 shed light on the relationship between our measures. The high correlations between saccade distances and horizontal and vertical variance suggest that these are all related measures of the degree to which the point of gaze moves around the scene, and the negative correlations between fixation durations and these three measures show that where wide scanning strategies are being adopted individual fixations tend to be relatively short. There is also one particularly striking relationship evident in the table, namely the high positive correlation between the number of button responses per second and the mean fixation duration. Note that this relationship is in exactly the opposite direction to that observed in our previous comparisons between the clips. We find that while clips evoking many button presses per second also evoke generally short fixation durations, on a moment by moment basis in any clip long fixation durations are strongly associated with the detection of dangerous events. Correlation matrices as in Table 2 were additionally calculated separately for Novice and Experienced drivers. The pattern of correlations was virtually identical for the two groups. To show the key features of these relationships three of the important variables, mean responses per second, fixation durations and saccade distances are shown for
two films in Figs. 2 and 3. Figure 2 shows the data for clip number 5, a busy urban shopping street with parked vehicles on each side. There were three pre-defined dangerous events in the film, each corresponding to a pedestrian or pedestrians crossing the road from right to left. The first and third events involve pedestrians emerging suddenly from behind parked vehicles (10 and 25 seconds into the film), while the second event is a pedestrian by the side of the road who is clearly visible for several seconds before crossing the road (22 seconds into the film). Figure 2(a) shows the number of button responses per subject per second on a moment by moment basis (10 points per second). If all subjects pressed the response button within one second of a particular event this would give a value of 1. The figure clearly shows that the three pre-defined dangerous events are readily detected by subjects, with greater temporal agreement in button pressing for the first and third events than for the second one, and generally good agreement between Novice and Experienced drivers about all three events. Figure 2(b) shows the mean duration of the ongoing fixation on the same scale as Fig. 2(a). The three dangerous events are all clearly characterised by long fixation durations, and the overall difference between groups with Novices having longer fixation durations than Experienced drivers is also relatively clear. Note that this difference appears to be present both while the dangerous events are in progress and at other times. Figure 2(c) shows the mean distance of the preceding saccade on a moment-by-moment basis. It can be seen that long fixation durations, and to a lesser extent dangerous events, are characterised by relatively short preceding saccade distances. Figure 3 shows the equivalent data for a contrasting film, number 13. This film shows driving along an almost deserted rural dual-carriageway containing just one dangerous event.
This occurs when the driver of a lorry parked at the side of the road opens the door just as the driver is about to pass the lorry. This event occurs 17 seconds into the clip. Figure 3(a) shows that this event is readily detected by both groups of subjects but that subjects are often aware of the hazard potential considerably sooner. The parked lorry first becomes visible some eight seconds earlier and both groups of subjects show an increased rate of responding from this time. Figure 3(b) shows that the mean fixation durations, already relatively long compared with those shown in Fig. 2(b) increase further while the hazard is present. For this film there additionally appears to be an interaction between the fixation durations of the two groups over time. Until the lorry first appears in the scene the Novices have the longest fixation durations but the Experienced drivers have longer fixation durations while the hazard is visible. Figure 3(c) shows the same relationship with saccade distances as previously, with saccade distances being reduced while the hazard is present. While Figs. 2 and 3 do show the pattern of data over time it is extremely difficult to interpret this without a complete description of the visual scene available at each moment. To aid in the interpretation of such data a program (ELLIPSES) has been
Fig. 2. Measures calculated ten times a second throughout film number 5: (a) the mean number of button responses per subject per second; (b) the mean duration of the current fixation for each tenth of a second; (c) the mean distance of the preceding saccade. The vertical arrows at 4.0 and 22.7 seconds indicate the moments displayed in Colour Plates 1(a) and 1(b), respectively.
Fig. 3. Measures calculated ten times a second throughout film number 13: (a) the mean number of button responses per subject per second; (b) the mean duration of the current fixation for each tenth of a second; (c) the mean distance of the preceding saccade. The vertical arrows at 2.9 and 11.8 seconds indicate the moments displayed in Colour Plates 2(a) and 2(b), respectively.
written which allows eye movement data from the two groups of subjects to be played back in conjunction with the original film. Every hundred milliseconds the mean fixation location, the final duration of the ongoing fixation, and the length of the last saccade are calculated for each subject. Group means and standard deviations are then calculated from these values. For each group the program plots an ellipse ten times a second which is centred on the mean fixation location for the appropriate group of subjects. Differences in the locations of the two ellipses thus indicate that Novice and Experienced subjects are fixating different objects at a particular moment. The height of the ellipse at that moment is the current standard deviation of vertical fixation locations for that group of subjects, and the width is the group standard deviation of horizontal fixation locations. A wide ellipse thus indicates large variance between subjects in the horizontal axis in fixation locations, while a tall one indicates large variance in the vertical axis. Very small ellipses indicate extremely good agreement between subjects in which aspects of the scene to fixate. The program can also simultaneously display charts of ongoing fixation durations and button presses for each group if required. Colour Plates 1 and 2 show sample output from the ELLIPSES program at particular instances from the films whose data appear in Figs. 2 and 3. Colour Plate 1(a) is taken 4.0 seconds into film number 5. Although this depicts a busy urban road there are no specific hazards visible and this is reflected in the fact that very few subjects pressed the response button at this time (see Fig. 2a). The large sizes of the ellipses demonstrate that there was relatively little agreement between subjects in fixation locations. There were no significant differences between the behaviour of the two groups of subjects at this moment. Colour Plate 1(b) is taken 22.7 seconds into the same film.
A pedestrian is visible standing by a vehicle on the left hand side of the road; he has been visible in this location for approximately 2 seconds and is just about to cross the road from left to right. At this moment the Novices (represented by the red ellipse) were looking significantly further to the right than the experienced drivers, t(64) = 2.05, p < 0.05, their previous saccade distance was marginally longer, t(64) = 1.73, p < 0.10, and they were significantly more likely to be pressing the response button, p = 0.02 (Fisher's exact test). From the shapes of the ellipses in Fig. 4 it can be seen that the experienced drivers show less variability between subjects in their gaze locations at that moment but that this is true only for the horizontal axis. Both groups of drivers are fixating a point close to the focus of expansion in the scene; however, the experienced drivers are clearly more likely to be looking at the pedestrian in question. It can be seen from Fig. 2(a) that although the novices are significantly more likely to be making button responses at this moment, the difference can largely be attributed to the experienced drivers already having responded before the pedestrian begins to move. Colour Plate 2(a) is a frame recorded 2.9 seconds into film number 13. The precise moment is marked on Fig. 3. At this moment there were no specific hazards
Colour Plate 1. Horizontal and vertical fixation locations and standard deviations for Novice (red) and Experienced (blue) subjects represented as ellipses (see text). The frame in 1(a) represents a moment 4.0 seconds into film 5, before any specific hazards are present, while 1(b) shows a frame from later in the same film where a pedestrian is visible in the roadway ahead.
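As a rough illustration of the ellipse construction described in the text, the following sketch computes the centre, width and height of one group's ellipse at a single 100 ms time step. It is not the ELLIPSES program itself: the function name `group_ellipse`, the screen-coordinate convention and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def group_ellipse(fixations):
    """Return (centre_x, centre_y, width, height) for one group at one moment.

    fixations: list of (x, y) gaze positions, one per subject, in any
    consistent screen units (hypothetical input format).
    """
    xs = [x for x, _ in fixations]
    ys = [y for _, y in fixations]
    # Centre is the group mean location; width and height are the
    # between-subject standard deviations on each axis (sample SD assumed).
    return mean(xs), mean(ys), stdev(xs), stdev(ys)


def ellipses_over_time(samples_by_time):
    """Map each time step (e.g. every 100 ms) to that step's ellipse."""
    return {t: group_ellipse(fix) for t, fix in sorted(samples_by_time.items())}
```

Calling `group_ellipse` separately for the Novice and Experienced samples at the same time step gives the two ellipses whose positions and shapes are compared in the plates.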
Colour Plate 2. Horizontal and vertical fixation locations and standard deviations for Novice (red) and Experienced (blue) subjects represented as ellipses (see text). The frame in 2(a) represents a moment 2.9 seconds into film 13, before any specific hazards are present, while 2(b) shows a frame from later in the same film where a lorry is parked on the road edge ahead.
visible and no significant differences between the two groups of drivers on eye movement measures. Colour Plate 2(b) shows a moment 11.8 seconds into the same film. Here a stationary lorry has become visible on the left hand side of the road. At this moment Experienced drivers were looking significantly further to the left than the Novices, t(58) = 2.45, p < 0.05, and marginally lower down the screen, t(58) = 1.73, p < 0.10. As can be seen from the shape of the ellipses, there was less variance between the experienced drivers in their fixation locations, particularly in the vertical axis. They were also significantly more likely to be pressing the response button, p = 0.03 (Fisher's exact test), and had significantly longer durations for the current fixation, t(58) = 2.36, p < 0.05. There were no significant differences in the length of the previous saccade at that moment.
Discussion
The one overall finding of a significant group difference between novice and experienced drivers on mean duration of fixations confirms the tentative findings reported by a number of authors in different contexts. Mourant and Rockwell (1972) thus reported a decrease in the frequency of pursuit movements as a function of experience. We have classified these as fixations, and in Mourant and Rockwell's analysis they represent relatively long fixation durations (over 400 ms). Cohen and Studach (1977) similarly found that experienced drivers had significantly shorter fixation durations than novices, but only for curves to the right. Miltenburg and Kuiken (1990) predicted that experienced drivers would have shorter fixation durations than novices, but only found the effect unambiguously for one of their six films. The findings from our study suggest that this is a general pattern of behaviour that can be observed when fixation data are averaged over most sufficiently long events.
Miltenburg and Kuiken's interpretation of the decreased fixation durations was that experienced drivers already have relevant schemata to deal with the situation and hence have to spend less time abstracting information from the scene. This approach is very much in line with the work of Theeuwes (e.g. 1996) and seems to provide a good explanation for the general effect. However, there are clear limitations to this as an explanation for changes in fixation durations in individual subjects, both within an individual film and particularly across films in different traffic environments. The dimensionality of our stimuli based on fixation durations appears to be much like that of Riemersma (1988) and Groeger and Chapman (1996), with our dimensions apparently corresponding to the rural, suburban, urban distinction and the number of dangerous events present per unit time. These are similar to Riemersma's first two factors of safety, and rural versus urban, and to Groeger and Chapman's factors of danger and difficulty. However, the limited number of stimuli employed
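The group comparisons reported for Colour Plates 1(b) and 2(b) have 64 and 58 degrees of freedom, which would be consistent with an independent-samples t test using a pooled variance (df = n1 + n2 - 2). The chapter does not state which variant was used, so the following is only a sketch under that assumption; the function name `pooled_t` is hypothetical and no values from the study are reproduced.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for a two-sample t test with pooled variance.

    group_a, group_b: per-subject values of one measure at one moment,
    e.g. horizontal fixation location (hypothetical input format).
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```

The sign of the returned t simply reflects which group is passed first; the p value would then be read from a t distribution with `df` degrees of freedom.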
makes it unreasonable to assume that this represents an adequate basis for the classification of all such stimuli; clearly the precise choice of stimuli and similarity measures will to a large degree determine the structures obtained (cf. Steyvers, 1993). For our task the rural, suburban, urban distinction is particularly clear in that it is related directly to mean fixation durations: for both Novice and Experienced drivers, fixation durations are longest on rural roads and shortest on urban ones. This is consistent with the results reported in actual driving by Underwood et al. (1997) and with the general finding that increasing the complexity of the visual scene increases the number of eye movements made and decreases the mean fixation durations on individual objects (Erikson and Horberg, 1980; Luoma, 1986; Miura, 1990; Robinson et al., 1972; Rutley and Mace, 1968). What is perhaps surprising in our results is the high positive correlation between responses indicating danger in the film and fixation durations. This result is clear in almost all of the films we used and represents a frequent tendency for subjects to fixate and track hazardous objects at length. It demonstrates that the strategy of making multiple short fixations in complex visual environments is not obligatory and can be easily modified to cope with unusual circumstances, even by novice drivers. It should be noted that we found no overall differences between the Novice and Experienced drivers in either the overall fixation locations or the variances of locations; this lack of differences confirms the results of the similar study by Miltenburg and Kuiken (1990). Underwood et al. (1997) did find differences between Novice and Experienced drivers in variance of gaze angle, but only for driving on a busy dual carriageway with slip roads entering from either side. Our stimulus set unfortunately contained no comparable situations.
There are also important caveats to be made about the current study and that by Miltenburg and Kuiken. Firstly, in both cases the novice groups were qualified drivers and so will have already mastered most of the skills required in everyday driving. Secondly, these studies both involved the watching of videos of traffic scenes, and thus involved no vehicle control and allowed no use of mirrors. These studies thus differ considerably from those that have previously demonstrated differences between Novice and Experienced drivers in visual behaviour (e.g., Cohen and Studach, 1977; Mourant and Rockwell, 1972; Mourant and Donohue, 1977), but they nonetheless suggest that we should be cautious in assuming that such differences are widely present in everyday driving. The interpretation of the data from the current study is by no means complete yet; a number of questions remain unanswered. A clear limitation of the kinds of analyses we have performed so far is that they ignore sequential dependencies in our data, a feature of eye movements in driving that may prove extremely revealing (Liu, Veltri and Pentland, 1997; McDowell and Rockwell, 1978). Where two parts of the scene are being alternately fixated by subjects, our analyses may interpret an extremely consistent pattern as in fact demonstrating great between
subjects variance. Similarly, the use of moment by moment correlations in the analysis of our data assumes that the relationships between measures are in fact simultaneous. Thus if, for example, hazardous events were followed by changes in fixation duration after a finite time lag, we would not detect this in the correlations.
A more significant limitation in the reported results is demonstrated in Colour Plates 1(b) and 2(b). These show two moments in the viewing of films during which there are significant differences between Novice and Experienced drivers in their visual search. Unfortunately, while there are numerous such differences throughout the films used, they do not fall into any obvious pattern. This does not necessarily imply that no such pattern exists. The reason it is not obvious may be that we possess no simple way of describing driving situations in terms of the visual features present. One possible avenue for further analysis may be to attempt some such description of the events within our stimuli based on the observed patterns of visual search, in the same way that it was possible to categorise differences between stimuli. Until we have such a 'syntax' for interpreting driving situations it may prove impossible to generalise results from any study to specific new situations, either on the road or presented in the laboratory.
We started this chapter by proposing four questions raised by common advice on visual search given to people learning to drive (Miller and Stacey, 1995). The first of these was whether there exists good experimental evidence to support general differences between novice and experienced drivers in their visual search. The conclusion from this study is that apart from a difference in fixation durations, with novices having longer durations, general differences in visual search strategies have not been clearly demonstrated.
We nonetheless suspect that important differences between the groups do exist in visual search and that fixation durations are simply the one measure which shows reliable differences even when aggregated across subjects and scenarios. This is linked to the second and third questions in that where differences are observed they are not consistent across all types of road environment, but depend on the precise characteristics of the ongoing situation, and that some of the larger differences in search strategy may be present only in the very earliest stages of learning to drive, during which car control skills are still being acquired.
The final question was how to define a "point of interest" or "areas of risk" in everyday driving situations. As yet we have no answer to this question. Our study measured moments of risk in terms of subjective responses to the situation. Such an approach could be extended by the use of groups of expert or experienced drivers giving their judgements. However, to completely understand these judgements in terms of the visual characteristics of the environment at each moment is a much more complex undertaking, which would require a highly developed 'syntax' for the description of driving situations.
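The limitation noted above, that moment by moment correlations cannot detect a relationship that appears only after a time lag, can be made concrete with a small sketch. It correlates a hazard-response series with a fixation-duration series shifted by a number of samples. This is not an analysis performed in the study; the function names and the idea of scanning lags up to `max_lag` are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(hazard, duration, lag):
    """Correlate hazard[t] with duration[t + lag], lag >= 0 in samples."""
    if lag == 0:
        return pearson(hazard, duration)
    return pearson(hazard[:-lag], duration[lag:])


def best_lag(hazard, duration, max_lag):
    """Return the lag (0..max_lag) with the largest absolute correlation."""
    return max(
        range(max_lag + 1),
        key=lambda k: abs(lagged_correlation(hazard, duration, k)),
    )
```

With samples taken every 100 ms, a lag of k samples corresponds to a delay of k × 100 ms; a purely simultaneous analysis is the special case `lag = 0`.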
References
Brown, I.D. (1982). Exposure and experience are a confounded nuisance in research on driver behaviour. Ergonomics, 14, 345-352.
Brown, I.D. and Groeger, J.A. (1988). Risk perception and decision taking during the transition between novice and experienced driver status. Ergonomics, 31, 585-597.
Cohen, A.S. (1981). Car drivers' pattern of eye fixations on the road and in the laboratory. Perceptual and Motor Skills, 52, 515-522.
Cohen, A.S. and Studach, H. (1977). Eye movements while driving cars around curves. Perceptual and Motor Skills, 44, 683-689.
Elander, J., West, R. and French, D. (1993). Behavioral correlates of individual differences in road-traffic crash risk: An examination of methods and findings. Psychological Bulletin, 113, 279-294.
Erikson, B. and Horberg, U. (1980). Eye movements of drivers in urban traffic. Uppsala Psychological Reports 283. University of Uppsala: Sweden.
Evans, L. (1991). Traffic Safety and the Driver. New York: Van Nostrand Reinhold.
Forsyth, E. (1992). Cohort study of learner and novice drivers. Part 2: Attitudes, opinions and the development of driving skills in the first 2 years. Department of Transport TRL Research Report 372. Crowthorne, UK: Transport Research Laboratory.
Forsyth, E., Maycock, G. and Sexton, B. (1995). Cohort study of learner and novice drivers: Part 3, Accidents, offences and driving experience in the first three years of driving. Department of Transport TRL Project Report 111. Crowthorne, UK: Transport Research Laboratory.
Groeger, J.A. and Chapman, P.R. (1992). Developing an understanding of danger: Contributions of experience and age. In: G.B. Grayson (Ed.), Behavioural Research in Road Safety II. Crowthorne, UK: Transport Research Laboratory, pp. 37-43.
Groeger, J.A. and Chapman, P.R. (1996). Judgement of traffic scenes: The role of danger and difficulty. Applied Cognitive Psychology, 10, 349-364.
Groeger, J.A. and Clegg, B.A. (1995). Novice drivers' judgements of traffic scenes. In: G.B. Grayson (Ed.), Behavioural Research in Road Safety V. Crowthorne, UK: Transport Research Laboratory, pp. 121-127.
Helander, M. and Soderberg, S. (1972). Driver visual behavior and electrodermal response during highway driving. Goteborg Psychological Reports, 2, 4.
Hughes, P.K. and Cole, B.L. (1986a). What attracts attention when driving? Ergonomics, 29, 377-391.
Hughes, P.K. and Cole, B.L. (1986b). Can the conspicuity of objects be predicted from laboratory experiments? Ergonomics, 29, 1097-1111.
Land, M.F. and Horwood, J. (1995). Which parts of the road guide steering? Nature, 377, 339-340.
Land, M.F. and Lee, D.N. (1994). Where we look when we steer. Nature, 369, 742-744.
Lester, J. (1991). Individual differences in accident liability: A review of the literature. TRRL Research Report 306. Crowthorne, UK: Transport and Road Research Laboratory.
Liu, A., Veltri, L. and Pentland, A.P. (1997). Modeling changes in eye fixation patterns while driving. In: A.G. Gale et al. (Eds.), Vision in Vehicles 6. Amsterdam: Elsevier.
Luoma, J. (1986). The acquisition of visual information by the driver: Interaction of relevant and irrelevant information. Reports from Liikenneturva 32/1986. Helsinki, Finland: Central Organization for Traffic Safety.
Maycock, G., Lockwood, C.R. and Lester, J. (1991). The accident liability of car drivers. Department of Transport TRL Research Report 315. Crowthorne, UK: Transport Research Laboratory.
McDowell, E.D. and Rockwell, T.H. (1978). An exploratory investigation of the stochastic nature of the drivers' eye movements and their relationship to the roadway geometry. In: Senders, Fisher and Monty (Eds.), Eye Movements and the Higher Psychological Functions. Hillsdale: Erlbaum.
McKenna, F.P. and Crick, J.L. (1994). Hazard perception in drivers: A methodology for testing and training. TRL Contractor Report 313. Crowthorne, UK: Transport Research Laboratory.
McLean, J.R. and Hoffmann, E.R. (1971). Analysis of drivers' control movements. Human Factors, 13, 407-418.
McLean, J.R. and Hoffmann, E.R. (1973). The effects of restricted preview on driver steering control and performance. Human Factors, 15, 421-430.
Miller, J.M. and Stacey, M. (1995). The Driving Instructor's Handbook (8th Edn.). London: Kogan Page.
Miltenburg, P.G.M. and Kuiken, M.J. (1990). The effect of driving experience on visual search strategies: Results of a laboratory experiment. Haren, The Netherlands: Traffic Research Centre, University of Groningen.
Miura, T. (1990). Active function of eye movement and useful field of view in a realistic setting. In: R. Groner, G. d'Ydewalle, R. Parham (Eds.), From Eye to Mind: Information Acquisition in Perception, Search and Reading. Amsterdam: Elsevier, pp. 119-127.
Mourant, R.R. and Donohue, R.J. (1977). Acquisition of indirect vision information by novice, experienced, and mature drivers. Journal of Safety Research, 9, 39-46.
Mourant, R.R. and Rockwell, T.H. (1970). Mapping eye-movement patterns to the visual scene in driving: An exploratory study. Human Factors, 12, 81-87.
Mourant, R.R. and Rockwell, T.H. (1972). Strategies of visual search by novice and experienced drivers. Human Factors, 14, 325-335.
Pelz, D.C. and Krupat, E. (1974). Caution profile and driving record of undergraduate males. Accident Analysis and Prevention, 6, 45-58.
Quimby, A.R., Maycock, G., Carter, I.D., Dixon, R. and Wall, J.G. (1986). Perceptual abilities of accident involved drivers. TRRL Research Report 27. Crowthorne, UK: Transport and Road Research Laboratory.
Quimby, A.R. and Watts, G.R. (1981). Human factors and driving performance. TRRL Supplementary Report 718. Crowthorne, UK: Transport and Road Research Laboratory.
Riemersma, J.B.J. (1988). An empirical study of subjective road categorization. Ergonomics, 31, 621-630.
Robinson, G.H., Erickson, D.J., Thurston, G.L. and Clark, R.L. (1972). Visual search by automobile drivers. Human Factors, 14, 315-323.
Rutley, K.S. and Mace, D.G.W. (1968). A preliminary investigation into the frequency of driver motor actions and eye movements. RRL Report LR 162. Crowthorne, UK: Road Research Laboratory.
Sabey, B.E. and Taylor, H. (1980). The known risks we run: The highway. TRRL Supplementary Report 567. Crowthorne, UK: Transport and Road Research Laboratory.
Shinar, D., McDowell, E.D. and Rockwell, T.H. (1977). Eye movements in curve negotiation. Human Factors, 19, 63-71.
Smiley, A., Reid, L. and Fraser, M. (1980). Changes in driver steering control with learning. Human Factors, 22, 401-415.
Staplin, L. (1995). Simulator and field measures of driver age differences in left-turn gap judgments. Transportation Research Record, 1485, 49-55.
Steyvers, J.J.M. (1993). The measurement of road environment appreciation with a multiscale construct list. In: A.G. Gale et al. (Eds.), Vision in Vehicles 4. Amsterdam: Elsevier, pp. 203-212.
Theeuwes, J. (1996). Visual search at intersections: An eye-movement analysis. In: A.G. Gale et al. (Eds.), Vision in Vehicles 5. Amsterdam: Elsevier, pp. 125-134.
Theeuwes, J. and Hagenzieker, M.P. (1993). Visual search of traffic scenes: On the effect of location expectations. In: A.G. Gale et al. (Eds.), Vision in Vehicles 4. Amsterdam: Elsevier, pp. 149-158.
Underwood, G., Crundall, D.E. and Chapman, P.R. (1997). Visual attention while performing driving and driving-related tasks. In: G.B. Grayson (Ed.), Behavioural Research in Road Safety 7. Crowthorne, UK: Transport Research Laboratory.
Zwahlen, H.T. (1993). Eye scanning rules for drivers: How do they compare with actual observed eye scanning behavior? Transportation Research Record, 1403, 14-22.
CHAPTER 18
How Much Do Novice Drivers See? The Effects of Demand on Visual Search Strategies in Novice and Experienced Drivers
David E. Crundall, Geoffrey Underwood and Peter R. Chapman
University of Nottingham
Abstract
Varying levels of visual and cognitive demand produce different visual search strategies. These effects differentiate between drivers on the basis of experience. Previous studies are reviewed with the aim of identifying a process which may account for the effects of changes in demand according to driver experience. One possible theory is that of perceptual narrowing, which suggests that the usable field of view shrinks with an increase in demands at the point of fixation. This theory is discussed in relation to novice and experienced drivers and a new methodology is put forward to test for such differences.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction
This chapter discusses the effects of cognitive and visual demand upon visual search strategies during driving, and whether they can be used to differentiate between novices who have recently passed their test and more experienced drivers. If novices are placed under higher demands than experienced drivers due to the novelty of stimuli, lack of automatised sub-routines, or the absence of relevant schemas, then this would compound any increases in the general level of demand. Assuming a limited capacity of attention, it may be that novices' attentional limitations (and their consequences, such as a link with increased accident liability) only become apparent when the task becomes particularly demanding. If this is the case, then what process could underlie this effect, and what could be done, short of giving novice drivers the experience they need, to redress the matter?
The effects of increased demands on visual search strategies in drivers
Before addressing the question of what effect increased demand has on a driving task, it may be useful to draw analogies from areas of research more accessible to experimentation. In reading, for example, it is widely reported that unfamiliar words require longer fixation durations than common words (Rayner and Pollatsek, 1989). The explanation for this is that an unfamiliar word requires more processing and, within certain limitations, the measure of fixation duration is considered to reflect object identification time according to the eye-mind assumption (Henderson, Pollatsek and Rayner, 1987; Underwood and Everett, 1992). A similar effect can be noted in laboratory-based driving tasks. Underwood, Crundall and Chapman (1997) asked novice and experienced drivers to watch a series of video clips taken from the driver's perspective and to respond to potential hazards by pressing a button. This hazard perception paradigm is discussed in more detail elsewhere in this book (see Chapter 17).
Regardless of driving experience, it was found that fixation durations on the cause of any potential hazard (such as a car suddenly emerging from a side road) were greatly in excess of mean fixation durations for the rest of the scene. In essence, the emergent car in this example takes the role of the unfamiliar word in reading studies: it requires more processing than usual, and thus attracts increased fixation durations. The transition from studying how people read to how people view moving scenes is not an easy one. Sentences have an agreed grammar which allows parsing. The hazard perception clips used in the above study have a structure which is just as complex as language, though there is no current agreement on how to parse them meaningfully. We do not know which crucial elements divide clips. With this in mind, consider the problems involved with understanding vision during
real driving. How should we define demand, and what constitutes an increase in its level? A fundamental problem in defining demand is the confounding of increases in visual demand, such as an increase in visual clutter or complexity, with increases in cognitive demand, such as an increase in the processing demands of a particular stimulus, perhaps due to an increase in its relevance to the current context (Williams, 1982). Visual demands are closely related to task demands: as the dominant sensory modality in driving is vision, an increase in task difficulty will usually coincide with an increase in the complexity of the visual scene. In an ideal situation one could provide subjects with the same visual stimuli in two conditions, with the information being of relevance in only one of them. This would hold the visual demands constant while varying the cognitive demands, which would be reflected in the processing times of the stimuli. Alternatively, adding visual clutter while holding the meaningfulness of the display constant would provide the opposite manipulation. Unfortunately, as stimuli become more realistic, the two become harder to separate, as we shall see later. We will argue in the following section that though the two types of demand (visual/task demands and cognitive demands) are rarely fully separated, the majority of driving research has focused on task demands which are predominantly related to visual complexity, rather than cognitive demands.
Previous manipulations of demand in driving research
One problem in attempting to manipulate the level of demand in an experiment is the identification of a suitable independent variable. There is a lack of consistency in the relevant literature in the adoption of a demand manipulation, with the result that it is very hard to compare across studies. The one common feature that the majority of these studies share, however, is that their demand manipulation is concerned more with the task demands of factors such as road geometry or traffic density (both of which increase visual complexity) than with the cognitive demands placed on the subject. Despite the disparity between the factors chosen to represent demand on the road, the following discussion highlights the consistent result that an increase in the complexity of the driving task (and thus an increase in the visual complexity) tends to increase one's active search of the scene, producing a wider spread of search and an increased sampling rate. This may initially seem to weaken our analogy with reading, where infrequently used words tend to capture attention for longer than normal, but let us look at the results of these studies before drawing any conclusions. The use of road geometry as a demand manipulation has focused mainly on how visual search strategies differ between driving along straight roads and driving through curves. Shinar, McDowell and Rockwell (1977) were the first to note that
the increased processing demands associated with the negotiation of a curve were related to a more active visual search pattern, as compared to observations on a straight road. The increase in demand occurs due to a shift in the loci of important visual information sources. Fry (1968) suggested that the focus of expansion is the most important point of information for driving as it maximises preview time for objects directly in the path of travel. Evidence confirms that experienced drivers tend to fixate close to the focus of expansion, while information concerning lane maintenance is obtained through peripheral vision from near the car (Land and Horwood, 1995). However, when driving through a curve the focus of expansion becomes less important for direction as the car's immediate heading is offset from the expansion point. Lane maintenance also becomes more difficult: a curve is rarely of constant arc and this necessitates constant monitoring of one's position in relation to the edge of the curve. The increased importance of road markings for lane maintenance, and the corresponding decrease in the importance of the expansion point, create a more dynamic visual search pattern. Shinar et al. (1977) found that subjects tended to switch rapidly between fixating the road ahead for long-term directional information, and fixating the road edge or lane markings in order to stay within their lane. To accommodate the increased number of fixations on the road markers, fixation durations decrease during curve negotiation. Shinar et al. suggested that, in the visual processing of a high speed curve, subjects were collapsing a two-process system (directional information from foveating the focus of expansion, and lane maintenance information through peripheral vision) into one in which foveal attention switches between the two sources of information.
Zwahlen (1993) also found curve negotiation to involve a more active search strategy than on straights. He noted that fixation durations were markedly shorter in the curve, and equated this finding with the American Automobile Association's "brief glance technique" where drivers are advised to keep fixations short in order to avoid "captured attention". As mentioned earlier, within the laboratory the appearance of an unfamiliar word, or a potential hazard on a video clip of driving, tends to concentrate one's attention. Conversely, the increased processing demand involved in curve negotiation decreases fixation durations on the real road. A further problem is that not all the evidence points toward a more active search strategy on curves than straights. Liu, Veltri and Pentland (1997) have provided contradictory evidence. They utilised a first order Markov matrix to analyse drivers' scan paths on straights and curves and discovered two distinct scan patterns on straights: a 'preview' search (from the middle preview distance down the road to either the near or far preview distances with equal probability, and then back to the middle), and a 'side-to-side' search (from the next road segment which is very far away, to either side with equal probability, and then back to the middle). During curve negotiation, however, though they identified the side-to-side pattern, they failed to find the preview
pattern. In this study, the search strategy became less dynamic under the increased demand of the curve, though it could still be feasible that the number of fixations increases (while the durations decrease) within the smaller confines of a reduced search space. Another measure of cognitive demand that has been used is traffic density. As traffic increases, so does the danger of any driving situation up to the point of traffic congestion. Rahimi, Briggs and Thorn (1990) looked at eye and head movements of a driver at two American intersections, one busy and one quiet. The subject performed 20 left turns (crossing the line of traffic) at each junction alternately, while head and eye movements were recorded via video. They found that the busy intersection produced more fixations than the quiet junction, which suggests a corresponding reduction in fixation durations as demand increased. There is also evidence that the proximity of other vehicles may affect the visual search patterns of drivers. The work of Hella, Laya and Neboit (1996) suggests that the closer one is to the car in front the shorter one's fixation durations upon it become, though there is a corresponding increase in the total number of fixations upon it. They discovered this by comparing the eye movements of drivers on a three lane motorway. Interestingly, they did not discover any visual search changes due to the speed of the car (which was dictated in part by the lane they were in at the time). The decreased fixation durations support the suggestion that as drivers find the task demands and visual complexity increasing they respond by increasing the sampling rate of the scene. Miura (1979) used four separate levels of task demand to investigate fixation durations. These were stable running, passing parked vehicles, entering into a narrower route, and overtaking. He found that entering the narrower route and the act of overtaking significantly reduced mean fixation durations. 
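The first order Markov analysis that Liu, Veltri and Pentland (1997) applied to scan paths can be illustrated with a small sketch that estimates transition probabilities between fixated regions from a sequence of region labels. This is not their implementation: the function name `transition_matrix` and the region labels used with it (such as "middle", "near" and "far") are hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(scan_path):
    """Estimate first-order transition probabilities from a scan path.

    scan_path: sequence of region labels, one per successive fixation.
    Returns {source: {destination: probability}}.
    """
    counts = defaultdict(Counter)
    # Count each move from one fixated region to the next.
    for here, there in zip(scan_path, scan_path[1:]):
        counts[here][there] += 1
    # Normalise each row so that its probabilities sum to one.
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }
```

A 'preview' pattern of the kind described above would appear as a row for the middle distance that splits roughly equally between the near and far regions, with both of those regions returning to the middle with probability close to one.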
This mirrors the results of studies of curves and intersections. Despite the lack of consistency in the manipulations of demand, it seems fairly well documented that general increases in task demands and visual complexity tend to reduce mean fixation durations and increase the sampling rate. Though this reveals the limits of our reading analogy it does not mean that the effects of increased fixation durations upon hazards viewed in a hazard perception test (Underwood, Crundall and Chapman, 1997) are not generalisable to the real road. The general increase in the task and visual demands of driving through a curve may necessitate an increase in the sampling rate, though the demand of a particular stimulus with high priority and a defined locus (i.e., a hazard) may still capture attention. This paradox has been noted in a study by Chapman and Underwood (this volume). They discovered that though subjects tended to increase fixation durations upon hazards relative to the overall scene, the mean fixation durations for each whole clip varied according to the visual complexity of the roads viewed. Cluttered urban and suburban roads produced shorter fixation durations than the empty, rural road clips. This difference between the two effects is probably due to the localisation
of the hazard stimuli compared to the diffused nature of increased visual complexity. It is the difference between a solitary car suddenly pulling out from a side road on to an otherwise empty road, and a busy road where any one of a number of vehicles is a potential hazard. The former captures attention whereas the latter requires an increased search in order to monitor all potential threats. The former is more akin to cognitive demands in that the saliency of a particular stimulus has increased and requires further processing, indicated by increased fixation durations, while the latter reflects an increase in visual complexity.
Are novices more susceptible to high demands than experienced drivers?
The hypothesised differences between novices and experienced drivers due to increases in cognitive demand concern the processing times involved in identifying particular stimuli such as hazards, and other measures of perceptual input such as the spread of search and sampling rate when the task is held constant, which relate to attentional capacity. We shall discuss these capacity related problems first, before addressing the issues of visual and task demands. Novice drivers are likely to encounter capacity problems with attention more often and more severely than experienced drivers. Recently licensed drivers will no doubt have gained experience on actual roads, though much will remain novel. Faced with new stimuli, a novice driver may take longer to process them, in the same way that an infrequent word will tax a novice reader more than an experienced reader. In addition, depending on the amount of practice they have received, they may still have to automatise certain sub-routines of the driving task. One such task which is widely believed to be automatic is that of changing gear.
Novices have been noted as being slower gear changers than more experienced drivers (Duncan, Williams and Brown, 1991), which suggests a failure to completely automatise the task. One of the benefits of automatising this is that the task will no longer need attention. The experienced driver can then allocate all attention to other matters, while the novice driver may still have to apportion some to gear changing. This should not be a problem when cognitive demands on the driver are low, but as demand increases the novice may suffer a degradation of either the gear changing or the other tasks which are competing for attention. The most cited studies of novice and experienced drivers are those of Mourant and Rockwell (1970, 1972). They found that novices have an increased frequency of pursuit tracking eye movements. These are fixations where either the stimulus or the viewer is moving. In order to maintain the stimulus in the same place on the retina, the eye must move to compensate for other movements in the scene. A traffic sign may first be fixated at the focus of expansion, but as one gets nearer to it the sign will be displaced upwards and to the left of the visual field (on British roads). In order to
remain fixated on it for any length of time one must move the eyes with the optic flow rate of the sign. The increase in frequency of pursuit fixations found by Mourant and Rockwell (1972) suggests that the novices' overall fixation durations were greater than those of the experienced drivers, who did not linger on objects outside the focus of expansion long enough to significantly increase their amount of pursuit tracking. This higher level of fixation duration may reflect increased processing times and attentional capacity limitations. They also discovered that novices tended to look in their mirrors less and at lane markings more than experienced drivers, and that they searched an area of the road ahead which was closer to the car. It was further noted that the spread of search along the horizontal axis was more compact than that produced by the experienced drivers. Similarly Renge (1980) identified a tendency for novices to search predominantly in the vertical plane. He asked subjects to verbalise what they were looking at while driving and noted that the pattern of verbalisation was consistent with a vertically based search strategy. The reduction in horizontal scanning and high level of lane marker fixations have also been recorded after alcohol consumption (Mortimar and Jorgeson, 1972) and for drivers suffering from fatigue (Kaluger and Smith, 1970). Both alcohol and fatigue are considered to decrease attentional resources, which provides further support for the idea that the novice drivers may be suffering from a competition for resources. The majority of these findings can be explained in terms of the attentional allocation problems which novices may suffer from (Underwood, Crundall and Chapman, 1997). The smaller search area of novices may reflect an attempt to reduce perceptual input, as may the reduction in mirror checks. 
As previously mentioned, increased fixation durations suggest increased processing time which in turn may cause the driver to reduce the size of the visual search pattern in an attempt to avoid overloading. Underwood et al. (1997) also noted that novices tend to have longer fixation durations on hazards than experienced drivers. With all these studies the tasks were held constant between subjects (as much as is possible with on-road studies), and therefore can be considered more related to the differing levels of cognitive demand placed on novice and experienced drivers, than due to changes in visual complexity and task demand. The next section considers differences between novice and experienced drivers due to changes in visual demands.
An investigation into the effects of experience on different roadways

A further problem for novice drivers which is not directly related to attentional capacity problems is the possible lack of schemas for certain road situations, or the use of inappropriate schemas. Earlier in this chapter, examples of task demand studies were reported, which demonstrated differences in drivers' responses to differing road geometries, levels of traffic density, and types of manoeuvre. In an
attempt to identify hypothesised differences between novice and experienced drivers' search strategies due to task and visual differences, Underwood et al. (1997) analysed the search strategies of 32 subjects while driving along roads of varying complexity in a real traffic situation. Half of these subjects had passed their driving test within three months of the study while the other half had at least five years' experience. From a 20-minute drive three one-minute windows were selected to reflect differing levels of demand based upon the type of road they were on. Each window started from the same geographic spot. The roads used were a single lane, rural road with good visibility ahead, a single lane suburban road through a small village (which included shops, parked cars and pedestrians), and a dual carriageway which was joined by two slip roads, one from the left and one from the right. The main measures that were analysed from the windows were fixation duration, and variance of fixation locations along the vertical and horizontal axes. The latter was to provide information concerning the spread of search while the former was intended to gauge the sampling rate of the subject. Fixation durations produced a significant interaction, with novices producing their longest fixation durations on the dual carriageway while experienced drivers tended to display longer fixations on the rural road (see Fig. 1). It was only upon the suburban route that both groups of subjects agreed in their use of short fixations. If one accepts the experienced drivers' strategy as the correct one, then their fixation durations do not increase with visual demand, for the rural road was considered to be the least demanding of the three road types. It had less traffic, less visual clutter, and only one lane. 
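The two measures named above (fixation duration as an index of sampling rate, and variance of fixation locations as an index of spread of search) can be illustrated with a minimal sketch. The tuple layout, the units, the function name `summarise_window` and the use of population variance are assumptions made for illustration only; the chapter does not specify how the variance was computed, and the values are invented.

```python
from statistics import mean, pvariance


def summarise_window(fixations):
    """Summarise one analysis window.

    fixations: list of (duration_ms, x_deg, y_deg) tuples, one per fixation.
    The tuple layout and the units are illustrative assumptions.
    """
    durations = [duration for duration, _, _ in fixations]
    xs = [x for _, x, _ in fixations]
    ys = [y for _, _, y in fixations]
    return {
        "n_fixations": len(fixations),
        "mean_fixation_duration_ms": mean(durations),
        # Population variance is an assumption; sample variance would also be defensible.
        "horizontal_variance": pvariance(xs),
        "vertical_variance": pvariance(ys),
    }


# Hypothetical three-fixation window (made-up values, not data from the study).
window = [(200, -2.0, 0.0), (300, 0.0, 1.0), (400, 2.0, -1.0)]
summary = summarise_window(window)
print(summary)
```

Within a window of fixed length, shorter mean fixation durations go with a larger number of fixations, which is why the text treats fixation duration as a gauge of sampling rate.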
The speed restriction was 60 mph, though the evidence for the effect of speed on fixation durations is inconsistent (McDowell and Rockwell, 1978; Cohen, 1981) and there are some studies which suggest a very limited effect of speed on visual demand (Miura, 1985; Hella, Laya and Neboit, 1996). It seemed that the experienced drivers decreased their fixation durations on the busier routes, with the cluttered suburban route through a village producing the shortest durations (and therefore the greatest sampling rate). The number of fixations varied in accordance with the durations, which supported the theory of the sampling rate increasing as the complexity of the scene increased. The novices however produced the longest mean fixation durations on the dual carriageway. If one accepts that the increased traffic and danger of the additional lane makes the dual carriageway more demanding than the rural road, then the novices have responded inappropriately by reducing their sampling rate. The dual carriageway was most often the route where the subject had traffic ahead in the same lane. Mourant and Rockwell (1970) reported that novice drivers' fixation durations tend to increase when following another car, as opposed to Hella et al. (1996) who found that experienced drivers' fixation durations decreased but the total time spent on the car ahead increased. This suggests that the car in front is as important to experienced drivers as it is to novices, though the experienced
Fig. 1. Three graphs which show how the eye movements of novice (N) and experienced (E) drivers vary across the road types. The measures taken are (a) mean fixation durations, (b) variance of fixation locations in the horizontal plane, and (c) variance of fixation locations in the vertical plane.
drivers still manage to increase the sampling rate of other areas of the scene. One explanation could be that the novices suffered attentional capture by vehicles in front in the same way that subjects tend to fixate the hazards in hazard perception clips for longer (Underwood, Crundall and Chapman, 1997; and Chapter 17). The more experienced drivers however overcame this and maintained a high sampling rate. A further explanation stems from Mourant and Rockwell's (1972) finding that novices fixate lane markers more often when driving on a freeway. Such fixations made up 70% of the pursuit tracking that they noted in novices. As mentioned above, pursuit tracking fixations imply long fixation durations, which could account for the effect. A subsequent category analysis performed on our data produced evidence to suggest that novice and experienced drivers spend similar amounts of time looking at certain items within the visual field. This analysis was conducted on a subset of the original sample (five novices and five experienced drivers with the most accurately calibrated data) and compared the total time spent fixating the car in front and lane markings. No significant differences were found between the subject groups in regard to total time on lane markers, or on a followed vehicle on the dual carriageway. This finding reconciles the results of Mourant and Rockwell (1972) and Hella et al. (1996) in that total time dedicated to fixating certain stimuli such as the car in front tends to be the same across experience though the experienced drivers may still have a higher sampling rate of the scene. Comparison of the variance of fixation locations in the horizontal and vertical meridians also highlighted the dual carriageway as a main difference between the experienced and novice drivers' search strategies. Mean comparisons of the horizontal search interaction revealed that the experienced drivers increased their scanning in this meridian when on the dual carriageway. 
The other roads produced narrow, less dynamic search strategies. The novice measures of horizontal scanning were all similar to the experienced drivers' measures for the rural and suburban road. Analysis of fixation variance in the vertical axis did not produce a significant interaction though a main effect of road type was discovered. However, means comparisons of the levels of roadway found the spread of search for experienced drivers on the dual carriageway to be significantly different to the suburban and rural roads. Despite the lack of interaction there is a suggestion that experienced drivers increase their vertical search on the dual carriageway just as they increased their horizontal search. Whereas the novices tended to maintain a restricted horizontal search comparable to the experienced drivers' search on the rural and suburban roads, their vertical search is closer to the expanded scanning of the experienced drivers on the dual carriageway (see Fig. 1). There are several points which should be drawn from this research. First, one should note that the differing levels of demand that each roadway places upon the driver do produce changes in the relevant search strategies. The experienced drivers
behave according to the prior published results. Their fixation durations decrease on more demanding roads, and their search strategy widens (Mourant and Rockwell, 1970; McDowell and Rockwell, 1978; Shinar et al., 1978; Rahimi et al., 1990; Zwahlen, 1993). The difference between the suburban route and the dual carriageway is of interest, the latter producing the greater scanning and the former producing the lower fixation durations. It may be that both roads are considerably demanding, but the responses to such demands are different. A second point to note is the lack of flexibility of novices' scanning strategies across the road types. The experienced drivers increased their scanning behaviour in both meridians according to the road type, while the novices maintained one level of scanning throughout. In their analysis of curve negotiation Shinar et al. (1978) reported that high levels of field dependence correspond with inflexible, narrow search strategies that are insensitive to increases in demand. A third important finding was the high level of vertical scanning and the low level of horizontal scanning produced by the novices. This fits with previous research which suggests that novice drivers require the experience which will sensitise them to the horizontal axis as the main source of information. The result that is hardest to explain is the combination of short fixations on the suburban route with a failure in both groups of subjects to increase the search space. The dual carriageway received both short fixations and increased scanning. Perhaps, as mentioned above, the different demands of the roadways require different responses. Unfortunately the myriad of visual and task oriented factors which correspond to a particular roadway prevent anything but a coarse grain view of the visual demands. An alternative approach however is to take subjective measurements of the levels of demand placed upon subjects. 
In a further follow up study 18 novices and 18 experienced drivers were asked to rate a set of video clips, taken from the roads where the eye movement data was recorded. The ratings were made on nine, seven-point Likert scales (e.g., How much risk would you have felt during that drive? How stressful would it be to be the driver in that drive? How hard would you need to concentrate to drive safely during that drive?) which loaded onto two constructs: Danger (How dangerous is this route at this particular time?) and Difficulty (How difficult would you find the route to drive at this particular time?). These two constructs were initially identified by Groeger and Chapman (1996) as useful in distinguishing between driver groups. Analysis of the results revealed that novices rated all the road types as both more dangerous and difficult than the experienced drivers. On the danger ratings, both groups of subjects rated the dual carriageway as dangerous as the suburban route, while the rural route was considered relatively safe. With the difficulty ratings however, the suburban road was given the highest score. These subjective scores can be related to the eye movement data. One could postulate that a need to increase the search area and sampling rate of a scene may be related more to the danger of the situation than the difficulty of the drive. An increase in the number of potentially
hazardous stimuli would require a more active search strategy in order to monitor all the possible sources of danger. In support of this Beck and Emery (1985) believe that anxiety, or the unpredictability of events, produces in people a state of hypervigilance where search strategies become more active, and more of the environment is inspected in an attempt to locate any potentially dangerous stimuli. In times of difficulty however, evidence supports a concentration of attention in a few locations, such as longer fixation durations on an unfamiliar word during reading. On the dual carriageway the experienced drivers had low fixation durations and a wide search pattern, perhaps due to the high level of danger. On the suburban route fixation durations were low also, but the search strategy was narrower. This may be due to the difficulty of the suburban road. Though as dangerous as the dual carriageway, the suburb was considered more difficult and as such drivers may have constricted their search while maintaining short fixation durations due to the element of danger. One cannot infer that perceptions of danger or difficulty lead to particular search strategies, especially as the ratings were made by subjects who viewed the roads on video. If one assumes that novice and experienced drivers would use the same search pattern while driving on a road or watching it on video, one could validly suggest that the search strategies themselves may lead to particular perceptual ratings. Differences in search strategies across these roads may however be linked to the description of the road according to the two constructs. Though the two groups of drivers do not differ in their ordinal ratings of the roads, they may differ in how they react to that information. If danger and difficulty impose different demands upon the driver then one may predict this to differentiate between novices and experienced drivers. 
For example, novice drivers may have developed strategies or schemas for coping with demands linked to difficulty rather than danger. The possible relationship between these two constructs and the nature of the demands they place on the driver in regard to visual search strategies may provide an interesting avenue of research for the future.
What is the underlying process whereby demand modulates visual search?

The evidence presented so far argues that the level of both cognitive and visual demand upon a driver will either constrain and direct visual search, or actively expand it. Whereas the effects of the visual and task demands on drivers are triggered externally, the cognitive demands depend on internally motivated factors in an individual's ability to process stimuli in an efficient manner. The effect of varying roadways on novices could be plausibly explained in terms of inappropriate schemas, though as yet there has been no suggestion as to the nature of the process which underlies the effects of cognitive demand upon the search strategies of novice and experienced drivers. In search of a possible contender to explain the effects of both
forms of demand let us turn away from foveal vision and instead explore the periphery. The term usable or functional field of view is used to describe the area of the visual field within which stimuli can be detected, and possibly processed to some extent. Engel (1971, 1974) proposed that though ultimately limited by the physiological boundaries of visual acuity, the area of peripheral vision available to analysis changes in size and shape according to circumstances. One such circumstance is cognitive load at the fovea. In a similar manner to the zoom lens model of attention (Eriksen and Yeh, 1985; Eriksen and St. James, 1986) the theory suggests that a high level of cognitive demand at the fovea should reduce the area of the usable field of view, as this allows limited resources to be concentrated upon the foveal region, increasing the resolving power. The reduction of the functional field of view due to increases in cognitive load is termed perceptual narrowing. If stimuli in the peripheral field are left outside the functional field of view as the attentional tide retreats, then preview benefits will be lost. Holmes, Cohen, Haith and Morrison (1977) discussed two models of perceptual narrowing: general interference and tunnel vision. The former is merely a general degrading of peripheral detection rates as cognitive load at the fovea increases, while the latter predicts an interaction with the eccentricity of the peripheral target. Tunnel vision suggests that as cognitive load increases, peripheral detection rates will suffer more at greater eccentricities. Evidence has been found for both models (Williams, 1982; Williams, 1985). Williams (1988) concluded that either model can be induced depending on the antecedent conditions. In order to invoke tunnel vision he suggested that three things were necessary: a high foveal load, an attentional strategy overtly focused on the central task, and speed stress. 
Regardless of which particular model one supports, the main prediction of this theory is that as a foveal load becomes more cognitively demanding, so less attention is given to peripheral items. Such an effect has been discovered in a number of areas of vision research. In reading it has been noted that fixation durations on words can be lengthened by placing increasingly unfamiliar words before the target. The suggestion is that the unfamiliar word reduces perceptual span and removes preview benefits for the subsequently fixated words (Rayner, 1986; Henderson and Ferreira, 1990). Identification of objects is also susceptible to perceptual narrowing. Reynolds (1993) found that errors identifying a peripheral target 4° from the fovea increased when a complex picture was displayed at the point of fixation rather than when a letter or geometric shape was presented instead. Williams' (1982) complaint applies to this study however. He noted "those few studies that have examined dual task performance within a single glance have intentionally manipulated the visual complexity of the foveal task or have confounded the visual and cognitive aspects of the foveal task" (p. 684).
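The contrast drawn above between the general interference and tunnel vision models can be made concrete with a small sketch. It takes detection proportions from a hypothetical 2 x 2 design (foveal load by target eccentricity) and asks whether the drop in detection under high load is roughly the same at both eccentricities or larger at the far one. The function name, the dictionary layout and the fixed `tolerance` threshold are all illustrative assumptions; in a real study the interaction would be tested statistically rather than against an arbitrary cut-off.

```python
def narrowing_pattern(rates, tolerance=0.05):
    """Label the pattern of peripheral detection rates.

    rates: dict keyed by (load, eccentricity), with load in {"low", "high"}
    and eccentricity in {"near", "far"}; values are detection proportions.
    tolerance: arbitrary stand-in for a significance test (an assumption).
    """
    near_drop = rates[("low", "near")] - rates[("high", "near")]
    far_drop = rates[("low", "far")] - rates[("high", "far")]
    interaction = far_drop - near_drop

    if max(near_drop, far_drop) <= tolerance:
        return "no narrowing"
    if interaction > tolerance:
        # Load hurts detection more at greater eccentricities.
        return "tunnel vision"
    if abs(interaction) <= tolerance:
        # Load hurts detection about equally at both eccentricities.
        return "general interference"
    return "neither model"


# Made-up detection proportions, for illustration only.
uniform_loss = {("low", "near"): 0.9, ("high", "near"): 0.8,
                ("low", "far"): 0.7, ("high", "far"): 0.6}
eccentric_loss = {("low", "near"): 0.9, ("high", "near"): 0.85,
                  ("low", "far"): 0.7, ("high", "far"): 0.4}

print(narrowing_pattern(uniform_loss))
print(narrowing_pattern(eccentric_loss))
```

The key quantity is the interaction term: general interference predicts that it is near zero, whereas tunnel vision predicts that it is positive.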
The loss of preview benefits with a reduction in the usable field of view

In a recent study the authors attempted to circumvent this problem by using the same stimuli in both the high and low demand conditions. The study was designed to test the hypothesis that peripheral preview benefits could be removed by increasing the cognitive demands of a central stimulus while holding visual demands constant. Thirty subjects were given two blocks of trials on a visual discrimination task. Each of the presentations consisted of a central, red-bordered, triangular warning sign (which subtended 1°) with either a vowel or a consonant in it, and a peripheral red-bordered triangle (4.6° to either the right or left of centre) with either a staggered junction sign, or a right bend junction. The subject's task was to identify the peripheral target after making a saccade to it from the central sign. To ensure each subject was looking at the centre at the start of each presentation, a trial was only presented once the computer had registered that the subject was fixating a central cross. Two counterbalanced blocks were given to subjects, each consisting of 24 presentations, with the only difference between the two blocks being the instructions that subjects received. In the high demand block subjects were told to respond to the peripheral target only if the central letter was a vowel. In the low demand block subjects were told to ignore the central letter, and to decide on the peripheral target as soon as possible. The first measure that was analysed was saccade latency. This is the time taken to disengage from the central stimulus and to saccade to the peripheral target. In the low demand task the central stimulus did not hold any relevant information, though in the high demand task the same central stimulus had to be processed before saccading to the target. 
The increase in cognitive complexity between the tasks was reflected in a significant main effect of cognitive demand with saccade latencies for the high demand task greater by 327 ms on average (F(1,28) = 165.9, p < 0.01). Comparison of subjects' first fixation durations on the peripheral target produced a main effect of task demand (F(1,28) = 31.9, p < 0.01), with the high demand task attracting nearly 100 ms of attention more than the low demand task on average. This difference increases to an average of 180 ms when re-fixations are included in a measure of total gaze duration on target (F(1,28) = 18.0, p < 0.01). These differences can be viewed in Fig. 2. Other measures included analysis of saccade inaccuracy (distance from the target after the first saccade) and the duration of any fixations which fell short of the target, but these failed to reveal any significant differences. Reaction times for the discrimination task also showed a main effect of demand (F(1,28) = 90.61, p < 0.01), with targets in the high demand condition taking an extra 664 ms to respond to on average, more than double the average increase in saccade latencies, suggesting that the differences noted in the first fixation durations and gaze durations on target do actually represent processing differences that influence the response. Error rates tended to vary between 2 and 4%.
Fig. 2. Comparison of the first fixation duration on the target (FFD) and the gaze duration on the target (GD) across task demand.
These results are consistent with the theory of perceptual narrowing. The peripheral preview that was afforded subjects in the low demand condition, which produced a benefit of nearly 100 ms in the duration of first fixations at an eccentricity of 4.6°, was removed when they had to process the central stimulus.
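As an illustration of how such a preview benefit might be estimated, the sketch below takes per-subject first fixation durations on the target in the two demand conditions and returns the mean difference together with a paired t statistic. The function name, the data layout and the values are hypothetical. The chapter reports F ratios from its own analysis, so the paired comparison here is a simplified stand-in and not the authors' procedure.

```python
from math import sqrt
from statistics import mean, stdev


def preview_benefit(low_demand_ms, high_demand_ms):
    """Return (mean difference in ms, paired t statistic).

    Each list holds one first fixation duration per subject, in the same
    subject order. A positive difference means longer fixations on the
    target when the central stimulus had to be processed, i.e. a lost preview.
    """
    differences = [high - low for low, high in zip(low_demand_ms, high_demand_ms)]
    mean_difference = mean(differences)
    standard_error = stdev(differences) / sqrt(len(differences))
    return mean_difference, mean_difference / standard_error


# Hypothetical first fixation durations (ms) for four subjects; made-up values.
low_demand = [300, 320, 310, 290]
high_demand = [400, 410, 420, 390]
mean_benefit, t_stat = preview_benefit(low_demand, high_demand)
print(mean_benefit, round(t_stat, 2))

# Arithmetic check on the figures reported in the text: the 664 ms increase in
# reaction time is more than double the 327 ms increase in saccade latency.
assert 664 > 2 * 327
```

The final check simply restates the comparison made in the text: twice the latency increase is 654 ms, which is less than the 664 ms increase in reaction time.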
Do novice drivers see less of the world?

The evidence suggests that a reduction in the peripheral field may well occur due to an increase in the cognitive demands of a foveal stimulus, but what evidence is there to suggest that novice drivers may be more prone to perceptual narrowing than experienced drivers? Unfortunately no research addresses this question directly. Two sub-questions can be answered however. First, one should ask whether experience in any task can influence one's usable field of view, and secondly, whether perceptual narrowing occurs at all in driving. If such narrowing does occur in drivers, and experience has been shown to be a factor in other task domains, then it is a short step to predict that driving experience may influence visual search through a cognitive demand-based reduction of the usable field of view. With regard to the effects of experience, Holmes et al. (1977) suggested that the adaptation of the functional field of view is a skill which is learned, rather than a natural response to the changing environment. One example of this is the preview windows of Israeli subjects reading English and Hebrew (Pollatsek et al., 1981). While reading English the subjects had a preview window of up to 15 letters to the
right of fixation, yet only three or four letters to the left. When reading Hebrew however, which reads from right to left, this visual asymmetry was reversed. In this instance the functional field (where specifically defined in terms of preview benefits for reading) was adapted to the particular language. Experience in reading produced the two opposing attentional strategies that Pollatsek et al. discovered. Other studies of picture or shape identification in the functional field of view have noted a training effect (Engel, 1971; Ikeda and Takeuchi, 1975). When subjects have experience in peripheral detection experiments they become more resilient to perceptual narrowing. In a comparison of aviators and non-aviators Williams (1995) found that the aviators had better accuracy than non-aviators in identifying peripheral targets under conditions of high cognitive load at the fovea. The experiment consisted of a foveal memory task involving letters presented in the centre of a tachistoscope field, and the identification of digits at various eccentricities in the peripheral field. This task has little immediate relevance to flying, which suggests that the perceptual strategies of the aviators did generalise, to some extent, to tasks other than piloting a plane. If experience in areas such as aviation can improve peripheral detection rates, then driving experience may also have an effect. The second question concerns whether perceptual narrowing has ever been recorded in the driving domain. An early series of in-car studies of peripheral detection rates was conducted by Lee and Triggs (1976). Their experiments consisted of up to 12 subjects driving along various roadways such as a freeway, a suburban road and a shopping centre route, or along a private road attempting to keep the vehicle following a thin line on the road surface, while verbally responding to peripherally presented lights. 
Four target lights were mounted on the dashboard and body of the car, the furthest two at 70° from a fixation straight ahead, and the nearest two 30° from fixation. Though they questioned the appropriateness of the term "perceptual narrowing" they noted that as the processing demands increased, such as when driving through the shopping centre or when the margin of error for line following was reduced, peripheral detection rates fell with a pronounced decrement occurring in the two targets furthest from centre. Miura (1990) reported an experiment involving two subjects and 120 hours of driving. The subjects drove along a number of roads selected on the basis of traffic density and task demands. During the drive subjects had to verbally respond to peripherally presented target lights in a similar manner to the studies of Lee and Triggs (1976). Miura noted that as the demands of the roadway increased there was a corresponding increase in reaction times. From this he concluded that perceptual narrowing was occurring. He also identified a negative correlation between response eccentricity (distance of the target from the fixation point at the time of response) and the demands of the roadway. As the roadway becomes more complex the subjects saccaded closer to the target before responding, and used a greater number of fixations to do so. Miura's explanation is that as the usable field of view shrinks, drivers tend to
search toward the extremes of this field to increase their active search space. This can be described as a compensatory strategy developed to overcome the limits of peripheral vision under conditions of high demand, and it corresponds with the on-road data reported earlier from Underwood et al. (1997) which focused on the varying task demands of different roadways, and with Beck and Emery's (1985) suggestion of hypervigilance under anxiety provoking circumstances. Though the dual carriageway and the suburban route are viewed by novices and experienced drivers alike in regard to danger, the lack of difficulty on the former road may allow a compensatory strategy to be employed to overcome any reduction in the peripheral field. From the work of Lee and Triggs (1976) and Miura (1990) perceptual narrowing ostensibly transfers to the driving task. Evidence also suggests that task experience can influence the shape and size of the usable field of view (e.g. Williams, 1995). The proposition that experience may play a role in the effective size of the peripheral field of drivers is supported by the combination of evidence from these two research areas. It is also possible that perceptual narrowing could be partially responsible for the increased accident liability of inexperienced drivers. Land and Horwood (1995) have demonstrated in a rudimentary simulator that experienced drivers take in information about lane position through peripheral vision, rarely fixating the lane markers close to the vehicle. Mourant and Rockwell (1972) found that novice drivers tended to fixate road markers more often than experienced drivers. If novices do suffer greater perceptual narrowing, then lane maintenance information will not be available through peripheral vision, necessitating foveating the markers. This reduces the amount of time spent fixating the focus of expansion, thus decreasing the preview time for potential hazards, which may in turn increase the likelihood of an accident. 
A laboratory methodology for assessing the usable field of view in novice drivers

The current aim of the authors is to devise a laboratory test which will distinguish between experienced and novice drivers on the basis of peripheral vision performance. An initial study on 10 novice drivers was completed to assess the validity of the methodology. This chapter will conclude with a brief look at the method that has been adopted, and whether initial testing supports the current theory of perceptual narrowing. Though other studies have measured peripheral detections in a real driving task, the subjects that participated in this research were experienced drivers. The safety implications of conducting such a study on a group of novice drivers are considerable. For this reason a laboratory approach has been adopted. One issue in the development of any test is the choice of measures that should be recorded. Miura (1990) said that the two most important indices of peripheral performance are response time and response eccentricity.
However, the use of reaction time as a valid measure is dependent on the presentation of the peripheral targets. If the targets are only presented for a few hundred milliseconds then a response time can add little information to our knowledge of when the light was seen and will mainly consist of post-detection response bias, unless the difference in reaction times between groups is shorter than the presentation time of the target. If the light remains on until a response is made, then the time of response is more informative about when the light was noticed. During the time between target onset and response, however, one cannot identify the motivations underlying the search strategy. The subject may note the stimulus and saccade toward it for verification, or they may simply "stumble" across it in their inspection of the visual field. For this reason it was decided to use simple detection rates of short duration targets as the main indicator of perceptual narrowing. Similarly, the measure of response eccentricity can be misleading. Miura's findings suggest that response eccentricity decreases as demand increases, reflecting a smaller usable field of view. This means that the smaller one's visual field, the nearer one must be to the target before responding. However, if one saccades toward a target, then this presupposes that the stimulus has captured exogenous attention and has produced a reflexive saccade (Sereno, 1992). If this is the case, the usable field of view must be at least as wide as the furthest eccentricity from which a peripheral target elicits a saccade. Instead of using response eccentricity, this initial study has focused on onset eccentricity, the distance from fixation to target at target onset. Coupled with the detection rate of peripheral targets which are presented for extremely short durations, these measures reflect the size of the subjects' usable field of view.
Method

Subjects

Ten novice drivers were paid to take part (5 male, with a mean age of 19 years, and a mean experience since passing the driving test of 1.5 months). All the subjects had normal vision and were recruited via questionnaires distributed through the Driving Standards Agency (DSA) in Great Britain to newly qualified drivers.

Materials and apparatus

Thirty-nine MPEG video clips taken from a driver's perspective were presented to the subjects via a P90 PC. Each clip contained at least one potentially hazardous event such as a car emerging from a side road or a pedestrian stepping in front of the vehicle. Overlaid on the video clips were four red place holders, each positioned half way along one of the sides of the video display. The place holders subtended 0.7°. The left and right place holders were 6.8° from the centre of the screen, while the top and bottom place holders were 4.4° from the centre. The peripheral targets were 200 ms lights which appeared in a random order in the centre of the four place holders. The lights subtended 0.3°. On average one light was presented every 5 seconds, with the added stipulation that two lights could not appear within one second of each other. In total 297 peripheral lights were presented to the subjects over 45 minutes of video clips. Subjects responded to the peripheral lights by pressing a button. While they watched the clips their eye movements were monitored using an SRI Dual Purkinje Generation 5.5 eye-tracker produced by Fourward Technologies.

Design

Two factors were of importance: demand and eccentricity of the target from fixation. Task demand was decided on the basis of a median split of a prior hazard perception study in which an average of 16 experienced drivers and 16 novices watched each video clip and pressed a button when they spotted a potential hazard. The number of button presses per subject was calculated for each 5 second segment of film, and a median split of 0.1842 defined half the 5 second windows as high demand and the other half as low. Eccentricity of the target varied according to where the subject was looking. The target was considered "near" if it fell within 6° of the current fixation. This roughly equates to an onset eccentricity within the same hemifield as the target.

Procedure

Subjects were instructed to search the scene as if they were the driver, attempting to spot any potential hazards. At the end of each clip they were instructed to rate it on two scales: danger and difficulty (Groeger and Chapman, 1996). This was done on the computer with a cursor on a seven-point scale controlled by the PC mouse. They were also instructed to press a button whenever they saw a peripheral light, though the experimenter emphasised the importance of maintaining a relatively normal search pattern and searching for the hazards rather than waiting for the lights to appear. The video clips were viewed in four counterbalanced blocks.
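The two design factors can be illustrated with a short Python sketch. It is not the authors' analysis code: the function names, the tie rule for windows falling exactly on the median, the coordinate convention (degrees from screen centre) and the example press rates are all assumptions made for illustration. Only the 6° limit and the place-holder positions come from the text.

```python
import math
from statistics import median


def split_by_median(presses_per_subject):
    """Label each 5-second window 'high' or 'low' demand by a median split.

    Assumption: windows exactly at the median count as 'low'.
    """
    cut = median(presses_per_subject)
    labels = ["high" if value > cut else "low" for value in presses_per_subject]
    return labels, cut


def onset_eccentricity(fixation, target):
    """Distance in degrees between fixation and target, both (x, y) in degrees."""
    return math.hypot(target[0] - fixation[0], target[1] - fixation[1])


def classify_target(fixation, target, near_limit_deg=6.0):
    """Return 'near' if the target falls within the limit of fixation, else 'far'."""
    if onset_eccentricity(fixation, target) <= near_limit_deg:
        return "near"
    return "far"


# Place-holder positions relative to screen centre, from the Materials section.
PLACE_HOLDERS = {"left": (-6.8, 0.0), "right": (6.8, 0.0),
                 "top": (0.0, 4.4), "bottom": (0.0, -4.4)}

# Made-up button-press rates for four windows, for illustration only.
labels, cut = split_by_median([0.0, 0.1, 0.25, 0.5])
print(labels, cut)
print(classify_target((0.0, 0.0), PLACE_HOLDERS["right"]))  # 6.8 deg away
print(classify_target((2.0, 0.0), PLACE_HOLDERS["right"]))  # 4.8 deg away
```

Under these assumptions a subject fixating the screen centre would see the left and right lights as "far" and the top and bottom lights as "near", which is why onset eccentricity depends on where the subject happens to be looking.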
Results and Discussion

An analysis of variance was conducted on those peripheral targets which had been assigned onset eccentricities by the computer. This removed targets where calibration problems occurred, or where a blink or a saccade would have meant that the target would have been missed regardless of its onset eccentricity. Significant main effects of both demand level (F(1,9) = 14.54, p < 0.01) and onset eccentricity (F(1,9) = 20.27, p < 0.01) were found, but there was no interaction. The means can be viewed in Fig. 3.

Fig. 3. The percentage of hits compared across the level of demand and onset eccentricity (near, far).

The results are consistent with the theory of perceptual narrowing and somewhat validate the experimental methodology. The consistent effects of demand on peripheral detection rates support the method of apportioning demand according to button presses in a prior hazard perception study. In a constantly changing perceptual scene, one cannot define a priori cognitive levels of demand without the confounding effects of visual complexity. This method, however, used a direct, self-report measure of demand which has provided a moment to moment index of the demands of all the video clips. While this does not totally solve the problem it is a vast improvement over labelling whole roads as demanding or otherwise, due to traffic density or visual clutter, and it also allows novice drivers to be tested in a safe environment. The lack of an interaction between onset eccentricity and level of demand suggests that the applicable model is general interference rather than tunnel vision. As mentioned earlier, Williams (1982, 1985, 1988) concluded that three things were necessary to induce tunnel vision: a high level of cognitive demand at the fovea, an attentional strategy overtly focused on the central task, and speed stress on the central task. The first two were present in this initial study but the last was absent, which may account for the results. Regardless of which model of perceptual narrowing the results are in accord with, the initial success of this methodology is encouraging. The natural progression is to compare drivers of different levels of experience in order to identify differences in the reaction of their usable fields of view according to cognitive, foveal demand. This methodology could provide an understanding of the process which underlies
the effects of cognitive demand on visual search strategies. The added possibility of compensation strategies (Miura, 1990) may also provide an insight into the expanded search strategies which occur with increases in task demands and visual complexity. If such differences can be identified and linked with accident liability, the possibility of training interventions could become a feasible proposition. References Cohen, A.S. (1981). Car drivers' pattern of eye fixations on the road and in the laboratory. Perceptual and Motor Skills, 52, 515-522. Duncan, J., Williams, P. and Brown, I. (1991). Components of driving skill: experience does not mean expertise. Ergonomics, 34, (7), 919-937. Eriksen, C.W. and St. James, J.D. (1986). Visual attention within and around the field of focal attention: a zoom lens model. Perception and Psychophysics, 40, 225-240. Eriksen, C.W. and Yeh, Y. (1985). Allocation of attention in the visual field. Journal of Experimental Psychology: Human Perception and Performance, 11, 583-597. Engel, F.L. (1971). Visual conspicuity, directed attention and retinal locus. Vision Research, 11,563-576. Engel, F.L. (1974). Visual conspicuity and selective background interference in eccentric vision. Vision Research, 14, 459-471. Fry, G.A. (1968). The use of the eyes in steering a car on straight and curved roads. American Journal of Optometry, 45, 374-391. Groeger, J. and Chapman, P. (1996). Judgement of traffic scenes: the role of danger and difficulty. Applied Cognitive Psychology, 10, 349-364. Hella, F., Laya, O. and Neboit, M. (1996). Perceptual demand and eye movements in driving. Paper presented at ICTTP '96, Valencia, Spain. Henderson, J.M. and Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory and Cognition, 16, (3), 417-429. Henderson, J.M., Pollatsek, A. and Rayner, K. (1987). 
Effects of foveal priming and extrafoveal preview on object identification. Journal of Experimental Psychology: Human Perception and Performance, 13, 449-463. Henderson, J.M., Pollatsek, A. and Rayner, K. (1989). Covert visual attention and extrafoveal information use during object identification. Perception and Psychophysics, 45, (3), 196-208. Holmes, D.L., Cohen, K.M., Haith, M.M. and Morrison, F.J. (1977). Peripheral visual processing. Perception and Psychophysics, 22, 571-577. Ikeda, M. and Takeuchi, T. (1975). Influence of the foveal load on the functional visual field. Perception and Psychophysics, 18, 225-260. Kaluger, N.A. and Smith, G.L., Jr. (1970). Driver eye movement patterns under conditions of prolonged driving and sleep deprivation. Highway Research Record, 336, 92-106. Land, M.F. and Horwood, J. (1995). Which parts of the road guide steering? Nature, 377, 339-340.
Land, M.F. and Lee, D.N. (1994). Where we look when we steer. Nature, 369, 742-744. Lee, P.N.J. and Triggs, T.J. (1976). The effects of driving demand and roadway environment, on peripheral visual detections. APRB Proceedings, 8, 7-12. Leibowitz, H.W. (1986). Recent advances in our understanding of peripheral vision and some implications. Proceedings of the Human Factors Society 30th Annual Meeting, 605-607. Liu, A., Veltri, L. and Pentland, A.P. (1997). Modelling changes in eye fixation patterns while driving. In: A.G. Gale (Ed.) Vision in Vehicles VI. Amsterdam: Elsevier/North Holland. McDowell, E.D. and Rockwell, T.H. (1978). An exploratory investigation of the stochastic nature of the drivers' eye movements and their relationship to the roadway geometry. In: Senders, Fisher and Monty (Eds.), Eye Movements and the Higher Psychological Functions. Hillsdale, NJ: Erlbaum, pp. 329-345. Miura, T. (1979). Visual behaviour in driving. Bulletin of the Faculty of Human Sciences, Osaka University, 5, 253-289. Miura, T. (1985). What is the narrowing of visual field with the increase of speed? Proceedings of the 10th Congress of the International Association for Accident and Traffic Medicine, 130-134. Miura, T. (1990). Active function of eye movement and useful field of view in a realistic setting. In: R. Groner, G. d'Ydewalle and R. Parham (Eds.), From Eye to Mind: Information Acquisition in Perception, Search and Reading. Amsterdam: Elsevier/North Holland, pp. 119-127. Mortimer, R.G. and Jorgeson, C.M. (1972). Eye fixations of drivers as affected by highway and traffic characteristics and moderate doses of alcohol. Proceedings of the 16th Annual Meeting of the Human Factors Society, 86-92. Mourant, R.R. and Rockwell, T.H. (1970). Visual information seeking of novice drivers. 1970 International Automobile Safety Compendium. New York: Society of Automotive Engineers. Mourant, R.R. and Rockwell, T.H. (1972). Strategies of visual search by novice and experienced drivers. 
Human Factors, 14, (4), 325-335. Pollatsek, A., Bolozky, S., Well, A.D. and Rayner, K. (1981). Asymmetries in the perceptual span for Israeli readers. Brain and Language, 14, 174-180. Rahimi, M., Briggs, R.P. and Thorn, D. R. (1990). A field evaluation of driver eye and head movement strategies toward environmental targets and distracters. Applied Ergonomics, 21, (4), 267-274. Rayner, K. (1986). Eye movements and perceptual span in beginning and skilled readers. Journal of Experimental Child Psychology, 41, 211-236. Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. New Jersey: Prentice Hall. Renge, K. (1980). The effects of driving experience on a driver's visual attention. An analysis of objects looked at: using the 'verbal report' method. International Association of Traffic Safety Sciences Research, 4, 95-106. Sereno, A.B. (1992). Programming saccades: the role of attention. In: K. Rayner (Ed.), Eye Movements and Visual Cognition: Scene Perception and Reading. New York: Springer-Verlag, 89-107. Shinar, D., McDowell, E.D., Rackoff, N.J. and Rockwell, T.H. (1978). Field dependence
and driver visual search behaviour. Human Factors, 20 (5), 553-559. Shinar, D., McDowell, E.D. and Rockwell, T.H. (1977). Eye movements in curve negotiation. Human Factors, 19, (1), 63-71. Underwood, G., Crundall, D.E. and Chapman, P.R. (1997). Visual attention while performing driving and driving-related tasks. In: G.B. Grayson (Ed.), Behavioural Research in Road Safety 7. Crowthorne, U.K.: Transport Research Laboratory. Underwood, G. and Everatt, J. (1992). The role of eye movements in reading: some limitations of the eye-mind assumption. In: E. Chekaluk and K.R. Llewellyn (Eds.), The Role of Eye Movements in Perceptual Processes. Amsterdam: Elsevier/North Holland, pp. 111-169. Williams, L.J. (1982). Cognitive load and the functional field of view. Human Factors, 24, 683-692. Williams, L.J. (1985). Tunnel vision induced by a foveal load manipulation. Human Factors, 27, 221-227. Williams, L.J. (1988). Tunnel vision or general interference? Cognitive load and attentional bias are both important. American Journal of Psychology, 101, 171-191. Williams, L.J. (1995). Peripheral target recognition and visual field narrowing in aviators and non aviators. The International Journal of Aviation Psychology, 5, (2), 215-232. Zwahlen, H.T. (1993). Eye scanning rules for drivers: how do they compare with actual observed eye scanning behaviour. Transportation Research Record, 1403, 14-22.
CHAPTER 19
The Development of the Eye Movement Strategies of Learner Drivers
Damion C. Dishart and Michael F. Land
The University of Sussex
Abstract Land and Horwood (1995) showed that experienced drivers obtain visual information from two sections of their view of the road ahead, in order to maintain a correct position in lane whilst steering their vehicle around a curve. The more distant of these two sections is used to predict the road's future curvature. This section is optimally 0.75-1.00 s ahead of the driver and contains the tangent point. (That point where the inside edge of a curve reverses its apparent direction, and the driver's line of sight forms a tangent to the road edge.) This section of road is used by a feedforward (anticipatory) mechanism which allows the driver to match the curvature of the road ahead. The other, nearer, section is about 0.5 s ahead of the driver and is used by a feedback (reactive) mechanism to 'fine tune' the driver's position in lane. As either lane edge approaches the vehicle, the driver steers away from it, correcting his/her road position. This combination of these two mechanisms enables the trained driver to steer an accurate course on roads of varying curvature. Experiments using video based eye-head tracking equipment have shown that the feedback mechanism is present in most people regardless of their experience of driving (although its accuracy is higher in those with experience), but that the feedforward mechanism is learned through experience of steering tasks (that can include riding a bicycle, computer driving games, etc.). Eye-head tracking experiments on learner drivers during their tuition have indicated that use of the section of the road containing the tangent point increases with experience, then decreases as drivers learn to optimise their visual search patterns, allowing them to spend more of their visual resources on other visual tasks both related and unrelated to driving. Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
D. C. Dishart & M. F. Land
Introduction

One of the fundamental skills involved in driving is the ability to steer a vehicle accurately around a course of varying curvature. In order to do this a driver must be able to estimate the curvature of the road and convert that estimate into appropriate movements of the vehicle's controls (steering wheel, accelerator, brake, etc.). Land and Lee (1994) demonstrated that three experienced drivers fixated a particular feature of their view of the road ahead while driving around bends in the road. This feature was the tangent point, that point where the driver's line of sight forms a tangent on the inside edge of a road bend. The subjects of the study would begin to fixate this feature approximately 3 s prior to entering the bend and continue to fixate it until between 3 and 6 s after entering the bend. The highest percentage of fixations to the tangent point was at approximately 0.5 s into the bend where the drivers' fixations rested on the tangent point 80% of the time. Land and Lee hypothesised that the simple relationship between the position of the tangent point and the curvature of the road, as shown in eq. (1), combined with the stability of the tangent point on a bend of constant curvature, made the tangent point so useful to the visuo-motor processes involved in driving.

$$C = \frac{1}{d \cos\theta} - \frac{1}{d} \qquad (1)$$

where $C$ is the curvature of the road, $d$ the lateral distance of the driver from the road edge and $\theta$ the angle between the tangent point and the driver's current course as subtended on the retina.

Land and Horwood (1995) demonstrated that during simulated driving, three experienced drivers relied on two sections of their view of the road ahead for visual information. These sections were at approximately 4° (far-road) and 7° (near-road) below the vanishing point of the road at a speed of 16.9 m s⁻¹. Land and Horwood also found that the near-road system was effective when used by itself at low speeds (described as similar to driving in fog), whereas the far-road system was never sufficiently accurate when used on its own. Land and Horwood hypothesised that two independent mechanisms using visual information were at work during driving, an anticipatory or feedforward mechanism that estimated the future curvature of the road, possibly from the offset of the tangent point from the heading direction, and a reactive or feedback mechanism that maintained the vehicle's position in lane (the far- and near-road mechanisms, respectively). This model of driver behaviour agreed with a similar model put forward by Donges in 1978. A preliminary study of inexperienced drivers during simulated driving indicated that the two mechanisms suggested by Land and Horwood are not innate but are gained with experience of driving. The study also suggested that these mechanisms
do not require experience of actually driving a vehicle but may be gained in part if not in total through simulated driving (i.e., computer/video games, etc.). This study is intended to explore the development of these mechanisms in learner drivers. To that end, the equipment used by Land and Lee (1994) has been used to measure the eye movements of drivers undergoing tuition in the United Kingdom.

Methods

This study has two main subject groups: those undergoing testing in the laboratory using a simulated car drive, and those undergoing testing whilst driving an actual vehicle. Each group consisted of experienced and inexperienced subjects. For simplicity these groups will be known in the remainder of this document by the mnemonics RE, RI, SE and SI, where R and S stand for Road and Simulator (experiment type) and E and I stand for Experienced and Inexperienced (subject type). The groups RE, SE and SI were selected from volunteers from the University of Sussex, England (faculty, staff and students), while the RI group was selected from volunteers provided by a driving school in Surrey, England. Subjects were excluded from the groups RI and SI if they had any experience of driving a vehicle in the past, while subjects were excluded from the RE and SE groups if they had not had one year's driving experience after passing their British driving test. The subjects were asked to drive either a car or the simulator whilst wearing eye tracking headgear. This equipment makes a simultaneous video recording of the subject's eye and the subject's view ahead. Before and after each recording is made, the subject performs a calibration routine by looking directly at objects named on the audio recording by the experimenter. Further details can be found in Land (1993). The recordings made from this equipment are then processed to produce a second-generation recording bearing a spot representing the position of the subject's gaze during the recording. The position of the subject's eye in its socket on the recording is transformed into the position of the gaze spot using the information gathered during the calibration routines. This second-generation recording can then be processed by a number of different methods. The relative movements of eye and head that move the gaze spot can be extracted by reprocessing the second-generation recording. The position of an anchor point in the subject's field of view is tracked, on computer, to record the direction of the subject's head. This anchor point must, therefore, be static with respect to the subject's body (and therefore the vehicle). To this end pieces of marked masking tape are placed on the windscreen of the vehicle during the primary recording. The tape can then be used as anchor points, as can any other static point in the field of view (e.g., the rear-view mirror). The anchor point can be changed during the processing without upsetting the data collection if, for example, it moves out of the field of view. The data collected via this method is
combined with the position of the gaze spot recorded during the first process. This combined data can then be printed out in graphical form and analysed. The second-generation recording can also be analysed with respect to the distribution of gaze points in time and space relative to a particular feature of the road. The speed of the vehicle during on-road recordings can be calculated. The vanishing point (horizon) is found by extending the sides of the road until they meet. The vertical offset of a target point, static with respect to the road, from the vanishing point is then recorded for a defined period in time. Equation (2) gives the speed of the vehicle in m s⁻¹, where $\theta_n$ is the angular vertical offset of the target point at time n and $\Delta t$ is the time interval in seconds between time 1 and time 2. The value 1.1 represents the height of the driver's eyes from the road, assumed to be 1.1 m throughout this study. The apparent speed of the simulator is set by the experimenter prior to the recording.
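The speed calculation can be sketched in Python from the quantities defined above. This is a reconstruction under stated assumptions, not necessarily the printed form of equation (2): it assumes a flat road, so that a road point seen at an angle below the vanishing point lies at a distance of eye height divided by the tangent of that angle. The function names are hypothetical.

```python
import math

EYE_HEIGHT_M = 1.1  # height of the driver's eyes above the road, from the text


def distance_ahead(offset_deg, eye_height_m=EYE_HEIGHT_M):
    """Distance (m) to a road point seen offset_deg below the vanishing point.

    Assumes a flat road, so distance = eye height / tan(offset).
    """
    return eye_height_m / math.tan(math.radians(offset_deg))


def vehicle_speed(offset1_deg, offset2_deg, dt_s, eye_height_m=EYE_HEIGHT_M):
    """Speed (m/s) from the target's vertical offsets at time 1 and time 2."""
    travelled = (distance_ahead(offset1_deg, eye_height_m)
                 - distance_ahead(offset2_deg, eye_height_m))
    return travelled / dt_s


# Illustrative values only: a target that is 30 m ahead, then 20 m ahead 0.8 s later.
offset1 = math.degrees(math.atan(EYE_HEIGHT_M / 30.0))
offset2 = math.degrees(math.atan(EYE_HEIGHT_M / 20.0))
print(round(vehicle_speed(offset1, offset2, 0.8), 2))
```

Because the offsets involved are only a few degrees, small errors in locating the vanishing point translate into large errors in distance, which is one reason to use a target that is clearly static with respect to the road.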
The subjects in groups RE and RI were recorded while driving on public roads in the Crawley area of Sussex, England. The subjects in group RI were recorded three times during their tuition. The first recording was made as soon as possible after the subjects had begun driving tuition. In practice this was usually the subject's second two-hour driving lesson, the subject being approached to participate in the study during their first two-hour lesson. The subject's first lesson generally consists of one hour of induction and preparation, where the subject does no driving, and one hour of driving practice. The second recording is made when the instructor feels that the subject has made sufficient progress to start being taught the more complex parts of driving (manoeuvres, etc.). The third recording is made prior to the subject's first driving test. The content of the lesson received by the student during the recording was typical of a lesson for a student of that level of ability. The subject groups SE and SI were recorded while driving a simulator program displayed on a 55 cm wide television at approximately 80 cm from the subject's head. The subjects were free to move their heads as they required. The simulator program displays a simple road scene consisting of a horizon and left and right road edges in white on a black background. The scene also contained an arc at the bottom of the screen representing the vehicle's bonnet (that was stationary with respect to the driver). The simulator program gives both graphical and numerical outputs of the subject's performance at the end of a drive. This output gives the standard error of the mean of the vehicle's position with respect to the centre of the roadway and graphical outputs of the steering wheel angle with respect to the road angle and the vehicle's position with respect to the centre of the road for the duration of the drive. All experiments were performed at a simulated speed of 12.5 m s⁻¹. This speed,
lower than that used in Land and Horwood (1995), was chosen as some of the more inexperienced subjects were prone to 'losing the road' at the higher speed (that is, the simulated roadway is lost from the display and is usually unrecoverable). The task given to the subjects in all cases was that they should attempt to drive the vehicle along the road, keeping as near to the centre of the road as possible, which simulated maintaining a correct position in a road lane. This task would be the same for drivers on most, if not all, types of road and would also eliminate any differences between drivers who have greater experience of driving on the right side of the road (although, in the end, none were included in the experiments).
Results

This study is currently in progress, and so the results in this section are from those subjects tested so far, or who are currently in the process of being tested. The experimental process is therefore incomplete and, as yet, few definite statements can be made. This, combined with the low numbers of volunteer subjects, means that the data are generally not yet statistically significant. The initial study of groups SE and SI indicated that there was a difference in performance between those inexperienced subjects who had experience of performing a similar task to driving (e.g., use of computer driving simulation games) and those who had not. The subjects with relevant experience were better able to perform the tasks given to them on the simulator. A qualified driver (SE-4), a non-driver with relevant experience (SI-2) and a complete novice (SI-3) were recorded using the eye-tracker while driving the simulator. After processing, the position of the gaze spot was recorded, once per second, for the entirety of the test. These points were then sorted into bands depending on their vertical position relative to the horizon, as shown in Table 1 along with the horizontal and vertical spread of fixation points and the mean number of saccades made by the subject per second. This last value was calculated by observing the number of saccades made by the subject during a particular 20-s section of the test. This value could not be calculated from the gaze point data in Table 1 as it represents a once per second sampling of gaze points for tests of approximately the same duration. The pattern of gaze spots for the three subjects appears to increase in horizontal spread with increasing experience, while the frequency of saccades decreases correspondingly. This suggests that experience of driving teaches the subject to make fewer, longer, fixations and saccades. The increasing horizontal spread of gaze points occurs in a band between 1 and 3° below the horizon, the area where tangent points occur on the simulation. The vertical spread of gaze points appears unrelated to experience.
Table 1. Summary of gaze point positions relative to horizon for three subjects of differing driving skill

| Degrees below horizon | SE-4 | SI-2 | SI-3 |
|---|---|---|---|
| -1.0 | 0 | 1 | 0 |
| -0.5 | 0 | 5 | 2 |
| 0.0 | 2 | 7 | 0 |
| 0.5 | 2 | 11 | 7 |
| 1.0 | 3 | 12 | 12 |
| 1.5 | 18 | 16 | 9 |
| 2.0 | 16 | 8 | 19 |
| 4.5 | 1 | 0 | 3 |
| 5.0 | 1 | 1 | 1 |
| 5.5 | 1 | 1 | 1 |
| 6.0 | 1 | 1 | 1 |
| 6.5 | 0 | 1 | 0 |
| Total | 82 | 77 | 83 |
| Horizontal spread | 17° | 14.5° | 7.5° |
| Vertical spread | 6° | 10° | 7° |
| Saccades per second | 1.45 | 1.80 | 3.40 |
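The summary measures in Table 1 can be sketched in Python as follows. The text does not state how band edges were assigned or how spread was defined, so the choices here (each sample assigned to the lower edge of a 0.5° band, spread taken as maximum minus minimum) are assumptions, and the function names and sample values are hypothetical.

```python
import math
from collections import Counter


def band_counts(vertical_deg, band_width=0.5):
    """Count once-per-second gaze samples in bands of degrees below the horizon.

    Assumption: each sample is assigned to the lower edge of its band.
    """
    counts = Counter()
    for value in vertical_deg:
        lower_edge = math.floor(value / band_width) * band_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def spread(values_deg):
    """Spread of gaze positions, assumed here to be maximum minus minimum."""
    return max(values_deg) - min(values_deg)


def saccades_per_second(n_saccades, window_s=20.0):
    """Mean saccade rate from a count made over a section of the recording."""
    return n_saccades / window_s


# Made-up gaze samples (degrees below horizon), for illustration only.
print(band_counts([1.6, 1.9, 2.0, -0.2]))
print(spread([-3.0, 14.0]))
# 29 saccades counted in a 20-s section would correspond to 1.45 per second.
print(saccades_per_second(29))
```

Counting saccades directly over a 20-s section, rather than inferring them from the once-per-second gaze samples, avoids undercounting when a subject makes several saccades within one second.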
The SE and SI groups were also asked to drive the simulator with only certain vertical sections of the roadway visible. These 2° sections were at 0, 6 and 9° below the horizon, respectively. These sections were chosen as they were at the 'top, middle and bottom' of the area of the display between horizon and vehicle. The standard deviation (s.d.) of the vehicle's distance from the centre line throughout the tests is shown in Table 2. A value of less than 0.3 for s.d. indicates that the subject did not allow the vehicle to leave the road during the test. All tests were performed at a simulated speed of 12.5 m s⁻¹ (see above). Subjects SI-3 and SI-4 were the subjects with no experience of driving-like situations and performed the worst at all of the tests. However, there were no obvious differences between experienced drivers and those used to 'games driving'.

Table 2. The standard deviation for subjects from groups SE and SI driving with only parts of the road visible

| Subject | 0° below horizon | 6° below horizon | 9° below horizon |
|---|---|---|---|
| SE-1 | 0.29 | 0.16 | 0.42 |
| SE-2 | 0.34 | 0.19 | 0.41 |
| SE-3 | 0.35 | 0.25 | 0.36 |
| SE-4 | 0.32 | 0.15 | 0.45 |
| SI-1 | 0.24 | 0.18 | 0.34 |
| SI-2 | 0.28 | 0.26 | 0.27 |
| SI-3 | 2.44 | 0.40 | 1.18 |
| SI-4 | 5.01 | 0.63 | 1.36 |

All of the subjects were then given a test where the steering wheel on the simulator did not control the vehicle (although the subjects were unaware of this) and the vehicle maintained a central course for the duration of the test. This test was designed to observe the subjects' steering-wheel movements in the absence of visual feedback from the road. All of the subjects made vastly exaggerated steering-wheel movements on this test, except for SI-3 and SI-4 who made very small, ineffectual steering-wheel movements, indicating that they had not yet learned what visual feedback to expect. None of the subjects, when subsequently asked, realised that the steering wheel was no longer controlling the vehicle.

To investigate subjects' behaviour during real driving, the locations and durations of fixations were obtained twice from a group RI subject on the same stretch of road (Tables 3 and 4). Where a target of fixation is noted, the fixation point was within 1° of that object for the duration of the fixation. The saccades and fixation associated with a mirror check are not included in the averages at the bottom of these tables as they are to a within-car object. Table 3 was taken after four hours of tuition and Table 4 after 12 hours. The road curve from which these figures were taken was a right hand bend approaching a roundabout. The data come from 8 s of recording, starting 1 s prior to entering the bend. As the data come from a right hand bend, the distance of the fixation point from the tangent point formed on the road edge opposite the subject has been obscured in some cases by oncoming vehicles. The mnemonics in the target of fixation column represent tangent point (T.P.) and oncoming vehicle (O.V.). Tables 3 and 4 show similar, but not identical, results for the subject driving around a given right hand curve.
Table 3 Fixation durations and positions for a subject with four hours' driving experience

Duration (s)   Distance from tangent point (°)   Object
               Far road edge   Centre line
0.16           6               5
0.08           5               4
0.36           6               >10
0.16           6               5
0.54           4               7                 O.V.
0.2            3               10                O.V.
0.48           -               >10
0.6            6               2
1.16           -               -                 MIRROR
0.6            -               6                 O.V.
0.28           4               9
0.66           5               >10
0.24           3               5
O.V.
O.V.
Mean fixation duration: 0.36 s
Mean distance from nearest tangent point: 4.1°
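As a check on the summary row, the mean fixation duration can be recomputed from the durations listed in Table 3, leaving out the mirror check as described in the text. The following is a minimal sketch only; the data layout and the function name are illustrative assumptions, and only object labels that are unambiguous in the table are attached to rows.

```python
# Fixation records transcribed from Table 3: (duration in s, object label or None).
# Only labels that can be tied unambiguously to a row are included (assumption).
fixations = [
    (0.16, None), (0.08, None), (0.36, None), (0.16, None),
    (0.54, "O.V."), (0.20, "O.V."), (0.48, None), (0.60, None),
    (1.16, "MIRROR"), (0.60, "O.V."), (0.28, None), (0.66, None),
    (0.24, None),
]


def mean_duration_excluding(records, excluded_label="MIRROR"):
    """Mean fixation duration over records whose label is not excluded_label."""
    kept = [duration for duration, label in records if label != excluded_label]
    return sum(kept) / len(kept)


mean_s = mean_duration_excluding(fixations)
print(f"Mean fixation duration: {mean_s:.2f} s")
```

With the mirror fixation removed, twelve fixations remain and their mean rounds to the 0.36 s reported in the table.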
Discussion

Godthelp (1986) describes the necessary steering-wheel angle for a curve of constant curvature by eq. (3), where $c_r$ is the road curvature, $G$ the steering ratio, $K$ the stability factor, $l$ the wheel base, $u$ the vehicle's speed and $\delta_s$ the steering-wheel angle. No matter how variable the curvature of a road, it can be considered as a series of curves of constant curvature.

$$\delta_s = \frac{G\,l\,(1 + K u^2)}{1000}\, c_r \qquad (3)$$

Godthelp (1986) proposed a model of driver steering behaviour which was derived from Donges' (1978) model. This model had two mechanisms that allowed the above equation to be solved in a vehicle, which he dubbed the anticipatory and the compensatory mechanisms. Both mechanisms received an estimate of the road
Table 4 Fixation durations and positions for a subject with 12 hours' driving experience

Duration (s)   Distance from tangent point (°)   Object
               Far road edge   Centre line
0.8            6               >10
0.34           3               >10
1.24           8               1                 T.P.
0.8            -               -                 MIRROR
0.1            9               >10
0.78           5               >10
0.26           8               1
0.08           3               6
0.52           1               8
0.2            4               4
0.72           7               1
0.38           6               6
0.38           -               -
0.16           6               5
0.36           -               -                 MIRROR
0.32           -               5                 O.V.
1.16           4               8
Object
T.P. T.P./ O.V. T.P. MIRROR
Mean fixation duration: 0.50 s
Mean distance from nearest tangent point: 3.8°
curvature ($c_r$) from the visual system and produced a component of the desired steering-wheel angle. These components were combined and output to the vehicle via the steering wheel. The movement produced by the vehicle (which is dependent on the other variables in the equation) is then fed back into the compensatory mechanism. Land and Horwood (1995) suggested a similar mechanism, where the anticipatory mechanism received its input from the visual system's estimate of future road curvature (from the tangent point) while the compensatory system received its input from the visual system's estimate of the current distance of the vehicle from the edge of the lane it is in.
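To make the role of speed in eq. (3) concrete, the sketch below evaluates the steering-wheel angle for a fixed curvature at several speeds. It assumes eq. (3) has the form $\delta_s = G\,l\,(1 + K u^2)\,c_r / 1000$; the function name and all parameter values are hypothetical, and units are left to the caller because the text does not state them.

```python
def steering_wheel_angle(curvature, steering_ratio, stability_factor,
                         wheel_base, speed):
    """Evaluate the assumed form of eq. (3):
    delta_s = G * l * (1 + K * u**2) * c_r / 1000.
    """
    return (steering_ratio * wheel_base
            * (1.0 + stability_factor * speed ** 2)
            * curvature / 1000.0)


# Hypothetical vehicle and road values, chosen only for illustration.
for u in (10.0, 20.0, 30.0):
    angle = steering_wheel_angle(curvature=5.0, steering_ratio=16.0,
                                 stability_factor=0.002, wheel_base=2.7,
                                 speed=u)
    print(f"u = {u:4.1f}  delta_s = {angle:.4f}")
```

For a positive stability factor the required angle grows with the square of the speed, so the same curve calls for a different steering input at each speed.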
The first study detailed above indicated that the development of the anticipatory and compensatory mechanisms can occur with any form of driving-like experience. This is shown by the increase in the ability of subjects to perform the experimental tasks. The ability appears to reach a 'steady state' (i.e. little or no further improvement occurs in the ability to perform this task) with relatively little experience, as both of the subjects in group SI with relevant experience (SI-1 and SI-2) achieved s.d. scores similar to, and in some cases better than, the qualified subjects. Subject SI-2 displayed a fixation point spread more like the qualified subject and was, like the qualified subject, making fewer, longer saccades. Subject SI-3, who had no relevant experience, was making more, shorter saccades. The subject whose eye movements were recorded on the road showed the development of a logical, efficient pattern of eye movements while driving. On a right-hand curve the subject alternates between fixating around the far road edge tangent point, where future potential hazards (e.g. oncoming vehicles, pedestrians, etc.) will first become visible, and the tangent point on the centre line of the road. As the Land and Horwood (1995) model of driver behaviour relies on the driver being able to estimate the distance to the lane edge, it is more logical for a driver to use the lane edge closest to the vehicle, as that edge, like its tangent point, is less likely to be obscured by other traffic. The recording taken after four hours of driving tuition shows that the subject has already begun to develop this pattern of fixations. After 12 hours the subject is able to use this pattern efficiently, departing from it for the most part only when other visual stimuli require attention (e.g., oncoming traffic or the rear-view mirror).
The subject also has a higher mean fixation duration, suggesting that the subject is processing more useful information from the fixations made, lowering the need to change fixation position.
Conclusions

From the experiments completed so far it would appear that the two-system model of driver behaviour, in the form proposed by Land and Horwood (1995), is developed by the brain in response to visual stimuli from a task that requires a relation to be established between the curvature of a road (or road-like object) and the muscle action involved in steering. The compensatory mechanism, which uses the vehicle's lateral position relative to the road or lane edge, is obtained first, followed by the anticipatory mechanism, which uses the angular position of the tangent point relative to the vehicle's current heading. Godthelp's (1986) equation of steering dynamics (eq. 3) includes speed as a variable. A third stage of development of the behaviour would seem to be, from personal observations, the ability to adjust the behaviour for different speeds and to
be able to estimate an optimum speed for a given corner. Equation (3) also includes vehicle wheel base and stability factor as variables. This would suggest that a period of re-calibration would be necessary when driving different vehicles, which also seems likely from personal observation. This study of driver-steering behaviour will continue with subjects recorded during their driving tuition as well as those recorded while driving the simulator. A new series of experiments, suggested by the results obtained thus far, would be to observe the entire development of the behaviour in subjects learning the behaviour solely on the simulator. These experiments would allow the observation of the relative development of the observed and suggested behaviours under controlled conditions.

References

Donges, E. (1978). A two-level model of driver steering behaviour. Human Factors, 20(6), 691-707.
Godthelp, H. (1986). Vehicle control during curve driving. Human Factors, 28(2), 211-221.
Land, M.F. (1993). Eye-head co-ordination during driving. Proc. IEEE SMC Conference, Le Touquet, Vol. 3, pp. 490-494.
Land, M. and Horwood, J. (1995). Which parts of the road guide steering? Nature, 377, 339-340.
Land, M.F. and Lee, D.N. (1994). Where we look when we steer. Nature, 369, 742-744.
CHAPTER 20
What the Driver's Eye Tells the Car's Brain
Andrew Liu
Nissan Cambridge Basic Research
Abstract

The analysis of drivers' eye movements may provide useful information for an intelligent vehicle system that can recognize or predict the driver's intention to perform a given action. Such a system could improve the interaction between the driver and future vehicle systems and possibly reduce accident risk. It has been experimentally demonstrated that the pattern of eye fixations reflects, to some degree, the cognitive state of the observer. Also, a Markovian analysis has been used to quantitatively characterize the eye movement patterns associated with specific mental states. This approach is quite similar to that used to model human driver behaviour for the aforementioned intelligent vehicle system, which strongly suggests that eye movement analysis could be readily incorporated into these systems.
Eye Guidance in Reading and Scene Perception/G. Underwood (Editor) © 1998 Elsevier Science Ltd. All rights reserved
Introduction

In 1993, over 6 million vehicular accidents were reported in the United States, including over 40,000 fatalities and over 3 million injuries (Gross and Feldman, 1995). A large number of these accidents are attributed to driver error stemming from driver inattention, misallocation of attention, or misperceptions which lead to inappropriate decisions. If the intentions or state of the driver are not considered, the concurrent use of in-car devices such as cellular telephones, navigation systems, automated safety systems and perhaps even personal computers could draw the driver's attention from the traffic situation at inappropriate times and lead to an accident. An example is the use of cellular telephones in vehicles, which quadruples the risk of a collision (Redelmeier and Tibshirani, 1997). Consider the situation where a driver is following a vehicle in the same lane and a second vehicle is in the adjacent lane slightly behind the driver's vehicle. An alarm or warning to the driver is appropriate if the driver's intention is to pass the car, but it could be annoying and potentially distracting if the driver was only following the car ahead. Such false alarms could result in the driver ignoring future warnings. Thus a "smart car" would be able to predict or recognize the driver's intentions and then take the appropriate course of action based on that prediction. The major hurdles in developing such a system are the model of the driver's behaviour and a non-invasive and unobtrusive technique for checking the state variables of that model. One promising approach for modelling the behaviour of human drivers is the hidden Markov dynamic model (HMDM) (Pentland and Liu, 1995; Liu and Pentland, 1997). In this approach, observations of the driver's control actions are used to infer what action the driver is intending to perform.
In this chapter, I propose that by analyzing the pattern of a driver's eye movements, it may be possible to improve the performance of a system recognizing driver intentions and possibly even allow the prediction of intentions. Consider the following thought experiment: If you were sitting in the passenger's seat of a car, would you be able to determine what actions the driver was performing or was going to perform by merely observing the driver's head and eye movements? It certainly seems straightforward to determine whether the driver is looking at the mirrors, or instruments, or to one side of the road, and so one could make fairly accurate predictions of the driver's intentions from the eye movement patterns. The actual utilization of eye movements for practical applications is problematic due to the unperceived dynamic nature of eye movements. Consider the efforts to design computer-human interfaces based on eye movements. Early efforts linking cursor control to the gaze position of the user proved to be quite unnatural as eye movements generally represent the motor output of a number of cognitive processes, not only foveal information acquisition. More successful approaches have made inferences about the user's intentions from the pattern of eye movements. In this
paradigm, interaction is enabled from the user's natural eye movements rather than from certain prescribed and ad hoc movements (Starker and Bolt, 1990; Jacob, 1991). Jacob (1991, p. 168) commented that "... when the system is working well, it can give the powerful impression of responding to its user's intentions rather than his explicit inputs." Analyzing patterns of eye movement is more critical when attempting to interpret "non-spatial" intentions such as zooming into or out of a picture (Goldberg and Schryver, 1995), where there is no explicit feature or object in the visual scene that indicates the intention to zoom. This latter application is much closer to the problem of inferring the underlying state of drivers (e.g., what manoeuvre they will execute). However, the driving application is also much more difficult given the dynamic nature of the visual scene. The remainder of the chapter provides more detailed evidence of why driver eye movement patterns hold much promise as an indicator of the driver's intentions, and then of how the analysis of eye movement patterns might actually be implemented in a smart car. I will briefly review some examples from the literature that illustrate patterns of eye movements that can be associated with some mental state. I will also mention the issues concerning explicitly modelling this relationship. One statistical approach, the Markov model, has been successfully used to model the patterns of eye movements associated with specific mental states in visual tasks and in driving. This approach is quite similar to the HMDMs mentioned previously, which suggests that the information in eye movement patterns might be easily incorporated into such a model of the driver. At this point of the chapter, I will describe the hidden Markov dynamic models in further detail and outline how the analysis of eye movement patterns might be incorporated into the system.
Modelling the relationship between eye movements and cognitive processes

Numerous studies have tried to establish the level at which the relationship between eye movements and higher cognitive processes can be modelled. The least controversial conclusions simply state that the pattern of eye movements generally reflects the observer's thought processes, indicating to some degree the goals of the observer and perhaps even the main areas of interest. The strongest conclusions assert that the eye movements are directly observable indicators of underlying cognitive processes, revealing the nature of the acquired information as well as the computation processes. There are numerous examples illustrating the general relationship between eye movements and cognitive processes. The fixation patterns of observers can change rather substantially depending on numerous factors, such as the information they are trying to discover from a scene (e.g., Yarbus, 1967), whether they recognize a hidden or embedded figure (Stark and Ellis, 1981), or the current mental interpretation of an ambiguous figure such as the Necker cube (Ellis and Stark, 1978). However, to postulate a more detailed link between eye movements and
cognitive processes, three important questions should be considered (Viviani, 1990). First, how do the various cognitive processes map onto the sequence of eye fixations? Clearly, the movement of the eyes is a strictly serial process, yet there is abundant evidence suggesting that certain cognitive processes can also work in parallel. In this case, the eye movements may represent the motor output from a multiplexed command signal which includes eye movement commands from a number of cognitive processes, and it becomes virtually impossible to associate specific eye movements with specific processes without a theory of how concurrent processes combine eye movement signals. Second, how does attention move with respect to the gaze direction? It has been shown that attention may be shifted spatially from the fixation point (e.g., Posner, 1980; Eriksen and James, 1986) or even in depth within the visual scene (Nakayama and Shimojo, 1992), but it remains an open question whether these are totally independent processes (e.g., Klein, Kingstone and Pontefract, 1992) or closely linked (e.g., Rizzolatti, Riggio and Sheliga, 1994). Further discussion of this question can be found in Chapter 13. The third question arises from the second, that is, how can one determine the nature of the information being acquired? The concept of "information" in vision research has numerous definitions, ranging from the physical features of a visual scene, such as edges or texture changes (Mackworth and Morandi, 1967), to the features that aid the interpretation of the meaning of a scene (Yarbus, 1967; Antes, 1974). The latter form of information is highly context dependent. When the observer is in one state, then fixations may cluster on areas of the picture relevant to that state. When the state changes, then hitherto meaningless regions acquire "informativeness" and may be fixated.
The concept of information is not a low-level aspect of the scene but rather must be a higher-level construct dependent on the mental models being tested. More in-depth discussion of scene perception and information acquisition can be found in Chapters 12 and 14. A well-known example of a theory modelling the connection between eye movements and cognitive processing is the Scanpath Theory of Recognition (Noton and Stark, 1971a, 1971b). The theory postulated that repetitive fixation sequences indicated the creation and storage in memory of visual-motor traces, or "feature rings". These traces were pattern-specific and idiosyncratic and, moreover, remained unchanged. In order for recognition to occur, the eyes had to move through the same scanpaths. However, numerous studies (e.g., Biederman et al., 1974; Potter, 1975; Chapter 13) have shown that picture recognition can occur very quickly and without eye movements, which indicates some parallel processing of the image and no need for visual-motor traces in recognition. Furthermore, the presence of scanpaths does not influence recognition performance (Locher and Nodine, 1974). This evidence suggests that the scanpaths are merely typical of, but probably not necessary for, some level of recognition.
Markovian analysis of eye movements

While the details of the link between eye movements and cognitive processes remain in question, a model of eye movement behaviour is necessary to at least identify characteristic patterns of behaviour that can be associated with a cognitive state. Stark and Ellis (1981) described a probabilistic approach where the eye movements are modelled as a Markov process. Harris (1993) re-analyzed data from Buswell (1935) and found that the data were well modelled by a first-order Markov model. In addition, Stark and Ellis (1981) showed that the transition matrices for eye movements preceding recognition of a fragmented figure were statistically different from the matrices modelling fixation behaviour after recognition. In this approach, the location of the current fixation is dependent on the location of the $n$th previous fixation. The pattern of eye fixations can be captured in the set of conditional probability matrices, which can be empirically measured. The zero-order Markov matrix $M_0$ simply gives the probability that a fixation will be in a given region of the scene. Assuming that the fixations tend to cluster in a set of $m$ regions $R_1, \ldots, R_m$ in the scene, it is possible to generate the $m \times m$ first-order Markov matrix $M_1$, whose elements indicate the probability $p_{ij}$ that a fixation in region $R_i$ is immediately followed by a fixation in region $R_j$. If only a few entries in each row have high transition probabilities, then the eye movement sequences have a high probability of passing through a cyclic sequence (i.e., a scanpath). First- and higher-order Markov matrices characterize the dependency of transitions on previous history (i.e., the probability of reaching region $R_j$ from $R_i$ through $n$ intermediate steps). However, as $n$ increases, longer fixation sequences are required to obtain accurate estimates of transition probability, so that second-order models and higher have generally not been considered.
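The zero-order and first-order matrices described above can be estimated directly from a sequence of fixated regions. The following is a minimal sketch, assuming regions are coded as integers 1 to m; the function names and the example sequence are hypothetical.

```python
from collections import Counter


def zero_order(regions, n_regions):
    """M0: proportion of fixations falling in each region (1-based codes)."""
    counts = Counter(regions)
    total = len(regions)
    return [counts.get(r, 0) / total for r in range(1, n_regions + 1)]


def first_order(regions, n_regions):
    """M1: entry [i][j] estimates P(next fixation in R_j | current in R_i)."""
    counts = [[0] * n_regions for _ in range(n_regions)]
    for prev, curr in zip(regions, regions[1:]):
        counts[prev - 1][curr - 1] += 1
    matrix = []
    for row in counts:
        row_total = sum(row)
        # Rows with no observed departures are left as zeros.
        matrix.append([c / row_total if row_total else 0.0 for c in row])
    return matrix


# Hypothetical fixation sequence over three regions.
sequence = [1, 2, 3, 1, 2, 3, 1, 2, 1, 3]
m0 = zero_order(sequence, 3)
m1 = first_order(sequence, 3)

for row in m1:
    print([round(p, 2) for p in row])
```

In this toy sequence each row of the first-order matrix is dominated by one or two entries, which is the signature of a cyclic scan pattern mentioned in the text.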
In a similar approach, individual scan patterns have been discriminated by the relative frequency of fixation triplets (Groner, Walder and Groner, 1984). It should be noted that the existence of scanpaths as shown by Markov matrix analysis does not necessarily imply that eye movements are controlled by a higher level process. The distribution of highly salient features across the scene may lead to repeated patterns of eye movements which resemble first-order behaviour (Viviani, 1990; Ellis, 1986). However, these cases can be distinguished by statistically comparing the empirically obtained transition probability matrix to a transition probability matrix derived from the zero-order probabilities. Ellis and Stark (1986) used this type of statistical test to show that the eye movement behaviour of airline pilots viewing a flight display was not simply determined by the area of the regions of fixation which would be zero-order behaviour but exhibited first-order properties. An additional limitation of this Markovian analysis is the exclusion of temporal aspects of the fixation sequences such as fixation duration (Viviani, 1990). The temporal properties of the pattern of fixations might provide another parameter
to differentiate the cognitive processes. Fixation duration seems to provide interesting insights into road type and possibly a driver's assessment of the danger of the current situation (see Chapter 17).
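The comparison described above, between an observed transition matrix and one derived from the zero-order probabilities alone, might be sketched as follows. Several choices here are assumptions rather than the procedure of the cited studies: the expected matrix is built from the outer product of the zero-order probabilities, the diagonal is dropped and each row renormalised, and a generic Pearson chi-square statistic is used as the measure of discrepancy. All names and numbers are hypothetical.

```python
def expected_first_order(m0):
    """Transition matrix predicted from zero-order probabilities alone.

    Entry (i, j) is proportional to m0[i] * m0[j]; the diagonal is set to
    zero (repeat fixations in a region are not counted) and each row is
    renormalised to sum to 1 (assumed renormalisation).
    """
    n = len(m0)
    expected = []
    for i in range(n):
        row = [m0[i] * m0[j] if j != i else 0.0 for j in range(n)]
        total = sum(row)
        expected.append([v / total if total else 0.0 for v in row])
    return expected


def chi_square(observed_counts, expected_probs):
    """Pearson chi-square between observed transition counts and the counts
    expected if every row followed expected_probs."""
    stat = 0.0
    for obs_row, exp_row in zip(observed_counts, expected_probs):
        row_total = sum(obs_row)
        for obs, p in zip(obs_row, exp_row):
            if p > 0:
                e = row_total * p
                stat += (obs - e) ** 2 / e
    return stat


# Hypothetical zero-order probabilities and observed transition counts.
m0 = [0.4, 0.3, 0.3]
observed = [[0, 3, 1], [1, 0, 2], [2, 0, 0]]
expected = expected_first_order(m0)
stat = chi_square(observed, expected)
print(f"chi-square = {stat:.3f}")
```

A large statistic relative to the appropriate reference distribution would suggest that the sequence carries first-order structure beyond what the region probabilities predict; with counts as small as in this toy example no such conclusion could be drawn.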
Driver eye movements

Returning to the subject of driving, I will now review some of the research literature on driver eye movements. The goal is to identify patterns of eye movements that can be associated with a particular mental state of driving. These states form the building blocks for a model of a particular driving intention such as changing lanes and, ultimately, the patterns of eye movements will be used to distinguish these states. Most studies in the literature report only zero-order statistics such as fixation location and duration. While these statistics are quite informative, a moment-by-moment analysis of behaviour would be equally desirable (see Chapter 17).

Eye movements for vehicle control

The most basic task in driving is probably keeping the vehicle on the road. Therefore, the majority of studies have examined eye movement behaviour in the context of vehicular control. Consider first the case of driving on a straight section of roadway with little or no traffic. Generally drivers spend most of their time looking somewhere ahead of the car but also make fixations that are closer to or farther from the car or to either side of the road. One suggestion for the distant fixations is that drivers use the focus of expansion to determine their heading for lane-keeping purposes (Gordon, 1966b; Olson, Battle and Aoki, 1989). But more likely, this is simply the most advantageous position to maximize anticipation time (Wohl, 1961) and detect possible hazards. Fixating on the road ahead provides a uniform distribution of the visual field to either side of the road in which potential hazards can be detected and allows the driver time to react appropriately. Mourant and Rockwell (1972) suggested that experienced drivers tend to look further down the road, while novice drivers fixate on the road closer to the car, but this result has not been found in more recent studies (see Chapters 17 and 18).
The fixations closer to the car and towards the edge of the road may be used to acquire information for lateral control (Mourant and Rockwell, 1972). Thus for driving on straight road segments, a characteristic fixation "pattern" would be to alternately fixate on or near the focus of expansion and any other point in the scene. Fixation behaviour in curves shows a more interesting pattern of eye movement. Numerous studies have shown a pattern of looking between the road ahead and nearer the car (e.g., Gordon, 1966a; Shinar, McDowell and Rockwell, 1977; Jurgensohn, Neculau and Willumeit, 1991). Again the general conclusion is that this
pattern indicates a switching between a preview of the road ahead and near-car fixations for controlling lateral position. Jurgensohn et al. showed a distinctive saw-tooth pattern of fixations suggesting that drivers fixate on a single point and track it as it nears the vehicle. When it reaches a preview distance of about one second, the eye makes a saccade to a new point and repeats the cycle. Other studies indicate that the driver makes fixations to the near and far areas of the road without the tracking (Shinar et al., 1977; Land and Horwood, 1995). Anticipation of the curve is also evident in numerous studies where eye movements to the upcoming curves preceded actual entrance into the curves by 1-4 s (Cohen and Studach, 1977; Shinar et al., 1977; McDowell and Rockwell, 1978; Land and Horwood, 1995). Different road conditions and vehicle speeds account for the range of preview times. Other studies (Land and Lee, 1994; Veltri, 1995) indicate that a large proportion of fixations on the curve ahead are concentrated near the tangent point of the curve which provides information about the curvature of the upcoming curve (Land and Lee, 1994). The curvature of the road has an effect on the pattern of eye movements in that drivers tend to make more fixations on higher curvature roads (Serafin, 1994). As the curvature of the road increases the separation between the tangent point and the area of road furthest from the driver also increases. Thus drivers may not be able to acquire information from both locations simultaneously and must make eye movements between them. This behaviour may be a useful indicator that the driver will be negotiating sharp curves. Naturally, when eye movements of drivers are studied under normal traffic conditions, there is a significant change in fixation patterns. 
A vehicle in front of the driver and in the same lane tends to attract most of the eye fixations (Mourant and Rockwell, 1970; Olson et al., 1989; Veltri, 1995). Therefore the car-following situation might be characterized by a pattern of eye movements centered on the lead vehicle. The mirrors and instruments inside the car attract only around 10-15% of the driver's fixations but may also be an excellent indicator of the current state or intentions of the driver. Glances at mirrors certainly indicate retrieval of information that can be related to specific driving tasks. For lane changes and merge maneuvers, one interesting trend in the pattern of looking is evident. American drivers tend to use the left-side mirror much more than the inside mirror for left-side maneuvers, while the opposite is true for right-side maneuvers (Mourant and Donohue, 1977). Tasks involving the use of instruments inside the car also change the general fixation behaviour of the driver. Antin et al. (1990) compared fixation patterns when navigating with a moving-map system versus the control case of navigating a memorized route. Using a link probability measure (Wierwille, 1981), they showed that the dominant eye fixation pattern was between the road and either the mirrors or instrument panel when driving along a memorized route. However, when using the moving-map system, the dominant pattern was between the center of the road and the map display. Thus the use of a navigation system could be signaled
by such a change in the driver's eye movement pattern. Interestingly, the link probability is another first-order statistical measure of the probability of a transition between two regions without regard to the direction of the transition. The link probabilities $PL_{ij}$ can be derived from the joint probabilities $p_{ij}$ of the Markov transition matrices, although the converse is not true. The two measures are related by the following equation:

$$PL_{ij} = p_{ij} + p_{ji}, \quad i \neq j$$
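As a rough illustration of how link probabilities follow from joint transition probabilities, the sketch below assumes that the link probability of a pair of regions is the sum of the joint probabilities of the two directed transitions between them. That form, the function name and the example values are all assumptions made for illustration.

```python
def link_probabilities(joint):
    """Undirected link probability for each unordered pair of regions.

    joint[i][j] is the joint probability of a transition from region i+1 to
    region j+1. The link probability of a pair is taken to be
    joint[i][j] + joint[j][i] (assumed form).
    """
    n = len(joint)
    return {(i + 1, j + 1): joint[i][j] + joint[j][i]
            for i in range(n) for j in range(i + 1, n)}


# Hypothetical joint transition probabilities for three regions (sum to 1).
joint = [
    [0.00, 0.30, 0.10],
    [0.10, 0.00, 0.20],
    [0.25, 0.05, 0.00],
]
links = link_probabilities(joint)
for pair, p in sorted(links.items()):
    print(pair, round(p, 2))
```

Because each link value merges the two directions, different joint matrices can give identical link probabilities, which is why the joint probabilities cannot be recovered from the links.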
Markovian analysis of driver eye movements

Liu, Veltri and Pentland (1996) used a Markovian analysis to examine whether characteristic fixation patterns for drivers could be identified (i.e., is there a unique Markov transition matrix which describes the eye fixation sequences of a driver for a given driving situation or task?). If driving is considered as a combination of a number of "basis" situations (i.e., one-task situations), then the Markov transition matrix for the general driving situation should be predicted by a linear combination of the Markov transition matrices characterizing the "basis" situations. Mathematically, the new first-order Markov matrix $M_1$ could be computed as the sum of $n$ elemental Markov matrices $M_1^{(i)}$ as follows:

$$M_1 = \sum_{i=1}^{n} a_i M_1^{(i)}$$

where $a_i$ represents a weighting parameter for the $i$th basis driving situation. If this relationship holds, it suggests that drivers do not create a new fixation strategy for each new situation, but rather combine existing strategies according to the perceptual or cognitive resources available. An experiment to investigate this hypothesis was carried out on the Nissan CBR simulator to control the visual stimuli as much as possible. Eye movement data were collected at 120 Hz using a head-mounted ISCAN (Burlington, MA, USA) head/eye tracking system. Four subjects with more than five years' experience drove on a simulated single-lane road composed of alternating straight and curved sections with no road markings under two conditions. In one condition, drivers were instructed to simply stay in the middle of the road. Their speed was kept constant throughout the trial. In a second condition, drivers were instructed to follow a lead car at a comfortable distance of their own choosing but also to keep the lead car always in view. The speed of the lead car was fixed but the subjects controlled their own speed.
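The additive hypothesis stated above, that a general-situation matrix is a weighted sum of basis matrices, can be illustrated with a small numerical sketch. The basis matrices, the weights and the function name below are hypothetical and are not taken from the experiment.

```python
def combine(basis_matrices, weights):
    """Weighted sum of equally sized transition matrices."""
    n_rows = len(basis_matrices[0])
    n_cols = len(basis_matrices[0][0])
    return [[sum(a * m[r][c] for a, m in zip(weights, basis_matrices))
             for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical basis matrices for two one-task situations over three regions.
lane_keeping = [[0.0, 1.0, 0.0],
                [0.5, 0.0, 0.5],
                [0.0, 1.0, 0.0]]
car_following = [[0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0],
                 [0.5, 0.5, 0.0]]

# Hypothetical weights; they sum to 1 so each combined row still sums to 1.
mixed = combine([lane_keeping, car_following], [0.25, 0.75])
for row in mixed:
    print(row)
```

When the weights sum to one and every basis matrix is row-stochastic, the combination is again a valid transition matrix, which makes the weights interpretable as the share of each basis strategy.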
Table 1 Description of regions used in Markovian analysis

Region   Description
1        Near preview: current view point to 1 s ahead or up to tangent point
2        Middle preview: 1-2 s ahead or 1° around tangent point
3        Far preview: 2 s or tangent point to end of segment
4        Next segment ahead
5        Left side of road
6        Right side of road
7        Car ahead
8        All other samples/fixations
Fixation definition

The visual scene was subdivided into eight non-overlapping regions whose exact boundaries changed with car speed and the road geometry: four preview distances along the road, either side of the road, the tangent point, and the lead car (Table 1). Any consecutive string of raw gaze samples from a single region was labelled as a "fixation" if the duration of the string was greater than 50 ms (i.e., the number of consecutive samples from one region was greater than six). If two sample strings in the same region were separated by one or two samples (<16.6 ms) in a neighbouring region, the samples from the neighbouring region were relabelled and the samples were combined into one long string. Using this classification scheme, it was not possible to detect fixation transitions within the same region.

Zero-order analysis

To ensure that the drivers were making realistic eye movements in the simulator, we compared our results with those from a previous on-road study (Olson et al., 1989). In that study, six male subjects aged between 20 and 34 years drove on a one-mile rural road consisting of a straight segment followed by three curves (approximately 90°). Speed was controlled by the driver and was approximately 30 mph. The route was driven in both directions, under both day and night conditions and with and without a lead car. The drivers were not given specific instruction to follow the lead car, but the lead car adjusted its speed to stay 200-300 ft ahead. Direct comparisons between studies are difficult because of different road and traffic conditions, as well as the different regions specified for fixation analysis. The regions from both Liu et al. (1996) and Olson et al. (1989) were combined into the
Table 2 A comparison of $M_0$ for a simulator (Liu et al.) and on-road study (Olson et al.)

                 No lead car                                With lead car
Region           Liu et al. (1996)   Olson et al. (1989)    Liu et al. (1996)   Olson et al. (1989)
Road ahead       0.31                0.24                   0.17                0.18
Far field        0.37                0.25                   0.35                0.02
Left of road     0.18                0.15                   0.09                0.14
Right of road    0.12                0.16                   0.04                0.14
Car interior     0.00                0.03                   0.00                0.02
Lead car         —                   —                      0.34                0.37

Note: The regions listed are combinations of smaller regions from the individual studies to make comparison possible.
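The fixation-labelling rule given under "Fixation definition" can be sketched in a few lines. This is an illustrative reading of the rule, not the original analysis code: samples are assumed to arrive as a list of region codes at 120 Hz, interruptions of one or two samples between runs in the same region are bridged without checking that the interrupting region is actually adjacent, and all names are hypothetical.

```python
from itertools import groupby

SAMPLE_RATE_HZ = 120
MIN_SAMPLES = 6   # a run must be longer than this (> 50 ms at 120 Hz)
MAX_GAP = 2       # interruptions of one or two samples are bridged


def runs(labels):
    """Collapse a label sequence into [region, length] runs."""
    return [[region, len(list(group))] for region, group in groupby(labels)]


def bridge_short_gaps(labels):
    """Relabel 1-2 sample interruptions flanked by runs in the same region.

    Simplification: region adjacency is not checked.
    """
    segments = runs(labels)
    for k in range(1, len(segments) - 1):
        before, gap, after = segments[k - 1], segments[k], segments[k + 1]
        if gap[1] <= MAX_GAP and before[0] == after[0]:
            gap[0] = before[0]
    return [region for region, length in segments for _ in range(length)]


def label_fixations(labels):
    """Return (region, duration in s) for every run longer than MIN_SAMPLES."""
    merged = runs(bridge_short_gaps(labels))
    return [(region, length / SAMPLE_RATE_HZ)
            for region, length in merged if length > MIN_SAMPLES]


# Hypothetical gaze samples: region 1 interrupted briefly by region 2,
# a short visit to region 3, then a longer stay in region 2.
samples = [1] * 5 + [2] * 2 + [1] * 4 + [3] * 3 + [2] * 9
fixations = label_fixations(samples)
for region, duration in fixations:
    print(region, round(duration, 3))
```

Because consecutive samples in one region collapse into a single run, this scheme cannot separate two fixations made within the same region, as the text notes.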
six regions listed in Table 2 to facilitate quantitative comparison. The "Road ahead" region is the road surface 100-300 ft ahead of the car or <2 s preview. The "Far field" covers the road >300 ft ahead or >2 s preview. The other four categories are self-explanatory. Data represent all subjects and all speeds combined for straight segments. The fixation probabilities for two driving conditions (driving on an open road, and following behind a lead car) were comparable. In both cases, the presence of a lead car resulted in a significant shift of fixations to the lead car. In the simulator, the drivers shifted their fixations from the sides and just ahead to the lead car. In Olson et al., there were fewer fixations to the road ahead and the far field. This difference might be explained by the relative position of the lead car, which tended to be further ahead in the simulated driving.

First-order analysis

To study the sequences, we separated the data into four different situations: (1) driving on straight segments, (2) driving on curved segments, (3) following a lead car on straight segments, and (4) following a lead car on curved segments. The first two situations represent the most basic driving task of lane keeping. From the literature, the first-order transition matrices should indicate significant transitions between near and far regions to acquire preview information and perhaps transitions from side to side for lateral control. The latter two tasks represent a more complex scenario where both lane keeping and car following are being executed. In these cases, significant transitions should occur to and from the lead vehicle. If the eye movement behaviour of the tasks is indeed additive, then the transition matrices of
The driver's eye
Table 3. First-order Markov matrices M1 for driving on a straight segment with and without a lead car
Previous region
Current region
1
2
3
4
5
6
0.89
0.00
0.05
0.06
0.00
0.00
0.52
0.09
0.09
0.03
0.00
0.07
0.30
0.13
0.00
0.44
0.40
0.00
0.13
0.00
7
8
No lead car
1 2
0.27
3
0.00
0.50
4
0.02
0.06
0.08
5
0.05
0.11
0.34
0.37
6
0.03
0.05
0.17
0.50
0.00
0.25
0.00
7 8
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.32
0.00
0.13
0.00
0.00
0.55
0.00
0.18
0.09
0.01
0.01
0.52
0.00
0.05
0.11
0.02
0.70
0.00
0.38
0.09
0.42
0.00
0.06
0.27
0.00
0.14
0.00
With lead car
1 2
0.19
3
0.00
0.12
4
0.05
0.02
0.04
5
0.00
0.02
0.20
0.45
6
0.00
0.04
0.29
0.39
0.14
7
0.10
0.20
0.39
0.24
0.06
0.01
8
0.00
0.00
0.00
0.00
0.00
0.00
0.00 0.00
Note: The italicized cells are transitions in which the frequency of occurrence is much greater than predicted from M0. The regions are described in Table 1.
the car following situations should also contain the significant transitions associated with lane keeping. By following the highest transition probabilities in each row of M1, the most likely sequences of fixation were determined. In straight segments (Table 3), the expected "preview" and "side-to-side" patterns were found. The preview pattern (seen in regions 1-3) moved entirely up and down the roadway ahead of the car. The side-to-side pattern was partially a preview pattern in that highly likely transitions went from a region on the roadway ahead of the car to either side with almost equal
A. Liu
probability and then back to the preview region (regions 4-6). For curved segments, a similar side-to-side pattern was present, but not a preview pattern. The most likely transition in curves is to a distant preview region followed by a transition to either side and from there back to the preview region. When a lead car is present, most of the transitions are to the lead car. On straight segments, however, the two basic patterns are still evident. The preview pattern is approximately one-third as likely to occur while the side-to-side pattern is almost undiminished. On curves, the side-to-side pattern is slightly less likely, whereas transitions between the lead car and the far preview region are most probable.

To distinguish whether the patterns are different from a pattern predicted by the zero-order matrix M0, we compared the observed first-order transition frequency matrix F1 with the transition frequency matrix F1R which is predicted by M0. To obtain F1R, we first calculate the expected transition probability matrix MR by taking the outer product of M0 and its transpose, MR = M0 x M0^T. Since separate fixations within a single region are not identified, the transition probabilities aij of MR are renormalized as follows
Finally, the expected frequency for the random transitions f* of F1R is given by
A transition was flagged as significant according to the following criterion
If one assumes that the number of transitions has a Poisson distribution, then this is roughly equivalent to f being more than two standard deviations from the mean f*. The results of this "significance" test when driving on straight segments are shown in Table 4. With no lead car present, the frequency of transitions is significant among regions 1-3 (preview pattern) and regions 4-6 (side-to-side pattern). The
Table 4. Transitions with greater than expected occurrences in two driving conditions
Previous region
Current region
1
2
3
4
5
6
+
0 +
0 0
0
0
0
0
0
0
0
0
0
0
+
+
0
0
0
7
8
No lead car
1 2
+
3
0
+
4
0
0
0
5
0
0
0
+
6
0
0
0
+
0
0
0
0
0
0
0
+
0
0
0
0
©
0
0
0
0
0
©
0
0
0
0
©
0
+
+
0
0
0 0
0 0 0
0
7 8 With lead car
1 2
+
3
0
0
4
0
0
0
5
0
0
6
0
0
0 0
+ +
0
7
0
0
©
0
0
0
8
0
0
0
0
0
0
0
Note. The + symbols indicate significant transitions when no lead car is present. The © symbol indicates significant transitions when following another car.
addition of the car following task adds a pattern of significant transitions to the lead car emanating from region 7 but the other two patterns are mostly preserved. This suggests that the additive model may be a reasonable approximation for the driver's fixation behaviour.
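The equations for the renormalization and for the significance criterion are not legible in this copy of the text, so the following Python sketch is only one plausible reading of the procedure described above, not the author's exact formulae. It assumes that M0 is a vector of zero-order fixation probabilities, that the diagonal of the expected matrix is set to zero and each row rescaled to sum to one, that the expected frequency is the number of observed transitions leaving a region multiplied by the expected probability, and that a transition is flagged when it exceeds its expectation by more than two Poisson standard deviations. The function names and the three-region numbers are hypothetical.

```python
import math

def expected_matrix(m0):
    """Outer product of the zero-order probabilities with themselves,
    with the diagonal removed and each row renormalized to sum to 1
    (assumed form of the renormalization)."""
    n = len(m0)
    m_r = [[m0[i] * m0[j] if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    for row in m_r:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return m_r

def flag_significant(freq, m0, n_sd=2.0):
    """Mark transitions whose observed frequency exceeds the expected
    frequency by more than n_sd Poisson standard deviations, taking the
    standard deviation as the square root of the expected count."""
    m_r = expected_matrix(m0)
    n = len(m0)
    flags = [[False] * n for _ in range(n)]
    for i in range(n):
        row_total = sum(freq[i])  # transitions leaving region i
        for j in range(n):
            if i == j:
                continue
            f_star = row_total * m_r[i][j]
            if f_star > 0 and freq[i][j] - f_star > n_sd * math.sqrt(f_star):
                flags[i][j] = True
    return flags

# Hypothetical three-region example (not data from the study)
m0 = [0.5, 0.3, 0.2]
freq = [[0, 48, 2],
        [45, 0, 5],
        [12, 3, 0]]
flags = flag_significant(freq, m0)
```

In this made-up example only the transition from the first to the second region is flagged, because 48 observed transitions exceed the 30 expected by more than twice the square root of 30.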
Extending the Markov model for recognition of driver cognitive state

In the preceding section, the results show that, given a specific driver state, it is possible to statistically analyze the eye fixation behaviour and determine characteristic scan patterns that are associated with the task. However, to implement a "smart car" as described at the beginning of the chapter, the inverse problem needs to be solved. That is, given an observed pattern of eye movements, is it possible to determine the driver state that produced it? In fact, this is one of the key problems which can be solved using hidden Markov models (HMMs) (Rabiner and Juang, 1986; Rabiner, 1989). They have been applied with great success in speech recognition and, more recently, to the recognition of other human actions, such as gestures (Starner, 1995) or telerobotic manipulation (Hannaford and Lee, 1991; Yang, Xu and Chen, 1994; Yang, Xu and Chen, 1997). Unlike the simple Markov model assumed in the previous analysis, HMMs are a doubly stochastic process in which an underlying stochastic process can only be observed through another set of stochastic processes that produce the observed behaviour. For the simple Markov models, each state is associated with a single output (i.e., fixation in a particular region). With HMMs, the second stochastic process "hides" the state from direct observation. Each of these states generates some observable behaviour (changes in vehicle heading or acceleration, or possibly eye movements) that can be characterized by a set of probability distributions, typically Gaussian. Therefore, a fixation in a single region does not imply the current state of the observer since many states might produce such a fixation. Instead, the whole sequence of fixations must be examined to determine the underlying state.
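The point that a single fixation is ambiguous while a sequence is informative can be illustrated with a small sketch. It uses a discrete-output HMM and the standard forward algorithm rather than the Gaussian outputs mentioned above, purely to keep the example short. The two states, the three gaze regions and all probabilities are hypothetical values chosen for illustration.

```python
def forward(obs, start, trans, emit):
    """Forward algorithm for a discrete HMM.
    Returns the unnormalized forward variables after the last observation;
    their sum is Pr(obs | model)."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return alpha

def state_posterior(obs, start, trans, emit):
    """Probability of each hidden state given all observations so far."""
    alpha = forward(obs, start, trans, emit)
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical model: state 0 = lane keeping, state 1 = car following.
# Observation symbols: 0 = road ahead, 1 = roadside, 2 = lead car.
start = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.1, 0.9]]
emit = [[0.6, 0.3, 0.1],   # lane keeping rarely fixates the lead car
        [0.2, 0.1, 0.7]]   # car following mostly fixates the lead car

after_one = state_posterior([0], start, trans, emit)
after_four = state_posterior([0, 2, 2, 2], start, trans, emit)
```

With these made-up numbers, one fixation on the road ahead leaves both states plausible (0.75 against 0.25), whereas three further fixations on the lead car make the car-following state far more probable.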
Hidden Markov dynamic models

A conceptual variant of this approach, hidden Markov dynamic models (HMDMs), has been applied to the problem of recognizing driver intentions (Pentland and Liu, 1995; Liu and Pentland, 1997). In this approach, the driver is considered to be a Markov device with a (possibly large) number of internal mental states. Each of these states has its own characteristic control behaviour and a set of interstate transition probabilities. For driving, the states might include the centering of the car in the current lane, checking whether the adjacent lane is clear, steering to initiate a change in heading, and centering the car in the new lane. Together, the sequence of these internal states comprises a lane change. Figure 1 shows a three-state model of a driving manoeuvre with both recurrent transitions and transitions to the following state. Of course, this is not to suggest that human drivers operate in a stochastic manner (at least not everyone!), but the HMDM framework provides a rich set of mathematical tools with which to perform this recognition task.
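A three-state model with recurrent transitions and transitions to the following state corresponds to a "left-to-right" transition matrix. The sketch below builds such a matrix under two assumptions that are not taken from the chapter: every non-final state shares the same probability of recurring, and the final state is absorbing. The function name and the value 0.8 are hypothetical.

```python
def left_to_right(n_states, p_stay):
    """Transition matrix in which each state either recurs (p_stay) or
    moves on to the following state (1 - p_stay). The last state is
    treated as absorbing, which is an assumption of this sketch."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        if s == n_states - 1:
            trans[s][s] = 1.0
        else:
            trans[s][s] = p_stay
            trans[s][s + 1] = 1.0 - p_stay
    return trans

trans = left_to_right(3, 0.8)

# With a constant recurrence probability, the expected number of samples
# spent in a non-final state is 1 / (1 - p_stay).
expected_dwell = 1.0 / (1.0 - 0.8)
```

The recurrence probability therefore controls how long the model tends to remain in each internal state before moving to the next one.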
Fig. 1. The parameters of the HMDM are estimated from a training set using the Baum-Welch re-estimation formulae. Each example in the training set is a time series of observed variables (e.g., gaze location or vehicle heading) that was collected as the particular task was executed. A separate HMDM must be constructed for each driving manoeuvre.
HMDM parameter estimation

Using HMDMs for real-time recognition of driving manoeuvres is a two-stage process. First, the parameters of the HMDM for a single manoeuvre are recursively estimated from a training set of examples of the manoeuvre (Fig. 1). Each example is an evenly sampled time series of the parameters which characterize the driver (e.g., vehicle heading or gaze direction) during a single instance of the manoeuvre. The training set will have examples of varying duration since human drivers cannot perform each instance of a manoeuvre in exactly the same way. Initial estimates of the mean and variance of each state output distribution are made by segmenting the training examples evenly among the states. Next, the maximum likelihood state sequence of this HMDM is generated using a procedure such as the Viterbi algorithm (Rabiner and Juang, 1986; Rabiner, 1989), which determines a new segmentation of the observation vector. The parameters of the HMDM are re-estimated and this is repeated until the changes in the parameter estimates are less than some threshold.
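The estimation loop described above (even initial segmentation, Viterbi re-segmentation, re-estimation until the parameters stop changing) can be sketched as follows. This is a simplified illustration, not the procedure actually used in the studies: it assumes a one-dimensional observation, a left-to-right model that starts in the first state and ends in the last, fixed transition probabilities that are not re-estimated, and examples at least as long as the number of states. All names and the training data are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_segment(seq, means, variances, log_stay, log_next):
    """Most likely left-to-right state path for one training example.
    The path is forced to start in state 0 and end in the last state."""
    n, length = len(means), len(seq)
    neg = float("-inf")
    score = [[neg] * n for _ in range(length)]
    back = [[0] * n for _ in range(length)]
    score[0][0] = log_gauss(seq[0], means[0], variances[0])
    for t in range(1, length):
        for s in range(n):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_next if s > 0 else neg
            if move > stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            if best > neg:
                score[t][s] = best + log_gauss(seq[t], means[s], variances[s])
    path = [n - 1]
    for t in range(length - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

def even_segmentation(seq, n_states):
    """Initial guess: divide the samples evenly among the states."""
    return [min(n_states - 1, t * n_states // len(seq)) for t in range(len(seq))]

def estimate(examples, paths, n_states, min_var=1e-3):
    """Mean and variance of the samples assigned to each state."""
    means, variances = [], []
    for s in range(n_states):
        xs = [x for seq, path in zip(examples, paths)
              for x, st in zip(seq, path) if st == s]
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        means.append(m)
        variances.append(max(v, min_var))  # floor avoids zero variance
    return means, variances

def train(examples, n_states, p_stay=0.8, tol=1e-6, max_iter=50):
    """Alternate Viterbi segmentation and re-estimation until the largest
    parameter change falls below tol."""
    paths = [even_segmentation(seq, n_states) for seq in examples]
    means, variances = estimate(examples, paths, n_states)
    for _ in range(max_iter):
        paths = [viterbi_segment(seq, means, variances,
                                 math.log(p_stay), math.log(1 - p_stay))
                 for seq in examples]
        new_means, new_vars = estimate(examples, paths, n_states)
        change = max(abs(a - b) for a, b in
                     zip(means + variances, new_means + new_vars))
        means, variances = new_means, new_vars
        if change < tol:
            break
    return means, variances

# Hypothetical training set: two examples of different duration
examples = [
    [0.1, -0.2, 0.0, 5.1, 4.9, 10.2, 9.8, 10.0],
    [0.0, 0.2, 4.8, 5.2, 5.0, 5.1, 9.9, 10.1, 10.0, 9.7, 10.3, 10.0],
]
means, variances = train(examples, 3)
```

Because the two examples differ in length, the even initial segmentation mixes samples from different phases; the Viterbi pass then reassigns them so that each state collects samples from one phase only.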
Fig. 2. The conditional probability Pr(O | Mi) that model Mi generated the observation vector O is computed simultaneously for all models using the Viterbi algorithm. The driving manoeuvre modelled by the Mi which reaches an acceptable likelihood first is assumed to be the task that the driver intends to perform.
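The recognition scheme of Fig. 2 can be sketched as a race between models scored on a growing observation sequence. The chapter does not say how the likelihood is normalized or what the acceptance threshold is, so this sketch makes its own assumptions: discrete observations, equal prior weight for every model, and acceptance when one model's share of the summed Viterbi likelihoods exceeds a threshold. The model names, symbols and probabilities are hypothetical.

```python
import math

def _log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi_log_score(obs, start, trans, emit):
    """Log probability of the single best state path for a discrete HMM."""
    n = len(start)
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        score = [
            max(score[p] + _log(trans[p][s]) for p in range(n)) + _log(emit[s][o])
            for s in range(n)
        ]
    return max(score)

def recognize(obs, models, threshold=0.95):
    """Score every model on longer and longer prefixes of obs. Return
    (name, samples_used) for the first model whose share of the total
    likelihood exceeds threshold, or (None, len(obs)) if none does."""
    for t in range(1, len(obs) + 1):
        logs = {name: viterbi_log_score(obs[:t], *m) for name, m in models.items()}
        top = max(logs.values())
        if top == float("-inf"):
            continue  # no model can explain this prefix
        weights = {name: math.exp(v - top) for name, v in logs.items()}
        total = sum(weights.values())
        for name, w in weights.items():
            if w / total > threshold:
                return name, t
    return None, len(obs)

# Hypothetical models. Symbols: 0 = road ahead, 1 = side mirror,
# 2 = adjacent lane. Each model is (start, trans, emit).
models = {
    "lane_keeping": ([1.0], [[1.0]], [[0.8, 0.1, 0.1]]),
    "lane_change": ([1.0, 0.0],
                    [[0.7, 0.3], [0.0, 1.0]],
                    [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]),
}
result = recognize([0, 1, 2, 2, 2], models)
```

With these made-up numbers the lane-change model passes the threshold after four samples, before the sequence is complete, which is the behaviour the recognition phase relies on.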
Recognition of actions

In the recognition phase, the likelihood Pr(O | Mi) of each HMDM having generated an observed pattern of behaviour O is computed by the Viterbi algorithm (Fig. 2). The observation vector O is also a time series of sampled parameters like the training examples described previously, although it does not necessarily represent a complete manoeuvre as in the case of the training examples. As O grows over time, the likelihood of the HMDM modelling the manoeuvre being executed will continue to increase while the other models increase less or even decrease. The driving manoeuvre modelled by the HMDM reaching the acceptable likelihood threshold first is considered to be the task intended to be executed by the driver. Once an observed pattern of behaviour is recognized, it can be used to re-estimate the parameters of the respective driving manoeuvre HMDM, which essentially "tunes" the models to that particular driver.

Recent results

Recent efforts using this approach to recognize driving manoeuvres from steering actions and acceleration have produced encouraging results (Liu and Pentland, 1997). In recognition tests on the training data, the HMDMs successfully recognized their respective driving tasks with greater than 90% accuracy using only the first half second of data from the manoeuvre. Accuracy improved slightly with
longer segments of data. In a test of real-time recognition of a driver's actions in the simulator, accuracy was up to 60% (three times chance) within three seconds of the initiation of the manoeuvre.

Eye movement behaviour and HMDMs

Incorporating eye movement behaviour into the HMDMs of the manoeuvres seems to be a natural addition. Recall that drivers typically make eye movements to a curve roughly 2-3 s before entering the curve. Also, eye movements to the side-mirrors typically precede lane changes and passing manoeuvres. Using eye movement information might allow the car to predict the upcoming manoeuvre, rather than performing the recognition after the task has been initiated as in the studies described above. The primary consideration for including driver eye movements in HMDMs is how the eye movement behaviour will be coded (e.g., by gaze locations or by gaze or saccade direction). Gaze location seems to be the natural choice given that characteristic patterns of driver eye movements have been identified using a Markovian analysis of gaze location (Liu et al., in press). Unlike the Markovian analysis based on fixation location, it is possible to incorporate information similar to fixation duration into the HMDMs based on gaze location. Fixations will be characterized by high transition probabilities in the diagonal of M1, indicating recurrent transitions to the same location. Thus states that drive gaze to similar regions but with different durations in those regions due, perhaps, to context-dependent information content (see Chapter 12) will have very different transition probability matrices. This approach also requires the system to perform real-time segmentation of the driving environment in order to classify the gaze location. This appears to be technically feasible given the current state of vision-based autonomous vehicles.
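The way dwell time shows up on the diagonal of a transition matrix can be made concrete with a short sketch. It assumes gaze location is sampled at a fixed rate and coded as a region index, so that a long fixation produces many consecutive samples in the same region. The function name and both gaze sequences are hypothetical.

```python
def transition_matrix(samples, n_regions):
    """Estimate first-order transition probabilities from an evenly
    sampled sequence of gaze-region indices. Repeated samples in the
    same region are kept, so dwell time appears on the diagonal."""
    counts = [[0] * n_regions for _ in range(n_regions)]
    for prev, cur in zip(samples, samples[1:]):
        counts[prev][cur] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical gaze samples: the same two regions are visited in the same
# order, but with long dwells in one case and short dwells in the other.
long_dwell = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1]
short_dwell = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]

m_long = transition_matrix(long_dwell, 2)
m_short = transition_matrix(short_dwell, 2)
```

The diagonal entries are 0.75 for the long dwells and 0.5 for the short ones, so the two matrices differ even though the order of regions visited is identical.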
For various projects, the forward scene has been segmented into the roadway, roadside, lanes, tangent point regions, and even other vehicles (e.g., Raviv and Herman, 1994; Weber et al., 1995). Information from global positioning systems (GPS) and digital maps could further aid the segmentation of the scene. Furthermore, some level of calibration will be needed to register the eye movements to the visual scene acquired by the smart car. A brief procedure where the driver looks at two or three known points in the vehicle (e.g., the rear view mirror or speedometer) might suffice. Another possibility might be to use gaze or saccade direction, which lessens the need for a complicated calibration procedure. Establishing a single fixation direction as a frame of reference may be sufficient, as long as the eye tracker's operating characteristics remain constant between uses. The exact spatial location may not be as informative as the distribution of gaze over time in terms of determining the
underlying strategy or intention of the driver. If the distribution is spatially fixed for a long time, then the driver may be fixating on an object moving at similar speed, such as a lead car during car following. A single distribution to one side or the other may indicate an upcoming curve. A bi-modal distribution may indicate normal lane keeping as the driver moves his/her gaze from a nearby location on the roadside to a preview position ahead of the car. Transition matrix analysis of saccade direction has proven to be a useful measure of visual search strategy in video display tasks (Ponsoda, Scott and Findlay, 1995). However, the road environment seen by the driver is constantly changing as she is driving and the surrounding traffic is constantly changing as well. Only the parts of the car which might be viewed by the driver (i.e., the mirror or instruments) remain reasonably fixed with respect to the driver's eye position. Thus it might be quite difficult to deduce the nature of the visual space from such low-level information, even with additional information provided from GPS or digital maps.

Other considerations

One interesting problem is the case of driving at night. Under these conditions, driver eye fixations tend to be more concentrated in the area ahead of the car (Olson et al., 1989), which could result in significantly different patterns of eye movement. HMDMs trained on examples of daytime behaviour may have difficulty recognizing behaviour at night. The easiest solution would be to train night-time HMMs and switch to using these models when the driver turns on the headlights. But it is also possible that some behaviour, such as looking at the mirrors before passing, will not change significantly, such that the models will work equally well in daytime or night-time.
It could even be argued that since there is less to see at night (i.e., fewer stimulus-driven eye movements) the gaze patterns might be more homogeneous across drivers and thus improve recognition of driving manoeuvres. There are a number of other combinations of conditions which could alter eye movement behaviour, including rain, glare on the road, fog, etc. More studies are needed to understand how eye movement behaviour changes under these conditions. Hopefully, the studies will indicate that separate models for every condition will not be required and that the parameters of a basic model might simply be adjusted by a "gain" factor to fit the current condition. Also, it must be hoped that basic eye movement and control behaviour is fairly general so that a single set of HMDMs will work reasonably well across drivers. The evidence from the literature seems to support this claim, but individual differences may be large enough that a self-learning system will be needed to tune recognition to individual drivers.
Summary

In this chapter, I have explored the possibility of using the link between eye movements and the underlying mental processes for improving driving comfort and safety. The main question is whether the mental state or intentions of the driver can be inferred from eye movements before an action or manoeuvre takes place. The numerous examples of characteristic patterns of eye movements associated with a particular mental state such as pattern recognition or with a particular driving situation strongly suggest that this is a reasonable possibility. In particular, the Markovian approach to characterizing eye movement behaviour seems able to model the eye movement behaviour that can be associated with a particular task. A similar statistical approach used to model human drivers, the HMDM, provides a set of mathematical tools with which to actually identify the mental state most likely to be associated with an observed pattern of behaviour. The results from preliminary experiments using HMDMs without utilizing eye movements are promising. The addition of eye movement analysis should enhance the system performance by enabling recognition of driver intentions rather than just recognition of manoeuvres as they begin. Although the wide variety of potential driving circumstances may impede the implementation of a generally smart car (e.g., what happens to eye movements while driving at night?), the general outlook for success seems optimistic.

References

Antes, J.R. (1974). The time course of picture viewing. Journal of Experimental Psychology, 103, 62-70.
Antin, J.F., Dingus, T.A., Hulse, M.C. and Wierwille, W.W. (1990). An evaluation of the effectiveness and efficiency of an automobile moving-map navigation display. International Journal of Man-Machine Studies, 33, 581-594.
Biederman, I., Rabinowitz, J.C., Glass, A.L. and Stacy, E.W. (1974). On the information extracted from a glance at a scene. Journal of Experimental Psychology, 103(3), 597-600.
Buswell, G.T. (1935). How People Look at Pictures. Chicago: University of Chicago Press.
Cohen, A.S. and Studach, H. (1977). Eye movements while driving cars around curves. Perceptual and Motor Skills, 44(3), 683-689.
Ellis, S.R. and Stark, L. (1978). Eye movements during the viewing of Necker cubes. Perception, 7, 575-581.
Ellis, S.R. and Stark, L. (1986). Statistical dependency in visual scanning. Human Factors, 28(4), 421-438.
Eriksen, C.W. and James, J.D.S. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception & Psychophysics, 40(4), 225-240.
Goldberg, J.H. and Schryver, J.C. (1995). Eye-gaze determination of user intent at the computer interface. In: J.M. Findlay et al. (Eds.), Eye Movement Research. Amsterdam: Elsevier, pp. 491-502.
Gordon, D.A. (1966a). Experimental isolation of drivers' visual input. Public Roads, 33(12), 53-68.
Gordon, D.A. (1966b). Perceptual basis of vehicular guidance. Public Roads, 34(3), 53-68.
Groner, R., Walder, F. and Groner, M. (1984). Looking at faces: local and global aspects of scanpaths. In: A.G. Gale and F. Johnson (Eds.), Theoretical and Applied Aspects of Eye Movement Research. Amsterdam: Elsevier, pp. 523-533.
Gross, M. and Feldman, R. (1995). National Transportation Statistics: 1995 (DOT-VNTSCBTS-94-3). Washington, DC: U.S. DOT/Bureau of Transportation Statistics.
Hannaford, B. and Lee, P. (1991). Hidden Markov model analysis of force/torque information in telemanipulation. International Journal of Robotics Research, 10(5), 528-539.
Harris, C.M. (1993). On the reversibility of Markov scanning in free-viewing. In: D. Brogan (Ed.), Visual Search 2. London: Taylor & Francis, pp. 123-135.
Jacob, R.J.K. (1991). The use of eye movements in human-computer interaction techniques: What you look at is what you get. ACM Transactions on Information Systems, 9(3), 152-169.
Jurgensohn, T., Neculau, M. and Willumeit, H.P. (1991). Visual scanning pattern in curve negotiation. In: A.G. Gale et al. (Eds.), Vision in Vehicles III. Amsterdam: Elsevier/North-Holland, pp. 171-178.
Klein, R., Kingstone, A. and Pontefract, A. (1992). Orienting of visual attention. In: K. Rayner (Ed.), Eye Movements and Visual Cognition. New York: Springer-Verlag, pp. 46-65.
Land, M.F. and Horwood, J. (1995). Which part of the road guides steering? Nature, 377, 339-340.
Land, M.F. and Lee, D.N. (1994). Where we look when we steer. Nature, 369, 742-744.
Liu, A. and Pentland, A. (1997). Towards real-time recognition of driver intentions. Proc. of the IEEE Intelligent Transportation Systems Conference, Boston, MA.
Liu, A., Veltri, L. and Pentland, A. (1996). Modeling changes in eye fixation patterns while driving. (Technical Report CBR-TR96-1). Cambridge, MA: Nissan Cambridge Basic Research.
Locher, P.J. and Nodine, C.F. (1974). The role of scanpaths in the recognition of random shapes. Perception & Psychophysics, 15(2), 308-314.
Mackworth, N.H. and Morandi, A.J. (1967). The gaze selects important informative details within pictures. Perception & Psychophysics, 2(11), 547-552.
McDowell, E.D. and Rockwell, T.H. (1978). An exploratory investigation of the stochastic nature of the drivers' eye movements and their relationship to the roadway geometry. In: J.W. Senders et al. (Eds.), Eye Movements and the Higher Psychological Functions. Hillsdale, NJ: Erlbaum, pp. 329-345.
Mourant, R.R. and Donohue, R.J. (1977). Acquisition of indirect vision information by novice, experienced, and mature drivers. Journal of Safety Research, 9(1), 39-46.
Mourant, R.R. and Rockwell, T.H. (1970). Mapping eye-movement patterns to the visual scene in driving: an exploratory study. Human Factors, 12(1), 81-87.
Mourant, R.R. and Rockwell, T.H. (1972). Strategies of visual search by novice and experienced drivers. Human Factors, 14(4), 325-335.
Nakayama, K. and Shimojo, S. (1992). Experiencing and perceiving visual surfaces. Science, 257, 1357-1363.
Noton, D. and Stark, L. (1971a). Scanpaths in eye movements during pattern perception. Science, 171, 308-311.
Noton, D. and Stark, L. (1971b). Scanpaths in saccadic eye movements while viewing and recognizing patterns. Vision Research, 11, 929-942.
Olson, P.L., Battle, D.S. and Aoki, T. (1989). Driver eye fixations under different operating conditions (Technical Report UMTRI-89-3). Ann Arbor, MI: Univ. of Michigan Transportation Research Institute.
Pentland, A. and Liu, A. (1995). Towards augmented control systems. Proc. of the 1995 Intelligent Vehicles Symposium, Detroit, MI, pp. 350-355.
Ponsoda, V., Scott, D. and Findlay, J.M. (1995). A probability vector and transition matrix analysis of eye movements during visual search. Acta Psychologica, 88, 167-185.
Posner, M.I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Potter, M.C. (1975). Meaning in visual search. Science, 187, 965-6.
Rabiner, L.R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. of the IEEE, 77(2), 257-286.
Rabiner, L.R. and Juang, B.H. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1), 4-16.
Raviv, D. and Herman, M. (1994). A unified approach to camera fixation and vision-based road following. IEEE Transactions on Systems, Man, and Cybernetics, 24(8), 1125-1141.
Redelmeier, D. and Tibshirani, R. (1997). Association between cellular-telephone call and motor vehicle accidents. The New England Journal of Medicine, 336(7), 453-8.
Rizzolatti, G., Riggio, L. and Sheliga, B.M. (1994). Space and selective attention. In: C. Umilta and M. Moscovitch (Eds.), Attention and Perception XV. Cambridge, MA: MIT Press, pp. 231-265.
Serafin, C. (1994). Driver eye fixations on rural roads: insights into safe driving behavior (Technical Report UMTRI-94-21). Ann Arbor, MI: Univ. of Michigan Transportation Research Institute.
Shinar, D., McDowell, E.D. and Rockwell, T.H. (1977). Eye movements in curve negotiation. Human Factors, 19(1), 63-71.
Stark, L. and Ellis, S.R. (1981). Scanpaths revisited: Cognitive models direct active looking. In: D.F. Fisher et al. (Eds.), Eye Movements: Cognition and Visual Perception. Hillsdale, NJ: Erlbaum, pp. 193-226.
Starker, I. and Bolt, R.A. (1990). A gaze-responsive self-disclosing display. Proc. of the CHI '90, Seattle, WA, 3-9.
Starner, T.E. (1995). Visual recognition of American sign language using hidden Markov models (Perceptual Computing Group Technical Report #316). Cambridge, MA: MIT Media Laboratory.
Veltri, L. (1995). Modelling eye movements in driving. Unpublished Master's Thesis, Massachusetts Institute of Technology, Cambridge, MA.
Viviani, P. (1990). Eye movements in visual search: cognitive, perceptual and motor control aspects. In: E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes, Vol. 4. Amsterdam: Elsevier, pp. 353-393.
Weber, J., Koller, D., Luong, Q.T. and Malik, J. (1995). New results in stereo-based automatic vehicle guidance. Proc. of the IEEE Intelligent Vehicles Symposium, Detroit, MI, 530-535.
Wierwille, W.W. (1981). Statistical techniques for instrument panel layout. In: J. Moraal and K.F. Kraiss (Eds.), Manned Systems Design. New York: Plenum Press, pp. 201-218.
Wohl, J.G. (1961). Man-machine steering dynamics. Human Factors, 3, 222-228.
Yang, J., Xu, Y. and Chen, C.S. (1994). Hidden Markov model approach to skill learning and its applications to telerobotics. IEEE Transactions on Robotics and Automation, 10(5), 621-631.
Yang, J., Xu, Y. and Chen, C.S. (1997). Human action learning via hidden Markov model. IEEE Transactions on Systems, Man, and Cybernetics, Pt. A, 27(1), 34-44.
Yarbus, A. (1967). Eye Movements and Vision. New York: Plenum Press.
Author Index Abramov, I. 50 Aglioti, S. 312 Albrecht, I.E. 74 Altman, G. 97 Altmann, G.T.M. 74 Ames, C.T. 311 Anderson, J. 241 Andriessen, JJ. 122 Anstis, S.M. 309 Antes, J.R. 291,334,449 Antin, J.F. 449 Anton, S. 265 Aoki,T. 451 Ashkari, N. 336 Atkinson, R.C. 263 Badcock, D. 221 Balota, D.A. 49, 52, 74, 99, 145, 196, 197, 263, 266, 334 Bassi, C.J. 239 Battle, D.S. 451 Baudouin, V. 25, 263 Bayle, E. 122 Baylis, G.C. 309 Beauvillain, C. 25, 263 Becker, C.A. 197 Becker, W. 25, 98, 145, 263 Benett, S.C. 336 Bergen, J.R. 309 Berlucchi, G312 Berry, G. 240 Bertera, J.B. 53 Bertera, J.H. 176, 221, 266, 267, 351 Bever, T.G. 74 Biederman, I. 334, 449 Binder, K.F. 49 Binello, A. 309 Blanchard, H.E. 49, 145, 264, 309, 334 Blickle, T.W. 334
Bloomfield, R. 53 Bloomfield, R. 100, 221, 268 Bolozky,S.221,416 Bolt, R.A. 451 Booth, J.R. 264 Bouma, H. 122, 145, 239, 264, 309 Boyce, S.J. 334, 350 Brannan, J.R. 242 Breitmeyer, B.C. 239 Bretz, R. 367 Brians, K. 240 Bridgeman, B. 49 Briggs, R.P. 416 Briihl, D. 26,49, 50, 220 Brooks, V. 367 Brown, I. 415 Brown, I.D. 391 Brown, V. 309 Bruck, M. 239 Brugaillere,B.99,241,266 Brysbaert, M. 74, 122, 123, 145, 147, 177, 239 Burr, D.C. 239 Buswell, G.T. 122, 145, 292, 350, 449 Cain, K. 220 Campsall, J.M. 267 Carlson, M. 75, 198 Carpenter, P.A. 49, 51, 122, 123, 146, 175, 239, 265, 268 Carrasco, M. 309 Carreiras, M. 220 Carroll, J.B. 122 Carroll, P.J. 50 Carter, B. 241 Carter, I.D. 392 Cave, K.R. 312 Cavegn, D. 51 Chan, H.S. 309
Chang, I. 309 Chapman, P. 27,415 Chapman, P.R. 391, 393,417 Chase, C.H. 239 Chelazzi, L. 312 Chen, C.S. 452 Christiaens, D. 292, 335 Christie, J. 335 Chumbley, J.I. 267 Churchland, P.S. 309 Clark, R.L. 392 Clegg, B.A. 391 Clews, S. 27, 53, 100, 221, 242, 268 Clifton, C. 26, 51, 74, 75, 175, 221, 240, 265, 267 Coeffe, C. 52, 145 Cohen, A.S. 391,415,449 Cohen, K.M. 415 Cole, B.L. 391 Collewijn,H.49,310 Collins, W.E. 351 Cooper, S. 27 Corcos, E. 242 Cornelissen, P. 239 Cornsweet, T.N. 122 Courtney, A.J. 309 Grain, S. 74, 75, 197 Crane, H.D. 122 Crick, J.L. 392 Crundall, D. 27 Crundall, D.E. 393,417 Cunitz, R.J. 49
De Troy, A. 335 De Valois, R.L. 350 De Valois, K.K. 350 Dember,W.N. 241 Dennis, Y. 74, 97 Desimone, R. 309 Deubel, H. 49 deVoogd, A.H. 122, 145, 264 DeWitt M.J. 336 Dingus, T.A. 449 Dixon, R. 392 Dodge, R. 145 Donges, E. 429 Donnelly, N. 311 Donohue, R.J. 392,450 Dore, K. 25, 263 Dougherty, T. 336 Downing, C.J. 335 Drasdo, N. 239, 240 Drislane, F.W. 241 Driver, J. 309 Duffey, S.D. 53 Duffy, S.A. 26, 75, 220 Duncan,!. 309,415 Dunn-Rankin, P. 264 Dyre.B.P. 51,99, 146
Eden, G.F. 239 Edgington, E.S. 175 Egeth, H.E. 309 Ehrlich, K. 49, 74, 264 Ehrlich, S.F. 145, 147, 266 Elander.J. 391 Elliott, J.M.G. 336 D'Entremont, B. 240 d'Ydewalle, G. 51, 123, 147, 177, 292, 293, Ellis, S.R. 449,451 Engel,F.L.415 335,336,351,352,367 Daneman, M. 49,98, 239 Engel,G.R. 309,310 Epelboim, J. 264 Dark, V.J. 335 Erdmann, B. 145 Dascola, I. 311 Erickson, D.J. 392 Dautrich, B.R. 239 Eriksen, C.W. 415,449 Davidson, B.J. 98 Erikson, B. 391 Davidson, M. 49,98 Davies, P. 122 Erkelens,C.J.49,310 Evans, B.J.W. 239, 240 De Graef, P. 292, 293, 335, 336, 351, 352
Evans, L. 391 Everatt, J. 26, 27, 53, 100, 220, 221, 240, 242, 264, 268,417 Evert, D.L. 309 Farmer, M. 240 Farnham, J.A. 336 Farnham, J.M. 335 Farrell,M. 310 Fawcett, A. 240 Fawcett, R. 241 Feldman, R. 450 Ferreira, F. 26, 50, 74, 98, 145, 175, 197, 220,264,292,415 Findlay, J.M. 98, 145, 292, 309, 310, 311, 451 Fischer, F.W. 240 Fischer, M.H. 123, 147, 176, 266 Fisher, D.F. 123,221,267 Fisher, D.L. 311 Fitzgibbon, G. 241 Fodor, J.A. 264 Fodor, J.D. 197 Folk, J.R. 49 Forster, K.I. 197 Forsyth, E. 391 Fowler, S. 239 Francis, W.N. 175, 197, 264 Francolini, C.M. 334 Franzel, S.L. 312 Franzen, S. 26 Fraser, M. 393 Frazier, L. 53, 74, 75, 123, 198, 264 Freedman, S.E. 197 French, D. 391 Frieder, K.S. 309 Friedman, A. 292, 335 Fry, G.A. .415 Galaburda, A.M. 241 Garnham, A. 74, 97, 220 Garrod, S. 49, 220 Geiger, G. 240, 310 Gelade, G. 312 Germeys, F. 367
Gilchrist, I.D. 310 Gillund, G. 264 Glass, A.L. 449 Godard, J.-L. 367 Godthelp, H. 429 Goldberg, J.H. 449 Goolkasian, P. 240 Gordon, D.A. 450 Gormican, S. 312 Grimes, J. 99, 310 Groeger, J. 415 Groeger, J.A. 391 Groner, M. 450 Groner, M.T. 350 Groner, R. 51,350,450 Gross, M. 450 Grosser, G.S. 240 Hagenzieker, M.P. 393 Hainline, L. 50 Haith, M.M. 415 Hannaford, B. 450 Harris, C.M. 50,450 Harvey, L.O. 311 Hawkins, H.L. 335 Hawley, K.J. 26, 335, 336 Helander, M. 391 Hella, F. 415 Heller, D. 50, 52, 53, 98, 99, 100 Hemforth, B. 75 Henderson, J.M. 26, 50, 98, 145, 175, 197, 220, 241, 264, 292, 310, 335, 350, 351, 415 Hendricks, A. 50 Hengeveld, W. 27 Herdman, C.M. 267 Herman, M. 451 Heywood, C.A. 310 Hillyard, S.A. 335 Hintzman, D.L. 264 Hochberg, J. 123, 146, 264, 367 Hockey, G.R.J. 311 Hoffman, J.E. 310 Hoffmann, E.R. 392
Hoffmann, J. 336 Hofmeister, J. 53, 98, 100 Hogaboam, T.W. 50, 51, 264 Hollingworth, A. 292, 335 Holmes, D.L. 415 Hopkinson, P. 336 Horberg, U. 391 Horwood, J. 391, 415, 429, 450 Huey, D. 309 Huey, E.B. 146 Hughes, P.K. 391 Hulse, M.C. 449 Humphreys, G.W. 309, 311 Hyona, J. 26, 50,98, 146, 220, 221, 240, 242, 264 Ikeda, M. 265, 293, 310, 351,415 Inhoff, A.W. 26, 49, 50, 51, 52, 53, 74, 98, 100, 123, 124, 146, 147, 175, 176, 177, 220,221,266,268,293,351 Ishida, T. 265 Jackson, M. 240 Jacob, R.J.K. 450 Jacobs, A.M. 52, 99, 123, 176, 241, 265, 310 James, J.D.S. 449 Jaschinkski, W. 52 Jennings, F. 74 Johnston, W.A. 26, 335, 336 Jorgeson, C.M. 416 Juang,B.H.451 Julesz, B. 309 Juola, J.F. 263 Jurgens, R. 263 Jurgensohn, T. 450 Just, M.A. 49, 50, 122, 123, 146, 175, 265, 268 Kaluger,N.A.415 Kapoula, Z. 98 Katz, S.M. 309 Kaufman, L. 351 Kello, C. 124, 177 Kempe, B. 52 Kempe, V. 100, 146, 266
Kennedy, A. 52, 74,98, 100, 176, 198, 220 Kennison, S.M. 26, 51, 74, 175, 240, 265 Ken-, J.S. 220 Kerr, P. 99 Kerr, P.W. 51, 52,98,99, 123, 124, 146, 176, 198, 265 Kerr, R.W. 176 King, J. 240 Kingstone, A. 310,450 Klatsky, G.J. 334 Klein, G.A. 123 Klein, R. 240, 310, 335,450 Kliegl, R. 98 Klitz, T.S. 265 Koch, C. 292 Koller, D.451 Konieczny, L. 51, 75 Kowler,E.51,265 Kristjanson, A.F. 334 Kruk, R.S. 242 Krummenacher, J. 53, 100 Krupat, E. 392 Krupinski, E.A. 52 Kucera,H. 175,197,264 Kuiken, M.J. 392 Kundel, H.L. 52 Kurlowski, F. 123 Lamote, C. 351 Land, M.F. 26, 391,415,429,450 Lartigue, E.K. 242 Laya, O.415 LeCluyse, K. 242 Lee, D.N. 26, 391,416,429,450 Lee, P. 450 Lee, P.N.J. 416 Legein, Ch.P. 239 Legge, G.E. 265 Lehmkuhle, S. 239 Leibowitz, H.W. 416 Lesch, M. 52, 146, 198, 266 Lesevre, N. 123 Lester, J. 391,392 Lettvin,J. 240, 310 Levi, D.M. 312
Levinson, H.N. 240 Levy-Schoen, A. 99, 123, 241, 265, 266 Liberman, I.Y. 240 Liebelt, L.S. 292, 335 Lima, S.D. 50, 51, 175 Lindsey, D.T. 311 Liu, A. 391, 416, 450, 451 Liu, W. 50 Liversedge, S.P. 75, 197 Livingstone, M.S. 241 Locher, P.J. 450 Lockwood, C.R. 392 Loftus, G.R. 292, 336, 351 Loschky, L. 351 Lovegrove, W.J. 239, 241, 242 Lucas, P.A. 51 Luck, S.J. 335 Luoma, J. 392 Luong, Q.T. 451 Mace, D.G.W. 392 Mackworth, N.H. 292, 336, 450 Mahoney, J.V. 292 Maisog, J.M. 239 Malik, J. 451 Mannan, S. 309 Marks, W. 336 Martin, F. 241 Mascelli, J.V. 367 Mason, A. 239 Matin, E. 292 May, J.G. 242 Maycock, G. 391, 392 Mayzner, M.S. 220 McClelland, J. 176, 240 McClure, K.K. 351 McConkie, G.W. 51, 98, 99, 123, 124, 146, 176, 177, 197, 221, 264, 265, 266, 293, 309, 334, 336, 351 McDonald, J.E. 99, 198, 266 McDowell, E.D. 392, 393, 416, 417, 450, 451 McFalls, E.L. 336 McKenna, P.P. 392
McLean, J.R. 392 McLeod, J. 241 McLoughlin, D. 241 Metzger, R.L. 334 Meyers, C. 239 Mezzanotte, R.J. 334 Miles, E. 241 Miles, T.R. 240, 241 Miller, J.M. 392 Miltenburg, P.G.M. 392 Mitchell, D.C. 74, 145 Mittau, M. 53, 124, 268 Miura, T. 310, 392, 416 Morandi, A.J. 292, 450 Morris, R. 49, 52, 53 Morris, R.E. 100 Morris, R.K. 26, 75, 146, 198, 221, 241, 265, 266, 267, 336 Morrison, F.J. 415 Morrison, R.E. 26, 51, 53, 99, 146, 147, 176, 197, 221, 221, 265, 266, 293, 351 Morrison, R.M. 267 Morrone, M.C. 239 Mortimer, R.G. 416 Mouloua, M. 335 Mourant, R.R. 392, 416, 450 Mozer, M.C. 176 Mueller, P.U. 51 Muller, H.J. 311 Murphy, L.A. 241 Murray, W.S. 52, 74, 75, 100, 176, 197, 198 Nakayama, K. 450 Nasanen, R. 311 Nattkempfer, D. 311 Nazir, T.A. 99, 123, 265 Neboit, M. 415 Neculau, M. 450 Neisser, U. 311 Newsome, S.L. 99, 198, 266 Ni, W. 75, 197 Nicolson, R. 240, 241 Niemi, P. 26, 50, 98, 220, 221, 240, 242, 264 Nishimoto, T. 351 Nodine, C.F. 52, 450
Norris, D.G. 197 Noton, D. 451 O'Brien, E.J. 49 O'Regan, J.K. 26, 52, 53, 99, 123, 124, 145, 146, 147, 176, 177, 197, 241, 265, 266, 268, 293 Oakhill, J. 220 Olbrei, I. 197 Oliva, A. 351 Olson, P.L. 451 Olson, R.K. 98 Paap, K.R. 99, 198, 266 Palermo, D.S. 241 Palmer, J. 311 Parkes, A.M. 26 Pashler, H. 311 Paterson, K.B. 75 Pelz, D.C. 392 Penland, J.G. 334 Pentland, A. 450, 451 Pentland, A.P. 391, 416 Perea, M. 52 Perry, A.R. 241 Peru, A. 312 Pezdek, K. 336 Pickering, M. 75 Pickering, M.J. 198 Pierce, S. 351 Pinkus, S.Z. 242 Plewe, S.H. 336 Plude, D. 334 Pollatsek, A. 26, 27, 49, 50, 52, 53, 75, 99, 100, 123, 145, 146, 176, 197, 198, 220, 221, 241, 263, 264, 265, 266, 267, 292, 293, 334, 335, 336, 350, 351, 415, 416 Ponsoda, V. 451 Pontefract, A. 310, 450 Posner, M.I. 26, 50, 220, 311, 451 Potter, M.C. 451 Poulton, E.C. 99, 351 Prinz, W. 311 Pynte, J. 52, 99, 100, 123, 176, 198, 241, 266
Quimby, A.R. 392 Rabbitt, P.M.A. 311 Rabiner, L.R. 451 Rabinowitz, J.C. 334, 449 Rackoff, N.J. 416 Radach, R. 26, 27, 50, 52, 53, 99, 100, 146, 176, 197, 266 Rahimi, M. 416 Ramachandran, V.S. 309 Raney, G.E. 53, 75, 100, 123, 147, 198, 221, 267 Raviv, D. 451 Raymond, J. 241 Rayner, K. 26, 27, 49, 50, 51, 52, 53, 74, 75, 99, 100, 123, 145, 146, 147, 175, 176, 197, 198, 220, 221, 241, 263, 264, 265, 266, 267, 292, 293, 311, 334, 335, 336, 350, 351, 415, 416 Reddix, M.D. 99, 123, 146, 176, 265 Reddix, M.R. 51 Redelmeier, D. 451 Reichle, E.D. 221, 267 Reid, C. 74 Reid, L. 393 Reingold, E.M. 49, 98 Reisz, K. 367 Renge, K. 416 Rentschler, I. 311 Reynolds, K. 336 Richards, I.L. 239, 240 Richardson, A. 239 Richman, B. 122 Riemersma, J.B.J. 392 Riggio, L. 311, 451 Rizzolatti, G. 311, 451 Robinson, D.A. 98 Robinson, G.H. 392 Rockwell, T.H. 392, 393, 416, 417, 450, 451 Rosen, G.D. 241 Ross, J. 239 Rovamo, J. 311 Rowan, M. 197
Ruddock, K.H. 309 Rumsey, J.M. 239 Ruthruff, E. 351 Rutley, K.S. 392 Ryder, L.A. 197 Sabey, B.E. 393 Sacks, J.G. 241 Saida, S. 293, 351 Sanford, A. 220 Sato, S. 312 Schall, J.D. 311 Scheepers, C. 75 Schiepers, C. 147 Schilling, H.E.H. 267 Schmauder, A.R. 75, 221, 267 Schrock, G. 351 Schroyens, W. 122, 123, 145, 147, 177 Schryver, J.C. 449 Schustack, M. 147 Schvaneveldt, R.W. 99, 198, 266 Schwanenflugel, P.J. 147 Schwarting, I.S. 336 Schwartz, J. 26, 220 Schwarz, J. 50 Schyns, P.O. 351 Scott, D. 451 Sejnowski, T.J. 309 Serafin, C. 451 Sereno, A.B. 416 Sereno, S. 100, 176 Sereno, S.C. 53, 53, 75, 123, 147, 198, 221, 267 Sexton, B. 391 Shankweiler, D. 75, 197, 240 Shapiro, K. 309 Shapiro, S.I. 241 Shebilske, W. 123, 147 Sheinberg, D.L. 312 Sheliga, B.M. 311, 451 Shepherd, M. 311 Shiffrin, R.M. 264 Shimojo, S. 450 Shinar, D. 393, 416, 417, 451 Shioiri, S. 351 Shiori, S. 293 Slaghuis, W.L. 242 Slowiaczek, M.C. 221 Slowiaczek, M.L. 53, 266, 267, 351 Smiley, A. 393 Smith, G.L., Jr. 415 Snowling, M. 242 Soderberg, S. 391 Solman, R.T. 242 Sommerville, F. 27 Spafford, C.S. 240 St. James, J.D. 415 Stacey, M. 392 Stacy, E.W. 449 Stainton, M. 49 Stanovich, K.E. 242 Staplin, L. 393 Stark, L. 27, 449, 451 Starker, I. 451 Starner, T.E. 451 Starr, M. 50 Steedman, M. 74 Stein, J. 239, 242 Steinman, R.M. 49, 51, 264 Stelmach, L.B. 267 Stevenson, B.J. 197 Steyvers, J.J.M. 393 Strasburger, H. 311 Studach, H. 391, 449 Subramaniam, B. 310 Suppes, P. 267 Sussmann, C. 99 Swensson, R.G. 312 Takeuchi, R. 310, 415 Tanenhaus, M.K. 124, 177 Tassinari, G. 312 Taylor, E. 123 Taylor, H. 393 Teitelbaum, R.C. 334 Theeuwes, J. 393 Thibadeau, R. 268 Thorn, D.R. 416 Thomson, M.E. 240, 242
Thurston, G.L. 392 Tibshirani, R. 451 Tjan, B.S. 265 Toet, A. 312 Topolski, R. 50, 53, 124, 146, 147, 177, 268, 293 Toto, L.C. 52 Townsend, J.T. 312 Traxler, M.J. 75, 198 Treisman, A. 312 Tresselt, M.E. 220 Triggs, T.J. 416 Trueswell, J.C. 124, 177 Ullman, S. 292 Ullman, T. 311 Umilta, C. 311 Underwood, G. 26, 27, 50, 53, 75, 98, 100, 220, 221, 240, 242, 264, 268, 393, 417 Underwood, J.D.M. 27 Underwood, N.R. 177, 197, 242, 336 Van Diepen, P.M.J. 293, 336, 351, 352 Van Rijn, H. 27 Van Orden, G.C. 268 Van Rensbergen, J. 352 Van Wijnendaele, I. 351 Vanderbeeken, M. 367 VanMeter, J.W. 239 VanVoorhis, B.A. 335 Velluntino, F. 242 Veltri, L. 391, 416, 450, 451 Virsu, V. 311 Vitu, F. 52, 53, 99, 100, 122, 123, 124, 145, 147, 176, 177, 197, 268, 293, 312 Viviani, P. 312, 451 Vochatzer, K.G. 335 Vonk, W. 27, 147 Walder, F. 450 Walker, R. 310 Wall, J.G. 392 Wallen, R. 27 Walsh, V. 242 Wampers, M. 352 Wanat, S.F. 124 Wang, J. 50 Ward, R. 309 Warm, J.S. 241 Warner, J. 240 Watts, G.R. 392 Weber, J. 451 Weeks, P.A., Jr. 292, 335 Well, A. 53 Well, A.D. 53, 147, 221, 267, 336, 416 Wertheim, T. 312 West, R. 391 Whetstone, T. 336 Widdel, H. 312 Wierwille, W.W. 449, 452 Williams, L.G. 312 Williams, L.J. 417 Williams, M.C. 242 Williams, P. 415 Willows, D.M. 242 Willumeit, H.P. 450 Wohl, J.G. 452 Wolfe, J.M. 312, 336 Wolin, B.R. 220 Wolverton, G.S. 51, 98, 99, 197, 264, 309, 334, 336 Woods, R.P. 239 Woodward, D.P. 335 Wurtzel, A. 367 Xu, Y. 452 Yang, J. 452 Yantis, S. 309 Yarbus, A.L. 293, 352, 452 Yeh, Y. 415 Young, V. 241 Zeffiro, T.A. 239 Zegarra-Moran, O. 240 Zelinsky, G.J. 312 Zola, D. 51, 98, 99, 123, 146, 176, 197, 242, 264, 265, 266, 309, 334, 336 Zwahlen, H.T. 393, 417
Subject Index
accident liability 370 acuity function 184 algorithms 56 apparent movement 20, 357 area V5 237 artificial scotoma 19 attention 6, 11, 271, 370, 434 - covert 308 - covert view 300 - movement of 302 - relation between covert and overt 301 attention gradient 14 attention shifts 10, 16, 150 attentional capture 17, 328 attentional field 21 attentional movements 297 attentional regression 14 attentional theory of eye guidance in reading 127 attention-based processing models 14 attraction hypothesis 203 automatic orienting response 319 automatic word reading 235 automaticity 235 autonomous oculomotor control model 126 baseline effect 318 Baum-Welch re-estimation 445 binocular measurement 35 binocular stability 238 blink time 3 blinks 34 brief fixations 34 candidate generation 185 CELEX corpus 169 central foveal region 298 centre of fixation 232 centre of gravity 94 children 104
clutter 304, 308, 402 cognitive and peripheral search guidance theory 126 cognitive demand 397 cognitive processes in reading 3 comparison processes 187 comprehension 225 conjunction search 16 conspicuity area 298 context 223 contextual constraint 6, 132, 133 contingent display 18 contrast sensitivity 298 convenient viewing position 224 corneal reflection-pupil centre method 361 costs and benefits 301 covert attention 300, 308 covert movement of attention 16 covert visual attention 296 critical region 61 curve negotiation 398 dangerous traffic scenes 20, 405 - judging 374 deadline assumption 151 deterministic inter-word strategy 130 deterministic scanning strategy 137 developmental dyslexia 233 difficulty 405 discrete control of saccades 92 disruption to processing 61 distribution of gaze shift latencies 323 drivers - cognitive state 444 - experienced 396, 400, 419 - eye movements of 432, 436 - intentions of 431, 433 - novice 396, 400, 411 - steering behaviour 426
driving 1, 370, 420 - difficulty in dangerous situations 375 - experience 21, 370 - in fog 420 - risk of accident 370 - training novice drivers 372 driving simulator 23 dual carriageways 402 durations of the first and second fixations 39 dynamic scenes 15, 376 dyslexia 9, 233 editing errors 357 effective perceptual span 150 exogenous attention 412 experience 419 experiment-specific strategies 196 extended optimal viewing position effect 138 extrafoveal discrimination 317 extrafoveal information 15 extrafoveal processing 323 eye and mind 56 eye fixations 361, 434 eye movement control 202, 270 - effects of reading ability on 223 eye movement records 55 eye movements 270, 357 - Markovian analysis of 435 - pattern of 55 - search with 303 - vergence 308 eye-head tracking 419 eye-mind and immediacy assumption 37 eye-movement-contingent display changes 47 E-Z Reader 11, 152, 160, 255-258, 260 face search 306, 307 familiar sink-in 318 familiarity 149 feature conjunction 297, 304 feature integration theory 296, 303, 304
feedforward 419 film cuts 357 film perception 357 Finnish readers 225 first fixation duration 58, 69, 144, 183, 190, 193, 250, 279, 321 first fixations 40 first gaze duration 321 first pass gaze duration 279 first pass reading time 58, 121, 192, 193 first-order Markov matrix 435, 441 fixation density 271 fixation duration 32, 55, 104, 162, 185, 271, 378, 447 - and saccade size 195 fixation location 104 fixation patterns 437 fixation position 271 - measures 43 fixation precedence 314 fixation transition matrix 24 fixations 223, 270, 420, 423 flashbacks (in movies) 358 flicker threshold 238 focus of expansion 371, 398, 411 focus operator 55 Formal Editing Principle 358 forward gaze durations 42 fovea 203 foveal 149 foveal effects 190 foveal information 184 foveal information acquisition 341 foveal load 201 foveal processing load 9 foveal proximity effect 305 foveal vision 232 French subjects 224 functional field of view 407, 410 functional visual field 300 ganglion cells 236 garden path 59 Gaussian distribution 139
gaze duration 3, 40, 58, 70, 157, 165, 224, 250 gaze location 447 gaze shifts - fast, voluntary 327 - involuntary 327 - populations of 327 - slow, voluntary 327 general interference 407 graphemes 229 graphemic explanation 228 grouping 302 hazard perception 375, 396 hazardous events 20 hidden Markov dynamic model 432, 444 high or low spatial frequencies 348 high-level cognitive processes 2 Hollywood concept 358 horizontal search 22 hypervigilance 406, 411 informativeness 149 infrequent letter string 201 inhomogeneity of the visual system 298 initial fixation location effect 223 initial trigram 201 - frequency 156, 158, 161, 165 initial trigram token frequency 156 intelligent vehicle control 24, 431 internal mental states 444 interstate transition probabilities 444 inter-word eye behaviour 143 inter-word strategies 5 involuntary gaze shifts 327 jump cuts 361 jumping 357 landing position 119, 141, 194, 201, 211, 246, 247, 250, 252 - distribution 129 - effect 8, 203 - function 4, 79, 82, 88 last fixation duration 192, 193 lateral geniculate nucleus 236
lateral masking 298, 304 launch distance 6 launch sites 109, 130 leftward saccades 233 letter cluster 203 letter identification span 13 lexical access 11, 230, 255 lexical decision task 230 lexical informativeness 8 lexical properties 194 lexicon 224 linguistic control 131 linguistic influences 223 link probability measure 437 literacy 233 low-level visual influences 2 'magic moment' 185 magnocellular visual pathway 236 Markov matrix 435 Markov model 433 Markovian analysis 23, 435 mathematical model of the eye movements 128 mean fixation duration 41 measurement standards 30 measures 121 - standardisation of 59 methodological issues 32 microsaccades 34 mismatch theory 316 modularity 186 morpheme 229 morphemic composition 225 morphemic properties 202 morphological composition 8 morphology 228 Morrison's model 184, 204, 251 motion perception 237 movement of attention 297 movement-time profile 33 moving mask and moving window paradigms 337, 338, 340 moving objects 20 Moving Overlay Technique 352
Narrative Editing Principle 358 neurological data 237 neuroscience 306 non-linguistic models of inter-word eye behaviour 137 nonwords 230 normal reading 169 novel pop-out 318 novice drivers 21 object informativeness 15 object-skipping 15 ocular reactivity 319 oculomotor aiming errors 104 oculomotor error 141 oculomotor events 3 oculomotor measures 3, 30 oculomotor models 182, 190, 244, 252 oculomotor readiness theory 301 oculomotor strategy 104 oculomotor theories 195 optimal viewing position 10, 78, 224 orthographic factors 194 orthographic information 201 orthographic properties 202 orthographic saliency 8 orthographic structure 225 orthographies 2 parafovea 149, 201 parafoveal effects 191, 192, 193, 194, 195 parafoveal guidance hypothesis 84 parafoveal processing 2, 83, 149, 181, 185, 225 parafoveal vision 2, 108, 183, 184, 232 parafoveal word 141, 142 parafoveal-on-foveal effects 8, 149 parallel and serial processes 296, 305 parallel processing 151 parallel programming model 150 parietal systems 236 parvocellular pathway 236 perceptual continuum 184 perceptual narrowing 407, 410, 411, 414 perceptual pop-out 315
perceptual span 127, 231, 339, 346 peripheral information 232 peripheral processing 2 peripheral resolution 346 peripheral target detection 22 peripheral vision 271, 407 periphery 407 pictorial information 14 picture perception 1 plausibility effects 191, 317 point of fixation 201 "pop-out" of objects 17 populations of gaze shifts 327 posterior parietal cortex 238 postperceptual 359 postsaccadic vergence 36 pragmatic plausibility 181, 186, 192 preferred viewing position 5, 78, 224 prepositional phrase ambiguities 186 pre-programmed sequences of movements 2 preview 437 preview benefit 250, 262 primary visual cortex 236 prime 228 prime fixations 321 prior context 226 prior saccade length 110 process model 251 process monitoring 7, 149, 185 processing - disruption to 61 - models 195, 196, 244 progressive left-to-right movements 2 progressive saccades 102 psychophysical data 236 pursuit movements 361 pursuit tracking 378, 400, 401 race models 187 reading 1, 127 - attentional theory of eye guidance in 127 - ability 223 - able readers 223
- disabled subjects 223 - span 231 - speed 230 - time measures 55 - times 201 refixation optimal viewing position effect 104
refixation rates 44 refixations 86, 149, 159, 163, 247, 248 reflexive saccade 412 regression likelihood 107 regression path reading time 4, 63 regressions 89 regressive eye movements 143, 144 regressive gaze durations 42 regressive saccades 2, 59, 102 repair time 42 repetition context 186 re-reading time 4, 63, 72 resource 234 retina 236 retinal location 305 retino-cortical mapping 298 reversed-angle shots 358 rightward saccades 233 road geometry 397 root morphemes 229 rural roads 402 saccades 2, 361, 423 - amplitude 10, 167, 282, 300 - distance 380 - durations 34 - latency 408 - length 108, 195, 226 - linguistic influences on 223 - preparation 10 - target selection 342 - trajectories of 57 saccadic deadline 151 saccadic overshoot 3 saccadic range error 79 saliency map 289, 305 - framework 16
same/different matching 181 - sentence matching 186 scanpath theory of recognition 434 scanpaths 24, 435 scene perception 2, 14, 270, 314, 340 scene representation 298 scene viewing 270 schema-driven facilitation 315 schema-driven perception 17 schemata 388 search function 297 search with eye movements 303 second pass gaze duration 279 semantic aspects 223 semantic context effects 332 semantic informativeness 272 semantic plausibility 8 semantic pre-processing 8, 225 semantic processing 190 sentence prime 227 sequential attention model 13, 204, 342 shoot first, think later strategy 308 simulated driving 420 single fixation durations 39 single versus dual task 234 skipped word frequency 112, 113 skipping 2, 319 skipping rate 38 'smart car' 432 spatial frequency 237 spatially contiguous fixations 57 spelling ability 233 spillover 262 - effects 3 standardisation of measures 59 steering 372 steering control 22 stochastic nonlinguistic eye guidance strategy 141 strategy and tactics theory 87, 128 sub-lexical information 185 suburban roads 402 sustained pathway 236 syntactic analysis 186
syntactic and semantic form 191 syntactic structure 183 syntax 390 tangent point 22, 371, 419, 437 target saliency 327 temporal frequencies 237 temporal systems 236 temporally contiguous fixations 55 text comprehension 230 time-locked effects 323 total reading time 58, 70 total time 279 total viewing durations 41 traffic density 399, 410 trajectories of saccades 57 transient visual pathway 236 transition probability matrix 435 trigram frequency 194 trigram informativeness 7 trigram token and type familiarity 7 tunnel vision 407 unitized representation 17 usable field of view 407, 411 useful field of view 300, 323 vehicle control - eye movements for 436 vehicles 419 vergence eye movements 308 video switcher 353 visual acuity 103, 298 visual attention 296 visual conspicuity 2
visual deficit 235 visual demand 397 visual ecology 307 visual factors 105 visual informativeness 272 visual lobe 300 visual momentum 359 visual resources 419 visual search 1, 273, 295, 370, 406 - patterns 419 visual stimuli 428 visual system - inhomogeneity of the 298 visual tasks 419 visuomotor constraints 130 visuomotor processes 420 Viterbi algorithm 446 wiggling objects 17, 19 within-word letter sequences 224 within-word tactics 5 word boundary 225 word frequency 6, 105, 132, 149, 183, 191 word identification 2, 103 - span 13 word length 6, 105, 130, 149 word lexical characteristic 105 word recognition processes 184, 224 word skipping 120, 193, 194 word-by-word incremental sentence processing 186 workload 371 writing ability 233 zoom lens model 407