Seeing Spatial Form
Seeing Spatial Form

Edited by
Michael R. M. Jenkin, Ph.D.
Laurence R. Harris, Ph.D.
OXFORD UNIVERSITY PRESS
2006
OXFORD UNIVERSITY PRESS Oxford University Press, Inc., publishes works that further Oxford University's objective of excellence in research, scholarship, and education. Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright © 2006 by Michael R. M. Jenkin and Laurence R. Harris

Published by Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016, www.oup.com

Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
Seeing spatial form / edited by Michael R. M. Jenkin and Laurence R. Harris
p. cm.
Includes bibliographical references and indexes.
ISBN-13 978-0-19-517288-1
ISBN 0-19-517288-4
1. Form perception. 2. Space perception. I. Jenkin, Michael, 1959- II. Harris, Laurence, 1953-
QP492.S44 2005
152.14—dc22 2004056814
9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper
Preface

This book is in appreciation of the contributions of David Martin Regan. He continues to be an inspiration to many. We would like to thank Teresa Manini, who ran the conference; Jim Zacher for his technical assistance; and our wives for their enduring support.

The CD-ROM that accompanies this book contains color imagery and video clips associated with various chapters and the York Vision Conference itself. The CD-ROM is presented in HTML format and is viewable with any standard browser (e.g., Netscape Navigator or Microsoft Internet Explorer). To view the videos on the CD-ROM you will need QuickTime, which is available free from Apple. To view the CD-ROM, point your browser at the file index.htm on the CD-ROM.

York University, Ontario, Canada
Winter 2004
Michael Jenkin Laurence Harris
Contents

Contributors xv

1 Seeing Spatial Form (Laurence R. Harris and Michael Jenkin) 1
    1.1 Processing by the Brain 3
    1.2 The Structure of This Book 5

I Form Vision 9

2 Pictorial Relief (Jan J. Koenderink, Andrea J. van Doorn, and Astrid M. L. Kappers) 11
    2.1 Introduction 11
    2.2 Some History 12
    2.3 Psychophysics: Methods 13
    2.4 Findings 17
        2.4.1 Veridicality 17
        2.4.2 Influence of Viewing Mode 18
        2.4.3 Influence of Pictorial Cues 20
        2.4.4 Global versus Local Representation 21
        2.4.5 Influence of Method 21
    2.5 Geometry of Pictorial Space 23
        2.5.1 Simple Introduction to the Geometry: The 2D Case 24
        2.5.2 The 3D Case 26
        2.5.3 The Panoramic Visual World 27
    2.6 What Next? 29

3 Geometry and Spatial Vision (Gerald Westheimer) 35

4 The Inputs to Global Form Detection (David R. Badcock and Colin W. G. Clifford) 43
    4.1 Introduction 43
    4.2 Seeing Glass Patterns 44
    4.3 A Model of the Functional Architecture of Global Form Detection 52
    4.4 Conclusions 53

5 Probability Multiplication as a New Principle in Psychophysics (Michael Morgan, Charles Chubb, and Joshua Solomon) 57
    5.A1 Methods 63
    5.A2 Models and Theory 64
        5.A2.1 The Late-Noise Reichardt Model 64
        5.A2.2 The Opponent (Contrast Discrimination) Model 65
        5.A2.3 The Probability-Multiplication Model 65
        5.A2.4 The Convoy Model (not considered here and a poor fit to all the data) 66

6 Spatial Form as Inherently Three Dimensional (Christopher W. Tyler) 67
    6.1 Surface Representation through the Attentional Shroud 72
    6.2 Interpolation of Object Shape within the Generic Depth Map 75
    6.3 Transparency 80
    6.4 Object-Oriented Constraints on Surface Reconstruction 83
    6.5 Conclusion 85

II Motion and Color 89

7 White's Effect in Lightness, Color, and Motion (Stuart Anstis) 91
    7.1 Introduction 91
    7.2 Experiment 1. White's Effect Increases with Spatial Frequency 92
    7.3 Experiment 2. A Colored White's Effect Shows Both Contrast and Assimilation 93
    7.4 Experiment 3. Colored White's Effect: Spatial Frequency 93
    7.5 Experiment 4. An Isotropic Brightness Illusion: "Stuart's Rings" 95
    7.6 Experiment 5. White's Effect and Apparent Motion 96

8 The Processing of Motion-Defined Form (Deborah Giaschi) 101
    8.1 The Motion-Defined Letter Test 102
    8.2 Dissociations between Motion-Defined Form and Simple Motion Processing 103
        8.2.1 Multiple Sclerosis 103
        8.2.2 Cortical Lesions 104
        8.2.3 Normal Development 105
        8.2.4 Abnormal Development 106
    8.3 Role of the M/Dorsal Pathways in Motion-Defined Form Processing 110
        8.3.1 Parkinson's Disease 110
        8.3.2 Reduced Visual Acuity 113
        8.3.3 Functional Neuroimaging 113
    8.4 Conclusions 115

9 Vision in Flying, Driving, and Sport (Rob Gray) 121
    9.1 Introduction 121
        9.1.1 Some Basic Limitations of the Visual Processing System 122
    9.2 Vision in Flying 122
        9.2.1 Visual-Motor Control in Approach and Landing 123
        9.2.2 Visual-Motor Control in Low-Level Flight 131
    9.3 Vision in Driving 136
    9.4 Vision in Sports 140
    9.5 Conclusions 146

10 Form-from-Watercolor in Surface Perception, and Old Maps (Lothar Spillmann, Baingio Pinna, and John S. Werner) 153
    10.1 Introduction 153
    10.2 General Methods 156
    10.3 Experiment 1: How to Create Two Geographical Maps by Using One Boundary 156
    10.4 Experiment 2: Watercolor Effect vs. Proximity and Parallelism 157
    10.5 Experiment 3: Watercolor Effect vs. Good Continuation and Prägnanz 159
    10.6 Experiment 4: Watercolor Effect Used to Disambiguate Grouping and Figure-Ground Organization 161
    10.7 Experiment 5: Why Did the Old Maps Fail to Elicit Strong Long-Range Coloration Effects? 162
    10.8 Conclusion 164

III Eye Movements 167

11 The Basis of a Saccadic Decision: What We Can Learn from Visual Search and Visual Attention (Eileen Kowler) 169
    11.1 Prologue 169
    11.2 Saccadic Decisions 170
    11.3 Search and Optimal Search 172
    11.4 Saccades during Natural Visual Tasks 173
    11.5 Saccades and Visual Search: An Investigation of the Costs of Planning a Rational Saccade 174
    11.6 The Role of Attention in the Programming of Saccades 180
    11.7 Saccadic Decisions, Search, and Attention 183
    11.8 Final Comments 184

12 Handling Real Forms in Real Life (R. M. Steinman, W. Menezes, and A. N. Herst) 187

IV Neural Basis of Form Vision 213

13 The Processing of Spatial Form by the Human Brain Studied by Recording the Brain's Electrical and Magnetic Responses to Visual Stimuli (David Regan and Marian P. Regan) 215
    13.1 Introduction 215
    13.2 Human Brain Electrophysiology: The Early Days 215
    13.3 My Introduction to the Mathematical Analysis of Nonlinear Behavior and to the Joys of Collaborative Research 223
    13.4 Brain Responses to Depth-Defined Form and to "Short-Range" Apparent Motion 233
    13.5 Dissociation of the Brain's Magnetic Responses to Texture-Defined Form and to Texton Change 234
    13.6 Three Subsystems in the Steady-State Evoked Potential to Flicker and a Magno Stream/Parvo Stream Correlate in Humans 236
    13.7 The Frequency Tagging Technique: Some Early Applications 238
    13.8 The Sweep Method: A Fast Hybrid Technique for Gathering Data within a Short Period and for Measuring Adaptation and Other Nonstationary Processes 239
    13.9 Response Spectrum Recorded at Ultra High Resolution: Nondestructive Zoom-FFT 243
    13.10 Measurement of the Orientation, Spatial Frequency, and Temporal Tuning of Spatial Filters by Exploiting the Nonlinearity of Neurons Sensitive to Spatial Form 247
    13.11 A Visual-Auditory Convergence Area in the Human Brain 250
    13.12 A Frequency Domain Technique for Testing Nonlinear Models of the Human Visual System 250
    13.13 Appendix 256
    13.A1 Linear Systems and the Wide and Wild World of Nonlinear Systems 256
    13.A2 Some Definite Time Elapses between Stimulation of the Eye or Ear and the Occurrence of any Given Evoked Response 258
    13.A3 A Method for Deriving the Response of Asymmetric Nonlinearities to a Sum of Two Sinewaves 259
        13.A3.1 Half-Wave Linear Rectifier: Response to a Single Sinusoid 259
        13.A3.2 Half-Wave Linear Rectifier: Response to the Sum of Two Sinusoids 260
        13.A3.3 Half-Wave Square Law Rectifier: Response to the Sum of Two Sinusoids 266
        13.A3.4 Half-Wave Square Root Rectifier: Response to the Sum of Two Sinusoids 267
        13.A3.5 Two Cascaded Half-Wave Rectifiers, AC Coupled 268
        13.A3.6 Cascaded Compressive Rectifiers 269
        13.A3.7 Two Parallel Rectifiers whose Summed Outputs Pass through a Third Linear Rectifier: The Dichoptic Case 270
        13.A3.8 Half-Wave Rectifier Combining Accelerating and Compressive Segments 272

14 Linking Psychophysics and Physiology of Center-Surround Interactions in Visual Motion Processing (Duje Tadin and Joseph S. Lappin) 279
    14.1 Introduction: Moving Image Information 279
        14.1.1 Linking Macroscopic and Microscopic Analyses of Visual Information Flow 280
        14.1.2 Inferring Perception from Physiology 284
    14.2 Center-Surround Interactions in Motion Processing 285
        14.2.1 Center-Surround Mechanisms Found in MT and Elsewhere 285
        14.2.2 Perceptual Correlates of Center-Surround Antagonism 287
        14.2.3 Interim Conclusions 291
    14.3 Segregating Surfaces 291
        14.3.1 Psychophysics of Motion-Based Figure-Ground Segregation 292
        14.3.2 Neurophysiology of Motion-Based Figure-Ground Segregation 295
        14.3.3 Interim Conclusion 299
    14.4 Perceiving 3D Surface Shape 299
        14.4.1 Psychophysics of 3D Shape-from-Motion 299
        14.4.2 Contribution of MT to Shape-from-Motion 300
        14.4.3 Interim Conclusions 304
    14.5 General Conclusions 304

15 Transparent Motion: A Powerful Tool to Study Segmentation, Integration, Adaptation, and Attentional Selection (Thomas Papathomas, Zoltán Vidnyánszky, and Erik Blaser) 315
    15.1 Introduction 315
    15.2 Stages of Motion Processing 316
    15.3 Transparent and Non-Transparent Bi-Vectorial Motion 318
    15.4 Neural Mechanisms of Motion Segmentation and Integration 320
    15.5 Integration of Motion Directions during the Motion Aftereffect (MAE) 322
    15.6 MAE with Transparent Motion: Integration during MAE 323
    15.7 Nature of Attentional Selection in Motion Processing 324
    15.8 Conclusions 326

16 Neurological Correlates of Damage to the Magnocellular and Parvocellular Visual Pathways: Motion, Form, and Form from Motion after Cerebral Lesions (James A. Sharpe, Ji Soo Kim, and Josée Rivest) 335
    16.1 Introduction 336
    16.2 Methods 337
    16.3 Results 339
    16.4 Discussion 341

17 The Effect of Diverse Dopamine Receptors on Spatial Processing in the Central Retina: A Model (Ivan Bodis-Wollner and Areti Tzelepi) 347
    17.1 Retinal Circuitry 347
    17.2 Receptive Fields of Ganglion Cells 350
    17.3 Retinal Processing and Dopamine's Action 351
    17.4 Dopaminergic Effects on the PERG in the Monkey 352
        17.4.1 Retinal Spatial Tuning in the MPTP Primate Model 352
        17.4.2 The Effects of Intravitreal 6-OHDA on Spatial Tuning 352
        17.4.3 The Effect of the D2 Receptor Blocker L-Sulpiride on Spatial Frequency Tuning 354
        17.4.4 The Effect of CY 208-243, a D1 Agonist, on Spatial Tuning 354
        17.4.5 Synthesis of Experimental Results 355
    17.5 The Model 356
        17.5.1 The Normal Retina 356
        17.5.2 The Dopamine-Deficient Retina 357
    17.6 Dopamine's Role in Retinal Mechanisms 359

V Development 369

18 Improving Abnormal Spatial Vision in Adults with Amblyopia (Uri Polat) 371
    18.1 Background 371
    18.2 Amblyopia 372
        18.2.1 Abnormal Spatial Vision in Amblyopia 372
        18.2.2 Contrast Sensitivity and Amblyopia 372
        18.2.3 Models of Amblyopia 374
        18.2.4 Abnormal Spatial Interactions and Crowding 374
    18.3 Perceptual Learning and Neural Plasticity 375
        18.3.1 Plasticity in Amblyopia 375
    18.4 Treatment of Adult Amblyopia 376
        18.4.1 Perceptual-Learning-Based Technique for Treating Amblyopia 376
    18.5 Summary 380

19 Visual Development with One Eye (Martin J. Steinbach and Esther G. Gonzalez) 385
    19.1 Introduction 385
    19.2 Form (Contrast, Texture, and Motion Defined), Motion (including OKN), and Depth 386
        19.2.1 Form 387
        19.2.2 Motion 390
        19.2.3 Monocular Practice 391
        19.2.4 Optokinetic Nystagmus (OKN) 391
        19.2.5 Time to Collision 391
    19.3 Depth 391
    19.4 Egocenter: Role of Binocular Experience 392
        19.4.1 The Visual Direction of Objects 392
        19.4.2 Hints from a Birthday Party 392
        19.4.3 The "Cyclops Effect" 394
        19.4.4 The Egocenter Is "Built In" 395
        19.4.5 The Egocenter Is Not So "Plastic" 395
        19.4.6 Learning to Perform Monocular Tasks 396
        19.4.7 Performance on a Monocular Task sans Feedback 397
    19.5 Conclusions 398

A Appendix: Selected Publications of David Regan 405

Author Index 421

Subject Index 435
Contributors

Stuart Anstis, Department of Psychology, University of California at San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0109 USA. email: [email protected]

David R. Badcock, Human Vision Laboratory, School of Psychology, The University of Western Australia, 35 Stirling Highway, Nedlands, Western Australia 6907. email: [email protected]

Erik Blaser, Department of Psychology, University of Massachusetts, 100 Morrissey Blvd., Boston, MA 02125 USA. email: [email protected]

Ivan Bodis-Wollner, Department of Neurology, State University of New York, Health Science Center at Brooklyn, 450 Clarkson Ave., Box 1213, Brooklyn, NY 11203 USA. email: [email protected]

Charles Chubb, Department of Cognitive Sciences, 3151 Social Sciences Plaza, University of California, Irvine, CA 92697-5100 USA. email: [email protected]

Colin W. G. Clifford, Visual Perception Unit, School of Psychology, The University of Sydney, Sydney, NSW Australia 2006. email: [email protected]

Deborah Giaschi, University of British Columbia, Department of Ophthalmology, Room A146 - 4480 Oak Street, Vancouver, BC V6H 3V4 Canada. email: [email protected]

Esther G. Gonzalez, Toronto Western Research Institute, Toronto Western Hospital, 399 Bathurst Street, Toronto, Ontario M5T 2S8 Canada. email: [email protected]

Robert Gray, Department of Applied Psychology, Arizona State University East, 7001 E Williams Field Rd., Mesa, AZ 85212 USA. email: [email protected]

Laurence R. Harris, Centre for Vision Research and Department of Psychology, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada. email: [email protected]

A. N. Herst, Department of Psychology, University of Maryland, College Park, MD 20742-4411 USA. email: [email protected]

Michael Jenkin, Centre for Vision Research and Department of Computer Science, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada. email: [email protected]

Astrid M. L. Kappers, Industrial Design Delft, Delft University of Technology, Landbergstraat 15, 2628 CE Delft, The Netherlands. email: [email protected]

Ji Soo Kim, Department of Neurology, College of Medicine, Seoul National University, 300 Gumi-dong, Bundang-ku, Seongnam-si, Gyeonggi-do, 463-707 Korea. email: [email protected]

Jan J. Koenderink, Helmholtz Instituut, Universiteit Utrecht, Buys Ballot Laboratorium, Princetonplein 5, 3584 CC Utrecht, The Netherlands. email: [email protected]

Eileen Kowler, Department of Psychology, 152 Frelinghuysen Road, Rutgers University, Piscataway, NJ 08854 USA. email: [email protected]

Joseph S. Lappin, Vanderbilt Vision Research Center and Department of Psychology, Vanderbilt University, 111 21st Ave. South, Nashville, TN 37203 USA. email: [email protected]

Wayne Menezes, Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742-4411 USA. email: [email protected]

Michael Morgan, Department of Optometry, City University, Northampton Square, London EC1V 0HB United Kingdom. email: [email protected]

Thomas Papathomas, Department of Biomedical Engineering and Laboratory of Vision Research, Rutgers University, 152 Frelinghuysen Road, Piscataway, NJ 08854-8020 USA. email: [email protected]

Baingio Pinna, Facoltà di Lingue e Letterature Straniere, Università di Sassari, Via Roma 151, 07100 Sassari, Italy. email: [email protected]

Uri Polat, Goldschleger Eye Research Institute, Tel Aviv University, Sheba Medical Center, Ramat Gan, Tel Hashomer, Israel 52621. email: [email protected]

David M. Regan, Centre for Vision Research and Department of Psychology, York University, 4700 Keele St., Toronto, Ontario M3J 1P3 Canada. email: [email protected]

Marian P. Regan, Centre for Vision Research and Department of Psychology, York University, 4700 Keele St., Toronto, Ontario M3J 1P3 Canada. email: [email protected]

Josée Rivest, Department of Psychology, Glendon College, York University, 4700 Keele St., Toronto, Ontario M3J 1P3 Canada. email: [email protected]

James A. Sharpe, Division of Neurology, University Health Network, University of Toronto, Toronto, Ontario M5T 2S8 Canada. email: [email protected]

Joshua Solomon, Department of Optometry, City University, Northampton Square, London EC1V 0HB United Kingdom. email: [email protected]

Lothar Spillmann, Brain Research Unit, University of Freiburg, Hansastrasse 9, D-79104 Freiburg, Germany. email: [email protected]

Martin J. Steinbach, Toronto Western Research Institute, Toronto Western Hospital, 399 Bathurst St., Toronto, Ontario M5T 2S8 Canada. email: [email protected]

R. M. Steinman, Department of Psychology, University of Maryland, College Park, MD 20742-4411 USA. email: [email protected]

Duje Tadin, Vanderbilt Vision Research Center and Department of Psychology, Vanderbilt University, 111 21st Ave. South, Nashville, TN 37203 USA. email: [email protected]

Christopher W. Tyler, Smith-Kettlewell Eye Research Institute, 2318 Fillmore Street, San Francisco, CA 94115 USA. email: [email protected]

Areti Tzelepi, LPPA, CNRS - Collège de France, 11, place Marcelin Berthelot, 75005 Paris, France. email: [email protected]

Andrea J. van Doorn, Industrial Design Delft, Delft University of Technology, Landbergstraat 15, 2628 CE Delft, The Netherlands. email: [email protected]

Zoltán Vidnyánszky, Laboratory for Neurobiology, Semmelweis University Medical School, Tüzoltó u. 58, 1094 Budapest, Hungary. email: [email protected]

John S. Werner, Department of Ophthalmology and Section of Neurobiology, Physiology and Behavior, University of California, 4860 Y St., Suite 2400, Sacramento, California 95817 USA. email: [email protected]

Gerald Westheimer, Division of Neurobiology, University of California, 565 Life Sciences Addition, Berkeley, CA 94720-3200 USA. email: [email protected]
1. Seeing Spatial Form

Laurence R. Harris and Michael Jenkin

The term "spatial form" refers to the existence in the outside world of surfaces that have a spatial extent. The problem of seeing such surfaces is the problem of deducing or constructing their existence and shape from the spatially distributed pattern and spectral content of the light reflected from them into the eyes and transduced by the retina. The level of concern of "seeing spatial form" therefore starts somewhere higher than the retina, at a point where information is integrated across and between the retinae. The upper level of its jurisdiction depends on the interpretation of the word "seeing."

In this book we regard spatial form as not being perceived on its own. By the level of perception, the visual information has passed through too many constancy mechanisms and cognitive processes to allow access to the raw "spatial form." For example, it is very hard to see a coin as anything but round and a person as anything but person-sized. It is hard to see the shapes of the contours of someone's cheek in terms of their geometric curvature. Spatial form has provided some of the building blocks for conscious perceptions but is no more available to unbiased conscious inspection than are absolute luminance levels or retinal distances. We therefore draw the upper bound on seeing spatial form somewhere lower than object recognition and consciousness.

Looking at "lower-level vision," the input to conscious processing, presents a methodological challenge. The tendency of the visual system to apply higher-level processes needs to be controlled so that we can look at the processing of spatial form per se. This involves designing psychophysical probes that can be solved only by extracting the particular aspect of spatial form that we wish to address. David Regan (Fig. 1.1) is a master of the design of psychophysical tests to reveal the processing of spatial form.
Through his enormous output of published papers, which are listed in an appendix to this book, he has made great strides in clarifying the defining features of spatial form that the visual system is able to use. He has presented and tested many pioneering algorithms. This aspect of his work up until 2000 is summarized in his book Human Perception of Objects: Early Visual Processing of Spatial Form Defined by Luminance, Colour, Texture, Motion and Binocular Disparity (Sinauer Press, 2000; Fig. 1.2).

Figure 1.1: A portrait of David Martin Regan by renowned psychological portrait composer Nick Wade titled "Sportsman, Seer, Scientist." The image combines three of Martin Regan's loves - cricket, vision, and scientific communication. He is shown as a young batsman striding towards (or away from!) the crease, and as an older man surveying the Atlantic from Halifax, Nova Scotia. Unlike many who enjoy sport and science, Martin combines them in an elegant manner. He has displayed the particular demands required of a batsman both on the pitch and in the laboratory. In addition to his elegant strokes of the bat, those of the pen have won him many plaudits - there is art in science as well as sport. I am grateful to Marian Regan for providing the pictures of Martin which were used in producing this "perceptual portrait." The text is taken from D. Regan (1992). Visual judgements and misjudgements in cricket, and the art of flight. Perception, 21: 91-115.

Figure 1.2: The cover of David Regan's book Human Perception of Objects: Early Visual Processing of Spatial Form Defined by Luminance, Colour, Texture, Motion and Binocular Disparity (Sinauer Press, 2000).

Defining spatial form requires perceptually breaking off a surface or form from its background. This is sometimes referred to as "parsing" the visual scene. The term "parsing" describes the way that a sentence is grammatically analyzed into its component pieces. The connection with language is intentional, and concepts such as "in front of" or "on top of" are also parts of the visual parsing process. How does the visual system parse the visual scene into forms? The system needs to detect discontinuities that mark the transition between one form and another. Candidates for such discontinuities that can be detected visually include luminance, texture,
Figure 1.2: The cover of David Regan's book Human Perception of Objects: Early Visual Processing of Spatial Form Defined by Luminance, Colour, Texture, Motion and Binocular Disparity, (Sinauer Press, 2000). colour, distance, "stuff" (a variant of texture implying a type of material such as skin or metal), and relative motion. In addition, there are nonvisual variables such as temperature or tactile "feel," but those are beyond the scope of this book. The converse of detecting a form is to camouflage or hide it. The process of hiding something is the process of obscuring the discontinuities. Hiding might be achieved, for example, by lying flat and still against a background that appears similarly colored to the hunter. A wonderful example of camouflage in action is shown in Fig. 13.2. (The original title of the book was Seeing through Camouflage.) A major drive in the evolution of the visual system is to break camouflage in order to detect prey, other food items, or predators.
1.1 Processing by the Brain

The retina and thalamic precortical cells show limited signs of coding spatial form in higher mammals. The pioneering work by Hubel and Wiesel (1959, 1962, 1968) suggested that the activity of cells in the primary visual cortex could be interpreted as extracting and identifying features. Pulling out and segregating features from the visual input implies the existence of a later stage that puts them together. This binding process is probably above the level of form extraction. Spatial form is probably one of the features that needs to be bound to the other features. Spatial form might even provide the frame onto which other features such as color are added (see chapter 6).

Constructing a representation of the world by bundling separated features allows multiple solutions. The most appropriate choice of features, and the emphasis placed on one feature rather than another, can depend on the use to which the end product is to be put. Spatial location relative to the observer, for example, is not a centrally important part of recognizing a face but is essential when reaching out to touch, hit, or throw something (see Fig. 1.3).

Figure 1.3: David Martin Regan as visionary in action.

Two broad streams of visual processing have emerged that seem to divide this task-dependent reconstructive process between them, corresponding to a broad anatomical division of the visual system into dorsal and ventral streams (Mishkin and Ungerleider, 1982; Mishkin et al., 1983). The dorsal and ventral streams are separate right from their source: they originate in different retinal cells with different response characteristics, are processed by separate layers in the lateral geniculate nucleus of the thalamus, and even have distinct routes through the primary visual cortex before forming the dorsal and ventral pathways. The description of different neurological syndromes that can result from damage to one or other of these pathways has supported a "two visual systems" model (Milner and Goodale, 1993, 1995). In this influential model, one visual system, realized in the dorsal stream and the parietal cortex, processes visual information that subserves the needs of action. For this system, spatial location and movement are more important than features that might help in recognition, for example. The other visual system, the one that uses the ventral pathway, is less concerned with spatial location or movement and more with the details of pattern. It is important, however, not to overinterpret this distinction. Regan has strongly cautioned against making this mistake, pointing to evidence that connections between the dorsal stream and ventral stream (and, perhaps, subcortical nuclei) are necessary for recognizing motion-defined spatial form (Regan et al., 1992).
Nevertheless, although there is no doubt that each pathway can communicate with and share information with the other, in terms of a functional division, having two visual systems subserving the requirements of perception and action has proved a remarkably robust concept (Harris and Jenkin, 1998).

Figure 1.4: David Martin Regan going out on a limb, as usual, this time in Wales over a 500 ft drop.

Which of these two visual systems is important for seeing spatial form? Spatial form, as a lower-level building block, is actually essential to both of these uses of vision. To define the existence of any object requires knowledge about the spatial form of its surfaces.
1.2 The Structure of This Book

The book is divided into five sections.

I Form Vision

In part I, some of the general principles involved in defining spatial form are considered. Koenderink et al. (chapter 2) look at some of the pictorial cues that are involved in defining spatial form. Pictorial cues exclude binocular and eye-position cues, which cannot be directly included in a conventional picture, but include cues such as shape-from-shading, luminance discontinuities, and perspective, all of which contribute to seeing spatial form. Defining the geometric cues is pursued further in the chapter by Westheimer (chapter 3), who extends them into three dimensions and puts them into historical perspective. Badcock and Clifford (chapter 4) introduce the processing of spatial form as a hierarchical process working from orientation selectivity to selectivity for more global patterns. Morgan et al. (chapter 5) examine the role of coincidence detectors. Tyler (chapter 6) addresses the issue of how the various scattered features can be bound back together and suggests a central role for spatial form as a framework onto which other features can be bound.
II Motion and Color

The chapters in part II address some of the specific cues that can be used to define spatial form, especially color and motion. Anstis (chapter 7) compares the contributions from color, light, and motion. Giaschi (chapter 8) looks specifically at motion-defined form and how arbitrarily it relates to the magnocellular-parvocellular processing-stream divide. Gray (chapter 9) explores the role of these low-level visual processes during human performance in flying, driving, and sport. Spillmann et al. (chapter 10) explore the use of edges in maps. Cartoon drawings, consisting exclusively of edges, can be identified with their real-world counterparts, suggesting some common features in the neural representation of cartoons and real-world scenes.

III Eye Movements

In order to put together the spatial structure and layout of a scene, the various views obtained from sequential fixations need to be combined. This requires knowledge of the eyes' positions in space, bearing in mind that the head, the vehicle of the eyes, is also mobile. The planning of the scanning saccadic eye movements used to explore the visual world is considered by Kowler in chapter 11. Steinman et al. (chapter 12) examine how eye movements may contribute to the perception of spatial form in natural (as opposed to experimental) environmental conditions.

IV Neural Basis of Form Vision

In part IV, the neurophysiology of form vision is considered. David Regan and his wife Marian (chapter 13) provide an extensive (and historically grounded) review of the approach to investigating the processing of spatial form by recording the human brain's electrical and magnetic responses to spatial form. This chapter brings Regan's book Human Brain Electrophysiology (1989) up to date. The contribution of the center-surround organization of lower-level visual cells is considered by Tadin and Lappin (chapter 14).
They explore the possible link between physiological center-surround antagonism and perceptual functions in segregating figure from ground, perceiving surfaces, and perceiving 2D and 3D shape. Papathomas et al. (chapter 15) use transparent motion to model a way in which surfaces can be defined by motion. Retinal ganglion cells are most responsive to sharp changes in luminance, especially edges in the retinal image, so it is from edges or luminance gradients that the representation of form must be constructed. Sharpe et al. (chapter 16) continue a theme that runs through this book: the relative roles of the parvocellular and magnocellular divisions of the visual system. They report patients who have lost the ability to see motion-defined form associated with lesions in the parietal-temporal region around MT. Bodis-Wollner and Tzelepi look at dopamine's role in the retina in chapter 17.

V Development

Polat (chapter 18) considers the role of specific deficits of spatial form perception in creating amblyopia, and Steinbach and Gonzalez (chapter 19) explore the consequences of losing one eye.
Supplemental Material

The chapters in this volume have two supplements. The first is the appendix found in this volume. This appendix lists, by subject, David Regan's contributions to our knowledge of spatial vision and related topics. The second is the CD-ROM associated with this text. The CD-ROM contains a number of color images, videos, presentations, and demonstrations that are associated with the various chapters. In addition to these chapter-specific components, the CD-ROM also contains a QuickTime version of David Regan's presentation at the York Vision Conference in June 2003.
References
Harris, L. R. and Jenkin, M. (1998). Vision and Action. Cambridge University Press: Cambridge, UK.
Hubel, D. H. and Wiesel, T. N. (1959). Receptive fields of single units in the cat's striate cortex. J. Physiol. (Lond.), 148: 574-591.
Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160: 106-154.
Hubel, D. H. and Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. J. Physiol. (Lond.), 195: 215-243.
Milner, A. D. and Goodale, M. A. (1993). Visual pathways to perception and action. Prog. Brain Res., 95: 317-337.
Milner, A. D. and Goodale, M. A. (1995). The Visual Brain in Action. Oxford University Press: Oxford, UK.
Mishkin, M. and Ungerleider, L. G. (1982). Contribution of striate inputs to the visuospatial functions of parieto-preoccipital cortex in monkeys. Behav. Brain Res., 6: 57-77.
Mishkin, M., Ungerleider, L. G. and Macko, K. A. (1983). Object vision and spatial vision: two cortical pathways. Trends Neurosci., 6: 414-417.
Regan, D. (1989). Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine. Elsevier: New York.
Regan, D. (1992). Visual judgements and misjudgements in cricket and the art of flight. Perception, 21: 91-115.
Regan, D. (2000). Human Perception of Objects: Early Visual Processing of Spatial Form Defined by Luminance, Colour, Texture, Motion and Binocular Disparity. Sinauer Press: Sunderland, MA.
Regan, D., Giaschi, D., Sharpe, J. A. and Hong, X. H. (1992). Visual processing of motion-defined form: selective failure in patients with parietotemporal lesions. J. Neurosci., 12: 2198-2210.
Regan, D. and Tansley, B. W. (1979). Selective adaptation to frequency-modulated tones: evidence for an information-processing channel selectively sensitive to frequency changes. J. Acoust. Soc. Am., 65: 1249-1257.
Tansley, B. W. and Regan, D. (1979). Separate auditory channels for unidirectional frequency modulation and unidirectional amplitude modulation. Sensory Proc., 3: 132-140.
Part I
Form Vision
2. Pictorial Relief
Jan J. Koenderink, Andrea J. van Doorn, and Astrid M. L. Kappers
2.1 Introduction
Look at a photograph: You see a flat piece of paper. Look into a photograph: You are aware of an all-but-flat pictorial space. Artists have marvelled over this since earliest times. Scientists have tried to trivialize it or explain it away. But pictorial space is there to stay and remains an enigma to this day. Pictorial space is a mental entity. You can't "see" it and it needn't be "looked at." Its existence fully coincides with your experience. Nor need a "corresponding" physical space exist. On close inspection the alleged photograph might turn out to be mere fungus overgrowth on a plain sheet of dried wood pulp. You will still be aware of the pictorial space, the "stimulus" being the same. Clearly, pictorial space is not an "image" of anything. Of course, you can look at and see the fungus growth, but that has nothing to do with it. It is your "hallucination," if you want. Suppose the difference between a photograph and the fungus-overgrown sheet is not detectable without the use of a microscope. Then looking at or into the sheet with the unarmed eye is in no way different from such acts performed on actual photographs. But then the pictorial space evoked by a real photograph must be equally hallucinatory, and the fact that the photograph was once "taken" is irrelevant to your perception. Thus talk of "veridicality" in the context of pictorial spaces has nothing to do with perception.1 Of course we could go on and discuss regular seeing as "controlled hallucination" (Gibson, 1970), but we will refrain from that in this chapter. If pictorial space is controlled hallucination, a mental entity, then a study of its structure has to reveal structures of consciousness. Such thoughts were driving us when we embarked on our exploration of pictorial space about a decade ago (Koenderink, van Doorn, and Kappers, 1992).
1. Moreover, philosophical discussions on the intentional (Brentano, 1874) nature of pictorial spaces have nothing to do with perception proper.
2.2 Some History
A study of the literature reveals that though artists have written much of interest, scientists have mostly tried to get rid of the phenomenon of pictorial space. This may take various forms; for example, one may simply deny the existence of the phenomenon, or explain it away. Denial has been most popular. Thus "stereopsis," which simply means "stereoscopic vision," is invariably interpreted as "binocular stereopsis" (through disparity), and "monocular stereopsis" is regarded as a contradictio in terminis. The existence of pictorial space was rediscovered many times over (e.g., when one Claparede (1904) accidentally put two equal images in a stereoscope without the perception "going flat") and led to an obscure literature on "paradoxical stereopsis" (which apparently the better journals wouldn't touch). If something believed to be impossible is actually found to be the case, it is surely paradoxical! Explaining away usually takes the form of stating that pictorial space is not a perception proper at all, but merely a cognitive construction. It is mere fantasy and has nothing to do with vision. The Rorschach test (Rorschach, 1921) exploits this very notion. It was not until the early twentieth century that monocular stereopsis was acknowledged as a stubborn fact. Then the optical industry produced viewers to obtain optimum pictorial spatiality from single pictures. There exist two major types. The first type, the Zeiss "Verant" (designed by von Rohr (1904) and Gullstrand), is the generic example. The Verant uses a flat-field loupe with its exit pupil at the center of rotation of the eye, the other eye being occluded. Accommodation is fixed at infinity. The eye is centered at the perspective center of the picture, the visual field being about 40°. This type of viewer is still used for viewing slides nowadays, although many modern slide viewers are of inferior design (apparently the design objectives are unclear).
The second type uses optics to present the picture to both eyes, while eliminating the accommodation, vergence, and disparity cues that would reveal that the observer is actually confronted with a flat picture. The Zeiss (Carl Zeiss Jena, 1907) "synopter" (also a von Rohr (1920) design) is the generic example, the late-nineteenth-century "zograscope" being a somewhat inferior precursor (Balzer, 1998). Monocular and binocular stereopsis yield qualitatively different results. It has been repeatedly rediscovered that true stereopsis (binocular, of course) gives rise to a coulisses scene: there is indeed spectacular depth, but it is as if the objects were disappointingly like flat stage cardboard cut-outs staggered at various depths, the depth gaps between the coulisses being well defined. In contradistinction, the depth gaps between objects are less well defined in monocular stereopsis (except when the objects are distributed over a visible ground plane, as is usually the case), but the pictorial objects look nicely rounded and solid. In fact, the paradoxical stereopsis literature often remarks on the fact that monocular pictorial space looks "better" (more like the real thing) than binocular stereoscopic space (Koenderink, van Doorn, and Kappers, 1994): truly paradoxical! Such effects are striking (people who don't see it typically know that they won't, even before they venture to look) and can easily be demonstrated with an antique Victorian stereoscopic viewer. As already noted, artists have speculated much on the topic. For centuries it was their job to evoke pictorial spaces in their clients. Thus we find advice to artists on how to do this, and to the clients on how to get the most out of it. For instance,
Leonardo (1804) tells you to close one eye and stand at the proper distance in front of a painting. Oblique viewing and binocular viewing have a flattening effect; a wrong viewing distance leads to deformations (see below). Perhaps the culmination of such writings is Adolf Hildebrand's Das Problem der Form (The Problem of Form) of 1893. Hildebrand (1945) understands pictorial space as relief space and describes generic transformations applied by observers when looking at paintings and sculpture. The notion of a relief space indeed gets at the heart of the matter; we will return to it later in this chapter.

Figure 2.1: Many experiments on pictorial depth are of this type: The stimulus is a figure and the response is some data structure that captures aspects of the pictorial relief (here a map of equal-depth curves).
2.3 Psychophysics: Methods
When we started our investigations, there weren't really any adequate psychophysical methods to approach the problem. Thinking in terms of stimulus and response, the former is simple enough: a picture and a viewing method. The latter is more of a problem, though: in order to quantify pictorial objects one needs to measure data structures whose values carry significant geometry (see figure 2.1). This means large data volumes. Consider a simple example. The description of a non-trivially curved surface might take the form of a triangulation with at least a few hundred vertices, thus over a thousand real numbers (say, 3-digit). These have to be collected in half an hour or so; thus we envisage data streams of ca. 10 bits per second. Compare that with the dozen or so yes-no answers that are the typical yield of a classical psychophysical experiment! The data rate had to be improved by many orders of magnitude. Clearly novel methods were needed. In the course of time we developed a number of these. For the sake of conciseness we discuss only a couple of instances in this chapter.
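The arithmetic behind this estimate is easily checked; in the sketch below, the exact vertex count and session length are our own round-number assumptions, chosen to match the figures quoted in the text:

```python
import math

# Assumed numbers, taken from the text's round figures:
vertices = 400                        # "a few hundred vertices"
numbers = 3 * vertices                # x, y, z per vertex -> "over a thousand real numbers"
bits_per_number = 3 * math.log2(10)   # a 3-digit value carries roughly 10 bits
seconds = 30 * 60                     # "half an hour or so"

rate = numbers * bits_per_number / seconds
print(f"about {rate:.0f} bits per second")
```

This lands on the order of 10 bits per second, roughly a thousandfold more than a dozen yes-no answers (about a dozen bits) per half-hour session.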
One general principle of measurement has been repeatedly valuable to us. If you want to measure something you might compare it to a standard. Thus you need a standard and a manner of comparison. In cases of geometrical measurement the standard might be some fiducial object ("gauge figure") and the comparison may be a judgment of "fit" or "coincidence." This is the principle of measuring length with a yardstick, for instance. It occurred to us that it is possible to put gauge figures in pictorial space by superimposing pictures of gauge figures (3D Euclidean ones) on the picture. The first instance of such a method implemented by us was based upon a standard technique in drawing (Rawson, 1969): an oval suggests a circular mark on a slanted plane. We put the oval under manual control of the observer. In this way a "fit" could be obtained in a few seconds. Already in our first trial we sampled about a hundred spatial orientations of pictorial surfaces within a quarter of an hour (Koenderink, van Doorn, and Kappers, 1992). Although we have been rather happy with this particular method and have used it to good advantage in quite a number of studies, the field holds a rather different opinion (fortunately with a few exceptions). We have heard frequent complaints that these methods "don't work" or are otherwise problematic. This came as a surprise to us, since random visitors and many naive persons (in this respect, that is) who came by our laboratory never experienced any problems in our setups. In a few cases we were in a position to try setups that "didn't work." In such cases we had to grant that there were problems, since we weren't able to perform the task ourselves! The problems we noted were of various kinds. For instance, in some cases the gauge figure was rendered in such a way that it didn't adhere to the pictorial surface. This is visually immediately obvious, and the remedy is equally obvious.
(It seems amazing that people running visual experiments wouldn't notice.) In other cases the interface was such as to render the task manually impossible. Again, the remedy is obvious, and it is amazing that anyone involved in human psychophysics would fail to notice. If you ever played the children's game where you write your name using knobs that control Cartesian horizontal and vertical movements, you will understand what we mean by an "impossible" interface. Finally, there were often problems (and errors) in the initial processing of the raw data. This assumes some basic knowledge of differential geometry (Do Carmo, 1976) that is apparently lacking in many laboratories involved in the study of visual form. Such problems have made our research somewhat unpopular. However, we remain firmly dedicated to this general style of approach, which has led to a major step up in our rate of progress on these topics. A finding that relates to the idea that pictorial space is not a true perception but a mere thought construct is that we have encountered a few observers who appeared singular in the time taken to perform the settings. Generic observers take a few seconds, being mainly paced by the slowness of the manual task. In pilot reaction time experiments we find that pictorial space builds up in a fraction of a second and is clearly a "perception" in the sense of "presentational immediacy." Pictorial space simply happens to you, much like sneezing. There is nothing you can do about it, except for closing your eyes or looking away from the picture. No deep thoughts are required. The singular observers (maybe one out of ten; the statistics are only guesswork) take ten to a hundred times longer than typical observers. What might go on in these people? It may be that they are not performing the task in pictorial space (the very crux
of the method), but somehow "reason it out." Indeed, some of these people understand the task in this way: First I estimate the slant and tilt of the pictorial surface, then I adjust the oval in such a way that its perceived slant and tilt appear to have these same values. Now this is exactly what is not intended. Observers need not even know what slant and tilt are, nor do they have to estimate the spatial orientation of the pictorial surface. They simply have to make the oval "look right" (as a circle on the surface). This difference apparently cannot be explained to some people, including (quite a few) colleagues in visual perception with whom we have had rather fruitless correspondence. As reviewers of papers, such people suggest that one should "calibrate the method" by requiring observers to estimate the slant and tilt of isolated ovals, and "correct" the settings in the actual runs accordingly. Notice that this immediately derives from the misrepresentation quoted above. It is indeed possible (Mingolla and Todd, 1986) to let people estimate (e.g., call out values in degrees) the slant and tilt of pictorial surfaces. People hate the task, take a long time doing it, and are very unreliable at it. Such methods have no relation to the gauge figure method. Other frequently used methods involve the indication of the nearest or most remote point of a pictorial surface, either on a surface patch or constrained to a line in the image (van Doorn, Koenderink, and de Ridder, 2001; Koenderink and van Doorn, 2003) (thus a plane in pictorial space). Such methods clearly cannot be done on the basis of local pictorial detail, but have to be done in pictorial space. In a related method we place two dots on a picture and ask the observer which one is closer (Koenderink, van Doorn, and Kappers, 1996). Such a question only makes sense because dots on the picture surface are seen in pictorial space and seem to lie on the nearest pictorial surface.
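Returning to the gauge-figure oval, its geometry is easy to make concrete: under orthographic projection (a simplification we adopt here; the actual stimuli are perspective photographs), a circle lying on a plane with slant σ and tilt τ foreshortens into an ellipse with axis ratio cos σ, the minor axis pointing in the tilt direction. A sketch (the function and its parameters are our own illustration, not the authors' software):

```python
import numpy as np

def gauge_ellipse(slant_deg, tilt_deg, n=120):
    """Picture-plane outline of a unit circle lying on a plane with the
    given slant and tilt, under orthographic projection.  Foreshortening
    by cos(slant) acts along the tilt direction only, so the circle is
    seen as an ellipse with axis ratio cos(slant)."""
    slant, tilt = np.radians(slant_deg), np.radians(tilt_deg)
    t = np.linspace(0.0, 2.0 * np.pi, n)
    e_tilt = np.array([np.cos(tilt), np.sin(tilt)])    # foreshortened axis
    e_perp = np.array([-np.sin(tilt), np.cos(tilt)])   # full-length axis
    pts = (np.outer(np.cos(t), e_perp)
           + np.outer(np.sin(t) * np.cos(slant), e_tilt))
    return pts  # (n, 2) array of picture-plane coordinates
```

A frontoparallel patch (slant 0) yields a circle, while a 60-degree slant yields an oval half as wide along the tilt direction; the observer's manual "fit" of such an oval implicitly reports slant and tilt without the observer ever naming them.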
Again closely related to the latter method is a method where a line is drawn over the picture surface (indicating a plane in pictorial space) and the observer is asked to indicate (a suitable interface being provided) the shape of the intersection of the pictorial relief with that plane (Koenderink, van Doorn, Kappers, and Todd, 2000, 2001), a "normal cut." A final method that we have frequently used is of an altogether different type. We prepare two photographs of a single object, taken from different camera positions (see figure 2.2). Thus the pictures are quite different. We show both photographs simultaneously to the observer. We place a dot on one photograph and ask the observer to place a dot on the second picture such that the dots indicate the same spot on the pictorial objects (Koenderink, Kappers, Pollick, and Kawato, 1997; Koenderink, van Doorn, Arend, and Hecht, 2002; van Doorn, Koenderink, and de Ridder, 2001). This is a very general method indeed. For instance, the task would make sense if the two pictures were portraits of different people, perhaps even if one were replaced with a picture of a horse's head. The observer is allowed the response "no correspondence," in order to avoid conflicts. This is necessary, for instance, when the photographs show different sides of some object. In typical cases of straight photographs of some simple object, observers find the task very easy and can set hundreds of correspondences in a twenty-minute session. Of course this is highly remarkable, since the task cannot be done at all via modern computer vision algorithms (Forsyth and Ponce, 2002). The
task can't be done on pictorial detail; it has to be done in pictorial space (or rather in two pictorial spaces), and that is how our observers tell us they do it. These descriptions don't exhaust our repertoire of methods to quantify pictorial relief, but they are probably sufficient to convey the general idea. So far we have mainly used photographs of rigid, opaque objects, painted white, against simple backgrounds. The objects were somewhat more articulated than is typical for the field, though. About 90% of the literature is on planar patches, simple polyhedra (e.g., cubes), cylinders, spheres, or triaxial ellipsoids. In our view the problem with such shapes is that they present singular cases for most pictorial cues. The generic case involves surfaces with more complicated surface articulations, patches of smoothly joined doubly curved convex, concave, and saddle-shaped surfaces. We prefer such generic cases because they are conceptually simpler than singular cases and lead to results of a general instead of a merely specific (or artificial) nature. It is perhaps a reductionist trait natural to scientists to prefer "simple" stimuli; however, in this case (apparently) simple is actually more complicated! Here "simple" means generic. (See the book by Poston and Stewart (1996) on the notion of "genericity.") A lack of familiarity with the formal tools to handle all but the simplest objects from high school geometry may also have to do with the stimulus preferences of mainstream research.

Figure 2.2: Two pictures of the same object, photographed at different angles. Here the object was rotated by 67.5° about the vertical between exposures. In the method of correspondences an observer is asked to find the location in the right picture corresponding to a given location in the left image. Try it yourself for the white dot.
We used photographs instead of computer graphics renderings (as is usual in the field) because virtually all computer graphics pipelines cut corners in the interest of speed at the expense of physical realism (Koenderink, 1999). We feel that it might be advisable to start research with physically realistic cases. Of course it is somewhat of a burden to produce the stimuli, especially to produce parametric variations on stimuli. However, this is by no means impossible. For instance, moving a
light source in the photographic studio varies the shading parametrically (Koenderink, van Doorn, Christou, and Lappin, 1996a; Koenderink, 1998) (the parameter being the location of the source). It has been somewhat difficult to find sufficiently articulated objects that can be acquired as multiple (identical) copies. For a time we mainly used torsos of dummies sold for clothing display in fashion shops. Although these have served us well, we met with a number of initially unexpected difficulties. For one thing, we were often severely chastised for overstepping the limits of civil morals, especially in the United States, though less so in Europe. Perhaps more importantly, we very frequently met with the remark "but everyone knows what people look like" - on the face of it a strange remark coming from people used to looking at ellipsoids all day! It was suggested that our observers should have been able to perform the task with their eyes closed! This is nonsense, for very few people actually know what humans look like (Hatton, 1904). It takes (academic) artists years to learn the details of the shapes of human bodies. The variation in body shape among the population is immense (Bammes, 1990). Even the dummies sold for fashion display change their shapes (according to current fashion) every year. However, in the face of all this we are looking for alternatives. We are currently experimenting with sweet peppers (painted white). No doubt this, too, will meet with unexpected opposition.
2.4 Findings
2.4.1 Veridicality
A number of early results addressed the problem of veridicality. As explained earlier, we don't think this is a particularly interesting or important issue. However, a few baseline results are notable:
1. different observers yield different results, and so does the same observer—though to a much lesser extent—at different times (all for a single picture);
2. the viewing mode (e.g., monocular, binocular, synoptical, etc.) has a major effect on the pictorial relief;
3. only by accident is the pictorial relief quantitatively like the object that was photographed;
4. the rendering of the picture has an influence on pictorial relief.
Regarding 1 and 2, the differences tend to be mainly of a very specific type, namely, a dilatation or contraction of the depth domain (a Hildebrand relief transformation). Such differences need not be small; we note changes by factors as large as five (van Doorn and Koenderink, 1996; Koenderink, van Doorn, and Kappers, 1994; Koenderink and van Doorn, 2003). In a number of cases we find changes of a more general but very particular nature: different pictorial reliefs (for the same picture, but for different
observers or different tasks) are related through a particular type of shear, that is to say, a transformation of the type:
z'(x, y) = ax + by + cz(x, y) + d,
where z and z' denote the depth before and after the transformation, x and y the picture plane coordinates, and a to d are constants. Such transformations are very precisely of the stated form and equate the reliefs to within the experimental spread (Koenderink, van Doorn, and Kappers, 2000; Koenderink, van Doorn, Kappers, and Todd, 2000, 2001; Cornelis, van Doorn, and de Ridder, 2003). Regarding 3 and 4, one obviously expects qualitative differences when the pictorial cues are changed. These methods allow us to study the effect of cue changes around a natural "set point," which appears to be crucial ("cue conflict" situations and "cue isolation" situations lead to very artificial results that can hardly be extrapolated to real-life cases). For instance, we find very systematic deviations from shape constancy under variations of illumination direction when shading is one of the important pictorial cues (Koenderink, van Doorn, Christou, and Lappin, 1996a; Koenderink, 1998; Koenderink and van Doorn, 2003). Regarding 4, we find that variation over subjects is large when pictorial cues are scarce, whereas results from different subjects come closely into step as the bouquet of available pictorial cues is expanded (Koenderink, van Doorn, Christou, and Lappin, 1996b; Koenderink, van Doorn, Arend, and Hecht, 2002; Koenderink and van Doorn, 2003). This shows how the "controlled hallucination" can run most of the spectrum from almost fully idiosyncratic (faces in clouds) to largely cue driven (looking at pictures of a holiday on the beach).
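Given two reliefs sampled at the same picture-plane positions, the constants a to d can be estimated by ordinary least squares. A minimal sketch (the function name and the synthetic data are our own, for illustration only):

```python
import numpy as np

def fit_relief_transform(x, y, z, z_prime):
    """Least-squares estimate of (a, b, c, d) in the relief
    transformation z' = a*x + b*y + c*z + d relating two pictorial
    reliefs sampled at the same picture-plane positions (x, y)."""
    A = np.column_stack([x, y, z, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z_prime, rcond=None)
    return coeffs  # array [a, b, c, d]

# Synthetic check: a relief sheared and depth-scaled by known constants.
rng = np.random.default_rng(0)
x, y, z = rng.uniform(-1, 1, (3, 200))
z_prime = 0.3 * x - 0.1 * y + 2.0 * z + 5.0
print(fit_relief_transform(x, y, z, z_prime))  # ~ [0.3, -0.1, 2.0, 5.0]
```

With noiseless synthetic data the four constants are recovered essentially exactly; with real settings the residual spread is the "experimental spread" mentioned in the text.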
2.4.2 Influence of Viewing Mode
It is well known that you can influence the apparent depth of relief by changing your viewing mode (Jacobs, 1986b). This works both when looking into pictures and when looking into a real scene, albeit with opposite effects in the two cases. Such viewing modes typically have to be learned. It is part of an artist's training to learn how to look (which is much simpler than learning how to see (Jacobs, 1986b), but it has to be practiced anyway). This creates a problem for the scientific literature, because neither the scientists nor their naive observers typically know how to look, leading to many apparent conflicts in the literature. If you have never experienced strong monocular stereopsis, this is what you should do: stand in front of a realistic painting, e.g., an impressionist landscape. Paintings are good because they are large enough to eliminate problems with accommodation or monocular parallax. Moreover, artists are expert at placing their depth cues strategically and effectively. Stand at what you think is the intended distance (typical fields of view are 40-50°) with your eye at the right position, i.e., frontal (both in height and laterally). Close one eye. Feel free to look around in the painting through eye movements. Now wait, simply look intently. Don't think, look. If you have never experienced monocular stereopsis it may take you half a minute or a minute to acquire it. Even if you are experienced it will take a few seconds. Don't worry, you will know
for sure when stereopsis occurs, because the whole scene changes on you: you are no longer looking at, but into, the painting. The depth becomes real. This then is the experience that turned on many painters and led some scientists to write enthusiastically about "paradoxical stereopsis." The influence of the viewing mode is simple and fully corroborates the accounts given by artists such as Leonardo. Here are the facts: when you look at a picture frontally, with a single eye, you experience a certain pictorial relief. If you switch from monocular to binocular viewing, the relief collapses by a factor that depends on your binocular stereo vision (see figure 2.3). For typical observers the depth range decreases by roughly a factor of two; for stereo-blind observers there is little effect. If one uses a synopter, thus nulling the disparity field, the relief becomes much deeper than for monocular vision. (We are considering only typical observers here.) The difference between binocular and synoptical viewing can be as much as a factor of five (Koenderink, van Doorn, and Kappers, 1994). If you look at the picture monocularly, but obliquely, you lose pictorial relief gradually as the degree of obliqueness increases (van Doorn and Koenderink, 1996). This is one reason2 why pictorial depth is very good when you use a Verant. When you change the viewing mode the pictorial cues evidently remain the same. This is no doubt the reason why one finds only Hildebrand-type relief transformations in these cases.

Figure 2.3: The picture on the top left led to the pictorial reliefs (depth increasing upwards) shown in the bottom row. The viewing modes were (from left to right) binocular, monocular, and synoptical. At the top right is a scatter plot of the monocular depth against the binocular (B) and synoptical (S) depth. The dashed line indicates identity (unit slope).
We almost always find simple, linear scalings of the relief (Koenderink, van Doorn, and Kappers, 1994; Todd, Koenderink, van Doorn, and Kappers, 1996).
2. Additional factors, such as the elimination of accommodation and monocular parallax cues, increase the gain even more.
Figure 2.4: The bottom row shows pictorial reliefs for a single observer for the (geometrically identical!) pictures shown in the top row. Experiments were done in the sequence of increasing cues. Different observers vary greatly on the silhouette, but yield essentially identical responses on the shaded picture.
2.4.3 Influence of Pictorial Cues
Even if the geometrical structure of a picture remains the same, a change of the pictorial cues may well change the structure of pictorial relief. This happens not only for dramatic changes (Koenderink, van Doorn, Christou, and Lappin, 1996b; Koenderink and van Doorn, 2003) (figure 2.4) but also for more subtle, parametric variations. One obvious example concerns shading. One can photograph a single scene under different illuminations and thus produce numerous pictures that are geometrically identical (be sure to mount the camera solidly so that it doesn't move between exposures) yet qualitatively different. Such pictures depict, on cursory examination, "the same scene." In one experiment we systematically varied the position of a (single) light source (Koenderink, van Doorn, Christou, and Lappin, 1996a; Koenderink, 1998; Koenderink and van Doorn, 2003). We found that the pictorial relief was systematically dependent upon the light source position. At first glance the reliefs are rather similar. This might be said to confirm the "shape constancy" hypothesis to some degree. However, the
residuals are significant and clearly correlate with the position of the source. At second glance one might say that convexities in the pictorial object tend to bulge out toward the direction of the source. Thus a sphere looks like an egg with the pointed side towards the light source (Koenderink, van Doorn, and Kappers, 1995; van Doorn, 2000). We found this effect not only in pictorial space, but also in real scenes. In fact the effect becomes very marked if you move a lamp around in the studio: all illuminated forms seem to deform and follow the source. Portrait photographers use this effect to good advantage, e.g., to put a crooked nose straight (Nurnberg, 1948).
2.4.4 Global versus Local Representation
Do observers maintain a "global representation" of pictorial space? We have reasons to doubt it. In gauge figure experiments with the oval we clearly sample local surface attitude (best represented by the local depth gradient, a vector in the picture plane) at a finite number (hundreds) of points in the picture plane (and thus on the pictorial relief). Whether these samples can be "explained" through a global, smooth surface (the "pictorial relief") is something that can be tested. The gradient field should be integrable. Technically, a vector field is integrable if its curl vanishes identically, something that is amenable to statistical test (Koenderink, van Doorn, and Kappers, 1992; Koenderink and van Doorn, 2003). So far we have found no instance where the sampled gradient field failed to be integrable. This is an important fact in its own right. Apparently the observers sample from some smooth pictorial relief. We can perform the integration on the sampled data, and thus produce nice computer graphical renderings of the "pictorial relief." Such renderings are useful in that they summarize the data in a particularly intuitive form. Such pictures should not lead one to assume that similar pictures somehow wander around in the observers' heads, though. Of course, we need not think of homunculi merrily watching internal screens. Such surfaces might be represented in the form of data structures (say, triangulations) that might be addressed in various ways to yield data such as local depth. Whether this is indeed the case is something for empirical verification. We have found that we can predict the answer to the question "which of two points is nearer?" from the integral surface with higher precision than the observers can answer the question themselves (Koenderink and van Doorn, 1995).
Because the integral surfaces were obtained from the observers' earlier local surface attitude judgments, we have to conclude that the observers cannot address a data structure that represents the integral surface. Apparently their representations (in the sense of abstract data structures) are more fragmentary than that. It turns out that observers are about as good as the prediction if the two points happen to lie on a single slope of the pictorial surface, but that they are bad if the points are separated by a ridge (Koenderink and van Doorn, 1993, 1994, 1998) or rut in the relief (Koenderink and van Doorn, 1995).
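The integrability test and the subsequent integration can be sketched numerically. This is a hypothetical illustration in our own notation, not the authors' procedure: a sampled gradient field (p, q) = (∂z/∂x, ∂z/∂y) is integrable when its curl ∂q/∂x − ∂p/∂y vanishes up to noise, and can then be integrated to a smooth relief.

```python
import numpy as np

# Hypothetical sketch (not the authors' code) of the integrability test:
# a sampled gradient field (p, q) = (dz/dx, dz/dy) is integrable when its
# curl dq/dx - dp/dy vanishes up to noise; it can then be integrated.

def curl_residual(p, q, h):
    """RMS of the curl dq/dx - dp/dy, estimated by finite differences."""
    dq_dx = np.gradient(q, h, axis=1)
    dp_dy = np.gradient(p, h, axis=0)
    return np.sqrt(np.mean((dq_dx - dp_dy) ** 2))

def integrate_gradient(p, q, h):
    """Recover z (up to a constant): integrate p along x on each row,
    then q along the first column to fix the row offsets."""
    z = np.cumsum(p, axis=1) * h
    z += np.cumsum(q[:, :1], axis=0) * h
    return z - z[0, 0]

# Example: the gradient field of a smooth bump is exactly integrable.
h = 0.1
x, y = np.meshgrid(np.arange(-2, 2.05, h), np.arange(-2, 2.05, h))
z_true = np.exp(-(x ** 2 + y ** 2))
p, q = -2 * x * z_true, -2 * y * z_true        # dz/dx, dz/dy
print(curl_residual(p, q, h))                  # close to zero
z_rec = integrate_gradient(p, q, h)            # a rendering-ready relief
```

In an experiment the statistical test would compare the curl residual against the scatter of the attitude samples; here the analytic field makes the residual essentially discretization noise.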
2.4.5
Influence of Method
Typically, we find good agreement when we compare results obtained via different methods. It is mainly the scatter in the data that varies; the shape of the pictorial relief remains unaffected (Koenderink, van Doorn, and Kappers, 1996). We find that the
Pictorial Relief
Figure 2.5: For the picture at top right an observer yielded the pictorial reliefs A and B (top row). Because the methods were slightly different, different results were found. A scatter plot (I) of the depths reveals very low correlation. When relief A is transformed according to z'(x, y) = ax + by + cz(x, y) + d with optimal parameter choices, we obtain relief A* and the scatter plot (II) shows an excellent correlation.
scatter depends primarily on the differential order that is being sampled (zeroth order ⇒ depth, first order ⇒ surface attitude, second order ⇒ curvature, etc.) and the degree of locality of the method (measurement at a point, comparison of points at different locations, etc.). The more local, the less the scatter. We find that observers are hard put to use the zeroth order at all, the first order being much easier and far more precise. Possibly, observers are even better at second order tasks, but we haven't tried yet. Spectacularly bad agreement can be found when tasks are used that involve (mental) changes of orientation in the picture plane (Cornelis, van Doorn, and de Ridder, 2003). Apparently observers have great difficulty performing mental rotations in the picture plane. This may also affect the results obtained with apparently very similar interfaces rather dramatically. We have found that the bad agreement of pictorial reliefs in such cases can typically be greatly improved through very simple transformations of the data; thus there is a systematic order in the differences. An obvious way to compare pictorial reliefs is to make a scatter plot of depths at corresponding locations. For a mere change of viewing mode we find R² values in the 0.9-0.99 range. For particularly bad agreement the R² value may not be significantly different from zero. If one does a multiple regression, including the Cartesian picture plane coordinates, such very low correlations often improve spectacularly and even get into the 0.9-0.99 range again (see figure 2.5)! Thus the aforementioned transformations of the type z'(x, y) = ax + by + cz(x, y) + d are indeed very special. We will refer to them as mental movements in pictorial space.
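The regression just described can be sketched concretely. The following is our own synthetic illustration (the reliefs and parameter values are invented): two reliefs sampled at the same picture locations show a near-zero raw depth correlation, yet fitting z' = ax + by + cz + d brings the R² back into the high range.

```python
import numpy as np

# Hypothetical sketch (our data, not an experiment): two reliefs with
# near-zero raw depth correlation are brought into register by a multiple
# regression of the "mental movement" form z' = a*x + b*y + c*z + d.

def r_squared(u, v):
    return np.corrcoef(u, v)[0, 1] ** 2

def mental_movement_fit(x, y, zA, zB):
    """Least-squares fit of zB by a*x + b*y + c*zA + d."""
    M = np.column_stack([x, y, zA, np.ones_like(x)])
    coeff, *_ = np.linalg.lstsq(M, zB, rcond=None)
    return M @ coeff

rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, (2, 500))            # sample points in the picture
zA = np.cos(2 * x) * np.cos(2 * y)             # relief from one method
zB = 1.5 * x - 1.2 * y + 0.5 * zA + rng.normal(0, 0.02, x.shape)  # other method

print(r_squared(zA, zB))                                  # low raw R-squared
print(r_squared(mental_movement_fit(x, y, zA, zB), zB))   # high after the fit
```

The shear terms a and b soak up exactly the kind of systematic difference that a raw depth-depth scatter plot cannot.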
Jan J. Koenderink, Andrea J. van Doorn, and Astrid M. L. Kappers
Figure 2.6: The figure shows physical and pictorial space as separated by the picture surface, but remember that pictorial space is merely virtual (a figment of the mind). A "pixel" could be at any depth and is represented as a taut wire of indefinite length in pictorial space. The mind assigns a depth value by "sliding a bead" along this wire.
2.5
Geometry of Pictorial Space
It is a priori clear that pictorial space has to be non-Euclidian: consider its group of congruences and similarities. For ease of reference we will distinguish the "spatial" dimensions (i.e., the extension of the picture plane or the visual field) and the "depth" dimension. The depth dimension has no immediate existence in the physical world and is thus a virtual (mental, not physical) dimension. The group actions that don't affect the depth dimension are simply the Euclidian movements and similarities of the picture plane or visual field. The transformations envisaged by Hildebrand affect only the depth dimension. Neither type poses any particular problem. The "mixed" transformations are the ones that yield the problems for a Euclidian interpretation of pictorial space. For Euclidian motions would include rotations about axes that are parallel to the picture plane. Such movements would allow you to see the back of the head in a frontal photograph of a face, clearly preposterous! Such motions should be forbidden by the geometry of pictorial space. With some geometrical insight it is easy enough to guess at the correct structure. Think of pictorial space as the picture plane with an infinitely long, taut thread attached to every pixel. All these threads are elongated in a single direction, which we think of as the depth dimension. On each thread we put a bead, thus obtaining an infinite Glasperlenspiel (see figure 2.6). A "pictorial relief" is formed by a swarm of such beads in the form of a curved surface. The position of any bead is controlled by the mind, for the pictorial relief is a mental thing. The mind can't move the threads (these are determined by the picture), but may shift the beads as it pleases. We may think of the movement of the beads as controlled by the mind's interpretation of the "pictorial cues." Any movement has to respect the threads; thus movements conserve a specific family of parallel lines.
Now assume pictorial space to be homogeneous, that is to
say, the same as seen from any of its points. Then you're done: there exists only one homogeneous space (Coxeter, 1989) that conserves a family of parallel lines. It is a certain Cayley-Klein space (Clifford, 1873; Klein, 1871) of zero curvature and a single isotropic direction.³ In such a space rotations about axes parallel to the picture plane are not periodic. Thus you can't "turn a pictorial object over" in order to see its backside! In retrospect the structure discussed above makes very good sense in a number of different contexts. Consider the problem of "shape from X." For many X's (e.g., shape from shading, shape from texture, etc.) this problem has been formally analyzed (albeit under typically very restrictive assumptions) by the computer vision community (Forsyth and Ponce, 2002). In all cases "solutions" fail to be unique but are specified up to certain groups of ambiguity transformations (Koenderink and van Doorn, 1997). In the cases that we understand formally, these ambiguity groups coincide with the rigid motions of pictorial space (Belhumeur and Kriegman, 1998; Koenderink and van Doorn, 1991). The general argument that almost all pictorial cues let one detect deviations from planarity suggests that the "full ambiguity group" (for the bouquet of all pictorial cues) must be of this type. A bold step! If true, this means that the motions of pictorial space cannot be detected through analysis of the pictorial cues. The observer has total freedom to perform such motions without ever coming into conflict with the pictorial cues (the structure of the picture). As we have found empirically, human observers indeed perform such "mental movements"; this is exactly what Gombrich (2000) aptly called "the observer's share," that is, the idiosyncratic (not picture-related) part of the pictorial scene.
2.5.1
Simple Introduction to the Geometry: The 2D Case
A very simple case is that of a plane in pictorial space corresponding to a line in the picture, thus a plane extending in depth with a single spatial dimension. In this case the general similarity can be simply expressed as x' = k1x + a, z' = bx + k2z + c, where x, x' denote the spatial coordinates before and after, z, z' denote depths before and after, and a, b, c, k1, k2 are constants. For k1 = k2 = 1 one has "motions" (congruences); for k1, k2 ≠ 1, similarities. Let us consider the motions first. Consider two points {x, z} and {u, w}. We see that x' − u' = x − u; thus x − u is invariant against arbitrary motions. This makes x − u the perfect candidate for the distance function (Strubecker, 1962). This distance is simply the distance along the picture plane. Now consider the case x = u. Then the distance is zero, yet the points need not be equal because in general z will not equal w (two beads on a single string!). Such points are called "parallel." It is easy to check that for parallel points (and only for parallel points!) z' − w' = z − w. Thus z − w is a good distance measure for parallel points, called the "special distance." In general we define the distance as either the generic or (for the parallel case) the special distance (Strubecker, 1962). Clearly x' = x + a, z' = z describes a mere translation in the image plane, whereas x' = x, z' = z + c describes a depth shift. Such motions (also combinations) are fairly trivial. More interesting is the case x' = x, z' = ax + z; this is a "rotation" about the
³ Here "isotropic" means that stretches extending along an isotropic direction have zero length.
Figure 2.7: The Hermann von Helmholtz stamp (left) with two rotated copies: at the center the result of a Euclidian rotation; at the right a rotation in pictorial space, both rotations over one radian. Of course this illustration has to be consumed cum grano salis, for we took the vertical dimension of the stamp to represent depth: don't get confused!

"angle" a. Since the parameter a can take any value between −∞ and +∞, we see that rotations and angles are not periodic in this plane; thus the plane is definitely non-Euclidian (see figure 2.7). The rotation only changes the depth; thus all "beads" move along their "threads" (as they should). The frontoparallel line z(x) = z0 is no longer frontoparallel after a rotation (namely, z(x) = z0 + ax), but has slope a. Clearly the slope is the tangent of the Euclidian angle, or, equivalently, the depth gradient (dz/dx = a). This yields a simple interpretation of the non-Euclidian angle. Notice that rotations don't affect distances between points. Because the slope of any line is changed by the same amount, the angle subtended by two lines (the difference of their slopes) is also not affected by rotations. This is simply what one expects from congruences, of course (Jaglom, 1979). Lines extending purely into depth (the "threads" of the beads model) have infinite slope. They subtend infinite angle with any generic line. One says that they are normal to any line. The concept of "normal" is not very useful in this geometry, since all normals are parallel! Instead of normals one uses slopes. Next consider pure similarities, i.e., x' = k1x, z' = k2z. We differentiate between similarities "of the first kind" with k2 = 1 and similarities "of the second kind" with k1 = 1. The similarities of the first kind merely scale the spatial coordinate (for x' − u' = k1(x − u)), whereas those of the second kind are seen to scale the angles (for dz'/dx' = k2 dz/dx).
A general similarity has two distinct magnification factors, one for the distances and one for the angles, quite unlike the Euclidian plane. In the Euclidian plane angles can't be scaled because they are periodic. Consider the "unit circle" x² = 1, that is, the locus of all points at unit distance
from the origin. It consists of the normal lines x = ±1. It can be rotated in itself and is convenient as a protractor to measure (non-Euclidian) angles. Such a circle is referred to as a circle "of the first kind," because there are other, very useful, ways to define circles. A unit circle "of the second kind" is z(x) = x²/2. It can also be moved along itself (by a rotation combined with a shift). Both types of circle satisfy many of the properties of the familiar circle in the Euclidian plane (Sachs, 1987). Notice that you have (for a circle of the second kind) dz/dx = x; thus the slope equals the arc length. Consequently, the non-Euclidian angles are simply (non-Euclidian) "radians." The rate of change of slope is d(dz/dx)/dx = d²z/dx² = 1, which is the curvature. Likewise, a circle z(x) = x²/2R is seen to have curvature 1/R, thus radius R. It is possible to turn the Euclidian plane into the non-Euclidian plane by a very simple trick (Jaglom, 1979). Let {x, y} be Cartesian coordinates of the Euclidian plane, the metric being given as ds² = dx² + dy². Now we dilate the y-axis by some large factor F (say). The x-coordinate is not affected, but the y-coordinates are scaled by 1/F. Thus the metric becomes ds² = dx² + (dy/F)², which tends to ds² = dx², the metric of the non-Euclidian plane, as we increase F beyond bounds. Thus the non-Euclidian plane is simply the "infinitesimal" neighborhood of the x-axis of the Euclidian plane. Indeed, all geometrical constructions discussed above are intuitively obvious if you regard them as limiting cases of their Euclidian equivalents. This nicely illustrates the role of the depth dimension as a "virtual" (vanishing spatial extent) dimension.
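The 2D identities above are easy to verify numerically. The following is our own illustrative sketch (the point coordinates and parameter values are invented): a rotation over "angle" a maps (x, z) to (x, ax + z), conserving generic distances, special distances, and slope differences, while composing by addition and hence never being periodic.

```python
import numpy as np

# Hypothetical illustration of the isotropic plane: a rotation over
# "angle" a maps (x, z) -> (x, a*x + z).

def rotate(x, z, a):
    return x, a * x + z

a = 1.7

# generic distance x - u is invariant:
(x1, z1), (x2, z2) = (0.5, 1.0), (2.0, -1.0)
(x1r, _), (x2r, _) = rotate(x1, z1, a), rotate(x2, z2, a)
assert np.isclose(x1r - x2r, x1 - x2)

# for parallel points (equal x) the special distance z - w is invariant:
(x3, z3), (x4, z4) = (1.0, 0.3), (1.0, 2.3)
assert np.isclose(rotate(x3, z3, a)[1] - rotate(x4, z4, a)[1], z3 - z4)

# a line z = m*x + c acquires slope m + a, so slope differences (the
# non-Euclidian angles) are conserved:
m, c = 0.4, -0.2
p0, p1 = rotate(0.0, c, a), rotate(1.0, m + c, a)
slope_after = (p1[1] - p0[1]) / (p1[0] - p0[0])
assert np.isclose(slope_after, m + a)

# rotations compose by adding their parameters, hence are never periodic:
assert np.allclose(rotate(*rotate(x1, z1, a), a), rotate(x1, z1, 2 * a))
print("isotropic-plane identities verified")
```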
2.5.2
The 3D Case
The 3D case is very similar in spirit to the simple 2D case, but the group of proper motions and similarities is much richer (Strubecker, 1941; Sachs, 1990). Notice that the similarities that leave the pixels in place (x' = x, y' = y) transform the depth ("shift the beads") according to z' = ax + by + cz + d, i.e., exactly the "observer's share" as we have found empirically in many experiments (Cornelis, van Doorn, and de Ridder, 2003; van Doorn, Koenderink, and de Ridder, 2001; Koenderink, van Doorn, and Kappers, 2000; Koenderink, van Doorn, Kappers, and Todd, 2000, 2001; Koenderink and van Doorn, 2003). Here the parameters (a, b) denote a (non-Euclidian) rotation that allows mental movements to turn any generic plane into a frontoparallel plane! The parameter c describes a similarity of the second kind, that is, a Hildebrand-style relief transformation. The parameter d, finally, denotes a depth shift. In our experiments we cannot measure depth shifts, since we tend to measure surface attitudes or curvatures. In practice observers are hard put to assign absolute distances to pictorial objects; thus the parameter d might as well be ignored altogether. It is possible to work out the complete differential geometry for this space (Sachs, 1990). This is of much interest, since it leads to definitions of "pictorial shape" as the invariants under mental movements. Notice that pictorial shape is different from Euclidian shape because "shapes" are (by definition) invariants under arbitrary displacements, whereas the groups of displacements (congruences) are quite different in the two geometries.
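One way to see why such invariants exist is to check numerically that the second-order structure of a relief survives a mental movement z' = ax + by + cz + d up to the single factor c. This is our own sketch (the relief and parameter values are invented), checking only ∂²/∂x² for brevity:

```python
import numpy as np

# Hypothetical check: under z' = a*x + b*y + c*z + d the shear (a, b) and
# the depth shift d drop out of the second derivatives, which pick up only
# the factor c. Curvature-based invariants can thus define "pictorial shape."

x, y = np.meshgrid(np.linspace(-1, 1, 101), np.linspace(-1, 1, 101))
h = x[0, 1] - x[0, 0]
z = np.exp(-(x ** 2 + y ** 2))                 # some pictorial relief

a, b, c, d = 0.7, -1.3, 2.0, 5.0
z_moved = a * x + b * y + c * z + d            # after the mental movement

def second_x(f, h):
    """Numerical second derivative along x."""
    return np.gradient(np.gradient(f, h, axis=1), h, axis=1)

print(np.allclose(second_x(z_moved, h), c * second_x(z, h)))   # True
```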
2.5.3
The Panoramic Visual World
In some cases the geometry discussed above seems misplaced, namely, whenever the observer is confronted with a panoramic field of view. The situation is a complicated one though, since—at least in the case of photographs—one has to reckon with two distinct fields of view, namely that of the camera and that under which the observer views the picture. In many instances of daily life these two are very different. A "normal" sized picture on a page (say) might be 5 × 7 in; seen at a normal reading distance of 10 in, this means a viewing angle of 28 × 38.6°. This conforms closely to the field of view of a miniature camera (24 × 36 mm frame) fitted with a 50 mm "normal" lens (field of view 27 × 39.6°). Of course this is no accident: such pictures appear "normal" enough. Now suppose I use a very long telephoto lens, ca. 40 cm focal length. Now the field of view of the camera is 3.4 × 5.2° whereas the field of view of the viewer is still 28 × 38.6°, i.e., much larger. Most people consider such images "unnatural" because the pictorial space looks extremely flattened. Next fit the camera with an extremely wide angle lens, say 15 mm focal length. Now the field of view of the camera is 77.3 × 100.4° whereas the field of view of the viewer is still 28 × 38.6°, i.e., much smaller. Again, most people consider such images "unnatural" because pictorial objects look extremely deformed (Pirenne, 1970). Typically people blame the lenses for this. Yet both telephoto and wide-angle lenses deliver perfect perspective images; they are not to blame. It is simply that the "correct" viewing distances would be (roughly) 3 in for the wide angle and 80 in for the telephoto lens, and nobody is willing (or even able) actually to view the pictures from these "correct" distances. A different issue is whether observers "correct" for the divergence of their visual rays. Perhaps surprisingly, we have reason to believe that they don't.
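The viewing-angle arithmetic above follows from a single relation: an extent s viewed from distance d subtends 2·atan(s/(2d)). A short check (our own sketch) reproduces every figure quoted in the text:

```python
import math

# The viewing-angle arithmetic from the text: an extent s viewed from
# distance d subtends 2*atan(s / (2*d)).

def fov_deg(size, distance):
    return math.degrees(2 * math.atan(size / (2 * distance)))

print(fov_deg(5, 10), fov_deg(7, 10))      # 5 x 7 in print at 10 in: ~28 x 38.6 deg
print(fov_deg(24, 50), fov_deg(36, 50))    # 24 x 36 mm frame, 50 mm lens: ~27 x 39.6 deg
print(fov_deg(24, 400), fov_deg(36, 400))  # 400 mm telephoto: ~3.4 x 5.2 deg
print(fov_deg(24, 15), fov_deg(36, 15))    # 15 mm wide angle: ~77.3 x 100.4 deg

# "correct" viewing distances scale the 10 in reading distance by f / 50 mm:
print(10 * 15 / 50, 10 * 400 / 50)         # 3.0 and 80.0 inches
```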
It is easy enough to convince oneself of this fact: build a children's peepshow from a large cardboard box and prepare it for a visual field of about 120°. Put a picture (a long strip is best) bent along a cylinder with its axis at the peephole in the box and illuminate it through a source located near the peephole (the idea is to prevent shading). Use a uniform texture (e.g., equal sized polka dots) for the picture. Now take a peep and try to judge the pictorial relief. You will need eye movements to view most of the picture because it exceeds the eye's field of view. Try to avoid motion parallax cues. What we see is something most akin to a frontoparallel plane, and not a circular cylinder about the eye! Judge for yourself. As one might have expected, the "shape from texture" is apparently reckoned with respect to the local visual directions and the mind doesn't take the divergence of visual angles into account at all. In this respect vision seems very similar to active touch, where local rod orientation seems to be referred to the hand frame instead of the body frame, leading blindfolded observers to commit judgment errors of up to 90° (Kappers and Koenderink, 1999). This is something one should experience: most people don't believe they are capable of errors like that until one demonstrates this to their satisfaction (or horror)! A geometrical model of what is happening in panoramic pictorial space is the following (Koenderink, 2003) (see figure 2.8). Describe the spatial positions⁴ of points with respect to the observer in terms of the radial distance from the observer ϱ and the azimuthal angle φ. The radius runs from 0 (the observer) to ∞, and the azimuth runs
⁴ To keep the discussion simple we only consider the horizontal plane here.
Figure 2.8: Two figures of a planar, panoramic visual world before (left) and after (right) the application of a rotation in pictorial space. Here the visual field is very large: nearly the full horizon, apart from a small gap. The "straight ahead" direction is towards the right. Notice that the rotation does not affect the radii (pixels!) and that the equidistance circles shear to become equiangular (constant slant) spirals.

from −π/2 (leftwards) through 0 (straight ahead) to +π/2 (rightwards). In practice the visual field may be more limited than that, of course. Now we map points {ϱ, φ} onto points {u, v} in a "mental space" according to a transformation of the type u = φ, v = log ϱ, which places the observer (ϱ = 0) at infinite depth (at v =
−∞; i.e., it is outside the space) as is indeed intuitively evident (Wittgenstein, 1922). Its position is not at all indicated by the convergence of visual rays towards the eye, because the visual rays fail to converge in the mental space.
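The shear-into-spiral behavior noted in figure 2.8 can be sketched numerically. This is our own reconstruction under an assumed log-polar style mapping u = φ, v = log ϱ (which places the observer, ϱ = 0, at v = −∞); the parameter values are invented:

```python
import numpy as np

# Hypothetical sketch of the panoramic model, assuming a mapping u = phi,
# v = log(rho). A pictorial rotation v' = v + a*u then shears an
# equidistance circle into an equiangular (constant slant) spiral.

a, rho0 = 0.5, 2.0
phi = np.linspace(-np.pi / 2, np.pi / 2, 181)

u = phi
v = np.full_like(phi, np.log(rho0))        # equidistance circle rho = rho0
v_rot = v + a * u                          # rotation in pictorial space
rho_rot = np.exp(v_rot)                    # back to physical distances

# result: a logarithmic spiral rho = rho0 * exp(a*phi) ...
assert np.allclose(rho_rot, rho0 * np.exp(a * phi))
# ... whose slant against the visual rays, d(log rho)/d(phi), is constant:
print(np.allclose(np.gradient(np.log(rho_rot), phi), a))   # True
```

Since the rotation acts only on v, the radii (the "pixels" of the panoramic picture) are indeed untouched, just as the figure caption states.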
2.6
What Next?
Since we started out on our quest to make pictorial space amenable to objective, quantitative, empirical and theoretical study, we have evidently made some progress (Koenderink, van Doorn, and Kappers, 2000). Yet the work reported so far has perhaps resulted in more questions than answers. We have now gained a position from which we start to have an initial overview of the problems; it appears to us that the study of pictorial space is still very much in its infancy. So what's next? On the conceptual side one sorely misses a solid working knowledge of the "pictorial cues." The term "pictorial cue" itself is perhaps a misnomer, since it suggests a preordained, (very) finite set of discrete entities called cues, each of which would by itself be sufficient to add an independent bit of geometrical structure to the pictorial configuration evoked by the picture. This is clearly nonsense. Whereas it is clearly the case that the image structure is the cause of the pictorial configuration, it is also clear that this requires not only the picture, but also the observer. No doubt a newborn would come up with a different pictorial configuration, if any. This opens the possibility that even the mature observer might not fully exploit the available structure. What is available anyway? That clearly depends upon the type of image and the expertise of the observer. An analysis of the "cues" involves an analysis of the ecological optics (Gibson, 1950) in the setting of the observer's biotope (or Umwelt in von Uexküll's sense (von Uexküll, 1909)). This again involves two types of expertise, namely a generic understanding of the causal nexus ("Laws of Optics") and specific knowledge of the generic structure of the biotope (in a statistical sense). Such a program has never been carried out for typical human biotopes (natural scenes, city environments, office environments, etc.).
One needs to know what can possibly be estimated, on the basis of what types of image structure, with what kind of probability measure, and what the structure of the group of remaining ambiguities is like. What is available today is precious little. Psychology, biology, and computer vision have merely scratched the surface. One hopes that much can be done in a fairly general way, so that progress might be made without having to accumulate too much encyclopedic knowledge. Our construction of the geometry of pictorial space on the basis of a few general arguments is an example of such a procedure. Eventually the many lacunae have to be filled, though. On the empirical side one needs to forge more, and especially more powerful, tools to probe pictorial space. It will be necessary to sample many more properties than we have been able to so far. The reason this is necessary is that "pictorial space" is a catchy term for something that is unlikely to be a "space" in the usual geometrical sense at all. It is hard to say what a useful formal description might eventually look like. We would guess something like a number of formal spaces with various degrees of inner coherency and only weak mutual interactions. There are numerous reasons to believe that our current terminology leads to many unfortunate inconsistencies. It
is a bit like the situation in astronomy, where the distances to remote objects have to be labeled by the method of estimation, each method yielding a different estimate. In the case of astronomy, though, one has the conviction that all such estimates are estimates of a single true distance. Such a conviction is likely to be misplaced in the case of pictorial space. A distance, surface attitude, or curvature is not something for which a "true" value exists even if it happens to be unknown. It is entirely possible that "the same" curvature turns out to be different if one changes the way it is sampled. Nor is it necessary that curvature implies a certain trend of slope, and so forth. This is why one should be wary of comparisons between entities to which we assign the same name (e.g., "curvature") but which are actually to be distinguished. The literature is rife with most unfortunate examples. Is there a bottom line? If any, it is probably the cheerful view that the study of pictorial space is certain to be a very rewarding one that might easily consume one's professional life without so much as a chance of being brought to a conclusion.
Acknowledgment We dedicate this chapter to David (or Martin?) Regan, whom we have always respected as a serious player in our chosen field which for all of us is quite a different game from the physics we were raised in.
References

Balzer, R. (1998). Peepshow: A Visual History. Harry N. Abrams: New York.
Bammes, G. (1990). Studien zur Gestalt des Menschen. Ravensburger Buchverlag Otto Maier GmbH: Ravensburg.
Belhumeur, P. M. and Kriegman, D. (1998). What is the set of images of an object under all possible illumination conditions? Int. J. Comp. Vis., 28: 245-260.
Brentano, F. (1874). Psychologie vom empirischen Standpunkte. Duncker and Humblot: Leipzig.
Claparède, E. (1904). Stéréoscopie monoculaire paradoxale. Annales d'Oculistique, 132: 465-466.
Clifford, W. K. (1873). Preliminary sketch of biquaternions. Proc. Lond. Math. Soc., 4: 381-395.
Cornelis, E. V. K., van Doorn, A. J. and de Ridder, H. (2003). Mirror reflecting a picture of an object: What happens to the shape percept? Percept. Psychophys., 65: 1110-1125.
Coxeter, H. S. M. (1989). Introduction to Geometry. Wiley: New York.
Da Vinci, L. (1804). Trattato della pittura di Lionardo da Vinci. Società Tipografica de' Classici Italiani: Milan.
Do Carmo, M. (1976). Differential Geometry of Curves and Surfaces. Prentice Hall: Englewood Cliffs, NJ.
Forsyth, D. and Ponce, J. (2002). Computer Vision: A Modern Approach. Prentice Hall: Upper Saddle River, NJ.
Gibson, J. J. (1950). The Perception of the Visual World. Houghton Mifflin: Boston.
Gibson, J. J. (1970). On the relation between hallucination and perception. Leonardo, 3: 425-427.
Gombrich, E. H. (2000). Art and Illusion: A Study in the Psychology of Pictorial Representation. Princeton University Press: Princeton, NJ.
Hatton, R. G. (1904). Figure Drawing. Chapman and Hall: London.
Hildebrand, A. (1945). The Problem of Form in Painting and Sculpture. Translated by M. Meyer and R. M. Ogden. (Das Problem der Form, first German edition, 1893.) Stechert: New York.
Jacobs, T. S. (1986a). Light for the Artist. Watson-Guptill Publications: New York.
Jacobs, T. S. (1986b). Drawing with an Open Mind: Reflections from a Drawing Teacher. Watson-Guptill Publications: New York.
Jaglom, I. M. (1979). A Simple Non-Euclidian Geometry and its Physical Basis: An Elementary Account of Galilean Geometry and the Galilean Principle of Relativity. Transl. A. Shenitzer, ed. B. Gordon. Springer: New York.
Kappers, A. M. L. and Koenderink, J. J. (1999). Haptic perception of spatial relations. Percept., 28: 781-795.
Klein, F. (1871). Über die sogenannte nichteuklidische Geometrie. Mathematische Annalen, 6: 112-145.
Koenderink, J. J. (1998). Pictorial relief. Phil. Trans. R. Soc. Lond. A, 356: 1071-1086.
Koenderink, J. J. (1999). Virtual psychophysics. Percept., 28: 669-674.
Koenderink, J. J. (2003). Monocentric optical space. In N. Petkov and M. A. Westenburg (Eds.), Computer Analysis of Images and Patterns, LNCS 2756: 689-696. 10th Int. Conf. CAIP 2003. Springer: Berlin.
Koenderink, J. J. and van Doorn, A. J. (1991). Affine structure from motion. J. Opt. Soc. Am. A, 8: 377-385.
Koenderink, J. J. and van Doorn, A. J. (1993). Local features of smooth shapes: ridges and courses. In B. C.
Vemuri (Ed.), Geometric Methods in Computer Vision II. Proc. SPIE 2031: 2-13.
Koenderink, J. J. and van Doorn, A. J. (1994). Two-plus-one-dimensional differential geometry. Pattern Rec. Lett., 15: 439-443.
Koenderink, J. J. and van Doorn, A. J. (1995). Relief: pictorial and otherwise. Image and Vision Computing, 13: 321-334.
Koenderink, J. J. and van Doorn, A. J. (1997). The generic bilinear calibration-estimation problem. Int. J. Comp. Vis., 23: 217-234.
Koenderink, J. J. and van Doorn, A. J. (1998). The structure of relief. In P. W. Hawkes (Ed.), Advances in Imaging and Electron Physics, 103: 65-150.
Koenderink, J. J. and van Doorn, A. J. (2003). Pictorial space. In H. Hecht, R. Schwartz and M. Atherton (Eds.), Looking into Pictures: An Interdisciplinary Approach to Pictorial Space. MIT Press: Cambridge, MA.
Koenderink, J. J., van Doorn, A. J., Arend, L. and Hecht, H. (2002). Ecological optics and the creative eye. In D. Heyer and R. Mausfeld (Eds.), Perception and the Physical World, pp. 271-304. John Wiley and Sons: New York.
Koenderink, J. J., van Doorn, A. J., Christou, C. and Lappin, J. S. (1996a). Perturbation study of shading in pictures. Percept., 25: 1009-1026.
Koenderink, J. J., van Doorn, A. J., Christou, C. and Lappin, J. S. (1996b). Shape constancy in pictorial relief. Percept., 25: 155-164.
Koenderink, J. J., van Doorn, A. J. and Kappers, A. M. L. (1992). Surface perception in pictures. Percept. Psychophys., 52: 487-496.
Koenderink, J. J., van Doorn, A. J. and Kappers, A. M. L. (1994). On so-called paradoxical monocular stereoscopy. Percept., 23: 583-594.
Koenderink, J. J., van Doorn, A. J. and Kappers, A. M. L. (1995). Depth relief. Percept., 24: 115-126.
Koenderink, J. J., van Doorn, A. J. and Kappers, A. M. L. (1996). Pictorial surface attitude and local depth comparisons. Percept. Psychophys., 58: 163-173.
Koenderink, J. J., van Doorn, A. J. and Kappers, A. M. L. (2000). Surfaces in the mind's eye. In R. Cipolla and R. Martin (Eds.), The Mathematics of Surfaces IX, pp. 180-193. Springer: London.
Koenderink, J. J., van Doorn, A. J., Kappers, A. M. L. and Todd, J. T. (2000). Directing the mental eye in pictorial perception. In B. E. Rogowitz and T. N. Pappas (Eds.), Human Vision and Electronic Imaging, V: 2-13. SPIE-The International Society for Optical Engineering: Bellingham, USA.
Koenderink, J. J., van Doorn, A. J., Kappers, A. M. L. and Todd, J. T. (2001). Ambiguity and the 'mental eye' in pictorial relief.
Percept., 30: 431-448.
Koenderink, J. J., Kappers, A. M. L., Pollick, F. E. and Kawato, M. (1997). Correspondence in pictorial space. Percept. Psychophys., 59: 813-827.
Mingolla, E. and Todd, J. T. (1986). Perception of solid shape from shading. Biol. Cybern., 53: 137-151.
Needham, T. (1997). Visual Complex Analysis. Clarendon Press: Oxford.
Nurnberg, W. (1948). Lighting for Portraiture. The Focal Press: London.
Pirenne, M. H. (1970). Optics, Painting and Photography. Cambridge University Press: Cambridge.
Poston, T. and Stewart, I. (1996). Catastrophe Theory and its Applications. Dover Publications: Mineola, NY.
Rawson, P. (1969). Drawing. Oxford University Press: London.
Rorschach, H. (1921). Psychodiagnostik. Bircher: Bern.
Sachs, H. (1987). Ebene isotrope Geometrie. Vieweg: Braunschweig/Wiesbaden.
Sachs, H. (1990). Isotrope Geometrie des Raumes. Vieweg: Braunschweig/Wiesbaden.
Strubecker, K. (1941). Differentialgeometrie des isotropen Raumes I. Sitzungsberichte der Akademie der Wissenschaften Wien, 150: 1-43.
Strubecker, K. (1962). Geometrie in einer isotropen Ebene. Der Mathematische und Naturwissenschaftliche Unterricht, 15: 297-306 and 343-394. F. Dümmlers Verlag: Frankfurt.
Todd, J. T., Koenderink, J. J., van Doorn, A. J. and Kappers, A. M. L. (1996). Effects of changing viewing conditions on the perceived structure of smoothly curved surfaces. J. Exp. Psych.: Hum. Percept. Perf., 22: 695-706.
van Doorn, A. J. (2000). Shape perception in different settings. In B. E. Rogowitz and T. N. Pappas (Eds.), Human Vision and Electronic Imaging, V: 697-708. SPIE-The International Society for Optical Engineering: Bellingham, USA.
van Doorn, A. J. and Koenderink, J. J. (1996). How to probe different aspects of surface relief. In A. M. L. Kappers, C. J. Overbeeke, G. J. F. Smets and P. J. Stappers (Eds.), Studies in Ecological Psychology, pp. 115-130. Delft University Press: Delft, The Netherlands.
van Doorn, A. J., Koenderink, J. J. and de Ridder, H. (2001). Pictorial space correspondence in photographs of an object in different poses. In B. E. Rogowitz and T. N. Pappas (Eds.), Human Vision and Electronic Imaging, VI: 321-329. SPIE-The International Society for Optical Engineering: Bellingham, USA.
von Rohr, M. (1904). Linsensystem zum einäugigen Betrachten einer in der Brennebene befindlichen Photographie. Kaiserliches Patentamt Patentschrift Nr. 151312, Klasse 42h.
von Rohr, M. (1920). Die binokularen Instrumente. Springer: Berlin.
von Uexküll, J. (1909). Umwelt und Innenwelt der Tiere. Springer: Berlin.
Wittgenstein, L. (1922). Tractatus Logico-Philosophicus. German text with an English translation en regard by C. K. Ogden; with an introduction by B.
Russell. Routledge and Kegan Paul: London. Zeiss, C. (1907). Instrument zum beidaugigen Betrachten von Gemalden u.ggl. Kaiserliches Patentamt Patentschrift Nr. 194480 Klasse 42h Gruppe 34.
3. Geometry and Spatial Vision

Gerald Westheimer

When Martin Regan (2000) entitled his book Human Perception of Objects, he also gave it the subtitle Early Visual Processing of Form, implying that it dealt not with intensity or chromaticity but with what may be called the extensive attribute in a visual percept: where an object appears to be and, within an object, what relative locations are occupied by its components. In other words, spatial vision. This essay examines some of the implications of using the concept "space" in connection with vision and what, if anything, can be learned from the role it plays in other areas of scholarly activity. The study of perception as a scientific discipline can of course be conducted entirely within its own grounds. Nevertheless, there is a tradition in vision research of seeking connections with other sciences. Support - underpinning is too strong a word - from anatomy can help in such areas as acuity or vision in the retinal periphery. Investigation of the light sense has been informed for the last 150 years by the physiology, biochemistry, and now molecular biology of visual pigments. The statistics of photon capture is a prime example of a link to physics, as is the whole area of visual optics and the specification of the light stimulus. For spatial vision, at first glance it might appear that a study and then application of geometry could play a similar role; after all, isn't the German word for geometry "Raumlehre"? Geometry, though a branch of mathematics, looms large in physics and is of interest even in philosophical circles. The link between geometry and spatial vision surely deserves as thorough an investigation as reaching into quantum mechanics in pursuit of the roots of the eye's light sense, or applying diffraction theory to explore optical imaging in the eye. The single word that establishes the common ground between all the enterprises just mentioned - vision, physics, mathematics, and philosophy - is space.
It refers to a manifold of several dimensions and is given more specific connotations as it enters the discourse in the different disciplines. The one in mathematics is particularly simple: a manifold in which the elements obey relationships with the sole limitation that they not be self-contradictory. The relevant concept here is visual space. When we open our eyes and become aware, we visually encounter elements in different locations. They have attributes or qualities of intensity, such as brightness and color, and of temporal succession, but for the moment these are abstracted. What remains is the elements' extensive, side-by-side quality - they appear in different locations - the aggregate of which constitutes visual space, an immediate perceptual experience that may be amenable to the logico-deductive process but is not dependent on it. The subjective nature of this visual space - its personal character, private to the beholder and accessible to others only through some process of mediation - needs to be emphasized. On the other side of the divide, mankind has collectively created, through individual experience and common agreement, a very robust object space framing the external world in which we live. It has been explored and described with wonderful precision by physicists over the centuries, using inventions such as meter sticks, theodolites, and light beams. This physical space of objects is a metrical space; that is, distances and angles and other properties of configurations can be expressed, ordered, and compared numerically. Although physical space has always been easy to intuit, demands for its precise characterization have given rise to some formalisms which allow its structure to be dealt with explicitly. When they codified procedures for planning agricultural plots, buildings, and fortifications, people in Euclid's time probably didn't give much thought to the ultimate legitimacy of working with points, lines, angles, and parallelism. But when professional mathematicians became involved two or three centuries ago, this changed. They began to examine the concepts used by Descartes and Newton, with the result that some apparent contradictions emerged. That these contradictions were neither trivial nor safely ignorable was demonstrated when non-Euclidean geometry was discovered, derived in the first instance by examining how robust Euclid's axiom of parallelism was. The nineteenth century was a particularly active period in subjecting the simple, intuitive physical space to thorough scrutiny and conceptually deep underpinning.
When, at the beginning of the twentieth century, physics needed to break out of the rigid shell of Euclidean space, most of the groundwork had been laid. In contrast to the pragmatists who plied their trade in mathematical physics (Einstein (1921/1972), in an influential essay, remarked that insofar as the laws of mathematics apply to reality they are not certain, and insofar as they are certain they don't apply to reality), schools of mathematical fundamentalists agonized in detail over the meaning and validity of every word in their theorems. In particular, they wondered about the minimum number of self-evident - neither proved nor provable - statements needed in erecting their edifices. Being mathematicians they could not take recourse to the constitution of the physical universe and, on the other hand, they had trouble facing the proposition, obvious to Kant, that they brought to their study properties inherent in the human mind, preceding any of their reasoning. The preoccupation with what is axiomatic was only occasionally connected with what is immanent in the human mind. And yet, when it comes to geometry, it was inescapable to consider what Kant called Anschauungsraum, the manifold of perception, or, as we now call it, visual space. There is no definition of a straight line beyond the axiom that there is one and only one between two points. We come here to the crux of our consideration: can the propositions developed by means of the logico-deductive method (as distinct from instant intuitive insight), either from physical object space or de novo by mathematicians, help in understanding the processes of human space perception? Given the enormous body of geometrical scholarship, can any of it be taken off the shelf or adapted to the task of charting and
understanding and making rules about, to use phrases from the title of Martin Regan's book, "object perception" and "visual processing of form"? This question was a crucial motive in turning Helmholtz's attention, as soon as he finished the last volume of his Physiological Optics in 1866, to mathematical problems of space and number theory. In 1868 he gave a talk in Göttingen entitled "Concerning the Facts that Underlie Geometry" in which he explained that his studies of spatial perception in the visual field had prompted him to delve into the nature and origin of our understanding of space in general. In passing he mentioned not only the visual space of objects, but also that of colors, which over the centuries has attracted considerable interest and no little mathematical analysis. I believe that Helmholtz, ever the consistent thinker, came to the end of his rope after putting space perception through the grinder of his methodical from-the-ground-up analysis. He surely, in his heart, was dissatisfied with his "unconscious inference" proposition which, to the dyed-in-the-wool mathematical physicist that he was, he must have regarded as a cop-out. He was not a psychologist and is bound to have thought that Fechner's thesis of psychophysics as the "study of the functional relationship between the material and the mental" had to be tackled from the material end. Helmholtz's forays into the origins of the axioms of geometry, although taken seriously by mathematicians, had no fallout in vision research. Yet, overall, the ground for this kind of approach was fertile: Helmholtz's (and Maxwell's and Rayleigh's) example of thorough analysis of vision from a physicist's perspective reverberated strongly. Sparked by Gauss, by the pioneers of non-Euclidean geometry, and, in particular, by Riemann, an enormous body of geometrical scholarship had grown. Though its motivation was mathematics for its own sake, it was available when the need arose in physics.
We are intrigued as to whether the same applies to perception. The discussion here centers on the spatial relationship between unambiguously demarcated elements in visual space. A physicalist armamentarium has been employed with immense élan and success in the elucidation of the processes through which these markers arise - optics of the eye, retinal structure and function, neural processing - all leading to a good working knowledge of what goes on in the eye and visual cortex when we are shown, say, a short line in the fovea. But in the inquiry of how to apply theorems of geometry to vision, the luminance and chromaticity attributes of visual stimuli may very well be abstracted at the beginning. This is not a trivial point. In the Fourier theory of vision, for example, the emphasis is largely on modulation sensitivity functions, where the endpoint of measurement is merely the loss of spatial homogeneity of a patch, and not a parameter pertaining to spatial extent. Because it is approached as a theory of spatial processing of luminance, the significance of phase in discerning spatial relationships has, as a consequence, been neglected too long. On the other hand, the spatial relationship between elements (does this line bisect the distance between the lines flanking it on either side?) or of identifiable components within an element (Is the line straight or curved? Is the triangle right-angled?) properly belongs to the category of questions in which one might be justified in expecting aid from geometry. Fortunately, by the nature of the problem, consideration does not have to be confined to visual space taken entirely by itself; inquiry can take the more amenable form of the mapping of physical object space onto perceptual space. The elements intrinsic
to visual space obviously have attributes such as contours with identifiable locations, whose relationships, such as distances, permit qualitative and even some sort of quantitative comparisons. Manipulations are possible in physical space, and an individual's visual space is accessible to the observer by introspection or to the external experimenter by various kinds of behavioral interrogation. Hence the stage is set for rigorous research, so long as questions are posed sharply and in terms that yield unambiguous, even if numerically somewhat blunt, answers. At bottom we are interested in whether there are fundamental laws of geometry that apply generally in all spaces, including visual space, and that therefore perforce delineate the rules according to which we see things in space and the linkage to the physical space in which we are embedded. First of all, it turns out that there is not just one "geometry" but many classes of geometry. The loose ones, such as topology and projective geometry, can be quickly dismissed as of no relevance here. Topology sees no difference between configurations, such as a doughnut and a teacup, that can be transformed into each other by a sequence of continuous distortions. Because, among other things, one of the objects of this inquiry is whether distortions such as geometric-optical illusions might be inevitable consequences of geometrical laws, we leave topology aside. Projective geometry is based on a set of principles which concentrate merely on the incidence of points, lines, and planes, not on the distances and angles between them, and lump together without distinction all kinds of conic sections (circles, ellipses, parabolas, hyperbolas). Again, this would prove to be an inadequate basis for research in visual space. The most promising kind of geometry for our purposes is metrical geometry, in which distances between elements, and angles between lines, have measurable values.
For a long time it was thought that all the familiar Euclidean rules, particularly the Pythagorean theorem ds² = dx² + dy² + dz², were mandatory in metrical geometry, but with the advent of non-Euclidean geometry and especially with Riemann's generalized form (1854/1973), metrical geometry became a much richer and more variegated territory. This expansion made more tractable the growth of physics outside its Newtonian bounds, which had demanded that its laws fit a rigid manifold with a strictly Euclidean metric. Once Riemann's formulation was accepted, which allows rigorous metrical qualities without these having to remain invariant throughout the whole manifold, a set of much more succinct and elegant laws of physics emerged. The physics and cosmology of the twentieth century are a testament to the successful practice of systematic and analytical geometry beyond the constraint of a Euclidean framework. It was not until 1947 that the application of this kind of thinking to vision was begun by Luneburg, who tried to tie a few oddities of spatial vision together by examining visual space from the point of view of metrical geometry (Luneburg, 1947, 1950). First of all, more sharply than almost anyone else, he made the categorical distinction between physical object space and the subjective manifold housing our visual percepts. He satisfied himself that the latter had the requisite properties to be regarded as Riemannian, and he wisely restricted discussion to some simple situations of binocular observation of a few isolated light sources in an otherwise empty environment. There is then a one-to-one correspondence between points presented in object space and their perceived counterparts in visual space. He selected experiments in which observers had been asked to set several object points so that the corresponding perceived
points in visual space had some defined relationships, for example, forming straight lines that are seen as frontoparallel or as parallel alleys. This enabled the construction of geodesics in visual space. When the position of the object-sided points is examined, it is found that they are laid out in curves instead of straight lines. It should be noted that such an approach, while observationally easier, is conceptually equivalent to the converse statement: that a three-dimensional Cartesian, Euclidean grid in object space is represented in visual space as a curvilinear grid. In addition, Luneburg brought in the well-known appearance of the sky as a dome, in other words, a sphere of finite radius in visual space. With this groundwork Riemannian geometry could be applied to visual space. When we open our eyes, a spatial manifold is before us. At the outset there is no reason not to conceive it to be Euclidean. The map within it of the Euclidean object space is, however, not Euclidean. Before proceeding, Luneburg noted another characteristic of visual space: objects can be freely moved without changing their shape. On the basis of just these few observations, a mathematician of Luneburg's talent could proceed to the postulate that visual space is a non-Euclidean space of constant negative curvature. The fact that measurability is obeyed, albeit within some constraints, confirms it to be a Riemannian space; the fact that objects can be freely moved without changing shape makes it one of constant curvature; and the relationship between physical object space and visual space of the shape of geodesics and of the sky demands that the curvature be negative. Whether we conceive our visual space to be intrinsically Euclidean is not a relevant question; what matters is the empirical fact that the mapping of the objective physical world into it entails the distortion characteristics of a hyperbolic space of constant negative curvature.
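The chain of reasoning just summarized can be stated compactly in symbols (a standard textbook formulation, not Luneburg's own notation):

```latex
% Euclidean metric: the Pythagorean line element, with fixed coefficients.
ds^2 = dx^2 + dy^2 + dz^2

% Riemann's generalization: the coefficients g_{ij} may vary with
% position x, while distances remain rigorously measurable.
ds^2 = \sum_{i,j} g_{ij}(x)\, dx^i\, dx^j

% Free mobility of rigid configurations forces the curvature K to be the
% same everywhere; Luneburg's reading of the alley and sky observations
% then fixes its sign: visual space is hyperbolic, with K < 0.
```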
Luneburg's fascinating theory has been revisited only occasionally during the last 50 years by the relatively small band of experts who are at home equally in visual perception and in Riemannian geometry. The details of the theory need not detain us here, except to point out that its scope was always limited, as was indeed realized and explicitly stated by Luneburg in the first publication: the sky does not appear as a spherical dome but seems rather more ellipsoidal and, in any case, there is the moon illusion; major changes take place in perceived spatial relationships where stimuli more elaborate than light points in a dark field are involved: there are bound to be training effects and individual differences. Still, above and beyond this predictable fraying at the edges, what the theory tries to do and whether it should be taken as a pointer for the future study of space perception, these are essential questions that no generation of vision scientists can afford to neglect. When what we see is not veridical, or is intrinsically irregular, or fails to conform in an important respect with object space, is this a consequence of inevitable imperatives of the mathematical laws of geometry? If this is the case, then when researching spatial vision we had better take up geometry, especially since, as we have seen, there is a lot of it. At this juncture the revolution of physics beginning in the twentieth century contains its lesson: far from being driven by the imperatives of geometry, empirical findings in physics and cosmology shaped the geometrical framework that made for a clearer and more economical formulation of the laws. Riemann provided the tools for the formulation of a metrical geometry that could accommodate, with analytical finesse and sufficient rigor, the laws needed to organize the results of observations. Sophisti-
cated geometrical structures were invented as needed to frame the laws of physics. In his original publication in 1854, Riemann used the word "manifold" throughout and only referred to "space" at the end in a speculative aside wondering whether, when it comes to physics, the nature of matter might not be what determines the metric at the microscopic scale. In his commentary, Hermann Weyl explained the importance of the difference. Manifold, an ensemble of items, is a nonspecific mathematical term and can be given arbitrary properties and dimensions. How many dimensions and the nature of the relationship between the members of the ensemble, these are empirical questions which have to be brought to the task of formulating the laws of nature. According to Weyl, Riemann reserved the use of the word "space" for this particular intellectual enterprise. The change in viewpoint in physics, advancing from Newton to Einstein, was to abandon the effort to shoehorn the laws of physics into, as Weyl put it, the preconstructed skeleton of an apartment building, and rather to allow physics to build its own geometry "like a snail builds its shell." This, then, is the answer to our questions about the primacy of geometrical theorems in research into spatial vision. Their role is categorically different from the underpinning provided by, say, the quantal nature of light in the investigation of the absolute visual threshold. When the horopter is found to be curved and the sky a dome, when parallel and equidistant alleys do not coincide, these are phenomena intrinsic to the workings of the visual system and not forced on it by constraints of immutable laws of mathematics. A mathematician of Luneburg's stature can weave them together in a set of geometrical postulates and give them mathematically beautiful expression. However, they do not yet encompass the wider reality of perception. 
Vision in a rich environment demands the addition of a variety of parameters to accommodate local inhomogeneities. There are no inevitable consequences from geometry for relationships within visual space; to the contrary, geometrical frameworks can be constructed for any naturally occurring relationships. Cosmology is now replete with local variation in the curvature of space. Surprisingly, it is the mathematical discipline of geometry that is flexible and capable of being tailored to the findings of visual psychophysics, not the other way around. It remains to be seen what influence this kind of thinking will have on the development of the subject and whether attempts at rigor by way of geometrical inventions bring about advances. Will formulations of local warping of the metric of visual space as a function of its content have a heuristic impact? Is there a deeper understanding in store once researchers have come to grips with compression of binocular distance measures in visual space as objects recede? If the answer to such questions is yes, then we owe a debt to Luneburg's introduction into vision science of the sophisticated scholarship of geometry.
References

Einstein, A. (1921/1972). Geometrie und Erfahrung. Reprinted in K. Strubecker (Ed.), Geometrie. Wissenschaftliche Buchgesellschaft: Darmstadt.
Helmholtz, H. (1868/1968). Über die Thatsachen, die der Geometrie zum Grunde liegen. Reprinted in H. Helmholtz, Über Geometrie. Wissenschaftliche Buchgesellschaft: Darmstadt.
Luneburg, R. K. (1947). Mathematical Analysis of Binocular Vision. Princeton University Press: Princeton, NJ.
Luneburg, R. K. (1950). The metric of binocular visual space. J. Opt. Soc. Am., 40: 627-642.
Regan, D. M. (2000). Human Perception of Objects. Sinauer: Sunderland, MA.
Riemann, B. (1854/1973). Über die Hypothesen, welche der Geometrie zu Grunde liegen. Reprinted with commentary by H. Weyl in Das Kontinuum. Chelsea Publishing, NY.
4. The Inputs to Global Form Detection

David R. Badcock and Colin W. G. Clifford

When one thinks of Martin Regan, systems analysis comes to mind, but a systems analysis grounded in practical problems. Those from another tradition might say that his work has ecological validity. Certainly he is motivated by major practical issues, such as how a batsman manages to play cricket (Regan, 1992), and more minor tasks such as controlling self-motion through the environment (Regan, Beverley, and Cynader, 1979). The mark of his work is a rigorous application of the tools of systems analysis in order to understand the underlying processes, a different modus operandi from the ecological perception tradition. In this work he has provided not only the analysis of performance but also clear descriptions of the tools (Regan, 1991) and, more recently, a collation of thoughtful advice for the practitioners who use those tools (Regan, 2000) - advice we regularly give to new graduate students to read. In what follows we provide a description of our current state of knowledge regarding how the visual system processes global form information. This is a process that ends in the detection, segmentation, and description of objects, and it is in the latter that substantial progress has recently been made in our understanding (Wilson and Wilkinson, 1998). The topic lends itself to a systems approach. The levels of the processing hierarchy that we discuss in this chapter may best be described as early and intermediate, with the final stages still awaiting further work.
4.1 Introduction

Spatial form is processed hierarchically in the primate visual system, beginning with the extraction of local stimulus orientation. While orientation bias has been reported as early as the retina (Levick and Thibos, 1980), orientation selectivity proper is first observed in primary visual cortex (Hubel and Wiesel, 1968). Sensitivity to the alignment of line segments is a property of cortical area V1 (Field, Hayes, and Hess, 1993;
Gilbert, 1995). Selectivity for more global patterns of form is relatively weak at this stage (Smith, Bair, and Movshon, 2002; Kourtzi et al., 2003) but is a common property of cells later in the form-processing hierarchy that may in turn provide the input to object and face recognition mechanisms (Rentschler et al., 1994; Wilkinson et al., 2000).

Figure 4.1: Examples of rotary (left) and radial (right) Glass patterns. Dot pairs are positioned randomly within the pattern, but the orientation of the dot pairs is determined by their position relative to the center. For rotary patterns, the dot pairs are oriented perpendicular to the line from the center of the pattern. For radial patterns, the dot pairs are oriented along this line.
4.2 Seeing Glass Patterns

To study the mechanisms of global form perception, many studies have used Glass patterns (Glass, 1969; Glass and Perez, 1973). Each Glass pattern consists of a large number of pairs of dots. One dot in each pair is positioned randomly within the stimulus according to a probability distribution uniform over area. The second dot of each pair is then positioned at a fixed distance from its partner in a direction defined by the particular pattern being generated. For example, if the direction of displacement is 0°, directly away from the center of the image, then a radial "sunburst" pattern is generated. If the displacement is 90°, perpendicular to the position vector relative to the center, then the pattern is concentric (see figure 4.1). The spatial structure in Glass patterns has been termed "static flow" (Kovacs and Julesz, 1992) by analogy with optic flow, the pattern of retinal motion generated by self-motion (Lee, 1980). This seems an appropriate analogy because the static Glass pattern stimulus can be considered as the superposition of successive frames of an optic flow stimulus. In studies of optic flow perception, it is common to vary the coherence of stimuli in order to control their visibility (e.g., Edwards and Badcock, 1993; Morrone et al., 1995). The coherence of a stimulus is the percentage of elements in the stimulus
conforming to the global pattern. It is straightforward to extend the idea of varying stimulus coherence to studies of the perception of static flow in Glass patterns (e.g., Wilson et al., 1997; Maloney et al., 1987). It has been argued that the human visual system is particularly sensitive to complex patterns of form (Wilson and Wilkinson, 1998) such as concentric structure. Wilson and Wilkinson (1998) found that coherence thresholds for detection are lower for concentric than radial Glass patterns, while Seu and Ferrera (2001) found that spiral patterns had higher thresholds than either radial or concentric Glass patterns. Similar data collected independently in our laboratory, with the help of Dr. Sieu Khuu, confirm this pattern. A series of Glass patterns (figure 4.2A) was created ranging from radial (0°) through spiral (45°) to concentric (or rotary, 90°) in 15° steps. Figure 4.2B depicts the coherence thresholds. Equal sensitivity to all Glass angles would produce data falling on a constant-radius arc in the polar plot. The vertical and horizontal lines present the predictions (ignoring probability summation across independent mechanisms) based on sensitivity being determined by cosine-tuned orthogonal mechanisms preferring either the radial or rotary component of the patterns. Neither of these predictions provides a good account of the data, suggesting that the bandwidth of the cardinal mechanisms is narrower than cosine tuning and raising the possibility that the visual system contains spiral detectors less sensitive than the cardinal detectors (Seu and Ferrera, 2001). The motion system may be the same. A cardinal-detector explanation is a viable model for motion when using a masking paradigm (Burr, Badcock, and Ross, 2001) but yields similar results to the Glass data when measuring detection thresholds (Morrone, Burr, DiPietro, and Stefanelli, 1999; Snowden, 1994). Further work is needed to determine whether lower-sensitivity spiral detectors are required.
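Since Glass patterns recur throughout this chapter, a minimal generator may help fix ideas. The Python sketch below implements the construction described above (first dot uniform over area, partner at a fixed separation in a direction set by the Glass angle); the function name and default parameter values are our own illustrative choices, not code from any of the studies cited.

```python
import numpy as np

def glass_pattern(n_pairs=100, radius=5.0, dot_sep=0.2,
                  glass_angle_deg=90.0, rng=None):
    """Return (first_dots, partner_dots), each an (n_pairs, 2) array.

    glass_angle_deg = 0 gives a radial ("sunburst") pattern, 90 a
    concentric (rotary) one; intermediate angles give spirals.
    """
    rng = np.random.default_rng(rng)
    # First dot of each pair: uniform over a circular aperture
    # (the sqrt keeps the density uniform over *area*, not radius).
    r = radius * np.sqrt(rng.random(n_pairs))
    phi = 2 * np.pi * rng.random(n_pairs)
    first = np.column_stack([r * np.cos(phi), r * np.sin(phi)])
    # Partner dot: fixed separation along the position vector rotated
    # by the Glass angle (0 deg = radial, 90 deg = tangential).
    direction = phi + np.deg2rad(glass_angle_deg)
    partner = first + dot_sep * np.column_stack([np.cos(direction),
                                                 np.sin(direction)])
    return first, partner
```

Plotting both dot sets of the returned pair for angles 0°, 45°, and 90° reproduces the radial-spiral-concentric series of figure 4.2A.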
The squares in figure 4.2B are the corresponding data from Seu and Ferrera (2001) and show a strong similarity to the current results. If Glass patterns are sufficiently dense, and there is sufficient distance between the dots within each pair, then information must be pooled across significant distances for the global structure to be apparent. To confirm that subjects use global pattern information, and not local stimulus cues, to identify Glass patterns, Wilson et al. (1997) investigated the effect of stimulus area on coherence thresholds for the detection of concentric Glass patterns. Pattern area was reduced by one-half or two-thirds by showing the stimulus in only six or four of twelve 30° segments and filling the remainder of the stimulus with randomly oriented noise dot pairs. Reducing stimulus area by two-thirds increased coherence thresholds almost threefold. Plotting coherence threshold against stimulus area revealed a linear relationship in log-log coordinates with a slope of -0.91. The slope of this power-function fit is close to the value of -1.00 predicted from the operation of an ideal linear integration mechanism (Morrone, Burr, and Vaina, 1995). Radial Glass patterns also show significant summation over area (Wilson and Wilkinson, 1998), while the data for translational Glass patterns are equivocal (Wilson and Wilkinson, 1998, 2003; Dakin and Bex, 2002, 2003), which has led to the suggestion that the latter are processed earlier in the visual pathway.
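The power-law reasoning in the preceding paragraph is easy to make concrete: fitting a straight line to (log area, log threshold) recovers the exponent of threshold = k × area^slope, which can then be compared with the ideal-integrator prediction of -1. A sketch, with made-up numbers standing in for the real measurements:

```python
import numpy as np

def loglog_slope(areas, thresholds):
    """Slope of the best-fitting line through (log area, log threshold),
    i.e. the exponent of the power law threshold = k * area**slope.
    An ideal linear integration mechanism predicts a slope of -1."""
    slope, _intercept = np.polyfit(np.log(areas), np.log(thresholds), 1)
    return slope

# Illustrative (fabricated) data following threshold proportional to
# area**-0.91, roughly the exponent reported by Wilson et al. (1997):
areas = np.array([4.0, 6.0, 12.0])    # e.g. number of visible 30-deg segments
thresholds = 0.5 * areas ** -0.91
```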
Figure 4.2: A: a sequence of Glass patterns depicting the variation in Glass angle from 0 to 90° in 15° steps. The two arrows depict the pattern of thresholds that would be expected if structure were detectable when either a radial or a rotary detector reached its individual threshold. B: A polar plot depicting the threshold percentage of dot pairs when a group of 6 naive observers were required to detect which stimulus of a pair had coherent structure (asterisks; the surrounding lines indicate ±1 SEM). The squares show the data of Seu and Ferrera (2003), plotted for comparison. The stimuli of Seu and Ferrera were composed of 500 rather than 100 dots with a density of 20 dots/deg², while ours were composed of 100 dots with a density of approximately 0.9 dots/deg².
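The cosine-tuned prediction shown by the straight lines in figure 4.2B can be written down directly: a pattern at Glass angle θ is detected when either the radial detector's signal, proportional to cos θ, or the rotary detector's signal, proportional to sin θ, first reaches its own threshold. The Python sketch below is our own formulation of that prediction (function name and guard values are illustrative), not the authors' code.

```python
import numpy as np

def cardinal_threshold(theta_deg, t_radial, t_rotary):
    """Predicted coherence threshold at Glass angle theta_deg for two
    cosine-tuned cardinal mechanisms (radial preferred at 0 deg, rotary
    at 90 deg), ignoring probability summation: detection occurs via
    whichever mechanism requires less coherence at this angle."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    c = np.clip(np.cos(theta), 1e-12, None)  # avoid division by zero at 90 deg
    s = np.clip(np.sin(theta), 1e-12, None)  # ... and at 0 deg
    # Each mechanism needs coherence t / (its cosine-tuned sensitivity).
    return np.minimum(t_radial / c, t_rotary / s)
```

Plotted in polar coordinates (threshold as radius, Glass angle as angle), the two branches of this minimum trace out exactly the vertical and horizontal lines of figure 4.2B.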
Figure 4.3: Schematic of neural models of Glass pattern detection. The models contain parallel pathways carrying out filter-rectify-filter operations. The second-stage filters are oriented perpendicular to the first-stage filters, and are followed by a weighted summation across orientation and radius. Top: model for concentric patterns, reprinted from Wilson, H. R., Wilkinson, F., and Asaad, W. (1997). Concentric orientation summation in human form vision. Vision Res., 37: 2325-2330. Copyright 1997, with permission from Elsevier. Bottom: model for radial patterns, reprinted from Wilson, H. R. and Wilkinson, F. (1998). Detection of global structure in Glass patterns: implications for form vision. Vision Res., 38: 2933-2947. Copyright 1998, with permission from Elsevier.
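The filter-rectify-filter idea of figure 4.3 is compact enough to state in code. The sketch below is our own one-dimensional caricature, not the published model (which uses oriented two-dimensional filters and a final weighted summation stage); it isolates the property that matters for the polarity results discussed below: after full-wave rectification, a stimulus and its contrast-reversed version produce identical outputs.

```python
import numpy as np

def filter_rectify_filter(signal, first_kernel, second_kernel):
    """One-dimensional caricature of a filter-rectify-filter pathway:
    linear filtering, full-wave rectification (absolute value), then a
    second linear filtering stage pooling the rectified responses."""
    stage1 = np.convolve(signal, first_kernel, mode="same")
    rectified = np.abs(stage1)  # full-wave rectification discards polarity
    return np.convolve(rectified, second_kernel, mode="same")
```

Because |-x| = |x|, flipping the sign of every sample in `signal` leaves the output unchanged, mirroring the finding that reversing contrast polarity between dot pairs does not disturb the global percept.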
Glass patterns can be created using signal dot pairs of one contrast polarity, and noise dot pairs of the opposite polarity. With brief presentations it is easy to identify the pattern but very hard to determine the polarity of the signal dots - black or white (Wilson et al., 1997). However, if polarity reverses within each dot pair rather than between dot pairs, the percept is quite different. For example, if the two dots in each pair of a concentric Glass pattern are of opposite contrast polarity, then concentric structure is not perceived; instead, a spiral pattern is seen (Anstis, 1970; Glass and Switkes, 1976; Prazdny, 1986). Dakin (1997) has argued that such effects can be predicted by considering the effects of contrast polarity on the response of local oriented filters whose responses are pooled in the detection of global form. These filters are usually modeled as elongated receptive fields with either a central excitatory zone extending along the major axis and flanked on that axis by two inhibitory zones, or a central inhibitory zone flanked by excitatory ones, although physiological data suggest a more continuous variation in this structure (Field and Tolhurst, 1986; Kulikowski and Bishop, 1981). Smith, Bair, and Movshon (2002) have shown that V1 simple cells in the macaque will respond weakly to opposite-polarity dot pairs but that the orientation tuning function is multimodal and excludes the orientation preferred when the dots have identical polarity. The most effective orientations are those which present a bright dot to an excitatory receptive field region and a dark dot to an inhibitory region. The opposite-polarity dots cancel each other if they fall in the same zones. Wilson et al.
(1997) have reasoned that the influence of contrast polarity reversal within but not between dot pairs is consistent with a two-stage model of Glass pattern perception in which full-wave rectification follows oriented filtering, thus removing sensitivity to contrast polarity variations subsequent to the detection of local oriented structure. This model is discussed in more detail below. A recent report (Achtman, Hess, and Wang, 2003) has shown that if the dot pairs are replaced by Gabor stimuli, then the polarity of the Gabor is not critical. This is consistent with the current position, in that the Gabor replaces the dot pair and therefore presents an extended region of the same polarity. Since Glass patterns are usually produced in a manner that does not require alignment between the dot pairs, the Achtman et al. (2003) stimulus may be thought of as a replication of this result; i.e., the dots within a pair must be of the same polarity, but the orientation signals at different locations can be carried by dot pairs (or lines, or Gabors) between which luminance polarity differs. Wilson and colleagues (Wilson and Wilkinson, 1998; Wilson, Wilkinson, and Asaad, 1997) have suggested that the global structure in an image is detected in the ventral cortical area V4. Selectivity for patterns of complex form and selectivity for color are both properties that have been associated with area V4 in macaque monkey and human visual cortex (Gallant et al., 1993, 2000; Wilkinson et al., 2000; Zeki, 1973; Lueck et al., 1989). If color and global form were processed in the same region of extrastriate cortex, then one might expect to observe chromatic selectivity for Glass patterns. However, the color selectivity of macaque V4 and the homology between the regions termed V4 in human and macaque are both somewhat controversial (Schein et al., 1982; Heywood and Cowey, 1987; Hadjikhani et al., 1998).
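As an illustration of how stimuli of this kind are typically constructed, the following sketch generates dot positions for a concentric Glass pattern, with options for partial coherence and for reversing contrast polarity within each pair. This is our own illustrative code, not the authors' stimulus software; all function and parameter names are assumptions.

```python
import numpy as np

def glass_pattern(n_pairs=200, rotation_deg=4.0, coherence=1.0,
                  opposite_polarity=False, rng=None):
    """Generate dot positions and polarities for a concentric Glass pattern.

    Each signal dot gets a partner rotated about the centre; noise pairs get
    a randomly oriented partner of the same separation. Returns (x, y,
    polarity) arrays of length 2 * n_pairs.
    """
    rng = np.random.default_rng(rng)
    # First dots of every pair, uniform in the unit square centred on 0.
    x1 = rng.uniform(-0.5, 0.5, n_pairs)
    y1 = rng.uniform(-0.5, 0.5, n_pairs)

    n_signal = int(round(coherence * n_pairs))
    theta = np.deg2rad(rotation_deg)
    # Signal partners: rotate about the centre (concentric structure).
    x2 = x1 * np.cos(theta) - y1 * np.sin(theta)
    y2 = x1 * np.sin(theta) + y1 * np.cos(theta)
    # Noise partners: displace in a random direction by the mean signal shift.
    sep = np.hypot(x2 - x1, y2 - y1).mean()
    phi = rng.uniform(0, 2 * np.pi, n_pairs)
    x2[n_signal:] = x1[n_signal:] + sep * np.cos(phi[n_signal:])
    y2[n_signal:] = y1[n_signal:] + sep * np.sin(phi[n_signal:])

    pol1 = np.ones(n_pairs)
    # Reversing polarity *within* each pair abolishes the concentric percept.
    pol2 = -pol1 if opposite_polarity else pol1.copy()
    return (np.concatenate([x1, x2]),
            np.concatenate([y1, y2]),
            np.concatenate([pol1, pol2]))
```

Setting `coherence` below 1.0 replaces some signal pairs with noise pairs, the manipulation used to measure the coherence thresholds discussed below.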
David R. Badcock and Colin W. G. Clifford
Figure 4.4: A: Glass pattern constructed from textured dots within which the increments and the decrements are matched so that the average luminance of the dot is the same as the background. B: Comparison of coherence thresholds for detection of Glass patterns composed of luminance increment dots, luminance decrement dots, or textured dots demonstrates that a second-order detector, sensitive to contrast variation, provides an input to global form detection.
Figure 4.5: Proposed model of the functional architecture of global form detection incorporating a second-order pathway in parallel to pathways processing luminance increments and decrements. The first stage of the on-pathway (left) is an array of oriented linear filters excited by contrast increments. The output of these oriented linear filters is then combined spatially to produce global form detectors sensitive only to contrast increments. A similar pathway (right) processes contrast decrements. The second-order pathway (middle) performs a rectification on the incoming signals. This pre-processed signal is then fed into an array of oriented linear filters similar to those in the other two pathways, and thence into a global form detector.
While interactions between the processing of color and orientation have been demonstrated psychophysically both at equiluminance and in the presence of luminance signals (McCollough, 1965; Lovegrove and Badcock, 1981; Flanagan, Cavanagh, and Favreau, 1990; Clifford et al., 2003), the mechanisms underlying Glass pattern perception appear only broadly tuned for color (Cardinal and Kiper, 2003) and, in the presence of strong luminance signals, effectively color blind (Kovacs and Julesz, 1992). Cardinal and Kiper (2003) argue that their results on the detection of colored Glass patterns are consistent with linear summation across multiple mechanisms broadly tuned for color, such as those reported in monkey area V4 (Schein et al., 1982), although other studies of monkey V4 have reported narrower color tuning (Zeki, 1980; Schein and Desimone, 1990). Kovacs and Julesz (1992) found that while color can counteract the effect of polarity reversal in stereoscopic fusion and reversed-phi motion, chromatic information cannot overcome the effect of polarity reversal within a dot pair in the perception of Glass patterns (see also Glass and Switkes, 1976). Switkes (2002) found that detection thresholds for Glass patterns made isoluminant to the background were similar to those for luminance-defined patterns. Chromatic differences between dots within a pair were found to decrease performance, presumably reflecting chromatic selectivity at the level of local orientation detectors. However, segregation of signal from noise was not aided by chromatic differences between pairs of dots, suggesting that, as with contrast polarity, information about color is lost at the global pattern integration stage. Brain imaging studies in humans (Wilkinson et al., 2000) and macaque monkeys (Tse et al., 2002) and single-case neuropsychological studies (Gallant et al., 2000) suggest a strong correlation between activity in area V4 and the perception of concentric and radial patterns. 
Single-unit recording data from area V2 of macaque monkey also show some selectivity for complex patterns of form earlier in the visual hierarchy (Hegdé and Van Essen, 2000). In humans, fMRI activation to concentric patterns has been reported at a later stage of processing, the fusiform face area (Wilkinson et al., 2000), and a prosopagnosic patient with a right-sided inferomedial occipitotemporal lesion has been reported to show a selective deficit in concentric Glass pattern perception (Rentschler et al., 1994). A comparative fMRI study of monkey and human brain activity found that, in both species, multiple areas are involved in the perception of global shape (Kourtzi et al., 2003). A comparative behavioral study of Glass pattern perception in pigeons and humans found that, unlike humans, pigeons do not show greater sensitivity to concentric and radial patterns than to translational ones (Kelly et al., 2001). This pattern of results suggests that pigeons are unable to use global pooling such as is believed to occur in primate V4, whereas humans possess such mechanisms for concentric and radial but not for translational patterns. Kelly et al. speculate that the superior detection of concentric patterns by humans but not pigeons might reflect differences in the evolution of specialized face-processing mechanisms. This notion is supported by the observation that, unlike humans and monkeys, pigeons' recognition of one another is unaffected by facial inversion (Phelps and Roberts, 1994). This study also addresses the recent discussion between those who argue that the human visual system does possess specialized detectors for radial and concentric, rather than translational, global form (Wilson and Wilkinson, 1998; Wilson et al., 1997) and those who argue that aperture shape produces the superior
sensitivity relative to translational patterns (Dakin and Bex, 2002, 2003). Here, with the same stimuli presented in an equivalent manner, humans show a performance advantage for concentric and radial patterns while the pigeons do not. This suggests a difference between the visual systems of the two species and, by extension, the existence of specialized mechanisms in the human system.
4.3
A Model of the Functional Architecture of Global Form Detection
Wilson and colleagues have proposed detectors that would be able to localize radial (Wilson and Wilkinson, 1998) and concentric structure (Wilson et al., 1997) in an image. The models are depicted in figure 4.3; both have the form of an early linear, oriented filtering stage, followed by rectification and then subsequent filtering by a pair of matched filters placed strategically to respond selectively to either radial or rotary contours. The model outlined uses as its first stage elongated receptive fields that compare the luminance in excitatory and inhibitory flanking regions. However, as in other areas of spatial and temporal vision (McGraw, Whitaker, Badcock, and Skillen, 2003; Smith, Clifford, and Wenderoth, 2001; Van der Zwan, Badcock, and Parkin, 1999), Glass patterns may be constructed from textured dots within which the increments and the decrements are matched so that the average luminance of the dot is the same as the background (see figure 4.4). In this case the Glass structure is still readily detectable and, indeed, the thresholds are no different from those for dots constructed from either increments or decrements (see figure 4.4). Since these textured dots will not generate a systematic signal from elongated excitatory or inhibitory zones containing the whole dot, a second-order detector, sensitive to contrast variation, must also provide an input to global form detection. We propose a model (figure 4.5) of the following form to accommodate both this finding and the apparent independence of the inputs from the on- and off-pathways to global form detection (Wilson, Switkes, and De Valois, 2001; Van der Zwan et al., 1999). 
The detector for the second-order structure (the textured dots used here) incorporates a filter sandwich in the first stage, which is typically represented as a linear, orientation-selective filter followed by either a full-wave (Wilson, Ferrera, and Yo, 1992) or half-wave (Van der Zwan et al., 1999) rectifier and then a subsequent linear, orientation-selective filter tuned to a coarser scale to remove the spurious higher-frequency information produced by rectification. This unit then provides the input to the same type of processes Wilson has proposed to detect the global structure. It is clear from the data in figure 4.2 that the sensitivity to global form is a function of the type of structure, with spirals being less readily detected than rotary or radial patterns. While Wilson and colleagues have proposed both radial and rotary detectors, a decomposition of the patterns into orthogonal rotary and radial components fails to predict the pattern of thresholds for intermediate stimuli. The resolution of this issue requires further work to determine whether sensitivity to spiral stimuli is due solely to combination of information from more narrowly tuned cardinal detectors or whether an intermediate form
of the detectors is needed.
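The filter-rectify-filter cascade described in this section can be sketched numerically. The following is an illustrative implementation under simplified assumptions: frequency-domain Gaussian oriented filters stand in for the specific filters of the Wilson models, the second-stage filter is coarser-scale and orthogonally oriented as in the models of figure 4.3, and all function and parameter names are our own.

```python
import numpy as np

def oriented_filter_fft(shape, sf, ori_deg, radial_bw=0.6, angular_bw=0.4):
    """Illustrative oriented band-pass filter defined in the frequency domain."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    f = np.hypot(fx, fy) + 1e-9            # avoid log(0) at DC
    ang = np.arctan2(fy, fx)
    ori = np.deg2rad(ori_deg)
    # Log-Gaussian radial tuning around spatial frequency sf (cycles/pixel).
    radial = np.exp(-(np.log(f / sf) ** 2) / (2 * radial_bw ** 2))
    # Gaussian angular tuning around the preferred orientation.
    d_ang = np.angle(np.exp(1j * (ang - ori)))  # wrap to (-pi, pi]
    angular = np.exp(-(d_ang ** 2) / (2 * angular_bw ** 2))
    return radial * angular

def frf_response(image, ori_deg, sf1=0.2, sf2=0.05):
    """Filter-rectify-filter: fine oriented filtering, full-wave
    rectification, then a coarser second-stage filter at the
    orthogonal orientation."""
    F = np.fft.fft2(image)
    stage1 = np.fft.ifft2(F * oriented_filter_fft(image.shape, sf1, ori_deg)).real
    rectified = np.abs(stage1)             # full-wave rectification
    F2 = np.fft.fft2(rectified)
    stage2 = np.fft.ifft2(
        F2 * oriented_filter_fft(image.shape, sf2, ori_deg + 90)).real
    return stage2
```

In the full models, an array of such units tuned to different orientations and radii would feed the weighted summation stage that signals concentric or radial structure.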
4.4
Conclusions
This outline of global form perception is an early part of an evolving story. How the early and intermediate stages described here contribute to higher level analyses is now receiving attention. A complete model of global form perception still requires more adequate description of the details at each level. The approach adopted reflects the same kind of motivation as Regan's. The desire is to specify the details so that the complexity of perception can be understood and lessons can be learned for the design of practical devices. A complete model should be able to analyze the structure inherent in a well-placed fielding team in cricket, typically incorporating both radial and rotary components, but until then perhaps we should just recommend hours of intense observation of the natural stimulus.
References

Achtman, R. L., Hess, R. F. and Wang, Y. (2003). Sensitivity for global shape detection. J. Vis., 3: 616-624.
Anstis, S. (1970). Phi movement as a subtraction process. Vis. Res., 10: 1411-1430.
Burr, D. C., Badcock, D. R. and Ross, J. (2001). Cardinal axes for radial and circular motion revealed by summation and by masking. Vis. Res., 41: 473-481.
Cardinal, K. S. and Kiper, D. C. (2003). The detection of colored Glass patterns. J. Vis., 3: 199-208.
Clifford, C. W. G., Spehar, B., Solomon, S. G., Martin, P. R. and Zaidi, Q. (2003). Interactions between colour and luminance in the perception of orientation. J. Vis., 3: 106-115.
Dakin, S. C. (1997). Glass patterns: some contrast effects re-evaluated. Percept., 26: 253-268.
Dakin, S. C. and Bex, P. J. (2002). Summation of concentric orientation structure: seeing the Glass or the window? Vis. Res., 42: 2013-2020.
Dakin, S. C. and Bex, P. J. (2003). Response to Wilson and Wilkinson: evidence for global processing but no evidence for specialised detectors in the visual processing of Glass patterns. Vis. Res., 43: 565-566.
Edwards, M. and Badcock, D. R. (1993). Asymmetries in the sensitivity to motion in depth: A centripetal bias. Percept., 22: 1013-1023.
Field, D. J., Hayes, A. and Hess, R. F. (1993). Contour integration by the human visual system: Evidence for a local "association field." Vis. Res., 33: 173-193.
Field, D. J. and Tolhurst, D. J. (1986). The structure and symmetry of simple cell receptive-field profiles in cat visual cortex. Proc. Roy. Soc. Lond. B, 228: 379-400.
Flanagan, P., Cavanagh, P. and Favreau, O. E. (1990). Independent orientation-selective mechanisms for the cardinal directions of colour space. Vis. Res., 30: 769-778.
Gallant, J. L., Braun, J. and Van Essen, D. C. (1993). Selectivity for polar, hyperbolic, and Cartesian gratings in macaque visual cortex. Science, 259: 100-103.
Gallant, J. L., Shoup, R. E. and Mazer, J. A. (2000). A human extrastriate area functionally homologous to macaque V4. Neuron, 27: 227-235.
Gilbert, C. D. (1995). Dynamic properties of adult visual cortex. In M. S. Gazzaniga (Ed.), The Cognitive Neurosciences, pp. 73-90. MIT Press: Cambridge, MA.
Glass, L. (1969). Moire effect from random dots. Nature, 223: 578-580.
Glass, L. and Perez, R. (1973). Perception of random dot interference patterns. Nature, 246: 360-362.
Glass, L. and Switkes, E. (1976). Pattern recognition in humans: correlations which cannot be perceived. Percept., 5: 67-72.
Hadjikhani, N., Liu, A. K., Dale, A. M., Cavanagh, P. and Tootell, R. B. H. (1998). Retinotopy and color sensitivity in human visual cortical area V8. Nat. Neurosci., 1: 235-247.
Hegdé, J. and Van Essen, D. C. (2000). Selectivity for complex shapes in primate visual area V2. J. Neurosci., 20: RC61.
Heywood, C. A. and Cowey, A. (1987). On the role of cortical area V4 in the discrimination of hue and pattern in macaque monkeys. J. Neurosci., 7: 2601-2617.
Hubel, D. H. and Wiesel, T. (1968). Receptive fields and functional architecture of monkey striate cortex. J. Physiol., 195: 215-243.
Kelly, D. M., Bischof, W. F., Wong-Wylie, D. R. and Spetch, M. L. (2001). Detection of Glass patterns by pigeons and humans: implications for differences in higher-level processing. Psychol. Sci., 12: 338-342.
Kourtzi, Z. and Kanwisher, N. (2001). Representation of perceived object shape by the human lateral occipital complex. Science, 293: 1506-1509.
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M. and Logothetis, N. K. (2003). 
Integration of local features into global shapes: monkey and human fMRI studies. Neuron, 37: 333-346.
Kovacs, I. and Julesz, B. (1992). Depth, motion, and static-flow perception at metaisoluminant color contrast. Proc. Nat. Acad. Sci. USA, 89: 10390-10394.
Kulikowski, J. J. and Bishop, P. O. (1981). Fourier analysis and spatial representation in the visual cortex. Experientia, 37: 160-163.
Lee, D. N. (1980). The optic flow field: the foundation of vision. Phil. Trans. Roy. Soc. Lond. B, 290: 169-179.
Lueck, C. J., Zeki, S., Friston, K. J., Deiber, M. P., Cope, P., Cunningham, V. J., Lammertsma, A. A., Kennard, C. and Frackowiak, R. S. (1989). The colour centre in the cerebral cortex of man. Nature, 340: 386-389.
Levick, W. R. and Thibos, L. N. (1980). Analysis of orientation bias in cat retina. J. Physiol., 329: 243-261.
Lovegrove, W. and Badcock, D. (1981). The effect of spatial frequency on colour selectivity in the tilt illusion. Vis. Res., 21: 1235-1237.
Maloney, R. K., Mitchison, G. J. and Barlow, H. B. (1987). Limit to the detection of Glass patterns in the presence of noise. J. Opt. Soc. Am. A, 4: 2336-2341.
McCollough, C. (1965). Colour adaptation of edge-detectors in the human visual system. Science, 149: 1115-1116.
McGraw, P. V., Whitaker, D., Badcock, D. R. and Skillen, J. (2003). Neither here nor there: localising conflicting visual attributes. J. Vis., 3: 265-273.
Morrone, M. C., Burr, D. C., DiPietro, S. and Stefanelli, M. A. (1999). Cardinal directions for visual optic flow. Curr. Biol., 9: 763-766.
Morrone, M. C., Burr, D. C. and Vaina, L. M. (1995). Two stages of visual processing for radial and circular motion. Nature, 376: 507-509.
Phelps, M. T. and Roberts, W. A. (1994). Memory for pictures of upright and inverted faces in humans (Homo sapiens), squirrel monkeys (Saimiri sciureus) and pigeons (Columba livia). J. Comp. Psychol., 108: 114-125.
Prazdny, K. (1986). Some new phenomena in the perception of Glass patterns. Biol. Cybern., 53: 153-158.
Regan, D. (1991). A brief review of some of the stimuli and analysis methods used in spatiotemporal vision research. In D. Regan (Ed.), Spatial Vision, pp. 1-42. Macmillan: London.
Regan, D. (1992). Visual judgements and misjudgements in cricket, and the art of flight. Percept., 21: 91-115.
Regan, D. (2000). Human Perception of Objects: Early Visual Processing of Spatial Form Defined by Luminance, Color, Texture, Motion and Binocular Disparity. Sinauer Associates: Sunderland, MA.
Regan, D., Beverley, K. and Cynader, M. (1979). The visual perception of motion in depth. Sci. Am., 241: 136-151.
Rentschler, I., Treutwein, B. and Landis, T. (1994). Dissociation of local and global processing in visual agnosia. Vis. Res., 
34: 963-971.
Schein, S. J. and Desimone, R. (1990). Spectral properties of V4 neurons in the macaque. J. Neurosci., 10: 3369-3389.
Schein, S. J., Marrocco, R. T. and de Monasterio, F. M. (1982). Is there a high concentration of color-selective cells in area V4 of monkey visual cortex? J. Neurophysiol., 47: 193-213.
Seu, L. and Ferrera, V. P. (2001). Detection thresholds for spiral Glass patterns. Vis. Res., 41: 3785-3790.
Smith, M. A., Bair, W. and Movshon, J. A. (2002). Signals in macaque striate cortical neurons that support the perception of Glass patterns. J. Neurosci., 22: 8334-8345.
Smith, S., Clifford, C. W. and Wenderoth, P. (2001). Interaction between first- and second-order orientation channels revealed by the tilt illusion: psychophysics and computational modelling. Vis. Res., 41: 1057-1071.
Snowden, R. J. (1994). Motion processing in the primate cerebral cortex. In A. T. Smith and R. J. Snowden (Eds.), Visual Detection of Motion, pp. 51-84. Academic Press: Cambridge, MA.
Switkes, E. (2002). Integration of differing chromaticities in early and midlevel spatial vision. J. Vis., 2: 63a.
Tse, P. U., Smith, M. A., Augath, M., Trinath, T., Logothetis, N. K. and Movshon, J. A. (2002). Using Glass patterns and fMRI to identify areas that process global form in macaque visual cortex. J. Vis., 2: 285a.
Van der Zwan, R., Badcock, D. R. and Parkin, B. (1999). Global form perception: interactions between luminance and texture information. Austr. and New Zealand J. Ophthal., 27: 268-270.
Wilkinson, F., James, T. W., Wilson, H. R., Gati, J. S., Menon, R. S. and Goodale, M. A. (2000). An fMRI study of the selective activation of human extrastriate form vision areas by radial and concentric gratings. Curr. Biol., 10: 1455-1458.
Wilson, H. R., Ferrera, V. P. and Yo, C. (1992). A psychophysically motivated model for two-dimensional motion perception. Vis. Neurosci., 9: 79-97.
Wilson, J. A., Switkes, E. and De Valois, R. L. (2001). Effects of contrast variations on the perception of Glass patterns. J. Vis., 1: 152a.
Wilson, H. R. and Wilkinson, F. (1998). Detection of global structure in Glass patterns: implications for form vision. Vis. Res., 38: 2933-2947.
Wilson, H. R. and Wilkinson, F. (2003). Further evidence for global orientation processing in circular Glass patterns. Vis. Res., 43: 563-564.
Wilson, H. R., Wilkinson, F. and Asaad, W. (1997). Concentric orientation summation in human form vision. Vis. Res., 37: 2325-2330.
Zeki, S. M. (1973). Colour coding in rhesus monkey prestriate cortex. Brain Res., 53: 422-427.
Zeki, S. (1980). 
The representation of colours in the cerebral cortex. Nature, 284: 412-418.
5. Probability Multiplication as a New Principle in Psychophysics

Michael Morgan, Charles Chubb, and Joshua Solomon

One of Martin Regan's most interesting, but little tested, ideas has been to generalize the mechanism of the Reichardt detector of motion to spatial vision, in the form of a hypothetical "Coincidence Detector" (Morgan and Regan, 1987; Regan, 2000; Regan and Beverley, 1985). The Reichardt detector (Hassenstein and Reichardt, 1956; Reichardt, 1961; van Santen and Sperling, 1985) works by multiplying the signals from two different spatial detectors, with a delay applied to one of them. In its full opponent version, the outputs of two such detectors, tuned to opposite directions, are subtracted (figure 5.1). A useful consequence of opponency (discussed at length by Regan in his 2000 book Human Perception of Objects) is contrast independence: changes in contrast of a stationary stimulus over time (flicker) will not be confounded with movement of the stimulus. The same idea can be applied to a coincidence detector for spatial hyperacuity (figure 5.2). Changes of only a few arcsec in the separation of two lines or dots can be reliably detected by observers deciding whether the test stimulus is "wider" or "narrower" than a standard stimulus. Observers do not confound changes in separation with changes in contrast of the component lines (Morgan and Regan, 1987). The same is true of vernier acuity. Opponent pairs of coincidence detectors can account for this independence of spatial decisions from target contrast. In this chapter, we consider psychophysical evidence for opponent Reichardt detectors and coincidence detectors more generally. We begin with motion. Compelling evidence for a multiplication stage in coincidence detection comes from the "amplification effect" (van Santen and Sperling, 1984).
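The opponent Reichardt scheme just described can be made concrete with a minimal numerical sketch. This is our own illustration, not the authors' code: the delay is idealized as a one-sample circular shift, and all names are assumptions.

```python
import numpy as np

def opponent_reichardt(left, right, delay=1):
    """Opponent Reichardt detector on two spatially adjacent input signals.

    Each half-detector multiplies one input by a delayed copy of the other;
    opponency subtracts the two half-detector outputs, so a stimulus that
    merely flickers (identical at both locations) gives zero net output.
    """
    left = np.asarray(left, float)
    right = np.asarray(right, float)
    d_left = np.roll(left, delay)    # crude delay line (circular, illustrative)
    d_right = np.roll(right, delay)
    rightward = d_left * right       # left-then-right coincidence
    leftward = d_right * left        # right-then-left coincidence
    return np.mean(rightward - leftward)

# A rightward-moving sinusoid: the right input lags the left by the delay.
t = np.arange(200)
sig_l = np.sin(2 * np.pi * t / 20)
sig_r = np.sin(2 * np.pi * (t - 1) / 20)
assert opponent_reichardt(sig_l, sig_r) > 0          # net rightward signal
assert abs(opponent_reichardt(sig_l, sig_l)) < 1e-9  # flicker: opponency cancels
```

The second assertion illustrates the contrast-independence property: with identical inputs at the two locations, the two half-detector products are equal and the opponent output is exactly zero.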
The contrast required to detect the direction of movement of a low-contrast, drifting sinusoidal grating is lowered if it is superimposed upon a stationary, flickering grating of the same spatial and temporal frequency. In the case of sampled motion, with a 90° phase shift between frames, such
Figure 5.1: Schematic diagram of the Reichardt Detector.
Figure 5.2: A version of the coincidence detector proposed by Morgan and Regan (1987) to account for the insensitivity of spatial interval acuity to contrast jitter of the component lines (top). The outputs of spatially localized detectors (circles) are multiplied and subtracted to give a signal proportional to target separation.

a stimulus is formally identical to one in which odd frames have higher contrast than even frames (or vice versa). The contrast thresholds for motion found when varying even-frame contrast decrease as odd-frame contrast is raised, relative to the case where the odd and even frames have the same contrast (which we shall refer to as the "yoked condition"). This is predicted by the multiplication stage in the Reichardt model, since it is the product of even and odd frames that should be constant at threshold (van Santen
and Sperling, 1984). Van Santen and Sperling's model makes the counterintuitive prediction that the contrast of even frames at threshold (for motion detection) should decrease without limit as the contrast of the odd frames is raised. This is a consequence of detection limited by late noise after the multiplication stage. Morgan and Chubb (1999) found that thresholds did not decrease in this way, but reached an asymptotically low value when odd-frame contrast was greater than or equal to three times the yoked threshold. To account for their findings, they explored an "early noise" version of the Reichardt model, in which noise is added to the outputs of two detectors in quadrature phase and the noisy outputs are then multiplied. This model provided a satisfactory fit to their data for two-frame motion of a 2 cpd sinusoidal grating of temporal frequency 2.5 Hz. However, the Morgan and Chubb model was defective in not having an opponent stage. It cannot truly be called a Reichardt detector; it is a "half-Reichardt" detector. In fact, Morgan and Chubb point out that their early noise model is difficult to distinguish from one in which motion direction is correctly computed if both frames reach detection threshold in the presence of early noise. Indeed, after the paper was written, we realized that the half-Reichardt detector with early noise is mathematically identical to a high-threshold (HT), early noise model of independent detection of the two frames. "Probability multiplication" in the latter is the equivalent of signal multiplication in the former (for the proof, see appendix 5.A2.3). We seem to have a paradox here. The model that satisfactorily fits Morgan and Chubb's data is not a motion energy model after all. It is compatible with independent detection of the component frames, and thus with a model based on "local sign" (Morgan, 1990) more akin to a long-range motion mechanism (Braddick, 1980). Does the same model fit van Santen and Sperling's data? 
We fit both probability multiplication and late noise models to their full psychometric functions, with the results shown in figure 5.3. Neither model fits all the data. However, there is some evidence that the late-noise model is a better fit at the higher temporal frequency of 12.5 Hz while the probability multiplication model is at least as good a fit at the lower frequency of 1.8 Hz, closer to the frequency (2.5 Hz) used by Morgan and Chubb. Note particularly the unacceptable failure of the late noise model at 1.8 Hz in observer NB in the highest amplification condition. We considered two further models. The opponent or contrast discrimination model is based on a formal similarity between amplification and facilitation in contrast discrimination. We consider direction discrimination in an opponent mechanism, one half-detector of which receives an input corresponding to one frame alone (the one of higher contrast) and the other half-detector receives an input that is the sum of the contrasts in the two frames. In each case the input is transduced by a power function, to give an accelerating nonlinearity, required for facilitation. Finally, we considered a convoy model, so called because a convoy moves at the speed of its slowest member. In this case, the detector receives an input from the lower contrast frame only. It is assumed that the higher-contrast frame is always detected, and that the observer is infallibly correct if both frames are detected. To document the failure of the late noise model further, we repeated Morgan and Chubb's conditions with a new observer (AJ). Once again (figure 5.4), the probability multiplication model was a better fit than the late-noise Reichardt model.
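The probability multiplication prediction can be made concrete with a small sketch. Assuming, purely for illustration, a Weibull psychometric function for detecting each frame independently (the source does not specify the fitted functional form), with correct direction judgments only when both frames are seen and guessing otherwise:

```python
import math

def p_detect(contrast, threshold, slope=2.0):
    """Weibull psychometric function for detecting a single frame
    (an illustrative choice, not the fitted form used in the chapter)."""
    return 1.0 - math.exp(-((contrast / threshold) ** slope))

def p_correct_prob_mult(c_odd, c_even, threshold, slope=2.0):
    """Probability multiplication: direction is seen only when *both* frames
    are independently detected; otherwise the observer guesses at 50%."""
    p_both = (p_detect(c_odd, threshold, slope)
              * p_detect(c_even, threshold, slope))
    return p_both + 0.5 * (1.0 - p_both)
```

The key qualitative property is visible in the second function: as odd-frame contrast grows, the detection probability for that frame saturates at 1, so amplification reaches a ceiling set by the even frame alone. That is the asymptotic behavior Morgan and Chubb observed, in contrast to a late-noise Reichardt model, in which only the product of frame contrasts is constrained at threshold.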
Figure 5.3: The figure shows a reanalysis of the data in van Santen and Sperling (1984). Each row shows the data from a single observer at a particular temporal frequency of the motion. The top two rows are for 1.8 Hz, and the bottom two for 12.5 Hz. Each panel shows how probability correct (vertical axis) changes as a function of the contrast of the even-numbered frames in the movie sequence (horizontal axis). The contrast of the odd-numbered frames is fixed within each panel, and is indicated by the figure in the bottom right-hand corner of each panel. The open circles are data, and the error bars represent 95% confidence intervals derived from the binomial distribution. The curves represent fits to the data by the various models described in the text: solid line, the opponent model; dotted line, the probability multiplication model; dashed line, the late-noise Reichardt model. Note that the probability multiplication model is the best fit to the data at the lower temporal frequency, while at high temporal frequencies the data are better fit by the late-noise and opponent models.

To confirm that we were able to get late noise behavior as well as early noise, we tried various combinations of temporal and spatial frequency, and number of motion frames, with various spatial envelopes for the gratings. We concentrated on thresholds obtained with an amplifying frame of 3 × the yoked threshold, which is optimal for distinguishing the models. Results were frequently mixed, with neither model fitting well. However, we found at least one case where the late noise model was better than probability multiplication: a 12.5 Hz grating windowed with a stationary Gaussian window (figure 5.5). We conclude that observers do have access to a Reichardt detector, in which the predominant source of noise enters after the multiplication stage. What mechanism are they using for probability multiplication? 
Morgan and Chubb considered and rejected the idea that the phase of the stimulus is known to the observer as soon as it is detected (the local sign model). Their evidence against this idea was that contrast thresholds for detection were lower than those for phase discrimination (sine vs. cosine). This
Figure 5.4: Data from one observer (AJ) in a first-order motion detection task, with two quadrature frames of a Gaussian-windowed 2 cpd sinusoidal carrier, replicating the procedure of Morgan and Chubb (1999). The frame duration was 100 msec with no ISI; thus the temporal frequency was 2.5 Hz. Conventions are as in figure 5.2.

leaves the possibility that observers use a specialized coincidence mechanism such as the Reichardt detector, but one in which there is early noise with a high threshold on the output of the detectors. Observers may have access to both of these mechanisms (early/HT and late-noise Reichardt) and use different mechanisms on different trials, ensuring that no single model will fit all the data, as we observed. This discussion of motion mechanisms sets the stage for other kinds of coincidence detector. Some of the tasks we have examined to measure amplification are as follows:

1. Vernier acuity with abutting gratings.

2. Alignment acuity for vertically separated Gabor patches, in which the observer must decide whether the imaginary line joining the two patches is tilted clockwise or anticlockwise of the vertical. Both the envelope and carrier are displaced.

3. Stereo-defined motion. A grating defined by stereo disparity in otherwise random noise moves between frames.

4. Second-order alignment. A Gabor patch appears randomly at either the top-left or bottom-right corner of a notional square. A second patch is either horizontally or vertically aligned with the first. The patches have random carrier phase, making the task second order. Logically, both patches have to be detected to perform the task; otherwise the observer must guess. Therefore, this is a task in which we would expect probability multiplication.

5. The same as (4) except that the stimuli were Gaussian blobs defined by disparity in low-pass-filtered random noise, rather than by luminance.
Figure 5.5: Data for two observers performing a first-order motion detection task in which the stimulus was a 2 cpd sinusoidal carrier moving within a stationary Gaussian envelope. Frame duration was 20 msec; thus the temporal frequency was 12.5 Hz. Other conventions are as in figure 5.3.

The results are easily summarized. In no case did we find an amount of amplification greater than that predicted by probability multiplication. Figures 5.6 and 5.7 give examples from the abutting vernier and second-order alignment tasks.
Conclusions

The principle of probability multiplication says that contrast thresholds in tasks like vernier acuity can be predicted completely from the independent probabilities of detecting the component stimuli. No extra mechanisms are required in the tasks we have examined, with the sole exception of motion at highish temporal frequencies (12.5 Hz). In the latter case, we observe amplification greater than that predicted by probability multiplication, consistent with a dominant source of late noise (after multiplication). Physiological identification of this mechanism and of the late noise mechanism awaits investigation. We might consider simple versus complex cells, or parvocellular versus magnocellular pathways, with equally little evidence at present.
Acknowledgments

This work was supported by the BBSRC. The first author warmly thanks Martin Regan for his friendship and support over many years.
Michael Morgan, Charles Chubb, and Joshua Solomon
Figure 5.6: Performance of two observers in a vernier alignment task, using a large-field 2 cpd sinusoidal grating with a horizontally oriented 90° phase boundary in the middle of the screen. The observer's task was to decide whether the bottom half-grating was shifted left or right. Exposure was 100 msec. Other conventions are as in figure 5.3.
Appendix

5.A1 Methods
The following describes the methods for the first-order motion task. Differences in the other tasks, such as vernier, are described in the figure legends.

To maximize the rate of data collection, two parallel experimental systems were used to collect data in different experiments. The difference between the systems is not thought to be relevant to the interpretation of the data. Both used a Cambridge Research Systems VSG graphics card in a PC platform to generate the stimuli at a frame rate of 100 Hz. One system used 12-bit luminance resolution from the card and a Barco high-resolution RGB monitor; the other used 15-bit resolution and a Mitsubishi Diamond Pro monitor. The two systems were used to generate two kinds of stimuli. The first system (Barco) was used for experiments with Gabor patches, which consisted of a horizontal 2 cycle/deg sine wave grating multiplied by a circular Gaussian window of standard deviation 0.25°. Mean luminance was 5 cd/m². The second system (Mitsubishi) was used for rectangularly windowed horizontal 2 cycle/deg sine wave gratings of dimensions 3.3 deg². Mean luminance was 19 cd/m². The stimuli were viewed from a distance of 2 m in a darkened room.

Each trial began with the disappearance of the central fixation point, followed by a brief motion sequence in which the sine wave carrier moved either up or down inside the stationary window. Each motion step consisted of a single 90° phase change in the carrier. There was either a single step (two-frame sequence) or three steps (four-frame sequence). A new luminance lookup table was loaded between frames to control contrast. In the yoked condition all the frames had the same contrast,
which was varied between trials in order to determine the psychometric function relating contrast to the probability of the observer making a correct identification of the motion direction (up or down). Five different contrast levels were pseudo-randomly interleaved. The yoked condition was randomly interleaved with the fixed-contrast condition, in which the contrast of even-numbered frames was fixed within a block of trials. Odd-numbered frames had varying contrast, as in the yoked condition, to determine the psychometric function. Each block of trials consisted of 10 repeats of each contrast level in both yoked and fixed conditions, giving a total of 2 × 5 × 10 = 100 trials. At the end of each block the observer rested before beginning another block of trials. We aimed to collect at least 100 trials for each point on the psychometric function. The observers were three of the authors (MJM, JAS, and AJ), who are experienced motion observers, and FF, who was previously inexperienced at motion observing. Further checks on the generality of the findings were performed with a naive student observer (TA), whose results were similar to those of the other observers but are not presented here.

Figure 5.7: Performance of one observer in a second-order alignment task where the stimuli were Gaussian blobs defined by disparity. Thresholds now refer to disparity, not to contrast. Other conventions are as in figure 5.3.
5.A2 Models and Theory

5.A2.1 The Late-Noise Reichardt Model
An elaborated Reichardt detector (van Santen and Sperling, 1985) generates two visual signals before and after a stimulus is displaced, as illustrated in figure 5.1. Before displacement, it generates visual signals A and B. After displacement, the signals are A' and B'. If these signals are not perturbed by noise, then for sinusoidal stimuli,
displaced by 90°, A and B′ may be considered equal to zero. Thus an elaborated Reichardt detector without early noise is equivalent to a simple Reichardt detector, in which direction discrimination depends on the product of two signals elicited by the stimulus, one before displacement and the other after. In all of our models, visual signals are allowed to vary as an arbitrary power of target intensity. Thus, if $t_1$ and $t_2$ represent the pre- and post-displacement target intensities, then the late-noise Reichardt model's two non-zero visual signals are given by $B = t_1^p$ and $A' = t_2^p$, where $p$ is a free parameter. In all of our models, internal noise is assumed to have a Gaussian distribution. This noise perturbs the product of visual signals in the late-noise Reichardt model. Thus, accuracy is given by

$$P(\text{correct}) = \Phi\!\left(\frac{t_1^p\, t_2^p}{\sigma}\right),$$

where $\sigma$, like $p$, is a free parameter and $\Phi$ is the standard normal CDF.

5.A2.2 The Opponent (Contrast Discrimination) Model
Like the probability multiplication model (below), the opponent model asserts that one random variable $Y$ must exceed another, $X$, for a correct response. Thus accuracy is given by

$$P(\text{correct}) = P(Y > X) = \int_{-\infty}^{\infty} F_X(z)\, f_Y(z)\, dz.$$

In this expression, $Y$ is the noisy signal elicited by the two targets' combined energies and $X$ is the noisy signal elicited by the target with the maximum intensity. $F_X(z)$ is the cumulative distribution function of $X$ and $f_Y(z)$ is the probability density function of $Y$.
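Because the opponent model's accuracy is simply the probability that one noisy signal exceeds another, it can be checked numerically. The Python sketch below assumes equal-variance Gaussian signals (an illustrative assumption; the means passed in are not values from this chapter) and compares the integral of $F_X(z) f_Y(z)$ with the closed form for the difference of two Gaussians.

```python
import numpy as np
from scipy.stats import norm

def opponent_accuracy(mu_y, mu_x, sigma=1.0):
    """P(Y > X) for independent Gaussians Y ~ N(mu_y, sigma^2) and
    X ~ N(mu_x, sigma^2), evaluated two ways: numerically, as the
    integral of F_X(z) * f_Y(z), and in closed form."""
    z = np.linspace(-10.0, 10.0, 20001)
    dz = z[1] - z[0]
    # Riemann sum of F_X(z) * f_Y(z) over a wide grid
    numeric = np.sum(norm.cdf(z, mu_x, sigma) * norm.pdf(z, mu_y, sigma)) * dz
    # Equivalently P(Y - X > 0), with Var(Y - X) = 2 * sigma^2
    closed = norm.cdf((mu_y - mu_x) / (sigma * np.sqrt(2.0)))
    return numeric, closed
```

For example, `opponent_accuracy(2.0, 1.0)` gives about 0.76 by both routes, confirming that the integral form above is just a one-dimensional way of computing the exceedance probability.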
5.A2.3 The Probability-Multiplication Model

The probability-multiplication model assumes that the pre- and post-displacement targets are detected with probabilities $p_1$ and $p_2$, respectively (each an increasing function of the corresponding transduced intensity $t_i^p$). When both targets are detected, the observer responds correctly. When at least one target is not detected, the observer has a 50% chance of responding correctly. Thus accuracy is given by

$$P(\text{correct}) = p_1 p_2 + \tfrac{1}{2}\,(1 - p_1 p_2).$$
Morgan and Chubb (1999) previously proposed a model in which accuracy was given by their equation A3. With a little algebra, we note that this is equivalent to the probability-multiplication model without a nonlinear transducer, that is, provided the exponent $p = 1$.
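The two accuracy rules can be compared directly in a few lines. This Python sketch implements the formulas as stated above; the transducer and noise values are illustrative assumptions, not the chapter's fitted parameters.

```python
from scipy.stats import norm

def reichardt_accuracy(t1, t2, p=1.0, sigma=1.0):
    """Late-noise Reichardt model: Gaussian noise is added to the product
    of the transduced signals, so accuracy = Phi(t1^p * t2^p / sigma)."""
    return norm.cdf((t1 ** p) * (t2 ** p) / sigma)

def prob_mult_accuracy(p1, p2):
    """Probability multiplication: correct when both targets are detected
    (probability p1 * p2); otherwise a 50% guess."""
    joint = p1 * p2
    return joint + 0.5 * (1.0 - joint)
```

Both rules share the same multiplicative signature: if either component signal (or detection probability) goes to zero, accuracy falls to the 0.5 guessing floor, regardless of how strong the other component is.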
5.A2.4 The Convoy Model (a poor fit to all the data; not considered further)

Accuracy is given by

$$P(\text{correct}) = \Phi\!\left(\frac{\mu}{\sigma}\right),$$

where $\mu$ is the strength of the weaker of the two signals.

Data were fitted to the various models described in the text using the MATLAB version of the Nelder-Mead simplex (direct search) method. Where a figure contains more than one psychometric function, the data from all the functions were fitted simultaneously.
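The fitting procedure can be sketched in Python (the chapter used MATLAB's Nelder-Mead simplex; SciPy's implementation serves the same purpose). The synthetic 2AFC counts and the generic transducer-plus-late-noise form below are illustrative assumptions; the fit maximizes the binomial likelihood of the correct responses at each contrast level.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic 2AFC data: contrast levels, trials per level, correct responses
contrasts = np.array([0.02, 0.04, 0.08, 0.16, 0.32])
n_trials = np.full(contrasts.size, 100)
n_correct = np.array([54, 61, 77, 93, 99])

def model_accuracy(c, p, sigma):
    # Generic transducer-plus-late-noise form: Phi(c^p / sigma);
    # chance performance (0.5) is reached as contrast goes to zero
    return norm.cdf(c ** p / sigma)

def neg_log_likelihood(params):
    p, sigma = params
    if p <= 0 or sigma <= 0:          # keep the simplex in a legal region
        return np.inf
    acc = np.clip(model_accuracy(contrasts, p, sigma), 1e-9, 1 - 1e-9)
    return -np.sum(n_correct * np.log(acc)
                   + (n_trials - n_correct) * np.log(1.0 - acc))

fit = minimize(neg_log_likelihood, x0=[1.0, 0.1], method="Nelder-Mead")
p_hat, sigma_hat = fit.x
```

Fitting several psychometric functions simultaneously, as the chapter describes, amounts to summing the negative log-likelihoods across functions that share parameters before handing the total to the simplex.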
References

Braddick, O. J. (1980). Low-level and high-level processes in apparent motion. Phil. Trans. Roy. Soc. Lond. B, 290: 137-151.

Hassenstein, B. and Reichardt, W. (1956). Systemtheoretische Analyse der Zeit-, Reihenfolgen- und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus. Zeitschrift für Naturforschung B, 11: 513-525.

Morgan, M. and Chubb, C. (1999). Contrast facilitation in motion detection: evidence for a Reichardt detector in human vision. Vis. Res., 39: 4217-4231.

Morgan, M. J. (1990). Hyperacuity. In D. Regan (Ed.), Spatial Vision, pp. 87-113. Macmillan: London.

Morgan, M. J. and Regan, D. M. (1987). Opponent model for line interval discrimination: interval and vernier performance compared. Vis. Res., 27: 107-118.

Regan, D. M. (2000). Human Perception of Objects. Sinauer Associates Inc.: Sunderland, MA.

Regan, D. M. and Beverley, K. (1985). Postadaptation orientation discrimination. J. Opt. Soc. Am. A, 2: 146-155.

Reichardt, W. (1961). Autocorrelation, a principle for the evaluation of sensory information by the central nervous system. In W. A. Rosenblith (Ed.), Sensory Communication. Wiley: New York.

van Santen, J. P. H. and Sperling, G. (1984). A temporal covariance model of human motion perception. J. Opt. Soc. Am. A, 1: 451-473.

van Santen, J. P. H. and Sperling, G. (1985). Elaborated Reichardt detectors. J. Opt. Soc. Am. A, 2: 300-321.
6. Spatial Form as Inherently Three Dimensional

Christopher W. Tyler

Perhaps the ultimate goal of visual processing is to understand the perception of objects, the accepted fundamental unit of our visual world. Among many recent treatments of this topic, the book on the visual perception of objects by Regan (2000) stands out as being the most analytically psychophysical. Its emphasis is on the coding of sensory information of various types into coherent object forms. This analysis is indeed a core issue in object perception. How does the visual system break down the sensory information into the discrete components of the object representation? In particular, this leads to the question of how the sparse information in each visual modality is integrated into the continuous percept of a coherent object. It is the process of recombination of the local sources of object information, often called the "binding problem," that is the topic of this overview.

The binding problem is typically conceptualized in terms of the temporal binding of different stimulus properties or object features into a coordinated whole (e.g., Singer, 2001). Here, however, emphasis is placed on a spatial binding principle that provides an entirely different insight into the binding problem. Objects in the world are typically defined by contours and local features separated by featureless regions (e.g., the design printed on a beach ball, or the smooth skin between facial features). Leonardo's 1498 depiction of a dodecahedron (figure 6.1) illustrates the point. The surface between the edges is perceptually vivid, and yet its location is not specified by any features in the image. The shading does not define this surface, because it is not homogeneous even though the surface is perceived as flat. The inhomogeneity of the shading is interpreted as the painter's brushstrokes lying in the surface defined by the edges alone.
Figure 6.1: Illustration of a dodecahedron by Leonardo da Vinci from the book Divina Proportione by Luca Pacioli (1498). Copyright © Biblioteca Apostolica Vaticana (Vatican).

The mean differences between the shadings on different surfaces are interpreted as consistent with the angles of the surfaces, helping to support the three-dimensional interpretation, but the surfaces themselves are interpolated from the locations of the edges without regard to the details of the shading. Surface representation is thus an important stage in the visual coding from images through to object identification. Surfaces are a key property of our interaction with objects in the world. It is very unusual to experience objects, either tactilely or visually, except through their surfaces. Even transparent objects are experienced through their surfaces, with the material between the surfaces being invisible by virtue of the transparency. Really the only objects experienced in an interior manner are translucent objects, through which the light passes so as to illuminate the density of the material. Developing a means of representing the proliferation of surfaces before us is therefore a key stage in the processing of objects.

A very useful paradigm for the exploration of surface perception is the illusory overlay concept introduced by Schumann (1904). The basic paradigm is to overlay one set of objects with a background-colored mask of another object. The simplest version is the illusory bar (figure 6.2A), consisting of two disks with sectors cut out of them to generate the illusion of clear edges in the form of a vertical bar overlaid on the two disks (although the illusory edges fade if stared at directly). The triangular version developed by Kanizsa (1976; figure 6.2C) is even more vivid. The illusory contours can be interpreted as the result of a Bayesian "bet" that the most likely interpretation of the Kanizsa figure is as a triangular surface overlaying three disk-shaped surfaces, with the consequent enhancement of the edges dividing the triangular surface from the background of the same color. Rotating the Pacman elements by 90° to the right (figure 6.2D) makes the bet implausible because of the lack of alignment of corresponding edges. The figure is now seen as three isolated Pacmen with no illusory contours connecting them. On looking back at the original of figure 6.2C, it may also be seen as isolated elements, and some time may be required to regain the original percept of a triangular surface.
Surfaces may be completed not just in two dimensions, but also in three. A compelling example was developed by Tse (1999). The amorphous shape wrapping a white space gives the immediate impression of a 3D cylinder filling the space (figure 6.3).
Figure 6.2: A: The original Schumann figure in which the alignment of the edges produces an illusory white bar. B: The same figure with the slots rotated to the right by 45°. Although the figure elements are identical, this manipulation destroys the coherence of the bar and degrades the percept to two isolated disks with no illusory contours. C: The Kanizsa version of the occlusion contours, based on a triangle. D: The Kanizsa triangle with 90° rotated elements, again destroying the subjective contours.
Figure 6.3: Volume completion of a cylinder (Tse, 1999). Reproduced with permission of the author.

This example illustrates the flexibility of the surface-completion mechanism in adapting to the variety of unexpected demands for shape reconstruction.

The effect of the Bayesian interpretation may be enhanced by adding a supporting cue to the spatial interpretation (Ramachandran, 1986). If the triangle is given a stereoscopic disparity to support the interpretation of the overlaid triangle, the need for edges dividing the triangle from the background becomes paramount. Figure 6.4 constitutes a three-element stereogram that provides the binocular disparity cues when fused by crossing the eyes (or by diverging them). The disparity is added only to the "corner" regions of the Pacmen, not to their circular boundaries. In direct viewing of the figure without binocular fusion, it is clear that these small shifts are almost unnoticeable and have no effect on the quality of the illusion. However, the left and right pairings in figure 6.4 provide near- and far-disparity versions of the identical figure, allowing one to contrast the perceptual effects of merely changing the sign of disparity at the triangular points.

Figure 6.4: The stereoscopic Kanizsa figure, modified after Ramachandran (1986). On crossing or uncrossing the eyes, two stereoscopic versions of the figure are seen, flanked by two monocular versions. In the version with the triangle in front, the illusory contours complete the straight sides of the triangle in the same way (though more strongly) as they did in the original. However, in the other stereoscopic version, with the triangular region behind the disks, the disks are seen as open portholes. Now the illusory contours switch to complete the circular edges of the disks and disappear from the triangular edges, emphasizing the active nature of the object reconstruction process.

In the version with the triangle in front of the disks, the illusory edges are seen very strongly. The triangle standing out in depth appears substantially brighter than its background, and can be inspected much more extensively without loss of the illusion. The disparity cue provides extra confirmation that the corners are in front of the black disks, enhancing the percept that they are overlaid by a coherent object, which further requires that its white edges must stand out from the white background in the region between the disks. However, figure 6.4 also provides a version with the disparity consistent with the presence of a triangle lying behind the disks. This cue structure interdicts the interpretation of an overlaid triangle and forces a completely different surface configuration, because the triangular sectors are now behind the disks. It is striking that our visual systems immediately come up with a plausible alternative. The disks are now seen as open "portholes" in a uniform white surface, behind which the triangle is hidden except for its corners.
In order to achieve this interpretation, two changes are required in the edge structure. The original illusory edges have to evaporate to provide for the uniform surface, and the portholes require a curved rim completing the circle around each corner. These changes are achieved perceptually in dramatic fashion. Despite the fact that the monocular images are identical in the two cases (only the left- and right-eye images are switched), the perceptual interpretation is strikingly different. Both the depth structure and the edge brightness are reorganized to new spatial locations. This immediate perceptual reorganization attests to the power of the interpretation, in terms of a configuration of surfaces in space, to generate vivid perceptual experiences.
The version of the stereoscopic image in figure 6.4 with the triangle behind also illustrates the principle of what Kanizsa (1976) termed "amodal completion." The surface interpretation is focused on the flat surface out of which the three portholes are cut. However, we are perceptually aware that the three points seen through the portholes belong to the same triangle. There is a connection between them that is felt spatially rather than just known logically. This connection does not give rise to the illusory contours of the "modal completion" of the triangle seen visually in front of the surface (although some viewers see a blurred version of the underlying triangle semitransparently through the surface). In terms of the perceived 3D structure, this amodal connection between the points "should" be invisible, because it is hidden by the surface containing the portholes. Yet the points are perceived as part of a single triangle. This connection takes the form of an implicit perceptual knowledge that, if there were movement in the figure, the three points would move together because they belong to the same triangle. The completion is "amodal" in the sense that it is mediated by implicit knowledge of the spatial structure, but is (usually) not seen in the visual modality.¹

The examples of figures 6.2-6.4 illustrate that surface reconstruction is a key factor in the process of making perceptual sense of visual images of black shapes. It is easy to talk about such processes verbally, but there is a large gap between a verbal description and a process that can be implemented in neural hardware. The test of neural implementation is to develop a numerical simulation of the process using neurally plausible computational elements.
The feasibility of a surface reconstruction process being capable of generating accurate subjective contours is illustrated in figures 6.5 and 6.6 for the classic Kanizsa figure using the computational technique of Sarti, Malladi, and Sethian (2000). The edge-attractant properties of the Kanizsa corners progressively convert the initial state of an isotropic spindle into a convincing triangular mesa with sharp edges. The resulting subjective surface is developed as a minimal surface with respect to a Riemannian metric of metrical distortions induced by the features in the image (analogous to the gravitational distortions of physical space in the theory of general relativity). The computational manipulation of this Riemannian surface reveals how the interactions within a neural network could operate to generate the subjective contours in the course of the 3D reconstruction of the surfaces of the world. The SMS Riemannian algorithm first convolves the image with an edge detector to generate a potential function whose representation of the image corresponds to the raw primal sketch, as introduced by Marr (1982), which encodes image gradient, orientation of structures, T-junctions, and texture. The minimum lines of this potential function denote the position of edges and its gradient is a force field that always points towards the local edge (figure 6.5). This potential function defines a metric of the embedding space in which the perceived surface is developed. During its evolution (figure 6.6), the surface is attracted by the existing boundaries and progressively steepens. The surface develops towards the piecewise constant solution by continuation and closing of the boundary fragments and filling-in of the homogeneous regions. 
A solid object is progressively delineated as a constant surface bounded by the existing and reconstructed shape boundaries (Sarti et al., 2000). It is particularly interesting that the surface developed through the SMS Riemannian-metric algorithm has the apparently contradictory properties of sharp edges combined with a smoothness constraint. The smoothness constraint is a property of minimal surfaces, such as the surface of an aggregation of soap bubbles. The tensions within the surface of a soap bubble tend to minimize the local curvature, so it settles to the form of maximum smoothness. In the SMS algorithm, however, the implementation also allows sharp edges as a component of the solution, when they increase the smoothness of the rest of the surface. In these respects, the algorithm closely mimics the human visual system, which tends to identify the edges of objects and to assume smooth surfaces extending between these edges. The SMS algorithm provides a neurally plausible implementation of the reconciliation between these two apparently contradictory demands on the surface properties of object reconstruction.

Figure 6.5: Edge gradient algorithm. A: Edge map of the original Kanizsa figure (figure 6.2A). B: Edge gradient map for one of the "Pacmen." The gradient of this potential function is computed as a force field that always points towards the local edge. Reproduced from Sarti, A., Malladi, R. and Sethian, J. A. (2000). Subjective surfaces: a method for completing missing boundaries. Proc. Nat. Acad. Sci. USA, 97: 6258-6263, with permission from the authors.

¹Note that, as originally described by Kanizsa (1976), these percepts may be seen as emergent interpretations with prolonged nonstereoscopic viewing of figure 6.2 or figure 6.4, for those who have difficulty in attaining the stereoscopic view.
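The feature-extraction stage of this scheme can be sketched numerically. The Python code below is an illustrative reconstruction, not the authors' implementation: it builds an edge-strength potential of the form g = 1/(1 + |∇(G·I)|²), whose minima trace the edges of a synthetic disk (standing in for a Pacman element), and whose negated gradient gives the force field that points toward the nearest edge, the field that attracts the evolving surface. The image size and smoothing width are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_potential(image, sigma=1.0):
    """Smooth the image, take its gradient magnitude, and form the
    potential g = 1 / (1 + |grad(G_sigma * I)|^2). g is near 1 in flat
    regions and dips toward its minima along edges; the negated gradient
    of g is a force field pointing toward the local edge."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    g = 1.0 / (1.0 + gx ** 2 + gy ** 2)
    fy, fx = np.gradient(g)
    return g, (-fy, -fx)   # potential and edge-attracting force field

# Synthetic element: a bright disk on a dark background
yy, xx = np.mgrid[0:64, 0:64]
disk = ((yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2).astype(float)
g, force = edge_potential(disk, sigma=1.5)
```

A surface initialized over this field and allowed to flow downhill in the induced metric steepens at the rim of the disk while remaining smooth elsewhere, which is the qualitative behavior shown in figure 6.6.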
6.1 Surface Representation through the Attentional Shroud

One corollary of this surface reconstruction approach is the postulate that the object array is represented strictly in terms of its surfaces, as proposed by Nakayama and Shimojo (1990). Numerous studies point to a key role of surfaces in organizing the perceptual inputs into a coherent representation. Norman and Todd (1998), for example, show that depth discrimination is greatly improved if the two locations to be discriminated lie in a surface rather than being presented in empty space. This result is suggestive of a surface level of interpretation, although it may simply be relying on the
Figure 6.6: Development of the Riemannian manifold towards the subjective surface. The original features are mapped in white against a dark background, while shades of gray map the low values of the Riemannian metric, indicating the presence of boundaries. Reproduced from Sarti, A., Malladi, R. and Sethian, J. A. (2000). Subjective surfaces: a method for completing missing boundaries. Proc. Nat. Acad. Sci. USA, 97: 6258-6263, with permission from the authors.
fact that the presence of the surface provides more information about the depth regions to be assessed. Nakayama, Shimojo, and Silverman (1989) provide many demonstrations of the importance of surfaces in perceptual organization. Recognition of objects (such as faces) is much enhanced when the scene interpretation allows them to form parts of a continuous surface rather than isolated pieces, even when the retinal information about the objects is identical in the two cases. This study also focuses attention on the issue of border ownership by surfaces perceived as in front of, rather than behind, other surfaces. While their treatment highlights interesting issues of perceptual organization, it offers no insight into the neural mechanisms by which such structures might be achieved.

A neural representation of the reconstruction process may be envisaged as an attentional shroud (Tyler and Kontsevich, 1995), wrapping the dense locus of activated disparity detectors as a cloth wraps a structured object (figure 6.7A). This depiction shows how the shroud may envelop an object in order to capture the broad features of its shape, although some degree of detail may be lost. This self-organizing surface is envisaged as operating in the manner of what Julesz (1971) called "the search for dense surfaces," as instantiated in the stereopsis model of Marr and Poggio (1979). Both of these conceptualizations were restricted to planar surfaces in the frontoparallel plane of the eyes. The attentional shroud, on the other hand, is proposed as a self-organizing connectivity that spreads through the array of activated disparity detectors, known as the "Keplerian array," attracted by the closest sets of disparity detectors. This process is what Tyler (1983, 1991) called "cyclopean cleaning": the simplification from the complexity of the activated Keplerian array of spurious correspondences to the single cyclopean surface of the final depth solution. At that time, the cleaning processes were envisaged as largely consisting of disparity (or epipolar) inhibition, together with lateral facilitation through neighboring fields of activation at similar disparities. The concept of the attentional shroud emphasizes that there is always a depth solution at every location in the field, and that it is based at the level of the generic depth representation rather than residing purely in the process of stereoscopic reconstruction.

Figure 6.7: A: Cartoon of the attentional shroud wrapping an object representation. B: Depiction of a random-dot surface with stereoscopic ripples. C: Thresholds for detecting stereoscopic depth ripples, as a function of the spatial frequency of the ripples. Peak sensitivity (lowest thresholds) occurs at the low value of 0.4 c/deg (2.5 deg/cycle). Thus, stereoscopic processing involves considerable smoothing relative to contrast processing. Reprinted from Tyler, C. W. (1990). A stereoscopic view of visual processing streams. Vision Res., 30: 1877-1895. Copyright 1990, with permission from Elsevier.

The attentional shroud has inherent limitations with regard to the complexity of the surface that it can reconstruct. It cannot follow the 3D shape to the level of detail provided by the luminance information, but is restricted to depth gradients that are less steep than may occur in the physical structure. Such a loss of detail is characteristic of the stereoscopic process, as may be established by studies of the resolution of ripples in sinusoidal stereoscopic surfaces of the sort depicted in figure 6.7B. The graph in figure 6.7C, reproduced from Tyler (1990), shows how the amplitude threshold varies with the spatial frequency of the stereoscopic ripples. This graph illustrates that the stereoscopic depth reconstruction of surfaces is limited to a maximum spatial bandwidth of only about 2 cycles/deg (or 30 arcmin per ripple cycle). This limitation is as much as ten times lower than the bandwidth for resolution of luminance information (grating acuity). The peak sensitivity is at an even lower frequency, requiring 2.5° for each ripple cycle. Thus, the stereoscopic reconstruction of surface shape is capable of rendering depth variations only to a coarse scale of representation. This neural process operates as though the depth reconstruction were performed by a flexible material whose connectivity was too stiff to match sharp discontinuities in the depth information.
6.2 Interpolation of Object Shape within the Generic Depth Map

Once the object surfaces have been identified, we are brought to the issue of the localization of the object features relative to each other, and relative to those in other objects. Localization is particularly complicated under conditions where the objects could be considered as "sampled" by overlapping noise or partial occlusion: the tiger behind the trees, the face behind the window curtain. However, the visual system allows remarkably precise localization even when the stimuli have poorly defined features and edges (Toet and Koenderink, 1988). Furthermore, sample spacing is a critical parameter for an adequate theory of localization. Specifically, no low-level filter integration can account for interpolation behavior beyond the tiny range of 2-3 arc min (Morgan and Watt, 1982), although the edge features of typical objects, such as the form of a face or a computer monitor, may be separated by many degrees. Thus, the interpolation required for specifying the shape of most objects is well beyond the range of the available filters (as in the illustration of figure 6.2). Conversely, accuracy of localization by humans is almost independent of the sample spacing: for sample spacings ranging from 30 minutes down to 3 minutes of separation, localization is not improved by increasing sample density (Kontsevich and Tyler, 1998). This limitation poses an additional challenge in relation to the localization task, raising the "long-range interpolation problem" that has generated much recent interest in relation to position coding for extended stimuli, such as Gaussian blobs and Gabor patches (Morgan and Watt, 1982; Hess and Holliday, 1992; Levi, Klein, and Wang, 1992; Kontsevich and Tyler, 1998). Localization information is available from multiple visual cues, as indicated in figure 6.8: position information is available from luminance form, disparity profile, color, texture, and other visual cues.
Localization in the sampled stimulus might employ interpolation over many such cues. In a task in which the object shape is defined both by luminance and by disparity, for example, the basic sources of noise determining the localization error are (i) early noise in each visual modality contributing to the position determination, and (ii) late noise in the peak localization process. To probe the nature of object processing by different cues, we may utilize a position task for objects defined by various visual modalities (e.g., luminance and disparity). If localization is performed in separate visual modalities, the position thresholds might be expected to combine according to their absolute signal/noise ratios, assuming that the signals from separate visual modalities have independent noise sources. The observers would be able to interpolate one estimate of the position of the profile from the luminance information alone and a second estimate from the disparity information
Figure 6.8: The generic depth model of localization processing based on unitary interpolation input. Localization information is available from multiple visual cues: luminance form, disparity profile, color form, texture gradients, and other visual cues. Object feature binding may be accomplished by the sensory information being fed into a generic depth map. The local cues in this map of depths would then be subject to a depth surface interpolation process operating over multiple visual cues to bind the various features into a coherent representation of the object, from which the generic localization information may be derived.
alone. In this case, signals from the various modalities (L, D,. .., X) would combine to improve the localization performance. Adding information about the object profile from a second modality would always improve detectability and could never degrade it. Likova and Tyler (2003) addressed the unitary depth map hypothesis of object localization by using a sparsely sampled image of a Gaussian bulge (figure 6.9). The luminance of the sample lines carried the luminance profile information while the disparity in their positions in the two eyes carried the disparity profile information. In this way, the two separate depth cues could be combined or segregated as needed. Both luminance and disparity profiles were identical Gaussians, and the two types of profile were always congruent in both peak position and width. The observer's task was to make a left/right judgment on each trial of the position of the joint Gaussian bulge relative to a reference line, using whatever cues were available. Threshold performance was measured by means of the maximum-entropy * staircase procedure (Kontsevich and Tyler, 1999). Observers were presented the sampled Gaussian profiles defined either by luminance modulation alone (figure 6.9A), by disparity alone (figure 6.9B), or by combination of luminance and disparity defining a single Gaussian profile (figure 6.9C). It should be noticeable that the luminance profile evokes a strong sense of depth as the luminance fades into the black background. If this is not evident in the printed panels, it was certainly seen clearly on the monitor screens. Free fusion of figure 6.9B allows perception of the stereoscopic depth profile (forward for crossed fusion). Figure 6.9C shows a combination of both cues at the level that produced cancellation to flat plane under the experimental conditions. The position of local contours is unambiguous, but interpolating the peak, corresponding to reconstructing the shape of someone's nose to
Christopher W. Tyler
Figure 6.9: Stereograms showing examples of the sampled Gaussian profiles used in the Likova and Tyler (2003) experiment, defined by A: luminance alone, B: disparity alone, and C: a combination of luminance and disparity. The pairs of panels should be free-fused to obtain the stereoscopic effect.
Spatial Form as Inherently Three Dimensional
locate its tip, for example, is unsupportable. Localization from disparity alone was much more accurate than from luminance alone, immediately suggesting that depth processing plays an important role in the localization of sampled stimuli (see figure 6.10, gray circles). Localization accuracy from disparity alone was as fine as 1-2 arc min, requiring accurate interpolation to localize the peak of the function between the samples spaced 16 arc min apart. This performance contrasted with that for pure luminance profiles, which was about 15 arc min (figure 6.10, horizontal line). Combining identical luminance and disparity Gaussian profiles (figure 6.10, black circles) produced localization performance qualitatively similar to that given by disparity alone (figure 6.10, gray circles). Rather than approximating the lowest threshold of the component functions, as predicted by the multiple-cue interpolation hypothesis, it again exhibits a null condition where localization is impossible within the range measurable in the apparatus. Contrary to the multiple-cue hypothesis, the stimulus with full luminance information becomes impossible to localize as soon as it is perceived as a flat surface. This null point can only mean that luminance information per se is insufficient to specify the position of the luminance profile in this sampled stimulus. The degradation of localization accuracy can be explained only under the hypothesis that interpolation occurs within a unitary depth-cue pathway. Perhaps the most startling aspect of the results in figure 6.10 is that position discrimination in sampled profiles can be completely nulled by the addition of a slight disparity profile that nulls the perceived depth from the luminance variation.
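The prediction that an added congruent cue "could never degrade" performance is often formalized as quadratic (inverse-variance) summation of statistically independent cues. The following is a minimal sketch of that standard prediction, not the authors' model, using the approximate thresholds quoted in the text (about 15 arc min for luminance alone, 1-2 arc min for disparity alone):

```python
def combined_threshold(t1, t2):
    """Quadratic summation for two independent cues:
    sensitivities (1/threshold) add in quadrature."""
    return (t1 ** -2 + t2 ** -2) ** -0.5

t_luminance = 15.0  # arc min, luminance-only threshold reported in the text
t_disparity = 2.0   # arc min, disparity-only threshold (text: 1-2 arc min)

t_combined = combined_threshold(t_luminance, t_disparity)
print(round(t_combined, 2))  # ~1.98: slightly better than either cue alone
```

Under this independent-cue account the combined threshold should always be at least as good as the better single cue, which is exactly what the observed null condition contradicts.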
It should be emphasized that the position information from disparity was identical to the position information from luminance on each trial, so addition of the second cue would be expected to reinforce the ability to discriminate position if the two cues were processed independently. Instead, the nulling of the luminance-based position information by the depth signal implies that the luminance target is processed exclusively through the depth interpretation. Once the depth interpretation is nulled by the disparity signal, the luminance information does not support position discrimination at all (null point in the solid curve in figure 6.10). This evidence suggests that depth surface reconstruction is the key process in the accuracy of the localization process. It appears that visual patterns defined by different depth cues are interpreted as objects in the process of determining their location. Only an interpolation mechanism operating at the level of a generic depth representation can account for the data. Specifically, a depth interpolation mechanism accounts for the impossibility of position discrimination at the cancellation point and the asymmetric shift of the cancellation point by the luminance cue (figure 6.10). The fine resolution of the performance when disparity information is present clearly implies that an interpolation process is involved in the performance, because it is about eight times better than could be supported by the location of the samples alone (even assuming that the sample nearest the peak could be identified from the luminance information; see Likova and Tyler, 2003). Evidently, the full specification of objects in general requires extensive interpolation to take place, even though some textured objects may be well defined by local information alone. The interpolated position task may therefore be regarded as more representative of real-world localization of objects than the typical vernier acuity or
Figure 6.10: Typical results of the position localization task. The gray circles are the thresholds for the profile defined by disparity alone; the black circles are the thresholds for the profile defined by both disparity and luminance. The dashed gray line shows the model fit for disparity alone; the solid line, that for combined disparity and luminance. The horizontal line shows the threshold for the pure luminance profile. Note the leftward shift of the null point in the combined luminance/disparity function.
other line-based localization tasks of the classic literature. It consequently seems remarkable that luminance information per se is unable to support localization for objects requiring interpolation. The data indicate that it is only through the interpolated depth representation that the position of the features can be recognized. One might have expected that positional localization would be a spatial form task depending on the primary form processes (Marr, 1982). The dominance of a depth representation in the performance of such tasks indicates that the depth information is not just an overlay to the 2D sketch of the positional information. Instead, it seems that a full 3D depth reconstruction of the surfaces in the scene must be completed before the position of the object is known.
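The claim that interpolation can localize a peak roughly eight times more finely than the 16 arc min sample spacing is easy to illustrate in principle. The sketch below (illustrative only, not a model of the visual mechanism; the Gaussian width of 20 arc min and the 3.7 arc min offset are my assumptions) samples a Gaussian profile at the experiment's spacing and recovers the peak by log-parabolic interpolation through the three samples bracketing the maximum:

```python
import math

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def interpolated_peak(xs, ys):
    """Fit a parabola through the three (x, log y) samples around the maximum.
    Since the log of a Gaussian is exactly a parabola, noise-free Gaussian
    samples yield the true peak location."""
    i = ys.index(max(ys))
    h = xs[i + 1] - xs[i]                 # equal sample spacing assumed
    l0, l1, l2 = math.log(ys[i - 1]), math.log(ys[i]), math.log(ys[i + 1])
    return xs[i] + 0.5 * h * (l0 - l2) / (l0 - 2 * l1 + l2)

spacing = 16.0    # arc min between sample lines, as in the experiment
true_peak = 3.7   # arc min: a peak offset well below the sample spacing
xs = [spacing * k for k in range(-4, 5)]
ys = [gaussian(x, true_peak, 20.0) for x in xs]
print(interpolated_peak(xs, ys))  # recovers ~3.7, far finer than 16 arc min
```

The point of the sketch is only that sub-sample localization is mathematically well posed once a smooth surface is assumed; the psychophysical result is that observers actually achieve it, but only when a depth interpretation is available.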
6.3 Transparency

A major complication in the issue of surface reconstruction is the fact that we do not perceive the world solely as a set of opaque surfaces. There are many types of object that are partially transparent, allowing us to perceive more than one surface at different distances along any particular line of sight. The depiction of transparent objects was a particular obsession of the Dutch artists of the seventeenth century, but it is interesting to note that it extends as far back as Roman times. The fruit bowl and water jug in the wall-painting from the House of Julia Felix near Pompeii (figure 6.11) illustrate that fine glassware and mirrored surfaces were appreciated in this epoch of civilization also. At first sight, the perception of transparency seems at variance with the concept of the unitary surface reconstruction of the attentional shroud. A key feature of random-dot stereograms is their ability to support the percept of transparent depth surfaces (Julesz, 1970; Norcia and Tyler, 1984). Here the depth tokens assume a primary role, for they first need to be specified at each point in the image before the construct of a surface running through each appropriate set of points can be developed. It is as though the surface is strung across the depth tokens to segregate the relevant sets of monocular dots, rather than the reverse. The visual system may be capable of supporting the simultaneous percept of up to three overlaid surfaces (Weinshall, 1990) from fields of randomly intermixed dots. Such multi-layered percepts seem to make it difficult to maintain the perspective that construction of object surfaces is the primary process in spatial perception, because they emphasize the local depth tokens of each feature as the primary structure of visual 3D space, with the surface superstructure erected upon their scaffolding.
Before abandoning the view that there is a single surface representation at any point in the field, it is important to be sure that there is no interpretation under which the single surface can remain the primary vehicle of reconstruction, even for perception of multiple transparent surfaces. One such view is that, although only a single surface may be reconstructed at any one moment in time, transparent perception may be obtained by sequential reconstruction of each of the multiple surfaces in turn. Marr and Poggio (1979) followed the approach of the Automap model of Julesz and Johnson (1968) in proposing such sequential reconstruction of depth surfaces. The idea is that surface reconstruction is achieved within a fixed array of cortical disparity detectors by vergence eye movements that shift the surface reconstruction to different distances in physical space. In each new physical location, the otherwise rigid stereo reconstructive apparatus could then find the densest disparity plane to form the singular local surface. Transparency would be perceived by sequential operation of the local surface reconstruction. The hypothesis of sequential reconstruction by vergence eye movements makes two testable predictions. One is that the disparity range of the depth-reconstruction mechanism is, by postulate, limited to disparities near zero. Disparity images of flat planes near zero disparity should therefore be easier to detect than disparity images that cut through the zero-disparity plane at a steep angle. Steep stereoscopic surfaces should require a sequence of several vergence positions before they can be fully reconstructed. Such a prediction was tested by Uttal, FitzGerald, and Eskin (1975), who generated planes up
to 80° from frontoparallel in dynamic-noise stereograms and presented them in brief exposures too short for vergence eye movements to occur. Two-alternative forced-choice experiments (with a monocularly indistinguishable null target of random depth information) indicated that the detectability of such depth planes was almost independent of the angle of slant. This result makes it difficult to conceive how any model based on purely frontoparallel surface reconstruction could be operating in human vision.

Figure 6.11: Wall painting from the House of Julia Felix, illustrating the transparent glassware and reflective vessels available to the Pompeiian aristocracy at the beginning of the Roman Empire. Copyright © Museo Nazionale, Napoli.

A second feature of the eye-movement reconstruction concept is that it does not include a mechanism of attentional enhancement of surfaces projecting within the array of disparity detectors; the only local focusing mechanism is presumed to be that of vergence tracking of the eyes through the 3D optical image. To determine the nature of transparency perception, Tyler and Kontsevich (1995) presented a pair of transparent stereoscopic planes in front of and behind a fixation marker (figure 6.12). To measure the visibility of the two surfaces, they added the modulation signal of a sinusoidal disparity corrugation that could appear in either of the two ambiguous planes on each trial, the other remaining flat. Attention was drawn to one of the planes by presenting an attentional cueing plane immediately prior to the transparent stimulus. The corrugation itself could be in one of two phases (sine or phase-inverted sine relative to the fixation point) to form the forced-choice discrimination task for the observer. When the priming plane fell close to the disparity of either the front or back transparent plane, the phase of the corrugations became readily discriminable. But no information was available about the phase of the non-cued
Figure 6.12: Frontoparallel stereoattention stimulus. Observers fixated on a stable fixation target (left). The test stimulus consisted of a transparent pair of depth planes (right). One of these depth planes, selected at random on each trial, had a sinusoidal depth ripple whose phase had to be identified. The transparent test target was preceded by a flat cueing plane (center) at one of five disparities selected at random. Thus, the cueing plane could be unpredictably at the same or different depths from the plane of the depth ripple on each trial.
plane. Because the priming plane contained no corrugations, it added no information to the discrimination task. Its effect, therefore, must have been due to a non-feature-specific enhancement of the information-processing capability in a limited disparity range, which may be described as the operation of disparity-specific attention. The transparent-plane experiment shows that the shape discrimination that is easy in the attended plane is impossible (at this duration) in the other plane of the transparent pair. This result reveals that the transparent percept does not allow discrimination of detail in two planes simultaneously. Only the attended plane can be resolved. It appears, therefore, that the attention mechanism plays the same role as the vergence shifts in the vergence eye-movement hypothesis of depth reconstruction. Only one plane can be attended at a time, with the details of the other plane inaccessible to consciousness until attention is switched to that depth location. On this interpretation, the perception of transparency is an illusion akin to the illusion that we see the world at high resolution throughout the visual field. In fact, we see at high resolution only in the restricted foveal region, but we point the fovea at whatever we wish to inspect, so its high resolution is available at all locations in the field. So effective is this sampling mechanism that most people are unaware of the limited spatial resolution outside the fovea. In a similar fashion, we may be unaware that the surface reconstruction mechanism fills in across only the plane of current interest at any one time.
6.4 Object-Oriented Constraints on Surface Reconstruction

One corollary of this surface-reconstruction approach is the postulate that the object array is represented strictly in terms of its surfaces, as proposed by Nakayama and Shimojo (1990a). The dominance of a depth representation in the performance of the position interpolation and transparency tasks indicates that the depth information is a core process that must be completed before the configuration of the object is known. It is proposed (Tyler and Kontsevich, 1996; Likova and Tyler, 2002) that the depth representation is not simply an abstract pattern of neural firing, but an adaptive neural surface representation that links the available depth information into a coherent two-dimensional manifold, in a process analogous to the mathematical one of figure 6.8. Is such a mechanism neurally plausible? The phenomenon of perceived (phantom) limbs after amputation offers a perceptual lesion that provides profound insight into the strata of perceptual representation in the somatosensory system (Ramachandran, 1998). Applying such insights to the visual system provides a radical view of its self-organizing capabilities. It is well known that amputees experience a clear and detailed sense of the presence of the limb in the space that it would have occupied before amputation. This vivid phantom implies that there is a cortical representation of the limb that is distinct from its sensory representation. An independent cortical representation is needed to explain the experienced phantom because, in the absence of the peripheral input, the sensory representation is no longer being supplied with consistent information. Any residual input will be disorganized noise, and therefore would not support a coherent representation of the pre-existing limb structure.
Less well known, but well established, is that the amputee is capable of maneuvering the perceived phantom at will (but only if it was maneuverable before amputation; a paralyzed limb remains perceptually paralyzed after amputation; Ramachandran, 1998). This manipulable representation corresponds to the body schema of Sir Henry Head, a complete representation of the positions of the limbs and the body that is accessible to consciousness and manipulable at will. Head et al. (1920) proposed the body schema as a neurological construct with some specific neural instantiation, but it has been largely dismissed as metaphorical in the succeeding century. The idea of a conscious, manipulable body schema provides a challenging view of the self-organizing capabilities of the neural substrate, but one that is hard to dismiss when details of the phantom-limb manifestation are taken into account (Ramachandran, 1998). It suggests that there are three levels of representation of the sensory world in the visual system:

1. The visual representation in striate cortex, which includes the neural Keplerian array of disparity detectors. The coordinate frame for this representation would be retinal coordinates (or the joint retinal coordinates of the two eyes for the stereoscopic aspect).

2. The spatial representation in parietal cortex (in object-centered coordinates), the site of Shepard's manipulable image (Shepard and Metzler, 1971), Julesz's dense planes, and Tyler's attentional shroud. It also corresponds to Gregory's hypotheses of the spatial configuration tested during perceptual alternations. The representation is inherently self-organizing, with (a) local surface tension to bind it into a data-reducing form, (b) a tendency to self-destruct (autoinhibition) unless continually reinforced by sensory input, and (c) conformity to amodal instruction from distant spatial regions.

3. The intended configuration of the manipulandum in frontal cortex (in egocentric coordinates for convenient manipulation). This attentional manipulation is endogenous, in the sense that it can be manipulated at will according to higher cognitive instruction.

Figure 6.13: Left: Inverted picture of Mirror Lake, Yosemite, with scattered leaves in the 'sky'. Right: Reverted picture reveals that the leaves are floating on the water's surface, which fills transparently across the space to the shoreline. The surface of the lake bottom is visible below and the mountains beyond, making a complex image with three levels of surface reality. From an original picture by Ron Reznick, reproduced with permission of the artist.

This triple conceptualization of space perception is a high-level, dynamic representation that may be termed "prehensile vision." The property that distinguishes the tail of the primates from that of all other species is that it is prehensile; it can be guided by neural signals to reach out and grasp objects like tree branches by wrapping around them, operating like a fifth hand. Miller (1998) has drawn attention to the ability of our vision to perform analogous feats. He describes a depiction of a lake mirroring a sky, with a few leaves scattered on the surface, as illustrated in a photograph of Yosemite's Mirror Lake (figure 6.13). When viewed upside down, the reflected sky is upward and appears distant, with the scattering of leaves seen as blowing through space.
Right side up, the cues are sufficient for the reconstruction of the reflective surface of the lake extending toward us, with the leaves floating in the perceptually completed surface. Thus the same region of the picture is seen as distant in one orientation but transparently close in the other.
What is the mechanism for this reorganization? The triple scheme for a prehensile process of spatial reconstruction proposes that the neural surface representation is not merely a passive connection between local sources of activation, but a dynamic self-organizing search mechanism, guided by top-down frontal-lobe influences as to where it might be interesting to look and what sense a particular arrangement would make. For example, if viewed for sufficient time, the inverted picture of the lake can also elicit surface completion, once it has been conceptualized as an inverted picture in which the lake surface might extend upward over our heads rather than below us. This would be an example of a modified Bayesian constraint. Lake surfaces, by gravitational constraints, are always below us (in the non-SCUBA environment!). There should therefore be a strong Bayesian constraint against expecting a surface above us. But this constraint is eliminated for the case of pictures of the environment, given the possibility that the picture may be inverted. Driven by such influences, the prehensile representation can reach out its surface reconstruction network, or attentional shroud, to search for constellations of surface cues making up meaningful interpretations of the structure of the environment and the objects within it. The process is analogous to the way the hand of a blind person reaches out to feel the shape of objects within range, except that the visual "hand" is infinitely extensible, wrapping whatever form is encountered all the way to the far reaches of space. The concept of prehensile vision gives neural sinew to the exploratory perceptual experience that we have in a new spatial environment. It is a component of the attraction to the scenic view at a "vista point" on the highway.
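The modified Bayesian constraint in the inverted-lake example can be put in toy numerical form. The sketch below is an illustration only; both prior probabilities are arbitrary assumptions of mine, not values from the chapter. The point is that once a picture may be inverted, "surface above" in the image no longer implies "surface above" in the world, so the strong prior against overhead water surfaces is washed out:

```python
# Toy priors (assumed values, for illustration only)
p_inverted = 0.5        # a picture of a scene may plausibly be upside down
p_above_direct = 0.01   # prior on a water surface overhead in direct viewing

# For a picture, "surface above" in the image corresponds to a surface
# below the viewpoint in the depicted scene whenever the picture is inverted:
p_above_in_picture = (p_inverted * (1.0 - p_above_direct)
                      + (1.0 - p_inverted) * p_above_direct)

print(p_above_direct)      # 0.01 - strong constraint in direct viewing
print(p_above_in_picture)  # 0.5  - constraint effectively eliminated
```

With a symmetric 50/50 belief about inversion, the image evidence becomes uninformative about surface elevation, matching the observation that surface completion can eventually proceed even for the inverted picture.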
We step out of the enclosed space of the vehicle and experience our prehensile reconstruction mechanisms probing the arrays of visual information reaching the retina to expand the scope and reach of the spatial representation across the forms of the distant landscape. This process is often conceptualized as a cognitive endeavor: "Oh, there's that lake we just passed, and there's the famous mountain peak we are aiming for." The concept of prehensile reconstruction proposes that beneath this cognitive appeal is a level of dynamic perceptual reconstruction that probes and molds the visual information into a surface representation of the surrounding hillsides, allowing us to experience them in a quasi-tactile manner that is neurally equivalent to feeling the curves of a bed comforter.
6.5 Conclusion
The evidence assessed in this review triangulates onto the concept that the predominant mode of spatial processing is through a flexible surface representation in a 3D spatial metric. It is not until the surface representation is developed that the perceptual system seems able to localize the components of the scene. This view is radically opposed to the more conventional concept that the primary quality of visual stimuli is their location, with other properties attached to this location coordinate (Marr, 1982). The attentional shroud, on the other hand, is conceptualized as a flexible, self-organizing network that operates as an internal representation of the external object structure. In this concept, the attentional shroud is, itself, the perceptual coordinate frame. It organizes itself to optimize the spatial interpretation implied by the complex
of binocular and monocular depth cues derived from the retinal images. It is not until this process is complete that the coordinate locations can be assigned to the external scene. In this sense, localization is secondary to the full depth representation of the visual input. Spatial form, usually seen as a predominantly 2D property that can be rotated into the third dimension, becomes a primary 3D concept of which the 2D projection is a derivative feature. In this connection, it is worth noting that position signals have a delayed integration time relative to luminance integration (Tyler and Gorea, 1986). This is just an additional line of evidence that position is a derivative variable from the primary object representation, rather than the primary metric property implied by the graphical representation of optical space. The net result of this analysis is to offer a novel insight into the nature of the binding problem. The separate stimulus properties and local features are bound into a coherent object by the "glue" of the 3D surface representation. This view is a radical counterpoint to the concept of breaking the scene down into its component elements by means of specialized receptive fields and recognition circuitry. However, an important aspect of the "understanding" of objects is the representation of the 3D spatial relationships among their components. This understanding cannot be fully realized by a 2D map of the component relationships. The evidence reviewed here points toward the key role of the surface representation in providing the "glue" or "shrink-wrap" to link the object components in their appropriate relationships. It also emphasizes the inherent three-dimensionality of this surface "shrink-wrap" in forming a prehensile matrix with which to cohere the object components whose images are projected onto the sensorium. 
While further details remain to be worked out, the simulations of the Sethian group (figure 6.6) provide assurance that such processes are readily implementable not only computationally but with plausible neural components that could reside in a locus of spatial reconstruction such as the parietal lobe of the human cortex.
Acknowledgment This work was supported by NIH grant EY 7890.
References

Breitmeyer, B., Julesz, B. and Kropfl, W. (1975). Dynamic random-dot stereograms reveal up-down anisotropy and left-right isotropy between cortical hemifields. Science, 187: 269-270.

Buckley, D., Frisby, J. P. and Mayhew, J. E. (1989). Integration of stereo and texture cues in the formation of discontinuities during three-dimensional surface interpolation. Percept., 18: 563-588.

Gregory, R. L. (1968). Perceptual illusions and brain models. Proc. Roy. Soc. Lond. B, 171: 179-196.

Gregory, R. L. (1980). Perceptions as hypotheses. Phil. Trans. Roy. Soc. Lond. B, 290: 181-197.
Head, H., Rivers, W. H., Holmes, G. M., Sherren, J., Thompson, H. T. and Riddoch, G. (1920). Studies in Neurology. H. Frowde, Hodder and Stoughton: London.

Hess, R. F. and Holliday, I. E. (1992). The coding of spatial position by the human visual system: Effects of spatial scale and contrast. Vis. Res., 32: 1085-1097.

Julesz, B. (1971). Foundations of Cyclopean Perception. University of Chicago Press: Chicago.

Kanizsa, G. (1976). Subjective contours. Sci. Am., 234: 48-52.

Kontsevich, L. L. and Tyler, C. W. (1998). How much of the visual object is used in estimating its position? Vis. Res., 38: 3025-3029.

Kontsevich, L. L. and Tyler, C. W. (1999). Bayesian adaptive estimation of psychometric slope and threshold. Vis. Res., 39: 2729-2737.

Levi, D. M., Klein, S. A. and Wang, H. (1994). Discrimination of position and contrast in amblyopic and peripheral vision. Vis. Res., 34: 3293-3313.

Likova, L. T. and Tyler, C. W. (2003). Peak localization of sparsely sampled luminance patterns is based on interpolated 3D object representations. Vis. Res., 43: 2649-2657.

Marr, D. (1982). Vision. W. H. Freeman: San Francisco.

Marr, D. and Poggio, T. (1979). A computational theory of human stereo vision. Proc. Roy. Soc. Lond. B, 204: 301-328.

Miller, J. (1998). On Reflection. Yale University Press: New Haven, CT.

Mitchison, G. J. and McKee, S. P. (1985). Interpolation in stereoscopic matching. Nature, 315: 402-404.

Morgan, M. J. and Watt, R. J. (1982). Mechanisms of interpolation in human spatial vision. Vis. Res., 25: 1661-1674.

Nakayama, K. and Shimojo, S. (1990). Towards a neural understanding of visual surface representation. In T. Sejnowski, E. R. Kandel, C. F. Stevens, and J. D. Watson (Eds.), The Brain, Cold Spring Harbor Symposium on Quantitative Biology, Cold Spring Harbor Laboratory: NY, 55: 911-924.

Nakayama, K., Shimojo, S. and Silverman, G. H. (1989). Stereoscopic depth: its relation to image segmentation, grouping, and the recognition of occluded objects. Percept., 18: 55-68.
Norcia, A. M. and Tyler, C. W. (1984). Temporal frequency limits for stereoscopic apparent motion processes. Vis. Res., 24: 395-401.

Norman, J. F. and Todd, J. T. (1998). Stereoscopic discrimination of interval and ordinal depth relations on smooth surfaces and in empty space. Percept., 27: 257-272.

Pacioli, L. (1498/1956). Compendium de Divina Proportione. Fontes Ambrosiani: Milan.

Ramachandran, V. S. (1986). Capture of stereopsis and apparent motion by illusory contours. Percept. Psychophys., 39: 361-373.
Ramachandran, V. S. (1998). Consciousness and body image: lessons from phantom limbs, Capgras syndrome and pain asymbolia. Phil. Trans. Roy. Soc. Lond. B, 353: 1851-1859.

Regan, D. M. (2000). Human Perception of Objects. Sinauer and Associates: Sunderland, MA.

Sarti, A., Malladi, R. and Sethian, J. A. (2000). Subjective surfaces: A method for completing missing boundaries. Proc. Nat. Acad. Sci. USA, 97: 6258-6263.

Schumann, F. (1904). Beiträge zur Analyse der Gesichtswahrnehmungen: Einige Beobachtungen über die Zusammenfassung von Gesichtseindrücken zu Einheiten. Psychologische Studien, 1: 1-32.

Shepard, R. N. and Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171: 701-703.

Singer, W. (2001). Consciousness and the binding problem. Ann. NY Acad. Sci., 929: 123-146.

Toet, A. and Koenderink, J. J. (1988). Differential spatial displacement discrimination thresholds for Gabor patches. Vis. Res., 28: 133-143.

Tse, P. U. (1999). Volume completion. Cog. Psy., 39: 37-68.

Tyler, C. W. (1983). Sensory aspects of binocular vision. In Vergence Eye Movements: Basic and Clinical Aspects, pp. 199-295. Butterworths: Boston.

Tyler, C. W. (1990). A stereoscopic view of visual processing streams. Vis. Res., 30: 1877-1895.

Tyler, C. W. and Cavanagh, P. (1991). Purely chromatic perception of motion in depth: Two eyes as sensitive as one. Percept. Psychophys., 49: 53-61.

Tyler, C. W. and Gorea, A. (1986). Different encoding mechanisms for phase and contrast. Vis. Res., 26: 1073-1082.

Tyler, C. W. and Kontsevich, L. L. (1995). Mechanisms of stereoscopic processing: stereoattention and surface perception in depth reconstruction. Percept., 24: 127-153.

Tyler, C. W. and Liu, L. (1996). Saturation revealed by clamping the gain of the retinal light response. Vis. Res., 36: 2553-2562.

Weinshall, D. (1991). Seeing "ghost" planes in stereo vision. Vis. Res., 31: 1731-1748.

Wurger, S. M. and Landy, M. S. (1989). Depth interpolation with sparse disparity cues. Percept., 18: 39-54.

Yang, Y. and Blake, R. (1995). On the interpolation of surface reconstruction from disparity interpolation. Vis. Res., 35: 949-960.
Part II
Motion and Color
7. White's Effect in Lightness, Color, and Motion

Stuart Anstis

Abstract

In White's (1979) illusion of lightness, the background is a square-wave grating of black and white stripes (figure 7.1a). Gray segments that replace parts of the black stripes look much lighter than gray segments that replace parts of the white stripes. Assimilation from flanking stripes has been proposed as an explanation, the opposite of simultaneous contrast. We use colored patterns to demonstrate that the perceived hue shifts are a joint function of contrast and assimilation. Simultaneous contrast was relatively stronger at low spatial frequencies, assimilation at high. Both the chromatic and achromatic versions of White's effect were stronger at high spatial frequencies. "Geometrical" theories attempt to explain White's effect with T-junctions, anisotropic lateral inhibition, and elongated receptive fields. But a new isotropic random-dot illusion of lightness called "Stuart's rings" resists any anisotropic explanations. White's illusion also affects motion perception. In "crossover motion," a white and a black bar side by side abruptly exchange luminances on a gray surround. The direction of seen motion depends upon the relative contrast of the bars. On a light (dark) surround the black (white) bar is seen as moving. Thus the bar with the higher contrast is seen as moving, in a winner-take-all computation. But if the bars are embedded in long vertical lines, the luminance of these lines is 2.3 times more effective than the surround luminance in determining the seen motion. Thus motion strength depends upon White's effect and is computed after it.
7.1 Introduction

White's illusion is shown in figure 7.1A. The gray rectangles are the same, but the left ones look lighter. This is surprising: by local contrast, the left ones should look darker than the right ones, since the left gray stripes have a long border with white and a short border with black. The illusion is reversed from the usual direction (Adelson, 2000).
Figure 7.1: A: White's effect (after White, 1979, 1981). Gray regions look darker when embedded in white stripes and flanked by black stripes than vice versa. B: effect increases with spatial frequency; apparently light regions look even lighter (upper curve) and apparently dark regions look even darker (lower curve). Interpretations of White's effect include: • Endwise simultaneous contrast from the embedding stripes, plus assimilation from the flanking stripes (White, 1979, 1981). Assimilation and contrast are sometimes called positive and negative brightness induction. • Stimulus geometry: T-junctions (Todorovic, 1997; Zaidi, Spehar, and Shy, 1997). Patches straddling the stem of a T are grouped together for the lightness computation, and the cross-bar of the T serves as an atmospheric boundary. • Visual system geometry: Hypothetical elongated receptive fields produce anisotropic brightness induction plus neural filtering (Kingdom and Moulden, 1991a, b; Blakeslee and McCourt, 1999). • Visual "scission" treats the gray regions as separate transparent layers (Anderson, 1997). This list is by no means exhaustive. Our experiments are limited to low-level rather than high-level explanations, and they seek to show that White's effect involves both contrast and assimilation, but perhaps no anisotropic geometrical factors.
7.2 Experiment 1. White's Effect Increases with Spatial Frequency White's effect was measured by a matching method. Outside the striped area of figure 7.1A was a solid gray adjustable patch of the same size (not shown), which the observer adjusted to a perceptual match. All settings were recorded for later analysis.
Setting the magnification of the display to 12, 8, 4, 2, and 1 fixed the spatial frequency of the stripes at 0.627, 0.94, 1.88, 3.76, and 7.53 cpd, respectively, at a constant viewing distance of 72 cm. (We avoided varying the viewing distance, in case this might introduce accommodation-linked chromatic aberrations.) Results are shown in figure 7.1B (mean of 2 naive subjects x 3 readings). Figure 7.1B shows that as the spatial frequency was increased, the apparently lighter patch looked progressively even lighter (upper line in figure 7.1B) and the apparently darker patch looked progressively even darker (lower line in figure 7.1B). At the highest frequency used (7.53 cpd), the left gray patch looked 0.4 log units lighter, a factor of 2.5, than the right patch.
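As a quick check on the arithmetic, a difference expressed in log units converts to a linear lightness ratio by raising 10 to that power (a sketch of the conversion only; the function name is ours, not from the chapter):

```python
def log_units_to_ratio(delta_log10):
    """Convert a lightness difference in log10 units to a linear ratio."""
    return 10 ** delta_log10

# At 7.53 cpd the left patch matched 0.4 log units lighter than the right:
ratio = log_units_to_ratio(0.4)
print(round(ratio, 1))  # ~2.5, the factor quoted in the text
```

The same conversion recovers the factor of roughly 4 quoted for the 0.6 log-unit effect in experiment 4.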
7.3 Experiment 2. A Colored White's Effect Shows Both Contrast and Assimilation A gray test patch embedded in a white stripe and flanked by black stripes looks darker. Is this caused by contrast with the embedding white stripe, or by assimilation to the flanking black bars? To find out, we changed the 3 cpd black and white stripes of figure 7.1A into noncomplementary colors, namely, cyan (CIE x = .23, y = .31) and green (CIE x = .29, y = .48), and plotted the results in CIE color space. Consider a gray patch embedded in cyan (central circle in figure 7.2A). Any simultaneous contrast from cyan would give it a reddish tinge and shift its perceived hue to the right. Any assimilation toward the flanking green stripes would give it a greenish tinge and shift its perceived hue upwards. In fact, it shifted in both directions, up and to the right (thick arrow in figure 7.2A). The relative lengths of the vertical and horizontal vectors give the proportion of assimilation to simultaneous contrast. Likewise, a gray patch embedded in the green stripes shifted down and to the left, so both patches shifted in directions parallel to the hypotenuse of the CIE green-gray-cyan triangle. We conclude that both assimilation and simultaneous contrast play a large role in the colored White's effect. Clifford and Spehar (2003) have reached similar conclusions from their experiments.
7.4 Experiment 3. Colored White's Effect: Spatial Frequency We now made the stripes orange (CIE x = .496, y = .438) and magenta (CIE x = .320, y = .165), and used the same range of spatial frequencies as in experiment 1. These colored stripes made the gray stripes look compellingly bluish and greenish, and naive observers often refused to believe that they were really achromatic. They adjusted the hue and saturation of the matching patches by means of color palettes. Results are plotted in CIE color space in figure 7.2B (mean of 3 subjects x 3 readings). The gray test patches (open circle in center) appeared to be tinged with green (triangles) or blue (squares). This figure shows that the perceived hues were shifted approximately parallel to the hypotenuse, thus showing a combination of assimilation and simultaneous contrast. These data are replotted in figure 7.3A to show that the length of the
Figure 7.2: Colored White's effect. A: Gray regions embedded in cyan stripes are repelled by cyan (rightward arrow) and also attracted to flanking green stripes (upward arrow). Opposite is true for gray regions embedded in green. B: Embedding stripes were magenta and orange. Gray test regions (central circle) showed increasing color shifts with spatial frequency.
Figure 7.3: Replotting data from figure 7.2B: increasing the spatial frequency increases both A: strength of color shift and B: log ratio of assimilation to contrast. color-shift vectors, that is, the saturation of the induced colors, increased with spatial frequency. In fact, both the achromatic and chromatic versions of White's effect increased with spatial frequency. The direction of these vectors reveals the ratio between the amount of contrast and of assimilation. Note that in figure 7.2B the square data points lie on a downward curve, showing increasing assimilation toward the flanking magenta stripes at higher spatial frequencies. Likewise, the uppermost triangles lie on a curve up and to the right, showing that they assimilate toward the flanking orange stripes. In both cases
raising the spatial frequency increased the amount of assimilation relative to simultaneous contrast (figure 7.3B). Thus, for gray stripes embedded in magenta (lower curve in figure 7.3B), contrast was more than ten times stronger than assimilation at 0.627 cpd, but was only 1.26 times stronger (0.1 log units) at 7.53 cpd. The slopes of the curves indicate that an octave increase in spatial frequency increased the ratio of assimilation to contrast by 0.8 octaves for stripes embedded in magenta, and by 0.23 octaves for stripes embedded in orange. These results suggest that assimilation has a smaller spatial range than contrast, and this fits with the common observation that fine lines give the most assimilation (Bertulis and Saudargene, 1988; Reid and Shapley, 1988). It is also consistent with hypothetical receptive fields with small summatory centers that handle assimilation, and with much larger inhibitory surrounds that handle simultaneous contrast.
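The vector analysis above can be sketched numerically: decompose a hue-shift vector into its horizontal (contrast) and vertical (assimilation) components, following the axes of figure 7.2, and take the log ratio. The shift values below are hypothetical, chosen only to reproduce the ratios quoted in the text:

```python
import math

def log_assimilation_contrast_ratio(shift_x, shift_y):
    """
    Split a perceived hue shift in CIE (x, y) space into a horizontal
    'contrast' component and a vertical 'assimilation' component, and
    return the log10 ratio of assimilation to contrast strength.
    """
    return math.log10(abs(shift_y) / abs(shift_x))

# Hypothetical shift vectors for gray embedded in magenta stripes:
low_sf = log_assimilation_contrast_ratio(0.020, 0.0019)   # ~ -1.0: contrast >10x stronger
high_sf = log_assimilation_contrast_ratio(0.020, 0.0159)  # ~ -0.1: contrast ~1.26x stronger
```

A ratio of -1.0 log units corresponds to contrast being ten times stronger than assimilation, and -0.1 log units to the factor of 1.26 reported at 7.53 cpd.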
7.5 Experiment 4. An Isotropic Brightness Illusion: "Stuart's Rings" Some geometric theories of White's effect invoke the role of T-junctions in the stimulus, or of elongated receptive fields in the visual system. However, a new isotropic brightness illusion called "Stuart's rings," which can be stronger than White's effect, seems to rule out these theories. In figure 7.4A the gray parts of the rings in each column are identical, but those in the middle row look subjectively darker, and those in the bottom row look subjectively lighter, than the rings in the top row. The perceived lightness shifts are in the same direction as in White's effect, but with random dots instead of horizontal stripes. Thus in the middle row the rings are of interspersed gray and black dots, with the gray dots replacing the white dots of the surround. These rings look dark. In the bottom row the rings are made of interspersed gray and white dots, with the gray dots replacing the black dots of the surround. These rings look light. Compare this with figure 7.1A, where in the gray/black right panel the gray stripes are flanked by black stripes and replace white stripes. This panel looks dark. In the gray/white left panel the gray stripes are flanked by white stripes and replace black stripes. This panel looks light. This illusion was measured by a matching method. Ring diameters were 1.9°, and dot diameters were 4 min arc. Two observers adjusted the luminance of the rings in the top row until they appeared to match the lightness of rings either in the middle row, or in the bottom row. Their settings are plotted in figure 7.4B (mean of 2 subjects x 3 readings). The x axis shows the actual ring luminances, expressed as a percentage of "white" (= 108 cd/m2). The y axis shows the amount of lightness illusion: the rings looked darker, unchanged, or lighter for y < 0, y = 0, or y > 0. Effects were stronger for physically darker rings.
In fact the darkest rings that we used (12% of white) looked as much as four times (0.6 log units) lighter in the bottom row than in the middle row. These results show that isotropic random-dot patterns can produce strong lightness illusions in the absence of T-junctions or elongated areas. One might argue that "Stuart's rings" are entirely different from White's effect, but this would gain little, since one would then need to develop two separate explanations instead of one!
Figure 7.4: "Stuart's rings" illusion. A: All three rings in each column are identical gray, but look darker when gray replaces white random dots (middle row) and lighter when gray replaces black random dots (bottom row). B: The illusion is greatest for physically darkest rings.
7.6 Experiment 5. White's Effect and Apparent Motion A bar that alternates between two spatial positions appears to jump back and forth (see reviews by Kolers, 1972; Anstis, 1978, 1980). In "crossover" motion (Anstis and Mather, 1985), a black and a white bar side by side switch luminances repetitively
Figure 7.5: A black and a white bar abruptly exchange luminances (Anstis and Mather, 1985). A: Flicker in place is rarely seen. B: On a light surround the black bar appears to jump. C: On a dark surround the white bar appears to jump. However, D: Embedding the bars in dark stripes makes the white bar appear to jump, despite the surround, and E: Embedding the bars in light stripes makes the black bar appear to jump, despite the surround. Conclusion: White's effect alters "crossover" apparent motion. over time. This display is rarely perceived as stationary flicker in place (figure 7.5A), because the "suspicious coincidence" (Barlow, 1974) in which one bar appears just as the other disappears triggers the visual system to apply Occam's razor, namely, to adopt the minimum hypothesis about the real world that fits the maximum evidence in the visual input. This minimax is provided by the hypothesis that a single bar is jumping to and fro. But which bar is seen as jumping? This depends on the surround. On a dark surround, the white bar is seen as moving, but on a light surround, the black bar is seen as moving (figure 7.5B, C). Thus the bar with the higher contrast against the surround gives a stronger motion signal and is seen as moving (Anstis and Mather, 1985). Does White's effect alter the perceived contrast, and hence motion, of the jumping bars? We independently varied the luminances of the surround, and of long vertical stripes that embedded the jumping bars (figure 7.5D, E), and found that the stripes overruled the influence of the remaining surround, consistent with White's effect. So we measured the relative strengths of the stripes and the remaining surround, by titrating their luminances and seeing which determined the perceived direction of apparent motion. Figure 7.6 shows one frame of a two-frame movie: in the other frame the short black and white bars exchanged luminances.
In figures 7.6A-C, all the jumping bars are identical, but in A the surround luminance is spatially graded from left to right, so that in the left half of the figure the white bars appear to move, whilst in the right half the black bars appear to move. A vertical line separates these two perceptual half-fields. Now look at figure 7.6B. All the bars are still the same, but now they are embedded in long vertical stripes that are graded from light at the bottom to dark at the top. The surround is black, so it plays no part in what is seen. In the bottom half of the figure the white bars appear to move, whilst in the top half the black bars appear to move. A horizontal line separates these two perceptual half-fields. In figure 7.6C these two stimuli are combined, so that the surround is graded from left to right and in addition the embedding stripes are graded from top to bottom. Eight observers viewed each bar pair in turn through a small hole, and reported whether the white or the black bar seemed to move. The regions in which the black bars versus the white bars appeared to move could be separated by a line, whose slope
Figure 7.6: White's effect alters "crossover" apparent motion. A: On dark surround at left, white bars seem to jump, and on light surround at right, black bars seem to jump. Bars with higher contrast win. B: On dark embedding stripes at top, white bars seem to jump, and on light embedding stripes at bottom, black bars seem to jump. C: Combining A with B pits surround against embedding stripes. The slope of the dividing line shows the relative influence of surround and embedding stripes. revealed the relative importance of the surround versus the embedding stripes. A vertical (or horizontal) separating line would show that only the surround (or only the embedding stripes) determined the perceived bar contrast, and motion signal strength. Results are shown in figure 7.7 (mean of 8 subjects x 3 readings). This separating line was oblique. Below the line the surround and embedding stripes were dark and the white bar appeared to move. Above the line the opposite was true. The slope of this line was only 0.43, which indicates that the embedding stripes were 2.33 (= 1/0.43) times more important than the surround in determining the bars' contrast for motion. We conclude that White's effect occurs before the motions of the bars are computed. In conclusion, White's effect involves both assimilation and simultaneous contrast. Geometrical theories involving T-junctions and elongated receptive fields might fit White's effect, but they do not explain Stuart's rings. Finally, White's effect occurs before motion processing and can influence the strength of motion signals.
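The weighting implied by the dividing line's slope reduces to a one-line computation (a sketch; the function name is ours):

```python
def stripe_vs_surround_weight(slope):
    """
    For the oblique line separating the 'white bar moves' and 'black
    bar moves' regions (figure 7.7), a slope s means the embedding
    stripes carry 1/s times the weight of the surround in setting the
    bars' effective contrast for motion.
    """
    return 1.0 / slope

weight = stripe_vs_surround_weight(0.43)
print(round(weight, 2))  # ~2.33, as quoted
```

A slope of exactly 1 would mean stripes and surround were equally weighted; a slope approaching 0 would mean the stripes alone determined the seen motion.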
Acknowledgments Supported by grants from UCSD Senate and Department of Psychology. Thanks to Georgina Blanc, Karen Clements, Noelle Der-Macleod, Dara Hashemi, Nicole Mead, Kaaryn Pederson, Katie Rozner, Ilan Shrira, and Ted Wu for assistance in collecting the data, and to Alex Holcombe and Lindsey Shenk for comments on the manuscript.
References Adelson, E. H. (2000). Lightness perception and lightness illusions. In M. Gazzaniga (Ed.), The New Cognitive Neurosciences, 2nd Edn, pp. 339-351. MIT Press: Cambridge, MA.
Figure 7.7: Apparent-motion results from stimuli in figure 7.6. Below the line, stripes and surround were dark and white bars appeared to move. Above the line, stripes and surround were light and black bars appeared to move. Slope of line is 0.43, showing that luminance of embedding stripes is 2.33 (= 1/0.43) times more important than surround luminance. Conclusion: White's effect strongly influences crossover motion. Anderson, B. L. (1997). A theory of illusory lightness and transparency in monocular and binocular images: the role of contour junctions. Perception, 26: 419-453. Anstis, S. M. (1978). Apparent movement. In R. Held, H. W. Leibowitz, and H.-L. Teuber (Eds.), Handbook of Sensory Physiology, vol. VIII, pp. 655-673. Springer-Verlag: New York. Anstis, S. M. (1980). The perception of apparent movement. Phil. Trans. Roy. Soc. Lond. B, 290: 153-168. Reprinted in The Psychology of Vision, H. C. Longuet-Higgins and N. S. Sutherland (Eds.), 153-168. The Royal Society: London. Anstis, S. M. and Mather, G. (1985). Effects of luminance and contrast on direction of ambiguous apparent motion. Perception, 14: 167-179. Barlow, H. B. (1974). Inductive inference, coding, perception, and language. Perception, 3: 123-134. Bertulis, A. V. and Saudargene, D. S. (1988). Spatial parameters of the color assimilation effect. Sensory Systems, 2: 204-209. Blakeslee, B. and McCourt, M. E. (1999). A multiscale spatial filtering account of the White effect, simultaneous brightness contrast and grating induction. Vis. Res., 39: 4361-4377. Clifford, C. W. G. and Spehar, B. (2003). Using colour to disambiguate contrast and assimilation in White's effect. J. Vis., 3: 294a. Kingdom, F. and Moulden, B. (1991a). A model for contrast discrimination with incremental and decremental test patches. Vis. Res., 31: 851-858.
Kingdom, F. and Moulden, B. (1991b). White's effect and assimilation. Vis. Res., 31: 151-159. Kolers, P. A. (1972). Aspects of Motion Perception. Pergamon Press: New York. Reid, R. C. and Shapley, R. (1988). Brightness induction by local contrast and the spatial dependence of assimilation. Vis. Res., 28: 115-132. Todorovic, D. (1997). Lightness and junctions. Perception, 26: 379-394. White, M. (1979). A new effect of pattern on perceived lightness. Perception, 8: 413-416. White, M. (1981). The effect of the nature of the surround on the perceived lightness of gray bars within square-wave test gratings. Perception, 10: 215-230. Zaidi, Q., Spehar, B. and Shy, M. (1997). Induced effects of backgrounds and foregrounds in three-dimensional configurations: the role of T-junctions. Perception, 26: 395-408.
8. The Processing of Motion-Defined Form Deborah Giaschi It has been known since the time of Helmholtz that relative motion between a figure and its background can break camouflage and make figure-ground segregation possible. A demonstration on the CD that accompanies this book shows that a camouflaged bird becomes visible and its shape can be identified when it moves across a field of stationary lines (Regan, 2000). This is one type of motion-defined (MD) form. A different type of MD form is created by keeping the figure stationary and moving the texture inside the figure relative to the texture outside the figure. This chapter is a review of studies using the latter type of MD form with patients and/or children. In 1989, Martin Regan and Hua Hong created a motion-defined letter test, described below, to facilitate studies of MD form perception in patients (Regan and Hong, 1990). Angela Kothe began this work at Toronto Western Hospital, and I took over from Angela in 1990. We studied the effect of multiple sclerosis, glaucoma, amblyopia, Parkinson's disease, and cortical lesions on MD form perception in collaboration with James Sharpe, Graham Trope, Stephen Kraft, Anthony Lang, and Mark Bernstein. We found that MD form perception can be selectively disrupted in patients with normal visual acuity for high- and low-contrast luminance-defined (LD) form (amblyopia, Giaschi et al., 1992; glaucoma, Giaschi et al., 1996; multiple sclerosis, Regan et al., 1991) and in patients with normal detection and direction discrimination of coherent motion (multiple sclerosis, Giaschi et al., 1992; cortical lesions, Regan et al., 1992). In some of these patients, there was a deficit for identifying MD form but not for detecting it. 
We suggested that interconnections between motion and form pathways are involved in the processing of MD form (Giaschi et al., 1992; Giaschi and Regan, 1997; Regan et al., 1992), and we showed that damage to the white matter underlying parietotemporal cortex, which presumably connects the motion and form pathways, can disrupt MD form perception (Regan et al., 1992). We also pointed out that the MD letter test is probably not a test of strictly magnocellular (M) pathway function (Giaschi et al., 1996), and we even claimed that M pathway function is not essential for identifying MD form (Giaschi et al., 1997). Contrary to our conclusions, MD form stimuli have been used to study the onset of direction-selective motion perception in
human infants (e.g., Wattam-Bell, 1996) and the intactness of low-level motion mechanisms in patients (e.g., Battelli et al., 2001). In addition, the MD letter test has been used to assess M/dorsal pathway function in children with dyslexia (Felmingham and Jakobson, 1995) and in extremely low birthweight children (Downie et al., 2003). Below I review evidence from multiple sclerosis, cortical lesions, and normal and abnormal visual development suggesting that the processing of MD form can be dissociated from the processing of simple motion stimuli. Then evidence is presented from Parkinson's disease, reduced visual acuity, and functional neuroimaging suggesting that the processing of MD form is not restricted to the M/dorsal visual pathway. Some of this work was recently completed in my laboratory at the University of British Columbia in collaboration with graduate students Catherine Boden, Ryan Hoag, Cindy Ho, and Emillie Parrish; postdocs Bob Dougherty, Chien-Chung Chen, and Veronica Edwards; and clinical colleagues Chris Lyons, Roy Cline, Bruce Bjornson, and Dorothy Edgell.
8.1 The Motion-Defined Letter Test The MD letter test was designed to be similar to the familiar Snellen test of visual acuity. A camouflaged letter is presented within a pattern of random dots (figure 8.1A). The letter is not visible when the dots are stationary or all moving in the same direction at one speed. The letter is made visible by moving the dots inside the letter in one direction and the dots outside the letter in the opposite direction at the same speed. The letter itself is stationary, and its edges are defined by the difference in motion direction. Defining motion contrast by a direction difference removes the confounding texture contrast cues that are present when motion contrast is defined by a speed difference (e.g., Donahue et al., 1998; Schrauf et al., 1999). A series of control experiments has ruled out the appearance and disappearance of dots at the edges of the letter as an additional confounding cue (summarized in Regan, 2000, pp. 298-301). The task is either to identify the letter or to detect its presence in one of two intervals. Ten different letters are presented in random order at a fixed letter size, dot contrast, and dot speed. The task is made harder, to determine minimum performance thresholds, by fixing two of these parameters and reducing the third. We found that minimum speed thresholds were more sensitive to disruptions in MD form perception than minimum size or contrast thresholds (Giaschi et al., 1996; Giaschi et al., 1992; Regan et al., 1991). Other researchers have reduced the proportion of coherently moving dots inside the letter relative to the background to measure an identification threshold (e.g., Rizzo et al., 1995).
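The logic of a minimum-speed threshold (fix letter size and dot contrast, reduce dot speed until identification fails) can be sketched as a simple descending procedure. This is an illustrative sketch only, not the published psychophysical method; the `identifies` callback and all parameter values are our assumptions:

```python
def minimum_speed_threshold(identifies, start_speed=2.0, step=0.8, floor=0.01):
    """
    Descending-method sketch for the MD letter test: starting from an
    easily visible dot speed (deg/s), multiply the speed by 'step'
    after each correct identification. The threshold is the slowest
    speed that was still correctly identified. 'identifies(speed)'
    stands in for the observer's response (hypothetical interface).
    """
    speed = start_speed
    while speed > floor and identifies(speed):
        speed *= step
    return speed / step  # slowest speed that was correctly identified

# Simulated observer who identifies letters at dot speeds >= 0.1 deg/s:
thr = minimum_speed_threshold(lambda s: s >= 0.1)
```

A real procedure would interleave trials and average several reversals; the sketch only shows how "reducing the third parameter" yields a threshold.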
Figure 8.1: Motion-defined form stimuli. A: The MD letter test. B: The MD shape direction discrimination stimulus. C: The global motion stimulus. D: The texture-defined shape identification stimulus.
8.2 Dissociations Between Motion-Defined Form and Simple Motion Processing
8.2.1 Multiple Sclerosis
Multiple sclerosis (MS) is a demyelinating disease that produces a range of visual deficits. Regan et al. (1991) found deficits in MD form perception to be common in patients with MS. Speed thresholds for identifying MD letters were elevated in 17 of the 22 patients studied. Visual acuity for high- and/or low-contrast letters was normal in eight of these patients. In a subsequent study, we examined additional aspects of motion processing in 10 patients with MS (Giaschi et al., 1992). We measured speed thresholds for identification of MD letters, detection of MD letters, detection of coherent random-dot motion, and direction discrimination of coherent random-dot motion.
We found that all 10 patients had elevated speed thresholds for identifying MD letters in at least one eye, relative to the age-matched controls tested. Five patients showed a generalized motion deficit with abnormal performance on all four motion tasks. The other five patients showed deficits specific to MD form. In four of these patients the deficit was specific to MD form identification and spared MD form detection. We concluded that, in some patients, the failure to identify MD letters was not due to a general failure of the neural processing of motion information. We suggested that MS damages the interconnections between motion and form pathways.
8.2.2 Cortical Lesions
The white matter lesions in MS are multiple and diffuse, but we have found similar deficits in MD form perception in patients with single, unilateral cortical lesions produced by tumour excision or vascular events (Regan et al., 1992). We measured speed thresholds for identification and detection of MD letters and for detection and direction discrimination of coherent random-dot motion in 13 patients and 20 controls. Each patient had normal visual acuity and no field defects in the central 10°. We found that speed thresholds for direction discrimination and motion detection were normal in all 13 patients. Identification and detection thresholds for MD letters were elevated in three patients. Identification thresholds only were elevated in four patients. These results provide additional evidence that the processing of MD form can be dissociated from the processing of simple motion stimuli. The lesions in these seven patients with MD form perception deficits were in parietotemporal white matter underlying Brodmann areas 18, 19, 37, 39, 21, and 22. The lesions in the remaining six patients with normal MD form perception did not invade this region. We suggested that the parietotemporal lesions interrupted interconnections between the motion and form pathways. The subcortical visual pathways contain at least two subdivisions, magnocellular (M) and parvocellular (P), that start in the retina and continue through separate layers in the thalamic lateral geniculate nucleus to the visual cortex (Leventhal et al., 1981) (see figure 8.2). In V1 and V2 there is a mingling of the M and P inputs, but a predominant projection from the M pathway continues dorsally to areas V5/MT and MST and on to the posterior parietal cortex (DeYoe and Van Essen, 1988). The P pathway continues ventrally to V4 and terminates in the inferior temporal cortex. This is of course an oversimplification, and it does not include the koniocellular pathway (see Hendry and Reid, 2000).
Neurons in the subcortical M pathway are tuned to low spatial and high temporal frequencies (Shapley and Perry, 1986). Neurons in the subcortical P pathway are tuned to high spatial and low temporal frequencies and are selective for wavelength. At the cortical level, neurons in the dorsal stream are highly selective for the direction of motion and neurons in the ventral stream are more selective for color and form (Lennie et al., 1990). We suggested that both of these pathways and their interconnections are involved in the processing of MD form.
Figure 8.2: Parallel pathways for processing motion and form in the primate visual system (adapted from Merigan and Maunsell, 1993). Abbreviations in this figure: LGN, lateral geniculate nucleus; MT, middle temporal area; VIP, ventral intraparietal area; MST, medial superior temporal area; LIP, lateral intraparietal area; PPC, posterior parietal cortex; ITC, inferior temporal cortex.
8.2.3 Normal Development
The ability to perceive form defined by motion contrast appears as early as 3 to 4 months of age (Kaufmann-Hayoz et al., 1986; Wattam-Bell, 1996). Performance improves to adult levels by 7 years of age when minimum speed thresholds are measured (Giaschi and Regan, 1997), and by 10 to 15 years of age when coherence thresholds are measured (Gunn et al., 2002; Schrauf et al., 1999). MD form processing involves spatial integration of local motion signals and figure-ground segregation. To break down the developmental process we chose a global motion task (figure 8.1C) that involved spatial integration of local motion signals but not figure-ground segregation. The task was to discriminate the direction of motion (up or down). At 100% coherence all of the dots moved in the same direction. The coherence level was reduced by having an increasing proportion of the dots move in random directions until direction discrimination reached chance performance. We also chose a texture-defined (TD) form task (figure 8.1D) that involved spatial integration and figure-ground segregation but no motion. The task was to identify the shape. The orientation difference between the line segments inside the shape and the line segments outside the shape was reduced from 90° until identification reached chance performance. We compared the maturation of speed thresholds for MD form identification to
the maturation of coherence thresholds for global motion direction discrimination and the maturation of orientation-difference thresholds for TD form identification in 190 children aged 3 to 12 years (Parrish et al., 2004). Instead of letters, we used MD and TD shapes that preschool children could identify (circle, fish, duck, heart, gingerbread man). We found that speed thresholds for identifying MD shapes were high in preschool children and reached adult levels by 6 years of age (figure 8.3A). This represents a slightly earlier maturation than our previous finding using MD letters instead of shapes (Giaschi and Regan, 1997). We found that coherence thresholds for discriminating the direction of global motion were not significantly different across all age groups tested (figure 8.3B). This aspect of motion perception appeared to be mature in the youngest children we could test, which is quite different from the motion-defined shape result. We found that orientation-difference thresholds for identifying TD shapes improved rapidly between the ages of 3 and 5, then matured slowly until 9 years of age (figure 8.3C). This is similar to the results of Sireteanu and Rieth (1992). Coherence thresholds for detecting TD shapes have been found to mature slightly earlier, at 6 to 7 years of age (Gunn et al., 2002). We interpret these results to show that the mechanisms involved in the integration of local motion signals mature before those involved in figure-ground segregation. This is consistent with our claim that MD form processing requires mechanisms additional to those involved in simple motion processing. Figure-ground segregation based on motion contrast appears to mature before figure-ground segregation based on texture contrast. This latter conclusion may be specific to the stimuli used. Gunn et al. (2002) reported the opposite developmental pattern for MD and TD form defined by coherence level.
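The global-motion stimulus described above — a coherence fraction of dots sharing one direction, the rest moving randomly — can be sketched as follows. This is an illustrative sketch, not the stimulus code used in the studies; the function name and parameters are our assumptions:

```python
import random

def dot_directions(n_dots, coherence, signal_dir=90.0, seed=None):
    """
    Assign a motion direction (degrees) to each dot of a global-motion
    display: a 'coherence' fraction of the dots share the signal
    direction (here 90 deg = upward), and the remainder move in random
    directions drawn uniformly from 0-360 deg.
    """
    rng = random.Random(seed)
    n_signal = round(n_dots * coherence)
    dirs = [signal_dir] * n_signal
    dirs += [rng.uniform(0.0, 360.0) for _ in range(n_dots - n_signal)]
    rng.shuffle(dirs)  # signal and noise dots are spatially interleaved
    return dirs

dirs = dot_directions(100, 0.5, seed=0)  # 50% coherence: 50 dots move upward
```

Lowering `coherence` toward zero is what drives direction discrimination to chance, defining the coherence threshold.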
8.2.4 Abnormal Development
Amblyopia Amblyopia is a developmental visual disorder characterized by poor visual acuity in one eye. The fellow eye is usually considered to be normal. Amblyopia may develop in a healthy eye during childhood if it is deprived of normal visual stimulation due to ocular misalignment (strabismus), unequal refractive errors (anisometropia), or both. Amblyopia is generally understood to affect the cortical mechanisms underlying form perception. Observed deficits include losses in contrast sensitivity and vernier acuity, difficulties with orientation discrimination and spatial localization (reviewed in Levi, 1991), and image distortion (Barrett et al., 2003). We discovered a robust deficit in the ability of amblyopic children to identify MD letters (Giaschi et al., 1992). Many of the children tested could not identify the letters even at the fastest speed, at which they were easily recognized by non-amblyopic children of the same age. Surprisingly, the deficit was just as robust in the fellow eye. This MD form deficit is consistent with reports of static contour integration deficits in amblyopia (Hess et al., 1997; Kovacs et al., 2000), but it may also represent a deficit in direction-selective motion processing. Recently, deficits have been reported in the amblyopic eye for several aspects of motion perception, including the motion aftereffect (Hess et al., 1997), movement hyperacuity (Kelly and Buckingham, 1998), the fine-grain motion illusion (Reed and Burdett, 2002), attentive motion tracking (Paul et al.,
Deborah Giaschi
107
Figure 8.3: Normal development. A: Mean speed thresholds for MD shape identification as a function of age in years. B: Mean coherence thresholds for global motion direction discrimination as a function of age. C: Mean orientation difference thresholds for TD shape identification as a function of age. Error bars are standard errors.
108
The Processing of Motion-Defined Form
Figure 8.4: Children with amblyopia. A: Mean speed thresholds for MD shape identification in each of the four groups. B: Mean coherence thresholds for global motion direction discrimination in each of the four groups. Error bars are standard errors. C = control group, S = strabismic amblyopia group, A = anisometropic amblyopia group, A+S = anisostrabismic amblyopia group.
2001), and second-order global motion (Simmers et al., 2003). Deficits in the fellow eye were not emphasized in these studies. To study the role of simple motion processing in MD form perception, we measured speed thresholds for MD shape identification and coherence thresholds for global motion direction discrimination in the fellow eye of 28 children with unilateral amblyopia and 28 age-matched controls (Ho et al., 2004). The children ranged in age from 3.7 to 11.2 years (mean = 6.7). The amblyopic group was divided into three subgroups: strabismic (n = 6), anisometropic (n = 15), or anisostrabismic (n = 7). Speed thresholds for identifying MD shapes were significantly elevated, relative to the control group, in all three amblyopic groups (figure 8.4A). This replicates our earlier findings with a different group of amblyopic children using motion-defined letters instead of shapes (Giaschi et al., 1992). Coherence thresholds for direction discrimination of global motion were similar in the control, strabismic, and anisometropic groups. Coherence thresholds were significantly elevated in the anisostrabismic group (figure 8.4B), suggesting a generalized motion deficit in these children only. We interpret these results as a selective sparing of the integration of local motion signals in the presence of a disruption in MD form perception in many children with amblyopia. This is consistent with our claim that the processing of MD form involves additional mechanisms to those involved in the processing of simple motion stimuli.

Dyslexia

Dyslexia is a developmental learning disability characterized by difficulty acquiring reading skills. Recently, subtle deficits in several different aspects of motion processing have been associated with dyslexia (Cornelissen et al., 1995; Demb et al., 1998; Everatt et al., 1999).
We examined the ability of several different measures of flicker and motion perception to discriminate children with dyslexia from their peers with average reading ability (Edwards et al., 2004). MD form discrimination and global motion direction discrimination were two of the tasks included in this study. Because some children with dyslexia have a generalized naming deficit (Wolf et al., 2000), we added a different MD form task. The MD form was an arrowhead that pointed to the left or to the right (figure 8.1B). The task was to indicate the direction the arrowhead pointed. The speed of the moving dots was reduced until performance reached chance levels. We found that coherence thresholds were significantly higher in the dyslexic group (n = 21) than in the control group (n = 24) (figure 8.5A). Speed thresholds were similar for the two groups of children on both the MD direction discrimination (figure 8.5B) and the MD shape identification task (figure 8.5C). A deficit in global motion perception in dyslexia has been reported by several other groups (Everatt et al., 1999; Raymond and Sorensen, 1998; Slaghuis and Ryan, 1999; Talcott et al., 2000), but no one else has compared performance in the same observers across several motion tasks. Our MD form result disagrees with an earlier study that found deficits in children with dyslexia when the stimuli were MD letters (Felmingham and Jakobson, 1995). This is possibly related to the finding that many children with dyslexia have difficulty naming letters. Felmingham and Jakobson (1995) dismissed a deficit in letter recognition per se as the cause of poor MD form perception because the children with dyslexia in their study were able to correctly identify LD letters. While
this demonstrated an ability to recognize the letters that were used, it is not clear that the children would have performed equally well if the LD letters were presented for only 1 s, the duration of the MD letters. It is possible, for example, that children with dyslexia require longer exposure periods to recognize letters or to retrieve letter names (Wolf et al., 2000), which could account for their elevated MD letter thresholds in the previous study. Our MD direction discrimination task was intended to limit language, naming, and memory demands. Therefore, MD form perception may be normal in children with dyslexia when extraneous task demands are minimized (Edwards et al., 2004). We interpret these results as a selective sparing of MD form perception in the presence of a disruption in local motion integration. This suggests that, in some cases, different pathways are involved in the processing of MD form and global motion stimuli. Deficits in global motion perception in dyslexia have been attributed to a deficit in the M/dorsal visual pathways (e.g., Demb et al., 1998). The role of these pathways in MD form processing is discussed in the following section.
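The speed thresholds discussed throughout this section come from adaptive procedures that adjust dot speed trial by trial until performance settles near a criterion accuracy. A sketch of one standard option, a 2-down/1-up staircase that converges near the 71%-correct point; the step factor, reversal count, and stopping rule are illustrative assumptions, not the authors' exact procedure:

```python
import math

def staircase_threshold(respond, start, step_factor=1.26, reversals_needed=8):
    """Estimate a speed threshold with a 2-down/1-up staircase.

    `respond(level)` returns True for a correct response at a given dot
    speed. Two consecutive correct responses lower the speed (harder);
    a single error raises it (easier). Returns the geometric mean of
    the last six reversal levels.
    """
    level, correct_run, last_dir = start, 0, 0
    reversals = []
    while len(reversals) < reversals_needed:
        if respond(level):
            correct_run += 1
            if correct_run < 2:
                continue                     # wait for a second correct trial
            correct_run, direction = 0, -1   # two correct: decrease speed
        else:
            correct_run, direction = 0, +1   # error: increase speed
        if last_dir != 0 and direction != last_dir:
            reversals.append(level)          # a direction flip is a reversal
        last_dir = direction
        level = level * step_factor if direction > 0 else level / step_factor
    tail = reversals[-6:]
    return math.exp(sum(math.log(r) for r in tail) / len(tail))

# Example: a deterministic observer who is correct whenever dot speed is
# at least 1.0 deg/s; the staircase settles near that value.
threshold = staircase_threshold(lambda v: v >= 1.0, start=4.0)
```

Reducing speed until performance reaches chance, as in the arrowhead task above, is the same idea with a different convergence criterion.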
8.3 Role of the M/Dorsal Pathways in Motion-Defined Form Processing

8.3.1 Parkinson's Disease
Our claim that the M pathway is not essential for processing MD form is based on a case study of a patient with Parkinson's disease (Giaschi et al., 1997). Parkinson's disease is a movement disorder with documented visual involvement. Contrast thresholds tend to be elevated for low spatial frequency gratings, particularly when these gratings are temporally modulated (Bodis-Wollner et al., 1987; Regan and Maxner, 1987). The spatial- and temporal-frequency specificity of this deficit suggests subcortical M pathway dysfunction. Our patient was a 47-year-old male with prominent motoric fluctuations that responded to treatment with dopamine. Before and after administration of dopamine, we measured speed thresholds for identifying MD letters and contrast thresholds for detecting a 3.5 cycles/deg sine-wave grating that was static or counterphase-modulated at 8 Hz. Before administration of dopamine, our patient had a large contrast sensitivity deficit for the temporally modulated grating, a slight speed deficit for MD letters, and normal contrast sensitivity for the static grating relative to 20 age-matched controls. Administration of dopamine eliminated the contrast sensitivity deficit but increased the speed deficit for MD letters. We suggested that the contrast sensitivity deficit was caused by the absence of dopamine and was due to a dysfunction of the M pathway. Administration of dopamine has also been shown to improve contrast sensitivity in control subjects (Domenici, Trimarchi, Piccolino, Fiorentini, and Maffei, 1985). Since MD form perception did not improve with administration of dopamine, we concluded that the subcortical M pathway is not essential for this ability.
Figure 8.5: Children with dyslexia. A: Mean coherence thresholds for global motion direction discrimination in the control and dyslexic groups. B: Mean speed thresholds for MD shape direction discrimination. C: Mean speed thresholds for MD shape identification. Error bars are standard errors.
Figure 8.6: Reduced visual acuity. A: Speed thresholds for MD shape identification for two observers as a function of decimal visual acuity. B: Coherence thresholds for global motion direction discrimination for two observers as a function of decimal visual acuity. C: Mean orientation difference thresholds for TD shape identification for two observers as a function of decimal visual acuity. Error bars are standard errors.
8.3.2 Reduced Visual Acuity
Visual acuity is reduced by lesions to the subcortical P pathway but is not affected by lesions to the M pathway (Merigan and Eskin, 1986; Merigan et al., 1991). As an additional way of assessing the pathways involved in MD form perception, we examined the effect of reduced visual acuity on speed thresholds for MD shape identification, coherence thresholds for global motion direction discrimination and orientation-difference thresholds for TD shape identification (Hoag et al., 2004). Blurring lenses were used to reduce the acuity of two emmetropic adults with an uncorrected visual acuity of 20/12.5 or a decimal acuity of 1.6. We found that speed thresholds for MD shape identification were not affected by blur until decimal visual acuity was reduced to 0.6 (figure 8.6A). Further reductions in visual acuity made the shapes impossible to identify even at the fastest speed. This replicates our earlier result using MD letters instead of shapes (Giaschi et al., 1992). We found that coherence thresholds for direction discrimination of global motion were not affected by reductions in visual acuity (figure 8.6B). This is a quite different pattern from the one we obtained for MD form. We found that orientation difference thresholds for TD shape identification were not affected until acuity was reduced to 0.8 (figure 8.6C). Further reductions in visual acuity made the shapes impossible to identify even with a 90° orientation difference. This is similar to the effect of reduced visual acuity on MD form perception. We interpret these results to show that the P pathway is involved in the processing of MD and TD form but not in the processing of global motion. This is consistent with our claim that the processing of MD form is not restricted to the M/dorsal visual pathways.
8.3.3 Functional Neuroimaging

The role of the cortical dorsal pathway in MD form perception in humans has been studied using functional neuroimaging techniques. Transcranial magnetic stimulation (TMS) delivered bilaterally over the occipital cortex or the temporo-parieto-occipital junction (TPO) was found to degrade the discrimination of MD form (Hotson and Anand, 1999). The TPO effect occurred 20 to 40 ms after the occipital effect. These results suggest that V1 and V5/MT are involved in a hierarchical fashion in the processing of MD form. TMS delivered over TPO also degraded the discrimination of direction in simple motion stimuli. The involvement of V1 and V5/MT in MD form and simple motion processing is supported by fMRI data. Passive viewing of MD shapes activated V5/MT and posterior occipital areas relative to a baseline condition of stationary random dots (Wang et al., 1999). A similar pattern and amplitude of activation were observed for passive viewing of moving random dots relative to stationary random dots. This study therefore reveals the cortical areas responding preferentially to motion, but not the areas responding preferentially to MD form. To observe the areas responding preferentially to MD form, the baseline condition should be moving random dots. We demonstrated this in an fMRI block design study with five normal adult volunteers (Chen et al., 2003). On the V5/MT localizer task,
blocks containing random dots moving in and out from the center of the display alternated with blocks of stationary random dots. On the MD form identification task, blocks containing MD rectangles (condition 1) alternated with blocks in which the moving dots from the rectangle were spread equally across the display of dots moving in the opposite direction (condition 2). The observer pressed a button whenever the motion-defined rectangle was vertical in condition 1 and whenever the fixation point changed from a dot to a cross in condition 2. The MD form identification task activated frontal regions, fusiform gyrus, and posterior occipital regions including the lateral middle occipital gyrus, but not the area identified as V5/MT on the localizer task (figure 8.7). This lack of V5/MT activation by MD form is consistent with other studies that used moving dots as the baseline condition (Reppas et al., 1997; Van Oostende et al., 1997). V5/MT is activated preferentially, however, when moving MD contours (similar to the camouflaged bird demonstration on the enclosed CD) are contrasted against moving dots (Wang et al., 1999). This suggests that a different network of cortical areas may be involved in processing the two types of MD form.

Figure 8.7: Left and right hemisphere fMRI activation (average of five adult volunteers). A: V5/MT localizer task. B: MD form identification task.

We interpret these results to show that areas in addition to V5/MT of the dorsal
pathway are involved in processing MD form relative to simple motion stimuli. The figure-ground segregation involved in MD form processing may be important because TD form and images of natural objects also activate the lateral middle occipital gyrus (Braddick et al., 2000; Grill-Spector et al., 1998).
8.4 Conclusions

Psychophysical evidence from normal visual development and from the deficits observed in patients with multiple sclerosis, cortical lesions, amblyopia, or dyslexia shows that the processing of MD form can be dissociated from the processing of simple motion stimuli, even though both involve spatial integration of local motion signals. MD form perception is a low-level form task with respect to attentional requirements because search speed for MD shapes is not affected by the number of distractors in a visual search task (Cavanagh et al., 1990). MD form should not be used to indicate the intactness of low-level motion mechanisms, however, because MD form perception requires form and motion mechanisms and can be normal in patients with abnormal global motion perception.
References

Barrett, B., Pacey, I., Bradley, A., Thibos, L. and Morrill, P. (2003). Nonveridical visual perception in human amblyopia. Invest. Ophthalmol. Vis. Sci., 44: 1555-1567.

Battelli, L., Cavanagh, P., Intriligator, J., Tramo, M., Henaff, M.-A., Michel, F. and Barton, J. (2001). Unilateral right parietal damage leads to bilateral deficit for high-level motion. Neuron, 32: 985-995.

Bodis-Wollner, I., Marx, M., Mitra, S., Bobak, P., Mylin, L. and Yahr, M. (1987). Visual dysfunction in Parkinson's disease: loss in spatiotemporal contrast sensitivity. Brain, 110: 1675-1698.

Braddick, O., O'Brien, J., Wattam-Bell, J., Atkinson, J. and Turner, R. (2000). Form and motion coherence activate independent, but not dorsal/ventral segregated, networks in the human brain. Curr. Biol., 10: 731-734.

Cavanagh, P., Arguin, M. and Treisman, A. (1990). Effect of surface medium on visual search for orientation and size features. J. Exp. Psychol. Hum. Percept. Perform., 16: 479-491.

Chen, C.-C., Giaschi, D., Bjornson, B. and Au Young, S. (2003). Activation for detection and identification of motion-defined form in human brain. Soc. Neurosci. Abst., 29: 591.18.

Cornelissen, P., Richardson, A., Mason, A., Fowler, S. and Stein, J. (1995). Contrast sensitivity and coherent motion detection measured at photopic luminance levels in dyslexics and controls. Vis. Res., 35: 1483-1494.

Demb, J., Boynton, G., Best, M. and Heeger, D. (1998). Psychophysical evidence for a magnocellular pathway deficit in dyslexia. Vis. Res., 38: 1555-1559.
DeYoe, E. and Van Essen, D. (1988). Concurrent processing streams in monkey striate cortex. Trends Neurosci., 11: 219-226.

Domenici, L., Trimarchi, C., Piccolino, M., Fiorentini, A. and Maffei, L. (1985). Dopaminergic drugs improve human contrast sensitivity. Hum. Neurobiol., 4: 195-197.

Donahue, S., Wall, M. and Stanek, K. (1998). Motion perimetry in anisometropic amblyopia: Elevated size thresholds extend into the midperiphery. J. AAPOS, 2: 94-101.

Downie, A., Jakobson, L., Frisk, V. and Ushycky, I. (2003). Periventricular brain injury, visual motion processing, and reading and spelling abilities in children who were extremely low birthweight. J. Int. Neuropsychol. Soc., 9: 440-449.

Edwards, V., Giaschi, D., Dougherty, R., Edgell, D., Bjornson, B., Lyons, C. and Douglas, R. (2004). Psychophysical indices of temporal processing abnormalities in children with dyslexia. Develop. Neuropsychol., 25: 321-354.

Everatt, J., Bradshaw, M. and Hibbard, P. (1999). Visual processing and dyslexia. Perception, 28: 243-254.

Felmingham, K. and Jakobson, L. (1995). Visual and visuomotor performance in dyslexic children. Exp. Brain Res., 106: 467-474.

Giaschi, D., Lang, A. and Regan, D. (1997). Reversible dissociation of sensitivity to dynamic stimuli in Parkinson's disease: Is magnocellular function essential to reading motion-defined letters? Vis. Res., 37: 3531-3534.

Giaschi, D. and Regan, D. (1997). The development of motion-defined figure-ground segregation in preschool and older children, using a letter-identification task. Optom. Vis. Sci., 74: 761-767.

Giaschi, D., Regan, D., Kothe, A., Hong, X. H. and Sharpe, J. (1992). Motion-defined letter detection and recognition in patients with multiple sclerosis. Ann. Neurol., 31: 621-628.

Giaschi, D., Regan, D., Kraft, S. and Hong, X. H. (1992). Defective processing of motion-defined form in the fellow eye of unilateral amblyopes. Invest. Ophthal. Vis. Sci., 33: 2483-2489.

Giaschi, D., Trope, G., Kothe, A. and Hong, X. H. (1996). Loss of sensitivity to motion-defined form in patients with primary open-angle glaucoma and ocular hypertension. J. Opt. Soc. Am. A, 13: 707-716.

Grill-Spector, K., Kushnir, T., Hendler, T., Edelman, S., Itzchak, Y. and Malach, R. (1998). A sequence of object-processing stages revealed by fMRI in the human occipital lobe. Hum. Brain Mapping, 6: 316-328.

Gunn, A., Cory, E., Atkinson, J., Braddick, O., Wattam-Bell, J., Guzzetta, A. and Cioni, G. (2002). Dorsal and ventral stream sensitivity in normal development and hemiplegia. Neuroreport, 13: 843-847.

Hendry, S. and Reid, R. (2000). The koniocellular pathway in primate vision. Ann. Rev. Neurosci., 23: 127-153.
Hess, R., Demanins, R. and Bex, P. (1997). A reduced motion aftereffect in strabismic amblyopia. Vis. Res., 37: 1303-1311.

Hess, R., McIlhagga, W. and Field, D. (1997). Contour integration in strabismic amblyopia: the sufficiency of an explanation based on positional uncertainty. Vis. Res., 37: 3145-3161.

Ho, C., Giaschi, D., Boden, C., Dougherty, R., Cline, R. and Lyons, C. (2004). Deficient motion perception in the fellow eye of children with aniso-strabismic amblyopia. Submitted for publication.

Hoag, R., Edwards, V., Boden, C. and Giaschi, D. (2004). The effects of optical blur on motion and feature perception. Submitted for publication.

Hotson, J. and Anand, S. (1999). The selectivity and timing of motion processing in human temporo-parieto-occipital and occipital cortex: a transcranial magnetic stimulation study. Neuropsychologia, 37: 169-179.

Kaufmann-Hayoz, R., Kaufmann, F. and Stucki, M. (1986). Kinetic contours in infants' visual perception. Child Development, 57: 292-299.

Kelly, S. and Buckingham, T. (1998). Movement hyperacuity in childhood amblyopia. Br. J. Ophthalmol., 82: 991-995.

Kovacs, I., Polat, U., Pennefather, P., Chandna, A. and Norcia, A. (2000). A new test of contour integration deficits in patients with a history of disrupted binocular experience during visual development. Vis. Res., 40: 1775-1783.

Lennie, P., Trevarthen, C., Van Essen, D. and Wassle, H. (1990). Parallel processing of visual information. In L. Spillmann and C. Werner (Eds.), Visual Perception: The Neurophysiological Foundations, pp. 103-128. Academic Press: San Diego.

Leventhal, A., Rodieck, R. and Dreher, B. (1981). Retinal ganglion cell classes in the Old World monkey: Morphology and central projections. Science, 213: 1139-1142.

Levi, D. (1991). Spatial vision in amblyopia. In D. Regan (Ed.), Vision and Visual Dysfunction: Spatial Vision, Vol. 10, pp. 212-238. London: Macmillan Press.

Merigan, W. and Eskin, T. (1986). Spatio-temporal vision of macaques with severe loss of Pβ retinal ganglion cells. Vis. Res., 26: 1751-1761.

Merigan, W., Katz, L. and Maunsell, J. (1991). The effects of parvocellular lateral geniculate lesions on the acuity and contrast sensitivity of macaque monkeys. J. Neurosci., 11: 994-1001.

Merigan, W. and Maunsell, J. (1993). How parallel are the primate visual pathways? Ann. Rev. Neurosci., 16: 369-402.

Parrish, E., Giaschi, D., Boden, C. and Dougherty, R. (2004). The maturation of performance on tasks assessing texture and motion perception. Submitted for publication.

Paul, P., Giaschi, D., Cavanagh, P. and Cline, R. (2001). Attention deficits in children with anisometropic amblyopia. J. Vis., 1: 80a.
Raymond, J. and Sorensen, R. (1998). Visual motion perception in children with dyslexia: Normal detection but abnormal integration. Vis. Cognit., 5: 389-404.

Reed, M. and Burdett, F. (2002). Apparent motion processing in strabismic observers with varying levels of stereo vision. Behav. Brain Res., 133: 383-390.

Regan, D. (2000). Human Perception of Objects. Sinauer Press: Sunderland, MA.

Regan, D., Giaschi, D., Sharpe, J. and Hong, X. H. (1992). Visual processing of motion-defined form: Selective failure in patients with parieto-temporal lesions. J. Neurosci., 12: 2198-2210.

Regan, D. and Hong, X. H. (1990). Visual acuity for optotypes made visible by relative motion. Optom. Vis. Sci., 67: 49-55.

Regan, D. and Maxner, C. (1987). Orientation-selective visual loss in patients with Parkinson's disease. Brain, 110: 415-432.

Regan, D., Kothe, A. and Sharpe, J. (1991). Recognition of motion-defined shapes in patients with multiple sclerosis and optic neuritis. Brain, 114: 1129-1155.

Reppas, J., Niyogi, S., Dale, A., Sereno, M. and Tootell, R. (1997). Representation of motion boundaries in retinotopic human visual cortical areas. Nature, 388: 175-179.

Rizzo, M., Nawrot, M. and Zihl, J. (1995). Motion and shape perception in cerebral akinetopsia. Brain, 118: 1105-1127.

Schrauf, M., Wist, E. and Ehrenstein, W. (1999). Development of dynamic vision based on motion contrast. Exp. Brain Res., 124: 469-473.

Shapley, R. and Perry, V. (1986). Cat and monkey retinal ganglion cells and their functional roles. Trends Neurosci., 9: 229-235.

Simmers, A., Ledgeway, T., Hess, R. and McGraw, P. (2003). Deficits to global motion processing in human amblyopia. Vis. Res., 43: 729-738.

Sireteanu, R. and Rieth, C. (1992). Texture segregation in infants and children. Behav. Brain Res., 49: 133-139.

Slaghuis, W. and Ryan, J. (1999). Spatio-temporal contrast sensitivity, coherent motion, and visible persistence in developmental dyslexia. Vis. Res., 39: 651-668.

Talcott, J., Hansen, P., Assoku, E. and Stein, J. (2000). Visual motion sensitivity in dyslexia: Evidence for temporal and energy integration deficits. Neuropsychologia, 38: 935-943.

Van Oostende, S., Sunaert, S., Van Hecke, P., Marchal, G. and Orban, G. (1997). The kinetic occipital (KO) region in man: an fMRI study. Cerebral Cortex, 7: 690-701.

Wang, J., Zhou, T., Qiu, M., Du, A., Cai, K., Wang, Z., Zhou, C., Meng, M., Zhuo, Y., Fan, S. and Chen, L. (1999). Relationship between ventral stream for object vision and dorsal stream for spatial vision: an fMRI + ERP study. Human Brain Mapping, 8: 170-181.
Wattam-Bell, J. (1996). Development of visual motion processing. In F. Vital-Durand, J. Atkinson, and O. J. Braddick (Eds.), Infant Vision, pp. 79-94. Oxford University Press: New York.

Wolf, M., Bowers, P. and Biddle, K. (2000). Naming-speed processes, timing, and reading: A conceptual review. J. Learning Disabilities, 33: 387-407.
9. Vision in Flying, Driving, and Sport

Rob Gray

One of the more difficult challenges for vision scientists is to apply basic research gathered in highly controlled and restricted psychophysical experiments to the complex environments we face in the real world. In this chapter I describe examples of how basic research on spatiotemporal processing in the human visual system can be applied to complex actions such as those involved in driving a car, flying an aircraft, and playing sports.
9.1 Introduction

Every day, all around us, people perform seemingly impossible feats of visual-motor coordination. Professional baseball batters routinely generate contact between a cylindrical 7 cm (2.75") diameter bat and a spherical 8 cm (3") diameter ball even though the ball travels the distance from the mound to the plate in less than half a second. Military pilots traveling at 60 m/s (140 mph) must land their plane on a 300 m (1000 ft) long aircraft carrier bouncing around in choppy seas. Highway drivers scanning the road for other vehicles, road signs, and pedestrians must react within a split second to a lead car slamming on its brakes to avoid a collision. How do we perform such incredible acts? Over the past several decades researchers have sought to understand how visual information can be used to perform motor actions such as those involved in flying, driving, and sport, with the practical goal of improving performance and skill acquisition in these domains.¹ As it turns out, understanding behavior in such complex environments has proven to be one of the more challenging problems in vision science. Only field studies can reveal which visual information humans actually use to perform these actions; however, it is difficult if not impossible to control and isolate optical variables when studying real-world performance. Conversely, laboratory psychophysics, while allowing full control over the visual stimuli, can only identify information sources that could potentially be used. Only by using a step-by-step rational approach to studies on visually guided motor action can we hope to fully understand performance in these domains (Regan and Gray, 2003). The three steps in this approach are as follows:

1. Identify the retinal image variables that can be used to make judgments about quantities such as depth, motion, and orientation that underlie these behaviors.

2. Carry out psychophysical experiments to find whether humans are sensitive to these variables and can process them independently of other retinal image variables.

3. Carry out studies of simulated or real actions to investigate if and how the designated retinal image variable is used to perform the action in question.

In this chapter I examine some examples of research on the control of visual-motor action in flying, driving, and sport, emphasizing the limitations of past experiments in terms of this step-by-step approach and suggesting possible future directions. Pilots, drivers, and athletes differ in many obvious ways, including the amount of training, the selection process, and the consequences of performance error. However, as the reader will soon see, there are also many similarities between these diverse groups. In particular, the motor actions performed in the air, on the highway, and on the sports field appear to rely on a common set of visual information sources.

¹I encourage the reader to explore some of the seminal early research on flying (Langewiesche, 1944), driving (Gibson and Crooks, 1938), and sports (Hubbard and Seng, 1954).
9.1.1 Some Basic Limitations of the Visual Processing System
Before beginning an analysis of these complex actions it is important to understand some of the basic things our visual system can and cannot do. The first point is that the only information we have about the outside world comes from our senses. On the surface this may seem obvious, but it is a point we often overlook. Although we can speak about how many miles per hour a baseball travels and how many feet separate two cars on the roadway, we do not have direct visual access to these quantities. All the judgments we make are based on the rather crude, flat images formed at the back of our eyes. Furthermore, even when information about distance or speed is presented indirectly via a speedometer or altimeter gauge, our motor actions still appear to be largely based on sensory information alone. The second point is that our visual system is much better at comparing things (a relative discrimination) than at making judgments about a single thing (an absolute judgment). This is in large part due to the paucity of visual information we have available for estimating quantities like absolute speed and absolute distance, as will be discussed below.
9.2 Vision in Flying

Of all the skills pilots perform, there are two that stand out as particularly demanding and dangerous: landing and low-level flight. In commercial aviation, crashes during the final approach and landing account for more than 30% of the total number of accidents even though this phase of flight accounts for only 4% of the total flight time (Wiener,
1988). Military aviation involves "nap of the earth" flight where the pilot is required to fly as close to the ground as possible in among buildings, trees, hillocks, and other terrain features. The high level of danger involved in low-level flight is illustrated by the fact that 55 helicopter fatalities during the Vietnam War were caused by collisions with one particular power line (Marsh, 1985); the only one in all of South Vietnam! Low-level flying relies most heavily on visual processing.²
9.2.1 Visual-Motor Control in Approach and Landing
The approach and landing phase of flight can be broken down into three component subtasks: (i) aligning the aircraft with the runway, (ii) reducing speed and altitude appropriately to contact the end of the runway, and (iii) arresting the descent with a "landing flare." Beall and Loomis (1997) analyzed the visual information available to the pilot for performing the first of these subtasks, commonly called the "base-to-final turn." Initially traveling in a direction that is oblique or perpendicular to the orientation of the runway (the base), the pilot must initiate a turn at the appropriate time and with the necessary turn rate such that at the completion of the turn the direction of travel becomes aligned with the runway (the final). This task becomes particularly difficult for a short final approach. When the base-to-final turn is initiated a short distance from the runway, pilots often overshoot the turn and are forced into making a sudden corrective maneuver. This can lead to an aerodynamic stall resulting in a crash. The base-to-final turn is made even more difficult by the sluggish controls in a typical fixed-wing aircraft: turn rates rarely exceed 10 deg/s and a 90° turn takes upwards of 30 s to complete. What visual information can the pilot use to decide when to initiate the turn and to regulate the turn rate? Calvert (1954) first proposed the idea that the optical splay angle (α) could be used to control this maneuver. As illustrated in figure 9.1, the splay angle is the angle formed by the centerline of the runway and the vertical at the convergence point on the horizon. If the horizon is visible, the splay angle is also equal to π/2 minus the angle formed by the runway centerline and the horizon. A pilot can use the splay angle to turn into alignment with the runway by turning so that the rate of change of the splay angle (dα/dt) is held constant (Beall and Loomis, 1997).
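The splay-angle geometry has a compact closed form under a pinhole-projection, level-ground idealization (an illustration for this chapter, not part of Beall and Loomis's analysis): a ground line parallel to the direction of travel, at lateral offset x from an eye at altitude h, projects to an image line through the vanishing point with tan α = x/h, so the splay rate follows directly from the aircraft's lateral and vertical velocities:

```python
import math

def splay_angle(x, h):
    """Splay angle (rad) of a ground line parallel to the direction of
    travel: the angle its image makes with the vertical through the
    vanishing point, with tan(alpha) = lateral offset / altitude."""
    return math.atan2(x, h)

def splay_rate(x, h, x_dot, h_dot):
    """Time derivative of the splay angle:
    d(alpha)/dt = (x_dot*h - x*h_dot) / (x**2 + h**2)."""
    return (x_dot * h - x * h_dot) / (x * x + h * h)

# Example: 50 m to the side of the runway centerline at 100 m altitude,
# drifting toward it at 5 m/s while descending at 2 m/s.
alpha = splay_angle(50.0, 100.0)            # ~0.464 rad (~26.6 deg)
rate = splay_rate(50.0, 100.0, -5.0, -2.0)  # -0.032 rad/s
```

Note that neither quantity requires absolute distance to the runway or forward speed, which is the point developed in the text.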
Presumably, the initiation of the turn begins when the splay rate exceeds some threshold above this critical value. The main evidence supporting this splay angle hypothesis comes from the work of Beall and Loomis (1997), in which optical variables during real landings were analyzed. Flight data showed that over several landings pilots appeared to hold the splay rate roughly constant during the base-to-final turn. It would be interesting for future research to examine this task in more detail in a flight simulator, for example, by introducing online perturbations in the splay angle by perturbing the position of the simulated runway. In support of this field research, psychophysical experiments have provided evidence for a neural mechanism that would be sensitive to splay angle, providing the horizon is visible (Gray and Regan, 1996; Regan, Gray, and Hamstra, 1996).

²As altitude increases, the number of objects in the pilot's visual field that can be used to judge depth, orientation, and motion decreases, so the pilot must rely on other sensory information such as vestibular and tactile cues and/or the cockpit instruments.

In this research it was found that observers could estimate the absolute angle formed by the
124
Visj'oji in Flying, Driving, and Sport
Figure 9.1: Visual information that can be use to control the "base-to-final" turn maneuver during the final approach. The splay angle (a), the angle formed by the centerline of the runway and the vertical at the convergence point on the horizon, can be used to turn into alignment with the runway by holding the splay rate (da/dt) constant (Beall and Loomis, 1997). intersection of two lines and discriminate changes in this angle independently of the orientation each of the two lines. However, it should be noted that these experiments used angles that were presented in the frontoparallel plane whereas the splay angle lies on the ground plane. One important implication of this splay-rate-control hypothesis is that it is not necessary for the pilot to generate a full 3D reconstruction of the world to control the turn. Splay angle can be estimated without any knowledge about the distance to the runway or the aircraft's velocity. Consistent with this analysis, Beall and Loomis (1997) reported that base-to-final turn maneuvers were similar for day and night landings. In the later case ground texture cannot be used to estimate distance and speed. As the reader will soon see, this is a common theme for many of the actions discussed in this chapter. From an applied research perspective this has some profound implications, namely, that the vast majority of research on judgments of distance, speed of self-motion, and spatial layout may have limited relevance to understanding the control of visual-motor action.3 However, the fact that these variables are not necessary for control does not mean that we do not use them when they are available. This issue will be considered in more detail below. 
Once aligned with the runway, the pilot next needs to reduce speed and altitude appropriately so that the plane will be in position to contact the ground with a reasonable amount of force near the start of the runway (the actual contact is controlled in the landing-flare stage discussed next). What visual information could the pilot use to judge whether the current descent rate is sufficient? Field observations and some very clever simulation research conducted by Boeing engineer Conrad Kraft suggest that the rate of descent is primarily controlled on the basis of perceived altitude (Kraft, 1978). When the Boeing 727 was first introduced into commercial aviation in the late 1960s it was involved in a large number of landing accidents. Kraft's accident analyses revealed that many of these crashes involved landing short of the runway during a night approach over water or other featureless terrain (commonly called a "dark hole" approach). Kraft hypothesized that these crashes resulted from a misperception of altitude due to insufficient visual information. To test this prediction he asked pilots to fly nighttime approaches and give verbal estimates of altitude in a flight simulator. The results, shown in figure 9.2, indicated that pilots overestimated their altitude and consequently flew too low (the verbal judgments were also consistent with this result). If you are like me when I first read this study, you are probably now thinking: but wouldn't the altimeter gauge allow the pilot to judge the altitude accurately? The answer is yes;4 however, pilots rarely consult the altimeter during landing. This is not due to negligence on the part of the pilot; workload is very high during this phase of flight, involving communicating with ground control, monitoring power settings, and so on.

3 This is not to say that research in these areas is not important for understanding other aspects of human perception.

Rob Gray

Figure 9.2: Simulation of "dark hole" landings. Pilots overestimate their altitude and consequently fly too low when executing a night landing over featureless terrain. Reprinted from Kraft, C. L. (1978) A psychophysical contribution to air safety: Simulator studies of visual illusions in night visual approaches. In Pick, H., Leibowitz, H. W., and Singer, J. R. (Eds.), Psychology from Research to Practice, with permission from Kluwer Academic/Plenum Publishers, New York.
When faced with these multiple demands, pilots typically choose to monitor the movement of the plane using their own senses; after all, visual perception is a highly developed, effortless process that serves us well 99.9% of the time. This observation highlights the importance of understanding the visual information available for the control of flight even though the same information is provided indirectly by the aircraft's instruments. We will see below that drivers are also guilty of this: when was the last time you checked your speedometer to determine whether you were entering a curve at a safe speed?

4 Kraft's accident analysis concluded that there was no reason to suspect mechanical failure in these crashes. In fact, the grim reality is that human error is the suspected cause of the vast majority (> 75%) of aviation accidents.

Figure 9.3: Visual correlates of the rate of change of altitude that could be used to control the final descent. A: The depression angle (δ) is the angle formed at the eye by the horizon and a line of trees oriented perpendicular to the path of travel. Zg is the absolute distance between the pilot's eye and the trees. B: The altitude splay angle (α) is the angle formed by the motion path and a line of trees oriented parallel to the path of travel. Xg is the lateral separation between the line of trees and the axis perpendicular to the pilot's eye. C: θ is the angular size of a tree's retinal image and 2W is the tree's width. D: γ is the angular size of the gap between adjacent trees and 2S is the distance between the trees. See text for details.

Why do pilots overestimate altitude and descend too quickly during "dark hole" landings? There are multiple sources of visual information that can be used to estimate the rate of change of altitude, as illustrated in figure 9.3. One source of information comes from the angle formed at the eye by the horizon and an object or edge that is oriented perpendicular to the path of travel, as depicted in figure 9.3A. This angle, called the depression angle (δ), can be used to estimate the rate of change of altitude (dY/dt). For small values of δ this relation is expressed in equation 9.1,
Rob Gray
127
where Zg is the distance between the pilot's eye and the object/edge on the ground (Flach et al., 1997). The second source of information is based on the angle formed by the motion path and an object or edge that is oriented parallel to the path of travel (called the altitude splay angle5 (α) and depicted in figure 9.3B). For small values of α this relation is expressed in equation 9.2,
where Xg is the lateral separation between the object on the ground and the axis perpendicular to the pilot's eye (Flach et al., 1997). Objects that could provide altitude splay and depression angle information include railways, roads, a grove of trees, a river, or the runway itself. The third source of visual information that could be used is the angular size of a familiar object on the ground, such as a building, a tree, or the runway itself. For small values of θ this relation is expressed in equation 9.3,
where 2W is the physical width of the object on the ground and θ is the angular size of the object's retinal image. This information source is illustrated in figure 9.3C. Finally, if the physical layout of objects on the ground is known (e.g., when flying over a city), the texture density (figure 9.3D) can also be used (equation 9.4).
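The four display equations (9.1-9.4) referenced in this passage were figures in the original and did not survive extraction. What follows is my small-angle reconstruction from the geometry of figure 9.3 and the eyeheights/sec scaling described below; the published forms in Flach et al. (1997) may be written with different sign conventions:

```latex
% Depression angle (fig. 9.3A): \delta \approx Y/Z_g, so
\frac{1}{Y}\frac{dY}{dt} \;\approx\; \frac{1}{\delta}\frac{d\delta}{dt}
  \;+\; \frac{1}{Z_g}\frac{dZ_g}{dt} \qquad (9.1)

% Altitude splay angle (fig. 9.3B): \alpha \approx X_g/Y, so
\frac{1}{Y}\frac{dY}{dt} \;\approx\; \frac{1}{X_g}\frac{dX_g}{dt}
  \;-\; \frac{1}{\alpha}\frac{d\alpha}{dt} \qquad (9.2)

% Familiar size (fig. 9.3C): \theta \approx 2W/D with viewing distance
% D proportional to eye height Y at a fixed depression angle, so
\frac{1}{Y}\frac{dY}{dt} \;\approx\; -\frac{1}{\theta}\frac{d\theta}{dt} \qquad (9.3)

% Texture density (fig. 9.3D): \gamma \approx 2S/D with D \propto Y, so
\frac{1}{Y}\frac{dY}{dt} \;\approx\; -\frac{1}{\gamma}\frac{d\gamma}{dt} \qquad (9.4)
```

Each left-hand side is the descent rate in eyeheights/sec. Note that equation 9.1 contains a cross-talk term in the forward speed dZg/dt and equation 9.2 a cross-talk term in the lateral speed dXg/dt, which is the independent-processing problem discussed later in the text.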
where 2S is the physical separation between adjacent texture elements (e.g., the spacing between rows of trees) and γ is the angular size of the gap between adjacent texture elements. Equations 9.1-9.4 are all scaled in units of eyeheights/sec; i.e., they indicate the number of times the altitude will change by the current height of the eye above the ground in one second. Before we consider these information sources in more detail, it is now clear why the pilots in Kraft's study could not estimate the rate of change of altitude accurately: all of the visual correlates of altitude rely on the presence of visible terrain features during the approach. During a "dark hole" landing only the runway lights would be clearly visible to the pilot; therefore, only the information expressed in equation 9.3 would be available. Furthermore, for the initial part of the descent (i.e., at a distance of 32 km (20 miles) from the runway) the angle formed by the lights on a 40 m (131 ft) wide runway would be a mere 0.042 radians and its rate of change would be 0.108 rad/sec for a 100 m/sec (220 mph) approach speed. Since it has been shown that observers cannot reliably estimate time to contact (Gray and Regan, 1998) or approach speed (Hoffmann and Mortimer, 1996) for such low rates of expansion, it is unlikely that the rate of change of the angle formed by the runway could be used to reliably estimate the rate of change of altitude. However, this has not been empirically tested.

5 I have added the term "altitude" since, unfortunately, previous research has used the term splay angle to refer to two different optical variables.

A closer look at these information sources provides some important insights into perceptual-motor control during flying. A major problem associated with using depression and altitude splay angle cues is that they do not satisfy the independent processing
criterion described in step 2 of the research approach discussed above. Namely, the estimate of the rate of descent based on depression angle will be altered by changes in the speed of forward motion (dZg/dt), and the estimate of the rate of descent based on altitude splay angle will be altered by changes in the speed of lateral motion (dXg/dt). The main problem associated with using familiar size and texture density is that they require the physical layout of the environment to remain constant. For example, if pilots use the angular size of the runway to estimate altitude and they assume the width of the runway (i.e., 2W in equation 9.3) is 60 m (196 ft), altitude will be dangerously overestimated when landing at an unfamiliar runway that is only 40 m (131 ft) wide. This particular estimation error has been identified as a cause of several crashes involving novice pilots (Galanis, Jennings, and Beckett, 2001; Mertens and Lewis, 1982). Experimental research on the perception of altitude has primarily used one of two experimental tasks: (i) altitude maintenance and (ii) judgments of the direction of altitude change following an occlusion period. Flach and colleagues (Flach, Hagen, and Larish, 1992; Flach et al., 1997) studied the relative contributions of depression and altitude splay angles in a simulated altitude maintenance task. Participants were required to track a constant altitude in the presence of simulated fore-aft, up-down, and right-left wind disturbances. As predicted by the cross-talk between dZg/dt and dY/dt in equation 9.1, root mean square (RMS) error was significantly higher when flying at high speed over a terrain with only depression angle cues (lines perpendicular to the direction of motion) than when flying over a terrain with only altitude splay angle cues (lines parallel to the direction of motion).
Somewhat surprisingly, it was also found that RMS errors were higher for a grid terrain (which contains both cues) than for the parallel-line terrain. Flach and colleagues proposed that this is because the addition of perpendicular lines in the grid terrain introduces noise into the perception of altitude. Kleiss and Hubbard (1993) used an occlusion technique to investigate the importance of ground terrain and the presence of 3D objects on altitude judgments. After flying at a constant altitude for 20 s, the visual display was blanked for 3 s (mimicking what would occur if a pilot flew through a bank of clouds). When the display reappeared, participants judged whether the perceived altitude had increased, stayed the same, or decreased. Randomly positioned trees were used as 3D objects, and terrain texture patterns were random noise. This randomization substantially limits the effectiveness of depression and altitude splay angle information. Judgment accuracy improved as tree density increased. Texture density was also positively related to response accuracy; however, performance at high terrain texture densities was not as good as performance at high tree densities. These findings suggest that the information sources expressed in equations 9.3 and 9.4 may be particularly important for the control of altitude. Unfortunately, to my knowledge there has not been any research that has systematically manipulated all of the cues to altitude within the same set of experiments. Due to differences in the experimental tasks, display parameters, and subject populations used in the studies described above, it is difficult to compare the relative contributions of the information sources in equations 9.1-9.4. In addition, the experimental tasks used in previous research have only limited relevance to controlling the descent during landing. The tasks used do not address the problem of knowing whether the descent rate is appropriate to land safely and cannot be used to measure judgments of absolute altitude. Clearly, future research is needed in which observers are required to make simulated landings over terrains for which the optical variables expressed in equations 9.1-9.4 are systematically and independently varied.

The final stage in landing involves a transition from this controlled descent to contact with the runway in the landing flare. The typical vertical speed during the final stages of the approach (roughly 3-5 m/s (10-16 ft/s)) is much too fast for a comfortable and safe landing. The purpose of the flare maneuver is to reduce the vertical speed to an acceptable level just before touchdown. The flare is initiated at an altitude of roughly 3-6 m (10-20 ft) by pulling back on the control stick, causing an increase in the angle between the direction of motion and the orientation of the nose of the plane (the "angle of attack"). Precise timing of this maneuver is critical, since a flare initiated too late will not reduce vertical speed sufficiently before contact and a flare initiated too early can cause the plane to level off or even eventually climb. There are two primary control strategies a pilot could use in this situation (Mulder et al., 2000). First, the flare could be initiated at a constant critical altitude. In practice this could be achieved by initiating the flare when the retinal image of the runway reaches a critical angular size. However, this strategy would not be robust to variations in vertical speed (e.g., due to a downdraft) and could be dangerous in situations where a pilot is landing at an unfamiliar runway that is wider or narrower than expected (see above). A more effective strategy is to initiate the landing flare at a constant value of the time to contact (TTC) with the runway.
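The runway-width error discussed above is easy to quantify with the familiar-size cue. A hedged numerical sketch (illustrative numbers and function names of my own, treating viewing distance as the quantity inferred from angular size):

```python
def inferred_distance(angular_size_rad, assumed_width_m):
    # Small-angle familiar-size cue: theta ~ width / distance,
    # so the inferred viewing distance is assumed_width / theta.
    return assumed_width_m / angular_size_rad

true_width, true_distance = 40.0, 100.0     # narrow, unfamiliar runway
theta = true_width / true_distance          # angular size actually seen (rad)
perceived = inferred_distance(theta, 60.0)  # pilot assumes a 60 m runway
print(round(perceived, 6))  # 150.0 -> altitude overestimated by 50%
```

Overestimating altitude by 50% leads the pilot to fly dangerously low, exactly the error pattern identified in the crash analyses cited above.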
The visual information that supports judgments of TTC has been studied extensively (reviewed in Regan and Gray, 2000, 2001). The TTC information that could be used for timing the landing flare is illustrated in figure 9.4. For simplicity, first consider the situation of an aircraft on a straight approach to a rectangular object that is oriented perpendicular to the direction of travel (e.g., flying towards a wall). In this scenario, illustrated in figure 9.4A, the angular subtense of the object's horizontal meridian (θh) can be used to estimate TTC as:
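Equation 9.5 was a display equation in the original and did not survive extraction. Given the attribution to Hoyle (1957) and the surrounding text, it is the familiar first-order relation (my reconstruction of the typesetting):

```latex
\mathrm{TTC} \;\approx\; \frac{\theta_h}{\,d\theta_h/dt\,} \qquad (9.5)
```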
provided that the approach velocity is constant and θh is small (Hoyle, 1957). Note that in this special case the values θh and dθh/dt will be equal along the entire vertical extent of the object because the object expands isotropically (i.e., it keeps a constant shape). As will be discussed below, it has been demonstrated that observers can estimate absolute TTC with a high degree of accuracy in this situation (Gray and Regan, 1998). Next consider the optical geometry associated with landing a plane, i.e., a straight-line approach to a rectangular object but with an angle of approach that is considerably less than 90°. In this situation, illustrated in figure 9.4B, equation 9.5 can still be used to estimate TTC; however, an accurate estimate requires that θh and dθh/dt be derived from points on the runway's vertical edges that are adjacent to the point of contact (shown as θh2 in figure 9.4B). Because the runway does not expand isotropically in this situation, TTC estimates based on portions of the runway that are further away than the point of contact (e.g., θh1 in figure 9.4B) will be overestimates of the actual TTC, and TTC estimates based on portions of the runway that are closer than the point of contact (e.g., θh3 in figure 9.4B) will be underestimates of the actual TTC. For example, for an approach speed of 60 m/s and an approach angle of 4°, a TTC estimate based on portions of the runway that are 10 m (32 ft) further away than the point of contact will be approximately 290 ms longer than the actual TTC. Over this time period the plane will travel 17 m (55 ft)! More psychophysical research is needed to determine whether observers can estimate TTC accurately during non-perpendicular approaches.

Figure 9.4: Using retinal image expansion to estimate the time to contact with the runway. θh is the angular subtense of the runway's horizontal meridian. A: Expansion of the runway's retinal image for a perpendicular approach. The retinal image expands isotropically. B: Expansion of the runway's retinal image for an angled approach. Because the runway does not expand isotropically in this situation, TTC estimates based on portions of the runway that are further away than the point of contact (the solid circle) will be overestimates of the actual TTC and TTC estimates based on portions of the runway that are closer than the point of contact will be underestimates of the actual TTC.

Mulder et al. (2000) examined the timing of flare maneuvers in a flight simulator. In this study the movement of the simulated plane was not directly controlled by the observer, who instead was only required to press a button to initiate a pre-programmed flare maneuver. The main independent variables were the width of the runway (40 or 60 m) and the presence/absence of texture lines on the surface of the ground and runway. It was found that the number of successful landings was significantly higher when ground texture was present. Under these conditions the data were consistent with the strategy of initiating the flare at a constant value of TTC. Conversely, in the absence of texture, participants appeared to base the timing of the flare on the angular subtense of the runway.
Mulder and colleagues argue that the presence of texture improves performance because it gives more edges near the point of contact that can be used to accurately estimate TTC. However, as evidenced by equation 9.4, the addition of texture would also serve to improve judgments of the rate of change of altitude.
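The advantage of a constant-TTC trigger over a constant-angular-size trigger can be sketched numerically: to first order, the ratio θ/(dθ/dt) is independent of the runway's physical width, whereas a critical angular size is not. This is my own illustration with made-up numbers, not a simulation from Mulder et al.:

```python
import math

def angular_size(width_m, distance_m):
    return 2.0 * math.atan(width_m / (2.0 * distance_m))

def ttc_estimate(width_m, distance_m, speed_ms, dt=1e-3):
    # First-order TTC from theta / (d theta / dt), with the rate of
    # expansion estimated by a forward finite difference.
    theta = angular_size(width_m, distance_m)
    theta_dot = (angular_size(width_m, distance_m - speed_ms * dt) - theta) / dt
    return theta / theta_dot

# 300 m from the threshold, closing at 60 m/s: true TTC is 5 s.
for width in (40.0, 60.0):
    print(width, round(ttc_estimate(width, 300.0, 60.0), 2))
# The TTC estimate stays ~5 s for either runway width, while the
# angular size itself differs by 50% between the two runways.
```

This is why a flare triggered on TTC is robust at an unfamiliar runway, while a flare triggered on a critical image size inherits the width-assumption error described earlier.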
9.2.2 Visual-Motor Control in Low-Level Flight
During low-level flight the pilot's most urgent task is to avoid colliding with objects on the ground. This task involves several different components, including knowing the direction in which one is heading relative to the object and knowing the instant in time at which contact will occur. However, the first thing the pilot must do is something we often take for granted; namely, the pilot must detect that there is an object there in the first place! If an object's retinal image does not differ from the retinal image of its surroundings, it is invisible to the pilot and cannot be acted upon. A good example of this is power lines. Along with the accidents in Vietnam described above, it has been estimated that in the U.S. between 1970 and 1979 wire strikes accounted for 208 civilian accidents (Wiener, 1988). The majority of these accidents occurred in clear visibility; the pilot was simply not aware there was an object there to hit! Furthermore, this failure of detection is not restricted to very small objects, as a large majority of aviation accidents involve the pilot flying a perfectly functioning plane directly into a ground feature: so-called "controlled flight into terrain" (CFIT). What visual properties render an object visible to a pilot? A difference in luminance contrast between an object and its surroundings can be used for detection; however, this cue will be greatly affected by the veiling glare produced by the sun (Regan, Giaschi, and Fresco, 1993; Regan, 1995). A difference in texture between an object and its surroundings is another visibility cue that is critical for low-level flight (Regan, 1995). For example, a sloping hill covered with bushes can be distinguished from a grassy valley even though the mean luminance and color of the two areas may be roughly the same.
Finally, when mean luminance, color, and texture are similar for different terrain features (e.g., when a grass-covered hill is surrounded by grassy terrain) the pilot can use motion parallax to detect the presence of the feature. The retinal image of an object that is further away than the pilot's point of fixation will move in the same direction as the pilot is moving, while the retinal image of an object that is closer than the fixation point will move in the opposite direction. As it turns out, there are large individual differences in sensitivity to motion and texture cues and in susceptibility to glare (Regan, 1995), and tests of these abilities developed by Regan and colleagues may be an effective screening tool for evaluating novice pilots (Regan, 1995). Once the pilot has detected the presence of an object such as a building or a hill, two judgments are needed for successful collision avoidance: (i) will I collide with the object if I continue to travel in the same direction? and (ii) how much time do I have before the collision will occur? Consider the case of flying towards a tree. The lateral distance at which the tree will cross the frontoparallel plane that contains the eyes (the "crossing distance") can be estimated using both monocular and binocular sources of visual information. As shown in figure 9.5A, if the approach velocity is constant the crossing distance (Xc) of the tree is given by:
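Equation 9.6 did not survive extraction. From the variable definitions that follow and the cited analyses (Bootsma, 1991; Regan and Kaushal, 1994), the intended relation is the ratio of lateral image speed to expansion rate, scaled by object width; this reconstruction is mine:

```latex
X_c \;\approx\; R\,\frac{d\alpha/dt}{\,d\theta/dt\,} \qquad (9.6)
```

A quick check of the geometry: with lateral offset X and distance Z, the small-angle approximations α ≈ X/Z and θ ≈ R/Z give (dα/dt)/(dθ/dt) = X/R, so the ratio recovers the crossing distance in units of the object's width.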
where dα/dt is the angular lateral speed of the tree's retinal image, dθ/dt is the rate of change of the tree's angular subtense, and R is the tree's width (Bootsma, 1991; Regan and Kaushal, 1994). Equation 9.6 is available to either eye alone.

Figure 9.5: Visual correlates of crossing distance (Xc). A: A monocular correlate of Xc. The pilot's eye (open circle) is moving at a constant velocity in a straight line (shown by the heavy arrow) past a tree with width R. θ is the tree's angular subtense and α is the angular lateral position of the tree's retinal image (i.e., measured along the axis shown with the dotted arrow). B: A binocular correlate of Xc. The pilot's eyes (open circles) are moving at a constant velocity in a straight line (shown by the heavy arrow) past a tree. δ is the retinal disparity of the tree relative to a fixed reference point (F) and I is the interpupillary separation. α is the angular lateral position of the tree's retinal image (i.e., measured along the axis shown with the dotted arrow). See text for details.

Psychophysical experiments have shown that humans are sensitive to the information expressed in equation 9.6, as thresholds for the discrimination of the relative direction of an approaching object based on this optical variable range from 0.03 to 0.12° (Regan and Kaushal, 1994). To my knowledge, the ability of observers to estimate the absolute direction of motion in depth using the information expressed in equation 9.6 has not been investigated. As illustrated in figure 9.5B, the binocular correlate of crossing distance relies on changing retinal disparity as:
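Equation 9.7 was also lost in extraction. By analogy with equation 9.6, and from the where-clause that follows, the binocular form replaces the expansion rate with the rate of change of disparity and the object's width with the interpupillary separation I (my reconstruction):

```latex
X_c \;\approx\; I\,\frac{d\alpha/dt}{\,d\delta/dt\,} \qquad (9.7)
```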
where dδ/dt is the rate of change of retinal disparity relative to a fixed reference point (F) and I is the interpupillary separation (Regan and Kaushal, 1994; Regan and Gray, 2000). Although it has been shown that humans are sensitive to the information expressed in equation 9.7 and can use it to make precise discriminations (0.2°) of variations in the trajectory of an approaching object (Portfors-Yeomans and Regan, 1996, 1997), it has not yet been demonstrated that equation 9.7 can be used to make absolute estimates of Xc based on this information source alone. Furthermore, it remains to be tested whether estimates of Xc are more accurate when both equations 9.6 and 9.7 are available (as is the case in the real world) than estimates based on either information source alone. One might expect an advantage when both information sources are available, given that judgments of absolute TTC are more accurate when binocular and monocular information is combined (Gray and Regan, 1998). The number of seconds remaining before the tree crosses the frontoparallel plane containing the eyes (the "time to passage" (TTP)) is also specified by both monocular and binocular sources of visual information. As illustrated in figure 9.6A, for a tree approaching a point P some distance from the pilot's eyes:
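Equations 9.8 and 9.9 were display equations in the original and did not survive extraction. Equation 9.9 is the classic tau relation; for equation 9.8 the form below is my own reconstruction from the constant-velocity geometry of figure 9.6A (the published notation in Bootsma and Oudejans, 1993, may be written differently):

```latex
% 9.8: time to passage from the optical gap \gamma between the tree's
% current location and its passage point P (the gap constricts over time):
\mathrm{TTP} \;=\; \frac{\sin 2\gamma}{2\,\lvert d\gamma/dt\rvert}
  \;\approx\; \frac{\gamma}{\lvert d\gamma/dt\rvert}
  \quad \text{for small } \gamma \qquad (9.8)

% 9.9: direct approach ("tau"; Lee, 1976), with \theta the angular subtense
% of the approaching object:
\mathrm{TTC} \;\approx\; \frac{\theta}{\,d\theta/dt\,} \qquad (9.9)
```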
where γ is the optical angle at the eye subtended by the current location of the tree and the point at which the tree will cross the frontoparallel plane containing the eyes (P), and dγ/dt is the rate of constriction of this angle (Bootsma and Oudejans, 1993). In the special case that the tree is directly approaching the pilot's eye, γ = 0 and equation 9.8 reduces to the correlate of the time to collision (TTC) commonly called "tau" (Lee, 1976). For direct approaches it has been demonstrated that humans can accurately estimate TTC on the basis of equation 9.9 alone, with estimation errors ranging from 2% to 12% of the actual TTC (Gray and Regan, 1998). The problem of judging absolute TTP for an object passing to the side has not been studied in detail. However, Bootsma and Oudejans (1993) have shown that observers can reliably discriminate the relative TTP of two approaching objects on the basis of equation 9.8 alone. Binocular information about TTP is illustrated in figure 9.6B. This information source relies on relative disparity information (equation 9.10; Regan, 2002).
Estimation of TTP based on binocular information alone has not been previously investigated except in the special case of an object directly approaching the midpoint between the eyes, where estimation errors range from 2% to 10% of the actual TTC (Gray and Regan, 1998). At this point a reader familiar with the topic of visual perception may be asking: but wouldn't binocular information be ineffective in flying because the objects are too far away? While it is true that the relative retinal disparity (i.e., δ in figures 9.5B and 9.6B) for a given depth separation between two objects is inversely proportional to the square of the viewing distance, this limitation only applies to judgments of static depth. It is for this reason that I did not mention stereopsis as a cue to object visibility in the discussion above.6 When judging the direction of motion in depth and TTP (or TTC), the relevant retinal image variable is the rate of change of disparity (dδ/dt). Since the magnitude of dδ/dt is proportional to the approach velocity, the high speeds involved in flying ensure that the value of dδ/dt is well above perceptual threshold in most situations.

6 At a distance of 50 m (165 ft), two objects must be separated by more than 3 m (10 ft) for the depth separation to be detectable (Ogle, 1962).
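The back-of-envelope arithmetic behind this point can be sketched as follows (illustrative values of my own; the 65 mm interpupillary distance is a nominal assumption):

```python
import math

ARCMIN_PER_RAD = 60.0 * 180.0 / math.pi

def disparity_rate(ipd_m, distance_m, closing_speed_ms):
    # delta ~ I / D  =>  d(delta)/dt ~ I * v / D**2  (rad/s)
    return ipd_m * closing_speed_ms / distance_m ** 2

# Static disparity of a 3 m depth interval at 50 m is tiny (roughly
# 0.3 arcmin), but at a 100 m/s closing speed the disparity of an
# object 50 m away changes rapidly:
rate = disparity_rate(0.065, 50.0, 100.0)
print(round(rate * ARCMIN_PER_RAD, 1))  # 8.9 (arcmin/s)
```

A disparity change of several arcmin per second is comfortably suprathreshold, which is why changing-disparity information remains usable at flying distances even though static stereoacuity does not.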
Figure 9.6: Visual correlates of time to passage (TTP). A: A monocular correlate of TTP. The pilot's eye (open circle) is moving at a constant velocity in a straight line (shown by the heavy arrow) past a tree. γ is the optical angle at the eye subtended by the current location of the tree and the point at which the tree will cross the frontoparallel plane containing the eyes (P), and θ is the tree's angular subtense. B: Binocular correlates of TTP. The pilot's eyes (open circles) are moving at a constant velocity in a straight line (shown by the heavy arrow) past a tree. δ is the retinal disparity of the tree relative to a fixed reference point (F), I is the interpupillary separation, and γ is the optical angle at the eye subtended by the current location of the tree and point P. See text for details.

Unfortunately, collision avoidance in low-level flight is an area with an abundance of research addressing steps 1 and 2 in the approach laid out at the start of this chapter but with a paucity of work addressing step 3. To my knowledge there has been no previous research that has systematically manipulated the cues to direction and TTP expressed in equations 9.6-9.10 in a flight simulator and measured collision avoidance performance. This is an important omission: most flight simulators provide only monocular visual cues, so it would seem important to test their validity by assessing the relative importance of binocular information. It is evident from the analyses above that precise detection of the rate at which the angular size of an object is increasing (dθ/dt) is important for making judgments about the direction of motion and time to contact. Therefore, it might be expected that more highly skilled pilots would have greater sensitivity to retinal image expansion than novice pilots.
This prediction was tested directly in a unique merging of laboratory and field research conducted by Kruk, Regan, and colleagues (Kruk and Regan, 1983; Kruk, Regan, Beverley, and Longridge, 1981). In these studies, laboratory measurements of discrimination thresholds for the rate of change of size were found to correlate significantly with flying performance in low-level flight and formation flight. A final point on low-level flight concerns the use of texture in flight simulator displays.

Figure 9.7: Errors in estimating absolute time to contact (TTC) for a simulated approaching textured object plotted as a function of the initial diameter of the texture elements covering the object. In the "accurate simulation" condition the object size and texture element size increased as the simulated object approached the observer. In the "dot size constant" condition the texture dots covering the object remained constant in size as the object approached. Data are means for three observers. Reprinted with permission from Gray, R. and Regan, D. (1999) Motion in depth: adequate and inadequate simulation. Perception and Psychophysics, 61: 236-245. Copyright Psychonomic Society Publications, 1999.

As discussed above, the presence of texture on object and ground surfaces is critical for the pilot's ability to visually segregate objects from their surroundings. Therefore, it is not surprising that considerable effort has been put into adding realistic texture to simulator displays. However, because the addition of texture is "computationally expensive" and can dramatically reduce the display frame rate, shortcuts are often taken that result in visual information that is not consistent with what occurs in the real world. An extreme example of this is texture mapping, where an object such as a building or tree is "painted" with a texture pattern that does not change as a function of viewing distance.7 This creates a potential problem, since the rate of expansion of the texture elements on the surface of the object provides information about TTC that complements the information provided by the change in the overall angular size of the object (Beverley and Regan, 1983). Indeed, Martin Regan and I have shown that when the texture elements do not expand, TTC is dangerously overestimated (Gray and Regan, 1999). This effect, shown in figure 9.7, depends on the grain of the texture on the object.
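The danger of non-expanding texture can be seen in the tau arithmetic: painted-on elements of fixed angular size contribute a zero expansion rate, which first-order signals an infinitely distant contact. A toy illustration (my own numbers and naming, not the Gray and Regan stimulus):

```python
def tau(angle_rad, angle_rate_rad_s):
    # First-order time-to-contact estimate from angular expansion.
    return angle_rad / angle_rate_rad_s

# Object 2 m wide, 60 m away, closing at 20 m/s: true TTC = 3 s.
# Small-angle: theta = W/D and d(theta)/dt = W*v/D**2.
W, D, v = 2.0, 60.0, 20.0
print(round(tau(W / D, W * v / D ** 2), 6))  # 3.0 s from the overall outline
# A texture element "painted" at fixed angular size has d(theta)/dt = 0:
# its tau is undefined (division by zero), and any estimate pooled over
# outline and texture cues is biased toward TTC overestimation.
```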
The overestimation is larger for objects with large texture elements (e.g., the bricks on the side of a building) than for objects with small texture elements (e.g., the needles on the surface of a pine tree). In the extreme case (texture elements less than roughly 5 min arc) the lack of expansion does not affect judgments of TTC, presumably because the rate of expansion is below threshold. Currently there are several more complex problems associated with the use of texture displays. For example, the computer graphics technique known as mipmapping leads to the undesirable side effects that the luminance contrast of the display is inversely related to texture density (Chaudhry and Geri, 2003) and that blurring increases as altitude decreases.

7 Even today the expansion of texture elements on objects is rarely simulated. When it is done, it is usually in one or two discrete steps instead of a continuous change.

Vision in Flying, Driving, and Sport

Figure 9.8: A: Visual information in overtaking and passing. The driver needs to compare the time required to overtake (TRO) to the time required for the oncoming car to reach the critical distance (TTCD) needed to complete the overtaking maneuver. B: Number of overtaking maneuvers initiated for different ranges of the value of TTCD-TRO. Open bars show data for the condition where observers adapted to closing speed by driving on a straight empty road prior to overtaking. Filled bars show data for the no-adapt baseline condition. TTCD-TRO values less than zero were defined as unsafe.
9.3 Vision in Driving

On the roads of the U.S.A., 41,821 individuals were killed and 3.2 million injured during the year 2000, and 2000 was a typical year (NHTSA, 2000). In contrast to pilots, drivers are not highly selected, nor do they receive anywhere near the same amount of training. Therefore, it may not be surprising that accident reports implicate errors in perception and decision making as the probable cause of the vast majority of driving accidents. One of the more dangerous perceptual judgments a driver must make is whether there is sufficient time to complete a driving maneuver before colliding with an oncoming car, for example, in overtaking and passing. Accident analyses indicate that overtaking a more slowly moving vehicle is one of the more dangerous situations a driver faces (Jeffcoat, Skelton, and Smeed, 1973; Clarke, Ward, and Jones, 1998, 1999). In this section I focus on the visual information that can be used to perform this complex maneuver.

Figure 9.9: Example data record for one swing in the simulated baseball batting task used by Gray (2002a). Open circles plot the bat height as a function of time since release and solid circles plot ball height. Temporal swing errors were calculated from the difference between the instant in time when the ball crossed the front of the plate and the instant in time when the minimum bat height occurred (point A). Spatial swing errors were calculated from the difference between the height of the ball when it crossed the front of the plate and the minimum bat height (point B). Reprinted with permission from Gray, R. (2002) Behavior of college baseball players in a virtual batting task, J. Exp. Psych.: Hum. Percept. Perf., 28: 1143. Copyright 2002 American Psychological Association.

One reason for the high level of driver error involved in overtaking is the complexity of the visual judgments involved. Drivers must simultaneously estimate the TTC with an oncoming car, monitor the TTC with the lead vehicle so as to avoid a rear-end collision, and estimate the time required to complete the overtake based on the current speed, road conditions, and knowledge of the capabilities of their own vehicle. The sources of information used by drivers to make these estimates and control their vehicle during the overtaking maneuver are largely unknown; however, there are several visual cues the driver could potentially use. In this section I will consider visual correlates of speed, distance, and time to collision during driving. Accurate information about the TTC with the lead vehicle and oncoming cars is available to the driver based on the instantaneous angular subtense of the approaching vehicle (equation 9.9 above). In driving, TTC information appears to be particularly important for the initiation and control of braking (van Winsum and Heino, 1996; Yilmaz and Warren, 1995).
In particular, as first suggested by Lee (1976), drivers appear to regulate the first temporal derivative of equation 9.9 (commonly called "tau-dot") around a critical value of -0.5 when decelerating (Yilmaz and Warren, 1995). This control strategy ensures that the vehicle comes to a stop exactly at the object the driver is approaching (e.g., a stop line at an intersection). In other driving situations it has been found that the time headway (TH) appears to be a more important control variable than TTC (Lee, 1976; van Winsum, 1998; van Winsum and Heino, 1996). The distinction between TTC and TH can be best understood if we consider a car-following situation. If the follower maintains a constant distance behind the lead vehicle, the TTC (i.e., the time until the front bumper of the follower's car contacts the rear bumper of the lead car) is infinite and there is no change in the angular size of the lead vehicle. On the other hand, the TH, defined as the time until the front bumper of the follower's car reaches the location on the roadway currently occupied by the rear bumper of the lead vehicle (Lee, 1976), is finite and will depend on the follower's speed. Van Winsum and Heino (1996) have reported that, when following another vehicle, drivers regulate their speed to maintain a fixed value of TH that varies from driver to driver depending on skill level, age, and personality. Little is known about the optical specification of TH. Although in theory this variable could be computed by using the rate of expansion of texture elements on the road surface or the rate of separation between edges of the road (Gray and Regan, 2000b), this has not been investigated empirically. It has been demonstrated that the perception of the relative speed of self-motion is influenced by the visual information provided by the global optic flow rate and/or the edge rate (i.e., the number of edges that pass the observer's eye in a given time period) (Larish and Flach, 1990); however, these visual cues do not provide accurate information about absolute speed.8 Field studies have consistently demonstrated that drivers cannot accurately estimate their speed of travel: "errors in subjectively estimating speed are sufficiently great that drivers should consult speedometers" (Evans, 1991, p. 128).
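The TTC/TH distinction in car following can be made concrete with a short numerical sketch. This is a simplified kinematic illustration with assumed gaps and speeds; the function names are mine, not from the literature:

```python
def time_to_collision(gap_m, follower_speed, lead_speed):
    """TTC: time until the follower's front bumper reaches the lead
    car's rear bumper. Infinite when there is no closing speed."""
    closing = follower_speed - lead_speed
    return gap_m / closing if closing > 0 else float("inf")

def time_headway(gap_m, follower_speed):
    """TH: time until the follower reaches the roadway position
    currently occupied by the lead car's rear bumper (Lee, 1976)."""
    return gap_m / follower_speed

# Constant-distance following at highway speed: 30 m gap, both cars at 30 m/s.
print(time_to_collision(30.0, 30.0, 30.0))  # inf (no closing speed)
print(time_headway(30.0, 30.0))             # 1.0 s, depends only on own speed
```

Note that TH stays finite and controllable even when TTC is infinite, which is one way to see why TH makes the more useful control variable for steady following.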
Studies using verbal estimates of speed (reviewed in Groeger, 2000) and studies using a procedure that requires drivers to adjust their speed to a specified level (e.g., halve their current speed) (Denton, 1976, 1977) have shown that speed estimates are highly inaccurate (errors range from 10-60 km/h (6-37 mph)) and are easily biased by factors such as the driving speed on the previous trial. There are two primary sources of information that a driver could use to estimate the absolute distance of another vehicle on the roadway, although both are very limited. The vergence angle of the eyes provides accurate distance information for an object that is fixated; however, because this source of information is only effective for objects nearer than about 10 m, it would not be useful for most driving situations. It has also been suggested that drivers could use angular size as a cue to absolute depth for familiar-sized objects such as cars and pedestrians (Stewart, Cudworth, and Lishman, 1993). However, use of this cue could lead to dangerous estimation errors if the driver incorrectly identifies the object that is being approached (e.g., mistaking a child pedestrian for an adult or mistaking the type of car). This is directly analogous to the problem, discussed above, of using the angular subtense of the runway to control landing. Consistent with this theoretical analysis, empirical studies have demonstrated that drivers are quite inaccurate when estimating absolute distance. Observers consistently underestimate absolute distance, with estimated distance being a power function of the absolute distance with an exponent of roughly 0.8 (Groeger, 2000; Teghtsoonian and Teghtsoonian, 1969).
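An exponent-0.8 power law implies that the underestimation grows with viewing distance. A small sketch makes this concrete; the assumption that perception is veridical at 10 m is mine, purely for illustration:

```python
def perceived_distance(d_m, exponent=0.8, calibration_m=10.0):
    """Power-law compression of perceived distance (exponent ~0.8 per
    Teghtsoonian and Teghtsoonian, 1969). Assumes perception is
    veridical at the calibration distance, which is a simplification."""
    return calibration_m * (d_m / calibration_m) ** exponent

for d in (10.0, 50.0, 100.0, 200.0):
    print(d, round(perceived_distance(d), 1))
# 10.0 -> 10.0, 50.0 -> 36.2, 100.0 -> 63.1, 200.0 -> 109.9:
# the compressive shortfall grows steadily with distance.
```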
The research reviewed above has primarily examined performance in conditions where visual information is accurate and reliable. Another possible source of judgment error is that in some situations the information provided by the human visual system is inaccurate. For example, a driver's estimates of speed, TTC, and distance can be distorted by fog (Snowden, Stimpson, and Ruddle, 1998). For objects with a small angular size (e.g., a motorcycle viewed from a distance of 300 m), observers cannot accurately estimate TTC from equation 9.9 because the object's rate of expansion is near the detection threshold. Hoffmann and Mortimer (1996) have estimated the threshold value of dθ/dt for driving to be roughly 0.003 rad/sec and have shown that dθ/dt can be well below this value in many driving situations. This is similar to the problem, discussed above, of using the expansion of the angular size of the runway at the beginning of the final approach to landing a plane. Recently, Martin Regan and I have shown that staring straight ahead during simulated driving on a straight open road can give the driver the illusion that the TTC (and TH) with other vehicles is longer than it really is (Gray and Regan, 2000b). Following simulated highway driving on a straight empty road for 5 min, drivers initiated overtaking of a lead vehicle substantially later (220-510 ms) than comparable maneuvers made after viewing a static scene. This closing speed aftereffect is quite distinct from the well-known adaptation of the perceived speed of self-motion that is caused by the expanding retinal flow pattern (Denton, 1976) (i.e., drivers underestimate their driving speed following adaptation) and is distinct from the classical motion aftereffect (Addams, 1834). Unlike these other phenomena, the closing speed aftereffect is produced by local adaptation of looming detectors that signal motion-in-depth for objects near the focus of expansion (Regan and Beverley, 1979).

8 In order to estimate absolute speed on the basis of the global optic flow rate or the edge rate, the driver would need to know the absolute distances of the objects that are creating the optical flow.
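Why TTC becomes unavailable for small, distant objects can be sketched with the small-angle approximation. The motorcycle width and closing speed below are assumed values for illustration; only the 0.003 rad/s threshold comes from the text:

```python
def looming_rate(width_m, distance_m, closing_speed):
    """Rate of change of angular subtense, d(theta)/dt ~ w*v/D^2
    (small-angle approximation for an object of width w)."""
    return width_m * closing_speed / distance_m ** 2

THRESHOLD = 0.003  # rad/s, approximate driving threshold (Hoffmann & Mortimer, 1996)

# Motorcycle (~0.8 m wide, an assumed width) at 300 m, closing at 30 m/s:
rate = looming_rate(0.8, 300.0, 30.0)
print(rate)              # ~0.00027 rad/s
print(rate < THRESHOLD)  # True: expansion is sub-threshold, so TTC is unavailable

# Same closing speed at 60 m:
print(looming_rate(0.8, 60.0, 30.0) > THRESHOLD)  # True: now well above threshold
```

Because the rate falls with the square of distance, the cue switches from useless to vivid quite abruptly as the gap closes.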
To summarize, accurate visual information about TTC is available to a driver, while information about absolute driving speed and the absolute distance of other vehicles is lacking in most driving situations. Laboratory and field research has shown that under optimal conditions drivers can accurately estimate TTC but cannot accurately judge their own speed of travel. Under non-optimal conditions (e.g., closing speed adaptation or small objects) TTC estimation can also be highly inaccurate. Martin Regan and I recently investigated the visual information used to control overtaking in a driving simulator (Gray and Regan, 2003). In separate experiments participants actively executed overtaking maneuvers and made passive yes/no judgments about whether it was safe to overtake. The speeds of the lead and oncoming vehicles were varied from trial to trial to create different safety margins. As shown in figure 9.8A, in deciding whether it is safe to overtake a driver essentially needs to compare two times: the time required to overtake (TRO) and the time it would take the oncoming car to reach the critical distance required to pass (TTCD). A safe pass requires that TTCD > TRO. The solid bars in figure 9.8B show the distribution of overtaking maneuvers initiated as a function of TTCD-TRO for 18 drivers. Drivers in our experiment initiated unsafe overtaking maneuvers (i.e., TTCD < TRO) on 16% of the trials. Results from the judgment task were even worse: participants made unsafe judgments on 30% of the trials. We next examined whether the ability to judge whether it is safe to overtake was affected by closing speed adaptation. The open bars in figure 9.8B plot the number of overtakes that were initiated following adaptation to closing speed produced by 5 min of driving on a straight open road. The results are striking: closing speed adaptation substantially increased the total number of unsafe
overtaking maneuvers (from 16% to 29%). Analysis of the passive overtaking judgments revealed that this adaptation effect is even more dangerous. Reaction times for overtaking judgments were significantly slower and more variable following closing speed adaptation. The dangerous state of being adapted to closing speed occurs when a driver gazes fixedly at the road or at an oncoming vehicle rather than scanning the scene ahead (a state commonly referred to as "highway hypnosis"). Our driving simulator results suggest that, in real-world driving situations, closing speed adaptation may impair the ability of a driver to decide accurately whether they have sufficient time to complete a maneuver such as an overtake while avoiding collision with an oncoming car. Closing speed adaptation not only makes judgments slower but also substantially biases the driver towards an underestimate of the time required, thus increasing the probability of collision. These conditions could be avoided by encouraging the driver to make more frequent eye movements. One possible method for achieving this would be to use an in-car eye tracker that would send a warning signal when there was a long period of steady fixation. In this study we also found that a substantial proportion of our drivers used the distance of the oncoming car as the control variable in overtaking (i.e., overtaking was initiated whenever the oncoming car was farther away than some critical distance, regardless of its speed). Eleven percent of our drivers used this strategy in all situations, while another 33% used it when the rate of expansion of the oncoming car was below threshold. The use of distance as a control variable is problematic for several reasons. First and foremost, this strategy is not robust across situations (e.g., if the oncoming car is approaching at an unusually high speed).
The second major problem associated with using distance as a control variable is that, as described above, previous research has shown that drivers cannot accurately estimate the absolute distance of another vehicle on the roadway.
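The failure mode of the fixed-distance strategy can be illustrated with a minimal sketch of the TTCD > TRO criterion. The critical distance, TRO, and distance threshold below are invented for illustration and are not values from the study:

```python
def ttcd(oncoming_distance_m, oncoming_speed, critical_distance_m):
    """Time for the oncoming car to reach the critical distance
    needed to complete the overtake."""
    return (oncoming_distance_m - critical_distance_m) / oncoming_speed

def safe_to_overtake(oncoming_distance_m, oncoming_speed, tro_s,
                     critical_distance_m=100.0):
    """Safe only if TTCD exceeds the time required to overtake (TRO)."""
    return ttcd(oncoming_distance_m, oncoming_speed, critical_distance_m) > tro_s

def distance_rule(oncoming_distance_m, threshold_m=300.0):
    """The risky strategy some drivers used: overtake whenever the
    oncoming car is beyond a fixed distance, regardless of its speed."""
    return oncoming_distance_m > threshold_m

TRO = 8.0  # assumed time required to overtake, in seconds

# Oncoming car at 400 m closing at 25 m/s: both rules say go, and it is safe.
print(distance_rule(400.0), safe_to_overtake(400.0, 25.0, TRO))  # True True
# Same distance but an unusually fast closing speed of 50 m/s:
print(distance_rule(400.0), safe_to_overtake(400.0, 50.0, TRO))  # True False
```

The distance rule gives the same answer in both cases because it never consults the oncoming car's speed, which is exactly why it fails against a fast closer.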
9.4 Vision in Sports

Ted Williams, the famous Red Sox outfielder, said that "hitting a baseball... is the single most difficult thing to do in sport" (Williams and Underwood, 1970), and with career totals of 521 home runs, 2,654 hits, and a .344 lifetime batting average, he is eminently qualified to be the judge. A major-league fastball travels the 18.5 m (60.5 ft) distance between the pitcher's mound and the plate in a mere 410 ms. To hit the ball into fair territory it has been calculated that the batter must estimate the time of arrival of the approaching ball with an accuracy of ±9 ms and estimate the height of the ball when it crosses the plate with an accuracy of ±1.3 cm (±0.5") (Watts and Bahill, 1991). Furthermore, it has been demonstrated that pursuit eye movements are not fast enough to keep the ball in foveal vision for its entire flight (Bahill and LaRitz, 1984; Hubbard and Seng, 1954). Baseball batting truly pushes the limits of human performance. In this section I examine the sources of visual information and motor-control strategies involved in hitting. It has been proposed that the perceptual component of the act of hitting can be reduced to the judgments of where and when; a batter need only know the position of
the ball when it crosses the plate and the instant in time that it will be there (Bahill and Karnavas, 1993). The binocular and monocular sources of information about time to passage expressed in equations 9.8 and 9.9 (figure 9.6) could be used by the batter to estimate the instant in time when the ball would cross the plate. For the act of hitting a baseball, Bahill and Karnavas (1993) calculated the value of dθ/dt in equation 9.8 to be roughly 30 times above discrimination threshold at the moment the ball is released, leading to the conclusion that "from the instant the ball leaves the pitcher's hand, the batter's retinal image contains accurate cues for time to contact" (p. 6). As evidence for this claim, they point out that batters rarely make purely temporal errors that would result in line drives hit into foul territory. Due to the fast speeds involved in baseball, the value of dδ/dt in equation 9.9 (i.e., the rate of change of retinal disparity) would also be well above discrimination threshold: when a 90 mph pitch is 40 ft from the plate its instantaneous value of dδ/dt is approximately 0.8 deg/s. However, the ability to make fine discriminations of an information source is necessary but not sufficient for accurate estimation of an absolute value.9 The relevant question here is whether batters can use these information sources to accurately estimate the absolute TTC. To address this issue, consider the psychophysical findings of Gray and Regan (1998). In this study, the accuracy of observers' estimates of absolute TTC for a simulated approaching ball was measured over a range of TTC values from 1.8 to 3.2 s. We reported 2-12% errors for judgments based on equation 9.9 alone, 2.5-10% errors based on equation 9.10 alone, and 1.3-3% errors when both monocular and binocular sources of information were available.
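The flight-time and margin figures quoted at the start of this section follow from simple arithmetic. In the quick check below, drag is ignored and the 45.1 m/s (about 101 mph) speed is an assumed value chosen to be consistent with the quoted 410 ms:

```python
MOUND_TO_PLATE_M = 18.5  # 60.5 ft, as given in the text

def flight_time_s(pitch_speed_mps):
    """Time for the ball to travel from release to the plate,
    ignoring drag (a simplification)."""
    return MOUND_TO_PLATE_M / pitch_speed_mps

# A ~101 mph (45.1 m/s) fastball crosses the plate in about 410 ms:
print(round(flight_time_s(45.1) * 1000))  # 410
# The +/-9 ms timing tolerance is only about 2% of the total flight time:
print(round(9 / (flight_time_s(45.1) * 1000) * 100, 1))  # 2.2
```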
If we assume that these percentage values can be generalized from a TTC value of 1.8 s (a 23 mph pitch) to the 0.4-0.6 s TTC range involved in hitting, then a 1.3% estimation error corresponds to a temporal error of approximately 5 ms. This value is well within the ±9 ms error margin calculated by Watts and Bahill (1991). The findings of Gray and Regan (1998) suggest that a hitter could estimate the TTC of an approaching ball more accurately if both the information sources expressed in equation 9.9 and equation 9.10 are used (although the best estimation performance for either cue alone is also within the required margin for error). Bahill and Karnavas (1993) have proposed that the more difficult judgment for the batter (and the one with the relatively smaller margin for error)10 is estimating where the ball will be when it crosses the plate. The aspect of this judgment that is particularly difficult is predicting how far the ball will drop in height. Although batters are exquisitely sensitive to the angular drop speed of the ball (Regan and Kaushal, 1994) and this information is well above threshold from the instant the ball leaves the pitcher's hand (Bahill and Karnavas, 1993), the angular drop speed is insufficient for judging height because the relationship between the angular drop speed and the physical drop speed depends on the ball's absolute distance. In the absence of cues to

9 For example, consider the case of a batter judging the TTC of two pitches, a 95 mph fastball with a TTC of 0.43 s and an 85 mph curveball with a TTC of 0.48 s. If the batter judged that the 95 mph pitch would arrive in 0.6 s and the 85 mph pitch would arrive in 0.65 s, discrimination of relative TTC would be precise (i.e., the hitter would correctly judge that the 95 mph pitch would arrive sooner despite only a 12% difference in TTC between the pitches), but estimation of absolute TTC would be quite inaccurate (an error of 0.17 s is many times the temporal accuracy required).

10 Relative to a 95 mph pitch, a 90 mph pitch arrives 21 ms later (i.e., 2.3 times the temporal margin for error) and crosses the plate 2.8 inches higher (i.e., 5.6 times the spatial margin for error).
the ball's absolute distance, two possible means of scaling the angular drop speed with distance to get an accurate estimate of height have been identified. Bahill and Karnavas (1993) have proposed that hitters use the pitch speed in lieu of distance information to estimate the height of the ball. In particular, the height of the ball when it crosses the plate (Yp) is given by equation 9.11, where DM is the distance to the mound, t is the time since pitch release, S is the estimated pitch speed, and dφ/dt is the angular drop speed. Note that the value S in this equation is the absolute speed of the ball, which also cannot be estimated directly from retinal image variables. Bahill and Karnavas (1993) argue that "the speed estimator probably uses memory and other sensory inputs: some visual, such as the motion of the pitcher's arms and body" (p. 8). Alternatively, Bootsma and Peper (1992) have suggested that batters could take advantage of the fact that the ball is always the same physical size. In particular, the ball's height when it crosses the plate is given by equation 9.12, where R is the radius of the ball (see Todd (1981) and Regan and Kaushal (1994) for similar derivations). Note that equation 9.12 is analogous to the visual correlate of direction of motion in depth in the horizontal plane shown in equation 9.6. What evidence is there for the use of these information sources in hitting? Bahill and Karnavas (1993) (following McBeath, 1990) argue that the use of equation 9.11 to estimate height is evidenced by a perceptual illusion that is occasionally experienced by batters: the rising fastball. It is physically impossible to throw a ball overhand so that it will overcome gravity and rise during its flight (Watts and Bahill, 1991), yet baseball batters claim that the ball frequently "jumps" over the bat at the last instant. Bahill and Karnavas (1993) propose that this illusion is due to a misestimate of pitch speed. If a batter underestimates the speed of a pitch, the height estimate based on equation 9.11 will be an underestimate. Therefore, at the point of contact the ball will appear to "jump over" the hitter's bat. However, Proffitt and Kaiser (1995) have argued that hitting strategies requiring an estimate of pitch speed would not produce the temporal precision exhibited by professional hitters. As support for the use of equation 9.12 in estimating height, Bootsma and Peper (1992) cite evidence that introducing balls of different sizes alters the judged spatial position of objects in the horizontal plane. Using real approaching objects, Bootsma and Peper (1992) found that the passing distance at which subjects judged an approaching ball to be reachable increased with ball size, as predicted by equation 9.6. To our knowledge this study has not been replicated for judgments in the vertical dimension, i.e., those based on equation 9.12. Unfortunately there have been very few studies that have investigated the specifics of how these sources of visual information are used to control the various motor responses involved in hitting.
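Since equations 9.11 and 9.12 are not reproduced in this excerpt, the sketch below captures only the spirit of the two scaling schemes; the functional forms, pitch kinematics, and drop rate are my assumptions, not the published equations. It also shows the direction of the rising-fastball illusion: underestimating pitch speed inflates the estimated remaining distance, hence the estimated drop, pushing the height estimate downward:

```python
MOUND_TO_PLATE = 18.5   # m (60.5 ft)
BALL_RADIUS = 0.0365    # m, regulation baseball

def drop_speed_from_pitch_speed(dphi_dt, est_speed, t_since_release):
    """Speed-based scaling (in the spirit of equation 9.11): estimated
    distance D = DM - S*t converts angular drop speed to a physical one."""
    est_distance = MOUND_TO_PLATE - est_speed * t_since_release
    return est_distance * dphi_dt

def drop_speed_from_ball_size(dphi_dt, theta_rad):
    """Size-based scaling (in the spirit of equation 9.12): the known
    ball diameter gives distance directly, D = 2R / theta."""
    return (2 * BALL_RADIUS / theta_rad) * dphi_dt

true_speed, t = 40.0, 0.2                        # assumed pitch kinematics
true_distance = MOUND_TO_PLATE - true_speed * t  # ball is 10.5 m away
theta = 2 * BALL_RADIUS / true_distance          # angular diameter (rad)
dphi_dt = 3.0 / true_distance                    # assumed 3 m/s true drop speed

print(round(drop_speed_from_ball_size(dphi_dt, theta), 2))            # 3.0
print(round(drop_speed_from_pitch_speed(dphi_dt, true_speed, t), 2))  # 3.0
# Underestimating pitch speed (35 instead of 40 m/s) overestimates the
# remaining distance and hence the drop, so the predicted height is too
# low and the real ball appears to "jump over" the bat:
print(round(drop_speed_from_pitch_speed(dphi_dt, 35.0, t), 2))        # 3.29
```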
Clearly, there is more involved in hitting than judging where and when: the batter needs to use this information to modify the complex
biomechanics involved in swinging a bat (Shaffer et al., 1993; Welch et al., 1995). One simple strategy that has been proposed is that accurate timing could be achieved by initiating a constant duration swing at a critical value of TTC (i.e., so that the duration of the movement matches the remaining time before contact) (Fitch and Turvey, 1978). This hypothesis has the attractive feature that it greatly simplifies the degrees of freedom problem in movement control (Bernstein, 1967). I recently investigated the visual information involved in hitting using a virtual baseball batting task (Gray, 2002a, 2002b). Experienced college baseball players swung real bats (equipped with 3D motion trackers) at simulated approaching baseballs. In the first set of experiments only monocular information was available to the batter and the ball was presented on a black background (i.e., the simulated pitcher was not visible). Simulated pitch speed was varied randomly between 60 mph (27 m/s) and 87 mph (39 m/s). Figure 9.9 shows a typical recording of bat and ball height as a function of time. Temporal swing errors were calculated from the difference between the instant in time when the ball crossed the front of the plate and the instant in time when the minimum bat height occurred (shown as point A in figure 9.9). Spatial swing errors were calculated from the difference between the height of the ball when it crossed the front of the plate and the minimum bat height (shown as point B in figure 9.9). Figure 9.10A plots the minimum bat height as a function of pitch speed for one highly skilled college baseball player in our study. Similar results were obtained for five other batters. It is clear from this figure that this batter had difficulty controlling the spatial component of his swing.
Minimum bat height was significantly correlated with pitch speed (R = 0.6, p < 0.001)11; however, the variation in swing height (slope = 0.026 m) was much less than the actual variation in pitch height (dashed line; slope = 0.1 m). Why does this effect occur? As described above, this may occur because there is no direct perceptual correlate of ball height; instead, batters must estimate height indirectly using absolute speed or absolute distance. If, as suggested by Bahill and Karnavas (1993), batters used pitch speed to estimate the height of the ball when it crosses the plate (i.e., equation 9.11), large random variations in pitch speed would explain the large spatial errors in the swing. Conversely, if batters were using the information expressed in equation 9.12, they should have been unaffected by variations in pitch speed; therefore, my results were not consistent with batters using ball size to judge height. Figure 9.10B plots the point in time when the minimum bat height occurred as a function of pitch speed for the same batter. Consistent with the analysis of the visual information above, batters in my study were significantly better at controlling the temporal component of the swing than the spatial component. The slope of the line of best fit (-11.25 ms) was much closer to the predicted slope (-17.9 ms; dashed line). This result is consistent with the analysis of the visual information since, unlike pitch height, the TTC can be estimated directly from retinal image variables. How did batters use TTC information to control the timing of their swing? The results of my study were not consistent with the proposal that batters initiate a constant duration swing at a fixed TTC. Instead, batters appeared to initiate a variable duration swing

11 In the batting simulation the ball was always released from a height of 6 ft (1.83 m) so that the height of the pitch when it crossed the plate was perfectly correlated with pitch speed. This was done to allow for comparison with the model of hitting proposed by Bahill and Karnavas (1993).
Figure 9.10: Performance data in the simulated baseball batting task used by Gray (2002a). A: Minimum bat height and B: time of minimum bat height as a function of pitch speed. Solid lines are the lines of best fit. Dashed lines in A and B plot the actual variation in height and time of arrival, respectively. Reprinted with permission from Gray, R. (2002) Behavior of college baseball players in a virtual batting task, J. Exp. Psych.: Hum. Percept. Perf., 28: 1143. Copyright 2002 American Psychological Association.

at a constant time after the pitch was released. This meant that swings were initiated at shorter TTC values as pitch speed increased. Clearly, further research is needed to understand how a baseball swing is controlled. Given that a batter does not have direct visual information about absolute pitch speed, how might this quantity be estimated during hitting? One possibility suggested by anecdotal evidence from baseball players is that the sequence of previous pitches is used to anticipate the speed of the upcoming pitch. For example, after seeing several "off-speed" (i.e., slow) pitches batters often "gear up" for a fastball (Williams and Underwood, 1970). Figure 9.11 shows the effect of pitch sequence on the mean temporal error for our baseball batters. These data are for a condition in which the simulated pitcher could only throw two pitches (chosen randomly): "slow" pitches traveled at 31 ± 0.67 m/s (70 ± 1.5 mph) and "fast" pitches traveled at 38 ± 0.67 m/s (85 ± 1.5 mph).

Figure 9.11: Effect of pitch sequence on batting performance. Solid bars are mean temporal errors for fast (F) pitches that were preceded by three consecutive fast pitches; open bars are mean temporal errors for fast (F) pitches that were preceded by three consecutive slow (S) pitches, and gray bars are mean temporal errors for a random speed condition. Reprinted with permission from Gray, R. (2002) Behavior of college baseball players in a virtual batting task, J. Exp. Psych.: Hum. Percept. Perf., 28: 1143. Copyright 2002 American Psychological Association.

It is clear that the prior sequence of pitch speeds had a strong influence on the temporal error in the swing; for all six batters, the mean temporal errors for fast pitches that were preceded by three consecutive fast pitches (solid black bars) were substantially smaller than the mean errors for fast pitches that were preceded by three consecutive slow pitches (open bars). This expectancy effect can be modeled by a simple Markov process in which the speed of the upcoming pitch is probabilistically determined by the previous three pitches in the sequence (Gray, 2002b). An indirect perceptual cue that baseball batters could use to distinguish between fast and slow pitches is the ball's rotation direction. Because of the biomechanics and physics of each pitch, a fastball travels with underspin (i.e., rotating from ground to sky) while a curveball travels with overspin. In laboratory judgment experiments it has been demonstrated that college baseball players can distinguish between a fastball and a curveball from 200 ms of video of the ball's flight at a 90% accuracy rate (Burroughs, 1984). When this cue was added to my simulation, such that pitches faster than 74 mph traveled with underspin while slower pitches traveled with overspin, hitting performance improved significantly. In particular, spatial errors were reduced by an average of 6.5 cm (2.6") for the six college baseball players in the study.
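A toy version of this expectancy effect can be sketched by timing the swing for the flight time implied by the recent pitch history. The simple averaging rule below is my simplification; the actual Markov model and its parameters in Gray (2002b) are not reproduced here:

```python
MOUND_TO_PLATE = 18.5     # m
SLOW, FAST = 31.0, 38.0   # m/s, the two pitch speeds in the experiment

def expected_speed(last_three):
    """Toy expectancy rule: guess the upcoming speed from the mean of
    the last three pitches (a stand-in for the real Markov model)."""
    return sum(last_three) / len(last_three)

def temporal_error_ms(actual_speed, last_three):
    """Swing timed for the expected flight time; the error is the
    mismatch with the actual flight time (positive = swing too late)."""
    expected_t = MOUND_TO_PLATE / expected_speed(last_three)
    actual_t = MOUND_TO_PLATE / actual_speed
    return (expected_t - actual_t) * 1000

# A fast pitch after three fast pitches is timed almost perfectly...
print(round(temporal_error_ms(FAST, [FAST, FAST, FAST]), 1))  # 0.0
# ...but after three slow pitches the swing is far too late:
print(round(temporal_error_ms(FAST, [SLOW, SLOW, SLOW]), 1))  # 109.9
```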
Despite the severity of the perceptual-motor demands in baseball batting, it should be emphasized that accurate predictive visual information is available through "direct" perceptual variables. As described in detail above, the time to contact can be predicted from the ball's rate of expansion (equation 9.8 and/or 9.9), and the ball's height
can be predicted from ball diameter, rate of expansion, and angular drop speed (equation 9.12). Therefore, as proposed by Bootsma and Peper (1992), it is possible to hit successfully entirely on the basis of perceptual information picked up during the ball's flight. However, the results of my simulated batting study do not support this "direct pick-up" proposal. When pitch speed was varied randomly from trial to trial, our batters could not consistently make contact with the ball even though the information provided by equations 9.8 and 9.12 was available. Furthermore, the large improvement in hitting performance with the addition of rotation cues is not compatible with this proposal, as rotation direction does not affect the values of equation 9.8 or 9.12. Rotation direction does not provide any information to the batter about the future location of the ball, so presumably it aids the batter by influencing estimated pitch speed.
9.5
Conclusions
The visual-motor control involved in flying an aircraft, driving an automobile, and playing a fast ball sport like baseball requires an observer to make precise and accurate spatio-temporal visual judgments. Developing a theoretical understanding of the informational support for these judgments is crucial for creating effective training programs, designing simulators that produce a positive transfer of training, and reducing the number of accidents and performance errors in these domains. Early research primarily focused on identifying the retinal image information that could be used to control these actions (step 1 in the rational approach). Despite the vast differences between flying, driving, and playing sports, in theory these behaviors can all be controlled on the basis of a small set of optical variables specifying time to contact and direction of motion in depth. A key result from the analysis of these optical variables is that there are no reliable retinal image correlates of absolute distance and absolute speed. This limitation appears to be a major source of performance error, since in the absence of visual cues to these absolute quantities actors must rely on unreliable sources of information such as the angular size of familiar objects (e.g., the runway in landing) or the event history. Research investigating observers' sensitivity to retinal image correlates of time to contact and direction of motion in depth (step 2 in the rational approach) indicates that human observers are highly sensitive to these information sources. Furthermore, as might be expected, it appears that highly skilled pilots and athletes are characterized by an above-average sensitivity to visual correlates of time to collision and direction of motion in depth (Gray, 2002a; Kruk and Regan, 1983). Despite the large body of research on visual sensitivity to these variables, there are two major limitations that need to be addressed in future studies.
First, the vast majority of previous research has used relative discrimination tasks, while performance in these domains hinges on accurate absolute estimation. Being able to discriminate small changes in a retinal image is not sufficient for accurate estimation of an absolute value. For example, Martin Regan and I (Gray and Regan, 2000a) have shown that in some cases observers cannot reliably estimate the absolute TTC for an approaching, rotating nonspherical object (e.g., an American football or rugby ball), even though TTC discrimination thresholds for the same stimulus are quite low. The second limitation is that in most experiments
Rob Gray
147
sensitivity to the retinal image variables has been measured under special conditions that frequently do not match what occurs during real-world execution of the action. For example, the vast majority of research on sensitivity to visual information about TTC has examined straight (perpendicular), head-on approaches. This is an important limitation because when these special-case assumptions are violated (e.g., during nonperpendicular approaches, as illustrated in figure 9.4B) the visual correlates of TTC and direction of motion in depth can be subject to large estimation errors (Tresilian, 1991). Clearly, research examining more complex, realistic conditions is needed.

Once actors have processed the relevant optical variables, they must next use these information sources to control the appropriate motor responses. In actions such as hitting a baseball, which involve a complex series of muscle activations across several different muscle groups, the choice of an effective visual-motor control strategy is nontrivial. Research examining exactly how visual information is used to control motor action (step 3 in the rational approach) can be particularly challenging because control over visual information must often be traded off against the realism and external validity of the motor action (Gray, 2002a). A recent promising approach to studying visual-motor control has been to use virtual reality simulations of real actions. Simulation has the advantages that active motor responses can be combined with fine control over the visual stimulus and that situations that are physically impossible can be investigated (e.g., cue conflicts or sudden changes in information sources).
References

Addams, R. (1834). Mr. Addams on a peculiar optical phenomenon. London and Edinburgh Philosophical Magazine and Journal of Science, 5: 373-374.
Bahill, A. T. and Karnavas, W. J. (1993). The perceptual illusion of baseball's rising fastball and breaking curveball. J. Exp. Psych. Hum. Percept. Perf., 19: 3-14.
Bahill, A. T. and LaRitz, T. (1984). Why can't batters keep their eyes on the ball? Am. Sci., 72: 249-253.
Beall, A. C. and Loomis, J. M. (1997). Optic flow and visual analysis of the base-to-final turn. Int. J. Aviation Psychol., 7: 201-223.
Bernstein, N. A. (1967). The Coordination and Regulation of Movements. Pergamon Press: Oxford.
Beverley, K. I. and Regan, D. (1983). Texture changes versus size changes as stimuli for motion in depth. Vis. Res., 23: 1387-1399.
Bootsma, R. J. (1991). Predictive information and the control of action: What you see is what you get. Int. J. Sports Psych., 22: 271-278.
Bootsma, R. J. and Peper, C. E. (1992). Predictive visual information sources for the regulation of action with special emphasis on hitting and catching. In L. E. Proteau (Ed.), Vision and Motor Control. Elsevier: North Holland.
Bootsma, R. J. and Oudejans, R. R. (1993). Visual information about time-to-collision between two objects. J. Exp. Psychol. Hum. Percept. Perform., 19: 1041-1052.
Calvert, E. S. (1954). Visual judgments in motion. J. Inst. Navigation, 7: 233-251.
Chaudhry, S. and Geri, G. A. (2003). Display related effects of terrain-texture density and contrast on perceived air speed in simulated texture. SID 03 Dig. Tech. Papers, 24: 276-279.
Clarke, D. D., Ward, P. J. and Jones, J. (1998). Overtaking road-accidents: Differences in manoeuvre as a function of driver age. Accid. Anal. Prev., 30: 455-467.
Clarke, D. D., Ward, P. J. and Jones, J. (1999). Processes and countermeasures in overtaking road accidents. Ergonomics, 42: 846-867.
Denton, G. G. (1976). The influence of adaptation on subjective velocity for an observer in simulated rectilinear motion. Ergonomics, 19: 409-430.
Denton, G. G. (1977). Visual motion aftereffect induced by simulated rectilinear motion. Percept., 6: 711-718.
Evans, L. (1991). Traffic Safety and the Driver. Van Nostrand Reinhold: New York.
Fitch, H. L. and Turvey, M. T. (1978). On the control of activity: Some remarks from an ecological point of view. In D. Landers and R. Christina (Eds.), Psychology of Motor Behavior and Sport, pp. 3-35. Human Kinetics: Champaign, IL.
Flach, J. M., Hagen, B. A. and Larish, J. F. (1992). Active regulation of altitude as a function of optical texture. Percept. Psychophys., 51: 557-568.
Flach, J. M., Warren, R., Garness, S. A., Kelly, L. and Stanard, T. (1997). Perception and control of altitude: Splay and depression angles. J. Exp. Psych. Hum. Percept. Perf., 23: 1764-1782.
Galanis, G., Jennings, A. and Beckett, P. (2001). Runway width effects in the visual approach to landing. Int. J. Aviat. Psych., 11: 281-301.
Gibson, J. J. and Crooks, L. E. (1938). A theoretical field analysis of automobile driving. Am. J. Psych., 51: 453-471.
Gray, R. (2002a). Behavior of college baseball players in a virtual batting task. J. Exp. Psych. Hum. Percept. Perf., 28: 1131-1148.
Gray, R. (2002b). Markov at the bat: A model of cognitive processing in baseball batters. Psychol. Sci., 13: 543-548.
Gray, R. and Regan, D. (1996). Accuracy of reproducing angles: Is a right angle special? Percept., 25: 531-542.
Gray, R. and Regan, D. (1998). Accuracy of estimating time to collision using binocular and monocular information. Vis. Res., 38: 499-512.
Gray, R. and Regan, D. (1999). Motion in depth: Adequate and inadequate simulation. Percept. Psychophys., 61: 236-245.
Gray, R. and Regan, D. (2000a). Estimating the time to collision with a rotating nonspherical object. Vis. Res., 40: 49-63.
Gray, R. and Regan, D. (2000b). Risky driving behavior: A consequence of motion adaptation for visually guided motor action. J. Exp. Psychol. Hum. Percept. Perform., 26: 1721-1732.
Gray, R. and Regan, D. (2003). Visual-motor control and decision making of drivers during overtaking. Human Factors, in press.
Groeger, J. A. (2000). Understanding Driving: Applying Cognitive Psychology to a Complex Everyday Task. Psychology Press: Philadelphia, PA.
Hoffmann, E. R. and Mortimer, R. G. (1996). Scaling of relative velocity between vehicles. Accid. Anal. Prev., 28: 415-421.
Hoyle, F. (1957). The Black Cloud. Penguin: Middlesex, England.
Hubbard, A. W. and Seng, C. N. (1954). Visual movements of batters. Res. Quart., 25: 42-57.
Jeffcoat, G. O., Skelton, N. and Smeed, R. J. (1973). Analysis of National Statistics of Overtaking Accidents. University of London, International Driver Behavior Research Association: London.
Kleiss, J. A. and Hubbard, D. C. (1993). Effects of 3 types of flight simulator visual scene detail on detection of altitude change. Hum. Fact., 35: 653-671.
Kraft, C. L. (1978). A psychophysical contribution to air safety: Simulator studies of visual illusions in night visual approaches. In H. Pick, H. W. Leibowitz, and J. R. Singer (Eds.), Psychology from Research to Practice, pp. 363-385. Plenum: New York.
Kruk, R. and Regan, D. (1983). Visual test results compared with flying performance in telemetry-tracked aircraft. Aviat. Space Environ. Med., 54: 906-911.
Kruk, R., Regan, D., Beverley, K. I. and Longridge, T. (1981). Correlations between visual test results and flying performance on the advanced simulator for pilot training (ASPT). Aviat. Space Environ. Med., 52: 455-460.
Langewiesche, W. (1944). Stick and Rudder. McGraw Hill: New York.
Larish, J. F. and Flach, J. M. (1990). Sources of optical information useful for perception of speed of rectilinear self-motion. J. Exp. Psychol. Hum. Percept. Perform., 16: 295-302.
Lee, D. N. (1976). A theory of visual control of braking based on information about time-to-collision. Percept., 5: 437-459.
Marsh, G. (1985). Avoiding the wires. Defense Helicopter World, June-August, 22-23.
McBeath, M. K. (1990). The rising fastball: Baseball's impossible pitch. Percept., 19: 545-552.
Mertens, H. W. and Lewis, M. F. (1982). Effect of different runway sizes on pilot performance during simulated night landing approaches. Aviat. Space Environ. Med., 53: 463-471.
Mulder, M., Pleijsant, J. M., van der Vaart, H. and van Wieringen, P. (2000). The effects of pictorial detail on the timing of the landing flare: Results of a visual simulation experiment. Int. J. Aviat. Psych., 10: 291-315.
NHTSA. (2000). Traffic Safety Facts 2000. National Highway Traffic Safety Administration: Washington, DC.
Ogle, K. N. (1962). Spatial localization through binocular vision. In H. Davson (Ed.), The Eye, Vol. 4, pp. 271-324. Academic Press: New York.
Portfors-Yeomans, C. V. and Regan, D. (1996). Cyclopean discrimination thresholds for the direction and speed of motion in depth. Vis. Res., 36: 3265-3279.
Portfors-Yeomans, C. V. and Regan, D. (1997). Discrimination of the direction and speed of motion in depth of a monocularly visible target from binocular information alone. J. Exp. Psychol. Hum. Percept. Perform., 23: 227-243.
Proffitt, D. R. and Kaiser, M. K. (1995). Perceiving events. In: Perception of Space and Motion. Academic Press: New York.
Regan, D. (1995). Spatial orientation in aviation: Visual contributions. J. Vestib. Res., 5: 455-471.
Regan, D. (2002). Binocular information about time to collision and time to passage. Vis. Res., 42: 2479-2484.
Regan, D. and Beverley, K. I. (1979). Visually guided locomotion: Psychophysical evidence for a neural mechanism sensitive to flow patterns. Science, 205: 311-313.
Regan, D., Giaschi, D. E. and Fresco, B. B. (1993). Measurement of glare susceptibility using low-contrast letter charts. Optom. Vis. Sci., 70: 969-975.
Regan, D. and Gray, R. (2000). Visually guided collision avoidance and collision achievement. Trends Cog. Sci., 4: 99-107.
Regan, D. and Gray, R. (2001). Hitting what one wants to hit and missing what one wants to miss. Vis. Res., 41: 3321-3329.
Regan, D. and Gray, R. (2003). A step by step approach to research on time to contact and time to passage. In H. Hecht and G. J. P. Savelsbergh (Eds.), Theories of Time-to-Contact. Elsevier-North Holland.
Regan, D., Gray, R. and Hamstra, S. J. (1996). Evidence for a neural mechanism that encodes angles. Vis. Res., 36: 323-330.
Regan, D. and Kaushal, S. (1994). Monocular discrimination of the direction of motion in depth. Vis. Res., 34: 163-177.
Shaffer, B., Jobe, F. W., Pink, M. and Perry, J. (1993). Baseball batting: An electromyographic study. Clin. Orthopaed. Related Res., 292: 285-293.
Snowden, R. J., Stimpson, N. and Ruddle, R. A. (1998). Speed perception fogs up as visibility drops. Nature, 392: 450.
Stewart, D., Cudworth, C. J. and Lishman, J. R. (1993). Misperception of time-to-collision by drivers in pedestrian accidents. Percept., 22: 1227-1244.
Teghtsoonian, M. and Teghtsoonian, R. (1969). Scaling apparent distance in natural indoor settings. Psychonomic Sci., 16: 281.
Todd, J. T. (1981). Visual information about moving objects. J. Exp. Psychol. Hum. Percept. Perform., 7: 795-810.
Tresilian, J. R. (1991). Empirical and theoretical issues in the perception of time to contact. J. Exp. Psychol. Hum. Percept. Perform., 17: 865-876.
van Winsum, W. (1998). Preferred time headway in car-following and individual differences in perceptual-motor skills. Percept. Motor Skills, 87: 863-873.
van Winsum, W. and Heino, A. (1996). Choice of time-headway in car-following and the role of time-to-collision information in braking. Ergonomics, 39: 579-592.
Watts, R. G. and Bahill, A. T. (1991). Keep Your Eye on the Ball: Curve Balls, Knuckleballs, and Fallacies of Baseball. W. H. Freeman and Company: New York.
Welch, C. M., Banks, S. A., Cook, F. F. and Draovitch, P. (1995). Hitting a baseball: A biomechanical description. J. Orthopaed. Sports Phys. Ther., 22: 193-201.
Wiener, E. L. (Ed.). (1988). Human Factors in Aviation. Academic Press: New York.
Williams, T. and Underwood, J. (1970). The Science of Hitting. Simon and Schuster: New York.
Yilmaz, E. H. and Warren, W. H., Jr. (1995). Visual control of braking: A test of the tau hypothesis. J. Exp. Psych. Hum. Percept. Perform., 21: 996-1014.
10. Form-from-Watercolor in Surface Perception, and Old Maps

Lothar Spillmann, Baingio Pinna, and John S. Werner

Form-from-color was studied using long-range assimilative color spreading from a narrow chromatic edge onto an enclosed white surface area (watercolor effect). Five experiments were performed in which a dark (e.g., purple) contour was flanked by a lighter chromatic contour (e.g., orange). The strength of the watercolor effect in determining figure-ground organization was compared to that of the classical Gestalt factors of proximity, parallelism, good continuation, and Prägnanz. We found that watercolor was more effective in determining figure-ground segregation under all conditions tested, owing to the combined effects of surface uniformity, depth segregation, and border ownership. The findings reveal a fundamental difference between illusory and real color in form perception and, in addition, shed light on the use of chromatic borders in seventeenth-century cartography to demarcate geographical regions.
10.1
Introduction
Upon reading the classical accounts of Gestalt psychology (Koffka, 1935; Metzger, 1936), one is surprised to find how much was then known about figure-ground segregation merely from phenomenological observation. Surfaces, it was maintained, are defined by stimulus discontinuities resulting in perceptual "jumps" in brightness, color, motion, texture, and depth (Gestalt factors of similarity and common fate). But stimuli characterized by lines on a uniform background required additional determinants for structural organization and grouping, e.g., the Gestalt factors of proximity, symmetry, good continuation, and closure (Wertheimer, 1923). The psychophysics and neurophysiology of these "laws of vision" underlying our perception of the world have attracted renewed interest by researchers, beginning in the early 1980s and continuing
to the present day (for reviews, see Regan, 2000; Spillmann and Ehrenstein, 2004). One factor unknown to the Gestaltists, watercolor, ranks among the most powerful of all in determining spatial organization. It is special in that, unlike the perceptual attributes just mentioned, it is characterized by an illusory appearance, and yet its effect on figure-ground segregation is superior to that of most classical Gestalt factors. It is therefore surprising that this remarkable factor, which was known long ago - in seventeenth-century mapmaking - had been overlooked by vision researchers until it was rediscovered and made explicit by Pinna (1987). Pinna et al. (2001) noted that the watercolor effect resembled the subtle colors used by cartographers in conjunction with double contours to separate adjoining countries. Wollschlager et al. (2002) showed an example of an old map of Africa (by Johan Blaeu in Le Grand Atlas ou Cosmographie, Vol. 10: Africa, 1663. British Library Maps C.S.b.l) that exploited the effect, and more examples of the use of this technique by early cartographers can be found (Bagrow and Skelton, 1985). In fact, if a map consists of black lines only, it easily becomes ambiguous. This problem was first demonstrated in the context of figure-ground organization by Ehrenstein (1930), who used a schematic map to show that the same outline could produce two alternative figure percepts, Italy or the Adriatic Sea. In this chapter we provide additional demonstrations of the importance of watercolor for figure-ground segregation and point out a possible relation between watercolor spreading and early mapmaking.¹ The technique introduced by early cartographers to distinguish between different regions of a map consisted of separating adjacent areas by adding a thick colored line to the inside of a black boundary.
The potential consequence of this approach, likely not realized because of the blackness of the boundary, may be appreciated by comparing the two maps of Europe in figure 10.1. These two maps are identical in outline, but the figure-ground organization is reversed by the watercolor effect due to the orange fringe lining the purple boundary. The hue of the lighter contour uniformly spreads to fill in the surface and defines either bodies of water (top) or land masses (bottom) as figure. Some mapmakers chose a wider chromatic edge with a shallower gradient than that used in figure 10.1 (see also the painting by Kandinsky in Spillmann and Pinna, 2003). While this approach presumably served their purposes in better defining geographic regions, it never produced a coloration or figural effect as compelling as that in figure 10.1. The cartographers' idea that color could be exploited to segregate figure from ground (and figure from figure) anticipated Rubin's (1915) descriptive rules: (i) a figure appears closer to the observer (depth) than the background; (ii) its color appears denser (surface color) than the same color on the ground; and (iii) the contour belongs to the figure (border ownership), not the ground. These phenomenological observations suggest that the bridge between watercolor and figural effect lies in the contour by which figure and ground are separated. This was likely the intuition of the cartographers, and it is also the basis of the watercolor effect, where both the coloration and the figural segregation are defined by the boundaries.

¹Color reproductions of these figures may be found on the accompanying CD.
Figure 10.1: Schematic maps of Europe. A: An outer fringe to induce uniform color spreading over water bodies. B: An inner fringe to induce watercolor over land masses. Only in the latter map could subjects spontaneously recognize Europe. (Color versions may be found in the accompanying digital media.) In a recent study (Pinna et al., 2003), the watercolor effect was found to win when pitted against conflicting figural information defined by the classical Gestalt principles listed above. In this regard it may be termed a new principle of figure-ground organization. In the present work, the study of the watercolor effect is expanded using examples pertinent to geographical maps.
10.2
General Methods
Stimuli were wiggly, purple contours flanked by another chromatic (usually orange) edge on a white background. They were drawn by hand using magic markers and were about 6 arc min wide. Stimuli were presented under Osram Daylight fluorescent light (250 lux, 5,600 K), and under this illumination the CIE x, y chromaticity coordinates of the colored lines were 0.30, 0.23 (purple), 0.57, 0.42 (orange), and 0.62, 0.34 (red). Stimuli were observed binocularly from a distance of 50 cm with freely moving eyes. Independent groups of fourteen naive observers with normal vision participated in each condition of each experiment. Stimuli were presented in a different random sequence for each observer. The subject's task was to report what was figure and what was ground. In addition, each subject rated the relative strength (in percent, between 0 and 100) of a given surface being perceived as figure. A training period preceded each experiment during which observers familiarized themselves with the concepts of figure and ground and practiced scaling the relative strength, or salience, of a given region being perceived as figure.
10.3
Experiment 1: How to Create Two Geographical Maps by Using One Boundary
The use of colored fringes by early cartographers enhanced the figural organization of their maps by differentiating neighboring countries and regions. We found that by inserting colored fringes along one side or the other of a chromatic (purple) boundary, it is possible to create - in perception - two completely different geographical maps (figure 10.1). The difference between the two maps holds in spite of what we know about geographical maps and ipso facto despite past experience (landlocked people may perceive something different from seafaring people). This was tested in the first experiment.
Stimuli A map of Europe, as illustrated in figure 10.1, was presented using three different conditions: (i) purple contours only; (ii) orange fringes added to the bodies of water (the Mediterranean, Black, and Baltic Seas and the Atlantic Ocean); and (iii) orange fringes added to the land masses (Europe and North Africa), i.e., a reversal of the side of the chromatic flanking contour. The overall size of the map was 21.3° × 15.6°. The procedure followed that given in section 10.2.

Results For all subjects, conditions (i) and (iii) were perceived as the continent of Europe (a rating of 100%), whereas condition (ii) was described by all subjects as two ragged peninsulas. Only after explicit suggestions by the experimenter did 12 of 14 subjects admit to seeing Europe in this condition, but the percept was described as not compelling. The mean rating for perceiving land masses as a figure was only 25.7% (SD = 4.04). Two subjects did not perceive Europe at all, although they recognized the European map in an atlas quite readily.
Lothar Spillmann, Baingio Pinna, and John S. Werner
157
Figure 10.2: Test stimulus for pitting watercolor spreading against proximity and parallelism in figure-ground organization.

This result shows that a boundary may belong to one side or the other depending on the presence or absence of a fringe line, and clearly demonstrates that the watercolor effect is stronger than past experience (Wertheimer, 1923) in inducing figure-ground segregation. Even after repeated suggestions by the experimenter, regions associated with prior knowledge tend to remain background, while the complementary area - because of the watercolor effect - attains the status of figure.
10.4 Experiment 2: Watercolor Effect vs. Proximity and Parallelism

Cartographers required a powerful means to bias figure-ground organization, independent of the not-yet-formulated classical Gestalt principles. In this experiment, we tested the hypothesis that the watercolor effect is more effective in structurally organizing a complex pattern than the Gestalt principles of proximity (elements that are close to each other tend to be grouped) and parallelism (Morinaga's (1941) Ebenbreite: equidistant contours enclosing areas of similar or equal width appear to belong to one figure).

Stimuli Stimuli were composed of a large outer square, 10.2° on a side, containing two smaller square-shaped frames tilted in opposite directions (figure 10.2). The first frame was rotated 10° to the left and had sides of 7.4° and 5.7°, while the second frame was rotated 10° to the right and had sides of 3.4° and 2.3°, respectively. Five conditions were used: (i) purple contours only; (ii) orange fringes lining the three interspaces outside the two frames (figure 10.2A); (iii) orange fringes lining the insides of the two frames (figure 10.2B); (iv) red instead of orange fringes lining the interspaces outside the two frames; and (v) uniform orange color covering the interspaces outside the two tilted frames. This latter condition was a control to find out whether the figural effect of watercolor spreading is simply due to uniform coloration (Gestalt factor of similarity).

Figure 10.3: Figural strength of the parallel frames plotted for five test conditions.
Results In figure 10.3, mean ratings (in %) of the two frames being perceived as figures are plotted for each of the five conditions. In the purple-contours-only condition (i) and when watercolor and parallelism were combined (iii), the two frames were always perceived as figures and received (near) 100% ratings. However, when the watercolor effect was pitted against both of the above Gestalt factors, the interspaces outside were overwhelmingly chosen in the orange (ii) and, less frequently, in the red watercolor condition (iv), and the two frames were seen much less frequently as figure. Finally, when the interspaces were physically and uniformly filled with orange color (v), the parallel frames (proximity) won over uniform color, suggesting that illusory coloration is superior to real coloration in this context. Thus, the watercolor effect cannot be reduced to a mere effect of similarity due to surface color. A one-way ANOVA revealed that the relative strength of the frames being perceived as figure varied significantly among the five conditions (F4,65 = 912.721, p < 0.0001). In the Fisher PLSD post hoc analysis, there were no significant differences (p > 0.05) between conditions (i) versus (iii), (i) versus (v), and (iii) versus (v).
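For readers who wish to check such statistics, the F ratio reported for a one-way ANOVA can be computed directly from the group ratings. The sketch below is a plain implementation of the textbook formula; the rating data in the example are made up for illustration and are not the values from this experiment:

```python
# One-way ANOVA F statistic: ratio of between-group to within-group
# mean squares, computed from a list of rating groups (one per condition).

def one_way_anova_F(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within)

# Three hypothetical conditions with well-separated mean ratings:
ratings = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(one_way_anova_F(ratings))  # 27.0
```

Large F values, such as those reported in this chapter, arise when the condition means differ by far more than the trial-to-trial scatter within each condition.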
Lothar Spillmann, Baingio Pinna, and John S. Werner
159
Figure 10.4: Test stimulus for pitting watercolor spreading against good continuation and Prägnanz in figure-ground organization.
10.5
Experiment 3: Watercolor Effect vs. Good Continuation and Prägnanz
As with proximity and parallelism, overcoming the Gestalt factor of good continuation in geographical maps is a challenging task. This principle states that figures tend to be defined by continuous and smooth contours rather than by contours that change abruptly. However, geographical borders sometimes take an unforeseen course, far from a straight line. Similarly, the tendency towards Prägnanz (Wertheimer, 1923; Koffka, 1935) needs to be overcome, as political regions rarely conform to the simplest and perceptually most stable of shapes. Experiment 3 therefore compared good continuation and Prägnanz with the watercolor effect as spatial organizing factors, using six different conditions to determine whether the watercolor effect offsets and overrules the effect of good continuation.
Stimuli Figure 10.4 illustrates the stimulus used. It consisted of a small square intersected at its lower left vertex by a circle. Both figures were surrounded by a large square. In this configuration the effects of good continuation and Prägnanz act synergistically. The side of the large square was 14.6°; the side of the small square was 5.1°; and the radius of the circle was 3.1°. Six experimental conditions were tested: (i) purple contours only; (ii) orange-color fringes added to the inside of the overlapping area and to the inside of the large interspace between the two figures and the large outer square; (iii) orange-color fringes added to the inside of the nonoverlapping surfaces of the square and circle; (iv) red fringes added to the inside of the overlapping area and to the inside of the large interspace between the two figures and the large outer square; (v) overlapping surfaces physically and uniformly colored with the same orange as used in the fringe; and (vi) nonoverlapping surfaces physically and uniformly colored with the same orange as used in the fringe.

Figure 10.5: Figural strength of circle and square plotted for six test conditions.

Results In figure 10.5, mean ratings (in %) of the intersecting square and circle being perceived as figures are plotted for each of the six conditions. In the purple-contours-only condition (i), all subjects reported perceiving the two outline figures of the square and circle as figure. When an orange fringe was added to the inside of the overlapping area of the two shapes (condition ii), the figural effects of good continuation and Prägnanz were offset, and the nonoverlapping parts appeared as holes or ground rather than figures. An analogous result was obtained when the orange fringe was added to the inside of the nonoverlapping areas (condition iii): these areas now appeared as figures and the overlapping section between them as ground or hole. Good continuation and Prägnanz were also overruled, though not as strongly, when a red fringe was added to the overlapping area (condition iv). Finally, when the overlapping or nonoverlapping areas were physically colored with orange (conditions v and vi), subjects perceived the two outline figures in about 80% of the cases. Here, the effects of good continuation and Prägnanz were weaker than in the purple-contours-only condition (i), but still stronger than with the orange fringes present. A one-way ANOVA revealed that the relative strength of circle and square being perceived as figure changed significantly depending on the six edge conditions (F5,78 = 400.615, p < 0.0001). In the Fisher PLSD post hoc analysis, all individual conditions differed significantly from each other (p < 0.0001). In none of the
orange fringe conditions were square and circle perceived as figures. We thus conclude that watercolor wins over good continuation and Prägnanz.

Figure 10.6: Test stimuli for evaluating watercolor spreading in disambiguating grouping and figure-ground organization. A: Purple contour only. B: Orange fringe added within the crosses. C: Orange fringe added outside the crosses. See CD-ROM for a color version of this figure.
10.6 Experiment 4: Watercolor Effect Used to Disambiguate Grouping and Figure-Ground Organization

In this experiment, two sets of interleaved shapes were studied vis-a-vis watercolor spreading (figure 10.6A). These were crosses touching each other at the vertices (figure 10.6B) and eight-headed stars emerging from the region complementary to the crosses (figure 10.6C). The figure-ground organization between crosses and stars is ambiguous in that one set of shapes is "camouflaged" by the other, and vice versa. The question thus is: can the watercolor effect disambiguate figure and ground by perceptually strengthening one set of shapes (e.g., crosses) or the other (stars)?

Stimuli Each cross measured 5.1° × 5.7°. The overall figure was 18.8° × 17.7°. There were five conditions: (i) purple contours only; (ii) orange fringes added to the inner edge of each cross; (iii) orange fringes added to the outside edge of each cross; (iv) orange color physically and uniformly added to the surface of the crosses; and (v) orange color physically added to each of the stars.

Results In figure 10.7, mean ratings (in %) of the crosses being perceived as figures are plotted for each of the five conditions. With the purple contours only (i), the mean rating was 51%. This value increased to 95% when the orange fringes were added to the inner edges of the crosses (condition ii). When the orange fringes were added to the
Form-from-Watercolor
Figure 10.7: Figural strength of crosses being perceived as figure plotted for five test conditions.

stars (condition iii), mean ratings dropped to 1.5%. The physically colored conditions (iv and v) were not significantly different from the purple-contours-only condition. A one-way ANOVA revealed that the relative strength of the crosses being perceived as figure changed significantly depending on the edge conditions (F(4,65) = 134.5, p < 0.0001). These results show that the watercolor effect is capable of disambiguating areas within maps by defining the boundaries as well as imparting uniform color to the enclosed areas regardless of shape.
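The F statistics reported for these experiments (e.g., F 4,65 = 134.5) come from standard one-way ANOVAs over the subjects' figural-strength ratings. As a reminder of what such a test computes, here is a minimal, dependency-free sketch; the function name and toy data are ours, not from the study (in practice a routine such as scipy.stats.f_oneway would be used):

```python
# Illustrative one-way ANOVA: F = MS_between / MS_within.
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per condition."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Toy data only, to show the mechanics:
F, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4]])
```

Note how the degrees of freedom identify the design: with k = 5 conditions and n = 70 ratings in total, df_between = k - 1 = 4 and df_within = n - k = 65, which is exactly the reported F(4,65), implying 14 ratings per condition.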
10.7 Experiment 5: Why Did the Old Maps Fail to Elicit Strong Long-Range Coloration Effects?
The early cartographers sometimes placed chromatic contours inside a darker contour similar to the double contour used for eliciting the watercolor effect. One important difference between the old maps and stimuli eliciting the watercolor effect is that the dark contour in the maps was typically black whereas in our test stimuli it was purple. In preliminary observations, we found that the watercolor effect also occurs with a black instead of a purple contour, although less strongly (Pinna et al., 2001). There are other reasons why the cartographers of the seventeenth century may have
Figure 10.8: Test stimulus used to evaluate the effect of contour width ratio.

failed to obtain a strong surface coloration. This is because they used conditions that were not optimal for long-range color assimilation. In the old maps the two lines required to elicit the watercolor effect were not of equal width; instead, the dark boundary was generally thinner than the color fringe. Furthermore, the fringes were quite wide in absolute terms, contrary to the thin inducing lines used in the watercolor effect. It thus appears as though the colored fringes acted like solid contours, preventing the inner surface from becoming filled in by color. It is not known whether this was by design. It is clear, however, that by confining the color to the boundaries, cartographers differentiated between neighboring regions and at the same time guaranteed good legibility on the surface. In this experiment, we tested the hypothesis that the poor surface coloration obtained in the old maps is due, in part, to an unfavorable width ratio between the darker and lighter chromatic contours.

Stimuli

The basic stimulus consisted of two rectangles, one inside the other (figure 10.8). The size of the larger rectangle was 16.2° x 17.7°; the size of the smaller rectangle was 6.6° x 7.9°. The widths of both the purple boundary and the orange fringes were varied in three steps: 6, 12, and 18 arc min; their combinations thus yielded nine different conditions. The strength of the watercolor effect was scaled as in the other experiments.

Results

In figure 10.9, mean ratings (in %) of the interspaces being perceived as figure are plotted against the width of the purple contour, with the width of the orange contour serving as a parameter. Results show that the watercolor effect decreases as the width of the (purple) boundary increases. It likewise decreases when the width of the (orange) fringe increases, though much less steeply. The optimal watercolor effect
Figure 10.9: Figural strength of interspaces plotted as a function of the width of the outer purple contour for three widths of the inner orange contour.

occurs with a ratio of 1 (6/6 arc min). Results of a two-way ANOVA revealed significant main effects for inner and outer contours (F(2,117) > 58.0, p < 0.001), but the interaction between them was not significant. This finding demonstrates that each factor influences the long-range color spreading effect in the same direction. However, increasing the width of the boundary line is more detrimental to the watercolor effect than increasing the width of the fringe line. The nonoptimal width ratio, then, may explain the weak coloration in the old maps.
10.8 Conclusion

The results of this study show that the watercolor effect assigns figural status to surfaces colored by long-range assimilative color spreading. Its effect on figure-ground segregation wins over all the Gestalt factors tested: proximity, parallelism, good continuation, and Prägnanz. These results extend our previous comparisons of the figural strength of watercolor over the classical Gestalt factors (Pinna et al., 2003). We suggest that figural, or form, effects are linked to color effects because both start from boundaries and depend on them. However, they can also be separated (when the inducing lines are equiluminant; Pinna et al., 2001) and each can be present without the other. The separation between the two effects suggests different mechanisms for color spreading and figural effects, although they may not be independent. There are two important aspects of our observations that shed light not only on the maps of old but, more importantly, on the mechanisms underlying watercolor spreading. First, it is known that the watercolor effect is stronger when the two interacting lines have different luminances (Pinna et al., 2001). Second, the uniform coloration
strengthens border ownership (Rubin, 1915; Nakayama and Shimojo, 1990; Zhou et al., 2000) through a colored fringe added to the darker boundary and by creating a gradient of three (perceptual) colors. These are the outer color (purple), the color of the fringe (orange), and the color of the induced surface spreading (light orange). This phenomenological hierarchy may explain the different results obtained with a physically uniform coloration. A similar gradient also occurs with achromatic colors, although the strength of the watercolor effect is diminished (Pinna et al., 2001). Finally, we have observed informally that the presence of print tends to diminish the spread of watercolor, and this certainly would have reduced the effect in the old maps. In summary, by reducing the possibility that the boundary could be reversed, the watercolor effect makes border ownership stronger. In this way, it unambiguously increases the figural strength of the surface by assigning an inward-outward direction at the contour. This is what distinguishes the watercolor effect from the classical Gestalt figure-ground organizational factors.
Acknowledgments

This work was supported by a research fellowship from the Alexander-von-Humboldt Foundation (to BP), the Extended Freiburg-Padova Academic Exchange Program (to BP and LS), DFG grant SP 67/8-1 (to LS), and an Alexander-von-Humboldt Foundation Senior Research Prize (to JSW). We thank the Karl-Miescher-Foundation, Riehen, Switzerland, for supporting the production of the chromatic stimuli. We also thank Daniel Wollschläger and Walter Ehrenstein for comments, as well as Monica Gaias and Cristiana Lenzerini for assistance in testing the subjects.
References

Bagrow, L. and Skelton, R. A. (1985). History of Cartography. Precedent Pub.: Chicago.

Ehrenstein, W. (1930). Untersuchungen über Figur-Grund-Fragen. Zeitschrift für Psychologie, 117: 339-412.

Koffka, K. (1935). Principles of Gestalt Psychology. Harcourt Brace: New York.

Morinaga, S. (1941). Beobachtungen über die Grundlagen und Wirkungen anschaulich gleichmäßiger Breite. Archiv für die gesamte Psychologie, 108: 310-348.

Metzger, W. (1936). Gesetze des Sehens. W. Kramer and Co.: Frankfurt am Main.

Nakayama, K. and Shimojo, S. (1990). Towards a neural understanding of visual surface representation. Cold Spring Harbor Symposia on Quantitative Biology, 40: 911-924.

Pinna, B. (1987). Un effetto di colorazione. In V. Majer, M. Maeran and M. Santinello (Eds.), Il laboratorio e la città: XXI Congresso degli Psicologi Italiani, p. 158.
Pinna, B., Brelstaff, G. and Spillmann, L. (2001). Surface colour from boundaries: A new 'watercolour' illusion. Vis. Res., 41: 2669-2676.

Pinna, B., Werner, J. S. and Spillmann, L. (2003). The watercolour effect: A new principle of grouping and figure-ground organization. Vis. Res., 43: 43-52.

Regan, D. (2001). Human Perception of Objects. Sinauer Associates: Sunderland, MA.

Rubin, E. (1915). Synsoplevede Figurer. Gyldendalske: Copenhagen.

Spillmann, L. and Ehrenstein, W. H. (2004). Gestalt factors in the visual neurosciences. In L. M. Chalupa and J. S. Werner (Eds.), The Visual Neurosciences, pp. 1573-1589. MIT Press: Cambridge, MA.

Spillmann, L. and Pinna, B. (2003). Reply to Barris. Vis. Res., 43: 1721-1722.

Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt II. Psychologische Forschung, 4: 301-350.

Wollschläger, D., Rodriguez, A. M. and Hoffman, D. D. (2002). Flank transparency: The effects of gaps, line spacing, and apparent motion. Percept., 31: 1073-1092.

Zhou, H., Friedman, H. S. and von der Heydt, R. (2000). Coding of border ownership in monkey visual cortex. J. Neurosci., 20: 6594-6611.
Part III
Eye Movements
11. The Basis of a Saccadic Decision: What We Can Learn from Visual Search and Visual Attention

Eileen Kowler

11.1 Prologue
Most studies of vision attempt to characterize visual capacities under the constraints present in the typical laboratory environment. Should the results turn out to be orderly enough, it then becomes reasonable to put the work in a broader context by trying to link the observations to one or more centrally important, real-world activities where vision plays a major role. Martin Regan, to the great benefit of the field, has always gone about things the other way around. Martin does not have to explain why a particular visual capacity he observes is relevant to survival because he begins with people in peril. It is impossible to think of Martin's work without envisioning cricket players knocked senseless by 100 mile-per-hour projectiles, airplanes diving into runways, or automobiles crashing into one another on the highways. After analysis of the geometrical transformations of the retinal image that precede such disasters, Martin takes us to the quiet confines of the laboratory and to the fundamental principles of visual science that underlie the ability to successfully manage even the most casual contact with the outside visual world. The visual system, seen through Martin's work, is an elegant structure whose operation in nature can be understood when probed by means of rigorous, careful, theoretically driven experimentation. Alas, try as I might to follow Martin's example, my work falls into the former group (see above, "Most studies ..."). I can only hope that what follows has benefited to some small degree from the many conversations with Martin over the years and the times when he so generously shared his opinions and criticisms, always perfectly on the mark.
Figure 11.1: Example of the sequence of fixations during the visual memory experiment of Melcher and Kowler (2001). The task was to recall as many items as possible from the display. Reprinted from Melcher, D. and Kowler, E. (2001) Visual scene memory and the guidance of saccadic eye movements. Vis. Res., 41: 3597-3611. Copyright 2001 with permission from Elsevier.
11.2 Saccadic Decisions
The limitations of human vision force us to make saccadic eye movements to bring important details to the fovea, where these details can be resolved clearly and accurately. Research articles and book chapters about eye movements often refer to people busily making three saccades each second, every second (some have even counted up saccades/lifetime), but there is no basis for believing that such furious scanning takes place constantly. Interesting exceptions, however, are special tasks, such as reading or search, which require lots of saccades, particularly when performed under time pressure. Figures 11.1-11.3 show examples of scan patterns during three different tasks: picture memorization, reading, and visual search. If saccades are to accomplish anything useful, they must be driven by a rational decision strategy that takes the line of sight to the locations that are most likely to contain information needed to complete the task. Confidence in this belief is so great that there is a tradition, more than a century old (see Delabarre, 1897), of using observed patterns of eye movements to infer what aspects of a scene people find most important and useful, either when they are merely inspecting pictures (Yarbus, 1967) or when they are attempting to perform a more challenging visual task (Ballard et al., 1995; Epelboim and Suppes, 2001; O'Regan, 1990; Vishwanath and Kowler, 2003). Viviani (1990) offered an insightful critique of attempts to infer the sequence of cognitive events from sequences of saccades, pointing out, among other things, that cognitive processes involved in understanding a scene, or planning behavior, need not follow the piecemeal, one-at-a-time march that governs movements of the eye, since, in essence, thinking and planning are not constrained by the sequential placement of the fovea. I was interested in the assumption that eye movements are rational for several reasons. 
First, as an experimenter, I find popular, untested, and fragile assumptions about
Figure 11.2: An example of horizontal and vertical eye movements over time while reading text.
Figure 11.3: Eye movements during visual search. The task was to find the tilted letter T in one of the clusters and report its orientation.
eye movements irresistible. Second, as someone who has studied eye movements for more than 20 years, and logged more hours as a subject in Purkinje Image Tracker experiments than probably anyone, I knew that saccadic eye movements require work. In saccadic experiments I have to pay some attention to what I am doing; there is a sense that energy (granted, not much) is being expended in controlling the saccades. By comparison, smooth pursuit is easy (but that's another chapter). Most everyone who had been regarding eye movements as perfectly rational, and thus the key to seeing hidden mental events, had neglected the "work" aspect, that eye movements themselves, their planning, programming, and execution, consume cognitive resources that could have been devoted to other things. It is certainly possible, then, that some eye movement patterns might be preferred, not because they bring the line of sight to important places sooner, but because the planning and execution are in some way simpler.
11.3 Search and Optimal Search
At around the same time I was thinking that it was appropriate to act on these vague ideas, I had an interesting conversation with Misha Pavel, who was studying visual search. Pavel had obtained some data indicating that in search tasks mediated by arm movements, where a cursor had to be moved to different locations in order to reveal the contents, people did not always behave sensibly. When the cursor was allowed to move quickly, the locations in the display were searched in the order indicated by cues specifying the probability of finding the target. On the other hand, when cursor speed was slowed, the cues signaling probability levels tended to be ignored in favor of a strategy of searching nearby locations first. This situation is analogous to many real-world search problems, where it often seems more sensible to search nearby locations first, even if the likelihood of reward is small, to avoid excessive travel time to remote regions. In search problems of this sort the best strategy is the one that maximizes the probability of reaching the target while at the same time minimizing the costs incurred in reaching it (Stone, 1975). In the case of real-world search, the costs include things like travel time or the time required to retrieve information from locations once you arrive there. Thus, in many real-world situations the optimal search strategy requires taking the costs of travel into account, and giving preference to closer locations, even if the likelihood of finding the target at these locations is small. In Pavel's arm movement experiment, however, people did not behave optimally: they had more time available than they realized, and gave more weight than necessary to reducing the total distance traveled. Performance closer to optimal has been observed in an application of Stone's theory to search mediated by attention.
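The cost-sensitive logic described here can be made concrete in a few lines. The sketch below is ours, not Stone's full model: it assumes detection is certain once a location is inspected, that the target probabilities sum to 1, and uses hypothetical probabilities and travel costs; it simply scores every visiting order by the expected cost of finding the target.

```python
from itertools import permutations

def expected_cost(order, prob, travel):
    """Expected cumulative cost to find the target when locations are
    visited in the given order (detection certain upon inspection)."""
    total, cum = 0.0, 0.0
    for loc in order:
        cum += travel[loc]          # pay the cost of traveling to this location
        total += prob[loc] * cum    # target is found here with probability prob[loc]
    return total

# Hypothetical numbers: a nearby low-probability location versus a
# remote high-probability one.
prob = {"near": 0.2, "far": 0.8}
travel = {"near": 1.0, "far": 5.0}

best = min(permutations(prob), key=lambda o: expected_cost(o, prob, travel))
```

With these numbers, inspecting the cheap nearby location first has a lower expected cost than going straight to the likely one (5.0 vs. 5.2 cost units), which is exactly the "search nearby first" intuition in the text; make the travel costs equal and the probability ordering wins instead.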
Shaw and Shaw (1977) studied the allocation of attention within a briefly presented display, where observers had to identify a target letter that could appear at one of eight locations, equidistant from the line of sight. The probability that the letter would appear at any of the locations was the main variable of interest. Two different probability distributions were tested, and the subjects were told these distributions in advance. Shaw and Shaw made reasonable assumptions about the relationship between the amount of attention paid to a location and the probability of a correct detection (i.e., the performance resource function (Sperling and Dosher, 1986), or the "return function" in search; see Shaw and Shaw, 1977, for details). This allowed them to compute the performance that would be predicted if attention were distributed optimally among the eight locations. Note that the optimal distribution of attention would not be strictly proportional to probability because the performance resource function showed diminishing returns; that is, small increases of attention yield greater benefits when applied to poorly attended locations than to well-attended locations. For attentional allocation to be optimal, subjects would have to use either the prior history of performance errors or some other internal knowledge of the performance resource function to adjust attentional allocation to produce the best possible performance. Shaw and Shaw found that three of the four subjects allocated attention optimally, in ways consistent with both the probability cues and with the performance resource function.
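Why diminishing returns make the optimum non-proportional can be shown with a small allocation sketch. Everything here is illustrative: the concave return function f(a) = 1 - exp(-a), the function name, and the probability values are our assumptions, not Shaw and Shaw's measured performance resource function. The greedy rule of always giving the next small slice of attention to the location with the highest marginal payoff converges on the optimum for any concave return function.

```python
import math

def allocate_attention(probs, total=1.0, steps=2000):
    """Split a fixed attention budget across locations by repeatedly giving
    a small slice to the location with the highest marginal payoff,
    assuming the (illustrative) concave return function f(a) = 1 - exp(-a)."""
    alloc = [0.0] * len(probs)
    slice_ = total / steps
    for _ in range(steps):
        # Marginal payoff of one more slice at location i is ~ p_i * exp(-a_i):
        # it shrinks as a location accumulates attention (diminishing returns).
        i = max(range(len(probs)), key=lambda j: probs[j] * math.exp(-alloc[j]))
        alloc[i] += slice_
    return alloc

# Two hypothetical probability levels over eight equidistant locations:
alloc = allocate_attention([0.35, 0.35] + [0.05] * 6, total=2.0)
```

Because the marginal payoff falls as a location accumulates attention, well-attended locations stop being worth further investment before their full probability share is spent on them, so the resulting allocation is flatter than the probability distribution itself.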
Figure 10.1a Schematic map of Europe with an outer fringe to induce uniform color spreading over water bodies. Figure 10.1b Schematic map of Europe with an inner fringe to induce watercolor over land masses. Only in this map could subjects spontaneously recognize Europe.
Figure 10.2a Test stimulus for pitting watercolor spreading against proximity and parallelism in figure-ground organization. Figure 10.2b Test stimulus for pitting watercolor spreading against proximity and parallelism in figure-ground organization.
Figure 10.4 Test stimulus for pitting watercolor spreading against good continuation and Prägnanz in figure-ground organization.
Figure 10.6a Test stimuli for evaluating watercolor spreading in disambiguating grouping and figure-ground organization. Purple contour only.
Figure 10.6b Test stimuli for evaluating watercolor spreading in disambiguating grouping and figure-ground organization. Orange fringe added within the crosses.
Figure 10.6c Test stimuli for evaluating watercolor spreading in disambiguating grouping and figure-ground organization. Orange fringe added outside the crosses.
Figure 10.8 Test stimulus used to evaluate the effect of contour width ratio.
11.4 Saccades during Natural Visual Tasks
Are saccades used optimally during the performance of visual tasks? An answer to this question would require considering both the rationality of the eye movement pattern, as judged by how well the eye movements led to obtaining useful information, and the costs of planning and executing this pattern. The "costs" of planning saccades would presumably be reflected in measures such as the time required to plan an accurate saccade, or the extent to which planning the saccade occurs at the expense of concurrent visual or cognitive analysis. Previous approaches to describing and analyzing saccadic performance were principally concerned with the rationality issue, that is, with how well eye movements led to the acquisition of useful information, and not with the cost of planning or carrying out these eye movements. Some examples: Land, Mennie, and Rusted (1999) performed a painstaking analysis of eye movements in freely moving people performing an everyday life task (making tea). A custom-designed video tracker was used to record the visual scene, from the viewpoint of the head, along with the position of the eye with respect to the scene, in order to obtain a real-time record of which objects were being examined as the tea making proceeded. The analysis of the eye movement patterns showed that people used eye movements quite rationally in that most of the time they looked at the object they were manipulating. The key issue to be considered in evaluating the cost of this efficient oculomotor pattern is to understand how they were able to locate the relevant objects so quickly. Land et al. attributed the efficient localization to a coarse visual memory that encoded the general configuration of the scene, and a shorter-duration memory that held the location of recently fixated objects. Thus, visual memory presumably is what led saccades to important locations without much time or effort devoted to planning.
The familiarity with the visual environment avoided some of the saccadic planning costs that might be incurred when encountering a visual setting for the first time. The importance of visual memory in producing efficient shifts of gaze was investigated more directly by Epelboim et al. (1995), who used a revolving field sensor coil method to obtain high temporal and spatial resolution recordings of eye and head movements while subjects tapped a sequence of color-coded targets in a prespecified order. The targets were in random locations that stayed constant for a block of 10 trials. Epelboim et al. found that initial trials in each block were characterized by a time-consuming search of the display to find the next target to be tapped. After about five trials with targets in the same spatial configuration, the target locations were learned fairly well, as shown by the decrease in the number of erroneous fixations, and by the decrease in the duration of the pauses between saccades. Interestingly, the improvement with repeated presentations was much greater during the tapping task than during a task in which the requirement to tap was removed and subjects had only to look at each of the targets. Epelboim et al. obtained some insights about the reasons for these task differences by separating what they called "search" episodes from "sequence" episodes. Search episodes referred to the pauses before the gaze shifts made while in search of the next target, while sequence episodes occurred once a target location was learned. The number and duration of search episodes were the same across the tasks, while the duration of sequence episodes was always longer during the look-only task than during the tapping task. Epelboim et al. attributed the increased duration of
sequence episodes to the time required to retrieve information about target location, which was, apparently, longer during looking than tapping. One way of summarizing the studies described above is to say that eye movements can bring the line of sight to important objects with little delay, provided that the locations of the objects are sufficiently familiar. What happens when the environment is unfamiliar, and the opportunity to learn the contents is restricted? Melcher and Kowler (2001) addressed this question in a study of the buildup of memory for objects in a scene. The scenes contained about 12 different and easily distinguishable objects presented for durations up to 4 s (see figure 11.1). To study the buildup of visual memory for the objects, a subset of scenes were viewed several times, with the repetitions randomly inserted in the trial sequence. The main result was that memory for the objects, assessed by a verbal recall task, improved with repeated viewing (Melcher, 2001), but, surprisingly, the eye movements showed little evidence of being influenced by the accumulating visual memory. Saccadic targets were instead chosen randomly, without regard for whether a location had been examined before, with object eccentricity as the only factor biasing the selection (closer items were preferred). Thus, although memory for the objects contained in the scene was steadily accumulating, the memory was apparently not detailed enough, or not accessible enough, to guide the momentary choice of where to look. Accessing the gradually accumulating memory as scanning was in progress was either difficult or time-consuming, or both, a distraction from the main task of memorizing the list of objects in the scene. Thus, it was better to allow some useless fixations than to spend a lot of time deciding where might be the best place to look. 
These results suggest that the cost of aiming the line of sight to the most useful location is not trivial and that people may elect not to pay the price because either the time or the effort required to choose a location carefully could be a distraction from the main task at hand.
11.5 Saccades and Visual Search: An Investigation of the Costs of Planning a Rational Saccade
To better understand the nature of the costs, in terms of time and effort required to plan a rational pattern of saccades, Araujo, Kowler, and Pavel (2001) studied how saccades were used to perform a simple two-location search task. Displays consisted of two clusters of characters, located on either side of a central fixation point (figure 11.4). One cluster contained a tilted letter T surrounded by tilted L's, and the other contained exclusively L's. The task was to find and identify the orientation of the T. The size of the characters was small (about 10 min arc), and spacing was tight (center-to-center distance = 30 min arc), so that it was not possible to correctly identify the orientation of the T unless the cluster was fixated. Success in the task required that the first saccade be aimed at the cluster containing the T; the brief time the display was available (0.5 s) did not permit both locations to be examined. The key to making a rational decision about where to look was to take into account the visual cue we provided, namely, a fourfold difference in intensity between
Figure 11.4: Sequence of frames in a trial from the visual search experiment of Araujo et al. (2001). The target letter T is shown in the lefthand cluster in the second frame. In the experiment the orientation of the T was selected randomly from eight possible values. Reprinted from Araujo, C., Kowler, E., and Pavel, M. (2001) Eye movements during visual search: The costs of choosing the optimal path. Vis. Res., 41: 3613-3625. Copyright 2001 with permission from Elsevier.

the clusters. On some days the brighter cluster had an 80% probability of containing the T; on other days the dimmer cluster was associated with the 80% probability. The distance between each cluster and the fixation point was also varied, with one cluster located between 1° and 5° from fixation, and the other cluster located 5° from the first. The only rational strategy was to use the intensity cue to aim the saccade at the high-probability cluster, but only one of the six subjects followed this strategy consistently. Figure 11.5 shows the proportion of first saccades aimed to the righthand cluster as a function of the eccentricity of this cluster. Data are shown separately for trials in which the T was likely to be on the right or on the left. Subject FF, the most rational of the group, directed saccades according to the probability cue, looking at the righthand location when the cue indicated that the T was likely to be on the right, and to the lefthand location when the cue indicated that the T was likely to be on the left. Distance of the clusters from fixation played little role (although even FF was not perfect, as shown by a few saccades directed to the low-probability righthand cluster when it was at a close eccentricity). Subjects BB and CC neglected the probability cue entirely and based the direction of the first saccade only on the distance of a cluster from the line of sight, looking preferentially at the nearer of the two clusters.
AA, DD, and EE were perhaps the most interesting because they took the probability cue into account part of the time. Their saccades were based on a mixture of both the cue and eccentricity. The question raised by these data is not whether people can use probabilistic information sensibly, but what made it so difficult to do so. Even people who showed they
Figure 11.5: Proportion of rightward saccades as a function of the eccentricity of the righthand cluster for all six subjects (AA-FF) for trials in which the target was likely (probability = 0.8) to be in the lefthand or righthand clusters. Reprinted from Araujo, C., Kowler, E., and Pavel, M. (2001) Eye movements during visual search: The costs of choosing the optimal path. Vis. Res., 41: 3613-3625. Copyright 2001 with permission from Elsevier.
understood the significance of the cue (i.e., AA, DD, EE, and FF) did not do so all the time. What discouraged them from performing better? Additional analyses revealed at least some of the reasons for the irrational performance patterns. Figure 11.6 shows the latency of the first saccade as a function of eccentricity for saccades directed to both high- and low-probability clusters. The latencies were 20-50 ms longer for saccades to the high-probability cluster, showing the cost of taking the cue into account. The cost in terms of time was clearly small; there was ample time to allow an extra 50 ms to analyze the simple probability cue. These data, however, suggest that such options were often rejected, perhaps because taking the extra step to evaluate extrafoveal information was "expensive," in terms of either the time or the cognitive effort involved. We may have been expecting people to do something that they rarely do in natural viewing. An option that was popular, however, was to attempt a second saccade. This was not too wise because the trials were too short to permit both clusters to be examined. Nevertheless, there was a consistent tendency to make a second saccade to the high-probability cluster in trials where the first saccade was aimed at the low-probability cluster. These second saccades occurred even when the target letter T was contained in the first cluster examined (figure 11.7), suggesting that the pair of saccades was programmed concurrently (Zingale and Kowler, 1987; McPeek et al., 2000) without regard for the information available during the intersaccadic pause. The sequence of the two saccades was performed so rapidly (often remarkably rapidly; figure 11.8) that the reports about the orientation of the T were almost always incorrect.
Thus, given the option to slow down and deliberately choose a sensible saccadic strategy, subjects showed a clear preference to be oculomotor athletes and try to hit as many clusters as they could in the brief time allowed.¹ In a different visual search task, Hooge and Erkelens (1999) also reported preferences to neglect important cues signaling the location of a target. They had subjects search for a thin circle in an array of randomly intermixed thin and fat C's. Instead of slowing down the rate of saccades long enough to avoid looking at the fat C's, which would never be targets by virtue of their size, the subjects preferred to keep scanning rate constant and put up with a fair number of useless saccades to these obvious nontargets. Hooge and Erkelens (1999) did not determine how much saccade rate would have to be slowed to improve selectivity, and it is possible that, unlike the simple two-location task of Araujo et al. (2001), the intersaccadic pause durations that would have been needed to improve the selection of saccadic goals might have been prohibitively long. It is just such an issue that is prompting us to re-examine search, in a more complex six-location task, again using visual cues to designate high- and low-probability locations (figure 11.4) and varying the time available to complete the search. We want to find out whether the choice of where to aim the saccade can be adjusted on the basis of realistic estimates of the number of locations that can be scanned in the allotted time, or whether, just as in the two-location search task described above, there is a reluctance to use extrafoveal information to make an informed choice about where to look.

¹We surely could have motivated subjects to act more sensibly, but this was not the issue. At this stage of the research we did not want to mask the question we were most interested in, namely, what choice is made when a costly, but rational, saccadic strategy is pitted against a low-cost irrational one.
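How much accuracy was left on the table by ignoring the cue can be estimated with a back-of-the-envelope calculation. The sketch below is ours, not from Araujo et al. (2001): it assumes identification is perfect at the fixated cluster and at chance (1 of 8 orientations) otherwise, and that only the first saccade matters within the 0.5 s display.

```python
def expected_correct(p_follow_cue, p_cue=0.8, chance=1 / 8):
    """Expected proportion of correct orientation reports when only one
    cluster can be inspected and the cue marks the target with probability
    p_cue. Assumes perfect identification at the fixated cluster and
    chance guessing otherwise (simplifying assumptions, not measured values)."""
    # Probability the single inspected cluster actually contains the T:
    p_target_fixated = p_follow_cue * p_cue + (1 - p_follow_cue) * (1 - p_cue)
    return p_target_fixated + (1 - p_target_fixated) * chance

cue_follower = expected_correct(1.0)   # always aim at the cued cluster
cue_ignorer = expected_correct(0.5)    # e.g., always pick the nearer cluster,
                                       # which is on the cued side half the time
```

Under these assumptions a strict cue-follower lands on the target cluster 80% of the time (expected accuracy about 0.83), while an eccentricity-based chooser lands on it only half the time (about 0.56), which is roughly the gap separating subject FF from subjects BB and CC.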
178
The Basis of a Saccadic Decision
Figure 11.6: Mean latency of the first saccade to high- and low-probability locations as a function of eccentricity for all six subjects (AA-FF). Vertical bars represent ± 1 SD, shown only for saccades to the low-probability location. SDs to high-probability locations were similar. Reprinted from Araujo, C., Kowler, E., and Pavel, M. (2001) Eye movements during visual search: The costs of choosing the optimal path. Vis. Res., 41: 3613-3625. Copyright 2001 with permission from Elsevier.

There are interesting differences between the characteristics of the saccadic patterns used in the search experiments described above and those of the search mediated by attention, as studied by Shaw and Shaw (1977). In treatments of search that do not involve eye movements, a major concern is that suboptimal performance will result from allocating too much time, effort, or attention to a given location, something that is expected if people fail to take into account the "diminishing returns" that characterize the effect of adding resources to performance. But the saccadic search patterns were suboptimal for
Eileen Kowler
179
Figure 11.7: Top: Proportion of trials with an attempted search of the second location when the first saccade was directed to the cluster that was likely (probability = 0.8, open bars) and not likely (probability = 0.2, filled bars) to contain the target, given the target was not at the first location searched, for all six subjects (AA-FF). Bottom: Same, except the target was present at the first location searched. Reprinted from Araujo, C., Kowler, E., and Pavel, M. (2001) Eye movements during visual search: The costs of choosing the optimal path. Vis. Res., 41: 3613-3625. Copyright 2001 with permission from Elsevier.
a different reason, namely, too little time was spent analyzing the available information. The reason too little time was spent is not necessarily a failure to realize that longer pauses would have produced improvements, but instead is more likely to be traceable to a property of saccadic programming mechanisms that discourages excessive analysis of extrafoveal information. If there is a property of the saccadic system that places a
Figure 11.8: Examples of very brief intersaccadic intervals during two-location searches. The trial starts at time = 0. The critical frame containing the target appeared 0.5 s after trial onset and was displayed until 1 s after trial onset. Reprinted from Araujo, C., Kowler, E., and Pavel, M. (2001). Eye movements during visual search: The costs of choosing the optimal path. Vis. Res., 41: 3613-3625. Copyright 2001 with permission from Elsevier.

constraint on extrafoveal visual analysis, it is likely to be found in the link between saccades and visual attention.
11.6
The Role of Attention in the Programming of Saccades
Most visual scenes are crowded with objects and details; thus, for saccades to land with reasonable accuracy at a chosen location it is necessary for some front-end selective filter to define the visual signals that constitute the effective target for saccades. By requiring people to report some characteristic of visual targets located at different places in the scene while the programming of a saccade is in progress, it is possible to evaluate the distribution of attention during presaccadic periods. Studies using such methods showed enhanced perceptual performance at the goal of a saccade and poorer
performance elsewhere, suggesting that a single attentional filter selects the target for both saccades and perception (Kowler et al., 1995; Hoffman and Subramaniam, 1995; Deubel and Schneider, 1996). Gersch, Kowler, and Dosher (2004) recently undertook similar experiments in a more natural situation, namely, with a sequence of saccades, rather than the single, isolated saccades studied in the past. Figure 11.9 shows the task they used. Displays contained six outline squares that had to be scanned repetitively. Subjects were asked to look at every other square, in a triangular path, continuing around the display until the trial was over. Before each trial, a cue (small black cross) inside one of the squares indicated where to begin scanning at the start of the trial, while another cue (filled white square) indicated where the critical perceptual target would appear. As shown in the figure, the perceptual target was a sequence of seven frames of a Gabor patch interleaved with visual noise that was presented during a randomly chosen intersaccadic pause (figure 11.10). The task was to report the orientation (leftward or rightward tilt) of the Gabor. Contrast thresholds for this judgment have been shown to be sensitive to attentional strength (Dosher and Lu, 2000; Lu and Dosher, 1998). The important new aspect of Gersch et al.'s (2004) experiment was that thresholds were determined while the sequence of saccades was in progress. The location of the Gabor was cued in advance, which would, in principle, allow subjects to attend to it. But, given that the primary task was to maintain the saccadic scanning at a steady, brisk pace, would any attention be available for analysis of the Gabor? The answer was: not much. Figure 11.11 shows visual performance during the intersaccadic pauses for Gabors presented at each of the six outline squares.
The numbers inside each square are the ratios of thresholds obtained during intersaccadic pauses relative to thresholds obtained in control trials in which the eye remained fixated at one of the six boxes. The top square in each figure represents the location of fixation at the moment the Gabor appeared.2 The arrows in the figure show the path taken by the saccades. Results are shown separately for the two subjects, and for cases where the Gabor was presented early or late (see figure 11.10) in the intersaccadic pause. A number of characteristics can be noticed. First, performance improved over time during the intersaccadic pause, suggesting that traditional saccadic suppression (Burr et al., 1994) resulted in higher thresholds early in the intersaccadic pause. Second, even late in the intersaccadic pause, thresholds were in general elevated relative to fixation (i.e., most ratios > 1). And third, performance was best, almost as good as during steady fixation, at only two locations: the location currently fixated and the target of the very next saccade. Non-target locations did not do well at all. Particularly poor was the location that had to be skipped en route to the very next saccadic goal, showing the inhibition expected for regions surrounding attended locations (Bahcall and Kowler, 1999; Mounts, 2000; Cutzu and Tsotsos, 2003). Thresholds were also elevated for the target of the second saccade in the sequence, showing that attentional enhancement was confined to the goal of the upcoming saccade and did not spread to the goal of a subsequent saccade. This strategy of limiting attention to the saccadic goal is clearly not one that will facilitate detection of the Gabor, but it is a strategy that may be crucial for maintaining the sequence of saccades without disruption. This idea needs further tests. For example, instructions or other inducements should be used in an attempt to alter the distribution of attention because it might be that some attention can be lured away from the saccadic goal without too much disruption, as was the case for single saccades (Kowler et al., 1995). It is also important to examine other display configurations, particularly those that do not require repetitive scanning. However, given these caveats, the present set of results suggests that extrafoveal attention is important for only one thing: defining the goal of the next saccade so that the saccade is assured of landing in the right place.

2 In the actual experiment the position of fixation when the Gabor appeared could have been any box; data were pooled across the six different fixation positions to obtain the results shown in the figure.

Figure 11.9: The experiment of Gersch et al. (2004). The first frame shown is the cue frame indicating starting fixation position (black cross, inside one of the boxes) and location of the Gabor (small white square). Saccades to every other box began 100 ms after trial start. The Gabor+noise frames were presented during a randomly selected intersaccadic pause. The Gabor appeared superimposed on noise during the trial. The starting fixation position and Gabor location varied randomly on each trial. Reprinted from Gersch, T., Kowler, E. and Dosher, B. (2004) "Dynamic allocation of visual attention during the execution of sequences of saccades", Vision Res., 44: 1469-1483. Copyright 2004, with permission from Elsevier.
Figure 11.10: Sample eye traces (top: horizontal; bottom: vertical) during a trial. The series of Gabor/noise frames appeared either right after a saccade (Early) or after a 150 ms delay following saccade offset (Late). The middle trace is the event marker recording the start of the trial (at time = 0), the signal to begin scanning, and finally, the appearance of the frames containing the Gabor. Reprinted from Gersch, T., Kowler, E. and Dosher, B. (2004) "Dynamic allocation of visual attention during the execution of sequences of saccades", Vision Res., 44: 1469-1483. Copyright 2004 with permission from Elsevier.
11.7
Saccadic Decisions, Search, and Attention
The tight linkage between attention shifts and saccades suggests that, as long as a sequence of saccades is in progress, there is little attention available for surveying the visual scene and finding useful places to look. Instead, saccadic decisions have to be based on less demanding strategies, including reliance on well-entrenched or easily accessible memory. Other possible strategies could entail looking at objects that are large, bright, or in some way easily discerned with little attentional load. This is precisely what people did in the various saccadic tasks reviewed in this chapter: they eschewed careful selection of targets unless the target was easy to find. Such strategies could lead to preferential fixation of objects that are highly visible against backgrounds, or located close to the line of sight. Are these good strategies for guiding saccades? Not necessarily. On many grounds these strategies are terrible. They will lead to fixation of useless locations, they will cost time, and they will require frequent corrective saccades. But perhaps viewed in the context of the performance of the task as a whole, rather than exclusively in the context of the programming of an individual saccade, the less selective strategies have some virtues. The most economical option for the control of saccadic scanning may be to place minimal constraints on the choice of each saccadic target, and place the restrictions instead on visual memory so that only the most useful information is stored and the rest vanishes as soon as the line of sight moves on.
Figure 11.11: The ratio of contrast thresholds for discriminating the orientation of the Gabor obtained during intersaccadic pauses relative to thresholds obtained during steady fixation for six different locations. The uppermost square represents the location of current fixation, and the arrows show the directions of the next three saccades. In the experiment fixation could have been at any of the six locations (data were pooled over location). The location of the Gabor was cued in advance of each trial. Reprinted from Gersch, T., Kowler, E. and Dosher, B. (2004) "Dynamic allocation of visual attention during the execution of sequences of saccades", Vision Res., 44: 1469-1483. Copyright 2004 with permission from Elsevier.
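The ratio plotted in figure 11.11 is simple to compute; a minimal sketch, with hypothetical threshold values standing in for the measured ones:

```python
# The summary measure of figure 11.11, sketched in code. The threshold
# values below are hypothetical, chosen only to illustrate the pattern
# of results; they are not Gersch et al.'s data.

def threshold_ratio(pause_threshold, fixation_threshold):
    """Contrast threshold during an intersaccadic pause divided by the
    threshold during steady fixation. Ratios > 1 mean poorer
    sensitivity while the saccade sequence is in progress."""
    return pause_threshold / fixation_threshold

# Hypothetical thresholds (arbitrary contrast units; fixation = 0.10):
currently_fixated = threshold_ratio(0.11, 0.10)  # near 1: little cost
next_saccade_goal = threshold_ratio(0.12, 0.10)  # also near fixation
skipped_location = threshold_ratio(0.25, 0.10)   # inhibited: large cost

print(round(skipped_location, 1))  # → 2.5
```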
11.8
Final Comments
I attempted in this chapter to present a case for taking into account the costs of saccadic programming when evaluating the overall rationality of a particular saccadic plan. Doing so may be important both for understanding and, ultimately, for modeling the systems that coordinate saccadic target selection with other ongoing activity. At the very least, it is certainly important to take saccadic costs into account if the intention is to use eye movements to infer sequences of processing steps because some saccades may be crucial to the task, while others may be nothing more than frivolous glances. Discriminating these two types of saccades will be difficult without an appreciation for the rules and constraints that govern the generation of saccadic plans.
Acknowledgments
Preparation of this chapter as well as the research described (Araujo et al., 2001; Gersch et al., 2004) was supported by a grant from the Air Force Office of Scientific Research, F-49620-02-1-0112. I thank the following people for their contributions to this work as well as for many discussions that contributed to the ideas expressed here: Tim Gersch, Brian Schnitzer, David Melcher, Dhanraj Vishwanath, Chris Araujo, Julie Epelboim, Misha Pavel, Barbara Dosher, Jacob Feldman, Manish Singh, Thomas Papathomas, Doug DeCarlo, Robert Steinman, Marian Regan, and Martin Regan.
References
Araujo, C., Kowler, E. and Pavel, M. (2001). Eye movements during visual search: The costs of choosing the optimal path. Vis. Res., 41: 3613-3625.
Bahcall, D. O. and Kowler, E. (1999). Attentional interference at small spatial separations. Vis. Res., 39: 71-86.
Ballard, D. H., Hayhoe, M. M. and Pelz, J. (1995). Memory representations in natural tasks. J. Cogn. Neurosci., 7: 66-80.
Burr, D. C., Morrone, M. C. and Ross, J. (1994). Selective suppression of the magnocellular visual pathway during saccadic eye movements. Nature, 371: 511-513.
Cutzu, F. and Tsotsos, J. K. (2003). The selective tuning model of attention: Psychophysical evidence for a suppressive annulus around an attended item. Vis. Res., 43: 205-219.
Delabarre, E. B. (1897). A method of recording eye movements. Am. J. Psychol., 9: 572-574.
Deubel, H. and Schneider, W. X. (1996). Saccade target selection and object recognition: evidence for a common attentional mechanism. Vis. Res., 36: 1827-1837.
Dosher, B. and Lu, Z.-L. (2000). Mechanisms of perceptual attention in multi-location cueing. Vis. Res., 40: 1269-1292.
Epelboim, J., Steinman, R. M., Kowler, E., Edwards, M., Pizlo, Z., Erkelens, C. J. and Collewijn, H. (1995). The function of visual search and memory in sequential looking tasks. Vis. Res., 35: 3401-3422.
Epelboim, J. and Suppes, P. (2001). A model of cognitive processes during eye movements in geometry. Vis. Res., 41: 1561-1574.
Gersch, T., Kowler, E. and Dosher, B. (2004). Dynamic allocation of visual attention during the execution of sequences of saccades. Vis. Res., 44: 1469-1483.
Hoffman, J. E. and Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Percept. Psychophys., 57: 787-795.
Hooge, I. T. C. and Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vis. Res., 39: 1567-1575.
Kowler, E., Anderson, E., Dosher, B. and Blaser, E. (1995). The role of attention in the programming of saccades. Vis. Res., 35: 1897-1916.
Land, M., Mennie, N. and Rusted, J. (1999). The roles of vision and eye movements in the control of activities of daily living. Percept., 28: 1311-1328.
Lu, Z.-L. and Dosher, B. (1998). External noise distinguishes attention mechanisms. Vis. Res., 38: 1172-1198.
McPeek, R. M., Nakayama, K. and Skavenski, A. A. (2000). Concurrent processing of saccades in visual search. Vis. Res., 40: 2499-2516.
Melcher, D. (2001). The persistence of memory for scenes. Nature, 412: 401.
Melcher, D. and Kowler, E. (2001). Visual scene memory and the guidance of saccadic eye movements. Vis. Res., 41: 3597-3611.
Mounts, J. R. W. (2000). Attentional capture by abrupt onsets and feature singletons produces inhibitory surrounds. Percept. Psychophys., 62: 1485-1493.
O'Regan, J. K. (1990). Eye movements and reading. In E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes, pp. 395-447. Amsterdam: Elsevier.
Shaw, M. L. and Shaw, P. (1977). Optimal allocation of cognitive resources to spatial locations. J. Exp. Psych., 3: 201-211.
Sperling, G. and Dosher, B. A. (1986). Strategy and organization in human information processing. In K. R. Boff, L. Kaufman, and J. P. Thomas (Eds.), Handbook of Perception and Human Performance, Vol. I: Sensory Processes and Perception (Chap. 2). New York: Wiley.
Stone, L. (1975). Theory of Optimal Search. New York: Academic Press.
Vishwanath, D. and Kowler, E. (2003). Localization of shapes: Eye movements and perception compared. Vis. Res., 43: 1637-1653.
Viviani, P. (1990). Eye movements in visual search: Cognitive, perceptual and motor aspects. In E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes, pp. 353-393. Amsterdam: Elsevier.
Yarbus, A. (1967). Eye Movements and Vision. New York: Plenum Press.
Zingale, C. M. and Kowler, E. (1987). Programming sequences of saccades. Vis. Res., 27: 1327-1341.
12. Handling Real Forms in Real Life
R. M. Steinman, W. Menezes, and A. N. Herst
Abstract
It has become clear, during the 24 years that have elapsed since it first became possible to measure human gaze control accurately under relatively natural conditions, that allowing a human being to perform ecologically significant tasks under ecologically valid conditions yields a view of human gaze control that differs appreciably from the view gained from experiments performed under the restrictive conditions used in traditional, analytical, oculomotor experiments. This chapter with its accompanying PowerPoint presentation and proprietary eye movement visualization1 will review and demonstrate some of the more surprising observations. Its goal will be to alert the audience to the fact that very little of what we know about human gaze control, which is based mainly on studies with heads immobilized and pointless "looking only for the sake of looking" tasks, has relevance for the way human beings actually control gaze when they perform real tasks in realistic environments.
Introduction
This chapter is unusual. It cannot stand alone. It was written to explain, amplify, and provide quantitative support for claims contained in the Microsoft PowerPoint presentation that was shown at the conference held to honor Prof. David Martin Regan. This chapter cannot make sense if it is read without running the PowerPoint application (steinman.ppt on the associated CD), and our proprietary eye movement visualization application (starteyemove.html on the associated CD) at the same time the chapter is read. If the reader does not have the CD that was provided with this book, the material can be downloaded from our website (http://brissweb.umd.edu). Getting anything

1 See the CD associated with this volume.
out of this chapter requires alternating among this text, the PowerPoint show, and the eye movement visualization. Instructions for doing one or another of these things at an appropriate time have been provided in the PowerPoint as well as throughout this chapter. If the reader does not know how to start and run such applications, get help. If the reader knows how, but prefers to explore this material without following the directions in the text and PowerPoint, good luck. The reader should, before reading any further, start the eye movement visualization (applet.htm) and the PowerPoint presentation (steinman.ppt). The PowerPoint show itself should be started when you are told to start it at the end of this introductory material. You will be told when to go to the eye movement visualization while you are watching the PowerPoint presentation. This chapter, like the PowerPoint presentation used in Toronto, was prepared primarily for visual scientists with a wide range of specialties but in most cases with little expertise in our area, specifically, human eye movement a.k.a "human oculomotor control." This puts us in the enviable position of being able to be much more frank when we present our research results and much more outspoken, when we call attention to their wide-sweeping implications, than would be possible if we were addressing an audience composed primarily of expert oculomotorists. Such an audience of oculomotorists would require a very different treatment, because much of what we have found out since 1992, when it first became possible to make accurate observations of visually guided manipulations, goes counter to the expectations of oculomotor experts. As such, some, actually many, of our findings and conclusions can be threatening to those who make a living, that is, get grant support, by pursuing research on human oculomotor control. 
We hope that you do not share this material with them because the last thing we would want to do is to threaten expert oculomotorists, our closest colleagues. If you do, you may discover that they simply will not believe what they can see with their own eyes. Such reluctance, at this time, is reasonable because our eye/head movement recording instrument is unique. The kinds of measurements we have been publishing since 1995 can only be made in our lab, so it is not surprising that our findings are, and will be, viewed as "controversial" by many expert oculomotorists, particularly those who have not worked with us up to this point. It is universally accepted, one may even say axiomatic, in "science" that the acceptance of controversial results requires independent replication in independent laboratories. Replication by independent researchers can be done in our laboratory, but not independently elsewhere, simply because our recording instrument is unique. Note, however, that a number of oculomotorists (e.g., Collewijn, Erkelens, and Kowler), widely considered to be "experts," have collaborated with us on some of the naturalistic research described in Toronto, so we think that it might be worthwhile to take our "controversial" findings seriously, at least provisionally. As you surely suspect, any major revisions in the way we think about natural oculomotor control can have important implications for visual perception, too: (i) because of the exceedingly heterogeneous anatomical and functional properties of the human retina, and (ii) because any revisions in the way we think about the collaboration of the two eyes under natural conditions will have important implications for the singleness of binocular vision, as well as depth perception. There is no controversy about the fact that "normal vision" requires moving the eye and that these eye movements are universally accepted as having visual consequences, so any major changes in what
we consider to be the "normal" eye movement pattern can raise fundamental questions about the mechanisms underlying basic visual as well as high-level perceptual processes. The findings we have been publishing since 1995 (Epelboim et al., 1995, 1997; Epelboim, 1998; Malinov et al., 2000; Herst et al., 2001; Steinman et al., 2003) do just this, as you will soon see. Find and open the PowerPoint presentation steinman.ppt and start the slide show now. Let it run until you are told to return to this text.
1
The "Scientific Method"
In the best of all scientific methods in the best of all possible worlds, all variables save one are kept constant and the effects of manipulating the chosen variable are observed. This, and only this, allows one to provide unambiguous, compelling evidence to support a claim that some variable causes some outcome in the sense that its action is both a necessary and sufficient condition for the event to occur. This is all well and good, providing the operation of the variable chosen, when operating alone, is truly representative of what it does and how it does it when it operates as an element within a machine designed (by evolution, in this discussion) to perform complex, biologically useful tasks under natural conditions, that is, under the conditions present throughout the machine's development. So, clearly, the study of human oculomotor control should, like the study of all useful behaviors, begin, ideally, by observing human oculomotor control under natural conditions. Once this has been done it becomes possible to break the machine down into meaningful functional units, each of which represents what we will call an "ecologically functional variable," a variable which cannot be broken down further without distorting the actions of the machine as it is intended to operate. Both the eye and the head are naturally used together as a 3D mobile sensory platform that is used to orient vision, hearing, taste, and smell receptors with respect to a wide variety of stimulating events within the environment. If one approaches the study of human oculomotor control from this perspective, the last thing that one would want to do would be to immobilize the head to study the way the eyes move.
Common sense alone is sufficient for realizing this, but the potential dangers of ignoring it did not enter into the decisions of the oculomotor pioneers who began studying human eye movements objectively at the end of the nineteenth century (see Steinman (2003) for a description of the problems that inspired the first successful attempts to record human eye movements, and Steinman (1986a) to find out why the head had to be immobilized during the first 80 years of oculomotor research). A great deal has been learned about human eye movements since the first recordings were made in 1898 under highly unnatural conditions, but how much of this knowledge actually applies to why and how humans use their eye movements as they perform useful tasks in the real world remains to be determined. At this point, it seems unlikely that a great deal of this existing knowledge can be used to explain human oculomotor control in everyday life. Continue with the PowerPoint
2
Technical advances that led to big surprises
Collewijn's (1977) paper on the oculomotor performance of the "freely moving" rabbit introduced two major advances
in eye movement recording instrumentation in addition to producing a major revision in our understanding of this animal's oculomotor characteristics. Collewijn's advances in instrumentation will be described first. He introduced the "cube-surface field coil" arrangement for generating a magnetic field that is homogeneous throughout a relatively large region at its center. Having a homogeneous magnetic field allows the subject to move about near the center of the magnetic field without confounding head translation with eye rotation signals. He also introduced the "phase-detection" method in which the magnetic field rotates about the subject, whose eyes and head are fitted with sensor coils. The phase-detection method, unlike the "amplitude detection" method (introduced by Robinson in 1963), outputs eye angles relative to the earth-fixed coordinates established by the revolving magnetic field. This output is linear and capable of absolute calibration. These features made it possible to make major improvements in the dynamic range and accuracy of eye/head recording instruments. The year 1975 was particularly propitious for the study of human eye movements because in this year Collewijn et al. (1975) introduced the "silicone annulus sensor coil" (now manufactured by SKALAR-Delft) that made it easy to fit human eyes with sensor coils. These ingenious coils made it possible to measure human eye movements accurately over the entire range of ocular motility. Collewijn, using his novel recording instrument, showed that the rabbit made saccades frequently, but only made them when it moved its head. It never made saccades when the head was immobilized. The head had been immobilized in all prior work, so the importance of saccades in the rabbit's oculomotor repertoire was not appreciated until its eye movements had been recorded under natural conditions.
Furthermore, the velocity-amplitude characteristics of these saccades were like the human's, which was surprising because cats, whose retinas are much more like ours than the rabbit's, were only capable of making relatively slow saccades. Note that the cat's slow saccades were measured with its head immobilized, hardly the arrangements common during the feline's, our, or the rabbit's evolutionary history. Even the mantles of clams, which carry their light-sensing systems, move around when their shells are not hopping about. Continue with the PowerPoint
3
Confounding translations with rotations
The accurate study of human eye movements under relatively natural conditions got off to a bumpy start. It began when Steinman (1976) gave a presentation in which he tried to answer a question posed by the organizers of a symposium concerned with the significance of human eye movements in the real world. This symposium, organized by the Committee on Vision of the National Research Council, was held in Princeton, New Jersey, in April 1974. The talk was entitled "The role of eye movements in maintaining a phenomenally clear and stable visual world." Steinman began his presentation by pointing out that he was not sure that the role of eye movements in maintaining a phenomenally clear world "is known to God (Jones, 1966), perhaps only to Leon Festinger (Marquis, 1972)" (Steinman, 1976, p. 121). He said this because no accurate research on human oculomotor control under natural conditions was available at that time. After talking about what he and his students had been doing, he described the first attempt to do this kind of
Figure 12.1: David A. Robinson sitting in his 2-foot-diameter magnetic field coils with his head supported "naturally" (from Steinman, 1976). The sensor coil used to record the orientation of his head is mounted at the front of a biting board clenched between his teeth. natural research by using Robinson's magnetic recording instrument to record head movements of subjects trying to sit as still as possible under natural conditions. The rationale for doing this was that knowing about irreducible head movements, i.e., head movements observed when a biting board or chin rest was not used, would at least let us find out what the oculomotor system had to compensate for outside of the laboratory. Figure 12.1 shows one of the subjects trying to sit as still as possible without any unnatural support for the head. The subject was David Robinson. He is shown sitting in the "magnetic-field search-coil" apparatus he invented and used mainly to study the eye movements of cats and monkeys. The coil, sensing 2D motions of his head, was mounted at the front of the unattached bite board that can be seen sticking out of his mouth. He is using his hands and arms as natural support to help him keep his head as still as possible. There is a problem inherent in trying to do experiments like this in this apparatus that was overlooked by its inventor as well as by everyone else involved. What it was and how it was fixed will be explained in the next few slides. Continue with the PowerPoint
4
Getting around the confound
The problem, of course, was the fact that the small (2-ft-diameter) Helmholtz field coil arrangement Robinson had been using was homogeneous only in a minuscule region near its center. A sensor coil moving in and out of
this region will confound an angular change of the coil with the coil's position within the magnetic fields. This became important when Skavenski and Steinman tried to make sense of eye and head recordings that had been made in Boston with Skavenski's small Helmholtz coil arrangement. The fact that we had a problem with this can be seen in the text of a talk at a Wenner-Gren Symposium held in Stockholm in 1969 in which we said that "we are not yet prepared to say whether all eye movements other than voluntary saccades are made in response to rotations and translations of the head" (Steinman, 1975, p. 409). The analyses made up until that point made it clear that something was very wrong in Denmark, so Stockholm was probably the last place one would want to say too much about our recordings. The fix was simply to make much larger fields, locate and map the homogeneous region near their center, and collect our head and eye data while the subject attempted to maintain fixation on a distant target while sitting or standing as still as possible. We published what we believed (and still believe) are "accurate" findings in Skavenski et al. (1979), the source of the PowerPoint slide now being shown. Prior, complete descriptions of how these things came to pass and their significance can be found in Steinman (1986a, 1995, 2003). Large Helmholtz coil arrangements have come into use during the last decade or so, particularly in the oculomotor clinic. One can but wonder how many people who are using these large Helmholtz field coils are paying attention to where they are locating the subject's head with respect to the relatively small homogeneous region available (its location can vary depending on the location of objects like steel door frames and air conditioning ducts in the environment).
One can also but wonder (given how little reading is done these days) whether many oculomotor researchers realize that allowing the head to move within magnetic fields produced by simple pairs of Helmholtz field coils, even coils 6 ft on a side, will confound head translations with eye and head rotations, unlike cube-surface multi-field-coil arrangements. This confound may not be enormous, but its presence should at least be considered when reporting results obtained while a subject was allowed to move her head within magnetic fields generated by pairs of Helmholtz coils.

Continue with the PowerPoint
5 Really free at last

This slide shows horizontal head movements (H, divided by 10), right eye movements (RE), left eye movements (LE), and vergence (LE - RE) of four subjects (RS, HS, LK, and HC) who were fixating the control tower of the Rotterdam airport as they oscillated the head horizontally, i.e., about its vertical axis. They started with large, slow oscillations (on the left) and reduced their size as they sped up (on the right). The three records for each subject are plotted separately for technical reasons; they actually show a single "trial," during which the subject oscillated the head continuously, from large slow to small fast oscillations, without interruption. The head and eye traces show movements with respect to the earth-fixed framework established by the rotating magnetic field. If the oculomotor system compensated perfectly for the head's oscillation, the eye traces would be horizontal straight lines. If the compensation were the same in both eyes, the vergence traces would also be horizontal straight lines. If an eye trace moves in the same direction as the head, the oculomotor system is undercompensating.
If it moves in the opposite direction, it is overcompensating. Displacements of the vergence trace were not expected because the fixation target was very far away (3,000 m). Only two of the subjects, RS and LK, showed anything resembling perfect compensation, and they showed it only for a very brief period in only one eye. The red arrows point to these unusual periods of "nearly perfect compensation." Clearly, there was little suggestion of nearly perfect compensation of the actively produced head movements by any of the subjects. These results were not anticipated. They did not jibe with the view of oculomotor compensation popular at the time. For example, it was generally accepted that "the function of the vestibulo-ocular reflex (VOR) is to maintain a stable retinal image during head rotations by generating appropriate [our italics] compensatory eye movements" (Miles and Eighmy, 1980). There were even claims of near-perfect compensation, e.g., "the remarkable fact emerges that the reflex produces virtually perfect compensation" (Wilson and Melvill Jones, 1979). The oculomotor system was, in fact, better than most (but not all) eye movement recording systems of the era; clearly, however, it was far from perfect. Much of the reported "oculomotor perfection" was the result of inadequate instrumentation. The oculomotor "slop" simply could not be seen with the electro-oculographic (EOG) method popular at the time for attempting to study "natural" head and eye movements. Once it became possible to make accurate measurements of compensation with suitable instrumentation (i.e., by phase-detecting signals produced by sensor coils in a homogeneous, rotating magnetic field), it became clear that there was considerable retinal image motion within each eye, and that the type of motion differed between the eyes. In short, the clarity of vision in the presence of natural head movements could not be produced by the "virtual perfection of oculomotor compensation."
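The under/overcompensation reading of the traces reduces to a gain computation. The sketch below is illustrative only: the sinusoidal head motion and the variable names are our assumptions, not the analysis used in the studies cited, but the sign conventions match the description above (eye trace moving with the head means gain below 1, opposite means above 1).

```python
import numpy as np

def compensation_gain(head_deg, eye_in_space_deg):
    """Fraction of the head's rotation cancelled by the eye's counter-rotation.

    Both traces are positions in the earth-fixed frame. Perfect compensation
    leaves the eye-in-space trace flat (gain = 1); an eye trace that moves
    with the head means undercompensation (gain < 1); one that moves opposite
    to the head means overcompensation (gain > 1).
    """
    head = head_deg - head_deg.mean()
    eye = eye_in_space_deg - eye_in_space_deg.mean()
    slip = np.dot(head, eye) / np.dot(head, head)  # least-squares slope of eye vs. head
    return 1.0 - slip

t = np.linspace(0.0, 2.0, 1000)
head = 10.0 * np.sin(2.0 * np.pi * 1.5 * t)  # 1.5 Hz, 10-deg head oscillation
eye_under = 0.2 * head    # residual eye-in-space motion with the head
eye_over = -0.15 * head   # residual motion opposite to the head
print(compensation_gain(head, eye_under))  # ~0.80: undercompensation
print(compensation_gain(head, eye_over))   # ~1.15: overcompensation
```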
Once one accepts these results, the big problem facing visual scientists becomes what are called the "feature-" and "stereo-correspondence" problems. Stated simply, the feature-correspondence problem calls attention to the fact that the visual system must know which feature goes with which feature on the retina within each eye as features of the proximal stimulus move about. The results obtained in Rotterdam also call attention to the binocular stereo-correspondence problem. Once it becomes clear that the assumption of tightly corresponding points between the eyes is unwarranted, it becomes hard to understand how fused, single stereovision can be established or maintained in the presence of the relatively large binocular mismatches observed as subjects move about naturally. In 1979, we had only begun to appreciate that these problems existed, much less how they were solved by the visual system. Some progress has been made since then, largely within the computer or "machine vision" community (see Marr and Poggio, 1979; Ullman, 1979; Pollard, 1985, for some early publications calling attention to the correspondence problems, and Faugeras, 1993; Shapiro and Brady, 1992, for reviews of some subsequent developments). Suggestions about how these problems might be solved by the human visual system can also be found in Steinman et al. (1986b). These authors showed that stereo-hyperacuity was effective in the presence of considerable retinal image motion, and that Julesz patterns remained fused, and could be fused easily, in the presence of the kind of natural retinal image motion that had been observed in Rotterdam in 1980. Shortly after, Steinman and Levinson (1990) suggested that these problems might be solved by the visual system if it "knew" what the oculomotor system was doing to generate compensatory eye movements in each of the eyes,
in other words, if the visual system monitored oculomotor commands to the VOR. These authors simply offered what seemed to be a potentially plausible suggestion; it had no empirical support back then, nor indeed does it now.

Continue with the PowerPoint
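The stereo-correspondence problem just described can be made concrete with a toy one-dimensional block-matching sketch, the kind of matcher the machine-vision literature cited above starts from. This is our illustration, not a model from any of those papers; the function name and parameters are invented for the example.

```python
import numpy as np

def match_disparity(left_row, right_row, x, window=3, max_disp=10):
    """For the patch centered at x in the left image row, find the horizontal
    shift (disparity) into the right row that minimizes the sum of squared
    differences. Ambiguous, repetitive, or moving texture is exactly where
    such matching goes wrong, which is the correspondence problem."""
    half = window // 2
    patch = left_row[x - half : x + half + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break
        candidate = right_row[x - d - half : x - d + half + 1]
        cost = float(np.sum((patch - candidate) ** 2))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

rng = np.random.default_rng(0)
left = rng.normal(size=100)        # a random-texture row, Julesz-style
right = np.roll(left, -4)          # the whole row shifted by a uniform 4-pixel disparity
print(match_disparity(left, right, x=50))  # recovers 4
```

With a static, uniform shift the match is trivial; the chapter's point is that natural head movement makes the shift neither static nor uniform across the two eyes, so a scheme this simple cannot explain maintained fusion.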
6 Beginning to work up close

Handling objects became possible in 1992, when instrumentation was perfected for measuring very small translations of the head accurately. This instrumentation also allowed calibration of the positions of manipulanda on a worktable in front of the subject. These developments culminated in a unique head/eye monitoring system called the Maryland Revolving Field Monitor (MRFM). The theory underlying this unique instrument and its data processing can be found in Edwards et al. (1994, revised 1998). A summary of this 141-page technical report can be found in Epelboim et al. (1995), the first publication resulting from its use. The technical report can be downloaded from our website (http://brissweb.umd.edu). Only a very brief summary of the instrument and how it works will be provided here. The MRFM consists of three subsystems: (1) The revolving-field monitor/sensor coil subsystem (RFM), which uses phase detection to measure angular positions of the eyes and head (angle measurement accuracy = 1 min arc; linearity = 0.01%). Data are acquired at 976 Hz; successive sample pairs are averaged and stored at 488 Hz, so the effective bandwidth is 244 Hz. Cube-surface field coils (2.44 m on a side) produce a spatially homogeneous magnetic field throughout a large fraction (~1 m³) of the cube's volume. SKALAR-DELFT sensor coils are used to measure horizontal (azimuth) and vertical (elevation) eye angles. Head roll, pitch, and yaw angles are measured with two orthogonal coils mounted on a tightly fitting cap. (2) The sparker tracking system (STS), which measures 3D translations of the head by detecting the arrival times of acoustic signals generated by a "sparker" (bipolar electrodes) mounted firmly on the tightly fitting cap. Head translation measurement precision is 0.2 mm, with accuracy of 1 mm. (3) A worktable, which serves as a platform for the targets. Its surface contains a grid of 154 wells, each with a microswitch at its bottom.
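The acquisition scheme quoted for the RFM subsystem (976 Hz acquired, successive sample pairs averaged and stored at 488 Hz) is simply 2-sample boxcar decimation; a minimal sketch, with a function name of our own invention:

```python
import numpy as np

def average_pairs(samples_976hz):
    """Average successive sample pairs, halving the rate (976 Hz -> 488 Hz).

    Pair-averaging acts as a 2-tap boxcar low-pass filter ahead of the 2:1
    decimation, which is why the effective bandwidth quoted for the MRFM is
    244 Hz, i.e., half the 488 Hz stored rate."""
    x = np.asarray(samples_976hz, dtype=float)
    if len(x) % 2:
        x = x[:-1]                     # drop an unpaired trailing sample
    return x.reshape(-1, 2).mean(axis=1)

print(average_pairs([1.0, 3.0, 2.0, 4.0, 10.0]))  # [2. 3.]
```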
Rods topped with LEDs of different colors served as targets in the initial experiments described in Epelboim et al. (1995). A target without an LED, located near the subject, defined the "home" position.

Calibrations

Two calibrations were performed before any of the experiments began: (i) sparkers of different heights were placed in 18 locations on the worktable to calibrate "sparker-space," and (ii) the sighting centers of each subject's eyes were measured psychophysically with the head on a bite board. Only a single calibration procedure is required at the beginning of each experimental session: the sensor coil's orientation relative to its line of sight is recorded for each eye by having the subject fixate each eye's pupil seen in a mirror orthogonal to the axis of the worktable's coordinate system. This direction corresponds to the straight-ahead direction when the subject is on the bite board. Sensor-coil calibrations are required at the beginning of each session because the relationship of the sensor coil to the line of sight varies with the insertion of the eye coils at the beginning of each session. The position of the cap on the head will also vary from session to session. The cap's initial position is established by recording the angles indicated by the orthogonal pair of head coils and
the sparker's position. The cap measurements are made at the beginning of each session, when sensor-coil positions on the eyes are recorded.

TASK    N      LENGTH (cm)  SD    ANGLE (deg)  SD
GROOM   4297   27.1         2.24  14.3         1.16
HONEY   4588   29.7         2.37  13.0         1.02
BEEF    2928   38.5         1.33  10.0         0.33
ARROW   6589   33.8         3.36  11.5         1.23
DOLL    11667  33.5         1.99  11.5         0.65

Table 12.1: Mean LENGTH of the cyclopean-gaze line of sight (cm) and its standard deviation (SD) while working at five TASKs: GROOM, HONEY, BEEF, ARROW, and DOLL. The number of samples (N), the vergence ANGLEs (deg), and their standard deviations (SD) corresponding to the cyclopean-gaze lengths are also shown.

In our first "natural" experiments, subjects were required only to look at a sequence of two, four, or six targets in a specified, or self-selected, order as fast and as accurately as possible, or to tap similar sequences of targets as fast and as accurately as possible. Subjects were not given any instructions about how they should go about looking at the targets in order to tap them. They used their gaze shifts quite differently in the looking-only and tapping tasks, which, from both an oculomotorist's and a commonsense point of view, should not have been the case. Why would merely looking at targets accurately for the sake of looking at them differ from looking at the same targets in order to tap them? Said differently, why would the purpose or goal of such simple visuomotor tasks determine their gaze-shift dynamics? On the other hand, why shouldn't the purpose of a visuomotor task determine how it is done? Clearly, having ill-defined abstract entities like "goals" or "purposes" serve as important oculomotor inputs makes difficulties for bioengineers, who want to model the oculomotor "plant," but such inputs could make life easier for the rest of us, who want to coordinate our eyes and hands efficiently to get things done.

Continue with the PowerPoint
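The correspondence between the LENGTH and ANGLE columns of table 12.1 is plain trigonometry for symmetric binocular fixation. The interocular distance is not given in the text at this point, so the 6.7 cm used below is an assumed placeholder (subjects' actual values will differ slightly), which is why the computed angles only approximate the tabled ones:

```python
import math

def vergence_angle_deg(distance_cm, iod_cm=6.7):
    """Vergence angle (deg) for symmetric fixation at distance_cm, given an
    interocular distance iod_cm (6.7 cm is our assumed placeholder)."""
    return 2.0 * math.degrees(math.atan((iod_cm / 2.0) / distance_cm))

# Distances near the table 12.1 means give angles near its ANGLE column.
for d_cm in (27.1, 29.7, 38.5):
    print(f"{d_cm:5.1f} cm -> {vergence_angle_deg(d_cm):4.1f} deg")
```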
7 Numbers to support claims made in the PowerPoint

The claims just made can be justified as follows. Consider first the claim that "objects were brought to slightly different, but appropriate, working distances, where they were kept throughout the task." This claim was based on the observation that the length of the average of the lines of sight, when the visualizations were viewed from the left, remained much the same as the work was being done; i.e., there was little variability in the working distance once it had been established. The mean lengths of the cyclopean-gaze lines of sight, and the variability around these means, during each of the five "evolve" tasks are shown in table 12.1. The vergence angles corresponding to these mean distances and standard deviations are also given. The first two tasks, grooming a fellow chimp and using a stick to eat honey from a cone, were performed fairly close, at 27 cm and 30 cm, respectively. Cutting and eating the roast beef was done somewhat farther away, at 38 cm. Fashioning the flint
arrowhead and assembling the Barbie Doll were performed with the manipulanda at an intermediate distance, 34 cm away. There was relatively little variation of working distance as the work was performed: in all five tasks, the standard deviations of the mean cyclopean-gaze lines of sight were 10% or less of the distance at which the manipulanda were kept.

Now consider the second claim: "Whenever the subject needed to find something on the table, he diverged a lot. His lines of sight were far apart. They converged far beyond the objects on the table." This claim is supported by table 12.2. Only four tasks are included in this table. The honey-eating task was omitted simply because the stick employed was already in the hand when the recordings began, so there was no need to search for tools or other necessary objects. In other words, when honey was eaten with the stick in hand, no large saccadic vergence changes were required, so none were made.

SEARCH       TIME of SEARCH (s)  GAZE LENGTH (cm)  TABLE DISTANCE (cm)  GAZE ANGLE (deg)  TABLE ANGLE (deg)
1             6.64                98.2              57.9                 3.9               6.6
2             0.11               114.7              48.0                 3.2               7.7
3             0.69                95.5              60.2                 4.0               6.4
4             9.57                70.1              38.6                 5.5               10.0
5            11.17                84.9              42.6                 4.5               8.9
6             0.97                85.1              47.4                 4.6               8.2
7             0.22               101.2              44.3                 3.9               8.8
8             2.37               119.4              45.1                 3.2               8.4
9            10.21               112.6              45.0                 3.5               8.7
10           14.21               113.3              42.2                 3.4               9.0
11            0.79               103.0              46.8                 3.8               8.2
12            1.38               113.7              43.0                 3.4               9.0
13            7.23               107.5              48.1                 3.6               7.9
14           17.14               105.8              51.8                 3.6               7.4
MEAN (N=14)                      101.8              47.2                 3.9               8.2
SD                                13.10              6.0                 0.63              0.96

Table 12.2: The maximum LENGTH of the CYCLOPEAN-GAZE line of sight when the subject searched for a needed object on the worktable during each of the TASKs (GROOM, BEEF, ARROW, and DOLL). The DISTANCE from the eyes to the table is also shown, as well as the TIME of each SEARCH within the trial. The ANGLEs corresponding to the GAZE LENGTHS and TABLE DISTANCES are also shown.

Table 12.2 shows the maximum length of the cyclopean-gaze line of sight during each of these searches, and also reports the distance of the eyes from the table, where the needed objects were located. These searches included finding (i) the bowl for depositing a pin pulled out during grooming, (ii) the stone knife used to cut the roast beef, (iii) the arrowhead and the piece of flint used to knap it, and (iv) a Barbie Doll's body part. In all cases, the ends of the cyclopean-gaze lines of sight were set farther away than we customarily teach is needed to fuse, and to maintain fusion of, binocular images. During all of these searches, gaze intersected at twice, or even more than twice, the distance at which the objects were likely to be found, despite the fact that the subjects knew, and could easily see, that the objects themselves were much nearer. Table 12.2 also shows when during the trial (TIME) each search for a tool or body part was made. This was included to make it easy for you to locate and examine each of these searches while viewing the visualizations available in the evolve folder. The mean maximum length of the cyclopean-gaze line of sight, averaged over the 14 searches (101.8 cm), was set 2.2 times farther than the distance from the eyes to the table (47.2 cm).

Continue with the PowerPoint
8 Keeping busy during a bus ride on a bumpy road

The teeter-totter was quite effective in bouncing the subject's head about along all three axes. The mean (SD) yaw, pitch, and roll frequencies (Hz) were 2.57 (0.84), 4.55 (0.94), and 3.00 (1.06), respectively, well above frequencies that can be pursued smoothly regardless of their amplitudes (Martins et al., 1985). The yaw, pitch, and roll amplitudes of the teeter-totter oscillations were also appreciable, resulting in relatively high mean (SD) velocities (deg/s), namely, 13.81 (4.81), 8.01 (3.13), and 14.99 (6.12), respectively, many times the head velocities observed when a subject is allowed to sit still without artificial support (Skavenski et al., 1979). In short, the visually guided manipulations seen in the videos shown in the PowerPoint on the "bus" had to be done under considerable stress. These videos also showed that the coloring and embroidery tasks could be accomplished quite well despite the stressful conditions. How was gaze controlled during these tasks? You will be able to find out when you look at the 30-s visualizations provided in the Bus folder. Only examples of coloring a picture and reading a text, both mounted on the tilted table, were included in this folder, because the actual position of the stimuli being manipulated cannot be seen when the objects of interest are held in the subject's hands. Visualizations of the subject coloring and reading the mounted picture and text are worth viewing because these tasks provided some new information about how vergence is controlled during visually guided manipulations. The way vergence was controlled during these two tasks by each of the three subjects is summarized in table 12.3. The other tasks, putting on lipstick and shaving while holding a mirror, did not add anything to what we already knew.
They were performed in much the same way as similar tasks in the "longitudinal study" of the evolution of visuomotor control by primates (described above). Namely, "objects were brought to slightly different, but appropriate, working distances, where they were kept throughout the task." Table 12.3 shows that the mean length of the cyclopean-gaze line of sight varied considerably with subject and task (30-137 cm), but with only one exception (YURA/READ), this length did not vary much while each task was performed. The standard deviations were small fractions of the mean lengths of the cyclopean-gaze lines of sight, namely, 1/6th to 1/18th of the lengths.

SUBJECT/TASK   MEAN CYCLOPEAN LENGTH (cm)  SD     LENGTH ANGLE (deg)  MEAN TABLE DISTANCE (cm)  SD     TABLE ANGLE (deg)
BOB/COLOR       52.7                        2.98    7.4                47.8                      2.33    8.9
BOB/READ        56.6                        3.55    6.9                49.8                      1.63    8.9
JULIE/COLOR     30.0                        4.84   10.5                30.7                      4.62   11.1
JULIE/READ      41.6                        4.12    7.4                38.7                      1.51    6.8
YURA/COLOR      40.9                        6.51    8.2                34.7                      4.72    7.3
YURA/READ      137.4                       59.8     2.5                44.3                      1.29    7.5

Table 12.3: Mean LENGTH of the CYCLOPEAN-gaze line of sight (cm) and its standard deviation (SD) when three subjects (BOB, JULIE, and YURA) COLORed a picture or READ an English text mounted on a worktable as they were jiggled by the teeter-totter. The mean DISTANCE of the eyes from the TABLE and its SD are also shown, together with the ANGLEs corresponding to these mean lengths and distances. Means are based on 30 continuous seconds of recording.

The visualizations in the Bus folder should be viewed close to "actual time," starting with the view from "behind" and switching to the view from the "left" about halfway through. Here is what you should do and what you should notice when you run these visualizations. Look at the coloring trials first. When viewed from behind, all three subjects will be seen to be performing in much the same way. When you view the coloring trials from the left, Bob and Yura will be seen to be converged beyond the plane of the paper upon which they were working (Bob averaged 5 cm beyond; Yura, 6 cm). Julie will be seen to be converged almost at the plane of the paper, averaging only 0.6 cm in front. Julie, who was slightly myopic, was not wearing contacts or spectacles during the experiment; in everyday life, she also usually performed such tasks without any refractive correction. Then examine how these subjects read the text, which was mounted on the table, while they were jiggled about. Again, start with the behind view and then look at the visualizations from the left. Start with Bob and Julie, and only then look at Yura's reading performance. When you look from behind, all will seem to be doing much the same thing. They read across each line from left to right and then make a relatively large saccade to shift gaze left to the start of the next line. But their performances will be seen to differ markedly when they are viewed from the left. Bob's and Julie's cyclopean-gaze lines of sight will be seen to "intersect" slightly beyond the plane of the table.
Bob averaged 7 cm beyond; Julie, 3 cm, but Yura's "intersection" averaged 93 cm, almost a meter, beyond! He behaved as if he was looking for something throughout the reading trial. He was probably doing just that, as you will see! Recall that Bob set gaze to about one meter whenever he had to find some manipulandum in the experiment on oculomotor demands made during primate evolution (see table 12.2).
Run the Eye Movement Visualization applet

Look at the visualizations in the "Bus" folder, returning here as needed to check the claims made just above about what you should see, and then

Continue with the PowerPoint

The role of meaning in reading has been de-emphasized since the 1970s, when the introduction of the SRI double Purkinje image tracker made it possible for individuals with little appreciation of either instrumentation or human eye movements to buy an off-the-shelf eye movement recording machine, which, they thought, made it possible for them to manipulate texts and make accurate recordings of human eye movements during reading. This newly found ability to record human eye movements with minimal expertise led to the development of "current theories of reading eye movements [which] claim that reading saccades are programmed primarily on the basis of information about the length of the upcoming word, determined by low-level visual processes that detect spaces to the right of fixation" (Booth et al., 1997, p. 2899). Spaces in text were easier to "understand" than such abstract terms as "meaning," and the role meaning played in reading was played down until the role of spaces was examined by removing them from text. This research showed that meaning plays a more important role in reading than was generally believed a decade or so ago (Epelboim et al., 1994, 1996, 1997). This fact is not as well known as it might be, perhaps because, as far as we know, it has not yet been picked up by introductory cognitive psychology textbooks. At least two undergraduate perception textbooks that include a section on reading (not all do) have picked it up and understood its significance (Sekuler and Blake, 2002; Schiffman, 1996). The former devotes 2 of its 11 pages on reading to this work (18%); the latter, 2 of 5 (40%). The important role of meaning in guiding reading can also be seen in our data when subjects were asked to read under considerable stress.
Yura, who was required to read a text that he could not understand, acted as though he was searching for this critical cognitive element and set his vergence far beyond the plane of the paper containing the text. Note that he did not do this when the critical cognitive element, the outline of a common object, was available on the paper that he was required to color. The critical cognitive element, the meaning of the words, does not reside in the words when you do not understand the text's language, and it must be sought elsewhere. Meaning clearly has a special place in reading, a fact that should not be downplayed.2

Continue with the PowerPoint
2 In our opinion, there is very little good modern work on reading of the kind inspired by the introduction of the SRI double Purkinje image tracker. Most of it (i) makes claims that go beyond the spatial and temporal resolution of the instrumentation employed, (ii) proposes models that do not lend themselves to convincing tests, or (iii) reports data averaged across subjects, a methodological flaw that obscures fundamental features of the reading eye movement pattern. There are, of course, notable exceptions to our complaints, namely, the work of O'Regan (1990), who provided the reading eye movement model that Epelboim et al. (1994) were able to test, and Legge et al. (2002), who have been studying reading by low-vision subjects and have developed a convincing dynamical model of reading eye movements that they call "Mr. Chips." Much of what we know about reading was discovered by Buswell, working between 1920 and 1937. His contributions were described very well by Kolers (1976), and the interested reader is encouraged to check this out.
10 Braking Saccades in Congenital Nystagmus

An extensive study of Prof. Louis Dell'Osso's classical case of congenital nystagmus (CN), which used data collected with the MRFM, was published in 1992 (Dell'Osso et al., 1992a, b, c). Previously, CN was often considered to reflect a defect in the ability to maintain fixation of a stationary target. This would seem a reasonable interpretation when one looks at the rapid oscillations of the line of sight dominating the 4-s record of Dell'Osso's right eye (on top), attempting to maintain fixation along the horizontal meridian, that is reproduced in the PowerPoint. Note that this record was made with Dell'Osso's head stabilized on a biting board. One second of this record is also shown (on the bottom) to make it easier to see his complex CN waveform. Dell'Osso's CN eye movement pattern has a mean frequency of 2.77 Hz (SD = 0.17), a mean absolute velocity of 10.4 deg/s (SD = 0.65), and a mean peak-to-peak amplitude of 1.5° (SD = 0.32), velocities and amplitudes that would result in considerable retinal image smear (Steinman and Levinson, 1990). Frequencies, amplitudes, and velocities of this magnitude make it very hard to obtain useful visual information while "maintaining fixation," because the diameter of the rod-free, flat floor of the fovea is only about 1.5°. Dell'Osso et al. (1992a) showed that what are labeled "braking saccades" in the PowerPoint record reduced retinal image slip to values below 4 deg/s for appreciable periods ("foveation time" averaged 57 ms), velocities sufficiently low, and dwell times sufficiently long, to permit information to be gleaned from the visual array. Note that a saccade is called a "braking saccade" when it goes in the direction opposite to an ongoing, rapid, smooth eye movement and slows it down, allowing the fixation target to linger within the fovea. Braking saccades establish what Dell'Osso et al. call "foveation periods."
Note that once foveation is considered important, only half of the saccades that went in the direction opposite to the fast smooth movement can be called "braking saccades": only those near the middle of the record, when the target was within the fovea. Parameters of Dell'Osso's braking saccades (measured with the MRFM) were as follows: mean (SD) braking saccade amplitude = 35 (7.3) min arc; peak velocity = 44.0 (6.76) deg/s; presaccade smooth eye velocity = 14.0 (2.91) deg/s; postsaccade velocity = 2.1 (1.78) deg/s. Remember that these braking saccade characteristics were measured while Dell'Osso maintained fixation of a stationary target with his head stabilized on a biting board. These braking saccades provided effective braking: they reduced the presaccadic smooth eye velocity caused by Dell'Osso's CN by 85%, leaving retinal image slip velocity at about 2 deg/s, a level at which normal visual information acquisition should be possible. Saccades that can be seen at the right side of the horizontal oscillation (at the top of the record) did not permit foveation. We call these "intrusive saccades"; Dell'Osso did not name them. They go opposite to the direction of the rapid smooth eye movement, as braking saccades do, and they may, or may not, brake the subsequent smooth eye movement. According to Dell'Osso, these saccades, even when they slow the eye down, leave the target image at the edge of, or even outside, the fovea, which means that they probably cannot aid visual information processing. Dell'Osso confined his interest to braking saccades that permitted foveation periods; "intrusive" saccades do not.
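The braking-saccade bookkeeping described above reduces to two checks: direction opposition and percent reduction of the ongoing smooth velocity. The sketch below uses Dell'Osso's published means from the text; the function names and the signed-direction convention are our own, not notation from the study:

```python
def percent_braking(pre_velocity, post_velocity):
    """%B: percent reduction of smooth eye velocity produced by a saccade,
    100 * (pre - post) / pre."""
    return 100.0 * (pre_velocity - post_velocity) / pre_velocity

def is_braking_saccade(saccade_direction, smooth_direction, pre_v, post_v):
    """A saccade counts as 'braking' when it goes opposite to the ongoing
    rapid smooth eye movement and slows it down."""
    opposite = saccade_direction * smooth_direction < 0
    return opposite and post_v < pre_v

# Dell'Osso's fixation values from the text: pre = 14.0 deg/s, post = 2.1 deg/s.
print(round(percent_braking(14.0, 2.1)))       # 85, the 85% reduction cited above
print(is_braking_saccade(+1, -1, 14.0, 2.1))   # True
```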
Dell'Osso et al. also showed that this "typical CN patient" had effective smooth pursuit (Dell'Osso et al., 1992b) and VOR (Dell'Osso et al., 1992c). CN patients are known to perform quite difficult visually guided motor tasks in the real world despite their ocular oscillations. Such accomplishments may be made possible by their effective use of "braking saccades" (see Jacobs et al., 1999, for a detailed study of braking saccade characteristics in CN). Now that you are familiar with the CN eye movement pattern, look at the next three slides in the PowerPoint. We observed what seem to be "braking saccades" when the subjects read a text while the bus was jiggled.

Continue with the PowerPoint
11 Braking saccades during the ride on the bus

How similar are the "braking saccades" observed when we studied reading under the stresses imposed by the jiggling teeter-totter to the "braking saccades" in the CN pattern, and were they made during the other tasks as well? Answers to these and related questions can be found in tables 12.4 and 12.5. Braking saccades were made by all of the subjects in all of the tasks. Altogether, 2,849 saccades were made, and 397 of these (14%) met the braking saccade criterion described above. These braking saccades slowed the fast smooth eye movements that were under way when the braking saccade was made by, on average, 50%, almost as much as the 60% reduction of smooth eye velocity by braking saccades in Dell'Osso's CN pattern reported by Jacobs et al. (1999). Dell'Osso's braking saccades were, on average, more effective in the fixation data we recorded with the MRFM (see the 85% reduction discussed above). This difference may simply reflect differences in the recording instrumentation, but it might also reflect intrasubject variability of oculomotor parameters. The important point here is that the braking saccades made in our experiment, in which the rapid smooth eye movements reflected variability in the effectiveness of the VOR or smooth pursuit rather than an intrinsic oculomotor instability associated with CN, were quite effective. The dynamical characteristics of our braking saccades also closely resembled those associated with CN. This can be seen in figures 12.2-12.5, which reproduce representative position, velocity, and acceleration waveforms of the braking saccades made by Dell'Osso in the MRFM, as well as those of our three subjects doing tasks as they rode on the jiggling bus. There is, however, one large difference that should not be overlooked: the average amplitudes of our subjects' braking saccades were much larger than Dell'Osso's. Ours were almost 3°; his were a bit more than half a degree. Their peak velocities, roughly proportional to their amplitudes, differed accordingly (see tables 12.4 and 12.5). Another important difference that should not be overlooked is the difference in postbraking-saccade smooth eye velocities. Dell'Osso's average was only 2.1 deg/s; our subjects' average was more than four times as high. This could mean that the braking saccades made by our subjects on the jiggling bus could not help them very much with their visual processing, because the retinal image velocities following their braking saccades were too high. They probably did do some good, however, because all three subjects did the tasks quite well (check out the completed coloring
Handling Real Forms in Real Life
202
TASK                 N BS  N SAC  PROP. BS  TIME (sec)  SAC RATE (SAC/s)  BS RATE (BS/s)  MEAN %B  SD %B
BOB / COLOR            30    207      0.14         180               1.2            0.17     54.3  27.63
JULIE / COLOR          11     65      0.17          60               1.1            0.18     54.6  22.98
YURA / COLOR           43    282      0.15         180               1.6            0.24     51.6  28.69
BOB / READ-HAND        22    126      0.17          60               2.1            0.37     43.5  25.31
JULIE / READ-HAND      24    116      0.21          60               1.9            0.40     48.9  25.63
YURA / READ-HAND       24    157      0.15          60               2.6            0.40     48.6  27.92
BOB / READ-TABLE       21    159      0.13          60               2.7            0.35     57.4  27.85
JULIE / READ-TABLE     27    144      0.19          60               2.4            0.45     57.9  24.68
YURA / READ-TABLE      25    138      0.18          60               2.3            0.42     51.8  26.79
BOB / EMBROIDER        49    256      0.19         240               1.1            0.20     52.1  26.39
JULIE / EMBROIDER       8    117      0.07          60               2.0            0.13     46.3  27.20
YURA / EMBROIDER       43    342      0.13         240               1.4            0.18     49.4  29.10
BOB / LIP-HAND          7     68      0.10          60               1.1            0.12     54.1  28.05
JULIE / LIP-HAND        8     68      0.12          60               1.1            0.13     63.7  30.20
BOB / LIP-TABLE        11     68      0.16          60               1.1            0.18     42.1  25.34
JULIE / LIP-TABLE      13    193      0.07          60               3.2            0.22     34.4  23.38
BOB / SHAVE-HAND        4     80      0.05          60               1.3            0.07     15.1  12.44
YURA / SHAVE-HAND      10    110      0.09          60               1.8            0.17     53.5  29.93
BOB / SHAVE-TABLE      11     69      0.16          60               1.2            0.18     38.5  36.22
YURA / SHAVE-TABLE      6     84      0.07          60               1.4            0.10     61.0  26.59

SUMMARY: N BS = 397; N SAC = 2849; SAC RATE = 1.58 SAC/s; % BS = 14; BS RATE = 0.22 BS/s; MEAN %B (SD) = 50.5 (27.26)
Table 12.4: The number of braking saccades (N BS), the total number of saccades (N SAC), and the saccade rate (SAC RATE (SAC/s)) are summarized, as well as the proportion of braking saccades (PROP. BS) and the braking saccade rate (BS RATE (BS/s)). The average % braking (MEAN %B), i.e., the effectiveness of braking saccades in slowing the ongoing smooth eye movement, and its standard deviation (SD %B) are also summarized. Data for the three subjects (BOB, JULIE, YURA) are shown separately for each of the six tasks performed on the jiggling bus. The number of seconds of recording (TIME) made in each condition is also reported. The SUMMARY, at the bottom, shows averages over subjects and tasks.
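The derived columns in table 12.4 follow directly from the raw counts and recording times. As a quick sketch (the helper function is our own illustration; the row used is transcribed from the table):

```python
def derived_columns(n_bs: int, n_sac: int, time_s: float):
    """Compute the derived columns of table 12.4 from the raw counts:
    PROP. BS (= N BS / N SAC), SAC RATE (= N SAC / TIME, in SAC/s),
    and BS RATE (= N BS / TIME, in BS/s)."""
    return n_bs / n_sac, n_sac / time_s, n_bs / time_s

# Bob, coloring for 180 s: 30 braking saccades among 207 saccades.
prop_bs, sac_rate, bs_rate = derived_columns(30, 207, 180)
# prop_bs ~ 0.14, sac_rate ~ 1.2 SAC/s, bs_rate ~ 0.17 BS/s, as tabulated.

# Overall: 397 of the 2,849 saccades met the braking criterion (~14%).
overall_prop_bs = 397 / 2849
```

Applying the same three ratios to every row reproduces the tabulated PROP. BS, SAC RATE, and BS RATE values to within rounding.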
R. M. Steinman, W. Menezes, and A. N. Herst

SUBJECTS   BS AMP min arc (SD)   BS PEAK V deg/s (SD)   Pre-V deg/s (SD)   Post-V deg/s (SD)
JULIE      164 (187.2)           163.1 (89.8)           18.9 (14.81)        9.9 (9.71)
BOB        172 (283.2)           138.7 (109.7)          21.6 (13.8)        11.2 (10.86)
YURA       158 (139.3)           164.3 (86.6)           13.5 (7.62)         7.3 (6.79)
MEAN       165 (216.2)           154.0 (97.0)           17.9 (12.14)        9.4 (9.23)
LOU's CN    35 (7.3)              44.0 (6.76)           14.0 (2.91)         2.1 (1.78)
Table 12.5: Braking saccade (BS) characteristics of the three subjects (JULIE, BOB, and YURA) averaged over the six tasks performed on the jiggling bus. The mean (SD) amplitude (BS AMP) is in min arc; BS peak velocity (BS PEAK V), pre-BS smooth eye velocity (Pre-V), and post-BS smooth eye velocity (Post-V) are in deg/s. Lou Dell'Osso's average braking saccade characteristics, measured with the MRFM during maintained fixation with his head stabilized, are also shown.
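The effectiveness measure %B used throughout the chapter is the percentage by which a braking saccade reduces the ongoing smooth eye velocity. The formula below is our own inference from the quoted numbers (e.g., Dell'Osso's 85% reduction corresponds to table 12.5's drop from 14.0 to 2.1 deg/s), not code from the authors:

```python
def percent_braking(pre_v: float, post_v: float) -> float:
    """%B: percentage reduction of smooth eye velocity across a
    braking saccade (pre- vs. post-saccadic velocity, in deg/s)."""
    return 100.0 * (pre_v - post_v) / pre_v

# Mean values from table 12.5 (deg/s):
lou_b = percent_braking(14.0, 2.1)     # Dell'Osso's CN: 85%
subj_b = percent_braking(17.9, 9.4)    # three subjects' mean: ~47%

# Single-saccade example discussed with figure 12.7: ~76%
fig127_b = percent_braking(44.2, 10.7)
```

Note that %B computed from the mean velocities (~47% for the three subjects) differs slightly from the tabulated MEAN %B of 50.5, which averages %B over individual saccades.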
Figure 12.2: A representative braking saccade made by Dell'Osso's right eye while maintaining fixation. The traces (from top to bottom) are position, velocity, and acceleration profiles.
projects in the video). Other, less important, features of our PowerPoint presentation also find support in the tables. For example, reading on the jiggling bus was hard. Saccade rates during reading averaged only 2.33/s. These subjects' reading rates for texts in languages they understand are above 3 words/s when they are not jostled. It is also worth noting that saccade rates during all the other tasks were relatively low; the overall average was 1.47/s and several were only 1.1/s. The only surprise here was Julie's rate when she was putting on lipstick with the mirror required for this task mounted on the table. Her rate was 3.2/s, a rate close to the saccade rates she has when she reads while sitting still. Apparently, getting lipstick on right required
Figure 12.3: A representative braking saccade made by Julie's right eye while reading on the jiggling bus. The traces (from top to bottom) are position, velocity, and acceleration profiles.
Figure 12.4: A representative braking saccade made by Bob's right eye while coloring on the jiggling bus. The traces (from top to bottom) are position, velocity, and acceleration profiles.

a lot of looking around to check out progress. Bob, who also put on lipstick, made saccades much less frequently, only about one per second. But note that he did not get it right with either the hand-held or mounted mirror.3

3 Yura was asked to put on lipstick, but refused. This led us to substitute the shaving task.
Figure 12.5: A representative braking saccade made by Yura's right eye while reading on the jiggling bus. The traces (from top to bottom) are position, velocity, and acceleration profiles.

It seems that braking saccades, rather similar to those observed in the CN eye movement pattern, are made by subjects who do not have CN when they try to do visuomotor tasks with different degrees of difficulty under the stresses imposed by jostling them as they work. These braking saccades do not seem to be as effective in reducing the velocity of the smooth eye movements resulting from the stresses imposed, but they do slow these smooth eye movements down a lot, on average by about 50%. One cannot but wonder how someone with CN would fare if placed under such stress. Would a lifetime's experience in coping with unwanted, intrinsic oculomotor perturbations lead to more effective braking when the perturbations came from an extrinsic, rather than intrinsic, source? We cannot answer this intriguing question at this time because Prof. Dell'Osso has not had an opportunity to perform these tasks on our jiggling bus. We can, however, examine a related question, namely, to what extent the CN pattern he shows when he maintains fixation with his head supported on a biting board shows up when his head is free and he performs a useful task under relatively natural conditions. We do have data of this kind. Dell'Osso ran our TAP protocol after the TAP-LOOK experiments (Epelboim et al., 1995) were published, so we can say something about this. He did the tapping task extremely well, tapping as accurately as, and faster than, most of our other subjects. You can watch him do this by navigating to the visualization on the digital media and selecting the file "LOU_TAP.ani" in the LOOK_TAP folder. He completed tapping the six-target array in only 4 s. The visualization should be run very slowly, so set the "FRAMES SKIPPED" value low enough for you to watch what he did at 10% or less of actual speed.
Selecting the SKIP 1 button will not be too slow because the trial is so short. Look closely for what have been called "foveation periods," i.e., intervals when gaze dwells on-target, the small black dot at the center of each colored disk. Once you
Figure 12.6: Position/time plots of Dell'Osso's right eye's gaze (its line of sight in space) as he performed the tapping task that can be visualized (LOU_TAP.ani in the LOOK_TAP folder). The horizontal trace near the middle of the record shows gaze when he maintained fixation with his head stabilized on a biting board. The fixation trace, which shows details of his CN eye movement pattern, can be seen more clearly in the records reproduced in the PowerPoint. Here it is plotted on a scale that allows one to see the way he actually shifts gaze when it is used to accomplish something useful, namely, tap a sequence of six targets. The reader should look for signs of his nystagmus in the tapping record to get some idea about its potential importance once his head is free to move naturally as he actually does something other than fixate. Upward displacements of the traces signify eye rotations to the right.

have done this a number of times and are satisfied that you know what is going on, come back and read on. Navigate to the VISUALIZATION and run LOU_TAP.ani now. You should have noticed that LOU's cyclopean gaze, the blue line, was the key to understanding how he used gaze to guide tapping. You would have noticed cyclopean gaze dwelling on-target for a while just before it shifted to the next target in the array. There are also suggestions of an oscillation of fixation when gaze was on-target. You can also get an idea about how Dell'Osso's CN pattern shows up in his eye movement pattern when he actually tapped his way through this target array in figures 12.6 and 12.7. Also, note that the braking and intrusive saccade waveforms seen in the bottom record of figure 12.7 are very similar to the waveforms you saw in Dell'Osso's CN pattern shown in the PowerPoint presentation. Figure 12.6 plots, on the same scales, LOU's gaze during the task just visualized along with a sample of his CN fixation pattern.
Note that he shifted gaze through almost 100° to tap the six-target array, and he did this expeditiously. There were not many long periods of maintained fixation. There are clear suggestions of his CN pattern
Figure 12.7: Zoomed-in, selected sections of the position/time plots of Dell'Osso's gaze shown in figure 12.6. The sections were chosen to examine details of the right eye's gaze waveforms during two intervals (1 to 1.3 and 2 to 2.6 s) when gaze was relatively stable; i.e., he was "maintaining fixation" near one of the targets. See the text for the significance of these plots.

in the interval between 2 and 2.6 s, but not between seconds 1 and 1.3, where fixation was also maintained. A closer examination of these fixations is possible in figure 12.7. The total oscillation between 1 and 1.3 s (on top) had a p-p amplitude of only 81 min arc (less than the diameter of the foveal floor), and there is no sign of either a braking or intrusive saccade. The fixation between 2 and 2.6 s (on the bottom) is more interesting. It shows one braking and one intrusive saccade (the first two, respectively). The third saccade in this episode was of the braking variety, but the foveation period was too short (11 ms) for it to do much good. The amplitude of the braking saccade was 103 min arc, its peak velocity was 99.6 deg/s, the presaccadic smooth eye velocity was 44.2 deg/s, and the postsaccadic velocity was lower, only 10.7 deg/s, so the effectiveness (%B) of this braking saccade was 76%. These values for a braking saccade are much like those observed for Dell'Osso's braking saccades during fixation. Note, however, that the CN fixation pattern itself is rarely observed when Dell'Osso performs a natural task in a naturalistic environment in which he is free to move his head and torso. It should not come as a surprise, then, that he performs real tasks so well once his head is free to move. CN seems to be primarily a phenomenon observed when the head is immobilized rather than when the head is free and gaze is used to accomplish useful tasks.
What we already knew

A number of unexpected observations were made as soon as we began to perform tasks in which real objects were manipulated under relatively natural conditions. Many of these observations came as surprises: they were quite different from what one could expect on the basis of the oculomotor literature, a voluminous literature derived almost entirely from studies done under much less natural conditions. Some of the surprises were as follows:
1. Cyclopean gaze, on average, is more accurate than either eye's.
2. Subjects show no signs of a "dominant eye."
3. Microsaccades are rarely made, at best "once in a blue moon."
4. Vergence tends to be set 25%-45% beyond the attended plane.
5. Subjects fixate no more accurately than a given task demands.
6. Parameters way down in the "oculomotor plant" adjust to high-level task demands.
7. The head is most likely to begin to move before, or at the same time as, the eye.
8. Humans fixate accurately under natural conditions in order to see fine details, not because they can see them.

These conclusions are based on a series of publications in which binocular eye and head movements were recorded under relatively natural conditions with the MRFM (Epelboim et al., 1995, 1997; Epelboim, 1998; Malinov et al., 2000; Herst et al., 2001; Steinman et al., 2003).

Two new observations. We can add two new observations to this list. (1) Braking saccades, similar to those observed in CN, are observed when subjects are jiggled about while they perform close work. They are less effective in slowing down the fast smooth eye movements caused by this extrinsic kind of stimulation than the braking saccades in the CN pattern, which arises from intrinsic factors, but they do slow down the disturbing smooth movements a lot, on average by 50%. This result, after the fact, is not surprising. We suspect that the braking saccades we observed in three "normal" subjects are ubiquitous. They probably have been seen often, and possibly even reported before. If they have not been observed and reported, it is probably simply because there have been very few accurate eye movement recordings made under similar conditions, i.e., while jiggling subjects about. We think that our "braking saccades" are analogous to what some like to call "catch-up" saccades. "Catch-up" saccades are seen when subjects fall behind the target during smooth pursuit. These saccades, in our view, are nothing special either.
They appear whenever a subject decides to pay attention to a target's position, rather than its velocity, during smooth pursuit. There are also "get ahead" saccades, a category that, unlike "catch-up" saccades, is not often labeled in oculomotor research. "Get ahead" saccades are made whenever a subject is asked to pursue a smoothly moving target with a gain above one, i.e., to go faster than the target (see Puckett and Steinman (1969) and Steinman et al. (1969) for papers on the voluntary control of saccades during smooth pursuit and on the voluntary control of smooth pursuit velocity, including "get ahead" saccades). In short, saccades simply do things like this whenever smooth eye movements "get out of hand." Whether the braking saccades observed in the CN pattern are another example of saccades made when smooth eye movements get out of hand is another matter. We think they are, but have no proof, and leave it to the reader to decide. (2) Subjects manipulating objects bring them to slightly different, but appropriate, working distances, where they are kept throughout the task; they prefer changing where
an object is held to changing the angle between the eyes. Furthermore, subjects manipulating objects within arm's reach adjust vergence to be well beyond their working distance (< 1/2 m) whenever they search for needed tools or objects. This propensity to diverge in order to search may have important consequences for studying cognitive processing. It might provide us with a new, objective way of finding out whether a subject understands the text being read. Remember that Yura, asked to read an English text he could not understand, acted as though he was searching for its meaning far beyond the plane on which the text was held. He did not do this when the critical cognitive element, the outline of a smiling dog he had been asked to color, was presented in the same plane. We have not yet asked Yura to read a Russian text he can understand on the jiggling bus, nor have we tried to read a Russian text we cannot understand, so, at this time, we can only suggest that we may have discovered a new way of finding out whether someone can "read for meaning," namely, by recording binocular eye movements while the reader is forced to superimpose the reading eye movement pattern on a lot of violent, compensatory oculomotor activity. Another finding stands out in our work on handling real objects in the real world. Diplopia was not noticed by any subject during any of the manipulations despite the high degree of noncorrespondence between what should have been corresponding retinal points. This result, unlike (1) above, is intriguing. It makes Panum's areas in the real world huge, far larger than heretofore suggested (see Logvinenko, Epelboim, and Steinman (2001) for the largest claim known to us). This fact, coupled with the importance of cyclopean-gaze control observed in all of our binocular recordings, calls attention to the crying need to solve both correspondence problems (see above).
It also encourages us to conclude by emphasizing that if you want to find out how gaze is used to guide manipulations, but you do not record from both eyes with the head free and so cannot observe cyclopean gaze in action, you are not likely to succeed.
Acknowledgments This chapter is dedicated to the memory of Dr. Julie Epelboim (1964-2001), whose creativity and technical sophistication made it possible for us to study eye movements under natural conditions (see Herst et al., 2001, pp. 3318-3319). This research was supported by grants from the Chemistry and Life Sciences Directorate of the U.S. Air Force Office for Scientific Research.
References

Booth, J. R., Epelboim, J. and Steinman, R. M. (1995). The relative importance of spaces and meaning in reading. Proc. Cog. Sci. Soc., 17: 533-538.

Collewijn, H. (1977). Eye and head movements in freely-moving rabbits. J. Physiol. (Lond.), 266: 471-498.

Collewijn, H., Erkelens, C. J., Pizlo, Z. and Steinman, R. M. (1994). Binocular gaze movements: Coordination of vergence and version. In J. Ygge and G. Lennerstrand (Eds.), Eye Movements in Reading, pp. 97-115. Wenner-Gren International Science Series: Vol. 64. Pergamon: Oxford.
Collewijn, H., Steinman, R. M., Erkelens, C. J. and Regan, D. (1991). Binocular fusion, stereopsis and stereoacuity with a moving head. In D. Regan (Ed.), Vision and Visual Dysfunction: Binocular Vision, pp. 121-136. MacMillan: London.

Collewijn, H., Steinman, R. M., Erkelens, C. J., Kowler, E. and Van der Steen, J. (1992). Binocular gaze control under free-head conditions. In H. Shimazu and Y. Shinoda (Eds.), Vestibular and Brain Stem Control of Eye, Head and Body Movements, pp. 203-220. Japan Scientific Societies Press: Tokyo.

Collewijn, H., Steinman, R. M., Erkelens, C. J., Pizlo, Z. and Van der Steen, J. (1992). The effect of freeing the head on eye movement characteristics during 3-D shifts of gaze and tracking. In A. Berthoz, P. P. Vidal, and W. Graf (Eds.), The Head-Neck Sensory Motor System, pp. 412-418. Oxford University Press: London.

Collewijn, H., van der Mark, F. and Jansen, T. C. (1975). Precise recording of human eye movements. Vis. Res., 15: 447-450.

Dell'Osso, L. F., Van der Steen, J., Steinman, R. M. and Collewijn, H. (1992a). Foveation dynamics in congenital nystagmus: I. Fixation. Documenta Ophthalmologica, 79: 1-24.

Dell'Osso, L. F., Van der Steen, J., Steinman, R. M. and Collewijn, H. (1992b). Foveation dynamics in congenital nystagmus: II. Smooth pursuit. Documenta Ophthalmologica, 79: 25-50.

Dell'Osso, L. F., Van der Steen, J., Steinman, R. M. and Collewijn, H. (1992c). Foveation dynamics in congenital nystagmus: III. VOR. Documenta Ophthalmologica, 79: 51-70.

Edwards, M., Pizlo, Z., Erkelens, C. J., Collewijn, H., Epelboim, J., Kowler, E., Stepanov, M. R. and Steinman, R. M. (1994; revised August 1, 1998). The Maryland Revolving-Field Monitor: theory of the instrument and processing its data. Technical Report CAR-TR-711, Center for Automation Research, University of Maryland: College Park.

Epelboim, J., Booth, J. and Steinman, R. M. (1994). Reading unspaced text: implications for theories of reading eye movements. Vis. Res., 34: 1735-1766.

Epelboim, J., Booth, J. and Steinman, R. M. (1996). Much ado about nothing: The place of space in text. Vis. Res., 36: 461-470.

Epelboim, J., Booth, J. R., Ashkenazy, R., Taleghani, A. and Steinman, R. M. (1997). Fillers and spaces in text: The importance of word recognition during reading. Vis. Res., 37: 2899-2914.

Epelboim, J., Steinman, R. M., Kowler, E., Edwards, M., Pizlo, Z., Erkelens, C. J. and Collewijn, H. (1995). The function of visual search and memory in sequential looking tasks. Vis. Res., 35: 3401-3422.

Epelboim, J., Steinman, R. M., Kowler, E., Edwards, M., Pizlo, Z., Erkelens, C. J. and Collewijn, H. (1997). Gaze-shift dynamics in two kinds of sequential looking tasks. Vis. Res., 37: 2597-2607.
Epelboim, J. (1998). Gaze and retinal-image-stability in two kinds of sequential looking tasks. Vis. Res., 38: 3773-3784.

Faugeras, O. (1993). Three-Dimensional Computer Vision. MIT Press: Cambridge, MA.

Herst, A. N., Epelboim, J. and Steinman, R. M. (2001). Temporal coordination of the human head and eye during a natural sequential tapping task. Vis. Res., 41: 3307-3319.

Jacobs, J. B., Dell'Osso, L. F. and Erchul, D. M. (1999). Generation of braking saccades in congenital nystagmus. Neuro-ophthalmol., 21: 83-95.

Jones, A. (1966). The Jerusalem Bible. Doubleday: New York. Genesis 2: 5-10.

Kolers, P. A. (1976). Buswell's discoveries. In R. Monty and J. Senders (Eds.), Eye Movements and Psychological Processes, pp. 373-396. Lawrence Erlbaum Associates: Hillsdale, NJ.

Kowler, E., Pizlo, Z., Zhur, G. J., Erkelens, C. J., Steinman, R. M. and Collewijn, H. (1992). Coordination of head and eyes during the performance of natural (and unnatural) visual tasks. In A. Berthoz, P. P. Vidal, and W. Graf (Eds.), The Head-Neck Sensory Motor System, pp. 419-426. Oxford University Press: London.

Legge, G. E., Hooven, T. A., Klitz, T. S., Mansfield, J. S. and Tjan, B. S. (2002). Mr. Chips 2002: New insights from an ideal-observer model of reading. Vis. Res., 42: 2219-2234.

Logvinenko, A. D., Epelboim, J. and Steinman, R. M. (2001). The role of vergence in the perception of distance: A fair test of Bishop Berkeley's claim. Spatial Vis., 15: 77-97.

Malinov, I. V., Epelboim, J., Herst, A. N. and Steinman, R. M. (2000). Characteristics of saccades and vergence in two kinds of looking tasks. Rapid communication. Vis. Res., 40: 2083-2090.

Marquis (1972). Who's Who in America. 37th ed., Vol. 1, p. 992. Marquis: Chicago.

Marr, D. and Poggio, T. (1979). A computational theory of human stereo vision. Proc. Roy. Soc. Lond. B, 204: 301-328.

Martins, A. J., Kowler, E. and Palmer, C. (1985). Smooth pursuit of small amplitude sinusoidal motion. J. Opt. Soc. Am. A, 2: 234-242.

Miles, F. A. and Eighmy, B. B. (1980). Long-term adaptive changes in primate vestibuloocular reflex. I. Behavioral observations. J. Neurophysiol., 43: 1406-1425.

O'Regan, J. K. (1990). Eye movements and reading. In E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes, pp. 395-447. Elsevier Science: Amsterdam.

Puckett, J. D. and Steinman, R. M. (1969). Tracking eye movements with and without saccadic correction. Vis. Res., 9: 695-703.

Schiffman, H. R. (1996). Sensation and Perception: An Integrated Approach. Fourth ed. John Wiley and Sons: Hoboken, NJ.

Sekuler, R. and Blake, R. (2002). Perception. Fourth ed. McGraw-Hill: New York.
Shapiro, L. S. and Brady, J. (1992). Feature-based correspondence: An eigenvector approach. Image and Vision Computing, 10: 283-288.

Steinman, R. M. (1975). Oculomotor effects on vision. In P. Bach y Rita and G. Lennerstrand (Eds.), Ocular Motility and Its Clinical Implications, pp. 395-415. Wenner-Gren Symposium Series. Pergamon: Oxford.

Steinman, R. M. (1976). The role of eye movements in maintaining a phenomenally clear and stable visual world. In R. A. Monty and J. W. Senders (Eds.), Eye Movements and Psychological Processes, pp. 121-154. Lawrence Erlbaum Associates: Hillsdale, NJ.

Steinman, R. M. (1986a). Eye movement. Vis. Res., 26: 1389-1400.

Steinman, R. M. (1986b). The need for an eclectic, rather than systems, approach to the study of the primate oculomotor system. Vis. Res., 26: 101-112.

Steinman, R. M. (1995). Moveo ergo video: Natural retinal image motion and its effect on vision. In M. S. Landy, L. T. Maloney, and M. Pavel (Eds.), Exploratory Vision: The Active Eye, pp. 3-50. Springer-Verlag: New York.

Steinman, R. M. (2003). Gaze control under natural conditions. In L. M. Chalupa and J. S. Werner (Eds.), The Visual Neurosciences, pp. 1339-1356. MIT Press: Cambridge, MA.

Steinman, R. M. and Collewijn, H. (1980). Binocular retinal image motion during active head rotation. Vis. Res., 20: 415-429.

Steinman, R. M., Kowler, E. and Collewijn, H. (1990). New directions for oculomotor research. Vis. Res., 30: 1845-1864.

Steinman, R. M. and Levinson, J. Z. (1990). The role of eye movement in the detection of contrast and detail. In E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes, pp. 115-212. Elsevier Science: Amsterdam.

Steinman, R. M., Pizlo, Z., Forofonova, T. I. and Epelboim, J. (2003). One fixates accurately in order to see clearly not because one sees clearly. Spatial Vis., 16: 225-241.

Steinman, R. M., Skavenski, A. A. and Sansbury, R. V. (1969). Voluntary control of smooth pursuit velocity. Vis. Res., 9: 1167-1171.

Ullman, S. (1979). The Interpretation of Visual Motion. MIT Press: Cambridge, MA.

Wilson, V. J. and Melvill Jones, G. (1979). Mammalian Vestibular Physiology. Plenum Press: New York.
Part IV
Neural Basis of Form Vision
13. The Processing of Spatial Form by the Human Brain Studied by Recording the Brain's Electrical and Magnetic Responses to Visual Stimuli

David Regan and Marian P. Regan

13.1 Introduction
Many kinds of visual stimuli evoke electrical and magnetic brain responses that can be recorded from the human scalp, are specific to the evoking stimulus, and allow the neural processing of shape, motion, depth, and so on, to be studied. Scalp evoked potentials (EPs) are caused by volume currents in the brain whose sources are membrane currents through the dendrites and cell bodies of large groups of cells aligned in parallel. Axon potentials (spikes) make little contribution. An electrical current generates a perpendicular magnetic field, and these evoked magnetic responses can be recorded using superconducting coils placed close to the scalp. In this chapter we describe endeavors to use electrical and magnetic brain responses as tools in attempts to discover the ways in which the brain first processes incoming visual information and then forms internal representations of features of the external world. In particular, we will discuss the processing of spatial form defined by luminance contrast, color contrast, texture contrast, and disparity (depth) contrast and human brain response correlates of the magno/parvo stream distinction.
13.2 Human Brain Electrophysiology: The Early Days
The one-year MSc (physics) program crafted for me in 1957 by Professor W. D. Wright differed from the current North American version. There was no coursework at all, as
was usual for MSc and PhD programs at that time. And I had definitely no wish to engage in any specialized original research. But as practical experience on which to base a review of color discrimination, Wright suggested that I should repeat the classical color matching and wavelength discrimination experiments on which the international system for color measurement was based (Wright 1928, 1929). I also acted as a subject for K. J. McCree. Retinal image stabilizers were not available, but he and I could, by voluntary fixation, stabilize a foveally viewed image sufficiently well that the differently colored halves of a 1.5° x 2° bipartite field merged into a single color even when one half was red and the other blue (McCree, 1960a, b). Following this early demonstration that temporal changes are essential for color vision, I puzzled for years to understand why this should be so. Was it that temporal changes in the wavelength of light incident on a local retinal area are essential for color vision? Or was the crucial point that, in everyday vision, eye movements cause the sharp color-defined boundary between the two halves of the field to move to and fro on the retina? Several years later (in 1965) this long-term chewing-over of the puzzle led me to explore the first hypothesis by designing a device for simultaneously modulating the wavelength and the intensity of light which, when coupled with Christopher Tyler's brilliant invention of the titration method in 1967, produced three studies on the dynamics of color vision (Regan and Tyler, 1971a-c). The second hypothesis led to experiments as late as the 1980s in which I used a double-Purkinje eye tracker to stabilize a 1.5° x 2° bipartite field on the retina (thus degrading color discrimination) and then either oscillated the color-defined boundary or flickered the body of the colored areas so as to restore color discrimination in a controlled manner (Regan, 2000, pp. 231-233).
But to return to 1957: while writing my review I was struck by the following: (a) Piéron's (1932, 1952) observation that both the maximum perceived saturation of a colored light and its maximum perceived luminance were not attained immediately after the light was switched on, and that the delay was shortest for red and longest for blue, with green intermediate; and (b) the report (later discredited) that in cat the rate of rise of spike frequency following visual stimulation depended on color, maximum spike frequency being reached at 0.3, 0.17, and 0.24 s after stimulation for red, green, and blue, respectively (Granit, 1955). Combining these two findings suggested that three color "labels" might be carried up a single nerve fiber. This continued to intrigue me in 1961, by which time I was teaching BSc physics at London University. What followed illustrates that a somewhat implausible idea can sometimes lead to unexpected but useful results. In 1961 it occurred to me that if brain responses evoked by a flickering light could be recorded from the scalp it might be possible to test the hypothesis that intrigued me. I was already aware that repetitive stimulation with very bright flashes (a xenon strobe) can produce what was called "photic driving" of the EEG, and that this was supposed to be entrainment of the spontaneous alpha rhythm of the brain by the flashing light. With a speculative leap in the dark I assumed that a quite different kind of brain response existed, and I called this a steady-state brain response.1 This hypothetical

1 The phrase "steady-state" was suggested by the steady-state response that follows the initial transient response of an electrical circuit driven by an input sinusoid, a topic covered in a BSc lecture course I taught. In coining the term steady-state evoked potential (Regan, 1964, 1966b) I defined the idealized response as a repetitive EP whose constituent discrete frequency components remain constant in amplitude and phase over an indefinitely long time period. Although this definition does imply that the ideal steady-state EP is an
David Regan and Marian P. Regan
217
brain response would be phase-locked to a sinusoidally flickering light viewed by an individual and, in distinction to the photic driving response, would have properties determined by the characteristics of the visual stimulus. I had no evidence to support this assumption - a style of hypothesis generation with "both feet off the ground" that I later avoided and advised against (e.g., see Regan, 2000, pp. 501-505). But that was how I got into research. To test for the existence of a steady-state response I proposed to extract it from the spontaneous activity of the brain by subjecting an individual's EEG to Fourier series analysis, that is, by multiplying the EEG by sine/cosine pairs locked to sinusoidal flicker viewed by that individual. In particular, I supposed that an F Hz flicker would generate an F Hz "synchronous response" and that (1) color-dependent differences in the absolute phase lag of the synchronous response would indicate color-dependent differences in the rate of change of spike frequency, and (2) color-dependent differences in the slope of the phase versus flicker frequency plot would indicate color-dependent differences in the transport time of the neural signals. (To anticipate, it later became evident that supposition 1 was a nonstarter and, as for 2, the phase vs. frequency plot was not an unequivocal measure of signal transport time — I termed it apparent latency; see Appendix 13.A2.) Had my knowledge of physiology and experimental research been other than negligible I would have realized that my idea not only was theoretically speculative and implausible (and based on dubious data) but also was technically overambitious to a laughable degree (see "Can a researcher be disadvantaged by having an encyclopedic and up-to-date knowledge of his or her research area", p.
505, Regan (2000)).2 In 1961 Professor Wright advised me to see Professor George Dawson of the London Postgraduate Medical School who, I was intrigued to find, had made the world's first automatic averager (from bits of ex-military equipment), and (figures 13.1 and 13.2) recorded the first visual evoked potentials (EPs) from metal electrodes attached to the human scalp, i.e., brain responses evoked by visual stimuli (Cobb and Dawson, 1960). These were transient EPs produced by a brief flash and presented as plots of voltage versus time (figure 13.3). Rather that treating my uninformed and ingenuous ideas with the brush-off they so richly deserved, this fine scientist and gentleman gave me the circuit diagram of his low-noise microvolt-level amplifier, some EEG electrodes, electrode paste and glue, a blunted hypodermic syringe, instructions on how to fix the electrodes to the scalp, and good wishes. However, the weekly 10 hours of lectures and 8 hours of laboratory demonstratinfinitely long train of identical waveforms, it is more helpful to think of the steady-state EP in terms of its constituent frequency components rather than in terms of a complex waveform endlessly repeated, because different frequency components can have different properties (Regan, 1970a, b). It is sometimes said that the distinction between transient and steady-state EPs is merely a matter of stimulus repetition frequency. This is incorrect: in principle, steady-state EPs can be evoked by low-frequency sinusoidal stimulation. The real distinction is that the system attains a dynamic steady state throughout the duration of the recording period without ever returning to its resting state, while for a transient EP the system is in its resting state before any given brief stimulus. 
2 Had I not been insulated in a department whose faculty had no interest in vision or brain research I might well have encountered one of the highly intelligent academics who can rubbish any half-conceived idea by logical argument, and my ignorant enthusiasm would have been destroyed. (Forty years on I notice that a not inconsiderable proportion of such highly intelligent destroyers have proved to be rather sterile in their own research.) By the time I encountered such people I had some small track record in original research, but was still shaken. Even now such an encounter can leave me looking for reassurance.
218
Processing Spatial Form
Figure 13.1: Dawson's automatic averaging machine. General view of the storage unit. A and B are the two distributors. Cgl-62, storage capacitors; KI and K2, timing contacts initiating the stimuli, the changing of the store, and the starting of display sweeps; M, driving motor. Reprinted from Dawson, G. D. (1954) A summation technique for the detection of small evoked potentials. Electroencephalogmphy and Clinical Neurophysiol., 6: 65-84. Copyright 1954 with permission from Elsevier. ing assigned by my Head of Department did not spell encouragement for my aberrant research interest, and no lab space was available for that sort of thing ("it's really not physics") - though lab space would be available if I decided to research proper physics. But when, after months of persistence, an influential member of the department3 supported my proposal to install a prefabricated garden shed in the carpark, this to act as my lab, the higher administration became worried about fire hazards. My Head of Department relented and offered me the use of a small gent's toilet for one year (later extended to 18 months).4 My first job was to remove the porcelain fixtures. I had obtained a magslip sine/cosine alternator (previously used in a WW2 Lancaster bomber) that could be driven by gears from a rotating polaroid which in turn would modulate a light sinusoidally. Unfortunately, commercial analogue multipliers were not then available and suitable computers were decades into the future. But more good fortune! Following a phone call by Professor Wright I was ushered into the office of Professor Dennis Gabor by his secretary, who whispered into my ear "Gaahbor, 3 Dr. John Barton, a distinguished cosmic ray (ultra high energy) physicist and dedicated scientist who, I assume, decided that my obsessional enthusiasm to do research on the aberrant topic of my choice should be encouraged, even though he must have wondered whether it had any scientific merit. 
4 For repeatedly driving an ambulance through falling bombs to collect casualties during the London Blitz she had been awarded the George Cross for gallantry - the civilian equivalent of the Victoria Cross (Congressional Medal in the USA). This left me awestruck. But not only was she a woman physics Ph.D., a rarity in the 1950s and early 1960s, but also a Head of Department. It was clear the she was nervous about her scientific credibility when all other heads of physics departments at London University were male, some with the Nobel Prize and others similarly exhalted, most of whom had gained fame by performing exotic scientific research and directed large projects during the war rather than driving an ambulance as the bombs fell. No doubt she realized that if my projected research had turned out to be nonsense as well as "not proper physics," hindsight would have had 20/20 vision, and the reputation of the department tarnished.
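The extraction method described above - multiplying the EEG by sine/cosine pairs locked to the flicker and averaging - is essentially what is now called synchronous (lock-in) detection. The following is a minimal numerical sketch of the idea only; the parameter values (sample rate, noise level, amplitude) are illustrative and are not Regan's apparatus or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "EEG": an F Hz steady-state response of known amplitude and phase,
# buried in noise roughly 20 times larger (values are illustrative only).
fs, F, dur = 1000.0, 10.0, 120.0      # sample rate (Hz), flicker frequency, seconds
t = np.arange(0.0, dur, 1.0 / fs)
amp, phase = 0.1, 0.7                 # microvolts, radians
eeg = amp * np.sin(2 * np.pi * F * t + phase) + 2.0 * rng.standard_normal(t.size)

# Multiply the record by sine/cosine pairs locked to the stimulus and average:
# the unlocked noise averages toward zero while the locked component survives.
s = 2.0 * np.mean(eeg * np.sin(2 * np.pi * F * t))   # in-phase component
c = 2.0 * np.mean(eeg * np.cos(2 * np.pi * F * t))   # quadrature component

S = np.hypot(s, c)        # amplitude of the F Hz synchronous response
phi = np.arctan2(c, s)    # phase of the response relative to the stimulus
```

These two averaged components are the quantities from which the amplitude and phase of a steady-state frequency component are computed (amplitude as the root sum of squares of the sine and cosine averages, phase as their arctangent ratio), which is the calculation indicated in the inset of figure 13.5.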
Gaahbor." (He did not like mispronunciations of his name.) He took from his desk drawer and handed to me two devices cased in epoxy resin that he had made from phonograph pickup cartridges and sundry other items. Soon I had a working analyzer. Building the equipment against the clock was an exhausting but eventually exhilarating obsession. Without Marian Regan's support and unwavering belief that I could do it, I could never have persisted.5 And I owe much to the head technician of the large physics workshop, George Dickson, an ex-WW2 sergeant full of hard experience and with limited admiration for academics. When he saw how keen I was, he developed an interest in my project and gave me unstinting support, diverting workshop resources to my project and assigning to it his most accomplished technician, Bob Davies, a young man who reveled in challenges to his skills with milling machine and lathe, creating with artistry pieces of equipment that I own and value 40 years later. The senior technicians took a detailed interest in day-to-day progress, continually urged me to greater efforts, and expected results: their practical and moral support at work and Marian's at home were a wonderful gift and encouragement. Among the memorable moments of this hectic time was, dressed in protective gear and not knowing what to expect, starting a 500 W high-pressure xenon arc for the first time with a homemade 40,000 V pulse generator, immediately followed by successful ignition by a homemade 25 A DC supply, with the leads levitating in the magnetic field of the huge choke, and ozone gas filling the room, immediately rusting exposed steel.

Figure 13.2: The averaged transient potential evoked by flash. Dawson's electromechanical averager was used to record this, the first published averaged VEP. The upper trace is the average of the occipital response to 110 bright flashes. The lower trace depicts retinal potentials recorded by Dawson's superimposition technique, demonstrating that there is no evidence of electrical spread from the retina to the VEP. The time scale marks intervals of 1, 5, and 20 ms. Adapted from Cobb and Dawson (1960) "The latency and form in man of the occipital potentials evoked by bright flashes". J. Physiol. (Lond.), 152: 108-121. Reprinted with permission from The Physiological Society.

Figure 13.3: Transient and steady-state analyses. A transient EP is commonly depicted as a plot of voltage versus time and can be regarded as the visual system's response to a sharp hit. A steady-state EP consists of one or more frequency components and is commonly represented by pairs of plots of voltage versus frequency and phase versus frequency, one pair for each frequency component. This response can be regarded as the visual system's response to prolonged shaking. Here only the amplitude versus frequency plot of the fundamental (F Hz) component of the response to an F Hz stimulus is shown. Because the human visual system is highly and essentially nonlinear, the steady-state (frequency-domain) response to a given stimulus cannot in general be fully predicted from the transient (time-domain) response to that same stimulus: steady-state and transient responses can provide complementary information.

5 At this time, with a single-subject specialized BSc in mathematics, Marian was teaching mathematics at the Henrietta Barnett School for Girls, a school for pupils sufficiently gifted to pass the stiff entrance examinations. Marian's PhD and research contributions were more than 25 years into the future.
Discovering that ozone gas is some eight times more lethal than chlorine, a poison gas used in WW1, I arranged to pipe the gas to a location just under the window of the adjoining office I shared with another faculty member. He did not complain. I had not objected to his practice of filing slugs of radioactive indium held in a small vice on his desk until they were the right shape to serve as floats that would indicate variations in the density of mercury held at high temperatures and vast pressures. Although expenditure so far had been very low, I now needed a Leeds-Northrup
Figure 13.4: Part of the equipment. The observer (author DR) lies prone, observing the 500 W xenon arc through infrared, ultraviolet, and monochromatic filters in Maxwellian view. The former location of the WC is under the subject's feet. Key: A, arc lamp housing; AL, magslip sine/cosine alternator in mumetal box; D, distribution box between scalp electrodes and preamplifier; L, twisted electrode leads; Mg., white MgO sector for heterochromatic flicker photometry; P, flexible metal pipe to remove poisonous ozone gas and cool the arc lamp; R.P. and S.P., rotating polaroid and stationary polaroid to modulate the light beam sinusoidally; S, 40 kV lamp starter; S.H., shutter; W, 25 A DC line from power supply.

two-pen recording potentiometer costing £800, and no cheap ex-military equivalent could be found. Dr. John Barton solved the problem by finding that the Department's projected year-end funds included much more than £800 for items costing less than £50 each. After a lunch, the Leeds-Northrup representative was sufficiently amused by what I was trying to do that he agreed to submit 16 invoices for £50 (for notional subunits of the instrument) to be paid several months ahead, and to supply the complete assembled machine immediately. But eventually the equipment was all working safely and reliably and could record moment-to-moment signals well below 0.1 microvolts, even when buried in noise 100 times larger than the signal. Figure 13.4 shows progress to that date. Then the first test. Marian was the subject. No steady-state brain responses! We
Figure 13.5: The first steady-state evoked potentials. This figure shows the fundamental (F Hz) frequency component of the steady-state brain response evoked by F Hz sinusoidal flicker. The two traces are the running averages of the F Hz sine and F Hz cosine multiplier outputs, whose instantaneous amplitudes (measured horizontally from the vertical zero axis) are marked θ1 and θ2, respectively. Time runs upwards, and the onset of the flicker stimulus is arrowed (on). These traces indicate the following: (a) whether the response is a steady-state EP, that is, whether the amplitude and phase are approximately constant over time; (b) if so, the segment of the recording over which this requirement holds. These two questions must be answered before the next stage, namely, that of integrating θ1 and θ2 over the designated section of the recording and computing the mean values θ̄1 and θ̄2. The inset shows how the amplitude (S) and phase (φ) of the steady-state frequency component are calculated from θ̄1 and θ̄2.

tried everything. Still failure. Then George Dickson stepped in. He would not allow this project to fail. He would be the subject and "sort it out." Immediately we recorded large, clear steady-state EPs. What a moment (figure 13.5)! Now, with only months left in my tenure of the gent's toilet, he put the workshop in charge of his senior technician several afternoons a week and acted as subject. As well, friends ran the equipment with me as subject. Within weeks we had a body of data from George and from other subjects. However, my findings did not support the original hypothesis and were puzzling.
Two years after I had last seen him I telephoned Professor Dawson, who came to watch a demonstration. He seemed intrigued by the fact that an abrupt reduction or increase in the intensity of the light produced an immediate change in the brain response. (This was because I displayed a running average of the brain response by passing the sine and cosine multiplier outputs through lowpass filters, the basis of the subsequent sweep technique; see section 13.9.) He did not seem interested in the failure to confirm or refute my original hypothesis. There was an anxious period while he thought; then he said that "this technique is more quantitative than averaging and is the most sensitive method available," and said it would be useful to study multiple sclerosis (of which I had never heard). He told me that these were the first recorded submicrovolt responses, "but nobody will believe you unless you have a PhD." He told me to write it up quickly as a thesis, and he would see what he could do. I wrote it up during two weeks of hard work in the summer of 1964, by which time he had seen Professor Wright and they had contrived to have me registered for a PhD at Imperial College, the registration being dated three years earlier. Within weeks, in November 1964, I had a PhD (Regan, 1964). But, though pleased, my Head of Department still did not fully approve of research that was not proper physics, so, with the guidance of Professor Dawson and his recommendation of me to Professor D. M. MacKay, I applied for an MRC grant, gave up my tenured faculty position, and moved to the Research Department of Communication at Keele University to live off soft money for the next 9 years.
13.3 My Introduction to the Mathematical Analysis of Nonlinear Behavior and to the Joys of Collaborative Research
With some additional data, my 1964 thesis material was eventually published as three papers, the first being in Nature (Regan, 1966a). I was unaware that Professor Henk van der Tweel, Head of Medical Physics and Dean of Medicine at the University of Amsterdam, had reviewed that paper until I received an invitation to visit Amsterdam. I wore a formal suit and tie throughout my visit, as did everyone else in the department. Professor van der Tweel immediately made it clear that he expected his then-student Henk Spekreijse and me to be future colleagues. I was shown every research project in his large lab and was mightily impressed. This new world was exciting and strange. We had many discussions. I found that the Amsterdam group had shown that 5 Hz flicker produced a frequency-doubled 10 Hz time-averaged EP, thus demonstrating a strong nonlinearity (Spekreijse and van der Tweel, 1965). Furthermore, this nonlinear behavior could be rendered quasi-linear by superimposing noise-modulated light on the 5 Hz flicker (figure 13.6). Spekreijse was writing a thesis which used this finding to analyze an early sequence of stages in visual processing, modeled as a linear filter, followed by a rectifier-like nonlinearity, followed by a second linear filter. (This concept was much exploited later by researchers in visual and auditory psychophysics.) By 1966 Spekreijse had a quantitative mathematical model (in the absence of digital computers the term "computational model" had yet to be born) and had identified two parallel pathways (Spekreijse, 1966).
Figure 13.6: The linearizing phenomenon. Averaged transient EPs to 5.6 Hz flicker (A) and to 11.2 Hz flicker (B) both approximated 11.2 Hz sinusoids, but when noise-modulated light was superimposed on the flickering stimulus the frequency-doubled 11.2 Hz response was abolished (C), whereas the linear 11.2 Hz response was relatively unaffected (D). Reprinted from Spekreijse, H. and van der Tweel, L. H. (1965). "Linearizing of evoked responses to modulated light by noise". Nature, 205: 913-915, with permission from Nature. Copyright 1965 Macmillan Magazines Limited.
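The frequency doubling that motivated the filter/rectifier/filter model can be reproduced numerically. The sketch below is an illustration of the principle only, not a model of the data in figure 13.6: it omits the two linear filter stages and simply shows that full-wave rectification of an F Hz sinusoid yields a response dominated by a 2F Hz component:

```python
import numpy as np

# Full-wave rectification of an F Hz sinusoid produces a response whose
# strongest periodic component is at 2F Hz: frequency doubling. (Principle
# only; the model's two linear filter stages are omitted.)
fs, F, dur = 1000.0, 5.0, 10.0
t = np.arange(0.0, dur, 1.0 / fs)
stim = np.sin(2 * np.pi * F * t)     # F Hz sinusoidal modulation
resp = np.abs(stim)                  # rectifier-like nonlinearity

# Locate the dominant frequency in the response spectrum (DC removed).
spectrum = np.abs(np.fft.rfft(resp - resp.mean()))
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)
peak_hz = freqs[np.argmax(spectrum)]   # dominant component lies at 2 * F
```

The linearizing finding then fits naturally: superimposing noise-modulated light shifted the system into a quasi-linear operating range, abolishing this frequency-doubled component while leaving the linear response relatively unaffected (figure 13.6C, D).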
To be welcomed and treated as an honored guest in this awe-inspiring department seemed like a dream. But there was more to come. Shortly after my return to England, Spekreijse sent me his thesis in draft with a request to polish the English and a note that the university would invite me to be his external examiner. He added a hesitant suggestion that I might decide not to wear my colorful yellow and red academic robes, the Amsterdam style being sober - robes of severe black with a tiny patch of color to denote the faculty. The examination was conducted in a seventeenth-century church with a large audience, including Heads of Department from all over the Netherlands, and Spekreijse in formal black tie. I was expected to open in Latin. Certainly "Rector Magnificus" sounds more impressive than "University President." Professor van der Tweel seemed oddly tense. I was puzzled: Spekreijse's thesis seemed to me brilliant. Indeed, I have not seen one to match it since. It was many years before I learned that a Professor who "promotes" his student (i.e., allows his PhD student to go to examination) is expected to resign his professorship if the student fails. And many of the attending professors seemed to be there to ask questions not on the thesis but on the several "propositions," i.e., brief original statements on topics unconnected with the thesis that a Dutch PhD must include. Henk Spekreijse had a difficult time answering questions from Professor van der Tweel's friends (several of whom were his ex-colleagues in the wartime Resistance) about propositions on the viscosity of tomato ketchup and some arcane point about nuclear reactors. I believe
Figure 13.7: Brain responses to the appearance and disappearance of spatial form defined by luminance contrast: Transient EP to onset (appearance) and offset (disappearance) of a pattern of fine checks presented to one eye (A) and viewed binocularly (B). Binocular viewing differentially enhances the offset response. Reprinted from Spekreijse, van der Tweel and Regan, D. (1972). "Interocular sustained suppression: correlations with evoked potential amplitude and distribution". Vis. Res., 12: 521-526. Copyright 1972 with permission from Elsevier. they enjoyed making Spekreijse's "promoter" sweat a little about his continued tenure.6 Following the subsequent festivities Henk Spekreijse and I spent some weeks on a joint experiment, and this happy arrangement continued in one or another lab as the years went by and our children were born, grew to adulthood, and presented us with grandchildren. Figure 13.7 resulted from our 1969 collaboration. It shows that the transient EPs to the abrupt appearance and to the abrupt disappearance of a spatial pattern are quite different. The several successive negative and positive peaks index subprocesses, some of which do not directly intrude into conscious perception. We also showed that sustained retinal rivalry suppresses pattern EPs (i.e., slow waves) throughout visual cortex, a phenomenon that has resurfaced after an intervening 30 years (Spekreijse, van der Tweel, and Regan, 1972). 6 Henk van der Tweel (1915-1997) made important contributions to cardiology in the Netherlands by designing and personally constructing the instruments that allowed advanced research on the electrical activity of the heart. He also made important contributions to human brain electrophysiology. During the war he and his wife Liese helped others to safety at severe risk to their own lives. Although I had known and admired him since 1965 it was not until after his death that I learned that he was not born Henk van der Tweel. 
This was his "nom de guerre," his Resistance name, and the Dutch government had allowed him to use it as his legal name when peace finally came to his country. His great skills as a forger, developed during the war, served him well in peacetime. He became a noted restorer of Rembrandt etchings. The English-language proceedings of a meeting organized by the Netherlands Royal Academy to celebrate the life of Henk van der Tweel (entitled "Van Hoofd en Hart: Henk van der Tweel") are available from the Stichting Van Hoofd en Hart, P.O. Box 12011, 1100AA, Amsterdam-Zuidoost, The Netherlands.
Figure 13.8: Color-defined form. Equiluminant red and green checks abruptly exchanged places six times per second. From Regan (2000). Human Perception of Objects. Sinauer Press: Sunderland, MA. Used with permission from Sinauer Associates Inc.

Using checkerboard-patterned and bar-patterned mirrors kindly provided by Henk Spekreijse, I built a device for recording EPs to spatial form defined by color contrast (Regan and Sperling, 1971; Regan, 1973a). Figure 13.8 shows how a counterphase-modulated pattern of red checks (R) was superimposed on a counterphase-modulated pattern of green checks (G) so as to create a pattern of equiluminant red and green checks that exchanged positions periodically with zero change of local luminance or total light flux. On either side of the equiluminant point there was luminance contrast, and the checks appeared to dart around in apparent motion. At the equiluminant point, however, this apparent motion ceased, indicating that chromatic contrast had only a weak input to the mechanism underlying apparent motion (Regan, 1973a). For color-normal observers, steady-state EP amplitude showed no dip at equiluminance (Regan, 1973a). In contrast, for a deuteranopic subject (Henk Spekreijse) the pattern EP was abolished at equiluminance (figure 13.9). Evidence that the large brain responses recorded from the color-normal subject were not an artifact of spurious luminance contrast in the retinal image caused by ocular chromatic aberrations was as follows: (1) the deuteranope gave essentially zero response at equiluminance; (2) the color-normal subject continued to give large responses even when an optical technique was used to cancel simultaneously longitudinal chromatic aberration, chromatic difference of magnification, and chromatic variation of deviation of the principal ray (Regan, 1973a).
We also recorded a transient EP to the appearance of equiluminant form and a morphologically different transient EP to the disappearance of that form (Regan and Spekreijse, 1974).
Figure 13.9: Steady-state electrical responses of the human brain produced by stimulating the central fovea with patterns of red-green checks. Subjects viewed a circular region of diameter 3.0° that contained about 140 checks, each of which was 0.18° wide. As illustrated in figure 13.8, chromatic spatial contrast across the boundaries of the checks reversed six times per second. This stimulus produced a repetitive electrical signal in the subject's brain that consisted almost entirely of a 6 Hz frequency component. The continuous and dashed lines, respectively, show how the amplitude of this 6 Hz component depended on the green/red luminance ratio for a color-normal and a deuteranopic subject. (Red luminance was fixed and green luminance varied.) The dotted line indicates the results of replacing the red light with a green light of equal mean luminance. Contrast was 0.4 for the red and the green components of the pattern. The green light was of wavelength 547 nm (10 nm full bandwidth at half power) and the red light was of wavelength 640 nm (56 nm full bandwidth at half power). The overlap of spectral power was essentially zero. Reprinted from Regan, D. (1973) Evoked potentials specific to spatial patterns of luminance and color. Vis. Res., 13: 2381-2402. Copyright 1973 with permission from Elsevier. Calibration curves for the monochromatic light are given in the reference.
Figure 13.10: The stimulus that produced the electrical brain responses shown in figure 13.11B, C. A: A pattern of alternate equiluminant red and green checks (about 95 checks in total) that subtended 2.2° was viewed with the central fovea. The luminances of the red and green checks were squarewave-modulated in antiphase. B: The luminance difference across each edge reversed polarity every 270 ms. C: The green light was replaced by a red light of the same luminance so that all checks were red. The luminance difference across each edge reversed polarity every 270 ms. D: The red light in A was replaced by a green light of the same luminance so that all checks were green. The luminance contrast across each edge reversed polarity every 270 ms. From Regan, D. (2000) Human Perception of Objects. Sinauer Press: Sunderland, MA. Used with permission from Sinauer Associates Inc.

Although it might seem somewhat counterintuitive, it is not necessarily the case that the color-normal's large brain response at equiluminance was a response to chromatic contrast rather than a response to monochromatic contrast (as distinct from luminance contrast). The basis for this statement will become clear when I have reviewed the results of the next experiment. Figures 13.10 and 13.11 raise questions about the concept of equiluminance in the context of color-defined spatial form. In this experiment, red checks always remained red and green checks always remained green (Regan and Spekreijse, 1974). Figure 13.10 shows how the luminance of each set of checks was squarewave-modulated. The uppermost and lowermost of the three checks shown in figure 13.10A were always red, but their luminances abruptly increased and decreased (in synchrony)
repetitively. The central check was always green. At the instant that the luminance of the red checks increased by ΔL cd/m2, the luminance of the green checks decreased by ΔL cd/m2. Since the time-averaged luminances of the red and green checks were equal, this meant that the luminance contrast across each check's edge reversed every 270 ms (figure 13.10B). One might well expect these reversals of luminance contrast to produce a brain response in a normally sighted observer. But figure 13.11C shows that this was not the case. A further surprise: figure 13.11B shows that the deuteranope did give a response. This is the mirror image of the findings in the previous experiment (figures 13.8 and 13.9). The possibility that the color-normal subject was insensitive to luminance contrast was rejected by using the stimuli shown in figure 13.10C, D. Both gave clear responses (e.g., figure 13.11D). Why did the reversals of luminance contrast in figure 13.10A, B produce no brain response, while reversals of luminance contrast in an all-red or all-green checkerboard gave strong responses? It is difficult to avoid the following conclusion. In the achromatic contrast system there is something very different about the physiological effect produced when two adjacent locations are illuminated by lights of different luminances (L1 and L2 cd/m2) that have the same wavelength, and the physiological effect produced when two adjacent locations are illuminated by lights of the same two different luminances (L1 and L2 cd/m2) that have different wavelengths. Nevertheless, in 1974 and still today, both these spatial patterns are said to have identical luminance contrasts. In an attempt to account for our findings, we offered the following hypothesis:

1. At the earliest contrast-processing stage, the color-normal subject has no spatial contrast mechanism whose spectral sensitivity matches - even approximately - the equiluminant curve defined by the CIE V(λ) curve or by any of the three measures of sensation luminance. This hypothesis accounts for the color-normal subject's absence of response to the stimulus illustrated in figure 13.10A, B.

2. The color-normal subject has a mechanism that responds to monochromatic spatial contrast. It consists of two (or more) parallel submechanisms that have different spectral sensitivities, all of which differ considerably from the V(λ) curve. This hypothesis accounts for the finding that the color-normal subject gave symmetrical contrast-reversal responses to the monochromatic stimuli depicted in figure 13.10C, D but gave similar responses to the pattern of alternate red and green checks depicted in figure 13.10A, B only when the red luminance was considerably higher than the green luminance, and vice versa.

With the aim of isolating the hypothetical monochromatic contrast mechanism that was most sensitive to red light, I stimulated a subject's fovea with a 2° × 2° pattern of monochromatic deep red (676 nm) checks, each of which was 0.15° wide (Regan, 1974, 1975a, b, 1979). Superimposed on the pattern was a uniform, unpatterned monochromatic patch of desensitizing light that subtended 6°. The basic idea was to vary both the wavelength and the intensity of the desensitizing light so as to hold the response to the checkerboard stimulus at a constant amplitude and thus establish the spectral sensitivity curve of the mechanism that was responding to the red checkerboard.
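The constant-response logic of this desensitization method can be sketched in a toy simulation. Everything below is hypothetical (an assumed Gaussian spectral sensitivity and a simple divisive-adaptation rule, not Regan's model or data); it merely shows why holding the probe response constant makes the required desensitizing intensity trace out the reciprocal of the probed mechanism's spectral sensitivity:

```python
import numpy as np

# Hypothetical spectral sensitivity of the probed mechanism (an arbitrary
# Gaussian peaking near 600 nm; NOT measured data).
def mech_sensitivity(wl_nm):
    return np.exp(-0.5 * ((wl_nm - 600.0) / 40.0) ** 2)

def criterion_intensity(wl_nm, criterion=0.5, probe_resp=1.0, k=1.0):
    """Desensitizing intensity at wl_nm that attenuates the probe response to
    `criterion`, under a toy divisive-adaptation rule:
        response = probe_resp / (1 + k * I * S(wl))
    Solving for I gives I = (probe_resp / criterion - 1) / (k * S(wl))."""
    return (probe_resp / criterion - 1.0) / (k * mech_sensitivity(wl_nm))

wavelengths = np.arange(500.0, 701.0, 20.0)
I_d = criterion_intensity(wavelengths)

# Holding the response constant means the reciprocal of the required
# desensitizing intensity recovers the mechanism's spectral sensitivity.
recovered = 1.0 / I_d
recovered /= recovered.max()
```

Under this (assumed) adaptation rule the recovered curve coincides exactly with the mechanism's sensitivity; with real data the method instead plots 1/intensity at criterion against wavelength, as in figure 13.12.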
Figure 13.11: Electrical responses of the human brain produced by stimulating the central fovea with patterns of red-green checks. A: The stimulus. Red checks remained red and green checks remained green, but their luminances alternated between a lower (hatched) and a higher (not hatched) level. The stimulus pattern contained approximately 47 checks rather than the five shown here. Maximum contrast was 11% for both the red and green components of the pattern. The horizontal diameter of the field was 2.2°. Mean luminance was 8.3 cd/m2. The red and green lights had the same specifications as those used in figure 13.9. B: Brain responses from a deuteranope. The traces labeled "Equiluminant Red-Green" are responses to the stimulus depicted in figure 13.10. Traces labeled "+0.1 Log Unit Red" and "-0.1 Log Unit Red", respectively, were obtained by brightening and dimming the red component of the light by 18%. C: Brain responses from a color-normal subject. Details as for B. The traces marked "Red-Red" were obtained by stimulating with reversals of monochromatic contrast as illustrated in figure 13.10C. Reversals of luminance contrast across the edges of the squares are marked by the arrowed R's. Reprinted from Regan, D. and Spekreijse, H. (1974) Evoked potential indicators of colorblindness. Vis. Res., 14: 89-95. Copyright 1974 with permission from Elsevier.
Figure 13.12: Relative spectral sensitivities of two parallel foveal mechanisms sensitive to monochromatic contrast. The bold curves were obtained by measuring electrical responses recorded from the brain of a human subject. The fine dashed line shows the same subject's relative luminosity curve obtained by psychophysical heterochromatic flicker photometry. Reprinted from Regan, D. (1974) Electrophysiological evidence for colour channels in human pattern vision. Nature, 250: 437-439. Copyright (1974) Macmillan Magazines Ltd.

The rightmost heavy curve in figure 13.12 was the result. The leftmost heavy curve in figure 13.12 shows a spectral sensitivity curve obtained by replacing the red checkerboard probe with a monochromatic green (544 nm) checkerboard probe. Although a monochromatic blue (436 nm) checkerboard with 0.2°-wide checks gave easily recordable brain responses, they were considerably weaker than the responses to all-red or all-green checkerboards (Regan, 1973a). Presumably that, coupled with the slightly smaller checks used (0.15° width), resulted in failure to obtain data on a putative spectral sensitivity curve of shorter wavelength than the leftmost curve in figure 13.12. A psychophysical equivalent of the experiment just described might be carried out as follows. Take two monochromatic checkerboard patterns, one of wavelength 590 nm, the other of wavelength 545 nm, both having the same luminance. On the basis of the findings shown in figure 13.12, one would envisage that the contrast detection threshold for the red checkerboard would be elevated about 0.4 log units more by adapting to a 100% contrast version of the red checkerboard than by adapting to a 100% contrast version of the green checkerboard.
And that the contrast detection threshold for the green checkerboard would be elevated about 0.4 log units more by adapting to a 100% contrast version of the green checkerboard than by adapting to a 100% contrast version of the red checkerboard.⁷
⁷ In the design of such an experiment it would be preferable to create the monochromatic checkerboards using narrow-band interference filters or light-emitting diodes rather than by using a computer monitor,
232
Processing Spatial Form
An abstract by Yamamoto and DeValois (1996) reports an experiment whose design is not greatly different from that just described. They concluded that "These results are not readily explained by the standard two chromatic opponent mechanisms and one color-blind luminance mechanism. . . . [T]he data suggest the existence of color-selective detectors that respond to effective intensity differences, i.e., noncolor-blind 'luminance' mechanisms." Similar conclusions were reached by Ellis et al. (1975). What would be the implications for research on the visual processing of color-defined form if the human achromatic contrast mechanism is indeed organized along the lines just proposed? The implication is that it would be impossible to totally "silence" the entire achromatic spatial contrast system. But this by no means denies the possibility that stimuli could be designed to stimulate the chromatic contrast system considerably more strongly than the achromatic contrast system. How might all this relate to the properties of single neurons in monkey striate cortex? A relevant statement by Lennie et al. (1993, p. 1289) in a section entitled "Multiple mechanisms with V(λ)-like spectral sensitivity?" is as follows: "Although the average spectral sensitivity of neurons in the upper layers is close to V(λ), few individual neurons have the spectral sensitivity of V(λ); indeed the spectral sensitivity of many that respond well to achromatic stimuli clearly differs from V(λ), generally having narrower spectral-sensitivity functions that result from their receiving opposed (albeit weakly opposed) inputs from M cones and L cones. Cells with the weakly opponent organization are chromatically heterogeneous and form no sharply identified group, yet they are so numerous and generally have such finely tuned spatial and orientational selectivities that there can be little doubt that they play some important role in form vision.
Could this heterogeneous population of cells give rise to the V(λ)-like spectral-sensitivity functions in acuity tasks and those involving detections of punctate light? . . . [I]f we suppose that the visual stimulus activates several cells, linear combinations of signals from these cells might reasonably be expected to give rise to a spectral-sensitivity curve that reflects the average of the spectral sensitivities of the individual cells." Evidence for a brain response to chromatic contrast was subsequently obtained by recording the magnetic field of the brain (Regan and He, 1996).⁸
because a green of dominant wavelength 545 nm and a red of dominant wavelength 590 nm generated on a monitor would have very considerable spectral overlap, while the monochromatic lights used to obtain our brain response data had essentially zero overlap.
⁸ The brain's evoked magnetic field is about 10⁷ times smaller than the earth's field, so the major problem in recording the evoked magnetic response is to extract it from noise caused by the heart, passing vehicles, elevators, and so on, which generate fields hundreds or thousands of times larger than the signal. The brain's evoked magnetic field creates a voltage between the ends of a coil of wire placed over the scalp, but the signal is far lower than the noisy voltage caused by thermal agitation in the wire. To remove this thermal agitation the coils are immersed in liquid helium to cool them to just above absolute zero, where their electrical resistance falls to zero. At this low temperature quantum physics reigns, and superconducting quantum interference devices (SQUIDs), whose operation is based on the Josephson effect, are needed to extract the electrical signal. After that, the experimental procedure is similar to that used for recording EPs (Regan, 1989, pp. 470-482; Kaufman and Williamson, 1980, 1982; Lu and Kaufman, 2003).
(Because electrical
Figure 13.13: Magnetic responses evoked by chromatic contrast and luminance contrast. The thin and thick traces, respectively, are the magnetic responses to the onset (ON) and offset (OFF) of contrast for a red-green chromatic grating and for a yellow luminance grating. The two gratings were both created by superimposing the same monochromatic red and monochromatic green gratings, the only difference being a 90 deg shift in the relative spatial phase of the component monochromatic gratings. Spatial frequency was 2 cycles/deg. The stimulus field was a square of subtense 4° × 4° and the subject fixated midway along the upper edge of the square. Each trace is the average of 120 trials. The recording coil was located 5 cm above the inion. Reprinted from Regan, D. and He, P. (1996). Magnetic and electrical brain responses to chromatic contrast in human. Vis. Res., 36: 1-18. Copyright 1996 with permission from Elsevier. and magnetic fields are perpendicular, the two kinds of brain response are most easily recorded from neurons that have different orientations with respect to the portion of skull which they underlie.) At some recording sites the response to the onset of equiluminant red/green checks was considerably larger than the response to a pattern of bright and dim yellow checks (figure 13.13). Furthermore, the response to an equiluminant red/green grating fell off more sharply with spatial frequency than did the response to a yellow luminance grating.
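The construction of the two stimuli can be sketched numerically. This is a toy illustration, not the stimulus code from Regan and He (1996): it assumes idealized sinusoidal red and green components and contrasts the two limiting relative phases (0 versus π) to show how the same pair of monochromatic gratings yields either a pure luminance grating or a pure equiluminant chromatic grating; the function name and parameters are invented for the example.

```python
import numpy as np

def red_green_sum(rel_phase, cycles=2.0, n=512):
    """Superimpose idealized red and green sinusoidal gratings whose relative
    spatial phase is rel_phase (radians). Returns the luminance profile
    (R + G) and the chromatic-difference profile (R - G) across position."""
    x = np.linspace(0.0, 1.0, n, endpoint=False)   # position across the field
    red = 1.0 + np.cos(2 * np.pi * cycles * x)
    green = 1.0 + np.cos(2 * np.pi * cycles * x + rel_phase)
    return red + green, red - green

lum_in, chrom_in = red_green_sum(0.0)        # components in phase
lum_anti, chrom_anti = red_green_sum(np.pi)  # components in antiphase

# In phase: luminance varies across space, chromaticity is constant (yellow grating).
print(np.ptp(lum_in), np.allclose(chrom_in, 0.0))
# In antiphase: luminance is constant, chromaticity varies (equiluminant grating).
print(np.allclose(lum_anti, 2.0), np.ptp(chrom_anti))
```

The same physical components thus carry either luminance contrast or chromatic contrast depending only on their relative spatial phase, which is what makes the comparison in figure 13.13 a clean one.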
13.4 Brain Responses to Depth-Defined Form and to "Short-Range" Apparent Motion
Figure 13.14A-C shows the first reported EPs to the appearance and disappearance of depth-defined form (stereo EPs) (Regan and Spekreijse, 1970). Subjects viewed a pair of identical Julesz patterns in binocular fusion. The percept was a 4.5° × 4.5° flat plane of dots. Then the left eye's pattern was abruptly replaced by a second pattern that,
outside a central 3.4° × 3.4° square, was in perfect register with the first and differed only in that the central square had an uncrossed (near) disparity with respect to the right eye's pattern (upper left inset). The percept now was a 3.4° × 3.4° square standing in front of a surround of dots. Then the left eye's patterns were abruptly exchanged again and the square disappeared, leaving a 4.5° × 4.5° flat plane of dots. Figure 13.14A-C shows that both the appearance and disappearance of the depth-defined square correlated with a brain response. But was this brain response really caused by the depth-defined form? Figure 13.14D-F shows the results of a control experiment. The static pattern viewed by the right eye was occluded, while the left eye's stimulus was unchanged. In this situation there were no disparity changes, so any brain responses would be caused by monocular stimulation. For abrupt disparity changes of 10 arc min subjects reported that the 3.4° × 3.4° central square of dots appeared to jump from side to side as a solid whole. This coherent apparent motion effect⁹ produced a clear brain response (figure 13.14D). For abrupt disparity changes of 20 and 40 arc min, however, the global motion effect was absent: individual dots appeared to move in different directions. Not only was the percept of coherent motion lost, but the brain response was also absent for the 20 arc min and 40 arc min disparity jumps (figure 13.14E, F). However, large responses were recorded in the stereo condition for the 20 arc min and 40 arc min disparity jumps (figure 13.14B, C). We concluded that these were not artifacts of monocular apparent motion. They were stereo brain responses (Regan and Spekreijse, 1970).
13.5 Dissociation of the Brain's Magnetic Responses to Texture-Defined Form and to Texton Change
Figure 13.15 shows how nonlinear behavior was exploited to demonstrate a magnetic brain response to texture-defined form. The problem was that the appearance and disappearance of texture-defined form was necessarily accompanied by a change in textons. (Textons are conspicuous local features within a texture pattern; Julesz, 1981.) Magnetic and electrical averaged evoked responses were recorded while observers viewed a pattern of 8 × 8 texture-defined checks that subtended 4° × 4° (figure 13.15A). Each stimulus consisted of an abrupt change in the orientation of the lines within alternate checks, as depicted in figure 13.15B, D, F (Regan and He, 1995). At the recording location used to obtain the data shown in figure 13.15, the abrupt changes in line orientation produced little magnetic response (figure 13.15B, C). In figure 13.15D the blank checks were filled with stationary vertical lines. (These stationary lines would not, of course, evoke any averaged magnetic response.) However, the abrupt changes in line orientation that produced little response in figure 13.15C now gave a clear magnetic response (figure 13.15E). Furthermore, the response to an abrupt vertical-to-horizontal change of orientation was quite different from the response to an abrupt horizontal-to-vertical change of orientation. In figure 13.15F the blank checks of figure 13.15B were
⁹ This coherent apparent-motion effect was later termed "short-range motion" by Braddick, who reported a series of careful studies of the phenomenon. Our 10-20 arc min estimate of the range of this effect (Regan and Spekreijse, 1970) was in good agreement with the 15 arc min estimate of Braddick (1974).
Figure 13.14: Brain responses evoked by the appearance and disappearance of depth-defined form. A-C: The central region of the static random-dot stereogram appeared to jump forward and backward in depth at a rate of 0.45 times per second. D-F: The stimulus was the same except that the eye viewing the static reference pattern was occluded so that no depth changes were seen by the subject. Therefore, the VEPs in D-F were produced by sideways apparent movement. A 10 arc min sideways movement gave the illusion that the central patch of the dot pattern moved as a whole (coherent short-range motion), and there were clear VEPs. But 20 arc min (E) and 40 arc min (F) movements produced no illusion of coherent "short-range" motion and, correspondingly, VEPs were much weaker or absent. G shows the noise level (stimulus occluded). Two repeats of each trace are shown. Reprinted from Regan, D. and Spekreijse, H. (1970) Electrophysiological correlate of binocular depth perception in man. Nature, 225: 92-94. Copyright (1970) Macmillan Magazines Limited.
filled with stationary horizontal (rather than vertical) lines. Figure 13.15G shows that an abrupt vertical-to-horizontal change of orientation now evoked a response similar to that evoked by an abrupt horizontal-to-vertical change of orientation in figure 13.15E, and vice versa. The magnetic responses shown in figure 13.15E, G are clearly not the
Figure 13.15: Magnetic brain responses to texture-defined form. A: The stimulus was an 8 × 8 checkerboard pattern of checks. B, C: Abrupt changes in the orientation of lines within alternate checks produced little magnetic response. D-G: When the blank checks were filled with stationary vertical (D, E) or horizontal (F, G) lines, the abrupt changes of orientation produced strong magnetic responses that correlated with the onset and offset of spatial form rather than with the sense of line orientation change. Reprinted from Regan, D. and He, P. (1996). Magnetic and electrical responses of the human brain to texture-defined form and to textons. J. Neurophysiol., 74: 1167-1178, with permission from the American Physiological Society. sum of responses to the time-varying checks (figure 13.15B, C) and to the stationary checks. Therefore they violate the superposition requirement for linearity. Although, as already stated, the two response waveforms in figure 13.15E, G are dissociated from the sense of line orientation change, they correlate with the appearance/disappearance of a texture-defined checkerboard pattern. In particular, the onset of form produced a double-peaked response, while form offset produced a single peak that was directed oppositely to the double peak.
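The superposition test for linearity can be illustrated with a toy model. The `evoked_response` function below is a hypothetical stand-in (a thresholded rectifier chosen only for the demonstration, not the cortical mechanism identified in the text); it shows how adding a static component can unmask a response that the time-varying component alone fails to evoke, so the response to the combination is not the sum of the responses to the parts.

```python
import numpy as np

def evoked_response(stimulus, threshold=0.5):
    """Toy generator mechanism: half-wave rectification with a threshold.
    (A hypothetical stand-in chosen only to illustrate the superposition test.)"""
    return np.maximum(stimulus - threshold, 0.0)

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
orientation_change = 0.4 * np.sin(2 * np.pi * 4 * t)  # time-varying checks alone
static_lines = np.full_like(t, 0.4)                   # stationary filled checks

r_change = evoked_response(orientation_change)        # subthreshold: no response
r_static = evoked_response(static_lines)              # static: no averaged response
r_combined = evoked_response(orientation_change + static_lines)

# Superposition fails: the response to the combined stimulus is not the sum
# of the responses to its parts, so the system is nonlinear.
print(np.allclose(r_combined, r_change + r_static))   # → False
```

Any mechanism for which this test fails cannot be linear, which is the logic used to attribute the responses in figure 13.15E, G to the texture-defined form rather than to the local orientation changes alone.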
13.6 Three Subsystems in the Steady-State Evoked Potential to Flicker and a Magno Stream/Parvo Stream Correlate in Human While I recorded steady-state EPs in London, in Amsterdam van der Tweel and Verduyn Lunel (1965) were recording time-averaged EPs to flicker and had identified subsystems near 10 Hz and near 40 Hz. It was to these subsystems that Henk Spekreijse brilliantly applied nonlinear systems analysis (Spekreijse, 1966). I confirmed these findings (Regan, 1967), showed that the 40 Hz (high frequency) subsystem had the
Figure 13.16: The three parallel flicker VEP subsystems. Evoked potentials produced by flickering an unpatterned stimulus fall into three frequency regions: "low-frequency," "intermediate-" or "medium-frequency," and "high-frequency" regions. The medium-frequency responses especially, but also the high-frequency responses, are emphasized by using large stimulus fields. The dotted line shows that the EPs to small checks (less than about 15-20 arc min) are emphasized at temporal frequencies of five to eight reversals per second. Reprinted from Regan, D. (1975) Recent advances in electrical recording from the human brain. Nature, 253: 401-407, with permission from Nature. Copyright (1975) Macmillan Magazines Limited. same spectral sensitivity as the psychophysical sensitivity curve of the human eye, and added a third (medium-frequency) subsystem peaking near 16-20 Hz (Regan, 1968, 1969). Figure 13.16 depicts the temporal tuning of the three flicker subsystems, and also that of responses to checks. Many years later, at the 1991 ARVO meeting, having escaped from an irritating lecture on the proposed role of 40 Hz brain oscillations in consciousness, Barry Lee and I sat on the wall of a bowling green and discussed his recent experiments recording from parvo and magno retinal ganglion cells in macaque monkeys. To my surprise I found that his stimuli corresponded closely to the stimuli I had used to investigate the medium-frequency (16-20 Hz) and high-frequency (40 Hz) flicker EPs in human during the 1960s and that the findings were similar. He agreed to undertake a few more experiments to bring the stimulus conditions in monkey and human into even closer comparability. I had used the classical heterochromatic flicker photometry (HFP) technique to measure spectral sensitivity in man, both psychophysically and by recording the 40 Hz response, and he had used the same technique to measure the spectral sensitivity of magnocellular (MC) pathway cells of the macaque.
The three measures gave closely similar spectral-sensitivity curves. When the relative phase between red and green lights was varied, the point of minimum subjective flicker for human observers was close to a sharp minimum found in the amplitude of the 40 Hz response in human and
was also close to a minimum in the response of MC-pathway neurons in the monkey. The human 40 Hz response saturated at between 10 and 30% modulation depth, and so did the response of MC-pathway cells in the monkey. The medium-frequency response in humans showed none of the above correlations with MC-pathway properties. Furthermore, parvocellular (PC) pathway cells responded vigorously to constant-luminance chromatic modulation at frequencies higher than can be detected by human observers. The human medium-frequency response was also strong in that stimulus situation. In addition, the effect of modulation depth on the response of PC-pathway cells showed little saturation, and this behavior was paralleled by the human 16 Hz response. We concluded that the properties of MC-pathway neurons in macaques are closely similar to the properties of the human 40 Hz response in the respects just described. We suggested that the 40 Hz response may offer a means of objectively isolating and investigating the contribution of the MC stream to cortical activity in human. In contrast, the properties of PC-pathway neurons in macaques are quite different from the properties of the human 40 Hz response, and in several respects resemble the properties of the human medium-frequency response. Staying faithful to the point that inspired our study, we noted that if the 40 Hz responses are linked to consciousness, then the physiological basis of consciousness extends as far peripherally as the retina. This note was removed by the editor (Regan and Lee, 1993).
13.7 The Frequency Tagging Technique: Some Early Applications
While listening to a BBC radio program on migraine that described a technique for injecting radioactive gas into the brain's blood supply, it occurred to me that asymmetry in the excitability of the left and right halves of the brain might be assessed less alarmingly. The idea was to stimulate simultaneously the left visual half-field at frequency F1 and the right visual half-field at frequency F2, with F2 - F1 being so small (e.g., 0.1 Hz) that the two frequencies were identical so far as brain function was concerned yet could easily be separated on-line by two parallel Fourier analyses. Separate responses from the left and right hemispheres could then be recorded simultaneously from a single electrode derivation (Regan and Heron, 1969). The technique did indeed detect changes in response asymmetry in patients experiencing migraine attacks, though a bucket had to be added to the lab equipment. Our presentation at a meeting sponsored by the Migraine Society (Regan and Heron, 1970) attracted the attention of several prominent individuals whose misfortune it was to suffer from migraine, most notably Princess Margaret (President of the Migraine Society), who later paid a private visit to my laboratory to inspect the equipment and urge me on to greater efforts. After this promising start, patients with severe debilitating migraines were brought to a hospital ward where we installed the equipment. The intent was to compare the effects of medications administered through catheters. We failed. Three successive patients were so impressed by the drama and by the scientific
equipment that they experienced their longest migraine-free periods for years. It seems that we had created an extremely expensive placebo. The frequency-tagging technique was later expanded to the simultaneous recording from four retinal sites of responses to pattern and to flicker (Cartwright and Regan, 1974). More recent applications of frequency tagging are described in sections 13.10 and 13.11 below.
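The frequency-tagging idea can be sketched in a few lines: two responses tagged 0.1 Hz apart are recovered independently from a single simulated recording by Fourier analysis, provided the record is long enough for the two tags to fall in separate frequency bins. All the signal parameters here (sampling rate, amplitudes, noise level) are invented for the illustration.

```python
import numpy as np

fs = 256.0                               # sampling rate, Hz (assumed)
T = 60.0                                 # record length, s -> 1/60 Hz bin spacing
t = np.arange(0, T, 1 / fs)
f_left, f_right = 8.0, 8.1               # tag frequencies 0.1 Hz apart

rng = np.random.default_rng(0)
eeg = (1.0 * np.sin(2 * np.pi * f_left * t)      # left-hemisphere response
       + 0.5 * np.sin(2 * np.pi * f_right * t)   # right-hemisphere response
       + rng.normal(0.0, 2.0, t.size))           # background EEG noise

freqs = np.fft.rfftfreq(t.size, 1 / fs)
amp = 2 * np.abs(np.fft.rfft(eeg)) / t.size      # single-sided amplitude spectrum

# The two tags occupy separate bins, so both responses are read out from
# one "electrode derivation".
amp_left = amp[np.argmin(np.abs(freqs - f_left))]
amp_right = amp[np.argmin(np.abs(freqs - f_right))]
print(f"left tag: {amp_left:.2f}, right tag: {amp_right:.2f}")
```

With a shorter record the two tags would fall within one resolution bin (spacing 1/T Hz) and could no longer be separated, which is why the small frequency difference demands a long Fourier analysis window.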
13.8 The Sweep Method: A Fast Hybrid Technique for Gathering Data within a Short Period and for Measuring Adaptation and Other Nonstationary Processes
During a visit to UC Berkeley in 1971, I was told that one of the "wish list" lines of research in the current NEI five-year plan was the development of a rapid objective method for prescribing spectacles, a method that could be operated by a technician. I rashly stated that the problem was easily solved and was immediately told that talk is easy and challenged to a wager. I eventually developed a technique that exploited the immediacy of the response of the running average display in figure 13.5 to a change in the visual stimulus (Regan, 1973b). For example, the axis of corneal astigmatism could be identified objectively by viewing a rapidly reversing pattern of small checks through a narrow, slowly rotating slit placed just in front of the cornea. When the orientation of the slit coincided with the axis of corneal astigmatism, the checks looked sharp and the EP rose to a maximum. Figure 13.17A shows a continuous recording taken during four successive rotations of the slit to illustrate a consistent indication of a 45° axis of astigmatism. One rotation was sufficient to identify the axis of astigmatism. For comparison, figure 13.17B illustrates the conventional static method for identifying the axis of astigmatism. The slit was set to each of 12 orientations, and an EP was recorded at each orientation. The 12 data points were then connected by lines. The recording time required to identify the axis of astigmatism was 30 min using the conventional method compared with as little as 10 s for the sweep method. The refraction was completed by sweeping the power of a special lens placed in front of the eye while the checkerboard pattern was viewed through the slit, first set along the axis of astigmatism, and then perpendicular to the axis. An eye could be completely refracted within a total recording duration of 60 s. The difficulties of getting the sweep technique to work reliably made me often regret my rash statement.
It was only the realization that, rather than being a mere technical exercise, the method might be useful in recording from preverbal infants and young children that made the effort seem worthwhile. Figure 13.18 shows a display for monitoring the visual status of an amblyopic child through months of patching therapy. (The good eye is penalized by wearing a patch to allow the amblyopic eye to regain synaptic connections to binocularly driven cortical neurons.) The check size of a rapidly reversing checkerboard pattern was slowly zoomed from small to large. In an attempt to ensure that the child looked at the rapidly reversing checkerboard
Figure 13.17: Identification of the axis of corneal astigmatism by the sweep VEP method and by the conventional static method. The subject viewed a checkerboard pattern whose bright and dim checks abruptly exchanged places six times per second. A slowly rotating narrow slit was placed immediately in front of the subject's cornea. A: Continuous recording of the running average of the amplitude of the 6 Hz component of the brain response during four rotations of the slit. B: Data points plot the amplitude of the 6 Hz component of the brain response for 12 static orientations of the slit. Reprinted from Regan, D. (1973) Rapid objective refraction using evoked brain potentials, Invest. Ophthalmol., 12: 669-679, with permission from ARVO.
with sharpest accommodation, a cartoon movie was superimposed on it. The temporal frequencies in the cartoon would not affect the Fourier analyzer, which was locked to the check reversal rate: the child looked at the movie and the analyzer "looked" at the checks. Figure 13.19 shows plots of EP amplitude versus check size for the good eye and the amblyopic eye recorded at an early stage of patching therapy. As the months proceeded, the curve for the amblyopic eye became more and more similar to the curve
Figure 13.18: Sweep VEP display for use in testing amblyopic infants. The appearance of the superimposed cartoon movie and the checkerboard pattern at one point in the zoom of check size. The cartoon was quite visible, but its presence did not seriously reduce the amplitude of the brain response to the rapidly reversing and slowly zooming checks. Reprinted from Regan, D. (1977) Speedy assessment of visual acuity in amblyopia by the evoked potential method. Ophthalmologica, Basel, 175: 159-164 with permission from Karger AG, Basel.
Figure 13.19: Sweep VEP data from an amblyope undergoing patching therapy: plots of brain response amplitude versus check size recorded by continuously zooming the check size. Reprinted from Regan, D. (1977) Speedy assessment of visual acuity in amblyopia by the evoked potential method, Ophthalmologica, Basel, 175: 159-164, with permission from Karger AG, Basel. for the good eye (Regan, 1977). The single-sweep technique is now widely used. For example, its speed was put to good use by Christopher Tyler, Tony Norcia, and colleagues to take advantage of the short-lived period of cooperation offered by young babies. In a tour de force of developmental research on a large population, they documented the development of visual acuity and contrast sensitivity from the age of 10 weeks to 12 months (Norcia and Tyler, 1985; Norcia et al., 1986, 1988, 1990). Without these data it would not be possible to use EP recording to detect developmental abnormalities in preverbal infants - an important application of EPs. The speed of the single-sweep technique has also been exploited in studies of adaptation. For example, figure 13.20 illustrates the use of the single-sweep method by Nelson et al. (1984) to demonstrate rapid adaptation to grating contrast in human (automatic gain control). The power of the sweep concept is not restricted to situations where the recording duration must necessarily be short. An extension of the single-sweep technique, namely the sweep-averaging technique, has the property that it is resistant to certain kinds of noise and signal variability. This hybrid frequency-domain/time-domain technique allows a graph to be obtained with much higher accuracy than when it is plotted conventionally point by point (Regan, 1974, 1975a). In effect, several rapid samples of the entire graph are taken in succession and the entire graphs are averaged.
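The sweep-averaging idea can be sketched numerically: each rapid sweep is a noisy sample of the entire graph, and averaging whole sweeps reduces the noise roughly as the square root of their number. The response curve and noise level below are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
contrast = np.linspace(0.0, 1.0, 100)                     # swept variable (abscissa)
true_curve = np.clip(4.0 * (contrast - 0.2), 0.0, None)   # hypothetical amplitude-vs-contrast graph

# Each rapid sweep samples the whole graph once, with recording noise added;
# 16 entire graphs are then averaged, as in figure 13.21C.
sweeps = true_curve + rng.normal(0.0, 1.0, (16, contrast.size))
average = sweeps.mean(axis=0)

rms_single = np.sqrt(np.mean((sweeps[0] - true_curve) ** 2))
rms_average = np.sqrt(np.mean((average - true_curve) ** 2))
# The error ratio is close to sqrt(16) = 4.
print(round(rms_single / rms_average, 1))
```

Because every sweep covers the full abscissa, slow drifts in overall responsiveness distort each sweep by roughly the same factor rather than corrupting individual points, which is the sense in which the method resists certain kinds of signal variability.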
Figure 13.20: Visual adaptation to contrast: rapid measurement of contrast threshold by the sweep VEP method. The continuous lines are upward sweeps starting at 0.1% contrast before adapting for 60 s to a 75% contrast grating. The dashed lines are upward sweeps recorded immediately after adaptation, showing a roughly threefold increase of threshold. But note that this threshold was lower than the preadaptation threshold for 20-0.1% downward sweeps of contrast. Reprinted from Nelson, J. I. et al. (1984) A rapid evoked potential index of cortical adaptation, Invest. Ophthalmol. Vis. Sci., 59: 454-464, with permission from ARVO.
Ordinates in figure 13.21 plot the amplitude of the 2F frequency component of the steady-state EP recorded while a subject viewed a pattern of bright and dim monochromatic red checks that abruptly exchanged places 2F times/s (2F = 6 in this example). A monochromatic red patch of unpatterned light was superimposed on the reversing checks and its luminance was ramped so as to progressively reduce the contrast of the checks: the log spatial contrast of the checks was ramped linearly (abscissa). Figure 13.21A shows four individual sweeps; figure 13.21B is the average of the four traces in A and shows improved signal-to-noise ratio; each of the two traces in figure 13.21C is the average of 16 sweeps, while the noise trace is the average of 16 sweeps made with the light stimulus occluded.
13.9 Response Spectrum Recorded at Ultra High Resolution: Nondestructive Zoom-FFT
Before going on to the next section we should step aside to discuss a signal-to-noise issue. The theoretical limit of resolution in the spectrum of a time series, the Gabor-Pauli-Heisenberg limit, is given by ΔF = 1/ΔT, where ΔT is the length of the time series in seconds and ΔF Hz is the frequency resolution.¹⁰ Although the simple analog
¹⁰ The equation includes a scaling constant that depends on how one chooses to describe the frequency bandwidth ΔF and how one chooses to define the duration of the signal. Gabor (1946) chose the root mean
Figure 13.21: The sweep averaging technique: an example. See text for details. Reprinted from Regan, D. (1975) Colour coding of pattern responses in man investigated by evoked potential feedback and direct plot techniques. Vis. Res., 15: 175-183. Copyright 1975 with permission from Elsevier. technique described in section 13.2 can deliver this theoretical limit of resolution for harmonic and even cross-modulation frequency components, it is inconvenient. The FFT algorithm is convenient, but the best possible frequency resolution provided by the approach is determined by the number of samples used to describe the time series and is far below the Gabor-Pauli-Heisenberg limit for long recording durations. The zoom-FFT technique was adapted to nondestructive form (nondestructive zoom-FFT) by Marian Regan so as to attain the theoretical limit of frequency resolution over an indefinitely wide bandwidth (see also Regan, 1989, pp. 98-108; M. P. Regan and D. Regan, 1988). For example, after recording brain responses for 520 s while the obsquare (RMS) definition of bandwidth and duration, giving AF = 1/2 A T. The AF = I/ A T version implies a different definition of bandwidth in which AF is measured between the first two zeros in the amplitude spectrum.
David Regan and Marian P. Regan
245
Figure 13.22: EPs analyzed by nondestructive zoom-FFT. The stimulus was a patch of light nickering at FI Hz superimposed on a second patch flickering at F% Hz. This is a single spectrum over 0.5-49.5 Hz bandwidth that has been cut into sections for convenience - the ordinate for the 0.5-17 Hz section is 300 and for the 35-47 Hz section, 6. The EEG was analyzed by nondestructive zoom-FFT at a resolution of 0.0039 Hz. The steady-state EP consisted of discrete frequency components whose bandwidth were less than 0.0039 Hz. Cross-modulation terms were due to nonlinear interactions between neural responses to the two flickering lights. server viewed a counteiphase-modulated grating, a harmonic component of the evoked potential fell within a bandwidth of 0.002 Hz (Regan, 1989, figure 1.70B). A bandwidth of 0.004 Hz or 0.008 Hz can be used on an everyday basis, as illustrated in figure 13.22. The point of this high resolution is that all the signal power is concentrated into a few very narrow regions along the frequency axis whereas the noise is distributed continuously along the frequency axis. This allows very weak signals to be recorded at high signal-to-noise levels. For example, the (2Fi + 4F2) component of the steady-state EP near 45 Hz in figure 13.22 is about 20,000 times below the noise level. This topic is discussed more fully in Regan (1989, pp. 103-111). It was perhaps fortunate that we had already collected a good deal of data before we presented this line of research to a critical audience. A distinguished vision sci-
246
Processing Spatial Form
Figure 13.23: Amplitude modulation and phase modulation of a sinusoid by a noise waveform. A: Spectrum of modulating noise waveform (DC, 5 Hz). B: Spectrum of 150 Hz sinusoid. C: Spectrum of sinusoid amplitude-modulated by noise waveform. D: As C, but amplitude modulation stronger. E: Spectrum of sinusoid phase-modulated by noise waveform. Reprinted from Regan, D. (1989) Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine. New York: Elsevier. Copyright 1989 with permission from Elsevier. entist with an impeccable physics background had an initial difficulty in believing our findings because, as he pointed out, moment-to-moment variability of the EP signal would broaden the bandwidth of all the frequency components. And indeed, standard physics and engineering textbooks show how multiplying a sinusoid of frequency F Hz by a noise waveform broadens the narrow F Hz line in the spectrum.l' So we went back to our laboratory and collected the data shown in figure 13.23. Rather than multiplying a sinusoid by a noise waVeform, we multiplied the sinusoid by (1 + fc(noise waveform)) where the second term was less than 1.0, so as to modulate the sinusoid. Figure 13.23A, B shows, respectively, the spectrum of the DC-5 Hz noise and the narrow spectral line of a 150 Hz sinusoid whose bandwidth (0.0078 Hz) was determined by the recording duration. Figure 13.23C shows the spectrum of the noise-modulated sinusoid (30% modulation depth). The bandwidth of the upper part of the spectral line is still 0.0078 Hz, but the base is broadened to form a pedestal of total width 10 Hz (twice the noise bandwidth). Figure 13.23D shows that increasing the modulation depth raises the pedestal without increasing its width. Temporal jitter of the steadystate EP was simulated by phase-modulating the sinusoid by the noise waveform. The amplitude of the phase modulation was 40°. 
Figure 13.23E shows that phase modulation had a similar result to amplitude modulation. Figure 13.23 demonstrates that, provided the noisy variations in the amplitude and phase of a steady-state EP frequency component are not too great, the bandwidth of a steady-state EP component is set by the recording duration, and its height above the noise pedestal is determined by the size of the noisy variations of amplitude and phase.

11 This provides a standard means of generating narrowband noise centered on any desired frequency fc Hz. A white noise source is filtered so that the bandwidth is DC-FMAX where FMAX

David Regan and Marian P. Regan
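The arithmetic behind this can be sketched in a few lines of NumPy (a simplified simulation, not the authors' recording pipeline; sample rate, duration, and amplitudes are illustrative): with a T-second record the FFT bin width is 1/T Hz, so a 256 s record gives roughly the 0.0039 Hz resolution quoted above, and a periodic component far weaker than the broadband noise still dominates its own bin.

```python
import numpy as np

# Why ultra-narrowband analysis works: with a T-second record the FFT bin
# width is 1/T Hz.  All of a steady-state component's power falls into one
# bin, while broadband noise power is spread over every bin, so a periodic
# signal far weaker than the noise still dominates its own bin.
rng = np.random.default_rng(0)
fs = 200.0          # sample rate in Hz (illustrative)
T = 256.0           # recording duration in s; bin width 1/256 ~ 0.0039 Hz
n = int(fs * T)
t = np.arange(n) / fs

f_sig = 45.0        # frequency of a weak steady-state component (Hz)
signal = 0.2 * np.sin(2 * np.pi * f_sig * t)   # far weaker than the noise
noise = rng.normal(0.0, 1.0, n)

spectrum = np.abs(np.fft.rfft(signal + noise)) / n
freqs = np.fft.rfftfreq(n, 1.0 / fs)

peak_freq = freqs[np.argmax(spectrum[1:]) + 1]   # ignore the DC bin
print(peak_freq)   # the 45 Hz component dominates the whole spectrum
```

Doubling the recording duration halves the bin width and therefore halves the noise power falling into the signal's bin, which is the sense in which resolution buys signal-to-noise ratio here.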
13.10 Measurement of the Orientation, Spatial Frequency, and Temporal Tuning of Spatial Filters by Exploiting the Nonlinearity of Neurons Sensitive to Spatial Form
The method next described can, in principle, be used to measure the tuning characteristics of neural mechanisms within any sensory modality. To illustrate the general idea we choose the particular example of a neural mechanism sensitive to spatial contrast. The rationale in this particular case is as follows. When the eye is stimulated by two superimposed gratings, it may be that the EP is not equal to the sum of the EPs to the individual gratings; i.e., the visual system is behaving nonlinearly (appendix 13.A1). This departure from linearity can be gross. The crucial point is that nonlinear grating-grating interactions in the brain response can only be produced if the neural mechanism that generates the brain response "sees" both gratings simultaneously. In other words, nonlinear interaction terms cannot be generated if only one grating is within the spatial frequency, temporal frequency, and orientation bandwidth of the generator mechanism. Given this rationale, the tuning bandwidths of the mechanism that responds to any given grating can be estimated in the following way (Regan, 1983):

1. Spatial frequency bandwidth: Superimpose on the reference grating a grating of closely similar orientation and closely similar counterphase-modulation frequency, and then vary the spatial frequency of this second grating.

2. Orientation bandwidth: Superimpose on the reference grating a second grating of closely similar spatial frequency and closely similar counterphase-modulation frequency, and then vary the orientation of this second grating.

3. Temporal bandwidth: Superimpose on the reference grating a second grating of closely similar spatial frequency and orientation, and then vary its counterphase-modulation frequency.
The nonlinear grating-grating interactions between a grating that exchanges bright and dim bars 2F1 times per second and a grating that exchanges bars 2F2 times per second comprise (1) suppression of the 2F1 Hz, 2F2 Hz, and other harmonic components of the response and (2) generation of cross-modulation components of frequency (nF1 ± mF2) Hz (where n and m are integers). Figure 13.24 illustrates this point. Figure 13.24A shows the 2F1 and 4F1 harmonics in the response to a grating that was weakly counterphase-modulated at F1 Hz. Figure 13.24B shows the result of superimposing a second grating on the first grating, the second grating being strongly counterphase-modulated at F2 Hz. The 2F1 and 4F1 components in figure 13.24A are abolished, and cross-modulation components are evident. Figure 13.24C shows that the effects shown in B were specific to the processing of spatial form. In figure 13.24C, rather than a second grating, a strongly flickering homogeneous patch of
Figure 13.24: Nonlinear interaction between two gratings. Sections of the DC-100 Hz power spectrum are shown at ultra-narrowband 0.0078 Hz resolution (nondestructive zoom-FFT). A: The stimulus was a single grating counterphase-modulated at 7.938 Hz. B: A second grating, counterphase-modulated at 7.080 Hz, was superimposed on the grating in A. C: An unpatterned field flickering at 7.080 Hz was superimposed on the grating in A. The terms boxed in B index nonlinear interactions. Both gratings were vertical, subtended 10°, and had a spatial frequency of 5 cycles/deg. Reprinted from Regan, D. and Regan, M. P. (1988) Objective evidence for phase-independent spatial frequency mechanisms in the human visual pathway. Vis. Res., 28: 187-191. Copyright 1988 with permission from Elsevier.
Figure 13.25: Spatial frequency tuning and orientation tuning of a brain mechanism sensitive to spatial form. Steady-state brain responses of frequency 2F Hz produced by F Hz counterphase-modulation of a fixed vertical grating presented alone (A) and with a second (variable) grating superimposed on it (B, C). Reprinted from Regan, D. (1983) Spatial frequency mechanisms in human visual responses to two-dimensional patterns and a limitation of Fourier methods. Vis. Res., 27: 2181-2183. Copyright 1983 with permission from Elsevier.
light was superimposed on the grating in figure 13.24A. It is evident that the 2F1 and 4F1 components were not suppressed, and the cross-modulation components present in figure 13.24B were absent (Regan and Regan, 1987, 1988). Figure 13.25B shows how suppression of the 2F1 component of figure 13.25A was used to show that the broad spatial frequency tuning bandwidth observed when only one grating was used (figure 13.25A) was composed of multiple narrow-bandwidth submechanisms even at contrast levels that were well above threshold (Regan, 1983). When fixed gratings of different spatial frequencies were used, the maximum suppression always occurred when the spatial frequencies of the two gratings were equal. The temporal-frequency tuning of any one of these submechanisms fell into one of only two classes: lowpass or bandpass (Regan, 1983).
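The rationale can be illustrated with a minimal numerical sketch (frequencies are illustrative, and a compressive square-root rectifier stands in for the unknown neural nonlinearity): only the nonlinear system produces a response at the cross-modulation frequency (F1 + F2).

```python
import numpy as np

# Why cross-modulation terms index a nonlinearity: feed the sum of two
# sinusoids (F1 = 8 Hz, F2 = 7 Hz) through (a) a linear gain and (b) a
# compressive half-wave rectifier, and measure the (F1 + F2) = 15 Hz
# component of each output.  Only the nonlinear system produces it.
fs, T = 1000.0, 10.0
t = np.arange(int(fs * T)) / fs
f1, f2 = 8.0, 7.0
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

def amplitude_at(y, f):
    """Amplitude of the f Hz component of y (f * T is an integer bin here)."""
    spec = np.abs(np.fft.rfft(y)) / len(y) * 2
    return spec[int(round(f * T))]

linear_out = 2.0 * x                              # obeys superposition
rectified = np.sqrt(np.maximum(x, 0.0))           # compressive half-wave rectifier

print(amplitude_at(linear_out, f1 + f2))   # ~0: no cross-modulation term
print(amplitude_at(rectified, f1 + f2))    # clearly nonzero
```

Because the 15 Hz term can only arise when one mechanism receives both inputs, its presence or absence is the diagnostic exploited in measurements 1-3 above.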
Figure 13.25C shows that suppression of the 2F1 Hz component was greatest when the two gratings were parallel. These nonlinear interactions were much weaker when the gratings differed in orientation by about 40°. This finding is consistent with a full half-sensitivity bandwidth of about 25° for orientation-tuned cortical neurons. But nonlinear interactions were again strong when the two gratings were at right angles. This finding could not have been predicted from data obtained with a single grating. It can be understood if there is a strong nonlinear interaction between orthogonal orientation-tuned neural mechanisms (Regan and Regan, 1987).
13.11 A Visual-Auditory Convergence Area in the Human Brain
A variant of the frequency-tagging method described in section 13.7 can be used to investigate and localize brain regions where information about two (or even three) sensory modalities converges. An illustration follows: Subjects viewed a light flickering at FV Hz while listening to an auditory tone that was amplitude-modulated at FA Hz. The auditory pathway of the brain could generate responses at harmonic frequencies of FA (i.e., FA, 2FA, 3FA, etc.) and the visual pathway of the brain could generate responses at harmonic frequencies of FV (i.e., FV, 2FV, 3FV, etc.), but cross-modulation frequencies [(FA + FV), (2FA + 2FV), (2FA − 2FV), etc.] could be generated only after auditory and visual signals had converged. Cross-modality cross-modulation components were, therefore, a signature of audiovisual convergence areas of the brain (Regan, He, and Regan, 1995). The magnetic field of the brain was analyzed in the frequency domain at 0.008 Hz resolution at recording sites that are marked + or − in figure 13.26A-C according to the phase of the response. The lines are isofield contour maps for the 2FV (figure 13.26A), the 2FA (figure 13.26B), and the (2FA + FV) (figure 13.26C) components of the brain's evoked magnetic field. Magnetic source location placed the intracranial source of the (2FA + FV) field component approximately 2 cm inferior to the source of the 2FA component, in fair agreement with the relative locations of primary auditory cortex and an audiovisual convergence area in monkey brain (Tigges and Tigges, 1985).
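The bookkeeping behind this signature is simple enough to capture in a small helper. The function name and the order cutoff below are hypothetical, for illustration only: a spectral peak that is a harmonic of one tag frequency could come from either pathway alone, whereas a peak at nFA ± mFV (n, m ≥ 1) requires convergence.

```python
# A convergence area is indexed by responses at cross-modulation
# frequencies nFA ± mFV (n, m >= 1): neither sensory pathway alone can
# produce them.  Classify an observed spectral peak given the two tag
# frequencies FA and FV (helper name and max_order are illustrative):

def classify_peak(f, fa, fv, max_order=4, tol=1e-6):
    """Label f Hz as 'auditory', 'visual', 'cross-modal', or 'unknown'."""
    for n in range(1, max_order + 1):
        if abs(f - n * fa) < tol:
            return "auditory"            # harmonic of FA alone
        if abs(f - n * fv) < tol:
            return "visual"              # harmonic of FV alone
    for n in range(1, max_order + 1):
        for m in range(1, max_order + 1):
            if abs(f - (n * fa + m * fv)) < tol or abs(f - abs(n * fa - m * fv)) < tol:
                return "cross-modal"     # requires audiovisual convergence
    return "unknown"

print(classify_peak(16.0, fa=8.0, fv=7.0))   # 2FA: auditory pathway alone
print(classify_peak(15.0, fa=8.0, fv=7.0))   # FA + FV: convergence signature
```

In practice the tag frequencies are chosen to be incommensurate enough that no low-order cross-modulation frequency coincides with a harmonic of either tag.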
13.12 A Frequency Domain Technique for Testing Nonlinear Models of the Human Visual System
When we first recorded ultra-high-resolution spectra to sum-of-two-sinusoids stimulation such as figure 13.22, we could not replicate the spectra. Attempts to replicate produced spectra in which different frequency components did not have the same relative power, and some were even absent. Then we controlled the stimulus conditions much more precisely and found that the spectrum for a given stimulus condition replicated well and that the spectrum was exquisitely sensitive to the relationship between the two sinusoids. Furthermore, a sum of two flickering lights gave a quite different pattern of spectra than a sum of two counterphase-modulated gratings. Clearly, there
Figure 13.26: Audiovisual convergence area in the human brain: isofield contour maps of magnetic response amplitude recorded from the human brain during simultaneous visual and auditory stimulation. A: Responses generated in the visual system. B: Responses generated in the auditory system. C: Responses generated in an audiovisual convergence area. The circle round the map is a best fit to the curvature of the skull over the region of the head from which recordings were made. Recording sites are marked + and − according to the phase of the response. The viewpoint was from the left side of the head in B and C and from directly behind the head in A. The inion is marked with a filled circle in A. The left preauricular point is marked with a filled triangle in B and C. Reprinted from Regan, M. P., He, P. and Regan, D. (1995) An audio-visual convergence area in human brain. Exp. Brain Res., 106: 485-487. Copyright Springer-Verlag.

was regularity here. We had discovered empirically that there was a complex fine structure in the brain response that was very sensitive to the stimulus parameters. But what did it mean? This was as far as author DR could go. It was MPR who developed the theoretical understanding described next. Response asymmetry is a frequently encountered property of single neurons at both peripheral and central levels in mammalian visual pathways. For example, asymmetrical responses to increase versus decrease of light intensity (Schiller et al., 1986),
Figure 13.27: Half-wave rectifier characteristics. A: Compressive. B: Accelerating. C: Initially accelerating, then compressive. Reprinted from Regan, M. P. and Regan, D. (1988) A frequency domain technique for characterizing nonlinearities in biological systems. J. Theoret. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.

spatial contrast (Movshon et al., 1978), leftwards versus rightwards motion (Barlow and Levick, 1965), and receding versus approaching motion in depth (Cynader and Regan, 1978; Poggio and Talbot, 1981) are well-documented properties of visual pathway neurons of cats and nonhuman primates. Because single-neuron electrophysiology is rarely ethical in humans, little is known about human single-neuron electrophysiology, but there is ample psychophysical and EP evidence that response asymmetries are common in the human visual system. For example, the human visual pathway contains channels for increasing versus decreasing luminance (Clynes et al., 1964), increasing versus decreasing contrast (Spekreijse et al., 1972), leftwards versus rightwards motion (Sekuler et al., 1978), and the direction of motion in depth (Beverley and Regan, 1973; Hong and Regan, 1989). The general class of asymmetrically responding neurons encompasses a variety of nonlinear characteristics. Asymmetric compressive characteristics (figure 13.27A) have been described in single-cell responses to luminance contrast (DeValois et al., 1982) and velocity (Orban, 1985). Asymmetric accelerating characteristics (figure 13.27B) have been postulated to explain behavioral data on motion detection (Reichardt et al., 1983). An asymmetric characteristic that combines compression and acceleration with a threshold (figure 13.27C) has been proposed to describe contrast transduction in human vision (Legge and Foley, 1980). One kind of response asymmetry, half-wave rectification, is illustrated in figure 13.28A, B.
A half-wave rectifier's response to a sinusoidal input always has the same polarity - positive in this case. Two half-wave rectifiers can be combined to give a symmetric (figure 13.28C) or asymmetric (figure 13.28D) full-wave rectifier. A physiological approximation to constant-polarity full-wave rectification (figure 13.28C or D) is offered by the on-off organization of the visual pathway in primates. Luminance increase (on) and decrease (off) are both signaled by excitation along parallel pathways that differ both in connectivity and neurochemistry (Schiller et al., 1986).12 Even when the rectifier limbs are straight, as in figure 13.28, the response of a single rectifier to an F Hz input signal contains higher harmonics of the input frequency rather than being

12 The physiological advantage of this arrangement is that it provides a combination of signaling speed and metabolic efficiency: the reference level of luminance is signaled by inactivity of both on- and off-channels (Schiller et al., 1986).
Figure 13.28: Signal distortion caused by rectifiers. The input (vertical arrow) is sinusoidal. The output is depicted along the x-axis. A, B: Half-wave. C: Symmetric full-wave. D, E: Asymmetric full-wave. Reprinted from Regan, M. P. and Regan, D. (1988) A frequency domain technique for characterizing nonlinearities in biological systems. J. Theoret. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.

an F Hz sinusoid, as is the case for a linear system. A second point - and the crux of this section - is that the response of a single rectifier to an input that consists of the sum of two sinusoids of frequency F1 and F2 generally contains multiple discrete terms of frequency (nF1 ± mF2), where n and m are zero or integers. Marian P. Regan developed a mathematical procedure for obtaining the output of one or more asymmetrically responding elements when fed with the sum of two sinusoidal inputs (Regan and Regan, 1988). Appendix 13.A3 summarizes this mathematics. Rather than activating a single asymmetrically responding neuron, it is usually the case that a visual stimulus activates a cascaded series of asymmetrically responding neurons whose characteristics may all differ (e.g., compressive, accelerating, or mixed; see figure 13.27). These neurons may be AC-coupled or DC-coupled. Solutions have been obtained by author MPR for up to five rectifiers in series with different characteristics, thus modeling up to five cascaded neural transformations. She has also derived solutions for the dichoptic case where several nonlinear transformations occur in the left and right monocular channels followed by further nonlinear transformations after binocular convergence. This approach is convenient for analyzing a physiological system from the standpoint of parallel processing, and also in terms of "dissecting" the hierarchical sequence of processing. The theoretical result takes the following form.
If one of the two sinusoidal inputs is held at constant amplitude while the amplitude of the other input is varied, we obtain a family of curves - one for each of the nonlinear cross-modulation (nF1 ± mF2) components and one for each harmonic of the two input frequencies.
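The construction of such a family of curves can be sketched numerically (a simplified stand-in for the analytic treatment of appendix 13.A3, with illustrative frequencies; here a brute-force FFT replaces the closed-form solutions):

```python
import numpy as np

# Sketch of the "fingerprint" idea: hold A1 fixed, sweep k = A2/A1, pass
# the two-sinusoid sum through three cascaded compressive (square-root)
# rectifier stages, and track how each output component grows with k.
# After the first stage the signal is already non-negative, so the later
# stages act as further compressive power laws.
fs, T = 2000.0, 4.0
t = np.arange(int(fs * T)) / fs
f1, f2 = 8.0, 7.0

def sqrt_rectifier(x):
    return np.sqrt(np.maximum(x, 0.0))

def component(y, f):
    spec = np.abs(np.fft.rfft(y)) / len(y) * 2
    return spec[int(round(f * T))]          # f * T is an integer bin here

a1 = 1.0
for k in (0.5, 1.0, 2.0):
    x = a1 * np.sin(2 * np.pi * f1 * t) + k * a1 * np.sin(2 * np.pi * f2 * t)
    y = sqrt_rectifier(sqrt_rectifier(sqrt_rectifier(x)))   # three cascaded stages
    row = [component(y, f) for f in (f1, f2, f1 + f2)]
    print(k, [round(v, 4) for v in row])   # one point per curve: F1, F2, F1+F2
```

Rerunning the sweep with a different number of stages, or with accelerating rather than compressive limbs, changes the whole family of curves - which is what makes the family a fingerprint of the model.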
Figure 13.29: Mathematical basis for testing multineuron nonlinear models. Ordinates plot the amplitudes of frequency terms in the output of three cascaded square-root rectifiers whose input is the sum of two sinusoids, one of frequency F1 and the other of frequency F2. The amplitude (A1) of the F1 input was held constant while the amplitude (A2) of the F2 input was varied. Values of k are plotted along the abscissa, where k = A2/A1. Reprinted from Regan, M. P. and Regan, D. (1988) A frequency domain technique for characterizing nonlinearities in biological systems. J. Theoret. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.
Figure 13.29 shows an example of such a family of curves, in this case for a model consisting of three square-root (compressive) rectifiers in sequence. This family of curves changes greatly when the number of cascaded stages or the shape of the rectifier arms is changed. The main finding is that, for all the cases so far investigated, this family of curves is characteristic of the nonlinear model. When many high-order terms are taken into account, the specificity is so high that it may not be fanciful to regard the family of curves as a kind of "fingerprint" of the model. The nondestructive zoom-FFT technique allows different models to be compared against data. Up to 30 discrete frequency components can be recorded, including terms up to the tenth order, thus allowing sharp testing of models. For example, figure 13.30 shows the following three simultaneously recorded responses: a 2F1 Hz response produced by stimulating one eye with F1 Hz flicker, a 2F2 Hz response produced by stimulating the other eye with F2 Hz flicker, and a nonlinear response component generated by binocular neurons. This last response can be used to demonstrate the presence of
Figure 13.30: Demonstration of binocular neurons. One eye viewed a homogeneous field flickering at F1 = 8 Hz while the other eye viewed a similar homogeneous field flickering at F2 = 7 Hz. The EEG spectrum was recorded at a resolution of 0.004 Hz by zoom-FFT. The (F1 + F2) component is due to a nonlinear interaction between signals from the left and right eyes, demonstrating the presence of binocular neurons. From Regan, M. P. and Regan, D. (1989) Objective investigation of visual function using a nondestructive zoom-FFT technique for evoked potential analysis. Can. J. Neurolog. Sci., 16: 168-179. Reprinted with permission from the Canadian Journal of Neurological Sciences.
binocular neurons even in patients with very low acuity in both eyes. Figure 13.31 shows how the theoretical approach described above can be used to test models of such binocular processing (e.g., to test whether a patient's binocular neurons function normally). The data shown in figure 13.31 rejected many plausible models, and were most compatible with two monocular linear rectifiers that fed a binocular compressive (square root) rectifier (Regan and Regan, 1989).
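The best-fitting architecture just described - two monocular linear rectifiers feeding a binocular compressive stage - can be simulated directly as a sketch (stimulus parameters are illustrative; the real fit used the full family of component amplitudes, not a single term):

```python
import numpy as np

# Sketch of the model most compatible with the data of figure 13.31: each
# eye's flicker signal passes through a monocular half-wave linear
# rectifier, the rectified signals sum at binocular convergence, and the
# sum passes through a compressive (square-root) rectifier.  The (F1+F2)
# cross-modulation term appears only after the compressive binocular stage.
fs, T = 1000.0, 5.0
t = np.arange(int(fs * T)) / fs
f1, f2 = 8.0, 7.0

left = np.sin(2 * np.pi * f1 * t)    # left-eye flicker
right = np.sin(2 * np.pi * f2 * t)   # right-eye flicker

half_wave = lambda x: np.maximum(x, 0.0)                  # monocular linear rectifier
monocular_sum = half_wave(left) + half_wave(right)        # before binocular nonlinearity
binocular = np.sqrt(monocular_sum)                        # compressive binocular stage

def component(y, f):
    spec = np.abs(np.fft.rfft(y)) / len(y) * 2
    return spec[int(round(f * T))]

print(component(monocular_sum, f1 + f2))   # ~0: each rectifier sees one eye only
print(component(binocular, f1 + f2))       # nonzero: binocular interaction
```

Note that summing the two monocular outputs is itself linear, so no cross-modulation term arises until the compressive stage acts on the combined signal - exactly the logic by which the (F1 + F2) component indexes binocular neurons.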
Figure 13.31: Investigation of binocular neurons. Ordinates plot the powers of some of the frequency components in the evoked potential recorded while a subject viewed a homogeneous field flickering at 9 Hz with 20% modulation depth in the left eye, while a second homogeneous field flickering at 7 Hz in the right eye was varied in modulation depth from 0% to 70%. From Regan, M. P. and Regan, D. (1989) Objective investigation of visual function using a nondestructive zoom-FFT technique for evoked potential analysis. Can. J. Neurolog. Sci., 16: 168-179. Reprinted with permission from the Canadian Journal of Neurological Sciences.
13.13 Appendix 13.A1: Linear Systems and the Wide and Wild World of Nonlinear Systems
A system or part of a system is said to behave linearly if the relation between its output and its input obeys the following requirement:

1. If input Ia(t) produces output Oa(t) and input Ib(t) produces output Ob(t), then input [Ia(t) + Ib(t)] produces output [Oa(t) + Ob(t)]. (This requirement is sometimes called the superimposition requirement.)

Any system that does not obey this requirement is said to be nonlinear. Although all linear systems are similar in that they obey requirement (1) above, the number of kinds of nonlinear behavior is indefinitely large. A smoothly curved input-output characteristic is a nonessential nonlinearity that behaves more and more linearly as the amplitude of the input is progressively reduced. An essential nonlinearity does not approximate linear behavior even when the input signal is small. A threshold, multiplication, and rectification are examples of essential nonlinearities that are commonly encountered in MEG, evoked potential (EP) studies, and neurophysiology in general. Time invariance means that the system's output O(t) does not depend on the
time at which the input I(t) is applied; i.e., the response to I(t − T) is O(t − T) for all T and I(t). A consequence of linearity plus time invariance is as follows:

2. A pure sinusoidal input produces a pure sinusoidal output of the same frequency as the input (Bracewell, 1965, pp. 185-186).

A system whose output does not depend on the previous history of inputs is called a zero-memory system. The hysteresis curve of an iron transformer's core is a familiar example of memory. As discussed and illustrated in Regan (1975a), when using the sweep method with MEG or EP recording it is essential to check for hysteresis. A perhaps less familiar manifestation of memory is the phenomenon of nonlinear resonance, in which the output of a system fed with a sinusoidal input depends on whether the frequency of the input is increasing or decreasing, so that there is a region of instability within which the output produced by a particular input may have two possible values (the jump effect). Formal discussions of nonlinear oscillations and the jump effect are available (Blaquiere, 1966; Hagedorn, 1982; Hayashi, 1964; Stoker, 1950). A dramatic illustration of complex nonlinear oscillations in EP recording is given in Regan, Schellart, Spekreijse, and van der Berg (1975). A very convenient feature of linear system behavior is that, if a sinusoidal input is applied to a linear system and the amplitude and phase of the steady-state output are recorded over the range of input frequencies that produce an output, then the time-domain output produced by a single input pulse or any other transient waveform can be computed by means of the inverse Fourier transform. Furthermore, the minimum possible phase shift produced by any linear system can be calculated if the effect of input frequency on output amplitude is known (Aseltine, 1958; Bracewell, 1965).
In a certain sense, however, linear systems are tame and somewhat boring: both the static and dynamic behavior of a linear system are severely restricted, and its range of possible behaviors is narrow (Hirsch and Smale, 1974). As noted by Reichardt and Poggio (1981, p. 187), writing on the topic of neural information processing: "...every nontrivial computation has to be essentially nonlinear, that is, not representable (even approximately), by linear operations." Although any linear system can be analyzed by the same method of linear systems analysis, there is, unfortunately, no single method of analysis that can be applied to all nonlinear systems: for nonlinear systems the method of analysis depends on the type of nonlinear behavior. And, as mentioned above, the number of different kinds of possible nonlinear behavior is indefinitely large. For the engineer, this offers an indefinitely large range of nonlinear behaviors to be exploited - while at the same time challenging him or her with mathematical and conceptual problems that are seldom straightforward and may even exceed the competence of any living mathematician. On the other hand, though considerably more demanding than linear systems analysis, the application of methods of nonlinear systems analysis to sensory systems offers insights that are hidden to linear systems analysis. For example, although the sequence and characterization of processing stages within a linear system cannot be obtained by comparing the system's output and input, this analysis is possible for some systems that contain a nonlinear stage (Jenkins and Watts, 1968, p. 45).
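Requirement (1) is easy to check numerically. The sketch below (illustrative systems only) shows that a gain obeys superposition while a half-wave rectifier violates it at any input amplitude, which is what makes rectification an essential rather than a nonessential nonlinearity:

```python
import numpy as np

# The superimposition requirement of appendix 13.A1, checked numerically:
# a system behaves linearly if the response to Ia + Ib equals the sum of
# the responses to Ia and Ib separately.  A gain obeys this; a half-wave
# rectifier does not, no matter how small the input is made.
t = np.linspace(0.0, 1.0, 1000)
ia = np.sin(2 * np.pi * 5 * t)
ib = np.sin(2 * np.pi * 3 * t)

def obeys_superposition(system, tol=1e-9):
    combined = system(ia + ib)
    summed = system(ia) + system(ib)
    return bool(np.max(np.abs(combined - summed)) < tol)

gain = lambda x: 2.0 * x
rectifier = lambda x: np.maximum(x, 0.0)
small_rectifier = lambda x: np.maximum(0.001 * x, 0.0)   # scaling down does not help

print(obeys_superposition(gain))             # True
print(obeys_superposition(rectifier))        # False
print(obeys_superposition(small_rectifier))  # False: an essential nonlinearity
```

A smoothly curved characteristic such as tanh, by contrast, would pass this test to any chosen tolerance once the inputs were made small enough - the defining property of a nonessential nonlinearity.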
13.A2 Some Definite Time Elapses between Stimulation of the Eye or Ear and the Occurrence of any Given Evoked Response

This transport or conduction time is of interest in basic research because it may provide clues to the neural encoding of sensory information; it is of interest in clinical work because it can indicate abnormal neural function. There seems to be no agreed term for transport time. Although the term "latency" is sometimes used as a synonym when describing transient EPs, the correspondence between latency and transport time is unequivocal only for the onset of the first deflection. The widely used "latency to peak" measure is even more equivocal. For example, it may be that the arrival of signals at one cortical location triggers a process that is eventually responsible for peak A, while the arrival of signals at a second cortical location triggers a process that is eventually responsible for a later peak B. Each of the two values of "latency to peak" includes the time taken by the cortical process as well as the signal transport time; in such a case it is not immediately evident whether the longer latency of peak B is due to a longer transport time or a slower cortical process or some combination of the two. A further complication is that, in principle, two different cortical processes can contribute to the same peak in the averaged waveform. Furthermore, the peak latency can be spuriously affected, not only by the bandpass filter settings of the EEG amplifier but, in principle, also by the frequency tuning within the retina and the brain itself. For steady-state EPs, an unequivocal estimate of transport time seems to be unattainable in the general case. In brief, this is because the transport time of any given frequency component must be estimated from measurements of response phase.
However, response phase depends on several factors additional to transport time, and the information required to allow for these other factors is not generally available. Following a previous approach, we write the phase shift φ (degrees) between stimulus and response for the fundamental (F Hz) component of the steady-state EP as

φ = 360FT + φ′

where F Hz is the stimulus frequency, T sec is the transport time, and φ′ is an additional phase shift of unknown origin. Writing φ′ as the sum of two components, φ′ = φd + φa, we have

dφ/dF = 360T + dφd/dF + dφa/dF     (13.1)

where dφ/dF is the slope of the phase-versus-frequency plot. The first term in the equation is the contribution of transport time T to the slope. The second term is familiar in the everyday context. For example, the speed of light in water depends on frequency (and hence on color), and one consequence is the rainbow. This second term is analogous to the dispersion of light, because it represents a dependence of transmission time on frequency. In a linear system the third term is due entirely to the effect of attenuation, and such phase shifts can be calculated from the gain-versus-frequency plot. Unfortunately, this calculation is not necessarily valid for a nonlinear system.
From the preceding equation,

T = (1/360)(dφ/dF − dφd/dF − dφa/dF).

Because of this evident difficulty in estimating transport time T, I coined the term "apparent latency," defined as

T′ = (1/360)(dφ/dF)

where T′ is the apparent latency in seconds. If the second and third terms in Equation 13.1 are both zero, then the apparent latency equals the transport time, but without additional evidence this cannot be assumed to be so. The reason for using the word "apparent" was to emphasize that, although apparent latency has the same dimension as transport time (i.e., time), they are qualitatively different. For example, an amplifier with effectively zero transport time can introduce an "apparent latency" of many milliseconds, and in principle, "apparent latency" can be negative even in the presence of a finite (necessarily positive) transport time. Whether the quantity "apparent latency" is of any value in EP research is an empirical question; the suggestion was that its value might be worth exploring on the grounds that, when the ideal (i.e., an unequivocal estimate of true transmission time) is unattainable, it makes sense to make the best of what one has. So far we have restricted the concept of apparent latency to the fundamental (F Hz) component of the response to F Hz stimulation, but a phase-versus-frequency curve can be plotted for each frequency component of the steady-state EP. Thus, an apparent latency can be assigned to each discrete component. In the case of harmonic components of frequency NF, the apparent latency is defined as T′ = (1/360N)(dφ/dF). For example, a transport time of 100 msec will give a 36°/Hz contribution to the slope for the fundamental component, and a 72°/Hz contribution for the second harmonic component. For nonharmonic frequency components the situation is different, and may allow some insight into processes peripheral to the nonlinearity that generates these nonlinear components.
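In practice, apparent latency is obtained from the slope of the measured phase-versus-frequency plot. A minimal sketch (synthetic phases generated by a pure transport delay, so the dispersion and attenuation contributions are zero by construction):

```python
import numpy as np

# Apparent latency T' (sec) is the slope of response phase (degrees)
# against stimulus frequency (Hz), divided by 360.  Here the synthetic
# phases come from a pure 100 ms transport delay, so the slope is
# 36 deg/Hz and T' recovers the delay.  With real data the dispersion and
# attenuation terms would generally make T' differ from T.
transport_time = 0.100                       # seconds
freqs = np.array([6.0, 8.0, 10.0, 12.0])     # stimulus frequencies (Hz)
phases = 360.0 * freqs * transport_time      # phase lag in degrees

slope = np.polyfit(freqs, phases, 1)[0]      # degrees per Hz
apparent_latency = slope / 360.0
print(apparent_latency)   # ~0.1 s, equal to T only because the other terms vanish
```

For the Nth harmonic the same fit would be divided by 360N, matching the 72°/Hz second-harmonic slope quoted above for a 100 msec transport time.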
13.A3 A Method for Deriving the Response of Asymmetric Nonlinearities to a Sum of Two Sinewaves
We first consider the simple case of a half-wave linear rectifier fed with a single sine wave, and then with the sum of two sine waves. After this introduction we go on to the accelerating and compressive rectifiers fed with the sum of two sine waves, and finally discuss cascaded rectifiers and parallel-cascaded rectifiers of the same type and of mixed types.
13.A3.1 Half-Wave Linear Rectifier: Response to a Single Sinusoid

Let the input to a half-wave rectifier (y = cx, x > 0; y = 0, x < 0) be e(t) = A cos(pt + θp) = A cos x, where p = 2π × (frequency of input) and θp = phase. Taking A > 0 and the constant of proportionality c = 1, the output is a function f(x), where

f(x) = A cos x for cos x > 0, and f(x) = 0 for cos x ≤ 0.
We can express f(x) in terms of a Fourier series in x:

f(x) = A/π + (A/2) cos x + (2A/π) Σ_{n=1}^{∞} [(−1)^{n+1}/(4n² − 1)] cos 2nx,

so that the output contains a DC term, a fundamental component of amplitude A/2, and even harmonics of the input frequency; the third and higher odd harmonics are zero.
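The Fourier description of the half-wave linear rectifier can be checked numerically: for an input A cos x the output should contain a DC term A/π, a fundamental of amplitude A/2, even harmonics falling off as 2A/(π(4n² − 1)), and no odd harmonics above the first. A quick FFT check (the sample count is arbitrary):

```python
import numpy as np

# Numerical check of the half-wave linear rectifier's harmonic content for
# input A cos(x): DC = A/pi, fundamental = A/2, second harmonic =
# 2A/(3 pi), third harmonic = 0.
A = 1.0
n = 1 << 16
x = 2 * np.pi * np.arange(n) / n
f = np.maximum(A * np.cos(x), 0.0)           # half-wave rectified cosine

c = np.fft.rfft(f) / n
dc = c[0].real
harm = 2 * np.abs(c[1:6])                    # amplitudes of harmonics 1..5

print(dc)        # ~ A/pi
print(harm[0])   # ~ A/2
print(harm[1])   # ~ 2A/(3*pi)
print(harm[2])   # ~ 0: the third harmonic vanishes
```

The same brute-force check extends to the two-sinusoid inputs treated next, where the closed-form coefficients involve elliptic integrals.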
13.A3.2 Half-Wave Linear Rectifier: Response to the Sum of Two Sinusoids
If the input voltage is given by

e(t) = P cos(pt + θp) + Q cos(qt + θq),

then we can rewrite this as

e(t) = P[cos(pt + θp) + k cos(qt + θq)],

where k = Q/P.
Figure 13.32: The region for which cos x + k cos y > 0. Reprinted from Regan, M. P. and Regan, D. (1988) A frequency domain technique for characterizing nonlinearities in biological systems. J. Theoret. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.

The case k < 1. Without loss of generality, we can take P > 0 and the constant of proportionality, c, to be 1. First let us consider k < 1, and set

f(x, y) = cos x + k cos y where cos x + k cos y > 0, and f(x, y) = 0 otherwise,

where x = (pt + θp) and y = (qt + θq). f(x, y) is a surface in and above the (x, y) plane, bounded by (cos x + k cos y) = 0 in the (x, y) plane. Clearly, adding 2π to x or y leaves f(x, y) unchanged, so f(x, y) is a periodic function in x and y. So if we know f(x, y) in the rectangle (−π, π) × (−π, π), we will know all its values. Since f(x, y) is bounded in the rectangle (−π, π) × (−π, π) and its first derivatives are bounded, the double Fourier series in x, y of f(x, y) is a valid expansion in this rectangle (Hobson, 1926). If the Fourier series of f(x, y) is valid in the (x, y) plane, then it is valid on the line py − qx = pθq − qθp, found by eliminating t from x = pt + θp and y = qt + θq.
The boundaries of the region are the curves given by cos x + k cos y = 0 (see figure 13.32). In the shaded area, cos x + k cos y > 0; elsewhere cos x + k cos y < 0, giving f(x, y) = 0. Since f(x, y) is an even function, its double Fourier expansion will be a cosine series given by

f(x, y) = Σ_m Σ_n A_{±mn} cos(mx ± ny),

where the coefficients A_{±mn} are given by the usual double integral of f(x, y) cos(mx ± ny) over the rectangle (−π, π) × (−π, π). Since the region is symmetrical in both x and y, A_{±mn} can be found by using one quarter of the plane, with the integration over x extending only from 0 to arccos(−k cos y), since f(x, y) = 0 when x > arccos(−k cos y). The calculation for A_{±mn}, when m = 2 and n = 0, is shown below.
Using the identity
and letting
then Z0 = K, the complete elliptic integral of the first kind, and Zs can be expressed in terms of Zs−1 and Zs−4 by using the recurrence formula
for s > 4 (Bennet, 1933). From
where E is the complete elliptic integral of the second kind, we have that
This gives the amplitude of the frequency (mp ± nq)/2π and the phase angle (mθp ± nθq). The values of the amplitudes for m and n = 0, 1, 2, 3, 4 are as follows:
The third and higher odd-order terms are zero, and
and
where k < 1.

The case k > 1. We can rewrite f(x, y) in the following way:
where l = 1/k < 1 and consequently
where
A′±rs is the coefficient of cos(ry ± sx), which may be written as cos(sx ± ry). So for a given m and n, say M and N, we will have to consider A±MN for k < 1 and A′±NM for k > 1. For example, let us consider the coefficient of cos 2x:
Therefore the amplitude function g(k)±mn is given by
When k > 1, we have the following values for A′±nm when m and n are 0, 1, 2, 3, 4:
Figure 13.33: Linear half-wave rectifier. Ordinates plot the amplitudes of frequency terms in the output of a half-wave linear rectifier whose input is the sum of two sinusoids, one of frequency F1 and the other of frequency F2. The amplitude (A1) of the F1 input is constant and the amplitude (A2) of the F2 input is varied. Values of k are plotted along the abscissa, where k = A2/A1. Reprinted from Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theor. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.
Figure 13.34: Half-wave square law rectifier. See the caption to figure 13.33 for details. Reprinted from Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theor. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.
The third and higher odd-order terms are zero, and E and K are functions of 1/k < 1.
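The complete elliptic integrals that these amplitude expressions depend on can be computed by the classical arithmetic-geometric-mean iteration, in the spirit of the algorithms cited below (King, 1924; Regan, 1986). A sketch in Python, cross-checked against direct quadrature of the defining integral (the modulus value 0.6 is illustrative):

```python
import numpy as np

# Complete elliptic integral of the first kind K(k), modulus k,
# via the arithmetic-geometric mean (AGM) iteration.
def ellip_K(k):
    a, b = 1.0, np.sqrt(1.0 - k * k)
    while abs(a - b) > 1e-15:
        a, b = (a + b) / 2.0, np.sqrt(a * b)
    return np.pi / (2.0 * a)

# Cross-check against midpoint quadrature of the defining integral
# K(k) = integral_0^{pi/2} dtheta / sqrt(1 - k^2 sin^2 theta).
k = 0.6
n = 200000
theta = (np.arange(n) + 0.5) * (np.pi / 2) / n
K_quad = np.sum(1.0 / np.sqrt(1.0 - (k * np.sin(theta)) ** 2)) * (np.pi / 2) / n
print(ellip_K(k), K_quad)   # the two agree to high precision
```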
The function g(k)±mn is shown for values of k from 0 to 4 in figure 13.33. The elliptic integrals were calculated using well-known algorithms (King, 1924; Regan, 1986).
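The closed-form coefficients can also be cross-checked by brute-force numerical integration. The sketch below (Python with NumPy; the grid size and the k = 0 sanity check are illustrative, and the quadrature is not Regan and Regan's elliptic-integral method) estimates double-cosine coefficients of the rectified surface:

```python
import numpy as np

# Brute-force estimate of the double-cosine coefficient
# A_mn = (1/pi^2) * integral over (-pi,pi)^2 of f(x,y) cos(mx) cos(ny);
# the series term for m >= 1, n = 0 is (A_m0 / 2) cos(mx).
def fourier_coeff(m, n, k, N=1024):
    t = -np.pi + 2 * np.pi * np.arange(N) / N
    X, Y = np.meshgrid(t, t)
    F = np.maximum(np.cos(X) + k * np.cos(Y), 0.0)   # the rectified surface
    dA = (2 * np.pi / N) ** 2
    return np.sum(F * np.cos(m * X) * np.cos(n * Y)) * dA / np.pi ** 2

# Sanity check at k = 0, where f reduces to a half-wave rectified
# cosine: the classical cos(x) coefficient is 1/2 and the cos(2x)
# coefficient is 2/(3*pi).
print(fourier_coeff(1, 0, 0.0) / 2)   # ~0.5
print(fourier_coeff(2, 0, 0.0) / 2)   # ~0.212
```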
13.A3.3 Half-Wave Square Law Rectifier: Response to the Sum of Two Sinusoids

If the rectifier is of the form y = cx², x > 0, and y = 0, x < 0, and if k < 1, then, as for the half-wave linear rectifier, we can consider the rectifier's output as the function f(x, y), where
where x = (pt + θp) and y = (qt + θq). Again, f(x, y) is bounded in the rectangle (−π, π) × (−π, π) by cos x + k cos y, and its Fourier expansion will be a cosine series given by
but now
since f(x, y) = 0 when x > arccos(−k cos y). When k > 1, we have
where l = 1/k. See figure 13.34.
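The frequency-domain signature of the half-wave square-law rectifier can also be demonstrated by direct simulation. In this sketch (illustrative choices: F1 = 7 and F2 = 9 cycles per record, k = 0.8), the FFT of the rectified two-sinusoid input exhibits both harmonics and cross-modulation terms:

```python
import numpy as np

# Half-wave square-law rectifier, y = x^2 for x > 0 and 0 otherwise,
# driven by the sum of two sinusoids; F1, F2 are in cycles per record.
N, F1, F2, k = 4096, 7, 9, 0.8
t = np.arange(N) / N
x = np.cos(2 * np.pi * F1 * t) + k * np.cos(2 * np.pi * F2 * t)
y = np.where(x > 0, x ** 2, 0.0)

amp = np.abs(np.fft.rfft(y)) / N * 2   # single-sided amplitude spectrum
for f in (F1, F2, 2 * F1, F2 - F1, F1 + F2):
    print(f, round(amp[f], 4))
# The output contains the fundamentals, their harmonics, and the
# cross-modulation terms at F2 - F1 and F1 + F2 whose amplitudes,
# as functions of k, are what figures 13.33-13.35 plot.
```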
13.A3.4 Half-Wave Square Root Rectifier: Response to the Sum of Two Sinusoids

If the rectifier is of the form y = c√x, x > 0, and y = 0, x < 0, then for k < 1 we will have the function
where x = (pt + θp) and y = (qt + θq). Thus
but now
since f(x, y) = 0 when x > arccos(−k cos y). When k > 1, we have
where l = 1/k. See figure 13.35. Similarly, we can find the response to any half-wave rectifier whose equation is y = cxⁿ, x > 0; y = 0, x < 0, where n is any real number.
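This closing generalization lends itself to the same numerical treatment: only the static nonlinearity changes with the exponent n. A sketch with the same illustrative parameters as before (c = 1):

```python
import numpy as np

# General half-wave power-law rectifier y = c*x**n (x > 0); the
# exponents n = 0.5, 1, 2 correspond to the square-root, linear,
# and square-law cases treated above.
def spectrum(n, k=0.8, N=4096, F1=7, F2=9):
    t = np.arange(N) / N
    x = np.cos(2 * np.pi * F1 * t) + k * np.cos(2 * np.pi * F2 * t)
    y = np.maximum(x, 0.0) ** n        # equals x**n for x > 0, else 0
    return np.abs(np.fft.rfft(y)) / N * 2

for n in (0.5, 1.0, 2.0):
    amp = spectrum(n)
    print(n, round(amp[7], 3), round(amp[16], 3))   # F1 and F1+F2 terms
```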
Figure 13.35: Half-wave square-root rectifier. See the caption to figure 13.33 for details. Reprinted from Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theor. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.
13.A3.5 Two Cascaded Half-Wave Rectifiers, AC Coupled
If two identical half-wave linear rectifiers are DC coupled, the output will be the same as that of a single linear half-wave rectifier. Indeed, if two half-wave rectifiers are DC coupled and the first of the series is a linear rectifier, the final output will be the same as that of the second rectifier alone. (We assume that the two rectifiers are either both positive or both negative.) If the two rectifiers are AC coupled, then after the two sinusoids pass through the first rectifier, their function is given by
where x = (pt + θp) and y = (qt + θq). This has a DC level given by A00/2, the constant term in the double Fourier series expansion of f(x, y). If our two successive rectifiers are linked by AC coupling, this DC level must be removed, so that the function entering the second rectifier is given by
Figure 13.36: Two cascaded linear half-wave rectifiers. See the caption to figure 13.33 for details. Reprinted from Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theor. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.

where
After passing through the second rectifier, the output is given by
This can be represented by a double Fourier series where the coefficients A±mn are given by
This is represented graphically in figure 13.36.
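The distinction between DC- and AC-coupled cascades is easy to verify by simulation. In the sketch below (NumPy, with an illustrative two-sinusoid input), subtracting the sample mean stands in for removing the DC level A00/2 between stages:

```python
import numpy as np

def hw(s):
    """Half-wave linear rectifier."""
    return np.maximum(s, 0.0)

N, F1, F2, k = 4096, 7, 9, 0.8
t = np.arange(N) / N
x = np.cos(2 * np.pi * F1 * t) + k * np.cos(2 * np.pi * F2 * t)

stage1 = hw(x)
dc_coupled = hw(stage1)                     # identical to stage1: input already >= 0
ac_coupled = hw(stage1 - stage1.mean())     # DC level (the A00/2 term) removed first

assert np.allclose(dc_coupled, stage1)      # DC-coupled cascade adds nothing
assert not np.allclose(ac_coupled, stage1)  # AC coupling re-clips the waveform
```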
13.A3.6 Cascaded Compressive Rectifiers
Figure 13.37 shows the result for two square-root (y = c√x, x > 0; y = 0, x < 0) rectifiers in series, and figure 13.29 shows the result for three square-root rectifiers in
Figure 13.37: Two cascaded square-root rectifiers. See the caption to figure 13.33 for details. Reprinted from Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theor. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.

series.
13.A3.7 Two Parallel Rectifiers whose Summed Outputs Pass through a Third Linear Rectifier: The Dichoptic Case
In this situation only one frequency (F1) passes through rectifier number 1 and only one frequency (F2) passes through rectifier number 2, in parallel with the first rectifier. Then the outputs from both rectifiers combine to form the input of the third rectifier. The output from the first rectifier is f(x), where
with a DC level of P/π. The output of the second rectifier is G(y), where
whose DC level is Pk/π. To adjust for the DC level, the input to the third rectifier will be the function
Figure 13.38: Two parallel half-wave linear rectifiers converging onto a third half-wave linear rectifier. See the caption to figure 13.33 for details. Reprinted from Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theor. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.
The output from the third rectifier is given by
Hence the coefficients for the double Fourier series can be found for
This rectifier combination is shown in figure 13.38 for the case in which all three rectifiers have a linear characteristic and the coupling is AC rather than DC. Other cases, such as mixed rectifiers (e.g., where numbers 1 and 2 are cube-root rectifiers and number 3 is a square law rectifier), are amenable to the same general mathematical treatment.
Figure 13.39: Rectifier characteristic with accelerating segment, compressive segment, and a threshold. See the caption to figure 13.33 for details. Reprinted from Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theor. Biol., 133: 293-317. Copyright 1988 with permission from Elsevier.
13.A3.8 Half-Wave Rectifier Combining Accelerating and Compressive Segments
For this rectifier, the curve equation is given by
where d = (1/64)(4c)^(63/16), s = (63/64)(4c)^(1/16), and c is chosen suitably. Consequently
where x = (pt + θp) and y = (qt + θq). So
This is shown in figure 13.39 with c = 2π/15.
Acknowledgments We thank Derek Hamanansingh for valuable technical assistance. This work was sponsored by the Air Force Office of Scientific Research under grant F49620-03-1-0114. D.R. holds the NSERC/CAR Industrial Research Chair in Vision and Aviation.
References

Aseltine, J. A. (1958). Transform Methods in Linear Systems Analysis. McGraw-Hill: New York.
Barlow, H. B. and Levick, W. R. (1965). The mechanism of directionally selective units in rabbit's retina. J. Physiol. (Lond.), 178: 477-504.
Bennet, R. W. (1933). New results in the calculation of modulation products. Bell Syst. Tech. J., 228-243.
Beverley, K. I. and Regan, D. (1973). Evidence for the existence of neural mechanisms selectively sensitive to the direction of movement in space. J. Physiol. (Lond.), 235: 17-29.
Blaquiere, A. (1966). Nonlinear Systems Analysis. Academic Press: New York.
Bracewell, R. (1965). The Fourier Transform and Its Applications. McGraw-Hill: New York.
Braddick, O. J. (1974). A short-range process in apparent motion. Vis. Res., 14: 519-527.
Cartwright, R. F. and Regan, D. (1974). Semi-automatic, multi-channel Fourier analyzer for evoked potential analysis. Electroencephalogr. Clin. Neurophysiol., 36: 547-550.
Clynes, M., Kohn, M. and Lifshitz, K. (1964). Dynamics and spatial behaviour of light-evoked potentials, their behaviour under hypnosis, and on-line correlation in relation to rhythmic components. Ann. NY Acad. Sci., 112: 468-509.
Cobb, W. A. and Dawson, G. D. (1960). The latency and form in man of the occipital potentials evoked by bright flashes. J. Physiol. (Lond.), 152: 108-121.
Cynader, M. and Regan, D. (1978). Neurons in cat parastriate cortex sensitive to the direction of motion in three-dimensional space. J. Physiol. (Lond.), 274: 549-569.
DeValois, R. L., Albrecht, D. G. and Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vis. Res., 22: 545-599.
Ellis, B., Burrell, G. J., Wharf, J. H. and Hawkins, T. D. F. (1975). Independence of channels in colour contrast perception. Nature, 254: 691-692.
Gabor, D. (1946). Theory of communication. J. Inst. Electrical and Electronic Engin., 93: 429-456.
Granit, R. (1955). Receptors and Sensory Perception. Yale University Press: New Haven, CT.
Hagedorn, P. (1982). Nonlinear Oscillations. Oxford University Press: Oxford.
Hayashi, C. (1964). Nonlinear Oscillations in Physical Systems. McGraw-Hill: New York.
Hirsch, M. and Smale, S. (1974). Differential Equations, Dynamical Systems and Linear Algebra. Academic Press: New York.
Hobson, E. W. (1926). The Theory of Functions of a Real Variable and the Theory of Fourier's Series. Cambridge University Press: Cambridge.
Hong, X. H. and Regan, D. (1989). Visual field defects for unidirectional and oscillatory motion in depth. Vis. Res., 29: 809-819.
Jenkins, G. M. and Watts, D. G. (1968). Spectral Analysis. Holden-Day: Oakland, CA.
Julesz, B. (1981). Textons: The elements of texture perception and their interactions. Nature, 290: 91-97.
Kaufman, L. and Williamson, S. J. (1980). The evoked magnetic field of the brain. Ann. NY Acad. Sci., 340: 45-65.
Kaufman, L. and Williamson, S. J. (1982). Magnetic location of cortical activity. Ann. NY Acad. Sci., 388: 197-213.
King, L. V. (1924). On the Direct Numerical Calculation of Elliptic Functions and Integrals. Cambridge University Press: Cambridge.
Legge, G. E. and Foley, J. M. (1980). Contrast masking in human vision. J. Opt. Soc. Am., 70: 1458-1471.
Lennie, P., Pokorny, J. and Smith, V. C. (1993). Luminance. J. Opt. Soc. Am. A, 10: 1283-1293.
Lu, Z. L. and Kaufman, L. (2003). Magnetic Source Imaging of the Human Brain. Lawrence Erlbaum Associates: Mahwah, NJ.
McCree, K. J. (1960a). Colour confusion produced by voluntary fixation. Optica Acta, 7: 281-290.
McCree, K. J. (1960b). Small field tritanopia and the effects of voluntary fixation. Optica Acta, 7: 317.
Movshon, J. A., Thompson, I. D. and Tolhurst, D. J. (1978). Receptive field organization of complex cells in the cat's striate cortex. J. Physiol. (Lond.), 283: 79-99.
Nelson, J. I., Seiple, W. H., Kupersmith, M. J. and Carr, R. E. (1984). A rapid evoked potential index of cortical adaptation. Invest. Ophthal. Vis. Sci., 59: 454-464.
Norcia, A. M. and Tyler, C. W. (1985). Spatial frequency sweep VEP: Visual acuity during the first year of life. Vis. Res., 25: 1399-1408.
Norcia, A. M., Tyler, C. W. and Allen, D. (1986). Electrophysiological assessment of contrast sensitivity in human infants. Am. J. Optom. Physiol. Optics, 63: 12-15.
Norcia, A. M., Tyler, C. W. and Hamer, R. D. (1988). High visual contrast sensitivity in the young human infant. Invest. Ophthal. Vis. Sci., 29: 44-49.
Norcia, A. M., Tyler, C. W. and Hamer, R. D. (1990). Development of contrast sensitivity in the human infant. Vis. Res., 30: 1475-1486.
Orban, G. A. (1985). Velocity tuned cortical cells and human velocity discrimination. In D. Ingle, M. Jeannerod, and D. N. Lee (Eds.), Brain Mechanisms and Spatial Vision, pp. 36-52. Martinus Nijhoff: Dordrecht.
Pieron, H. (1932). Les lois du temps du chroma. Annales de Psychologie, 30: 277-280.
Pieron, H. (1952). The Sensations. F. Muller: London.
Poggio, G. F. and Talbot, W. H. (1981). Neural mechanisms of static and dynamic stereopsis in foveal cortex of rhesus monkey. J. Physiol. (Lond.), 315: 469-492.
Regan, D. (1964). A Study of the Visual System by the Correlation of Light Stimuli and Evoked Electrical Responses. PhD Thesis, Imperial College, University of London.
Regan, D. (1966a). An effect of stimulus colour on average steady-state potentials evoked in man. Nature, 210: 1056-1057.
Regan, D. (1966b). Some characteristics of average steady-state and transient responses evoked by modulated light. Electroencephalogr. Clin. Neurophysiol., 20: 238-248.
Regan, D. (1968a). A high frequency mechanism which underlies visual evoked potentials. Electroencephalogr. Clin. Neurophysiol., 25: 231-237.
Regan, D. (1968b). Chromatic adaptation and steady-state evoked potentials. Vis. Res., 8: 149-158.
Regan, D. (1970a). Objective method for measuring the relative spectral luminosity curve in man. J. Opt. Soc. Am., 60: 856-859.
Regan, D. (1970b). Evoked potential and psychophysical correlates of changes in stimulus colour and intensity. Vis. Res., 10: 163-178.
Regan, D. (1973a). Evoked potentials specific to spatial patterns of luminance and colour. Vis. Res., 13: 2381-2402.
Regan, D. (1973b). Rapid objective refraction using evoked brain potentials. Invest. Ophthal., 12: 669-679.
Regan, D. (1974). Electrophysiological evidence for colour channels in human pattern vision. Nature, 250: 437-439.
Regan, D. (1975a). Colour coding in man investigated by evoked potential feedback and direct plot techniques. Vis. Res., 15: 175-183.
Regan, D. (1975b). Recent advances in electrical recording from the human brain. Nature, 253: 401-407.
Regan, D. (1977). Speedy assessment of visual acuity in amblyopia by the evoked potential method. Ophthalmologica, Basel, 175: 159-164.
Regan, D. (1979). Electrical responses evoked from the human brain. Sci. Am., 241: 134-146.
Regan, D. (1983). Spatial frequency mechanisms in human vision investigated by evoked potential recording. Vis. Res., 23: 1401-1408.
Regan, D. (1989). Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine. Elsevier: New York.
Regan, D. (2000). Human Perception of Objects. Sinauer Press: Sunderland, MA.
Regan, D. and He, P. (1995). Magnetic and electrical responses of the human brain to texture-defined form and to textons. J. Neurophysiol., 74: 1167-1178.
Regan, D. and He, P. (1996). Magnetic and electrical brain responses to chromatic contrast in human. Vis. Res., 36: 1-18.
Regan, D. and Heron, J. R. (1969). Clinical investigation of lesions of the visual pathway: a new objective technique. J. Neurol. Neurosurg. Psychiatry, 32: 479-483.
Regan, D. and Heron, J. R. (1970). Simultaneous recording of visual evoked potentials from the left and right hemisphere in migraine. In A. L. Cochrane (Ed.), Background to Migraine, pp. 68-77. Heinemann: London.
Regan, D. and Lee, B. B. (1993). A comparison of the human 40 Hz response with the properties of macaque ganglion cells. Vis. Neurosci., 10: 439-445.
Regan, D. and Regan, M. P. (1987). Nonlinearity in human visual responses to two-dimensional patterns and a limitation of Fourier methods. Vis. Res., 27: 2181-2183.
Regan, D. and Regan, M. P. (1988). Objective evidence for phase-independent spatial frequency mechanisms in the human visual pathway. Vis. Res., 28: 187-191.
Regan, D., Schellart, N. A. M., Spekreijse, H. and Van der Berg, T. J. T. P. (1975). Photometry in goldfish by electrophysiological recording. Vis. Res., 15: 799-807.
Regan, D. and Spekreijse, H. (1970). Electrophysiological correlate of binocular depth perception in man. Nature, 255: 92-94.
Regan, D. and Spekreijse, H. (1974). Evoked potential indications of colour blindness. Vis. Res., 14: 89-95.
Regan, D. and Sperling, H. (1971). A method for evoking contour-specific scalp potentials by chromatic checkerboard patterns. Vis. Res., 11: 173-176.
Regan, D. and Tyler, C. W. (1971a). Wavelength-modulated light generator. Vis. Res., 11: 43-56.
Regan, D. and Tyler, C. W. (1971b). Some dynamic features of colour vision. Vis. Res., 11: 1307-1324.
Regan, D. and Tyler, C. W. (1971c). Temporal summation and its limit for wavelength changes: An analog of Bloch's law for colour vision. J. Opt. Soc. Am., 61: 1414-1421.
Regan, M. P. (1986). Analysis of a Nonlinearity with Application to Visual Processing. MSc Thesis, Dalhousie University, Nova Scotia, Canada.
Regan, M. P., He, P. and Regan, D. (1995). An audio-visual convergence area in human brain. Exp. Brain Res., 106: 485-487.
Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theor. Biol., 133: 293-317.
Regan, M. P. and Regan, D. (1989). Objective investigation of visual function using a nondestructive zoom-FFT technique for evoked potential analysis. Can. J. Neurolog. Sci., 16: 1-12.
Reichardt, W. and Poggio, T. (1981). Theoretical Approaches in Neurobiology. MIT Press: Cambridge, MA.
Reichardt, W., Poggio, T. and Hausen, K. (1983). Figure-ground discrimination by relative movement in the fly. Part II. Biol. Cybern., 46: 1-15.
Schiller, P. H., Sandell, J. H. and Maunsell, J. H. R. (1986). Functions of the ON and OFF channels of the visual system. Nature, 332: 824-825.
Sekuler, R., Pantle, A. and Levinson, E. (1978). Physiological basis of motion perception. In R. Held, H. W. Leibowitz, and H. L. Teuber (Eds.), Perception, pp. 67-96. Springer-Verlag: New York.
Spekreijse, H. (1966). Analysis of EEG Responses in Man. PhD Thesis. Dr. W. Junk Publishers: The Hague.
Spekreijse, H. and van der Tweel, L. H. (1965). Linearization of evoked responses to sine wave modulated light by noise. Nature, 205: 913-914.
Spekreijse, H., van der Tweel, L. H. and Regan, D. (1972). Interocular sustained suppression: Correlations with evoked potential amplitude and distribution. Vis. Res., 12: 521-526.
Stoker, J. J. (1950). Nonlinear Vibrations. Plenum Press: New York.
Tigges, J. and Tigges, M. (1985). Subcortical sources of direct projections to visual cortex. In A. Peters and E. G. Jones (Eds.), Cerebral Cortex, Vol. 3, pp. 351-378. Plenum Press: New York.
van der Tweel, L. H. and Lunel, V. (1965). Human visual responses to sinusoidally modulated light. Electroencephalogr. Clin. Neurophysiol., 18: 587-598.
Wright, W. D. (1928). A trichromatic colorimeter with spectral primaries. Trans. Opt. Soc. Lond., 29: 225-241.
Wright, W. D. (1928-29). A re-determination of the mixture curves of the spectrum. Trans. Opt. Soc. Lond., 30: 141-164.
Yamamoto, T. S. and DeValois, K. K. (1996). Chromatic-spatial selectivity for luminance-varying patterns. Invest. Ophthalmol. Vis. Sci., 37 (suppl.): S1064.
14. Linking Psychophysics and Physiology of Center-Surround Interactions in Visual Motion Processing

Duje Tadin and Joseph S. Lappin

14.1
Introduction: Moving Image Information
The eye is stimulated and informed by continually changing patterns. Images of environmental surfaces move over the retina as the eyes move, and the images expand, contract, and deform as objects move and the observer moves through the environment. The structure of these changing patterns is a principal form of information - about the shapes, locations, and movements of environmental objects and about the location and movement of the observer (Nakayama, 1985; Andersen, 1997; Lappin and Craft, 2000). The multipurpose contribution of image motion to visual function underscores the importance of investigating not only how motion is perceived but also how motion information is exploited by the visual system to perform other related functions. The speed, reliability, and precision with which the human visual system acquires information about the environment from changing stimulus patterns might well be considered miraculous. Indeed, the visual-motor coordination of muscular output with optical input is so effective and effortless that it is usually subconscious. Training in science or art is needed to recognize these commonplace miracles of physics, biology, and computation. As Martin Regan enjoys pointing out, the efficiency of motion perception is demonstrated by cricket. To Martin's eye, one of nature's finest achievements is displayed in a batsman's ability to swing his bat to intersect the path of a cricket ball approaching at about 90 mph, bounced off the ground, and released by the bowler less than half a second earlier. Regan (1992) calculates that the bat-ball intersection occurs within a window only about 10 cm wide and 2.5 ms in duration. Keep in mind that information about the ball's trajectory must be extracted from moving images, and from a complex
background that is also moving relative to both the eye and the ball. Keep in mind as well that this information is extracted by a neural network of billions of interconnected cells. Such impressive visual achievements can be studied at many levels of analysis, including both psychophysics and physiology. Understanding visual motion perception requires both psychophysical analyses of the optical information at the eye and physiological analyses of the neural mechanisms that detect and transform this information. Progress has occurred on both levels, but the links between these two levels of knowledge are still limited. The purpose of this chapter is to review current evidence about links between physiological characteristics and behavioral functions of center-surround interactions in visual motion perception. We focus especially on the physiological characteristics of center-surround neurons in the primate cortical area MT (V5) and on perceptual functions that may be related to these cells. Some aspects of our discussion are necessarily speculative because the experimental evidence needed to link the physiological and psychophysical analyses of this aspect of motion perception is incomplete. Nevertheless, enough has been learned that it seems timely to review current progress and gaps in describing these links between physiology and perception. We begin by discussing the logic for linking evidence about visual perception and physiology. Next, in section 14.2, we review physiological evidence about the center-surround organization of receptive fields in area MT as well as psychophysical evidence of apparent perceptual correlates of this center-surround antagonism. Sections 14.3 and 14.4 then examine evidence about two potential perceptual functions of this center-surround motion mechanism - involving figure-ground segregation and perception of surface shape, respectively.
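Regan's (1992) cricket estimate mentioned above is easy to confirm with back-of-the-envelope arithmetic:

```python
# Back-of-the-envelope check of Regan's (1992) numbers: at roughly
# 90 mph, a 10-cm hitting window corresponds to about 2.5 ms of time.
speed_mps = 90 * 1609.344 / 3600    # 90 mph in metres per second (~40.2)
window_m = 0.10                     # 10 cm spatial window
window_ms = 1000 * window_m / speed_mps
print(round(window_ms, 2))          # ~2.49 ms
```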
14.1.1 Linking Macroscopic and Microscopic Analyses of Visual Information Flow A fundamental but challenging problem in all areas of science is to link macroscopic and microscopic analyses of a system. Understanding how a complex system operates in a changing environment requires knowledge about both the dynamics of environmental conditions and the dynamics of the system's components. Causal relationships operate simultaneously on multiple levels and also between levels. Macroscopic and microscopic processes are interdependent; analysis of either one alone is insufficient. Functional links between visual perception and physiology involve the transmission of information — involving correspondences between (a) environmental objects and events, (b) optical stimulation of the eyes, and (c) physiological response patterns. Spatiotemporal patterns of physiological responses must maintain sufficient information about environmental stimuli to permit real-time coordination of motor actions, recognition of objects, and comprehension of meaningful scenes and events. The informational capacity of visual motion perception is suggested by athletic skills in coordinating actions with the motion of a ball and with motions of other players. Such visual-motor coordination depends on the fidelity of information transfer between the environment and the brain. The structural correspondence between these very different
physical domains is not physical, of course, but based on spatial and temporal patterns. Visual information transmission occurs simultaneously on both macroscopic and microscopic levels. The multilevel nature of the information transmission from optics to perception and action enables inferences about physiological processes from psychophysical experiments and inferences about perceptual functions from observations of physiology. A well-known paper by Teller (1984) reviews the logic of a variety of "linking propositions" for relating perceptual states with physiological states. As Teller points out, such linking propositions are at least implicit if not explicit in interpreting a large body of research in visual psychophysics and physiology. The best-known linking proposition for inferring physiological processes from perceptual behavior was articulated by Brindley (1960, p. 144): "[W]henever two stimuli cause physically indistinguishable signals to be sent from the sense organs to the brain, the sensations produced by those stimuli, as reported by the subject in words, symbols or actions, must also be indistinguishable." An important implication of this proposition is its contrapositive: if an observer can behaviorally discriminate between two stimuli, then these stimuli must elicit different physiological signals at the retina and at all subsequent neural stages leading to the behavioral response. That is, behavioral discrimination implies physiological differences. This proposition might seem almost trivially obvious, but it has nontrivial applications (Brindley, 1960; Teller, 1984; Lappin and Craft, 2000). The classic experiments by Hecht et al. (1942) and by Bouman (1950) offer good illustrations, where behavioral experiments were used to demonstrate that a single photon is sufficient to excite a single photoreceptor. 
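The logic behind the Hecht et al. (1942) result can be sketched with photon statistics alone: if detection requires at least n absorbed quanta and absorption counts are Poisson-distributed, the frequency-of-seeing curve is fixed by the stimulus rather than the observer. A minimal sketch (the threshold n = 6 and the mean intensities are illustrative, not Hecht's fitted values):

```python
import math

# If seeing requires at least n absorbed quanta and absorptions are
# Poisson-distributed, the frequency-of-seeing curve follows from
# photon statistics alone.
def prob_seeing(mean_quanta, n):
    p_below = sum(math.exp(-mean_quanta) * mean_quanta ** j / math.factorial(j)
                  for j in range(n))
    return 1.0 - p_below

for mean in (1, 3, 6, 12):
    print(mean, round(prob_seeing(mean, 6), 3))
# The steepness of this curve against log intensity is what let the
# behavioral experiment pin down n: most trial-to-trial variability
# lies in the stimulus itself.
```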
Moreover, most of the response variability for a given stimulus condition was attributable to physical variability of the stimulus rather than to physiological signals or cognitive decisions. Westheimer's (1979) studies of spatial "hyperacuity" also illustrate how behavioral discriminations permit important inferences about the retinal encoding of spatial position. A striking finding in these psychophysical experiments was the surprisingly small amount of information that was lost by the vast neural network between retina and motor response. The linking proposition described by Brindley and others is a special case of a more general principle related to the second law of thermodynamics: information about the input stimulation of the retina can only be lost but not created by the visual processes that lead to behavioral responses. Therefore, behavioral discriminations cannot be more precise or reliable than differences between the retinal stimuli or physiological responses to these stimuli. In general, the links between physiology and perception may be understood in terms of the flow of information about spatial and temporal patterns. Spatiotemporal information associated with moving images may be described both macroscopically and microscopically. The macroscopic perceptual level involves perceptual correlations between retinal stimulation and behavioral actions, and the microscopic physiological level involves spatiotemporal patterns of neural spike trains. This scheme is illustrated in figure 14.1. The macroscopic and microscopic quantities of transmitted information
Figure 14.1: Schematic illustration of the relation between two levels of analysis of the flow of information in visual motion perception. The double-pointed arrows refer to correspondences (approximate isomorphisms) between spatiotemporal patterns (relational structures) in different domains. Inferential links between processes on the macroscopic and microscopic levels are based on the general requirement that the quantities of information transmission (from optical input to behavioral output) on the two levels of analysis must equal one another.

must equal one another, and the information in the neural response patterns must correspond to that in the optical stimulation of the eye. In practice, however, measurements of the physiological information are sufficiently difficult that one can seldom achieve a quantitative match between the macro and micro descriptions of information. If two different trajectories of a moving object can be behaviorally discriminated, as in most ball-playing sports, then the temporal sequence of retinal images and the resulting signals stimulated by these two motions must also differ correspondingly at the retina and at subsequent neural stages. Moreover, spatiotemporal characteristics of the ball's trajectory must also be represented with sufficient precision by physiological signals to coordinate similar actions for similar trajectories. Correlations between a ball's trajectory and an athlete's actions imply additional correlations with both the optical patterns on the eyes and the physiological response patterns leading to the behavioral actions. The information-carrying patterns in these very different physical domains must correspond to one another; indeed, they must be essentially isomorphic. The flow of information is based on approximate isomorphism of relational structures or patterns (Lappin and Craft, 2000).
Visual perception of distal environmental objects and motions involves discriminations between groups of patterns defined by invariances under many transformations of the proximal images. Variations in the positions and motions of an observer's eyes and body relative to an object, variations in the background context of other objects and motions, and variations in ambient illumination yield an infinite group of proximal retinal images potentially associated with a given moving object. If visual discriminations among environmental objects and motions are robust under variations in their retinal images, then the visual signals that describe these objects and motions must also remain relatively unchanged by these image transformations. The underlying physiological signals must do more than simply discriminate between different proximal image patterns - because uncertainty about the proximal image parameters leads inevitably to reductions in detection and discrimination (Shipley, 1960; Green and Swets, 1966; Lappin and Staller, 1981). If perceived trajectories of environmental objects are robust under movements of the observer's head and eyes, under variations in background context, and so on, then physiological mechanisms must extract motion information that also is robust under such image transformations. Relative motion, for example, is invariant under rigid image translations. Perhaps this invariant information can be extracted by center-surround mechanisms. Such mechanisms may be critical to our ability to distinguish the motion of a target object from that of the background. We explore this hypothesis in the second half of this chapter. Lappin and Craft (2000) used this line of reasoning to reach conclusions about retinal spatial primitives that specify local surface shape. They showed first that the local shape of a smooth environmental surface is isomorphic with 2D second-order spatial derivatives of the retinal images as the object rotates in depth or is viewed stereoscopically. Lower-order properties such as first-order spatial derivatives do not provide such information because they are not invariant under object motions that change the object's orientation or distance from the observer. They also concluded from psychophysical experiments that this higher-order differential image structure that specifies local surface shape must be directly represented by retinal signals. The basis for the latter conclusion was that observers maintained hyperacuities (for relative motion and

1. A relational structure is defined as "a set together with one or more relations on that set" (Krantz, Luce, Suppes, and Tversky, 1971, p. 8). Sets may consist of either discrete elements or continua (e.g., in space, time, or real numbers), and they may contain elements formed from other elements by operations such as concatenation, addition, differentiation, and products of sets. The relations may be simple equivalence vs. nonequivalence of categories, ordinal relations, differences, ratios, distances, or other such relations among multiple elements. The classical "information theory" of Shannon and Weaver (1949) is based on statistical correspondence between relational structures involving only equivalence/nonequivalence relations. An adequate theory of the information in spatiotemporal optical patterns and physiological signals, however, requires structures with stronger relations that are at least ordinal. The fact that relational structures may have higher orders of complexity, with multiple dimensions and higher orders of differentiation or exponentiation, is especially important in applying notions of relational structures to theories of vision. This idea is implicit in Koenderink and van Doorn's (e.g., 1992a, 1992b) uses of differential geometry for describing images of surfaces and analyzing local image operators. This idea was also used by Lappin and Craft (2000) to study the structural correspondence between surfaces and their images and to identify the spatial primitive for perceiving surface shape from motion and stereo.

2. "Isomorphisms" - one-to-one correspondences - between relational structures such as environmental objects, retinal images, and physiological response patterns are only "approximate" rather than exact for two main reasons: First, the correspondences may be statistically perturbed by random optical and physiological fluctuations. Second, environmental surfaces and their retinal images are related by projective geometry - where object surfaces are often partially occluded by nearer objects and surface regions, where there are unusual but possible ambiguities associated with accidental views of objects, and where the relative scales of distances in depth and in the frontal plane are ambiguous. Despite these largely technical qualifications, the concept of isomorphism is sufficiently close that we will use it as basic to our conception of information. Lappin and Craft (2000) provide a fuller discussion of the correspondences between surfaces and their images.
284
Linking Psychophysics and Physiology of Center-Surround Interactions
stereoscopic disparities) for discriminating the relative position of a point on a smooth surface even under noisy perturbations of lower-order spatial relations. Empirical support for this hypothesis needs to be expanded, but the methods and rationale of this study illustrate an expanded form of linking propositions for inferring physiological relations from psychophysical discriminations. How physiological signals represent such higher-order differential relations is not yet known. 2D second-order spatial derivatives involve spatial relations among at least five points, and relations between such spatial structures in two successive images would involve a relational structure of at least ten points. The complexity and multidimensionality of such relations obviously exceeds what can be represented by spike trains of individual neurons. Neural representation of such spatiotemporal relations would require relationships among neighboring neurons. Nevertheless, receptive-field characteristics of some MT cells may provide information about spatial derivatives of moving images, and such information might be involved in perceiving local surface shape. We review this hypothesis in more detail later in this chapter.
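The claim that 2D second-order spatial derivatives involve spatial relations among at least five points can be made concrete with a small numerical sketch. This is our illustration, not a computation from Lappin and Craft (2000); the function name and sample surface are hypothetical choices. The pure second derivatives each use the classic five-point stencil (a center sample plus its four axial neighbors); the mixed derivative needs four diagonal neighbors as well.

```python
import numpy as np

def second_order_derivatives(z, i, j, h=1.0):
    """Estimate the 2D second-order spatial derivatives of a sampled
    depth map z at pixel (i, j) by finite differences. z_xx and z_yy
    together use the five-point stencil: the center plus its four
    axial neighbors."""
    z_xx = (z[i, j + 1] - 2 * z[i, j] + z[i, j - 1]) / h**2
    z_yy = (z[i + 1, j] - 2 * z[i, j] + z[i - 1, j]) / h**2
    # The mixed derivative requires the four diagonal neighbors too.
    z_xy = (z[i + 1, j + 1] - z[i + 1, j - 1]
            - z[i - 1, j + 1] + z[i - 1, j - 1]) / (4 * h**2)
    return z_xx, z_yy, z_xy

# A paraboloid z = x^2 + 2*y^2 has constant second derivatives (2, 4, 0),
# so the finite-difference estimates recover them exactly.
y, x = np.mgrid[-5:6, -5:6].astype(float)
z = x**2 + 2 * y**2
print(second_order_derivatives(z, 5, 5) == (2.0, 4.0, 0.0))  # -> True
```

Tracking such a structure across two successive images doubles the sample count, which is the "relational structure of at least ten points" referred to above.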
14.1.2 Inferring Perception from Physiology

Linking macroscopic perceptual processes with microscopic physiological processes requires inferential links in both directions, from physiology to perception as well as from perception to physiology. The links in both directions are difficult; typically one cannot be certain whether signals recorded in neurons in a given visual area are sufficient or even necessary for perceptual responses to the given stimulus patterns. An apparent absence of difference in specific neural responses to two different stimuli typically does not imply that these stimuli are visually indiscriminable by an observer using responses of the whole visual system. With accumulating physiological evidence about receptive field characteristics of multiple visual areas, and with accumulating evidence about comparisons between the neural and behavioral responses to particular stimulus patterns, hypotheses about links from physiology to perception and behavior have grown less speculative. Converging physiological and psychophysical evidence over the past 20 years has begun to clarify the links between physiology and visual function in perceiving motion, though we still do not have a quantitative picture of the information flow from moving images through physiological mechanisms to perceived motions of environmental objects. The present chapter examines currently available knowledge about the links between physiological and perceptual functions of motion-sensitive center-surround neurons in primate area MT. Our interest in outlining this knowledge was sparked by our recent discoveries of apparent perceptual correlates of center-surround antagonism in the responses of many cells in MT (Tadin et al., 2003). Our experiments were psychophysical but they were stimulated by physiological findings.
14.2 Center-Surround Interactions in Motion Processing

Center-surround receptive field organization is a ubiquitous property of visual neurons (Allman et al., 1985a). Such mechanisms are well suited for extracting information about the spatial organization of retinal images. They amplify responses to spatial differences in properties such as luminance, and suppress responses to uniform image regions. The spatial organization of image variations usually is more informative about the structure of the environment than the uniform properties of retinal images. Given the computational demands of visual motion processing, center-surround mechanisms may play an important role in motion perception. Spatial variations in image motion carry important visual information about the relative locations, orientations, and shapes of surfaces, about the trajectories of moving objects, and about the observer's locomotion through the world (Nakayama, 1985; Braddick, 1993; Regan, 2000; Lappin and Craft, 2000; Warren, 1995). Uniform motion fields, however, are often caused by eye or body movements and, as such, can make the perception of object motion more difficult. Center-surround mechanisms are well suited for extracting information about the spatial structure of moving fields and for suppressing information about uniform motions.
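The size dependence described above can be caricatured with a toy difference-of-Gaussians unit. This sketch is ours, not a model from the chapter, and every parameter value is an arbitrary illustrative choice: an excitatory Gaussian center minus a broader inhibitory Gaussian surround, driven by a uniformly moving disk.

```python
import numpy as np

def dog_response(stim_radius, sigma_c=1.0, sigma_s=3.0, w_s=0.9):
    """Toy center-surround unit: excitatory Gaussian center minus a
    broader inhibitory Gaussian surround (difference of Gaussians),
    integrated in polar coordinates over a uniformly moving disk of
    the given radius."""
    r = np.linspace(0.0, stim_radius, 2000)
    gc = r * np.exp(-r**2 / (2 * sigma_c**2)) / sigma_c**2  # center drive
    gs = r * np.exp(-r**2 / (2 * sigma_s**2)) / sigma_s**2  # surround drive
    return float(np.sum(gc - w_s * gs) * (r[1] - r[0]))

# A patch confined to the center excites the unit; enlarging it into
# the inhibitory surround suppresses the response.
print(dog_response(1.5) > dog_response(10.0))  # -> True
```

The response of this unit rises with stimulus size up to an intermediate optimum and then declines, the same summation-then-suppression profile that the chapter attributes to center-surround neurons in MT.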
14.2.1 Center-Surround Mechanisms Found in MT and Elsewhere

Center-surround interactions are frequently observed in the neural areas sensitive to motion. In the primate cortex, center-surround receptive field organization has been observed in the primary visual cortex (V1) (Jones et al., 2001; Cao and Schiller, 2003), MT (Allman et al., 1985b), and lateral MST (Eifuku and Wurtz, 1998). Other areas and species in which center-surround neurons have been found include rabbit retina (Ölveczky et al., 2003), tectum of frog (Grüsser-Cornehls et al., 1963) and pigeon (Frost and Nakayama, 1983), superior colliculus of both cat (Mandl, 1985) and macaque monkeys (Davidson and Bender, 1991), area 17 of cat (Hammond and Smith, 1982; Kastner et al., 1999), and PMLS of cat (von Grünau and Frost, 1983). Among these motion-sensitive areas, center-surround mechanisms have been described in most detail in MT (Allman et al., 1985b; Tanaka et al., 1986; Born and Tootell, 1992; Lagae et al., 1989; Raiguel et al., 1995; Bradley and Andersen, 1998; Born, 2000; Borghuis et al., 2003). The function of these mechanisms has been studied both neurophysiologically (Xiao et al., 1995, 1997a, b, 1998; Bradley and Andersen, 1998; Born et al., 2000) and by computational modeling (Nakayama and Loomis, 1974; Buracas and Albright, 1994, 1996; Liu and Van Hulle, 1998; Gautama and Van Hulle, 2001). This substantial literature provides a foundation for describing probable perceptual roles of MT center-surround mechanisms.

First described in the owl monkey by Allman and Kaas (1971), area MT is traditionally considered part of the dorsal processing stream and is believed to play a central role in motion perception (Orban, 1997). Its association with the dorsal stream emphasizes functions in perceiving space and guiding motions - "where" or "how" functions. The functions of center-surround mechanisms in MT, however, probably also involve so-called "what" functions in shape and object perception, functions often attributed to the ventral stream.

Figure 14.2: Responses of a typical center-surround (A) and wide-field neuron (B) to random-dot motion of increasing patch size. Patchy 2-deoxyglucose uptake in MT (C) was obtained by presenting a large pattern of random-dot motion to a macaque monkey (Born, 2000). Dark areas show clusters of wide-field neurons. Illustrations courtesy of Richard T. Born.

Allman et al. (1985b) found that responses of most neurons in owl monkey MT were modulated by stimulation in the region surrounding the classical receptive field (figure 14.2A). The surround regions are often described as "silent" because stimulation of the surround alone does not affect the neuron's response. Most of the observed interactions were antagonistic: the firing rate to motion in the preferred direction in the center region was reduced when the motion pattern was expanded into the surround region. That is, center-surround neurons responded poorly to large fields of uniform motion. If the motion in the surround was in the anti-preferred direction, its suppressive effect diminished, and for some neurons the response was facilitated. Center-surround neurons are found in all layers of MT but are less common in layer IV (Raiguel et al., 1995; Born, 2000), suggesting that surround inhibition is probably mediated via intrinsic MT connections. The spatial extent of the surround is usually three to five times larger than the extent of the center region (Tanaka et al., 1986; Raiguel et al., 1995), and the directional tuning of the surround is broader than that of the center (Born, 2000). Initial reports described surrounds as encircling the central region of the receptive fields (Allman et al., 1985b; Tanaka et al., 1986).
Subsequent explorations, however, found that the spatial extent of most MT surrounds is nonuniform (Xiao et al., 1995, 1997a, 1997b, 1998), suggesting that such surrounds may have important computational properties (explored further below). In addition to neurons with such center-surround antagonism (sometimes called "local motion" neurons), some MT neurons prefer large moving fields and show no
surround suppression ("wide-field" neurons) (figure 14.2B; Allman et al., 1985b; Born and Tootell, 1992; Raiguel et al., 1995). These two types of neurons are clustered anatomically (figure 14.2C; Born and Tootell, 1992) and make different efferent connections, with wide-field neurons projecting to ventral MST and center-surround neurons projecting to dorsal MST (Berezovskii and Born, 2000). These two types are also believed to have different functions, with center-surround neurons coding object motion and wide-field neurons signaling background motion (Born et al., 2000). Currently available descriptions of center-surround interactions in primate MT have been generally consistent. The stimulus patterns used to characterize these receptive fields, however, have been almost exclusively high-contrast random-dot patterns. As we shall see, this restriction of the methods also restricts the description of these receptive fields. For example, the spatial organization of receptive fields in primate V1 has been found to vary with both contrast (Sceniak et al., 1999) and color (Solomon et al., 2003). Pack and Born (personal communication, August 2003) found that center-surround antagonism in MT neurons substantially decreases or disappears at low contrast.
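One way to picture such contrast-dependent receptive field organization is to let the surround gain of a difference-of-Gaussians unit grow with contrast. The following sketch is purely illustrative: the saturating gain function and every parameter value are our assumptions, not quantities fitted to MT data.

```python
import numpy as np

def mt_response(size, contrast, sigma_c=0.8, sigma_s=2.4, c50=0.05):
    """Toy MT-like unit whose inhibitory surround gain grows with
    contrast via a saturating (Naka-Rushton-style) function. At low
    contrast the surround is weak, so enlarging the stimulus adds
    excitation; at high contrast the surround dominates and enlarging
    the stimulus suppresses the response."""
    w_s = 0.98 * contrast / (contrast + c50)  # surround gain rises with contrast
    r = np.linspace(0.0, size, 2000)
    gc = r * np.exp(-r**2 / (2 * sigma_c**2)) / sigma_c**2  # excitatory center
    gs = r * np.exp(-r**2 / (2 * sigma_s**2)) / sigma_s**2  # surround, 3x wider
    return contrast * float(np.sum(gc - w_s * gs) * (r[1] - r[0]))

small, large = 0.7, 5.0  # degrees, the sizes used by Tadin et al. (2003)
print(mt_response(large, 0.02) > mt_response(small, 0.02))  # -> True: summation at low contrast
print(mt_response(large, 0.92) < mt_response(small, 0.92))  # -> True: suppression at high contrast
```

With these illustrative parameters the unit reproduces the qualitative pattern discussed below: spatial summation for faint stimuli and surround suppression for high-contrast ones.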
14.2.2 Perceptual Correlates of Center-Surround Antagonism

If center-surround antagonism is indeed an integral part of motion processing, we might expect to see the perceptual signature of this antagonism in the form of impaired motion visibility with increasing stimulus size. Existing evidence, however, shows that increasing the size of a low-contrast moving stimulus enhances its visibility, presumably owing to spatial summation. Such psychophysical estimations of the spatial properties of motion mechanisms tend to be based on low-contrast (Anderson and Burr, 1991; Watson and Turano, 1995) or noisy stimuli (Lappin and Bell, 1976), while physiological descriptions of center-surround neurons have been obtained with high-contrast patterns. Several physiological studies of visual cortex have found that center-surround interactions depend on contrast, with surround suppression stronger at high contrast and spatial summation more pronounced at low contrast (Kapadia et al., 1999; Levitt and Lund, 1997; Sceniak et al., 1999). Thus, contrast thresholds may not fully describe the spatial properties of human motion mechanisms, especially at high contrast. Tadin et al. (2003) measured the threshold exposure durations needed for human observers to accurately identify the motion direction of a drifting Gabor patch. Observers viewed foveally centered Gabor patches which varied in size and contrast. The results (figure 14.3) showed that at low contrast, duration thresholds decreased with increasing size. This result, implying spatial summation of motion signals, is consistent with earlier reports (Anderson and Burr, 1991; Watson and Turano, 1995). At high contrast, however, duration thresholds increased fourfold as the Gabor patch width increased from 0.7° to 5°. This surprising result implies neural processes fundamentally different from spatial summation. Several psychophysical characteristics found by Tadin et al.
(2003) indicate that this result is attributable to center-surround antagonism in MT: (1) Impaired visual performance with larger stimuli has been construed as the perceptual signature expected from antagonistic center-surround mechanisms (Westheimer, 1967). (2) The "critical size" at which strong suppression is first observed is large enough to impinge on the surrounds of MT neurons with foveal receptive fields of the macaque monkey (figure 14.4; Raiguel et al., 1995). (3) The detrimental effect of stimulus size diminished in the visual periphery, consistent with the increase of MT receptive field sizes with eccentricity (Raiguel et al., 1995). (4) The motion aftereffect (MAE), a perceptual aftereffect attributed at least partly to MT activity (Huk et al., 2001), is weaker if induced with large high-contrast stimuli. This result would be expected if such stimuli inhibit the activity of MT neurons whose adaptation normally contributes to the MAE. (5) Isoluminant motion gratings did not produce surround suppression, a characteristic that dovetails with the finding that MT neurons respond much more weakly to motion of isoluminant gratings than to motion of luminance gratings (Gegenfurtner et al., 1994). Taken together, these results suggest that impaired motion perception for large high-contrast patterns is a perceptual correlate of center-surround antagonism in MT.

Figure 14.3: Duration thresholds as a function of stimulus size at different contrasts. Adapted from Tadin et al. (2003).

Contrast Dependency of Center-Surround Antagonism

It is intriguing that increasing stimulus contrast dramatically changes the spatial integration of motion signals. This psychophysical result is compatible with physiological evidence that in V1 neurons the relative strength and/or spatial extents of the excitatory center and inhibitory surround change with contrast (Kapadia et al., 1999; Levitt and Lund, 1997; Sceniak et al., 1999). This psychophysically observed transition from summation to suppression occurs around 5% contrast (Tadin et al., 2003), which is the contrast at which an average MT neuron attains about 25% of its maximum response (Sclar et al., 1990). The contrast dependency of center-surround antagonism may have a functional role.
At high contrast, the perceptual benefits of surround suppression (Born et al., 2000; Buracas and Albright, 1996; Gautama and Van Hulle, 2001; Liu and Van Hulle, 1998; Nakayama and Loomis, 1974; Xiao et al., 1997b) probably outweigh the resulting decreases in neural activity and motion sensitivity. Motion sensitivity becomes more important at low contrast, so it seems functionally beneficial that receptive field organization shifts with reduced contrast from center-surround antagonism to spatial summation. Spatial integration versus differentiation of motion signals seems, therefore, to reflect an adaptive process that adjusts processing of motion signals to fit the input signal/noise characteristics. Perceptually important suppressive mechanisms seem to operate only when the sensory input is sufficiently strong to guarantee visibility.

Figure 14.4: Psychophysically estimated "critical size" shown relative to an average foveal MT receptive field. The dark dashed circle illustrates the stimulus size beyond which an average foveal MT center-surround neuron exhibits surround suppression (Raiguel et al., 1995). The full spatial extent of the stimulus is indicated by the light dashed circle. This comparison assumes that the properties of human and macaque MT are comparable (Rees et al., 2000), and that the receptive field sizes are similar for the two species (Kastner et al., 2001). Adapted from Tadin et al. (2003).

Analogous contrast dependency has also been found in other psychophysical studies of interactions among spatially separate motion signals. Lorenceau and Shiffrar have studied the perceptual integration of separated moving contours of a shape (usually a diamond) viewed through multiple apertures that occluded its vertices (Lorenceau and Shiffrar, 1992, 1999; Shiffrar and Lorenceau, 1996; Lorenceau and Alais, 2001). Perception of the partially occluded shape in these displays required integration of contours moving in different directions inside spatially separate apertures. With low-contrast contours, the spatially separate motions usually appeared as a rigidly moving and globally connected object.
At high contrast, the same patterns of local contour motion usually appeared disconnected and unrelated. Lorenceau and Shiffrar (1999) also found that motion integration was more likely to occur in noisy, eccentric viewing, and isoluminant conditions - the same conditions in which Tadin et al. (2003) found surround suppression to be weaker. Alais et al. (1998), Takeuchi (1998), and Lorenceau and Zago (1999) also found that spatially separate patches of drifting gratings were
more likely to be perceived as a coherently moving form at low contrast. Specifically, Takeuchi (1998) found that perception of a rigidly moving form and perception of independently moving gratings were equally likely at about 5% contrast. This finding is consistent with that of Tadin et al. (2003), who found the transition from spatial summation to spatial suppression to occur at about 5% contrast.

Other Psychophysical Results Consistent With Surround Suppression

Psychophysical experiments have often measured motion discriminations near threshold values of contrast or statistical coherence. Such impoverished motion signals probably promote spatial summation, precluding observations of surround suppression. Whatever hints of surround suppression might be found in the literature are likely to be found in experiments using large high-contrast patterns. Indeed, Verghese and Stone (1996) found that when a large high-contrast pattern was divided into smaller parts, speed discriminations actually improved. The authors suggested that surround suppression was one possible explanation. Derrington and Goddard (1989) found that direction discriminations of brief large gratings decreased when contrast was increased. This result is consistent with those of Tadin et al. (2003), though the authors suggested a different explanation. Murakami and Shimojo (1993) studied induced motion in stationary test stimuli presented within a large patch of moving dots. They found that induced motion was replaced by motion assimilation when the test stimulus was small, low contrast, or presented in the visual periphery - suggesting that motion antagonism changes to motion summation under these conditions. Surround suppression is also suggested by findings in several MAE studies in which large high-contrast adaptation patterns produced relatively small MAEs (Sachtler and Zaidi, 1993; Murakami and Shimojo, 1995; Tadin et al., 2003).
Kim and Wilson (1997) found that when the directions of motion in center and surround differed by 45°, the perceived direction of the central motion could shift 30-40° away from the surround direction. Like most of the results above, these directional interactions increased with the size and contrast of the surround. The perceived shift in the direction of the center stimulus might be a result of selective inhibition of neurons tuned to directions similar to the surround. Broad directional tuning of surround suppression (Born, 2000) may be the cause behind this rather large perceived directional shift. At high contrast, direction discriminations are improved by reducing the size of the motion pattern (Tadin et al., 2003). This trend should reverse at some small size, however, as the motion of very small stimuli should be hard to identify. Thus, at any contrast where surround suppression is observed, there must be an intermediate size at which performance is best. This optimal size marks the transition between summation and suppression of motion signals. The question of whether this optimal size varies with contrast was investigated by Lappin et al. (2003). Duration thresholds were measured for discriminating motion directions of random-dot motion patches of various sizes and contrasts. At maximum contrast the optimal size was about 1° in diameter, but the optimal size increased as the contrast was reduced. A related result was found by Nawrot and Sekuler (1990), who investigated how high-contrast motion at one location influences the perception of adjacent incoherent
random motion. Stripes of coherently moving random-dot patterns alternated with stripes of random motion. When the stripes were narrow, the perceived motion of the random stripes was assimilated to the direction of the coherent stripes, and the whole pattern appeared to move in the same direction. When the stripes were wide, the random stripes appeared to move in the opposite direction from the coherent stripes. The stripe width for this transition from motion assimilation to motion contrast occurred at about 1°, a value similar to that found by Lappin et al. (2003).
14.2.3 Interim Conclusions

Center-surround antagonism in motion processing has been found in a diverse set of both physiological and psychophysical studies. The spatial interactions in these phenomena depend critically on several stimulus parameters including contrast, eccentricity, and signal/noise ratio. Thus, perceived motion can change substantially depending on the viewing conditions. The motion system has the difficult task of balancing two fundamentally conflicting processes: spatial integration and spatial differentiation (Braddick, 1993). The adaptive, contrast-dependent nature of center-surround interactions (Tadin et al., 2003), however, may allow the visual system to alternate between integration and differentiation depending on the available stimulus information. Describing these center-surround mechanisms and the conditions in which they operate is important, but it is only half the story. Understanding their functional contribution to vision is arguably more important. At first glance, center-surround antagonism may seem maladaptive - causing impaired motion perception and failure to integrate motion signals arising from a single object. One would expect such costs to be offset by significant visual benefits.
14.3 Segregating Surfaces
An important early step in visual processing is to organize the retinal image into surfaces and objects, segregating figure from ground. Objects can differ from their backgrounds in a variety of physical properties - including luminance, texture, color, motion, temporal synchrony, and binocular disparity (Regan, 2000; Lee and Blake, 1999). The extensive use of camouflage in the animal world (e.g., Thery and Casas, 2002) indicates the crucial visual role of figure-ground segregation. Even the best camouflage, however, breaks down when an animal is moving. Motion discontinuities between object and background provide important additional information for segregating images into separate surfaces (Nakayama and Loomis, 1974; Braddick, 1993; Regan, 2000). Surface segregation also defines regions within which motion signals should be integrated. Local motion signals are inherently ambiguous and often noisy. Perceiving the "veridical" motion of an object requires spatial integration of motion signals. It is critically important, however, to integrate only the motion signals arising from the same surface. Integrating motion signals from different surfaces will necessarily degrade motion perception. Constraining motion integration within object boundaries depends on figure-ground segregation.
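The cost of integrating across a surface boundary can be demonstrated with a toy simulation. This is our illustration (the velocities, sample counts, and noise level are arbitrary): pooling noisy local velocity samples within the object sharpens the motion estimate, while pooling across the figure-ground boundary biases it toward the background motion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two surfaces: an object drifting right (+1, 0) over a background
# drifting up (0, +1). Local motion measurements are corrupted by noise.
object_true = np.array([1.0, 0.0])
object_samples = object_true + 0.3 * rng.standard_normal((50, 2))
background_samples = np.array([0.0, 1.0]) + 0.3 * rng.standard_normal((50, 2))

# Integrate within the segmented object vs. across the boundary.
within = object_samples.mean(axis=0)
across = np.vstack([object_samples, background_samples]).mean(axis=0)

err_within = np.linalg.norm(within - object_true)
err_across = np.linalg.norm(across - object_true)
print(err_within < err_across)  # -> True: pooling across surfaces degrades the estimate
```

Averaging within the object reduces the noise roughly by the square root of the sample count, whereas averaging across the boundary drags the estimate about halfway toward the background velocity, which is why accurate figure-ground segregation must precede motion integration.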
14.3.1 Psychophysics of Motion-Based Figure-Ground Segregation

Everyday experience suggests that we are very good at detecting objects moving against a background. We wave our arms when we want to be seen, and we stand still when we want to hide. Psychophysical observations accord with our intuitions. A single moving object immediately pops out from the background and strongly attracts attention (e.g., Dick et al., 1987). Evidence suggests that motion can be as good as other visual cues in segregating figure from ground (Regan, 1989), and sometimes even better (Nawrot et al., 1996). Regan and his colleagues have extensively studied the perception of motion-defined form over the past two decades (reviewed in Regan, 2000). A general conclusion from Regan's research is that vision is very efficient in detecting, discriminating, and recognizing motion-defined objects. Baker and Braddick (1982) found that observers could effortlessly discriminate 2D shapes defined solely by differential motion from a random-dot background. Subsequent experiments by Regan showed that the perception of motion-defined forms is often as good as the perception of luminance-defined forms. At high contrast and with fast motions, orientation and vernier discriminations for luminance- and motion-defined forms are comparable (Regan, 1986, 1989; Sary et al., 1994). Discriminations of aspect ratios of motion- and luminance-defined forms are also very similar (Regan and Beverley, 1984; Regan and Hamstra, 1991). This impressive sensitivity to motion-defined forms deteriorates quickly, however, if stimulus parameters such as speed and contrast are far from optimal values (Regan, 1989; Lappin et al., 2003). Motion-defined forms also have to be larger and longer in duration to match visual sensitivity for luminance-defined forms (Regan and Beverley, 1984).
Conflicts Between Spatial Integration and Differentiation of Motion Signals

Highlighting differences between local motion signals, however, is not always adaptive. Different local motion signals often belong to the same object, and should be integrated rather than differentiated. Biological motion patterns are a good illustration. Integration and segregation of motion signals can be guided by other visual cues (Rivest and Cavanagh, 1996; Croner and Albright, 1997, 1998) and even form information (Lorenceau and Alais, 2001). Another solution that is independent of other visual submodalities may be to determine the spatial extent of motion integration based on the local motion signals themselves. The strength and quality of motion signals can be substantially reduced at low contrast, by noise, or when defined by color. Apparent differences in the direction and velocity of local motion signals may be caused by noise, and the spatial segregation of such signals may lead to incorrect perception. Moreover, under such low-visibility conditions, motion patterns might require spatial integration just to be perceived at all. On the other hand, when motion signals are strong, spatial variations in the directions and speeds detected by local mechanisms are more likely to reflect the "true" motion pattern. Under conditions of good visibility, spatial differentiation should be favored. If the spatial organization of motion signals is adapted in this way to the visibility conditions, then it may also fail under some conditions. Lorenceau and Shiffrar (1992, 1999) have shown that the perceptual integration of moving contours belonging to a
Figure 14.5: An illustration of a 2D motion-defined shape. The shape is visible because the motion within the oval region is different from the background motion (the oval outline is only for illustration). An observer's task was to discriminate the orientation of the oval shape.

single rigid form seen through separate apertures is more likely under low-contrast, eccentric-viewing, isoluminant, and noisy conditions. Correspondingly, however, such moving contours are less likely to be correctly integrated when the local visibility increases. Moreover, vision also makes the complementary error of failing to segregate motion signals from different objects. Regan and Beverley (1984) and Regan (1989) found that motion-defined form discrimination is impaired when the strength of the motion signals is degraded, even when motion sensitivity is relatively unaffected. Lappin et al. (2003) found that discriminations of motion directions and of motion-defined forms are oppositely affected by variations in contrast - direction discriminations decreasing and form discriminations increasing with greater contrast. The trade-off between these two tasks suggests that spatial integration and differentiation of motion signals are adaptively controlled by local visibility conditions.

Psychophysical Links Between Figure-Ground Segregation and Center-Surround Mechanisms

Vision is very good at detecting relative motion and segregating surfaces, but it is an open question whether these visual abilities derive from center-surround mechanisms. Intuitively, center-surround mechanisms seem likely to be involved in motion-based figure-ground segregation, but this link needs experimental support. Psychophysical evidence indicates that surround suppression varies with the stimulus conditions (Tadin et al., 2003).
If figure-ground discrimination depends on center-surround mechanisms, then figure-ground discriminations should be accurate in stimulus conditions with strong surround suppression and impaired in conditions with reduced surround suppression.
Linking Psychophysics and Physiology of Center-Surround Interactions
Figure 14.6: Duration thresholds for motion direction and figure-ground discriminations as a function of stimulus contrast.
Lappin et al. (2003) found such a relationship between surround suppression and motion-defined form discriminations using the task illustrated in figure 14.5. Tadin et al. (2003) had previously found that duration thresholds for discriminating the directions of large patterns increased with contrast, indicating that surround suppression increases with contrast (figure 14.6). In the form-discrimination task of Lappin et al., the same contrast increase yielded improved performance. At the highest contrast (black arrow), form discrimination was better than direction discrimination of the same pattern. Interestingly, the improvements in form discrimination with increasing contrast were approximately equal to the decrements in direction discrimination, so that the two functions were nearly symmetrical around the horizontal dashed line in figure 14.6. Additional experiments are needed to clarify the relation between these two aspects of motion perception.

The visual ability to accurately perceive motion-defined forms may seem surprising in relation to the supposed physiological separation between mechanisms for form and motion processing. The fact that motion-defined forms pop out from the background and attract attention (typical dorsal stream functions) does not imply that shape characteristics such as orientation and aspect ratio will be accurately perceived. The finding that motion cues are sufficient for perceiving both 2D and 3D shape is remarkable and suggests an interesting interaction between motion processing and ventral stream functions. The visual complexity and diverse phenomenology of motion-based form perception all but guarantee that its neural correlates will involve multiple neural mechanisms in multiple visual areas.
Duje Tadin and Joseph S. Lappin
14.3.2 Neurophysiology of Motion-Based Figure-Ground Segregation

Given the visual sensitivity to relative motion and motion-defined forms, specialized neural mechanisms probably operate to detect differences in the spatial distribution of motion signals. Because of the high proportion of center-surround neurons in MT, and because of its central location within anatomical pathways of motion perception, MT seems likely to be involved in segregating figure from ground and perceiving motion-defined form. From the outset, we emphasize that MT mechanisms are unlikely to be involved in all aspects of motion-based figure-ground segregation. As is discussed below, MT is well equipped for segregating figure from ground but lacks mechanisms to directly extract the detailed 2D shape of motion-defined forms. This distinction is in agreement with clinical evidence demonstrating that the detection of motion-defined forms can remain intact even when the identification of such forms is severely impaired (Regan et al., 1992; Schenk and Zihl, 1997; Cowey and Vaina, 2000). Neural mechanisms for figure-ground segregation of moving forms are discussed first, followed by a discussion of how the 2D shape of such forms may be extracted.

Segregation of Moving Objects from the Background

Moving objects must first be detected and then their motion estimated. In principle, this can be done with little regard for detailed 2D shape. Once a moving object is detected and foveated, it can usually be recognized based on cues other than motion, as most objects are not perfectly camouflaged. Thus, the detection of moving objects is useful whether or not such objects can be recognized based on motion cues alone. The responses of center-surround MT neurons amplify the neural signature of objects moving relative to their background. The question, however, is whether such a simple mechanism is sufficient to support our ability to effortlessly segregate moving objects from the background.
The observation that center-surround neurons are excited by relative motion and suppressed by uniform motion suggests a link between surround suppression and figure-ground segregation. That is, suppression occurs when the center and surround are stimulated by the motion of a relatively large visual feature. For this mechanism to be efficient, it should not be inhibited when different visual features stimulate the surround and center regions even if they are moving in the same direction. This may occur, for example, when two objects at different depths move with the same angular velocity, or when the observer is moving and fixating at a point more distant than a moving object. In such cases, center and surround regions of some MT neurons will be stimulated by similar motion arising from different objects, resulting in response suppression. Because this suppression would be caused by object motion, it would somewhat diminish the ability of MT neurons to contribute to figure-ground segregation. Most MT cells, however, are disparity selective (DeAngelis and Newsome, 1999), a tuning property that may be exploited for "inhibiting" surround suppression if center and surround motions are at different depths. This hypothesis was investigated by Bradley and Andersen (1998), who found that the disparity tuning of center and surround regions tends to be different. That is, a neuron that is typically suppressed by a surround moving in its preferred direction becomes unsuppressed if the center and surround motions are at different depths. Surround suppression increased as either surround motion or its depth became more similar to motion in the center. The disparity dependence of surround suppression indicates that MT neurons are modulated by motion fields arising from a single surface, but are unaffected by the motions of other surfaces at different depths. This "elaborated" surround suppression improves the ability of MT neurons to efficiently segregate moving objects from the background.

In addition to detecting moving objects, our visual system must correctly estimate object speed and trajectory. This is critical, for example, in accurately foveating the moving object and controlling subsequent pursuit eye movements - skills essential in sports like baseball and cricket. Center-surround neurons may signal the presence of a moving object but cannot also signal its velocity, because the responses of center-surround neurons are also influenced by the background speed and direction (Allman et al., 1985b). The responses of center-surround neurons, however, might be disambiguated by the neural signal representing the speed and direction of background motion - information encoded by wide-field MT neurons (figure 14.2B). The hypothesis that center-surround and wide-field neurons jointly code object motion has received direct support from recent microstimulation experiments (Born et al., 2000). These researchers exploited the fact that center-surround and wide-field neurons are anatomically segregated (figure 14.2C; Born and Tootell, 1992) and can be separately stimulated. Monkeys were trained to fixate a stationary target. A moving target then appeared in the periphery, and the animal's task was to make a foveating saccade and visually pursue the target. On half of the trials, microstimulation was applied while the animal was estimating the direction and speed of the moving target. Microstimulation of MT sites with center-surround neurons shifted pursuit eye movements in a direction similar to the preferred direction of the stimulated clusters of neurons.
In contrast, microstimulation of MT sites with wide-field neurons shifted pursuit eye movements in the direction opposite to the preferred direction of the stimulated neurons. These results suggest that the activity of center-surround MT neurons represents object motion, whereas the activity of wide-field neurons signals background motion. Importantly, replacing microstimulation with large background motion had an effect similar to that of stimulating wide-field neurons.

Neurophysiology of Motion-Defined 2D Shape

Once a moving object is detected and visually segregated from its background, motion information can be used to perceive its 2D shape (Regan, 2000). Detailed motion-defined shape is conveyed by kinetic (motion-defined) boundaries - a building block (akin to edges) of motion-defined objects. One strategy for investigating the neural mechanisms involved in perceiving the shapes of motion-defined objects is to look for neurons and brain areas with selectivity for kinetic boundaries. Brain imaging studies have found that MT responds strongly to kinetic boundaries, but this response does not differ from MT's response to uniform motion (Reppas et al., 1997; Van Oostende et al., 1997; Dupont et al., 1997; for an exception see Shulman et al., 1998). Reppas et al. (1997) have also shown that several early visual areas are activated by kinetic boundaries, but this activity is unlikely to be specific to motion-defined form because such areas are also activated by other types of boundaries (Leventhal et al., 1998). Orban and colleagues have suggested that the kinetic occipital area (KO) is an area specialized for the processing of kinetic boundaries (Van Oostende et al., 1997; Dupont et al., 1997). Recently, however, KO has been shown to respond to boundaries defined by cues other than motion (Zeki et al., 2003). So far, then, imaging studies have not revealed whether MT or other cortical areas are specialized for processing kinetic boundaries.

Surgical lesions of area MT (and adjacent regions) in nonhuman primates have produced conflicting results about the importance of MT in processing kinetic boundaries, with postlesion impairments ranging from mild (Schiller, 1993; Lauwers et al., 2000) to severe (Marcar and Cowey, 1992). Single-cell results, however, are more consistent and show that single MT neurons are not selective for the orientation or location of kinetic boundaries (Marcar et al., 1995). MT neurons generally respond very poorly to kinetic boundaries. In fact, MT neurons respond as weakly to kinetic boundaries as they do to transparent motion (Snowden et al., 1991; Bradley et al., 1995). It should be emphasized, however, that neurons in other visual areas (primarily V2) are selective for the orientation of motion-defined boundaries (Marcar et al., 2000; Leventhal et al., 1998). V2 neurons tuned to the orientation of kinetic boundaries often exhibited similar orientation tuning to luminance edges, resulting in cue-invariant responses to visual boundaries (Marcar et al., 2000). Notably, the response to kinetic boundaries was delayed by about 40 ms (relative to the luminance boundary response), suggesting a role for cortical feedback. One possibility is that this feedback may arise from neural mechanisms sensitive to the coarse 2D shape of moving objects. The possibility that MT may contain such mechanisms is discussed next.
Single MT neurons are not tuned to kinetic boundaries (Marcar et al., 1995), but the population response in MT might carry the neural signature associated with the coarse 2D shape of motion-defined objects (Snowden, 1994). Consider a population of antagonistic center-surround neurons responding to a kinetic edge (figure 14.7). Neurons with receptive fields stimulated by the kinetic edge will be suppressed due to the multiple motions within their receptive field center (Snowden et al., 1991; Marcar et al., 1995). Neurons far from the motion boundary will be suppressed due to surround inhibition (Allman et al., 1985b). Thus, within the population of neurons responding to the motion-defined edge, the center-surround neurons in the regions flanking the boundary will be the most active. The emerging result is the segmentation of regions containing uniform or near-uniform motion. This coding scheme may be part of a process that detects areas of near-uniform motion and then "draws" boundaries around such regions. Such a process is described as region- or continuity-based image segmentation (as contrasted with edge-based segmentation; Möller and Hurlbert, 1996). We emphasize, however, that the proposed population-coding scheme (figure 14.7) is speculative. Interestingly, MT seems to rely on a population code to represent transparent motion (Treue et al., 2000) - a class of stimuli that, just like motion-defined boundaries, is composed of multiple motion directions.

Furthermore, psychophysical evidence suggests that in some stimulus conditions, motion perception relies on region-based segmentation algorithms. For example, reducing the salience of a motion-defined edge by introducing a gradual rather than an abrupt change in velocity was found to have very little effect on the ability to segment surfaces based on their motion (Smith and Curran, 2000; Möller and Hurlbert, 1996).

Figure 14.7: Illustration of how a population of hypothetical center-surround neurons would respond to a motion-defined boundary. "S" marks the receptive fields of neurons whose response would be suppressed, "LS" marks the neurons whose response would be less suppressed, and "NS" marks the neurons that would not be suppressed. Note that MT neurons with appropriately located asymmetric surrounds (see figure 14.8B; Xiao et al., 1995, 1997a) would give the strongest response to a motion boundary (marked with "NS").

Möller and Hurlbert (1996) also demonstrated that increasing the width of a motion-defined figure increased its visibility even when the detectability of its edges was kept constant. This effect was most pronounced at brief (< 70 ms) durations and decreased as the exposure duration increased, suggesting the existence of a fast region-based segmentation followed by a slower edge-based process. This observation is consistent with the finding that orientation and shape discriminations of small (~1°) motion-defined objects deteriorate sharply with decreasing exposure duration (Regan and Beverley, 1984; Regan and Hamstra, 1992), perhaps because the perception of small motion-defined features relies more on slower edge-based processes.

The hypothesis of a fast and low-resolution region-based surface segmentation dovetails nicely with the large receptive fields (Raiguel et al., 1995) and fast response latency (Schmolesky et al., 1998; Raiguel et al., 1999) of area MT. It also makes functional sense to quickly segment moving objects from the background, even if this segmentation comes at the cost of lower spatial resolution. Recent MEG evidence demonstrates that the extrastriate response to a motion-defined form is faster than the extrastriate response to a luminance-defined form, though the neural response to a motion-defined form lasts longer (i.e., ends later; Schoenfeld et al., 2003).
This fast MT response may reflect fast region-based figure-ground segregation, whereas the slower edge-based processes may account for the later part of the neural response to a motion-defined form. One speculation is that the initial region-based segmentation of the moving image in MT may provide guidance (through cortical feedback) for the more detailed edge-based analysis elsewhere (e.g., V2). Interestingly, the response latencies of V2 neurons tuned to the orientations of kinetic boundaries (Marcar et al., 2000) are about 30 ms longer than the latencies of the antagonistic center-surround neurons in MT (Raiguel et al., 1999).
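The population-coding idea can be made concrete with a toy simulation. The response rule, window sizes, and velocity field below are our own illustrative assumptions, not a fitted model of MT physiology; the sketch only demonstrates the qualitative prediction that units flanking a kinetic edge end up most active, while units on the edge (mixed motions in the center) and units in uniform motion (full surround suppression) are silenced.

```python
import numpy as np

def population_response(field, center_r=1, surround_r=4):
    """Toy center-surround units tiled along one dimension.

    Each unit prefers rightward (+1) motion. Its center drive drops when
    opposite directions mix inside the center (as on a kinetic edge), and
    surround motion in the preferred direction suppresses the response
    (as in uniform motion far from the edge).
    """
    n = len(field)
    resp = np.zeros(n)
    for i in range(n):
        center = field[max(0, i - center_r):i + center_r + 1]
        surround = np.concatenate([
            field[max(0, i - surround_r):max(0, i - center_r)],
            field[i + center_r + 1:i + surround_r + 1],
        ])
        # Center drive: strong only for coherent preferred-direction motion.
        drive = max(0.0, center.mean())
        # Suppression: surround motion matching the preferred direction inhibits.
        suppression = max(0.0, surround.mean()) if len(surround) else 0.0
        resp[i] = max(0.0, drive - suppression)
    return resp

# Kinetic boundary: rightward motion on the left half, leftward on the right.
field = np.array([1.0] * 10 + [-1.0] * 10)
resp = population_response(field)
```

Running this, the peak response falls on the unit just left of the boundary (whose surround partly covers the opposing motion, weakening suppression), while units deep inside the uniform region respond not at all - the "NS" versus "S" pattern of figure 14.7.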
14.3.3 Interim Conclusion

In general, center-surround antagonism in MT neurons seems to yield enhanced visual sensitivity to relative motion. Diverse experimental evidence suggests that surround suppression is an essential part of the remarkably effective neural mechanisms for motion-based figure-ground segregation. MT center-surround neurons can appear to behave "intelligently" by employing suppressive interactions only in situations when the motion stimulating their surrounds is likely to belong to the same surface (Bradley and Andersen, 1998) and by reducing surround inhibition when visibility is low (Pack and Born, personal communication, August 2003). The precise role of MT in figure-ground segregation remains an open question, but MT mechanisms seem to be involved in the spatially crude aspects of motion-based figure-ground segregation, especially in detection and trajectory estimation (Born et al., 2000). Quite possibly, some coarse spatial analysis also occurs in MT. Detailed analysis of motion-defined forms, however, seems to rely on visual areas other than MT, although such mechanisms may partially depend on MT output for information. Connections between visual areas, indeed, seem to play an important role in the perception of motion-defined form - a hypothesis supported by the observation that patients with multiple sclerosis, a demyelinating disease of white matter, are often impaired at perceiving motion-defined forms (Regan et al., 1991; Giaschi et al., 1992).
14.4
Perceiving 3D Surface Shape
The previous section considered mechanisms for perceiving 2D shape from motion. We perceive the world in three dimensions, however. As with 2D patterns, multiple forms of visual information enable perception of 3D spatial patterns. Motion is more than just a cue to the third dimension. As we move about in the environment and as objects move around us, the spatial pattern of motion on the retina provides information about the 3D layout of the world (Gibson, 1950; Nakayama and Loomis, 1974). Discontinuities in motion fields provide information for segmenting retinal images into different objects, and smooth spatial variations in velocity fields provide information about 3D shape.
14.4.1 Psychophysics of 3D Shape-from-Motion

Perception of 3D structure derived exclusively from motion information (Wallach and O'Connell, 1953; Braunstein, 1976; Rogers and Graham, 1979) is compelling evidence that motion has an important role in perceiving 3D shape. Perception of 3D shape from motion appears to be effortless and automatic. Under some circumstances, 3D shape can be perceived from just two motion frames (Lappin et al., 1980; Todd and Bressan, 1990), indicating that the underlying neural mechanisms are exceptionally effective.

The retinal images of objects rotating in depth are velocity fields with smoothly varying spatial structure. A fundamental insight is that the space-differential structure of the retinal images corresponds to that of the environmental objects (Koenderink and van Doorn, 1992a, b; Lappin and Craft, 2000). The local structure of the retinal velocity field fully specifies the qualitative local shape of a 3D surface, except for its relative scale in depth. The retinal velocity fields may be described in terms of their space-differential structure, with the local measurement of absolute velocity being the zero-order property. Higher-order spatial derivatives, described below, involve the local structure of relations among neighboring velocities.

The first-order directional derivatives of the velocity field (i.e., velocity gradients) specify the direction and magnitude of surface slant in depth. Perceptual estimates of surface slant, however, are often inaccurate for 3D planes (Proffitt et al., 1995; Cornilleau-Pérès et al., 2002) and especially for curved surfaces (Perotti et al., 1998). Interestingly, Proffitt et al. (1995) found that when observers made haptic responses instead of perceptual judgments, slant judgments were more accurate, suggesting that the motor system might have access to an accurate representation of surface slant.

The second-order directional derivatives of the velocity fields (changes in velocity gradients) specify the local shape of the 3D surface. In principle, the second-order structure might be obtained from differences in neighboring first-order measures, but this computational procedure is impractical when there are measurement errors.
The variance of a difference between two independent lower-order measures is twice that of the original measures; these errors are compounded by higher-order differences. Empirically, visual shape discriminations for both stereoscopic and motion-defined surfaces are more accurate than discriminations of surface slant, and the shape discriminations remain accurate under perturbations of lower-order spatial relations (Perotti et al., 1998; Lappin and Craft, 2000). Accurate second-order measures may be obtained directly from the retinal velocity fields, however - as found both theoretically (Koenderink and van Doorn, 1992a) and psychophysically (Lappin and Craft, 2000). Evidently, these higher-order changes in image structure are easily detected by the visual system. This psychophysical and theoretical work implies that neural mechanisms probably exist for representing the differential structure of retinal velocity fields. Another implication is that the second-order estimates are computed directly from the velocity fields and are not derived from the neural representation of first-order properties. Evidence reviewed in the next section indicates that populations of MT neurons with asymmetric surrounds might be equipped for estimating both first- and second-order derivatives from the local motion field. Buracas and Albright (1994, 1996) proposed this hypothesis prior to the discovery of MT receptive fields with asymmetric surrounds.
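The error-propagation point can be verified numerically. The sketch below is our own illustration (noise level, sample size, and the zero-mean "measurements" are arbitrary assumptions): differencing two independent first-order estimates doubles the error variance, and a three-term second difference multiplies it by six, which is why deriving second-order structure from noisy first-order measures is impractical.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 1.0, 200_000

# Three independent first-order (e.g., local slant) measurements,
# each corrupted by the same zero-mean measurement noise.
a = rng.normal(0.0, sigma, n)
b = rng.normal(0.0, sigma, n)
c = rng.normal(0.0, sigma, n)

# Differencing two independent measures doubles the error variance:
# Var(a - b) = Var(a) + Var(b) = 2 * sigma**2.
first_diff_var = np.var(a - b)

# A second difference (a - 2b + c) compounds the error further:
# Var = (1 + 4 + 1) * sigma**2 = 6 * sigma**2.
second_diff_var = np.var(a - 2 * b + c)
```

With these samples, `first_diff_var` comes out near 2 and `second_diff_var` near 6, matching the analytic variances.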
14.4.2 Contribution of MT to Shape-from-Motion

Responses of MT neurons are often tuned to binocular disparity (DeAngelis and Newsome, 1999). Depth and motion selectivity are tightly coupled in MT, an observation consistent with numerous interactions between motion perception and stereopsis (Nawrot and Blake, 1991; Tittle and Braunstein, 1993; van Ee and Anderson, 2001). Responses of MT neurons are significantly altered when parts of complex motion patterns are stereoscopically placed at different depths, even though the 2D motion patterns remain unchanged (Bradley and Andersen, 1998; Bradley et al., 1995). These results suggest that area MT may play a role in 3D shape perception derived from both motion and disparity cues.

Representation of 3D Structure from Motion in MT

The hypothesized role of MT neurons in perceiving 3D structure from motion (SFM) has been studied by measuring responses to such motion patterns. The observation that perception of SFM is permanently impaired by MT lesions (Siegel and Andersen, 1986) also supports this hypothesis. Imaging studies also indicate that MT is involved in processing 3D moving patterns in both humans (Vanduffel et al., 2002; Orban et al., 1999; Murray et al., 2003) and monkeys (Vanduffel et al., 2002; Sereno et al., 2002).

To provide further evidence that MT contributes to SFM perception, Bradley et al. (1998) took advantage of the fact that many SFM displays are bistable. For example, parallel projection of a transparent, revolving cylinder coated with opaque dots results in a 2D motion stimulus perceived as a revolving 3D cylinder, but the rotation direction is ambiguous and appears to reverse after a few seconds. This perceived reversal corresponds to a change in the perceived depth ordering of the two sets of dots moving in opposite directions on the front and back surfaces. If MT is involved in the computation of SFM, this reversal of surface depth order should modulate the activity of neurons selective for both disparity and motion. For example, a neuron preferring rightward and far motion should respond well if the cylinder is perceived as rotating leftward, and should respond weakly if the rotation reverses.
Indeed, most MT neurons were found to behave in this manner, reflecting the perceptual state of the animal. The Bradley et al. (1998) results show that MT is involved in SFM processing but do not directly indicate that MT computes the shape of SFM displays. In their experiment, monkeys were essentially responding to a change in depth ordering, which is an important aspect of SFM stimuli (Nawrot and Blake, 1991) but one that does not carry information about the specific 3D shape. At a minimum, MT seems to be involved in assigning local motion signals to different surfaces (Bradley et al., 1995, 1998). At a maximum, MT represents the 3D shape of moving surfaces. This hypothesis is explored below.

Representation of Surface Shape in MT

The discovery of MT neurons with asymmetric surrounds (Xiao et al., 1995, 1997a) suggests that some MT neurons may be capable of computing first- and second-order spatial derivatives of velocity fields (Buracas and Albright, 1996). Only about 20% of center-surround neurons in MT were found to have circularly symmetric surrounds (figure 14.8A; Xiao et al., 1995, 1997a). The remaining neurons have either asymmetric surrounds confined to one side (~50%; figure 14.8B) or bilaterally symmetric surrounds on each side of the receptive field center (~25%; figure 14.8C). Interestingly, about a quarter of the neurons also show facilitation in some parts of the surround - where the same direction of motion facilitates the response in one location but inhibits the response in another location. In principle, these neurons are capable of representing 3D shape from motion, ranging from planar surfaces in depth to curved surfaces.

Figure 14.8: Schematic of MT center-surround neurons with circular (A), asymmetric (B), and bilaterally symmetric (C) surrounds. Second and third columns show the stimulus patterns evoking strong and weak responses, respectively, for each type of surround. Shaded backgrounds illustrate the spatial speed distribution, with darker areas representing slower speeds. Note that the motion stimulating the center region is always in the preferred direction at the preferred speed. Thin arrows in B indicate the direction of the speed gradient.

The simplest local surface shape is a plane. Mathematically, a 3D plane is characterized by its tilt (angular orientation in the frontal plane) and slant (angle in depth from the frontal plane). The corresponding retinal motion field is a velocity gradient, where the direction of the velocity gradient and its steepness are related to the surface tilt and slant, respectively. The 3D orientation of a plane, in principle, can be specified by first-order directional derivatives of the velocity gradient (Lappin and Craft, 2000). MT neurons with asymmetric surrounds might measure first-order derivatives along a specific direction (figure 14.8B) and may be able to encode the tilt and/or slant of a plane. MT neurons with asymmetric surrounds have been found to be selective for the 2D direction of velocity gradients - surface tilt (Xiao et al., 1997b; also see Treue and Andersen, 1996). Another requirement for tilt selectivity is surround suppression that changes depending on the relative speeds of the center and surround.
For example, consider a neuron with a surround asymmetry that is present when the surround speed is faster than or equal to the center speed and absent when the surround speed is slower than the center speed (figure 14.8B). Such a neuron would be most inhibited by a speed gradient (e.g., slow to fast) in the direction of the inhibitory surround and most activated by an opposite speed gradient (figure 14.8B, second and third columns). Facilitatory zones are sometimes observed in neurons with asymmetric surrounds (Xiao et al., 1997a) and might serve to further improve selectivity to velocity gradients. For example, consider a facilitatory zone in the region bilaterally opposite the inhibitory zone that is responsive only at fast speeds. This arrangement would increase responses to velocity gradients away from the inhibitory surround (figure 14.8B, second column), improving response tuning. This link between tilt selectivity and asymmetric MT surrounds suggests that such surrounds might contribute to the perception of surface orientation in depth.

Accurate specification of surface orientation in depth, however, also requires surface slant estimation. Xiao et al. (1997b) found no evidence for slant selectivity in MT, but they examined only a small range of slants. If MT neurons are found not to be tuned to surface slant, that would dovetail nicely with poor perceptual slant judgments (Proffitt et al., 1995; Perotti et al., 1998). Finally, it would be interesting to determine whether the selectivity for velocity gradients is combined with corresponding selectivity for disparity gradients - a property that would further enhance MT's ability to represent 3D planes. This seems possible, given that the center and surround regions can have different disparity tuning (Bradley and Andersen, 1998).

In addition to neurons with asymmetric surrounds, about one in six MT neurons has bilaterally symmetric inhibitory surrounds (figure 14.8C; Xiao et al., 1997a). Such neurons seem capable of comparing velocities at three neighboring locations along a single direction, thereby signaling a change in the velocity gradient or, equivalently, a second-order change in velocity. Perhaps groups of these neurons represent the second-order differential structure of velocity fields specifying local shape (Koenderink and van Doorn, 1992a; Lappin and Craft, 2000).
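The three-point comparison can be written down directly. The sketch below is our own illustration (the gradient slopes and sample spacing are arbitrary, and this is not a model of actual MT tuning): a discrete second difference of the velocity field vanishes on a linear velocity gradient (a slanted plane) but gives a nonzero signal on a parabolic velocity profile (a surface curved along that direction).

```python
import numpy as np

def three_point_comparison(v, i, d=1):
    """Discrete second-order derivative along one direction - the computation
    a bilaterally symmetric surround could in principle perform:
    v(x - d) - 2*v(x) + v(x + d)."""
    return v[i - d] - 2.0 * v[i] + v[i + d]

x = np.arange(-5.0, 6.0)       # positions along the axis through both surrounds

planar_flow = 2.0 * x + 3.0    # linear velocity gradient: a slanted plane
curved_flow = 0.5 * x ** 2     # parabolic velocity profile: a curved surface

center = 5                     # index of x = 0
plane_signal = three_point_comparison(planar_flow, center)   # zero: no curvature
curve_signal = three_point_comparison(curved_flow, center)   # nonzero: curvature
```

Because the operator is blind to the constant and linear terms, it responds only to the change in the velocity gradient, which is exactly the second-order property discussed above; full local shape would still require such measurements along at least two directions.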
Buracas and Albright (1996) hypothesized that such neurons may represent local shape from motion, though this hypothesis has not yet been tested. The mere existence of bilateral inhibitory surrounds, however, is not sufficient to support such tuning. Another requirement is that the surround regions exhibit speed tuning. That is, surround inhibition should be present at some speeds and absent at others. For example, consider a hypothetical neuron with an inhibitory surround that suppresses its response at all speeds except for a range of slower motions (figure 14.8C). Such a neuron should respond to horizontal surface curvature (e.g., a vertical cylinder) and respond poorly to surfaces without curvature in the same direction (e.g., a horizontal cylinder or a plane).

Neurons with bilaterally symmetric surrounds measure motion-defined curvature along the direction containing both surround regions. Full characterization of local shape (e.g., to distinguish a cylinder from a saddle), however, requires curvature measurements in at least two directions (Lappin and Craft, 2000). Thus, representations of local shape would require groups of neurons with bilaterally symmetric surrounds in multiple directions. Additional properties of disparity tuning and surround facilitation at certain speeds would further improve the ability of these neurons to represent surface curvature. As retinal motion patterns are ambiguous as to whether a surface is convex or concave and provide no information about scaling in depth, disparity information might help disambiguate such motion patterns. Whether disparity tuning coincides with the occurrence of bilaterally symmetric surrounds is not yet known, however.
14.4.3 Interim Conclusions

The receptive fields of some MT neurons appear suited to support perception of surface shape from spatial patterns of motion. A direct link between center-surround MT neurons and the perception of local shape characteristics is still lacking, however. Moving 3D shapes modulate the overall activity of area MT and the responses of its neurons, suggesting that MT neurons may indeed represent some properties of 3D shape (Orban et al., 1999; Murray et al., 2003; Vanduffel et al., 2002; Sereno et al., 2002). An important question is whether MT neurons represent only crude aspects of moving shapes - e.g., the existence of surfaces at multiple depths - or basic shape characteristics defined by surface curvature. The existence of neurons with non-uniform surrounds provides some support for the latter possibility. In any case, the receptive fields of MT neurons are very diverse, suggesting a variety of functions. A subset of MT neurons - those with non-uniform surrounds - may play a role in perceiving 3D shape from motion by responding to higher-order spatial derivatives of velocity fields. Both psychophysical and physiological evidence is needed to test this hypothesis, but such evidence is presently lacking. Sereno and Sereno (1999) found that the perception of 3D SFM is altered by adjacent 2D motion patterns, and one might speculate that these surrounding motion patterns stimulate the surrounds of MT neurons, thereby altering the perceived 3D structure.
14.5
General Conclusions
Throughout the evolution of the visual system, center-surround mechanisms have been an essential part of motion perception (Horridge, 1987). This is suggested by the important role of surround suppression in insects (Egelhaaf et al., 1988) and in evolutionarily older brain structures such as the superior colliculus (Davidson and Bender, 1991), and by the occurrence of such mechanisms in a wide range of species and visual areas. The present review shows that although center-surround mechanisms may be evolutionarily primitive, they seem to be involved in several sophisticated functional aspects of primate motion perception. Initial explorations portrayed MT center-surround mechanisms as a simple way of enhancing relative motion, thereby aiding the detection and segregation of surfaces. Subsequent work has extended the possible functional roles of center-surround mechanisms to include estimation of the trajectory of moving objects, representation of 2D and 3D shape, and representation of the 3D layout of the visual world. Theoretical, psychophysical, and physiological studies have suggested links between center-surround antagonism and these important visual functions, though several of the links suggested in this review remain to be substantiated. For example, it is unknown whether MT neurons respond selectively to motion patterns associated with surface curvature or 3D shape. In any case, a large body of research indicates that the diversity of center-surround neurons in MT permits visual representations of more than merely 2D patterns of image motion.

The perceptual functions considered in this review are some of the more important aspects of visual motion perception. We emphasized psychophysical and physiological evidence linking those perceptual functions with center-surround mechanisms in MT, but other brain areas and mechanisms undoubtedly contribute to our abilities to segregate surfaces and perceive 3D shape from motion.

Duje Tadin and Joseph S. Lappin

We began this chapter with the idea that links between psychophysical and physiological analyses of visual function are based on corresponding analyses of the information transmitted from optical input to response output. The links between the macroscopic and microscopic analyses of vision ultimately entail a quantitative equivalence of these two descriptions of visual information flow. Discriminations between stimuli should be the same whether they are based on behavioral tasks or physiological measures. If particular psychophysical discriminations and physiological processes are indeed functional correlates of one another, then variations in the stimulus parameters should have quantitatively similar effects on both the behavioral and physiological discriminations. And the behavioral and physiological discriminations should exhibit corresponding invariances under irrelevant transformations of the proximal image patterns.

What is the strength of the linkage, then, between the center-surround antagonism described by physiological recordings from many MT neurons and psychophysical characteristics of human motion discrimination? The currently available evidence reviewed in this chapter suggests a probable functional link between the center-surround antagonism exhibited by the physiological responses of MT neurons and psychophysical discriminations of visual motion directions. The link between psychophysics and physiology in this case, however, is qualitative rather than quantitative.
A quantitative correspondence has not yet been established between the firing rates of macaque MT neurons and the temporal thresholds for human motion discrimination, although both response measures seem to be similarly affected by variations in stimulus size, retinal eccentricity, and chromatic contrast (Tadin et al., 2003). Preliminary evidence also suggests that variations in stimulus contrast may have similar effects on both the physiological and behavioral center-surround antagonism effects (Pack and Born, personal communication, August 2003). The effects of stimulus size and contrast on the motion aftereffect also offer suggestive support for the hypothesis that MT neural responses underlie the behavioral observations. Current evidence, however, does not yet demonstrate a quantitative link (equivalence) between the motion information carried by the physiological and behavioral responses. The link between physiological center-surround antagonism and perceptual functions in segregating figure from ground, perceiving surfaces, perceiving 2D and 3D shape, and discriminating the 3D trajectories of moving objects is much more tenuous, though intriguing. Establishing clear quantitative links between physiology and such perceptual functions will be challenging, for several reasons. First, multiple visual areas are very likely involved in perceiving the 3D shapes and motion trajectories of environmental objects. Second, many of these discriminations exceed the capacities of univariate responses of individual neurons and must, therefore, involve complex relationships among the responses of multiple neighboring neurons. Nevertheless, the challenge of clarifying the visual functions of these center-surround mechanisms is scientifically important.
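The kind of quantitative comparison such a link would require can be illustrated with the signal detection framework the chapter invokes (Green and Swets, 1966). The sketch below is hypothetical, using made-up spike-count distributions rather than recorded data: it computes a neurometric d′ from a model neuron's responses to two motion directions, the quantity one would compare against an observer's behavioral d′ for the same stimuli to test for quantitative equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

def dprime_from_rates(rates_a, rates_b):
    """Neurometric sensitivity: separation of two firing-rate
    distributions in pooled-standard-deviation units."""
    pooled_sd = np.sqrt(0.5 * (rates_a.var(ddof=1) + rates_b.var(ddof=1)))
    return abs(rates_a.mean() - rates_b.mean()) / pooled_sd

# Hypothetical spike counts of one MT neuron to two motion directions
leftward = rng.normal(20.0, 5.0, size=200)
rightward = rng.normal(30.0, 5.0, size=200)

d = dprime_from_rates(leftward, rightward)
print(round(d, 2))  # around 2, given the simulated 10-spike separation
```

A strict psychophysical-physiological link would predict that, for matched stimuli and task, this neurometric d′ (or a population version of it) and the observer's psychometric d′ vary in the same way with stimulus size, eccentricity, and contrast.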
References

Alais, D., van der Smagt, M. J., van den Berg, A. V. and van de Grind, W. A. (1998). Local and global factors affecting the coherent motion of gratings presented in multiple apertures. Vis. Res., 38: 1581-1591.

Allman, J. and Kaas, J. H. (1971). Representation of the visual field in striate and adjoining cortex of the owl monkey (Aotus trivirgatus). Brain Res., 35: 89-106.

Allman, J., Miezin, F. and McGuinness, E. (1985a). Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local-global comparisons in visual neurons. Ann. Rev. Neurosci., 8: 407-430.

Allman, J., Miezin, F. and McGuinness, E. (1985b). Direction- and velocity-specific responses from beyond the classical receptive field in the middle temporal visual area (MT). Percept., 14: 105-126.

Andersen, R. A. (1997). Neural mechanisms of visual motion perception in primates. Neuron, 18: 865-872.

Anderson, S. J. and Burr, D. C. (1991). Spatial summation properties of directionally sensitive mechanisms in human vision. J. Opt. Soc. Am. A, 8: 1330-1339.

Baker, C. L. and Braddick, O. J. (1982). Does segregation of differently moving areas depend on relative or absolute displacement? Vis. Res., 22: 851-856.

Berezovskii, V. K. and Born, R. T. (2000). Specificity of projections from wide-field and local motion-processing regions within the middle temporal visual area of the owl monkey. J. Neurosci., 20: 1157-1169.

Borghuis, B. G., Perge, J. A., Vajda, I., van Wezel, R. J. A., van de Grind, W. A. and Lankheet, M. J. M. (2003). The motion reverse correlation (MRC) method: A linear systems approach in the motion domain. J. Neurosci. Methods, 123: 153-166.

Born, R. T. (2000). Center-surround interactions in the middle temporal visual area of the owl monkey. J. Neurophysiol., 84: 2658-2669.

Born, R. T., Groh, J. M., Zhao, R. and Lukasewycz, S. J. (2000). Segregation of object and background motion in visual area MT: effects of microstimulation on eye movements.
Neuron, 26: 725-734.

Born, R. T. and Tootell, R. B. H. (1992). Segregation of global and local motion processing in primate middle temporal visual area. Nature, 357: 491-499.

Bouman, M. A. (1950). Quanta explanation of vision. Documenta Ophthalmologica, 4: 23-115.

Braddick, O. (1993). Segmentation versus integration in visual motion processing. Trends Neurosci., 16: 263-268.

Bradley, D. C. and Andersen, R. A. (1998). Center-surround antagonism based on disparity in primate area MT. J. Neurosci., 18: 7552-7565.

Bradley, D. C., Chang, G. C. and Andersen, R. A. (1998). Encoding of three-dimensional structure-from-motion by primate area MT neurons. Nature, 392: 714-717.
Bradley, D. C., Qian, N. and Andersen, R. A. (1995). Integration of motion and stereopsis in middle temporal cortical area of macaques. Nature, 373: 609-611.

Braunstein, M. L. (1976). Depth Perception through Motion. Academic Press: New York.

Brindley, G. S. (1970). Physiology of the Retina and Visual Pathway. Williams & Wilkins Co.: Baltimore, MD.

Buracas, G. T. and Albright, T. D. (1994). The role of MT neuron receptive field surrounds in computing object shape from velocity fields. Adv. Neural Information Processing Sys., 6: 969-976.

Buracas, G. T. and Albright, T. D. (1996). Contribution of area MT to perception of three-dimensional shape: Computational study. Vis. Res., 36: 869-887.

Cao, A. and Schiller, P. H. (2003). Neural responses to relative speed in the primary visual cortex of rhesus monkey. Vis. Neurosci., 20: 77-84.

Cornilleau-Pérès, V. and Droulez, J. (2002). Visual perception of planar orientation: dominance of static depth cues over motion cues. Percept. Psychophys., 64: 717-731.

Cowey, A. and Vaina, L. M. (2000). Blindness to form from motion despite intact static form perception and motion detection. Neuropsychologia, 38: 566-578.

Croner, L. J. and Albright, T. D. (1997). Image segmentation enhances discrimination of motion in visual noise. Vis. Res., 37: 1415-1427.

Croner, L. J. and Albright, T. D. (1999). Seeing the big picture: Integration of image cues in the primate visual system. Neuron, 24: 777-789.

Davidson, R. M. and Bender, D. B. (1991). Selectivity for relative motion in the monkey superior colliculus. J. Neurophysiol., 65: 1115-1133.

DeAngelis, G. C. and Newsome, W. T. (1999). Organization of disparity-selective neurons in macaque area MT. J. Neurosci., 19: 1398-1415.

Derrington, A. M. and Goddard, P. A. (1989). Failure of motion discrimination at high contrasts: evidence for saturation. Vis. Res., 29: 1767-1776.

Dick, M., Ullman, S. and Sagi, D. (1987). Parallel and serial processes in motion detection. Science, 237: 400-402.
Dupont, P., De Bruyn, B., Vandenberghe, R., Rosier, A., Michiels, J., Marchal, G., Mortelmans, L. and Orban, G. A. (1997). The kinetic occipital region in human visual cortex. Cerebral Cortex, 7: 283-292.

Egelhaaf, M., Hausen, K., Reichardt, W. and Wehrhahn, C. (1988). Visual course control in flies relies on neuronal computation of object and background motion. Trends Neurosci., 11: 351-358.

Eifuku, S. and Wurtz, R. H. (1998). Response to motion in extrastriate area MSTl: center-surround interactions. J. Neurophysiol., 80: 282-296.

Frost, B. J. and Nakayama, K. (1983). Single visual neurons code opposing motion independent of direction. Science, 220: 744-745.
Gautama, T. and Van Hulle, M. M. (2001). Function of center-surround antagonism for motion in visual area MT/V5: A modeling study. Vis. Res., 41: 3917-3930.

Gegenfurtner, K. R., Kiper, D. C., Beusmans, J. M., Carandini, M., Zaidi, Q. and Movshon, J. A. (1994). Chromatic properties of neurons in macaque MT. Vis. Neurosci., 11: 455-466.

Giaschi, D., Regan, D. M., Kothe, A., Hong, X. H. and Sharpe, J. A. (1992). Motion-defined letter detection and recognition in patients with multiple sclerosis. Ann. Neurol., 31: 621-628.

Gibson, J. J. (1950). The Perception of the Visual World. Houghton Mifflin: Boston, MA.

Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Wiley: New York.

Grüsser-Cornehls, U., Grüsser, O. and Bullock, T. H. (1963). Unit responses in frog's tectum to moving and non-moving visual stimuli. Science, 141: 820-822.

Hammond, P. and Smith, A. T. (1982). On the sensitivity of complex cells in feline striate cortex to relative motion. Exp. Brain Res., 47: 457-460.

Hecht, S., Shlaer, S. and Pirenne, M. H. (1942). Energy, quanta, and vision. J. Gen. Physiol., 25: 819-840.

Horridge, G. A. (1987). The evolution of visual processing and the construction of seeing systems. Proc. Roy. Soc. (Lond.) B, 230: 279-292.

Huk, A. C., Ress, D. and Heeger, D. J. (2001). Neuronal basis of the motion aftereffect reconsidered. Neuron, 32: 161-172.

Jones, H. E., Grieve, K. L., Wang, W. and Sillito, A. M. (2001). Surround suppression in primate V1. J. Neurophysiol., 86: 2011-2028.

Kapadia, M. K., Westheimer, G. and Gilbert, C. D. (1999). Dynamics of spatial summation in primary visual cortex of alert monkeys. Proc. Nat. Acad. Sci. USA, 96: 12073-12078.

Kastner, S., De Weerd, P., Pinsk, M. A., Elizondo, M. I., Desimone, R. and Ungerleider, L. G. (2001). Modulation of sensory suppression: implications for receptive field sizes in the human visual cortex. J. Neurophysiol., 86: 1398-1411.

Kastner, S., Nothdurft, H. C. and Pigarev, I. N. (1999).
Neuronal responses to orientation and motion contrast in cat striate cortex. Vis. Neurosci., 16: 587-600.

Kim, J. and Wilson, H. R. (1997). Motion integration over space: interaction of the center and surround motion. Vis. Res., 37: 991-1005.

Koenderink, J. J. and van Doorn, A. J. (1992a). Second-order optic flow. J. Opt. Soc. Am. A, 9: 530-538.

Koenderink, J. J. and van Doorn, A. J. (1992b). Generic neighborhood operators. IEEE PAMI, 14: 597-605.

Krantz, D. H., Luce, R. D., Suppes, P. and Tversky, A. (1971). Foundations of Measurement. Vol. 1. Academic Press: New York.
Lagae, L., Gulyas, B., Raiguel, S. and Orban, G. A. (1989). Laminar analysis of motion information processing in macaque V5. Brain Res., 496: 361-367.

Lappin, J. S. and Bell, H. H. (1976). The detection of coherence in moving random-dot patterns. Vis. Res., 16: 161-168.

Lappin, J. S. and Craft, W. D. (2000). Foundations of spatial vision: From retinal images to perceived shapes. Psychol. Rev., 107: 6-38.

Lappin, J. S., Doner, J. F. and Kottas, B. L. (1980). Minimal conditions for the visual detection of structure and motion in three dimensions. Science, 209: 717-719.

Lappin, J. S. and Staller, J. D. (1981). Prior knowledge does not facilitate the perceptual organization of dynamic random-dot patterns. Percept. Psychophys., 29: 445-446.

Lappin, J. S., Tadin, D., Patel, S. S. and Killingsworth, E. A. (2003). Psychophysical receptive fields for motion discrimination depend on contrast. J. Vis., 3: 47a.

Lauwers, K., Saunders, R., Vogels, R., Vandenbussche, E. and Orban, G. A. (2000). Impairment in motion discrimination tasks is unrelated to amount of damage to superior temporal sulcus motion areas. J. Comp. Neurol., 420: 539-557.

Lee, S.-H. and Blake, R. (1999). Visual form created solely from temporal structure. Science, 284: 1165-1168.

Leventhal, A. G., Wang, Y., Schmolesky, M. T. and Zhou, Y. (1998). Neural correlates of boundary perception. Vis. Neurosci., 15: 1107-1118.

Levitt, J. B. and Lund, J. S. (1997). Contrast dependence of contextual effects in primate visual cortex. Nature, 387: 73-76.

Liu, L. and Van Hulle, M. M. (1998). Modeling the surround of MT cells and their selectivity for surface orientation in depth specified by motion. Neural Computation, 10: 295-312.

Lorenceau, J. and Alais, D. (2001). Form constraints in motion binding. Nature Neurosci., 4: 745-751.

Lorenceau, J. and Shiffrar, M. (1992). The influence of terminators on motion integration across space. Vis. Res., 32: 263-273.

Lorenceau, J. and Shiffrar, M. (1999).
The linkage of visual motion signals. Vis. Cogn., 6: 431-460.

Lorenceau, J. and Zago, L. (1999). Cooperative and competitive spatial interactions in motion integration. Vis. Neurosci., 16: 755-770.

Mandl, G. (1985). Responses of visual cells in cat superior colliculus to relative pattern movement. Vis. Res., 25: 267-281.

Marcar, V. L. and Cowey, A. (1992). The effect of removing superior temporal cortical motion areas in the macaque monkey: 2. Motion discrimination using random dot displays. Eur. J. Neurosci., 4: 1228-1238.

Marcar, V. L., Raiguel, S. E., Xiao, D. K. and Orban, G. A. (2000). Processing of kinetically defined boundaries in areas V1 and V2 of the macaque monkey. J. Neurophysiol., 84: 2786-2798.
Marcar, V. L., Xiao, D. K., Raiguel, S. E., Maes, H. and Orban, G. A. (1995). Processing of kinetically defined boundaries in the cortical motion area MT of the macaque monkey. J. Neurophysiol., 74: 1258-1270.

Marr, D. and Ullman, S. (1981). Directional selectivity and its use in early visual processing. Proc. Roy. Soc. (Lond.) B, 211: 150-180.

Moller, P. and Hurlbert, A. C. (1996). Psychophysical evidence for fast region-based segmentation processes in motion and color. Proc. Nat. Acad. Sci. USA, 93: 7421-7426.

Murakami, I. and Shimojo, S. (1993). Motion capture changes to induced motion at higher luminance contrasts, smaller eccentricities, and larger inducer sizes. Vis. Res., 33: 2091-2107.

Murakami, I. and Shimojo, S. (1995). Modulation of motion aftereffect by surround motion and its dependence on stimulus size and eccentricity. Vis. Res., 35: 1835-1844.

Murray, S. O., Olshausen, B. A. and Woods, D. L. (2003). Processing shape, motion and three-dimensional shape-from-motion in the human cortex. Cerebral Cortex, 13: 508-516.

Nakayama, K. (1985). Biological image motion processing: a review. Vis. Res., 25: 625-660.

Nakayama, K. and Loomis, J. M. (1974). Optical velocity patterns, velocity-sensitive neurons, and space perception: a hypothesis. Percept., 3: 63-80.

Nawrot, M. and Blake, R. (1991). The interplay between stereopsis and structure from motion. Percept. Psychophys., 49: 230-244.

Nawrot, M. and Sekuler, R. (1990). Assimilation and contrast in motion perception: Explorations in cooperativity. Vis. Res., 30: 1439-1451.

Nawrot, M., Shannon, E. and Rizzo, M. (1996). The relative efficacy of cues for two-dimensional shape perception. Vis. Res., 36: 1141-1152.

Olveczky, B. P., Baccus, S. A. and Meister, M. (2003). Segregation of object and background motion in the retina. Nature, 423: 401-408.

Orban, G. A. (1997). Visual processing in macaque area MT/V5 and its satellites (MSTd and MSTv). In K. S. Rockland, J. H. Kaas, and A. Peters (Eds.), Cerebral Cortex, Vol.
12, pp. 359-434. Plenum Press: New York.

Orban, G. A., Sunaert, S., Todd, J. T., Van Hecke, P. and Marchal, G. (1999). Human cortical regions involved in extracting depth from motion. Neuron, 24: 929-940.

Perotti, V. J., Todd, J. T., Lappin, J. S. and Phillips, F. (1998). The perception of surface curvature from optical motion. Percept. Psychophys., 60: 377-388.

Proffitt, D. R., Bhalla, M., Gossweiler, R. and Midgett, J. (1995). Perceiving geographical slant. Psychon. Bull. Rev., 2: 409-428.

Raiguel, S. E., Van Hulle, M. M., Xiao, D. K., Marcar, V. L. and Orban, G. A. (1995). Shape and spatial distribution of receptive fields and antagonistic motion
surround in the middle temporal area (V5) of the macaque. Eur. J. Neurosci., 7: 2064-2082.

Raiguel, S. E., Xiao, D. K., Marcar, V. L. and Orban, G. A. (1999). Response latency of macaque area MT/V5 neurons and its relationship to stimulus parameters. J. Neurophysiol., 82: 1944-1956.

Rees, G., Friston, K. and Koch, C. (2000). A direct quantitative relationship between the functional properties of human and macaque V5. Nature Neurosci., 3: 716-723.

Regan, D. M. (1986). Form from motion parallax and form from luminance contrast: Vernier discrimination. Spatial Vis., 1: 305-318.

Regan, D. M. (1989). Orientation discrimination for objects defined by relative motion and objects defined by luminance contrast. Vis. Res., 29: 1389-1400.

Regan, D. M. (1992). Visual judgments and misjudgments in cricket, and the art of flight. Percept., 21: 91-115.

Regan, D. M. and Beverley, K. I. (1984). Figure-ground segregation by motion contrast and by luminance contrast. J. Opt. Soc. Am. A, 1: 433-442.

Regan, D. M., Giaschi, D., Sharpe, J. A. and Hong, X. H. (1992). Visual processing of motion-defined form: Selective failure in patients with parietotemporal lesions. J. Neurosci., 12: 2198-2210.

Regan, D. M. and Hamstra, S. J. (1991). Shape discrimination for motion-defined and contrast-defined form: squareness is special. Percept., 20: 315-336.

Regan, D. M. and Hamstra, S. J. (1992). Dissociation of orientation discrimination from form detection for motion-defined bars and luminance-defined bars: Effects of dot lifetime and presentation duration. Vis. Res., 32: 1655-1666.

Regan, D. M. (2000). Human Perception of Objects. Sinauer Associates: Sunderland, MA.

Reppas, J. B., Niyogi, S., Dale, A. M., Sereno, M. I. and Tootell, R. B. H. (1997). Representation of motion boundaries in retinotopic human visual cortical areas. Nature, 388: 175-179.

Rivest, J. and Cavanagh, P. (1996). Localizing contours defined by more than one attribute. Vis. Res., 36: 53-66.

Rogers, B. and Graham, M. (1979).
Motion parallax as an independent cue for depth perception. Percept., 8: 125-134.

Sachtler, W. L. and Zaidi, Q. (1993). Effect of spatial configuration on motion aftereffects. J. Opt. Soc. Am. A, 10: 1433-1449.

Sary, G., Vogels, R. and Orban, G. A. (1994). Orientation discrimination of motion-defined gratings. Vis. Res., 34: 1331-1334.

Sceniak, M. P., Ringach, D. L., Hawken, M. J. and Shapley, R. (1999). Contrast's effect on spatial summation by macaque V1 neurons. Nature Neurosci., 2: 733-739.

Schenk, T. and Zihl, J. (1997). Visual motion perception after brain damage: II. Deficits in form-from-motion perception. Neuropsychologia, 35: 1299-1310.
Schiller, P. H. (1993). The effects of V4 and middle temporal (MT) area lesions on visual performance in the rhesus monkey. Vis. Neurosci., 10: 717-746.

Schmolesky, M. T., Wang, Y.-C., Hanes, D. P., Thompson, K. G., Leutgeb, S., Schall, J. D. and Leventhal, A. G. (1998). Signal timing across the macaque visual system. J. Neurophysiol., 79: 3272-3278.

Schoenfeld, M. A., Woldorff, M., Duzel, E., Scheich, H., Heinze, H. J., Mangun, G. R. and Mund, T. (2003). Form-from-motion: MEG evidence for time course and processing sequence. J. Cogn. Neurosci., 15: 157-172.

Sclar, G., Maunsell, J. H. and Lennie, P. (1990). Coding of image contrast in central visual pathways of the macaque monkey. Vis. Res., 30: 1-10.

Sereno, M. E. and Sereno, M. I. (1999). 2-D center-surround effects on 3-D structure-from-motion. J. Exp. Psychol. Hum. Percept. Perf., 25: 1834-1854.

Sereno, M. E., Trinath, T., Augath, M. and Logothetis, N. K. (2002). Three-dimensional shape representation in monkey cortex. Neuron, 33: 635-652.

Shannon, C. E. and Weaver, W. (1949). The Mathematical Theory of Communication. University of Illinois Press: Urbana, IL.

Shiffrar, M. and Lorenceau, J. (1996). Increased motion linking across edges with decreased luminance contrast, edge width and duration. Vis. Res., 36: 2061-2067.

Shipley, E. F. (1960). A model for detection and recognition with signal uncertainty. Psychometrika, 25: 273-289.

Shulman, G. L., Schwarz, J., Miezin, F. M. and Petersen, S. E. (1998). Effect of motion contrast on human cortical responses to moving stimuli. J. Neurophysiol., 79: 2794-2803.

Siegel, R. M. and Andersen, R. A. (1986). Motion perceptual deficits following ibotenic acid lesions of the middle temporal area in the behaving rhesus monkey. Soc. Neurosci. Abst., 12: 1183.

Smith, A. T. and Curran, W. (2000). Continuity-based and discontinuity-based segmentation in transparent and spatially segregated global motion. Vis. Res., 40: 1115-1123.

Snowden, R. J. (1994).
Motion processing in the primate cerebral cortex. In A. T. Smith and R. J. Snowden (Eds.), Visual Detection of Motion, pp. 51-84. Academic Press: London.

Snowden, R. J., Treue, S., Erickson, R. G. and Andersen, R. A. (1991). The response of area MT and V1 neurons to transparent motion. J. Neurosci., 11: 2768-2785.

Solomon, S. G., Peirce, J. W., Krauskopf, J. and Lennie, P. (2003). Chromatic sensitivity of surround suppression in macaque V1 and V2. J. Vis., 3: 140a.

Tadin, D., Lappin, J. S., Gilroy, L. A. and Blake, R. (2003). Perceptual consequences of centre-surround antagonism in visual motion processing. Nature, 424: 312-315.

Takeuchi, T. (1998). Effect of contrast on the perception of moving multiple Gabor patterns. Vis. Res., 38: 3069-3082.
Tanaka, K., Hikosaka, K., Saito, H., Yukie, M., Fukada, Y. and Iwai, E. (1986). Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey. J. Neurosci., 6: 134-144.

Teller, D. Y. (1984). Linking propositions. Vis. Res., 24: 1233-1246.

Théry, M. and Casas, J. (2002). Predator and prey views of spider camouflage. Nature, 415: 133.

Tittle, J. S. and Braunstein, M. L. (1993). Recovery of 3-D shape from binocular disparity and structure from motion. Percept. Psychophys., 54: 157-169.

Todd, J. T. and Bressan, P. (1990). The perception of 3-dimensional affine structure from minimal apparent motion sequences. Percept. Psychophys., 48: 419-430.

Treue, S. and Andersen, R. A. (1996). Neural responses to velocity gradients in macaque cortical area MT. Vis. Neurosci., 13: 797-804.

Treue, S., Hol, K. and Rauber, H. J. (2000). Seeing multiple directions of motion: physiology and psychophysics. Nature Neurosci., 3: 270-276.

van Ee, R. and Anderson, B. L. (2001). Motion direction, speed and orientation in binocular matching. Nature, 410: 690-694.

Van Oostende, S., Sunaert, S., Van Hecke, P., Marchal, G. and Orban, G. A. (1997). The kinetic occipital (KO) region in man: An fMRI study. Cerebral Cortex, 7: 690-701.

Vanduffel, W., Fize, D., Peuskens, H., Denys, K., Sunaert, S., Todd, J. T. and Orban, G. A. (2002). Extracting 3D from motion: Differences in human and monkey intraparietal cortex. Science, 298: 413-415.

Verghese, P. and Stone, L. S. (1996). Perceived visual speed constrained by image segmentation. Nature, 381: 161-163.

von Grünau, M. and Frost, B. J. (1983). Double-opponent-process mechanism underlying RF-structure of directionally specific cells of cat lateral suprasylvian visual area. Exp. Brain Res., 49: 84-92.

Wallach, H. and O'Connell, D. (1953). The kinetic depth effect. J. Exp. Psychol., 45: 205-217.

Warren, W. H. (1995). Self-motion: Visual perception and visual control. In W. Epstein and S.
Rogers (Eds.), Perception of Space and Motion, 2nd Ed., pp. 263-325. Academic Press: New York.

Watson, A. B. and Turano, K. (1995). The optimal motion stimulus. Vis. Res., 35: 325-336.

Westheimer, G. W. (1967). Spatial interaction in human cone vision. J. Physiol. (Lond.), 190: 139-154.

Westheimer, G. W. (1979). The spatial sense of the eye. Invest. Ophthalmol. Vis. Sci., 18: 893-912.

Xiao, D. K., Marcar, V. L., Raiguel, S. E. and Orban, G. A. (1997b). Selectivity of macaque MT/V5 neurons for surface orientation in depth specified by motion. Eur. J. Neurosci., 9: 956-964.
Xiao, D. K., Raiguel, S. E., Marcar, V., Koenderink, J. J. and Orban, G. A. (1995). Spatial heterogeneity of inhibitory surrounds in the middle temporal visual area. Proc. Nat. Acad. Sci. USA, 92: 11303-11306.

Xiao, D. K., Raiguel, S. E., Marcar, V. and Orban, G. A. (1997a). Spatial distribution of the antagonistic surround of MT/V5 neurons. Cereb. Cortex, 7: 662-677.

Xiao, D. K., Raiguel, S. E., Marcar, V. and Orban, G. A. (1998). Influence of stimulus speed upon the antagonistic surrounds of area MT/V5 neurons. Neuroreport, 9: 1321-1326.

Zeki, S., Perry, R. J. and Bartels, A. (2003). The processing of kinetic contours in the brain. Cerebral Cortex, 13: 189-202.
15. Transparent Motion: A Powerful Tool to Study Segmentation, Integration, Adaptation, and Attentional Selection

Thomas Papathomas, Zoltan Vidnyanszky, and Erik Blaser

15.1 Introduction

Martin Regan and his colleagues made significant contributions to our understanding of how motion cues alone can segregate surfaces from the background, thus enabling the form of the moving surface to be recognized (Beverley and Regan, 1980; Regan and Beverley, 1984; Regan, 1986; Regan and Hamstra, 1991; Kohly and Regan, 2002). Regan's classical demonstration illustrates this point brilliantly. Back in the days of using viewgraphs on overhead projectors, the demonstration consisted of two overlaid transparent sheets. One sheet contained the background, or noise: numerous randomly placed segments with random orientations that formed a dense texture covering most of the sheet's area. The other sheet contained a figure that depicted a small flying bird, which was rendered with the same type of randomly oriented segments. When the sheets were simply overlaid and placed on the projector, the "bird" was masked by the noise and was invisible. However, as soon as the figure sheet was moved to and fro, the bird was seen "flying" transparently through the background, revealing the details of its form.¹

¹For a modern version of this demonstration, the reader may visit Akos Feher's implementation at http://zeus.rutgers.edu/~feher/bird/example1.html.
Regan's transparent "flying bird" demonstration illustrates the power of motion to readily segment spatially superimposed stimuli and evoke the percept of transparent surfaces and objects in the absence of other cues (luminance, color, disparity, etc.). In analyzing how these motion cues enable humans to produce these percepts, one may distinguish two main tasks that the visual system must achieve (Braddick, 1993): on the one hand, it must segregate local motion signals that "belong" to the figure from those that belong to the background or to other objects in the scene; on the other hand, it must group, or integrate, the local motion signals that belong to the same object, in order to arrive at a stable representation of the surface and its global motion direction. In this chapter we discuss the basic mechanisms of motion-based segmentation and how bivectorial transparent and nontransparent stimuli can be used as a valuable tool to study the processes of motion adaptation and attentional selection. Section 15.2 presents a summary of psychophysical, neurophysiological, and brain imaging evidence that motion is processed in multiple stages. In section 15.3 we briefly cover transparent motion and the important variant of locally paired dot motion. Section 15.4 is devoted to a review of neural mechanisms that may be involved in motion segmentation and integration. In section 15.5 we discuss the motion aftereffect (MAE) and present studies that investigated the locus of motion adaptation. The special case of the MAE following adaptation to transparent motion is taken up in section 15.6, in which we briefly review how the use of motion transparency in studies of motion adaptation provided crucial information about the integration of the different adapted directional signals.
Finally, section 15.7 deals with the nature of attentional selection in motion processing, especially in transparent motion, and its relevance in investigating location-based, feature-based, and object-based accounts of attention.
15.2
Stages of Motion Processing
It is widely accepted that the neural mechanisms for motion perception are best understood as organized into multiple stages. This conclusion has been reached by converging experimental evidence from psychophysical (e.g., Morrone, Burr, and Vaina, 1995; Kaiser, Montegut, and Proffitt, 1995; Bex, Metha, and Makous, 1998), neurophysiological (e.g., Born and Tootell, 1992; Lagae, Maes, Raiguel, Xiao, and Orban, 1994), and brain imaging (e.g., Greenlee, 2000; Morrone, Tosetti, Montanaro, Fiorentini, Cioni, and Burr, 2000) studies that have investigated visual motion perception and the corresponding neural processing (see Blake, Sekuler, and Grossman (2004) for an excellent review). The first stage of motion processing detects "local motion" signals at the smallest possible spatial scale. This stage is subserved by neurons in the striate cortex (V1) that are known to have very small receptive fields (for a review, see Snowden, 1994). As a result of their small receptive fields, these neurons may individually register velocities that can be quite different from the "global motion" of a moving object, as shown in figure 15.1. This figure demonstrates that local-velocity signals need to be processed by subsequent stages to extract the global motion of objects. This global motion contributes to the parsing of the visual scene into surfaces and objects (Braddick, 1993), as manifested in motion transparency (section 15.3).
Thomas Papathomas, Zoltan Vidnyanszky, and Erik Blaser
Figure 15.1: A gray disk is shown moving to the right, as indicated by the large arrow. The small numbered circles denote local areas of the visual field that are comparable in size to the receptive fields of V1 neurons. Direction-selective V1 neurons, which form the first stage of motion processing, would fire for velocities that are within a narrow band of the direction indicated by the small arrows. It is interesting to note that only locations 1 and 5 signal rightward motion; locations 2 and 6 signal motion along the +45° direction, locations 4 and 8 along the −45° direction, whereas locations 3 and 7 contain very little directional motion. The need for subsequent stages of motion processing is obvious. The middle temporal area (MT, also known as V5) seems to be involved significantly in the global motion stage of motion processing that follows local motion processing in V1 (Newsome, Britten, Salzman, and Movshon, 1990; Movshon and Newsome, 1992). There is physiological evidence that MT plays a role in the extraction of complex motion patterns (Rodman and Albright, 1987), including motion in depth (Maunsell and Van Essen, 1983), in the segregation of surfaces that move relative to the background (Movshon and Newsome, 1992), in the furnishing of signals for pursuit eye movements (Recanzone and Wurtz, 1999; Movshon, Lisberger, and Krauzlis, 1990), and in the motion aftereffect (Huk, Ress, and Heeger, 2001). Grunewald, Bradley, and Andersen (2002) reported on neurophysiological experiments that relate global motion percepts to neural tuning properties in area MT, but not in V1. Parenthetically, there is strong evidence for several other motion processing areas that are responsible for more complex types of motion. Indeed, this is to be expected, as motion processing is too elaborate to involve just two stages. For example, Duffy and Wurtz (1991) recorded from the medial superior temporal (MST) area, which lies farther along the hierarchy than MT.
They found neurons that were selective to rotational and translational motion, and they concluded that MST contributes to the analysis of optic flow fields. Lagae et al. (1994) compared the response of macaque MT and MST cells to translation and to elementary optic flow components (rotation, deformation, and
Transparent Motion
expansion/contraction) and found, among other things, that MST cells responded more to rotation, and were more selective for expansion/contraction, than MT cells; MST cells also exhibited position invariance to optic flow components, unlike MT cells. In addition to these neurophysiological studies, several psychophysical studies provided evidence for mechanisms that are specialized for particular patterns of complex motion (Blake, 1995; Meese and Harris, 2001).
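The local-measurement limitation illustrated in figure 15.1 can be made concrete with a small numerical sketch. Assuming, purely for illustration, that a small-aperture detector on the disk's contour registers only the component of the global velocity along the local contour normal (the units and the eight sample normals below are our own choices, not from the chapter), the sampled locations reproduce the pattern described in the figure caption:

```python
import math

def local_signal(vx, vy, normal_deg):
    # Component of the global velocity along the local contour normal:
    # roughly what a small-aperture, direction-selective V1 unit can register.
    a = math.radians(normal_deg)
    nx, ny = math.cos(a), math.sin(a)
    p = vx * nx + vy * ny              # signed projection onto the normal
    return p * nx, p * ny

# Rightward-moving disk, global velocity (1, 0); contour normals every 45 deg
for deg in range(0, 360, 45):
    sx, sy = local_signal(1.0, 0.0, deg)
    speed = math.hypot(sx, sy)
    direction = round(math.degrees(math.atan2(sy, sx))) if speed > 1e-9 else None
    print(f"normal {deg:3d}: speed {speed:.2f}, direction {direction}")
```

Locations whose normals are oblique report diagonal motion at ±45°, the top and bottom of the disk (normals at 90° and 270°) report essentially no directional signal, and only the leading and trailing edges report the true rightward direction, so later stages must combine these local measurements to recover the global motion.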
15.3 Transparent and Non-Transparent Bi-Vectorial Motion
The need for the local motion signals to go through a process of segregation and integration to form surfaces and objects is best illustrated by the case of transparent motion. The two most commonly used stimuli for transparent motion are gratings (Levinson and Sekuler, 1975; Adelson and Movshon, 1982) and random dots (Julesz, 1965; Braddick, 1973). Since we used random-dot stimuli in our experiments, we will concentrate on this type of stimulus in this chapter. Random-dot displays contain two families of randomly positioned, interspersed dots. The two families move at different velocities ("bivectorial"), while the dots within each family share the same velocity (see figure 15.2A, B for two families of dots that move in orthogonal and opposite directions, respectively). If the directions and/or speeds of the two families are sufficiently different, then a robust percept of two surfaces that slide transparently over one another is created. If the speeds are the same, then a direction difference of about 10-30° is sufficient to elicit transparent motion (Treue, Hol, and Rauber, 2000; Zanker, 2000, 2002). If the speeds are sufficiently different, then transparency is still created (Verstraten, van der Smagt, and van de Grind, 1998), even if the families move in the same direction. Qian and his colleagues (Qian, Andersen, and Adelson, 1994a, b; Qian and Andersen, 1994) created a class of stimuli that proved to be a very powerful tool for studying motion transparency, segregation, and integration. They manipulated a typical transparent random-dot motion display by locally pairing each dot from one family with a dot from the other family. To use a concrete example, let us assume that the two families of dots move in opposite directions, one to the right, and the other to the left (figure 15.2B).
Then, to create a "locally paired dot" display, we would pair each rightward-moving dot in each frame of the animation with a leftward-moving dot, as shown in figure 15.2D. Provided that the distance between the dots in each pair is sufficiently small, this local pairing has a dramatic effect on the perceived motion: it produces directionless flicker, as if the opposite-direction motion components cancel each other out. Observers do not have any impression of coherent motion along any direction, but instead see a dynamic flickering pattern. In general, bivectorial locally paired dot stimuli produce unidirectional motion with a velocity that is the vector average of the component velocities, as observed experimentally by Curran and Braddick (2000). If, instead of opposite, the constituent velocities are orthogonal, composed of rightward and upward motions as in figure 15.2C, observers perceive motion along the 45° diagonal.
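The vector-average rule can be stated in a few lines of code. This is only an illustrative computation of the rule reported by Curran and Braddick (2000), not their analysis; velocities are given as (x, y) pairs in arbitrary units:

```python
def vector_average(velocities):
    # Perceived velocity of a locally paired dot display under the
    # vector-average rule: the mean of the component velocities.
    n = len(velocities)
    return (sum(vx for vx, _ in velocities) / n,
            sum(vy for _, vy in velocities) / n)

# Orthogonal components, as in figure 15.2C: rightward + upward
print(vector_average([(1.0, 0.0), (0.0, 1.0)]))   # (0.5, 0.5): the 45-deg diagonal

# Opposite components, as in figure 15.2D: rightward + leftward
print(vector_average([(1.0, 0.0), (-1.0, 0.0)]))  # (0.0, 0.0): directionless flicker
```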
Figure 15.2: Random-dot stimuli with two dot families for transparent motion. The rectangular panels show the motion stimuli schematically, with small arrows showing the direction of motion. Dots in the two families are rendered in black and white to facilitate the notation, but they can be of identical color in motion perception experiments. The distribution of white dots is the same in all the panels. The circular icons below each panel indicate the motion percept symbolically. A, B: Classical random-dot stimuli for transparent motion in orthogonal (A) and opposite (B) directions. C, D: Locally paired dots. C and D are the same as A and B, respectively, but now each white dot has been locally paired with a black dot (in D, the dot trajectories are shown with a small vertical offset; in the real stimuli, the trajectories are coincident; i.e., they occur along the same horizontal line). This results in motion at the vector average of the component velocities. Thus, in C, observers perceive motion along the +45° direction; in D, the opposite-direction vectors cancel each other to produce directionless flicker; nevertheless, observers can perceive horizontally oriented patterns, indicated schematically by the horizontal headless arrows.
Figure 15.3: Detailed representation of one dot pair in the locally paired dot stimulus of figure 15.2C, where "black" and "white" dots move upward and rightward with velocities V and H, respectively. A: The number in each dot denotes the temporal frame that the corresponding dot appears in. The "center-of-gravity" element in each frame is shown by an icon composed of a pair of small touching circles. This element moves with a velocity M. B: The resultant velocity M is the vector mean of H and V, because it has the same direction as the vector sum S of H and V, and half its size. A heuristic argument as to why the two component motions integrate into a unitary motion whose velocity is the vector average, rather than the vector sum, is as follows. Because of the spatial proximity of the dots in a pair, a simplistic assumption is to consider that, in each frame, the dot pair is equivalent to a single "compound" element located at the midpoint between the dots. In figure 15.3A, this "center-of-gravity" element is shown by an icon composed of a pair of small touching circles. In figure 15.3B, vectors H and V are the component vectors of the horizontal and vertical dot motions, respectively. The motion of the compound element could determine the global perceived motion. As shown in figure 15.3, the resultant velocity of this element is half of S, the vector sum of H and V, i.e., the vector mean of H and V.
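The center-of-gravity argument can be checked with a frame-by-frame sketch (positions, velocities, and the dot spacing below are arbitrary illustrative values): if one dot of a pair steps by H each frame and its partner steps by V, the pair's midpoint steps by (H + V)/2, the vector mean:

```python
H = (2.0, 0.0)                  # rightward component velocity, per frame
V = (0.0, 2.0)                  # upward component velocity, per frame
a = (0.0, 0.0)                  # rightward-moving dot of the pair
b = (0.5, 0.0)                  # its closely spaced upward-moving partner

midpoints = []
for frame in range(4):
    midpoints.append(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2))
    a = (a[0] + H[0], a[1] + H[1])
    b = (b[0] + V[0], b[1] + V[1])

# Displacement of the "center-of-gravity" element per frame:
dx = midpoints[1][0] - midpoints[0][0]
dy = midpoints[1][1] - midpoints[0][1]
print((dx, dy))   # (1.0, 1.0) = (H + V)/2: the vector mean, along the diagonal
```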
15.4 Neural Mechanisms of Motion Segmentation and Integration
In general, V1 direction-tuned neurons respond almost as strongly when motion in their preferred direction is presented by itself as they do when it is one of the components of a transparent motion display (Snowden, Treue, Erickson, and Andersen, 1991; Qian and Andersen, 1995). In contrast, the response of MT neurons to motion in their preferred direction is inhibited in the transparent motion case (Snowden, Treue, Erickson, and Andersen, 1991). Muckli, Singer, Zanella, and Goebel (2002) used fMRI techniques and found that the human MT complex (MT/MST or hMT/V5) responded
differently to transparent motion than to a single motion vector. Thus, the evidence indicated that it is MT, and not V1, where different motion directions interact and global motion directions are computed. The failure of humans to perceive the individual component motion vectors when presented with locally paired dot stimuli could be the result of either of the following: (i) the components are too close together and are not resolved spatially, so they are not registered as different motion directions anywhere in the visual system; or (ii) they are registered in some brain areas, but they are integrated at some later stage, so that no global motion is perceived in the component directions. Physiological experiments with single-cell recordings in macaque monkeys (Qian and Andersen, 1994) and fMRI brain imaging studies with humans (Heeger, Boynton, Demb, Seidemann, and Newsome, 1999) have provided evidence in favor of the second alternative. Namely, when comparing the responses of areas V1 and MT to opposite-motion transparent and locally paired dot stimuli, the pattern of results is quite different. V1 neurons respond almost identically to both types of stimuli, indicating that the motion component signals produce distinct direction-specific neural responses in V1 in both cases. However, neurons in MT respond significantly more vigorously to transparent than to locally paired dot displays. Qian and Andersen (1994) concluded that "these results demonstrate a neural correlate of the perceptual transparency at the level of MT." It is widely accepted that MT neurons that are excited by the presence of motion signals in, or close to, their preferred direction send inhibitory signals to neurons that are tuned to different directions (Snowden, Treue, Erickson, and Andersen, 1991; Mikami, 1992), as long as these neurons are in their immediate "suppression vicinity."
These inhibitory signals are thought to have a broad direction tuning (Snowden and Verstraten, 1999), as indicated by psychophysical results (Snowden, 1989); the inhibitory influence peaks for the opposite-to-preferred direction. These interactions may play a role in smoothing the spatial distribution of velocities, allowing later stages to interpolate spatially disjoint pieces of the same surface that share similar velocities. Parametric studies in which one varies the distance between the elements of each pair in a locally paired dot stimulus (Qian, Andersen, and Adelson, 1994a) have shown that the size of this "suppression vicinity" is about 10-20 arc min. By construction, bivectorial locally paired displays ensure that any given dot moving in a particular direction is paired, within this suppression vicinity, with a dot moving in the other direction (as opposed to conventional bivectorial transparent motion, where it is highly improbable that a dot has a different-direction partner within its suppression vicinity). Thus, the failure of the visual system to perceive motion transparency with locally paired dot displays appears to be due to the failure of neurons in MT (and possibly other areas) to segregate the motion components. The mutual suppression of neighboring neurons tuned to opposite motion directions may account for the lack of a transparent motion percept in the case of locally paired dots. However, this suppression cannot explain the percept of a unidirectional field that moves at the vector-average velocity when locally paired dots move in different, nonopposite directions. Curran and Braddick (2000) hypothesize that, in addition to the suppression, the visual system also employs a pooling operation across neurons that are tuned to different directions. There is also physiological evidence that MT neurons play a role in integrating similar motion signals across the visual field by providing facilitatory signals to nearby
neurons that are tuned to similar directions of motion (Mikami, 1992; Livingstone, Pack, and Born, 2001). However, these facilitatory signals are much weaker than the inhibitory signals in MT. Recanzone, Wurtz, and Schwarz (1997) studied the responses of MT and MST neurons to one and two moving objects and observed that the two-object responses were predicted by averaging the responses that the neuron gave in the two single-object conditions. They concluded that "areas MT and MST probably use a similar integrative mechanism to create their responses to complex moving visual stimuli."
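The suppression account sketched above can be illustrated with a toy model. Nothing below is the actual MT circuitry: the half-cosine tuning, the 0.25° suppression radius, and the inhibition weight are hypothetical choices made only to show why locally paired dots, unlike conventional transparent dots, leave little directional response at the integration stage:

```python
import math

SUPPRESSION_RADIUS = 0.25   # deg of visual angle (~15 arc min); hypothetical
W_INHIBIT = 0.8             # inhibition strength; hypothetical

def _inhibition_tuning(delta_deg):
    # Broadly tuned inhibition that peaks for the opposite direction:
    # 0 for same-direction neighbors, 1 for opposite-direction neighbors.
    return (1 - math.cos(math.radians(delta_deg))) / 2

def mt_response(pref_deg, signals):
    # Toy MT-like unit. `signals` is a list of (x, y, direction_deg) local
    # motion measurements (a stand-in for V1 outputs). Each signal is
    # suppressed by different-direction signals inside its vicinity.
    total = 0.0
    for i, (xi, yi, di) in enumerate(signals):
        inh = sum(_inhibition_tuning(dj - di)
                  for j, (xj, yj, dj) in enumerate(signals)
                  if j != i and math.hypot(xj - xi, yj - yi) < SUPPRESSION_RADIUS)
        gain = max(0.0, 1.0 - W_INHIBIT * inh)
        total += gain * max(0.0, math.cos(math.radians(di - pref_deg)))
    return total

# Transparent display: opposite-direction dots far apart -> no suppression
transparent = [(0.0, 0.0, 0), (1.0, 0.0, 180)]
# Locally paired display: same signals, but partners fall inside the vicinity
paired = [(0.0, 0.0, 0), (0.05, 0.0, 180)]

print(mt_response(0, transparent))  # 1.0: the rightward component survives
print(mt_response(0, paired))       # ~0.2: strong mutual suppression
```

A rightward-preferring unit responds vigorously to the transparent display but only weakly to the paired one, mirroring the Qian and Andersen (1994) MT results described above.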
15.5 Integration of Motion Directions during the Motion Aftereffect (MAE)
The most striking manifestation of adaptation in the motion mechanisms is the motion aftereffect (Aristotle, ca. 330 B.C.), also known as the "waterfall illusion" (Addams, 1834; Wade and Verstraten, 1998): after prolonged exposure to motion of the same type (translational, rotational, expanding, helical), observers experience an illusory motion bias in the opposite direction when viewing a subsequent test field. The best test field for yielding an unadulterated illusory motion percept is one that has no net directional motion. It can be completely static, scintillate randomly, or contain motion signals that are balanced in all directions (the use of static vs. dynamic test fields is itself a separate issue that can provide useful information on motion mechanisms (Nishida and Sato, 1995)). Thus, for example, if the adapting stimulus moves to the right, rotates clockwise, expands, or spirals outward clockwise, then the MAE exhibits motion, respectively, to the left, rotating counterclockwise, shrinking, or spiraling inward counterclockwise - albeit more weakly in absolute magnitude than the adapting motion. The fact that the MAE "copies in reverse" the adapted type of motion so faithfully suggests that the same neural mechanisms that are involved in the percept of the adapting motion itself may also participate in the generation of the MAE. On the other hand, there are reports of a "cross-attribute MAE," in which the adapting and the test stimuli are defined by entirely different attributes (luminance and stereoscopic disparity), yet an MAE is obtained (Patterson, Bowd, Phinney, Pohndorf, Barton-Howard, and Angilletta, 1994; Bowd, Donnelly, Shorter, and Patterson, 2000). This suggests that the cross-attribute MAE is due to some common motion mechanisms that are shared by luminance-driven and disparity-driven motion.
Numerous psychophysical studies have addressed the issue of the MAE locus (e.g., Wiesenfelder and Blake, 1990; Raymond, 1993; Nishida and Sato, 1995; Blake, 1995). The consensus from these studies is that the MAE involves multiple processing stages, just as motion perception itself does. A number of single-cell physiological studies have also addressed the MAE locus (Giaschi, Douglas, Marlin, and Cynader, 1993; Hammond, Mouat and Smith, 1986; Kohn and Movshon, 2003) and concluded that the known properties of cortical neurons account for both adaptation and MAE phenomena, namely, that the illusory motion during the MAE is subserved by the same mechanisms that subserve real motion. The MAE locus issue has been investigated very actively in brain imaging studies in recent years. Most of these have concluded
that MT is very active during the percept of MAE (Tootell et al., 1995; He, Cohen and Hu, 1998; Culham et al., 1999; Taylor, Schmitz et al., 2000; Huk, Ress and Heeger, 2001; Hautzel et al., 2001; Berman and Colby, 2002). The team of Taylor et al. (2000) initially interpreted their findings as implicating other brain areas, such as the cingulate gyrus; however, in a subsequent study, the same team (Hautzel et al., 2001) observed that these other areas are not selectively activated during the MAE, since they were also active during a reference condition in which the MAE was absent. They concluded that "the perceptual illusion of motion (during MAE) arises exclusively in the motion-sensitive visual area V5/MT." Those authors believe that more brain imaging studies are needed, with a wide variety of adapting and testing stimuli, to assess the neural site(s) involved in the many different manifestations of the MAE.
15.6 MAE with Transparent Motion - Integration during MAE
It turns out that the MAE following adaptation to transparently moving stimuli is not itself transparent, but is unidirectional (Mather, 1980; Verstraten, Fredericksen, and van de Grind, 1994), and moves with a velocity that is the opposite of the vector average of the component velocities. This is a case in which the MAE does not provide the opposite of the motion percept during adaptation. Thus, transparent motion, in addition to being a useful tool for studying the neural correlates of motion detection, offers another rich area in the study of the motion aftereffect. In fact, this observation of a unidirectional MAE after adaptation to transparent motion has challenged early MAE theories and models (for a review of these models, see Mather and Harris, 1998), such as the different-direction ratio model (Sutherland, 1961) and the opposite-direction temporal-imbalance model (Barlow and Hill, 1963), and it motivated the development of models that account for this effect. For example, Mather's distribution-shift theory (1980) emphasizes interactions among outputs from direction-selective neurons that are tuned to the entire spectrum of directions, rather than only those tuned to opposite directions. Wilson and his colleagues (Wilson, Ferrera, and Yo, 1992; Wilson and Kim, 1994) developed computational models, based on the distribution-shift theory, that incorporated additional stages of motion processing, as evidenced by research results after Mather's 1980 paper. The model by Wilson and Kim (1994) contains a sensor layer, followed by an opposite-direction opponency layer, ascribed to brain areas V1 and V2, and a multidirection motion integration layer, ascribed to area MT. Grunewald (1996) developed a similar model, which accounts for the so-called "orthogonal MAE" resulting from adaptation to oppositely directed transparent motion (Grunewald and Lankheet, 1996).
We have proposed a simple theory for this unidirectional MAE, based on a strong analogy between locally-paired dot motion and the MAE following adaptation to transparent motion (Vidnyanszky, Blaser, and Papathomas, 2002). In addition, this theory helps explain several other transparent-MAE phenomena, as discussed below. The key insight was the realization that the motion-adapted system responds to test stimuli in MAE studies in much the same way that the unadapted visual system responds to
locally paired dot motion stimuli. For illustration purposes, let us consider two component motions, upward and rightward. First, examine locally paired dot motion: each location in the visual field supplies two directions to the unadapted visual system; we know that these two motion components are integrated to elicit unidirectional motion, moving with the vector-average velocity along the 45° direction. Second, consider adaptation to two transparent components moving leftward and downward. If either of the two directions were present alone, the ensuing MAEs would be rightward and upward, respectively. The presence of both of them in the adapted system supplies two directions of motion in every location of the adapted visual field. Thus, just as in the locally paired dot case, the predicted end result is the (observed) unidirectional motion along the 45° direction. Our theory can account for at least two additional MAE phenomena, namely: (1) A seeming "transparent" motion aftereffect resulting from adaptation to bivectorial transparent motion with the two dot families assigned to different binocular disparities2 (Verstraten, Verlinde, Fredericksen, and van de Grind, 1994). (2) Following adaptation to opposite-moving transparent dots, say, leftward and rightward, Grunewald and Lankheet (1996) reported an orthogonal-motion MAE, upward and downward. The first phenomenon is easily explained by the analogy with locally paired dot motion, since even locally paired dots give rise to transparent motion when the dots in each pair are presented in different depth planes. The second is explained in terms of a distribution-shift model of motion perception (Vidnyanszky, Blaser, and Papathomas, 2002).
Namely, adaptation to leftward and rightward motion reduces the relative sensitivity to horizontal motion, and thus in the test phase, given a dynamic test pattern that contains balanced motion in all directions, the stimulus is artificially unbalanced and appears to contain relatively strong upward and downward motion components.
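Both of these predictions fall out of a toy distribution-shift computation. The half-cosine tuning, the 0.5 adaptation strength, and the population-vector readout below are our own illustrative assumptions, not the published models: channel gains are reduced around the adapted directions, a balanced test pattern then drives the channels in proportion to their residual gains, and the readout reveals the predicted aftereffect:

```python
import math

def adapted_gains(adapt_dirs_deg, strength=0.5):
    # Gain of each 1-degree direction channel after adaptation: sensitivity
    # drops around every adapted direction (half-cosine tuning; hypothetical).
    gains = {}
    for theta in range(360):
        loss = sum(max(0.0, math.cos(math.radians(theta - d)))
                   for d in adapt_dirs_deg)
        gains[theta] = max(0.0, 1.0 - strength * loss)
    return gains

def population_vector(gains):
    # A balanced test field drives every channel equally, so the response
    # profile equals the gain profile; sum it as a population vector.
    x = sum(g * math.cos(math.radians(t)) for t, g in gains.items())
    y = sum(g * math.sin(math.radians(t)) for t, g in gains.items())
    return x, y

# Adapt to leftward (180 deg) + downward (270 deg) transparent motion:
x, y = population_vector(adapted_gains([180, 270]))
print(round(math.degrees(math.atan2(y, x))))     # 45: a single up-rightward MAE

# Adapt to leftward + rightward motion: the net vector vanishes (no
# horizontal MAE), while residual sensitivity peaks on the vertical axis,
# consistent with the orthogonal MAE of Grunewald and Lankheet (1996).
gains = adapted_gains([0, 180])
best = max(gains, key=gains.get)
print(best % 180)                                # 90: orthogonal to adapted axis
```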
15.7 Nature of Attentional Selection in Motion Processing
Attention has traditionally been thought to affect higher stages of visual processing. Since motion perception had been classified as a "preattentive" process, common wisdom held that motion perception would not be influenced by attention. This may explain why evidence for the effect of attention on motion was obtained rather late (Chaudhuri, 1990). Chaudhuri used psychophysical methods to provide evidence that directing attention away from a moving stimulus during adaptation resulted in a reduction of MAE strength. Lankheet and Verstraten (1995) showed that attention to one of the two components in opposite-direction transparent random-dot motion biased the normally nondirected MAE in favor of the attended component. Von Grunau, Bertone, and Pakneshan (1998) obtained similar results for grating-based "plaid" transparent motion. Our research team has obtained evidence for inhibition of the unattended component in transparent motion (Sohn, Vidnyanszky, Blaser, and Papathomas, 2001, 2004). These
2. It must be emphasized, however, that the dots comprising the post-adaptation test field must also be segregated in two depth planes and, further, that these depth planes must be very close to those used during adaptation in order to exhibit a transparent MAE.
and related psychophysical studies (e.g., Raymond, O'Donnell, and Tipper, 1998; Alais and Blake, 1999; Mukai and Watanabe, 2001) converge with neurophysiological and imaging studies investigating the neural mechanisms that underlie attentional effects in motion perception. In the last few decades, three models of visual attention have emerged. These models are distinguished by what they consider to be the actual targets of visual attention: locations, features, or "objects." Initially, most studies focused on location as the unit of attentional selection (Posner, Snyder, and Davidson, 1980; Hoffman and Nelson, 1981). When an observer directs attention to an object or a feature at a specific location, all the objects and features at or near the attended location also benefit (Shih and Sperling, 1996). It was also observed that attention could be modeled as directed to a visual feature (color, orientation, direction of motion, etc.), whereby attentional enhancement extended to spatially disparate regions of visual space that also contained the attended feature (e.g., Mounts and Melara, 1999; Papathomas, Gorea, Feher, and Conway, 1999; Blaser, Sperling, and Lu, 1999; Vidnyanszky, Sohn, Kovacs, and Papathomas, 2003). This account of attentional selection is supported by some single-cell recording experiments (e.g., Treue and Martinez-Trujillo, 1999) and fMRI studies (e.g., Saenz, Buracas, and Boynton, 2002). Finally, the object-based account of attentional selection holds that, no matter how attention is cued, the actual target of attention is an object (understood as some "constellation" of visual features), and enhancement will spread to all of its features. Under this explanation, even when an observer attempts to attend to just, say, the color of a grating pattern, attention will, by default, wind up allocated to all the features of that object: color, orientation, spatial frequency, and so forth.
Direct psychophysical tests of this prediction have supported this view (Duncan, 1984; Desimone and Duncan, 1995; Blaser, Pylyshyn, and Holcombe, 2000), while neurophysiological and imaging studies have provided converging evidence (O'Craven, Downing, and Kanwisher, 1999; see Duncan, Humphreys, and Ward, 1997, for a review). Transparent motion offers a powerful tool for studying object-based attention. By having observers attend to one of two superimposed surfaces, location-based selection explanations can be all but ruled out, because the dots are distributed in the same visual region, thereby preventing a spatially preferential deployment of attention; furthermore, the use of limited-lifetime dots discourages the tracking of individual dots (Valdes-Sosa, Cobo, and Pinilla, 2000; Reynolds, Alborzian, and Stoner, 2003; Mitchell, Stoner, Fallah, and Reynolds, 2003; Sohn, Vidnyanszky, Blaser, and Papathomas, 2001). Also, there are no systematic differences in the two dot families' spatial frequency characteristics, because the two families share the same random spatial distribution. Finally, when the dots of the two families have identical luminance and color, there is no possibility of engaging feature-based attentional selection (Mitchell, Stoner, Fallah, and Reynolds, 2003). Given a bivectorial transparent motion stimulus, one can also introduce a consistent color difference between the two dot families. Then one is able to instruct observers, in the spirit of feature-based allocation of attention, to attend to a particular color. So, under these conditions, will attention spread to the motion of the field so colored (as predicted by object-based theories)? Or will it remain faithfully, and exclusively, allocated to just color (as predicted by feature-based theories)? We addressed this question by using
a random-dot transparent-motion paradigm together with the locally paired dot stimulus of Qian, Andersen, and Adelson (1994a) to test whether cross-attribute attentional effects are surface, location, or feature based (Sohn et al., 2004). The stimuli were composed of two sets of dots: the effectors, moving rightward, and the contenders, moving periodically upward and downward, thus contributing nothing to the (leftward) MAE; the name "effectors" denotes that these dots were responsible for the motion aftereffect. Effectors and contenders were always of a different color. Observers attended to the color of the effectors or to that of the contenders. The attentive task was to detect sudden luminance increases in the attended family. We compared the magnitude of the MAE under these two attentional conditions, and we found significant differences: the MAE duration after attending to the color of the effectors was larger than that after attending to the color of the contenders by a factor of approximately 3.14, averaged across four observers. The above results were obtained with the dots in each family independently positioned, creating the percept of two surfaces sliding transparently past each other in orthogonal directions during adaptation, as shown in figure 15.2A. These color-to-motion cross-attribute effects (so named because attending to the color of a surface during adaptation produced a differential effect on the motion mechanisms), as recorded by the MAE, provide evidence for object-based binding of features: attending to the effectors' color enhanced the attended surface's motion processing, producing a larger MAE than attending to the color of the contenders' surface. 
Moreover, when we repeated the same procedure with the locally paired dots, shown in figure 15.2C (the percept during adaptation was that of a single bicolored surface that moved diagonally) the MAE durations following the attend-to-effectors and the attend-to-contenders adaptation conditions were identical.3 Our interpretation is that, since the locally paired-dot stimulus comprises only one surface, there is no possibility for differential effects of attention: attending to the color of either the effectors or the contenders affects the motion processing of the very same surface, thus producing the same MAE strength. Thus, these results provide strong evidence for object-based attentional selection.
15.8 Conclusions
The motion-based segmentation paradigm introduced by Regan is important not only because it shows how powerful motion is in segmenting objects, but also because it provides a valuable tool to study other processes of visual perception, including adaptation and attentional selection. In particular, the use of transparent random-dot motion stimuli enables the investigation of object-based attention without the confounding effects of feature-based and location-based attention. Finally, the juxtaposition of motion stimuli consisting of randomly placed transparent dots and locally paired dots affords the study of the locus of motion segmentation, adaptation, and attentional selection by psychophysical and neurophysiological methods.
3. It is important to note that observers were able to distinguish the two families on the basis of their color, as evidenced by their performance on the attentive luminance-change task, which was statistically the same as that in the transparent motion case.
Acknowledgments
We thank John Tsotsos for extending the invitation to the Regan festschrift and, of course, David Martin Regan for providing the inspiration for many of the studies mentioned in this chapter. Many thanks are due to Laurence Harris and Michael Jenkin for their editorial efforts. Akos Feher helped us with many technical aspects of the work. We are grateful to Wonyeong Sohn for conducting key experiments in our projects, and to Randolph Blake for providing his review article on motion processing in visual cortex. Our research projects have been supported by grant EY013758-01 from NEI/NIH.
References
Addams, R. (1834). An account of a peculiar optical phaenomenon seen after having looked at a moving body. London and Edinburgh Philosophical Magazine and Journal of Science, 5: 373-374.
Adelson, E. H. and Movshon, J. A. (1982). Phenomenal coherence of moving visual patterns. Nature, 300: 523-525.
Alais, D. and Blake, R. (1999). Neural strength of visual attention gauged by motion adaptation. Nat. Neurosci., 2: 1015-1018.
Aristotle (1955). Parva Naturalia. Revised text with introduction and commentary by W. D. Ross. Oxford University Press.
Barlow, H. B. and Hill, R. M. (1963). Evidence for a physiological explanation for the waterfall phenomenon and figural aftereffects. Nature, 200: 1345-1347.
Berman, R. A. and Colby, C. L. (2002). Auditory and visual attention modulate motion processing in area MT. Brain Res. Cogn. Brain Res., 14: 64-74.
Beverley, K. I. and Regan, D. (1980). Visual sensitivity to the shape and size of a moving object: Implications for models of object perception. Perception, 9: 151-160.
Bex, P. J., Metha, A. B. and Makous, W. (1998). Psychophysical evidence for a functional hierarchy of motion processing mechanisms. J. Opt. Soc. Am. A, 15: 769-776.
Blake, R. (1995). Psychoanatomical strategies for studying human visual perception. In T. V. Papathomas, C. Chubb, A. Gorea, and E. Kowler (Eds.), Early Vision and Beyond, pp. 17-25. MIT Press: Cambridge, MA.
Blake, R., Sekuler, R. and Grossman, E. (2004). Motion processing in human visual cortex. In J. H. Kaas and C. E. Collins (Eds.), The Primate Visual System, pp. 311-344. CRC Press: New York.
Blaser, E., Sperling, G. and Lu, Z. L. (1999). Measuring the amplification of attention. Proc. Natl. Acad. Sci. USA, 96: 11681-11686.
Blaser, E., Pylyshyn, Z. W. and Holcombe, A. O. (2000). Tracking an object through feature space. Nature, 408: 196-199.
328
Transparent Motion
Born, R. T. and Tootell, R. B. (1992). Segregation of global and local motion processing in primate middle temporal visual area. Nature, 357: 497-499. Bowd, C., Donnelly, M., Shorter, S. and Patterson, R. (2000). Cross-domain adaptation reveals that a common mechanism computes stereoscopic (cyclopean) and luminance plaid motion. Vis. Res., 40: 331-339. Braddick, O. (1973). The masking of apparent motion in random-dot patterns. Vis. Res., 13: 355-369. Braddick, O. (1993). Segmentation versus integration in visual motion processing. Trends Neurosci., 16: 263-268. Chaudhuri, A. (1990). Modulation of the motion aftereffect by selective attention. Nature, 344: 60-62. Culham, J. C., Dukelow, S. P., Vilis, T., Hassard, F. A., Gati, J. S., Menon, R. S. and Goodale, M. A. (1999). Recovery of fMRI activation in motion area MT following storage of the motion aftereffect. J. NeurophysioL, 81: 388-393. Curran, W. and Braddick, O. J. (2000). Speed and direction of locally-paired dot patterns. Vis. Res., 40: 2115-2124. Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Ann. Rev. Neurosci., 18: 193-222. Duncan, J. (1984). Selective attention and the organization of visual information. J, Exp. Psychol. Gen., 113: 501-517. Duncan, J., Humphreys, G. and Ward, R. (1997). Competitive brain activity in visual attention. Curr. Opin. NeurobioL, 1: 255-610. Duffy, C. J. and Wurtz, R. H. (1991). Sensitivity of MST neurons to optic flow stimuli. I. A continuum of response selectivity to large-field stimuli. J. NeurophysioL, 65: 1329-1345. Giaschi, D., Douglas, R., Martin, S. and Cynader, M. (1993). The time course of direction-selective adaptation in simple and complex cells in cat striate cortex. /. NeurophysioL, 70: 2024-2034. Greenlee, M. W. (2000). Human cortical areas underlying the perception of optic flow: Brain imaging studies. Int. Rev. NeurobioL, 44: 269-292. Grunewald, A. (1996). 
A model of transparent motion and non-transparent motion aftereffects. In S. Touretzky, B. Fritzke, and T. Leen (Eds.), Advances in Neural Information Processing Systems, Vol. 8, pp. 837-843. MIT Press: Cambridge, MA. Grunewald, A., Bradley, D. C. and Andersen, R. A. (2002). Neural correlates of structure-from-motionperception in macaque VI and MT. J. Neurosci., 22: 61956207. Grunewald, A. and Lankheet, M. J. (1996). Orthogonal motion after-effect illusion predicted by a model of cortical motion processing Nature, 384: 358-360. Hammond, P., Mouat, G. S. and Smith, A. T. (1986). Motion after-effects in cat striate cortex elicited by moving texture. Vfo. Res., 26: 1055-1060.
Thomas Papathomas, Zoltan Vidnyanszky, and Erik Blaser
329
Hautzel, H., Taylor, J. G., Krause, B. J., Schmitz, N., Tellmann, L., Ziemons, K., Shah, N. J., Herzog, H. and Muller-Gartner, H. W. (2001). The motion aftereffect: More than area V5/MT? Evidence from 15O-butanol PET studies. Brain Res., 892: 281-292. He, S., Cohen, E. R. and Hu, X. (1998). Close correlation between activity in brain area MT/V5 and the perception of a visual motion aftereffect. Curr. Biol., 8: 1215-1218. Heeger, D. J., Boynton, G. M, Demb, J. B., Seidemann, E. and Newsome, W. T. (1999). Motion opponency in visual cortex. J. Neurosci., 19: 7162-7174. Hoffman, J. E. and Nelson, B. (1981). Spatial selectivity in visual search. Percept. Psychophys., 30: 283-290. Huk, A. C, Ress, D. and Heeger, D. J. (2001). Neuronal basis of the motion aftereffect reconsidered. Neuron, 32: 161-172. Julesz, B. (1965). Texture and visual perception. Sci. Am., 212: 38^8. Kaiser, M. K., Montegut, M. J. and Proffitt, D. R. (1995). Rotational and translational components of motion parallax: Observers' sensitivity and implications for three-dimensional computer graphics. J. Exp. Psychol. Appl., 1: 321-331. Kohly, R. and Regan, D. (2002). Fast long-range interactions in the early processing of motion-defined form and of combinations of motion-defined, luminance-defined, and cyclopean form. Vis. Res., 42: 661-668. Kohn, A. and Movshon, J. A. (2003). Neuronal adaptation to visual motion in area MT of the macaque. Neuron, 39: 681-691. Lagae, L., Maes, H., Raiguel, S., Xiao, D. K. and Orban, G. A. (1994). Responses of macaque STS neurons to optic flow components: a comparison of areas MT and MST. J. Neurophysiol., 71: 1597-1626. Lankheet, M. J. and Verstraten, F. A. (1995). Attentional modulation of adaptation to two-component transparent motion. Vis. Res., 35: 1401-1412. Levinson, E. and Sekuler, R. (1975). The independence of channels in human vision selective for direction of movement. J. Physiol. (Land.), 250: 347-366. Livingstone, M. S., Pack, C. C. and Born, R. T. (2001). 
Two-dimensional substructure of MT receptive field. Neuron, 30: 781-793. Mather, G. (1980). The movement aftereffect and a distribution-shift model for coding the direction of visual movement. Perception, 9: 379-392. Maunsell, J. H. and Van Essen, D. C. (1983). Functional properties of neurons in middle temporal visual area of the macaque monkey. II. Binocular interactions and sensitivity to binocular disparity. J. Neurophysiol., 9: 1148-1167. Meese, T. S. and Harris, M. G. (2001). Independent detectors for expansion and rotation, and for orthogonal components of deformation. Perception, 30: 11891202.
330
Transparent Motion
Mikami, A. (1992). Spatiotemporal characteristics of direction-selective neurons in the middle temporal visual area of the macaque monkeys. Exp. Brain Res., 90: 40-46. Mitchell, J. R, Stoner, G. R., Fallah, M. and Reynolds, J. H. (2003). Attentional selection of superimposed surfaces cannot be explained by modulation of the gain of color channels. Vis. Res., 43: 1323-1328. Morrone, M. C, Burr, D. C. and Vaina, L. M. (1995). Two stages of visual processing for radial and circular motion. Nature, 376: 507-509. Morrone, M. C., Tosetti, M., Montanaro, D., Fiorentini, A., Cioni, G., and Burr, D. C. (2000). A cortical area that responds specifically to optic flow, revealed by fMRI. Nat. Neurosci., 3: 1322-1328. Mounts, J. R. and Melara, R. D. (1999). Attentional selection of objects or features: Evidence from a modified search task. Percept. Psychophys., 61: 322-341. Movshon, J. A., Adelson, E. H., Gizzi, M. S. and Newsome, W. T. (1986). The analysis of moving visual patterns. In C. Chagas, R. Gattas, and C. G. Gross (Eds.), Pattern Recognition Mechanisms, pp. 117-151. Springer-Verlag, New York. Movshon, J. A., Lisberger, S. G. and Krauzlis, R. J. (1990). Visual cortical signals supporting smooth pursuit eye movements. Cold Spring Harb. Symp. Quant. Biol, 55: 707-716. Movshon, J. A. and Newsome, W. T. (1992). Neural foundations of visual motion perception. Curr. Dir. Psychol. Sci., 1: 35-39. Muckli, L., Singer, W., Zanella, F. E. and Goebel, R. (2002). Integration of multiple motion vectors over space: An fMRI study of transparent motion perception. Neuroimage, 16: 843-856. Mukai, I. and Watanabe, T. (2001). Differential effect of attention to translation and expansion on motion aftereffects (MAE). Vis. Res., 41: 1107-1117. Newsome, W. T, Britten, K. H., Salzman, C. D. and Movshon, J. A. (1990). Neuronal mechanisms of motion perception. Cold Spring Harb. Symp. Quant. Biol.,55: 697-705. Nishida, S. and Sato, T. (1995). 
Motion aftereffect with flickering test patterns reveals higher stages of motion processing. Vis. Res., 35: 477—490. O'Craven, K., Downing, R and Kanwisher, N. (1999). fMRI evidence for objects as the units of attentional selection. Nature, 401: 584-587. Papathomas, T. V., Gorea, A, Feher, A. and Conway, T. E. (1999). Attention-based texture segregation. Percept. Psychophys.,61: 1399-1410. Patterson, R., Bowd, C., Phinney, R., Pohndorf, R., Barton-Howard, W. J. and Angilletta, M. (1994). Properties of the stereoscopic (cyclopean) motion aftereffect. Vis. Res., 34: 1139-1147. Posner, M. L, Snyder, C. R. and Davidson, B. J. (1980). Attention and the detection of signals. Exp. Psychol. Gen., 109: 160-174.
Thomas Papathomas, Zoltan Vidnyanszky, and Erik Blaser
331
Qian, N. and Andersen, R. A. (1994). Transparent motion perception as detection of unbalanced motion signals. II. Physiology. /. Neurosci., 14: 7367-7380. Qian, N. and Andersen, R. A. (1995). VI responses to transparent and nontransparent motions. Exp. Brain Res., 103: 41-50. Qian, N., Andersen, R. A. and Adelson, E. H. (1994a). Transparent motion perception as detection of unbalanced motion signals. I. Psychophysics. J. Neurosci., 14: 7357-7366. Qian, N., Andersen, R. A. and Adelson, E. H. (1994b). Transparent motion perception as detection of unbalanced motion signals. III. Modeling. J. Neurosci., 14: 7381-7392. Raymond, J. E. (1993). Complete interocular transfer of motion adaptation effects on motion coherence thresholds. Vis. Res., 33: 1865-1870. Raymond, J. E., O'Donnell, H. L. and Tipper, S. P. (1998). Priming reveals attention modulation of human motion sensitivity. Vis. Res., 38: 2863-2867. Recanzone, G. H. and Wurtz, R. H. (1999). Shift in smooth pursuit initiation and MT and MST neuronal activity under different stimulus conditions. J. Neurophysiol., 82: 1710-1727. Recanzone, G. H., Wurtz, R. H. and Schwarz, U. (1997). Responses of MT and MST neurons to one and two moving objects in the receptive field. J. Neurophysiol., 78: 2904-2915. Regan, D. and Beverley, K. I. (1984). Figure-ground segregation by motion contrast and by luminance contrast. J. Opt. Soc. Am., 1: 433-442. Regan, D. (1986). Form from motion parallax and form from luminance contrast: Vernier discrimination. Spatial Vis., 1: 305-318. Regan, D. and Hamstra, S. (1991). Shape discrimination for motion-defined and contrastdefined form: Squareness is special. Percept., 20: 315-336. Reynolds, J. H., Alborzian, S. and Stoner, G. R. (2003). Exogenously cued attention triggers competitive selection of surfaces. Vis. Res., 43: 59-66. Rodman, H. R. and Albright, T. D. (1987). Coding of visual stimulus velocity in area MT of the macaque. Vis. Res., 27: 2035-2048. Saenz, M., Buracas, G. T. 
and Boynton, G. M. (2002). Global effects of feature-based attention in human visual cortex. Nat. Neurosci., 5: 631-632. Seidemann, E. and Newsome, W. T. (1999). Effect of spatial attention on the responses of area MT neurons. J. Neurophysiol., 81: 1783-1794. Shih, S. I. and Sperling, G. (1996). Is there feature-based attentional selection in visual search? J. Exp. Psychol. Hum. Percept. Perform., 22: 758-779. Snowden, R. J. (1989). Motions in orthogonal directions are mutually suppressive. /. Opt. Soc. Am. A, 6: 1096-1101. Snowden, R. J. (1994). Motion processing in the primate cerebral cortex. In A. T. Smith and R. J. Snowden (Eds.), Visual Detection of Motion, pp. 51-83. Academic Press, London.
332
Transparent Motion
Snowden, R. J., Treue, S., Erickson, R. G. and Andersen, R. A. (1991). The response of area MT and VI neurons to transparent motion. J. NeuroscL, 11: 2768-2785. Snowden, R. J. and Verstraten, F. A. (1999). Motion transparency: Making models of motion perception transparent. Trends Cogn. Sci., 3: 369-377. Sohn, W., Papathomas, T. V., Blaser, E. and Vidnyanszky, Z. (2004). Object-based cross-attribute attentional modulation from color to motion. Vis. Res., 44: 14371443. Sohn, W., Vidnyanszky, Z., Blaser, E. and Papathomas, T. V. (2001). Attention to one component of bivectorial transparent motion strongly inhibits the processing of the unattended component. J. Vis., 1: 85a. Sutherland, N. S. (1961). Figural aftereffects and apparent size Q. J. Exp. Psychol., 13: 222-228. Taylor, J. G., Schmitz, N., Ziemons, K., Grosse-Ruyken, M. L., Gruber, O., MuellerGaertner, H. W. and Shah, N. J. (2000). The network of brain areas involved in the motion aftereffect. Neuroimage, 11: 257-270. Tootell, R. B., Reppas, J. B., Dale, A. M., Look, R. B., Sereno, M. I., Malach, R., Brady, T. J. and Rosen, B. R. (1995). Visual motion aftereffect in human cortical area MT revealed by functional magnetic resonance imaging. Nature, 375: 139141. Treue, S., Hoi, K. and Rauber, H. J. (2000). Seeing multiple directions of motionphysiology and psychophysics. Nat. NeuroscL, 3: 270-276. Treue, S. and Martinez-Trujillo, J. C. (1999). Feature-based attention influences motion processing gain in macaque visual cortex. Nature, 399: 575-579. Valdes-Sosa, M., Cobo, A. and Pinilla, T. (2000). Attention to object files defined by transparent motion. J. Exp. Psychol. Hum. Percept. Per/., 26: 488-505. Verstraten, F. A., Fredericksen, R. E. and van de Grind, W. A. (1994). Movement aftereffect of bi-vectorial transparent motion. Vis. Res., 34: 349-358. Verstraten, F. A., van der Smagt, M. J. and van de Grind, W. A. (1998). Aftereffect of high-speed motion. Percept., 27: 1055-1066. Verstraten, F. 
A., Verlinde, R., Fredericksen, R. E. and van de Grind, W. A. (1994). A transparent motion aftereffect contingent on binocular disparity. Percept., 23: 1181-1188. Vidnyanszky, Z., Blaser, E. and Papathomas, T. V. (2002). Motion integration during motion aftereffects. Trends Cogn. Sci., 6: 157-161. Vidnyanszky, Z., Sohn, W., Kovacs, G. and Papathomas, T. V (2003). Global featurebased attentional effects provide evidence for visual binding outside the locus of attention. Perception (Suppl.), 32:. von Grunau, M. W., Bertone, A. and Pakneshan, P. (1998). Attentional selection of motion states. Spat. Vis., 11: 329-347.
Thomas Papathomas, Zoltan Vidnyanszky, and Erik Blaser
333
Wade, N. J. and Verstraten, F. A. (1998). Introduction and historical overview. In G. Mather, F. A. Verstraten, and S. Anstis (Eds.), The Motion Aftereffect: A Modern Perspective, pp. 1-23. MIT Press: Cambridge, MA. Wiesenfelder, H. and Blake, R. (1990). The neural site of binocular rivalry relative to the analysis of motion in the human visual system. J. Neurosci., 10: 3880-3888. Wilson, H. R., Ferrera, V. P. and Yo, C. (1992). A psychophysically motivated model for two-dimensional motion perception. Vis, Neurosci., 9: 79-97. Wilson, H. R. and Kim, J. (1994). A model for motion coherence and transparency. Vis. Neurosci., 11: 1205-1220. Zanker, J. M. (2000). Motion transparency and multiple motion directions. Invest. Ophthal. Vis. Sci.,41: S720. Zanker, J. M. (2002). Motion segmentation and transparency: A computational analysis and some observations. Spat. Vis., 15: 246—247.
16. Neurological Correlates of Damage to the Magnocellular and Parvocellular Visual Pathways: Motion, Form, and Form from Motion after Cerebral Lesions

James A. Sharpe, Ji Soo Kim, and Josée Rivest

Motion direction discrimination, assessed with random-dot cinematograms in the peripheral and central visual fields of patients with unilateral cerebral hemispheric lesions, reveals directional asymmetries of foveal motion perception. Direction discrimination is impaired for motion toward the side of their lesions. Some patients have contralateral retinotopic field defects for perception of motion direction in all directions, despite intact fields for static objects. Patients with ipsidirectional defects have lesions that overlap in the white matter underlying the lateral temporo-occipital cortex, involving the magnocellular pathway. Assessment of form perception using an aftereffect paradigm for shape distortion identifies impairment in patients with lesions of the posterior part of the mid and inferior temporal lobe and white matter deep to the fusiform and lingual gyri on the mesial surface of the temporal lobe. Some of these patients have metamorphopsia: an impaired perception of elementary shapes. The impaired perception of shape distortion is nonretinotopic. The posterior and inferior temporal lobe, in the parvocellular stream of visual processing, is critical for perception of shape distortion and elementary shapes. Detection and recognition of form that is derived only from motion has been tested with motion-defined letters. Recognition or detection of motion-defined letters is lost
after lesions at the lateral parietotemporo-occipital junction. This implicates disruption of connections between area V5 in the magnocellular pathway and elements of the parvocellular pathway.
16.1 Introduction
Visual processing is divided into two parallel but anatomically and functionally interconnected pathways. One, the magnocellular (M) pathway, is concerned with motion and spatial analysis; the other, the parvocellular (P) pathway, with form and color perception (Livingstone and Hubel, 1987, 1988; DeYoe and Van Essen, 1988). Within the cerebral hemisphere of monkeys, the M pathway is dorsal to the P pathway, and the two are designated the dorsal and ventral streams. The dorsal occipito-temporal-parietal stream conveys information about location and motion, the "where" pathway, whereas the more ventral occipito-temporal stream conveys color, form, and pattern information, the "what" pathway. The M visual channel in monkeys projects to striate cortex (area V1), then to cortical areas V2 and V3 and to MT (middle temporal, V5), then to the adjacent area MST (medial superior temporal, V5a), and then to posterior parietal cortex. Functional imaging and magnetoencephalography (MEG) of the human brain show cortical areas in addition to the ascending limb of the inferior temporal sulcus (the homologue of MT/V5) that are activated by moving patterns (Ahlfors et al., 1999; Nakamura et al., 2003; Schoenfeld et al., 2003; Sunaert et al., 1999). The ascending limb of the inferior temporal sulcus is considered to include satellites of MT and is referred to as hMT/V5+. Other brain areas that respond to moving in preference to stationary patterns include lingual cortex, the lateral occipital sulcus (area LOS/KO), dorsolateral posterior occipitoparietal cortex (hV3A), the anterior dorsal intraparietal sulcus (DIPSA), the postcentral sulcus, and a small area in the precentral cortex, perhaps the frontal eye field (Ferber et al., 2003; Sunaert et al., 1999). These areas are activated to different degrees in individual subjects, and in some subjects not at all. Only hMT/V5+ is significantly activated in practically every brain (Sunaert et al., 1999).
The contributions, if any, of the other areas to motion processing, perception, and behavior are unknown. Lesion as well as functional MRI evidence points to the junction of Brodmann areas 19 and 39 as the homologue of simian MT/MST in humans (Barton et al., 1998; Dursteler and Wurtz, 1988; Morrow and Sharpe, 1990, 1993; Newsome and Pare, 1988; Newsome et al., 1985; Vaina et al., 2004). Functional MRI reveals that both first-order motion (luminance based) and second-order motion (as of textures) activate the same visual areas (Seiffert et al., 2003). The ventral stream projects to inferotemporal (IT) cortex of monkeys. Beyond V1, anatomical segregation in processing "where" and "what" is not strict, and the two anatomical streams share functions (Braddick et al., 2000; Ferrera et al., 1992; Maunsell et al., 1990; Nealey and Maunsell, 1994). A homologue of area IT for perception of form in humans has not been established. Little is known of the roles of the human ventral pathway in form perception. Although distorted shape perception, or metamorphopsia, can be a feature of damage to the posterior cerebrum, it is often subtle and has not been quantified or correlated with
a particular lesion site. The dorsal visual pathway is postulated to govern action, whereas the ventral pathway subserves recognition (Goodale and Milner, 1992; Milner and Goodale, 1993). Thus neurons in the parietal lobe and posterior temporal region (dorsal pathway) may code visual form that drives action such as reaching and saccades to objects, while ventral stream neurons may be responsible for perception of form without action. Positron emission tomography (PET) and functional MRI indicate that regional activity depends on the task performed, not on the passive visual attributes of the stimuli alone (Orban et al., 1996). PET scanning reveals that attention to shape activates the collateral sulcus (similarly to color), fusiform and parahippocampal gyri, and temporal cortex along the superior temporal sulcus (Corbetta et al., 1991). Lesions of the lingual and caudal fusiform gyri cause cerebral achromatopsia in humans with relative sparing of nonchromatic vision. Color and form perception can be dissociated (Zeki, 1990). Functional imaging by PET or MRI demonstrates areas that are metabolically active during tasks or perception but does not reveal which areas are critical elements of the neural networks that are responsible for those actions or perceptions. For human neurobiology, lesion effects on quantifiable elements of behavior are required to establish essential roles of distinct regions of the brain. Described below are behavioral and anatomical correlates of damage to the magnocellular and parvocellular pathways in humans with cerebral lesions as determined by three paradigms used in laboratories at the University Health Network in Toronto to detect disorders of visual motion direction discrimination, form (shape) perception, and form-from-motion detection and recognition.
16.2 Methods
Motion direction discrimination Random-dot cinematograms (RDCs) (figure 16.1) were presented at 16 points in the peripheral visual fields and in the central field of patients and control subjects (Barton et al., 1995, 1996). Each frame of the RDC contained stationary white dots (luminance 37.5 cd/m², size 0.071 deg²) within a small borderless region of a black background. Each RDC stimulus consisted of five frames, each lasting 80 ms, with no interstimulus interval, giving a total RDC duration of 400 ms. Apparent motion was created by displacing dots in successive frames. A subset of dots (signal dots) was programmed to move together in a specific direction (figure 16.1). These signal dots were displaced 0.28° between frames, giving an effective velocity of 3.5°/s. The remaining dots (noise dots) were randomly repositioned within the stimulus area. Dots moving out of the stimulus area reappeared on the opposite side of the display. The percentage of coherent motion (%CM) was defined as the number of dots assigned to the signal pool divided by the total number of dots, × 100. Each test consisted of a series of RDC stimuli that varied in signal direction (right, left, up, or down) and in %CM. The computer varied the direction of signal motion randomly, and an automated staircase algorithm varied the %CM, starting at 100% CM (signal dots only). Patients reported whether they perceived motion as left, right, up, or down: a four-alternative forced-choice task.

Figure 16.1: Schematic of a random-dot cinematogram. At less than 100% coherence the image contains a mixture of signal and noise dots.

For testing the peripheral field, each frame of the RDC contained 50 small white dots randomly plotted within a borderless circle 3° of visual angle in diameter, giving a dot density of 1.2 dots/deg². For testing the central field, each frame of the directional RDC contained 134 white dots within a borderless 4° × 4° square on a black background, giving a dot density of 8.4 dots/deg². Patients with focal cerebral infarcts (20 patients) or resections for tumors (3 patients) and 17 control volunteers served as subjects.

Form perception: the shape distortion effect When two shapes were presented successively and briefly to normal subjects, the form of the second (test) shape appeared dissimilar to the form of the first (priming) shape (figure 16.2). This shape distortion effect may originate from adaptation of nonretinotopic, non-feature-specific neurons in areas of the ventral stream (Suzuki and Cavanagh, 1998). On each trial, a priming rectangle (vertical or horizontal; 45 ms), a blank gap (stimulus onset asynchrony of 180 ms), a test circle (60 ms), and a random-dot mask (255 ms) were flashed in succession. The priming rectangle and the test circle were presented in the middle of each quadrant of the intact field. After each trial, subjects reproduced the perceived test shape by the method of adjustment, and the percent elongation of the reproduced ellipse, (longer diameter − shorter diameter)/shorter diameter × 100, was computed (Kim et al., 2000).

Motion-defined form Like a tiger in the jungle, motion-defined (MD) forms are camouflaged except when they move (Giaschi et al., 1992; Regan et al., 1991, 1992). We used MD letters (Regan et al., 1992) to examine 13 patients with unilateral focal cerebral hemispheric lesions and 20 normal control subjects.
MD letters subtending 50 arc min (equivalent to 20/60 Snellen optotypes) were composed of dots moving horizontally at 0.07–0.45°/s while background dots moved at the same speed in the opposite direction (figure 16.3). Correct letter recognition and correct letter detection were each determined as the lowest dot speed yielding 75% correct responses.

Figure 16.2: The normal shape distortion effect. When two shapes are presented to normal subjects briefly and successively, with an interval of about 200 ms, the second shape (the test shape, a circle) is perceived as dissimilar to the form of the first shape (the priming shape, a rectangle). In this case the circle is perceived as an oval with its long axis orthogonal to the priming rectangle.
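The two quantities defined in the Methods above, the percentage of coherent motion (%CM) and the percent elongation index, reduce to a few lines of code. The following is a minimal Python sketch; the function names, the rightward-only signal direction, and all parameter defaults are illustrative assumptions, not the authors' actual software:

```python
import random

def rdc_frames(n_dots=50, pct_cm=40.0, radius=1.5, dx=0.28, n_frames=5, seed=0):
    """Dot positions for a random-dot cinematogram (illustrative sketch).

    The first round(n_dots * pct_cm / 100) dots form the signal pool and
    step rightward by `dx` degrees on every frame, wrapping at the edge of
    the stimulus area; the remaining noise dots are randomly repositioned.
    """
    rng = random.Random(seed)
    n_signal = round(n_dots * pct_cm / 100.0)
    dots = [[rng.uniform(-radius, radius), rng.uniform(-radius, radius)]
            for _ in range(n_dots)]
    frames = [[d[:] for d in dots]]
    for _ in range(n_frames - 1):
        for i, d in enumerate(dots):
            if i < n_signal:                  # signal dot: coherent step
                d[0] += dx
                if d[0] > radius:             # wrap to the opposite side
                    d[0] -= 2 * radius
            else:                             # noise dot: random reposition
                d[0] = rng.uniform(-radius, radius)
                d[1] = rng.uniform(-radius, radius)
        frames.append([d[:] for d in dots])
    return frames, n_signal

def pct_elongation(longer, shorter):
    """Shape-distortion index: (longer - shorter) / shorter x 100."""
    return (longer - shorter) / shorter * 100.0

frames, n_signal = rdc_frames()               # five frames at 40% coherence
```

At 40% CM, 20 of the 50 dots step coherently across the five frames; a four-alternative forced-choice response can then be scored against the programmed signal direction.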
16.3 Results
Motion direction discrimination Normal direction discrimination for global motion of dots moving in the same direction, amidst background dots moving in random directions, was achieved at thresholds below about 35% coherence. Some patients with unilateral cerebral lesions had directional defects toward the side of the lesion, with thresholds from 38% to 93% (Barton et al., 1995). Some patients were found to have retinotopic defects of motion direction discrimination in all four directions for RDCs presented in the contralateral visual fields. Patients with defective motion-direction discrimination had lesions that overlapped at the lateral occipitotemporal junction and involved Brodmann cortical areas 19 and 37 or the underlying white matter (Barton et al., 1995), providing functional evidence that supports the imaging evidence of this region's role as the homologue of simian area V5 (MT/MST).

Form perception: impairment of the shape distortion effect Some patients with unilateral cerebral infarcts or hemorrhages showed a subnormal shape distortion effect in all areas of their intact field, both ipsilateral and contralateral to their lesions. Some of them also had impaired perception of intermixed control shapes, constituting a subtle metamorphopsia. In patients with a subnormal distortion effect, brain lesions overlapped in the posterior part of the mid and inferior temporal lobe, encompassing cortex and white matter of Brodmann area 37, the posterior aspect of area 21, the ventrocaudal portion of area 39, and white matter deep to areas 36 and 37 and to the fusiform and lingual gyri on the mesial surface of the temporal lobe. Patients with more anterior or posterior lesions showed a distortion effect comparable to that of control subjects (Rivest et al., 2004).

Figure 16.3: Motion-defined letters. The letter "Z" is camouflaged when the dots are stationary (A). Movement of letter dots rightward and background dots leftward makes the letter visible to the eye, but not to the still camera that made the figure (B). When only letter dots are present, the letter is visible to a camera whether moving (C) or stationary.

Motion-defined form Some patients who could detect coherent motion and discriminate its direction were unable to recognize MD letters, even though they could detect
them (designated as a type I loss). This is not an alexia, because these patients recognize contrast-defined letters (Regan et al., 1992). Other patients could neither detect nor recognize MD letters (type II loss). All patients could discriminate global motion direction. Patients with MD perception defects had lesions in parietotemporal white matter, or in cortex and white matter, at the level of Brodmann areas 18, 19, 37, 39, and 21. Lesions could be either right- or left-sided. Patients with lesions in other areas had normal perception of MD letters. Defective detection of form derived only from motion was caused by lesions at the lateral parietotemporo-occipital junction (Regan et al., 1992).
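Coherence thresholds like those reported above (normal below about 35% CM; 38–93% with affected hemispheres) are obtained by adaptively varying %CM from trial to trial. The sketch below uses a generic 2-down/1-up staircase with reversal averaging; the rule, step size, and averaging window are illustrative assumptions, since the chapter does not specify the exact algorithm used:

```python
def staircase_threshold(observer, start=100.0, step=8.0, floor=1.0,
                        ceiling=100.0, n_reversals=8):
    """Estimate a %CM threshold with a generic 2-down/1-up staircase.

    `observer(pct_cm)` returns True for a correct direction report.  The
    coherence level drops after two consecutive correct responses and
    rises after each error; the threshold estimate is the mean of the
    last six reversal levels.  (Illustrative rule only.)
    """
    level, correct_run, direction = start, 0, 0   # direction: -1 down, +1 up
    reversals = []
    while len(reversals) < n_reversals:
        if observer(level):
            correct_run += 1
            if correct_run == 2:                  # two correct: step down
                correct_run = 0
                if direction == +1:               # turning point
                    reversals.append(level)
                direction = -1
                level = max(floor, level - step)
        else:
            correct_run = 0                       # any error: step up
            if direction == -1:                   # turning point
                reversals.append(level)
            direction = +1
            level = min(ceiling, level + step)
    tail = reversals[-6:]
    return sum(tail) / len(tail)

# Simulated observer who responds correctly at or above 35% coherence.
threshold = staircase_threshold(lambda cm: cm >= 35.0)
```

With this deterministic simulated observer the staircase oscillates around the 35% boundary; in practice `observer` would present an RDC at the given %CM and score the patient's four-alternative forced-choice response.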
16.4 Discussion
Impairment of motion direction discrimination Lesions of lateral occipitotemporal cortex or white matter, near the junction of Brodmann areas 19 and 37 (V5/MT), are associated with defective discrimination of motion direction (Barton et al., 1995), indicating this region's role in the magnocellular pathway (table 16.1). These discrimination defects have been demonstrated with tests that require global averaging of motion vectors, or a reduction of motion noise. Damage to this area can cause retinotopic motion direction discrimination defects, which are restricted to regions of the contralateral visual hemifield (Barton et al., 1995). This constitutes a field deficit for motion-direction discrimination, despite intact detection of static objects and global motion. These deficits are uncommon because most occipitotemporal lesions also disrupt striate cortex or the optic radiation, causing homonymous field defects to all stimuli and masking the motion defect (Barton et al., 1997). However, selective damage to area V2, but not V1 or V5, has been reported to impair first-order motion direction discrimination in a retinotopic pattern, while sparing static perception and second-order motion perception based on texture contrast (Vaina et al., 2000). Lesions of the V5/MT region also impair discrimination of horizontal motion direction in central vision, even when the display is viewed entirely in the ipsilateral hemifovea (Barton et al., 1995). These defects are usually specific for motion toward the side of the lesion (ipsidirectional) (Barton et al., 1995), though unilateral hemisphere lesions can cause bidirectional defects. Such contralateral retinotopic defects and ipsidirectional defects had also been identified for smooth-pursuit eye movements in humans and monkeys after MT+ (V5) lesions (Morrow and Sharpe, 1993).
Direction-selective defects of motion perception, and also perception of motion in the wrong direction, have been shown in other patients with unilateral posterior hemispheric lesions (Blanke et al., 2003). Moreover, electrical stimulation of posterior cerebral regions in normal subjects can transiently impair motion perception selectively in one direction (Blanke et al., 2002). Functional imaging in humans suggests that retinotopic discrimination of motion activates an MT homologue in area V5, and that direction-specific nonretinotopic processing activates a distinct homologue of area MST in area V5 (Huk et al., 2002).

Perception of shape distortion The shape distortion effect is impaired after damage to the posterior part of the middle and inferior temporal gyri (Kim et al., 2000). The posterior and inferior temporal lobe has been known to be critical for shape perception. In monkeys, lesions of the posterior temporal lobe (area TEO) induce deficits in discrimination of patterns and objects (Kikuchi and Iwai, 1980). The shape distortion effect occurs even when the priming and test stimuli are presented in different positions of the visual field, and it is independent of the luminance, color, or apparent motion of the priming and test shapes (Suzuki and Cavanagh, 1998). These features match the properties of cells in simian IT cortex, to which the ventral stream projects (Ito et al., 1995). Receptive fields of neurons in IT cortex and the adjacent superior temporal sulcus tend to cover the entire visual field. They are not retinotopically organized, and they are shape selective but position invariant (Ito et al., 1995). The nonretinotopic shape distortion effect has been postulated to originate from a repulsive aftereffect in a shape dimension, causing a test shape to appear "repelled away" from a priming shape in the direction of greater dissimilarity (Suzuki and Cavanagh, 1998). We found that the shape distortion effect can be lost with hemispheric lesions on either side (Kim et al., 2000). Although right-sided lesions were more common than left-sided ones, the exclusion of patients with aphasia from left hemispheric lesions during patient selection may underlie this asymmetry. Some patients failed to discriminate between circles and ellipses or showed subnormal responses to ellipses during the control trials (Kim et al., 2000). This type of deficit constitutes a subtle metamorphopsia. These results provide evidence that the shape distortion effect depends on the coding of cells in the P (ventral) stream of processing (table 16.1).

Table 16.1: Comparison between the magnocellular and parvocellular pathways.

Magnocellular (M) pathway: motion pathway
- Function: the "where" pathway; direction, speed, orientation; action
- Cortical projection: dorsal stream
- Simian areas: MT (V5), the middle temporal region of the superior temporal sulcus; MST (V5a), medial superior temporal; VIP, ventral intraparietal
- Human areas: lateral occipitotemporal cortex (junction of Brodmann areas 19, 37, and 39) on the ascending limb of the inferior temporal sulcus (hMT/V5+)

Parvocellular (P) pathway: form pathway
- Function: the "what" pathway; form, color, pattern; recognition (perception)
- Cortical projection: ventral stream
- Simian areas: inferotemporal (IT) cortex
- Human areas: uncertain
James A. Sharpe, Ji Soo Kim, and Josee Rivest
Detection and recognition of form from motion
Selective loss of the ability to recognize MD letters, while the ability to detect those same letters is spared, together with a spared ability to detect coherent motion and discriminate its direction (type I loss), is an agnosia for MD form. Loss of the ability both to recognize and to detect MD letters, while the ability to detect coherent motion and discriminate its direction is spared (type II loss), is a more elementary defect in the perception of MD form (Regan et al., 1992). Failure to recognize MD letters correlates with extensive lesions in parietotemporal white matter underlying Brodmann cortical areas 18, 19, 37, 39, 21, and 22. The lesions can be in the left or right hemisphere (Regan et al., 1992). A subsequent study also identified deficits in form from motion in two patients with bilateral parietal lesions (Schenk and Zihl, 1997). Functional imaging reveals that both the motion-sensitive MT complex and the object-sensitive lateral occipital complex (LOC) are activated by MD form (Ferber et al., 2003). MEG indicates the sequence of processing of form from motion: first in MT/V5, then LO and IT cortex (Schoenfeld et al., 2003). Conclusions are as follows:
1. The loss of the ability of patients with parietotemporal damage to recognize MD letters is specific to the fact that they are MD letters, rather than being a general loss of letter-recognition ability.
2. This visual loss is specific to MD form rather than being a general failure of motion processing.
3. The visual loss is not produced by lesions that do not involve the localized cerebral region specified above (Regan et al., 1992).
To explain the existence of type I loss (detection without recognition) and of type II loss (loss of detection and recognition), with sparing of the detection and discrimination of coherent motion, it is proposed that motion information is processed hierarchically.
The magnocellular motion and the parvocellular color/form pathways are apparently disconnected in patients with selective loss of recognition of MD form.
Acknowledgments This work was supported by Canadian Institutes of Health Research (CIHR) grants ME 5509 and MT 15362 (JAS), and by an Elizabeth Barford Award, University of Toronto (JSK, JAS). We gratefully acknowledge Drs. Martin Regan, Deborah Giaschi, Jason Barton, and Jane Raymond for earlier work described in this chapter.
References
Ahlfors, S. P., Simpson, G. V., Dale, A. M., Belliveau, J. W., Liu, A. K., and Korvenoja, A. (1999). Spatiotemporal activity of a cortical network for processing visual motion revealed by MEG and fMRI. J. Neurophysiol., 82: 2545-2555.
Motion, Form, and Form from Motion
Barton, J. J. S., Sharpe, J. A. and Raymond, J. E. (1995). Retinotopic and directional deficits in motion discrimination in humans with cerebral lesions. Ann. Neurol., 37: 665-675.
Barton, J. J. S. and Sharpe, J. A. (1997). Motion direction discrimination in blind hemifields. Ann. Neurol., 41: 255-264.
Barton, J. J. S., Sharpe, J. A. and Raymond, J. E. (1996). Directional defects in pursuit and motion perception in humans with unilateral cerebral lesions. Brain, 119: 1535-1550.
Blanke, O., Landis, T., Mermoud, C., Spinelli, L. and Safran, A. B. (2003). Direction-selective motion blindness after unilateral posterior brain damage. Eur. J. Neurosci., 18: 709-722.
Blanke, O., Landis, T., Safran, A. B. and Seeck, M. (2002). Direction-specific motion blindness induced by focal stimulation of human extrastriate cortex. Eur. J. Neurosci., 15: 2043-2048.
Braddick, O. J., O'Brien, J. M., Wattam-Bell, J., Atkinson, J. and Turner, R. (2000). Form and motion coherence activate independent, but not dorsal/ventral segregated, networks in the human brain. Curr. Biol., 10: 731-734.
Corbetta, M., Miezin, F. M., Dobmeyer, S., Shulman, G. L. and Petersen, S. E. (1991). Selective and divided attention during visual discriminations of shape, color, and speed: Functional anatomy by positron emission tomography. J. Neurosci., 11: 2383-2402.
Deyoe, E. A. and Van Essen, D. C. (1988). Concurrent processing streams in monkey visual cortex. Trends Neurosci., 11: 219-226.
Dursteler, M. R. and Wurtz, R. H. (1988). Pursuit and optokinetic deficits following chemical lesions of cortical areas MT and MST. J. Neurophysiol., 60: 940-965.
Ferber, S., Humphrey, G. K. and Vilis, T. (2003). The lateral occipital complex subserves the perceptual persistence of motion-defined groupings. Cerebral Cortex, 13: 716-721.
Ferrera, V. P., Nealy, T. A. and Maunsell, J. H. (1992). Mixed parvocellular and magnocellular geniculate signals in visual area V4. Nature, 358: 756-761.
Giaschi, D., Regan, D., Kothe, A., Hong, X. H. and Sharpe, J. A. (1992). Motion-defined letter detection and recognition in patients with multiple sclerosis. Ann. Neurol., 31: 621-628.
Goodale, M. A. and Milner, A. D. (1992). Separate visual pathways for perception and action. Trends Neurosci., 15: 20-25.
Huk, A. C., Dougherty, R. F. and Heeger, D. J. (2002). Retinotopy and functional subdivision of human areas MT and MST. J. Neurosci., 22: 7195-7205.
Humphrey, G. K., Goodale, M. A., Corbetta, M. and Aglioti, S. (1995). The McCollough effect reveals orientation discrimination in a case of cortical blindness. Curr. Biol., 5: 545-551.
Ito, M., Tamura, H., Fujita, I. and Tanaka, K. (1995). Size and position invariance of neuronal responses in monkey inferotemporal cortex. J. Neurophysiol., 73: 218-226.
Kikuchi, R. and Iwai, E. (1980). The locus of the posterior subdivision of the inferotemporal visual learning area in the monkey. Brain Res., 198: 347-360.
Kim, J. S., Rivest, J., Suzuki, S., Intrilligator, J. and Sharpe, J. A. (2000). The shape distortion effect after cerebral hemispheric lesions. Invest. Ophthalmol. Vis. Sci., 41: 8216.
Livingstone, M. S. and Hubel, D. H. (1987). Psychophysical evidence for separate channels for the perception of form, color, movement, and depth. J. Neurosci., 7: 3416-3468.
Livingstone, M. S. and Hubel, D. H. (1988). Segregation of form, color, movement and depth: Anatomy, physiology and perception. Science, 240: 740-749.
Maunsell, J. H. R., Nealy, T. A. and Depriest, D. D. (1990). Magnocellular and parvocellular contributions to responses in the middle temporal visual area (MT) of the macaque monkey. J. Neurosci., 10: 3323-3334.
Milner, A. D. and Goodale, M. A. (1993). Visual pathways to perception and action. Prog. Brain Res., 95: 317-337.
Morrow, M. J. and Sharpe, J. A. (1990). Cerebral hemispheric localization of smooth pursuit asymmetry. Neurol., 40: 284-292.
Morrow, M. J. and Sharpe, J. A. (1993). Retinotopic and directional deficits of smooth pursuit initiation after posterior cerebral hemispheric lesions. Neurol., 43: 595-603.
Nakamura, H., Kashii, S., Nagamine, T., Matsui, Y., Hashimoto, T., and Honda, Y. (2003). Human V5 demonstrated by magnetoencephalography using random dot kinematograms of different coherence levels. Neurosci. Res., 46: 423-433.
Nealy, T. A. and Maunsell, J. H. R. (1994). Magnocellular and parvocellular contributions to the responses of neurons in macaque striate cortex. J. Neurosci., 14: 2069-2079.
Newsome, W. T. and Pare, E. B. (1988). A selective impairment of motion perception following lesions of the middle temporal visual area (MT). J. Neurosci., 8: 2201-2211.
Newsome, W. T., Wurtz, R. H., Dursteler, M. R. and Mikami, A. (1985). Deficits in visual motion processing following ibotenic acid lesions of the middle temporal visual area of the macaque monkey. J. Neurosci., 5: 825-840.
Orban, G. A., Dupont, P., Vogels, R., De Bruyn, B., Bormans, G. and Mortelmans, L. (1996). Task dependency of visual processing in the human visual system. Behav. Brain Res., 76: 215-223.
Regan, D., Giaschi, D., Sharpe, J. A. and Hong, X. H. (1992). Visual processing of motion-defined form: Selective failure in patients with parietotemporal lesions. J. Neurosci., 12: 2198-2210.
Regan, D., Sharpe, J. A. and Kothe, A. C. (1991). Recognition of motion-defined shapes in patients with multiple sclerosis and optic neuritis. Brain, 114: 1129-1155.
Rivest, J., Kim, J. S., Intrilligator, J. and Sharpe, J. A. (2004). Effects of aging on shape perception distortion. Gerontology, 50: 142-151.
Sary, G., Vogels, R. and Orban, G. A. (1993). Cue-invariant shape selectivity of macaque inferior temporal neurons. Science, 260: 995-997.
Schenk, T. and Zihl, J. (1997). Visual motion perception after brain damage: II. Deficits in form-from-motion perception. Neuropsychologia, 35: 1299-1310.
Schoenfeld, M. A., Woldorff, M., Duzel, E., Scheich, H., Heinze, H. J. and Mangun, G. R. (2003). Form-from-motion: MEG evidence for time course and processing sequence. J. Cog. Neurosci., 15: 157-172.
Seiffert, A. E., Somers, D. C., Dale, A. M. and Tootell, R. B. (2003). Functional MRI studies of human visual motion perception: Texture, luminance, attention and after-effects. Cerebral Cortex, 13: 340-349.
Sunaert, S., Van Hecke, P., Marchal, G. and Orban, G. A. (1999). Motion-responsive regions of the human brain. Exp. Brain Res., 127: 355-370.
Suzuki, S. and Cavanagh, P. (1998). A shape-contrast effect for briefly presented stimuli. J. Exp. Psychol. Hum. Percept. Perform., 24: 1315-1341.
Vaina, L. M., Cowey, A., Eskew, R., Lemay, M. and Kemper, T. (2004). Anatomical correlates of global motion perception: Evidence from unilateral cortical brain damage. Brain, in press.
Vaina, L. M., Soloviev, S., Bienfang, D. C., Cowey, A. and Iwai, E. (2000). A lesion of cortical area V2 selectively impairs the perception of the direction of first-order visual motion. Neuroreport, 11: 1039-1044.
Zeki, S. (1990). A century of cerebral achromatopsia. Brain, 113: 1721-1777.
17. The Effect of Diverse Dopamine Receptors on Spatial Processing in the Central Retina: A Model

Ivan Bodis-Wollner and Areti Tzelepi

Ganglion cells of the mammalian retina receive input from the intraretinal vertical pathways (photoreceptors - bipolar cells - ganglion cells) and lateral pathways (photoreceptors - horizontal cells - bipolar cells - amacrine cells) of retinal interneurons. In the mammal there is no evidence of descending control of the retina from intracranial neurons. Ganglion cell responses can be easily recorded from the optic nerve or tract. On the other hand, the three classes of retinal interneurons (horizontal, amacrine, and bipolar cells) are not easily accessible, and recordings are further limited by the small size of the neurons, especially in higher vertebrates. However, recording the massed activity of retinal ganglion cells may give indications of preganglionic responses. Deriving an "inverse solution" for the activity of the preganglionic retina from retinal ganglion cell output requires assumptions derived from experimental data. In this chapter we take advantage of the results of experimental dopaminergic manipulations of the preganglionic retina and propose a model of dopamine's push-pull action in shaping the receptive field properties of monkey retinal ganglion cells, together with clinical and pathophysiological evidence linking the proposed model to the role of dopamine in the human retina.
17.1 Retinal Circuitry

Together with the photoreceptors and the ganglion cells, the horizontal, bipolar, and amacrine neurons make up the complex network of the mammalian retina. They provide the signals that shape the receptive field of ganglion cells. Ganglion cells have a characteristic antagonistic center-surround receptive field organization (Kuffler, 1953; Rodieck and Stone, 1966; Enroth-Cugell and Robson, 1966) (figure 17.1).

Figure 17.1: The receptive field model based on signal pooling and weighting of the rapidly adapting retinal ganglion cell receptive field, as described by Enroth-Cugell and Robson (1966). The upper diagram illustrates the concentric center and surround regions. The model relies on linear interaction between "center" and "surround" organization. Signals from the center (C) and surround (S) have an antagonistic effect on the ganglion cell, expressed by the opposite sign of the C and S signals, for either an on-center (C+) or an off-center (C-) cell. The lower diagrams show the Gaussian profiles assumed to describe the sensitivity of the center and the surround. The bars drawn below the center and surround weighting functions are 5rc and 5rs, respectively. Reprinted from Enroth-Cugell and Robson (1966) with permission of The Physiological Society.

Based on studies on lower vertebrates, horizontal cells are thought to dominate the surround organization, while their direct influence on the center is thought to be comparatively negligible (Mangel and Dowling, 1985). Horizontal cells form local circuits, summing information from a wide spatial area in the outer plexiform layer through their lateral interconnections. These interconnections are electrotonic in many species and were shown in the turtle to be controlled by D1 receptors. A feedback connection from horizontal cells to the photoreceptors has also been described in lower vertebrates (Baylor et al., 1971; Djamgoz and Kolb, 1993). The bipolar cells, driven by photoreceptor input, receive an inhibitory input from a large area of the retina, which gives rise to their surround organization. Bipolar cells represent the more direct pathway from photoreceptors to ganglion cells, carrying information from the outer to the inner plexiform layer.
Anatomical studies using various techniques revealed at least nine different types of bipolar cells (Boycott and Wassle, 1991; Kolb et al., 1992) in the human retina, eight related to cones and one related to rods. Five of them receive convergent information from cones and are known as diffuse cone bipolars. Three cone bipolar types appear to have single cone contacts in a one-to-one relationship and are known as midget bipolars. The center of their receptive field appears to be directly connected to the cones. Receptive field surrounds at the bipolar cell level have also been described in the nonmammalian retina (Werblin and Dowling, 1969; Matsumoto and Naka, 1972; Schwartz, 1974), originating from
horizontal cell feedback connections (Kamermans and Spekreijse, 1999). Recently, Dacey et al. (2000) demonstrated evidence of surround organization of bipolars in the monkey. Interestingly, they showed that although both midget and diffuse bipolar types are characterized by center-surround organization, small "midget" bipolars are more center dominated, while "large" diffuse bipolars are more surround dominated. Amacrine cells in the inner plexiform layer belong to the lateral interconnecting network in the pathway from photoreceptors to ganglion cells (Mariani, 1990; Kolb et al., 1992). Similar to horizontal cell feedback connections in the outer plexiform layer, amacrine cells support feedback mechanisms in the inner plexiform layer. Consequently, amacrine cells also participate in the formation of the surround receptive field of bipolars. From neuroanatomical studies, at least 25 morphologically different amacrine cell types have been identified in the primate retina. Dopaminergic cells are found in all vertebrates, including primates. In the fish (Lasater and Dowling, 1985) and turtle (Piccolino et al., 1989) retina, dopaminergic cells have been found to significantly influence horizontal cell activity by uncoupling horizontal cell junctions. In the mammalian retina, a similar but weaker effect has been reported (Xin and Bloomfield, 1999). However, in mammals, amacrine cells are dopaminergic, and their effects are expected to be more pronounced in the inner plexiform layer. In the monkey and in some lower species two types of dopaminergic amacrines exist: their morphology and their vulnerability to 1-methyl-4-phenyl-1,2,5,6-tetrahydropyridine (MPTP) distinguish them. They may influence other amacrines in a lateral interconnecting circuit in the inner plexiform layer, or connect other neurons, in a feedback manner, via paracrine release of dopamine (DA).
Additionally, they may interact with each other electrotonically, as seen with the electrotonic cell junctions of horizontal cells in the outer plexiform layer. Some amacrines may uncouple interconnected amacrine cells, as has been observed with the AII cell type (Vaney, 1990, 1994). In the mammalian retina, various DA receptors have been identified at several levels, falling into two major classes: D1 and D2 receptors (McGonigle et al., 1988; Denis et al., 1990; Schorderet and Nowak, 1990; Strormann et al., 1990). Ganglion cells are larger than the preceding interneurons, and through their axons, which form the optic nerve, visual information is passed to higher brain centers. A great deal of our knowledge of ganglion cells comes from studies in the cat, probably the most extensively studied mammalian retina. There are different types of ganglion cells in the cat retina. A widely accepted morphological classification scheme, proposed by Boycott and Wassle (1974), divides the most common ganglion cells into four cell types: alpha, beta, gamma, and delta. This morphological classification has been related to the physiological classification into X, Y, and W cells (Boycott and Wassle, 1974; Enroth-Cugell and Robson, 1966; Cleland and Levick, 1974). The correspondence between morphology and physiology in independent pathways hints at a parallel vertical organization in the retina. Enroth-Cugell and Robson (1966) described the existence of two ganglion cell types, X and Y, with different spatiotemporal characteristics. Their "null" test, the introduction and withdrawal of a sinusoidal grating at ±90° relative to the receptive field center, left no doubt about the X ganglion cells' linear response versus the Y ganglion cells' nonlinear response. Subsequent studies further supported a correspondence between alpha and Y cells, and beta and X cells. W cells revealed various morphological
types including gamma and delta cells (Fukuda et al., 1984, 1985). Another classification scheme was proposed by Kolb et al. (1981), who described morphological types of ganglion cells including and going beyond the gamma and delta cells, named G4 to G23 (from the smallest type, G4, to the largest, G23). In the human retina, neuroanatomical, but not neurophysiological, studies have revealed at least 16 morphological types of ganglion cells. Almost all of them correspond to ganglion cells in the cat retina. The most common ganglion cell types in the human are the parasol and the midget ganglion cells (Rodieck et al., 1985; Kolb et al., 1992; Dacey, 1993), which correspond to the cat alpha and beta cells, respectively. They are also known as P cells (midget ganglion cells) because they project to the parvocellular layers of the LGN, and M cells (parasol ganglion cells) because they project to the magnocellular layers of the LGN (Shapley and Perry, 1986; Kaplan and Shapley, 1986). The P and M systems mediate different signal properties. M cells have larger receptive fields, respond to large stimuli, and are concerned more with gross features and movement. P cells have smaller receptive fields; they respond to smaller stimuli and are thought to process fine detail and color. The most common ganglion cell types in the cat and the primate retina and their correspondence are summarized in table 17.1. The spatial arrangement of the two ganglion cell types is not random. In the cat, small beta cells show high concentrations in the central part of the retina, converging information from only a few cones. In humans, this concentration of P or midget cells in the fovea can reach a one-to-one relationship with cones. Large alpha cells dominate mostly in the periphery, but with a much lower convergence ratio.

Cat                          Primate
Morphology    Physiology    Morphology    Physiology
alpha         Y             Parasol       M
beta          X             Midget        P
gamma, delta  W

Table 17.1: Morphological and physiological correlates of the most common ganglion cell types in the cat and primate retina.
17.2 Receptive Fields of Ganglion Cells
The output of ganglion cells is based on the antagonism between the center and the surround of their circular receptive field. Kuffler (1953), using small spots of light, showed that these excitatory and inhibitory areas are organized concentrically and are best described as a circular center surrounded by a ring of opposite polarity. This mechanism has been studied extensively using small spots of light as well as sinusoidal grating patterns. Enroth-Cugell and Robson (1966) and Rodieck (1965) proposed a model of the experimental results based on the difference of two Gaussian functions (figure 17.1). The center and the surround of the receptive field are each described by a Gaussian, defined by its radius r and strength w. Both Gaussians peak at the receptive field center, and their interaction is subtractive. As an extension of Kuffler's concept, it was shown that "center" and "surround" are coextensive. This model cannot account for ganglion cell responses in the central retina without considering the nonlinearities expressed in the cells called "Y" in the cat. The "Y" cell has the basic structure of the X cell of the cat retina with the addition of nonlinear subunits, which may be most important for temporal processing (Victor, 1988). The linear model, however, does give a satisfactory explanation for the most characteristic property of retinal ganglion cells: spatial tuning. Different ganglion cells respond optimally to different stimulus sizes. It is thought that center summation, with little influence of the subtractive surround, mediates the high-spatial-frequency leg of the contrast sensitivity curve (see below), while with decreasing spatial frequencies large ganglion cells with significant surrounds crucially determine center-surround interaction and hence the final output of the ganglion cell. Spatial tuning of individual ganglion cells is reflected in the contrast sensitivity function, which has an inverted "U" shape. The contrast sensitivity function expresses the minimum contrast required to detect a stimulus as a function of its size under equiluminant conditions. The overall contrast sensitivity curve, obtained with psychophysical methods in either human or monkey, initially increases with spatial frequency, reaches a peak around 5 cpd, and then falls off sharply with increasing spatial frequency. The difference-of-Gaussians model can successfully describe the contrast sensitivity function. The striking similarity of the contrast sensitivity curve to the X ganglion cell response raises the question of whether or not contrast sensitivity simply reflects the profile of a single group of cells.
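The bandpass character of the difference-of-Gaussians description can be made concrete with a short numerical sketch. In the following Python fragment, the parameter values (center and surround radii and strengths) are arbitrary illustrative choices, not values fitted to any recorded cell; the fragment evaluates the spatial-frequency response of a linear center-surround unit as a narrow, strong center Gaussian minus a broad, weaker surround Gaussian.

```python
import math

def dog_response(nu, w_c=1.0, r_c=0.1, w_s=0.8, r_s=0.5):
    """Spatial-frequency response of a difference-of-Gaussians
    receptive field: a narrow center Gaussian (radius r_c deg,
    strength w_c) minus a broad surround Gaussian (radius r_s deg,
    strength w_s). nu is spatial frequency in cycles per degree."""
    center = w_c * math.exp(-(math.pi * r_c * nu) ** 2)
    surround = w_s * math.exp(-(math.pi * r_s * nu) ** 2)
    return center - surround

# Sample the transfer function: it rises, peaks, then falls --
# an inverted-"U" (bandpass) shape like the contrast sensitivity curve.
freqs = [0.25, 0.5, 1, 2, 4, 8]
resp = {f: dog_response(f) for f in freqs}
peak = max(resp, key=resp.get)
```

With these illustrative parameters the response peaks near 1 cpd; at low spatial frequencies the broad surround cancels much of the center signal, and at high spatial frequencies center summation itself falls off, reproducing the two limbs described in the text.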
Abnormal contrast sensitivity functions (Bodis-Wollner, 1972; Bodis-Wollner and Camisa, 1980) in clinical conditions affecting practically any part of the visual pathway have revealed losses only partially affecting the contrast sensitivity curve, suggesting selective damage of different pathways. It is unknown, however, whether selective spatial frequency losses, such as those shown in multiple sclerosis (Regan et al., 1991), may reflect optic nerve deficits, i.e., affected retinal ganglion cell properties.
17.3 Retinal Processing and Dopamine's Action
Retinal ganglion cell processes in vivo in humans can be studied by recording the pattern electroretinogram (PERG). This electrophysiological measurement predominantly represents the massed activity of diverse ganglion cells of the central retina (Maffei and Fiorentini, 1986) and only indirectly the preganglionic processing in the retina. In monkeys, optic nerve section results in the loss of the PERG, while diffuse light still elicits an ERG response (Maffei et al., 1985). PERG studies using sinusoidal grating stimuli have clearly shown that the PERG depends on the spatial frequency of the eliciting stimulus, and furthermore that it behaves as a spatial transfer (tuning) function, similar to the contrast sensitivity curve in monkey and human. The peak of the spatial transfer function and its descending limbs are differentially vulnerable to neuronal pathology (Marx et al., 1988; Ghilardi et al., 1988; Bodis-Wollner, 1996; Tagliati et al., 1996). The contrast sensitivity decrease at low spatial frequencies can be attributed to a decreasing contribution of low-spatial-frequency neurons in the central retina, because of either their scarcity or the dominating surround mechanisms, or both. The peak of the curve may represent the optimal interaction between center and surround, while the exponential contrast sensitivity decrease at higher
spatial frequencies can be related to decreasing summation in the center mechanism as stimuli become smaller. Dopamine has a differential effect on the two limbs of the PERG spatial transfer function. The two major classes of dopamine receptors, D1 and D2, although acting antagonistically at the cellular level (Piccolino et al., 1987), act synergistically at the overall-response level to establish the tuned spatial transfer function in the retina. More specifically, we have suggested that dopamine modulates spatial contrast by a "push-pull" action on the center and surround mechanisms of different ganglion cells in the primate retina (Bodis-Wollner and Tzelepi, 1998). In the following pages we first summarize previous experimental results using different dopamine (DA) manipulations. Based on these experimental data, we then propose a model for the balancing action of dopamine in shaping spatial processing in the central retina of monkeys and humans.
17.4 Dopaminergic Effects on the PERG in the Monkey
17.4.1 Retinal Spatial Tuning in the MPTP Primate Model

One method for dopamine depletion in the macaque retina is MPTP treatment. It has been shown that MPTP causes a Parkinsonian syndrome in monkeys and humans via its oxidation product, MPP+ (Nicklas et al., 1987). Histological and neurochemical findings suggest that MPTP decreases the release of dopamine, and that it is selectively toxic to the pigmented dopaminergic neurons of the pars compacta of the substantia nigra in monkeys (Burns et al., 1983; Ghilardi et al., 1988), similar to the effects observed in humans (Davis et al., 1979). The monkey model is behaviorally and pharmacologically nearly identical to human Parkinson's disease (Burns et al., 1983). The destructive effect of MPTP on DA retinal neurons was first shown in the rabbit (Wong et al., 1985). With systemic MPTP treatment (5 mg/kg), monkeys develop a bilateral Parkinsonian syndrome. Simultaneous recordings of VEPs and PERGs reveal spatial-frequency-dependent losses in monkeys treated with MPTP (Ghilardi et al., 1988). Administration of L-Dopa with carbidopa significantly improved the VEP and PERG (figure 17.2). The tuning ratio, defined as the amplitude of the peak response (around 3-4 cpd) divided by the response amplitude at a low spatial frequency (0.5 cpd), provides a convenient estimate of retinal spatial tuning. After MPTP treatment, the tuning ratio decreased on average by a factor of 2. PERG responses to 2.5 and 3.5 cpd stimuli became abnormal, while responses at 0.5 and 1.2 cpd were less impaired. The average tuning ratio of five monkeys was 0.88 before (baseline condition) and 0.38 after MPTP treatment; i.e., systemically administered MPTP changed spatial tuning on average by a factor of 2.3.
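The tuning-ratio bookkeeping is simple enough to spell out. A minimal sketch, in which the helper function merely restates the definition above and the two ratio values are the chapter's quoted group averages, not raw data:

```python
def tuning_ratio(peak_amplitude, low_sf_amplitude):
    """PERG tuning ratio: response amplitude at the peak spatial
    frequency (around 3-4 cpd) divided by the amplitude at a low
    spatial frequency (0.5 cpd)."""
    return peak_amplitude / low_sf_amplitude

# Group-average tuning ratios reported for the five MPTP-treated monkeys:
baseline_ratio = 0.88   # before treatment
mptp_ratio = 0.38       # after MPTP
change = baseline_ratio / mptp_ratio  # roughly a 2.3-fold loss of tuning
```

Dividing the baseline ratio by the post-treatment ratio recovers the factor-of-2.3 change in spatial tuning quoted in the text.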
17.4.2 The Effects of Intravitreal 6-OHDA on Spatial Tuning Another method for dopamine depletion in the macaque retina is based on injection of the neurotoxin 6-hydroxydopamine (6-OHDA) into the vitreous of the eye (Ghilardi et al., 1989). This drug destroys dopaminergic neurons in the nigrostriatal system and decreases dopaminergic neurotransmission (Ungerstedt and Arbuthnott, 1970). The
retinal effect of 6-OHDA was demonstrated in the carp (Cohen and Dowling, 1983), in the turtle (Witkovsky et al., 1987), and in the rabbit (Oliver et al., 1986). Uniocular 6-OHDA administered intravitreally in the monkey also causes spatial-frequency-selective deficits. Stimuli of the same spatial frequencies as in the MPTP study were used to allow comparison of results. The same effect was observed: PERG and VEP responses to 2.5 and 3.5 cpd stimuli were significantly attenuated, while responses at lower spatial frequencies (0.5 cpd and 1.2 cpd) were less impaired after repeated treatments. The final results show a profound effect of 6-OHDA on the responses of three monkeys to 3.5 cpd patterns. The spatial tuning ratio became less than one in all three monkeys; it changed by a factor of 2, 5, and 3, respectively. Thus the effects of systemically administered MPTP on the PERG and the VEP are similar to those of intravitreal injections of 6-OHDA. However, repeated administration of 6-OHDA has a more profound effect on spatial tuning than MPTP does.

Figure 17.2: Spatial-frequency-dependent effect of dopamine on the transient PERG of the macaque retina. In each row, the top trace represents the PERG in the dopamine-deficient monkey model of Parkinson's disease, induced by MPTP; the bottom trace represents the transient PERG after administration of L-Dopa with carbidopa. A: Responses 20 days following the initial MPTP treatment. B: Responses 40 days following the initial MPTP treatment. Notice the transient and spatial-frequency-dependent effect of levodopa: it increased PERG amplitude (and slightly increased latency) at "peak" spatial frequencies without much effect at low spatial frequencies. Reprinted from Ghilardi, M. G., Bodis-Wollner, I., Onofrj, M. C., Marx, M. S. and Glover, A. A. (1988) "Spatial frequency-dependent abnormalities of the pattern electroretinogram and visual evoked potentials in a Parkinsonian monkey model." Brain, 111: 131-184. Copyright Oxford University Press.
Figure 17.3: Effect of L-sulpiride on the PERG spatial transfer function for one monkey. At a lower dose, L-sulpiride attenuates peak spatial frequency responses. At a higher dose, lower spatial frequencies are also affected. The plot represents the average of three runs for each spatial frequency. Reprinted from Tagliati, M., Bodis-Wollner, I., Kovanecz, I., Stanzione, P. (1994) "Spatial frequency tuning of the monkey pattern ERG depends on D2 receptor-linked action of dopamine." Vision Res., 34: 2051-2057. Copyright 1994 with permission from Elsevier.
17.4.3 The Effect of the D2 Receptor Blocker L-Sulpiride on Spatial Frequency Tuning

Besides dopaminergic neuronal toxins, three different types of dopamine receptor ligands were used in the monkey. L-sulpiride, a D2 antagonist (Tagliati et al., 1994), was systemically administered to three monkeys. The concentration of L-sulpiride was varied to evaluate whether the drug's effect was dose dependent. For this reason, we explored the effects of 0.07 and 0.35 mg/kg in separate experiments. The PERG was recorded before and after the administration of L-sulpiride. The higher dose suppressed responses at all spatial frequencies. The lower dose attenuated responses only at the peak of the spatial frequency curve, resulting in a decrease of the tuning ratio (by a factor of 2.5, 1.7, and 3.2) in the three monkeys (figure 17.3).
17.4.4 The Effect of CY 208-243, a D1 Agonist, on Spatial Tuning
The PERG responses to a range of spatial frequencies between 0.5 and 6.9 cpd were recorded before and after the administration of a D1 receptor agonist, CY 208-243. Following drug administration, the responses to the low-spatial-frequency stimuli were
significantly decreased. As can be seen in figure 17.4, the amplitude of the responses at 0.5 cpd after drug administration was at the noise level, while the responses to 2.3 cpd remained almost intact. The suppression of low-spatial-frequency responses by a D1 agonist is opposite to the effect of the low-dose D2 antagonist L-sulpiride, which attenuated middle-spatial-frequency responses.

Figure 17.4: The effect of a D1 agonist, CY 208-243, on the PERG responses of two monkeys, Marcy and Charlie. Notice that this D1 agonist selectively suppressed low spatial frequencies. The results suggest that D1 and D2 receptors may not be evenly distributed as far as the circuitry of center and surround spatial pooling is concerned. Reprinted from Peppe, A., Antal, A., Tagliati, M., Stanzione, P., Bodis-Wollner, I. (1998) "D1 agonist CY208-243 attenuates the pattern electroretinogram to low spatial frequency stimuli in the monkey." Neurosci. Lett., 243: 5-8. Copyright 1998 with permission from Elsevier.
17.4.5 Synthesis of Experimental Results From the experiments, we observed that different manipulations of DA mechanisms revealed three different kinds of changes in the PERG spatial tuning function (table 17.2). First, we observed similar results from the PERG experiments with 6-OHDA and MPTP: dopamine depletion resulted in decreased amplitude at peak spatial frequencies. A slight increase at low spatial frequencies was also observed; however, it was not significant. The loss of the bandpass nature of the spatial transfer function was not identical in the two experiments; 6-OHDA yielded a more profound loss than MPTP did. The same effect was observed with the D2 blocker L-sulpiride at low dose. This similarity suggested that a deficiency related to the D2 receptors attenuated medium-spatial-frequency responses, thereby rendering the spatial transfer function effectively a low-pass filter. It also showed that the degree of
Dopamine and Retinal Processing
loss varied, although the qualitative effect was the same. However, the similarity of the effects of presynaptic neuronal damage to those of low-dose L-sulpiride suggests that D2 receptors are probably located postsynaptically in the preganglionic retina.

The opposite effect was observed with CY208-243, a D1 agonist: the PERG spatial transfer function showed a suppression of low spatial frequencies. It follows that a D1 deficiency would result in an increased response amplitude to lower-spatial-frequency stimuli. Hence dopamine, an agonist for both D1 and D2 receptors, would be expected to attenuate middle- and high-frequency responses while leaving responses to low-spatial-frequency stimuli little affected or even raised. In the monkey and in human Parkinson's disease patients we did observe a slight increase in sensitivity at low spatial frequencies.

Finally, we observed a global loss at all spatial frequencies with a high dose of the D2 antagonist L-sulpiride. The fact that a D2 antagonist in high concentration has an effect similar to that of a D1 agonist, also reducing low-spatial-frequency responses, argues for an interaction of the D1 and D2 pathways. This argument is also supported by the small, albeit not statistically significant, increase at low spatial frequencies observed with low-dose L-sulpiride. If low-affinity presynaptic D2 receptors lie on the D1 pathway, dopamine and D2 agonists in high concentrations could provide inhibitory presynaptic input to the D1 pathway. This would result in increased postsynaptic D1 action, leading to a stronger surround contribution. Consistent with this interpretation, blocking D2 input results in an enhanced D1 response.

Table 17.2: Summary of experimental results following different DA manipulations (↓ = decrease; n.s. = not significant).

Treatment                                Low spat. freq.   Middle-high spat. freq.
MPTP                                     n.s.              ↓
6-OHDA                                   n.s.              ↓
D2 antagonist L-sulpiride, low dose      n.s.              ↓
D2 antagonist L-sulpiride, high dose     ↓                 ↓
D1 agonist CY208-243                     ↓                 n.s.
17.5 The Model

17.5.1 The Normal Retina
The PERG spatial tuning function can be represented by a model based on the difference of two Gaussian functions. Enroth-Cugell and Robson (1966) suggested that signal summation over a retinal ganglion cell can be modeled with the difference of two Gaussians representing the antagonistic relation between the center (C) and surround (S) organization of the ganglion cell receptive field. We realize, however,
Figure 17.5: Right: a model of the PERG spatial transfer function based on the difference of two Gaussian functions representing the signals for the center C(R) and surround S(R) of the 'equivalent' ganglion cell (Left).

that Enroth-Cugell and Robson based their model on the sensitivity profile of a cell, while our experimental data represent a response profile elicited by higher-contrast stimuli. We assume that the PERG spatial tuning function is an envelope of all ganglion cells in the central retina we stimulated. Given that the PERG represents ganglion cell responses (Maffei et al., 1985), this is a reasonable assumption. As a consequence of this assumption, we postulate that the spatial contrast response function of the PERG represents the average response of all participating retinal ganglion cells. While our model addresses the role of dopamine and diverse dopamine receptors, we do not exclude interactions of dopamine with other neurotransmitter systems of the retina. According to our assumption of an equivalent ganglion cell of the retina, the PERG spatial transfer function is modeled as shown in figure 17.5A. The center and the surround of the "equivalent" ganglion cell are represented by two Gaussian functions (figure 17.5B). Each Gaussian is characterized by two parameters: its radius (Rc/Rs for the center and the surround, respectively) and its gain (Gc/Gs for the center and the surround, respectively):
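A standard way to write these signals, following the difference-of-Gaussians formulation of Enroth-Cugell and Robson (1966), is sketched below; the frequency-domain form and the symbol ν for spatial frequency are a reconstruction from the surrounding text, not necessarily the chapter's exact notation:

```latex
C(\nu) = G_c\, e^{-(\pi R_c \nu)^2}, \qquad
S(\nu) = G_s\, e^{-(\pi R_s \nu)^2}, \qquad
R(\nu) = C(\nu) - S(\nu)
```

With Rs > Rc and Gc > Gs, the surround term falls off faster with frequency, so the difference R(ν) is bandpass: attenuated at low ν by the surround and at high ν by the center's own Gaussian falloff.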
17.5.2 The Dopamine-Deficient Retina We assume that the spatial summation of the signals C(R) and S(R) is altered differently in DA deficiency. How, and by how much, these signals change is determined by two factors. The first is related to the direct effect of the D1 and D2
pathways on the gain (Gc or Gs) and/or radius (Rc or Rs) of the C and S signals in the deficient retina. The second is related to the possible indirect effects of D1 and D2 on the gain and/or radius of these signals, i.e., whether, and to what degree, D1 and D2 interact with each other. In the model, D1 and D2 have a direct effect on the gain of S and C, respectively. In other words, a D1 agonist is expected to increase the gain of S, and a D2 antagonist is expected to decrease the gain of C. At the same time, the radii of the C and S signals can also change. This is consistent with the known action of D1 receptors on horizontal cell coupling, although less is known about direct receptor coupling and receptive field center size change. We found that the best fit to the experimental data results when the signal's (C or S) radius changes in the opposite direction from the signal's gain. Thus, when the gain of the surround increases, its radius is expected to decrease, and likewise, when the gain of the center decreases, its radius is expected to increase. Again, this is a reasonable assumption as far as the surround size change is concerned: we expect higher gain when surround signals are concentrated in the local surround area of a ganglion cell. It is unknown, however, why center size and center gain behave oppositely. To account for this, one could possibly assume receptor-to-receptor reciprocal inhibition. Finally, we assumed that the D1 and D2 pathways are not independent of each other but interact. This interaction is based on reciprocal inhibition. As a consequence, blocking the receptors of one of these pathways is expected to lead to an enhanced response of the other, and vice versa. L-sulpiride mainly blocks the D2 pathway. Then, according to the model, the gain of the center decreases and its radius increases.
At the same time, because of the lateral inhibition, blocking the D2 receptors leads to an enhanced D1 response, with a higher surround gain and a smaller surround radius. Figure 17.6A shows the response profile of the "equivalent" ganglion cell after implementing these changes in the model. The result is similar to the experimental data. At the low dose, the modeled response does not show a bandpass profile but becomes a low-pass filter, suppressing middle to high spatial frequencies and slightly increasing low spatial frequencies. At the high dose, the modeled response decreased at all spatial frequencies. The tuning of the function at middle to high spatial frequencies decreased by a factor of 2-3, similarly to the experimental data. CY, as a D1 agonist, enhances the D1 pathway. Then, according to the model, the gain of the surround increases and its radius decreases. At the same time, because of the lateral inhibition, the response of the D2 pathway is expected to decrease, giving a lower center gain and a larger center radius. Figure 17.6B shows the effect of CY on the response profile of the model. The result is similar to the experimental data: the modeled response is significantly decreased at low spatial frequencies, while responses to middle and high spatial frequencies are preserved. A summary of the changes introduced in the model parameters under the different DA manipulations is shown in table 17.3, together with the percentage increase or decrease for each parameter. A closer examination of these values shows that the efficiency of D1 and D2 in altering the gain of the surround and the center, respectively, is not the same. The effect of D2 on the center's gain is highly dependent on the administered dose: a low dose introduced only an 8% decrease of the center's gain, while a high dose introduced a 58% decrease, a significant decrease. In
contrast, D1 was very effective in altering the surround's gain: with the D1 agonist CY, the surround's gain showed an increase of 133%. Furthermore, the D1 pathway was also very effective in increasing the surround's gain when it was enhanced indirectly, through the D2 receptor blocker L-sulpiride. However, in this case, and in contrast to the effect of D2 on the center's gain, the surround's gain did not appear to depend strongly on the administered dose.

Table 17.3: Percentage increase or decrease introduced in the model parameters following different DA manipulations (↓ = decrease; ↑ = increase).

                                     center            surround
Treatment                         Rc       Gc        Rs       Gs
D2 antagonist, low dose          ↑44%     ↓8%        0%      ↑47%
D2 antagonist, high dose         ↑88%     ↓58%      ↓2%      ↑53%
D1 agonist                       ↑25%     ↓8%       ↓29%     ↑133%
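The qualitative behavior described above can be sketched numerically. This is an illustrative sketch only: the `dog_response` helper and every parameter value below are our own assumptions, not the fitted model values behind table 17.3.

```python
import numpy as np

def dog_response(freqs, gc=1.0, rc=1.0, gs=0.9, rs=3.0):
    """Response of the 'equivalent' ganglion cell modeled as a difference
    of two Gaussians over spatial frequency: center minus surround.
    gc/gs are the center/surround gains, rc/rs the radii; a larger radius
    makes the corresponding Gaussian fall off faster in frequency."""
    center = gc * np.exp(-(np.pi * rc * freqs) ** 2)
    surround = gs * np.exp(-(np.pi * rs * freqs) ** 2)
    return center - surround

freqs = np.linspace(0.05, 1.0, 100)   # spatial frequency, arbitrary units
normal = dog_response(freqs)          # bandpass: peak exceeds the lowest-frequency value

# D2-antagonist direction: center gain down, center radius up -> peak response drops
weak_center = dog_response(freqs, gc=0.5, rc=1.6)

# D1-agonist direction: surround gain up, surround radius down -> low frequencies suppressed
strong_surround = dog_response(freqs, gs=1.35, rs=2.1)
```

Plotting the three signed curves reproduces the qualitative pattern of figure 17.6: a bandpass normal profile, a reduced peak when the center is weakened, and a selective low-frequency loss when the surround is strengthened (the measured PERG amplitude would correspond to the magnitude of this signed response).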
17.6 Dopamine's Role in Retinal Mechanisms
It is well known that there are two major classes of dopamine receptors in all species, D1 and D2. Their actions combine to form a paradoxical collaboration: they act antagonistically at the cellular level, but it is this antagonism that regulates their action, resulting in a behavioral synergy. Pharmacological studies in the turtle retina (Piccolino et al., 1987) have clearly shown that at the horizontal cell level, D1 and D2 produce opposite effects. We found that the suppression at low spatial frequencies caused by a D1 agonist is consistent with a "stronger" surround with larger radius and higher gain. Conversely, D1 deficiency would weaken the surround response and lead to an enhanced response of large neurons for which the surround is not negligible. This would lead to enhanced low-spatial-frequency responses. It has been shown that dopamine modulates horizontal cell coupling via D1 receptors (Piccolino et al., 1985, 1987), and dopamine in this pathway reduces coupling among horizontal cells, forming smaller summation units (Gerschenfeld et al., 1982; Teranishi et al., 1983, 1984; Lasater and Dowling, 1985; Hankins and Ikeda, 1991). One possible effect of smaller horizontal cell units is a dissipation of signal spread; i.e., the surround of each ganglion cell becomes stronger. In this way, D1 action leads to a stronger surround in neurons with large surrounds. This interpretation is consistent with the results of D1 and D2 manipulation of the horizontal cells in the turtle. Piccolino et al. (1985) have shown in the turtle retina that under photopic conditions with increasing levels of DA, D2 agonists reverse horizontal cell narrowing, thereby resulting in a wider and larger surround response. The results of their experiments are consistent with a presynaptic D2 pathway reducing D1 responses. We postulate that a different dopamine-dependent mechanism produces the effect
Figure 17.6: The effect of different DA manipulations on the model. The thin black line represents the response profile of the model for the normal retina. A: The effect of L-sulpiride on the response profile of the model. The dashed line illustrates the effect of the low dose of L-sulpiride, and the solid black line the effect of the high dose. B: The effect of CY on the response profile of the model (solid black line).
observed with MPTP and 6-OHDA. It is likely that they both predominantly destroy the type of DA neurons that affect D2 receptors. Damage to these neurons and/or D2 receptor blockade at low concentrations causes attenuated peak-spatial-frequency responses. In this case, we found that introducing a "weaker" center into the model resulted in a loss at middle and high spatial frequencies. The higher the gain and the smaller its summating area, the more efficiently and selectively a center mechanism can perform. A lower gain and a larger radius of the center's response in the model resulted in a decrease at middle and high spatial frequencies. It is thought that dopamine uncouples AII amacrine cells in the mammalian retina (Hampson et al., 1992; Nguyen-Legros et al., 1999). It is interesting that in a model simulation of AII amacrine cells, increased coupling in the AII network resulted in expanded receptive field centers of amacrine and ganglion cells (Vardi and Smith, 1996). Furthermore, we suggest that D2 autoreceptors are involved in the "surround" D1 dopamine pathway in the primate; blocking them allows a greater DA effect on D1 receptors and enhances "surround" signals, leading to attenuated low-spatial-frequency responses, as is observed with high-dose L-sulpiride. This is also consistent with the small increase at low spatial frequencies with low-dose L-sulpiride. Through this negative feedback circuit, D1 and D2 can regulate the cell's response. The combined effect of D1 and D2 action is illustrated in figure 17.7. Based on our results and consistent with the model, the spatial contrast curve cannot represent a single type of ganglion cell in the central retina. Rather, the results compel us to assume that D1 action reduces the response at low spatial frequencies alone, because D1 has little effect on the center. D1 increases the surround strength, which contributes significantly to ganglion cell responses when low-spatial-frequency stimuli are used.
This means that D1 is important for ganglion cells with large surrounds. Conversely, the use of a D2 antagonist reduced the peak-spatial-frequency response, but not the low-spatial-frequency response. It is known that neurons which respond to high spatial frequencies are center dominated and give little weight to the surround. This can also be seen in the spatial contrast transfer function: an exponential decay is observed at high spatial frequencies, which is consistent with pure spatial summation without much inhibition from the surround's response. Thus, D2 is important for neurons with smaller but dominant centers, compared with neurons which respond to low spatial frequencies. Hence, the results of the various dopamine manipulations on the massed response of ganglion cells in the primate central fovea suggest that two types of ganglion cells determine contrast sensitivity: one, with dominant surrounds, for which D1 is crucial and which mediates the response to low spatial frequencies; the other, with dominant centers (corresponding to D2), mediates the response to middle and high spatial frequencies. The notion of two types of ganglion cells gains further support from the two types of bipolar cells described by Dacey et al. (2000). They described center-surround organization in two types of bipolar cells: the smaller ones with stronger centers, and the larger ones with relatively stronger surrounds. Thus, it is possible that each type of ganglion cell receives input which is already segregated to some degree at the bipolars. Dopaminergic systems vary among species. The exact function of DA in humans is not clear. Most of our knowledge comes from nonprimate studies, mostly from the
Figure 17.7: Left: The network of the mammalian retina consists of five different classes of neurons arranged in different layers: photoreceptors (rods R and cones C), horizontal cells (H), bipolar cells of different classes (invaginating midget bipolar cells (IMB), flat midget bipolars (FMB), invaginating diffuse bipolars (DB), and rod bipolars (RB)), amacrine cells (A), and ganglion cells (falling into two main classes, midget ganglion cells (MG) and parasol ganglion cells (P)). Right: Simplified schema of the D1-D2 interaction in the retina. The D1 pathway enhances the surround signal, while D2 enhances the center signal. Experimental results suggest that these pathways are not independent of each other: D2 is involved in the D1 pathway, participating in a negative feedback loop that provides a greater D1 effect when D2 receptors are blocked. A presynaptic reciprocal lateral interaction (not sketched here) between the D1 and D2 pathways is not excluded by our model.
cat (Maguire and Smith, 1985; Maguire and Hamasaki, 1994) and the rabbit (Jensen and Daw, 1986). However, the functional differences between nonprimate and adult primate retinas do not allow a direct comparison. Yet, from PERG experiments in the monkey, we suggest that dopamine has a push-pull effect in the primate retina, similar to that in the nonprimate retina: it strengthens the response of neurons with small centers and strengthens the surround response of neurons with large surrounds (figure 17.8). D1 and D2 act antagonistically at the cellular level, while at the level of the overall retina they synergistically modulate the properties of ganglion cell receptive fields. The net result of DA's push-pull action is the tuned spatial transfer function; without it, the function is low-pass.
Figure 17.8: The antagonistic effects of D1 and D2 receptor activation acting on two different arms of the seesaw. As a consequence, the doubly opposite effects produce an overall synergistic action. The space underneath the curve represents the overall spatial frequency transfer function of the retina: the low-frequency decline occurs where D1 receptors are active. The peak of the curve is created by the seesaw pointing to the right. Reprinted from Bodis-Wollner et al. (1993). "Visual and visual perceptual disorders in neurodegenerative diseases." Bailliere's Clinical Neurology, 2: 461-490. Copyright 1993 with permission from Elsevier.
References

Barlow, H. B. and Hill, R. M. (1963). Selective sensitivity to direction of movement in ganglion cells of the rabbit retina. Science, 139: 412-414.
Baylor, D. A., Fuortes, M. G. F. and O'Bryan, P. M. (1971). Receptive fields of the cones in the retina of the turtle. J. Physiol. (Lond.), 214: 265-294.
Bodis-Wollner, I. (1972). Visual acuity and contrast sensitivity in patients with cerebral lesions. Science, 178: 769-771.
Bodis-Wollner, I. (1990). Visual deficits related to dopamine deficiency in experimental animals and Parkinson's disease patients. Trends Neurosci., 13: 296-302.
Bodis-Wollner, I. (1996). Electrophysiological assessment of retinal dopaminergic deficiency. Funct. Neurosci., 46: 35-41 (suppl.).
Bodis-Wollner, I. and Camisa, J. M. (1980). Contrast sensitivity in clinical diagnosis. In S. Lessell and J. T. W. Van Dalen (Eds.), Neuro-Ophthalmology, pp. 373-401. Elsevier Science: Amsterdam.
Bodis-Wollner, I., Marx, M. and Ghilardi, M. F. (1989). Systematic haloperidol administration increases the amplitude of the light and dark adapted flash ERG in the monkey. Clin. Vis. Sci., 4: 19-26.
Bodis-Wollner, I., Tagliati, M., Peppe, A. and Antal, A. (1993). Visual and visual perceptual disorders in neurodegenerative diseases. Bailliere's Clin. Neurol., 2: 461-490.
Bodis-Wollner, I. and Tzelepi, A. (1998). The push-pull action of dopamine on spatial tuning of the monkey retina: The effects of dopaminergic deficiency and selective D1 and D2 receptor ligands on the pattern electroretinogram. Vis. Res., 38: 1479-1487.
Boycott, B. B. and Wassle, H. (1974). The morphological types of ganglion cells of the domestic cat's retina. J. Physiol. (Lond.), 240: 397-419.
Boycott, B. B. and Wassle, H. (1991). Morphological classification of bipolar cells of the primate retina. Eur. J. Neurosci., 3: 1069-1088.
Burns, R. S., Chiueh, C. C., Markey, S., Ebert, M. H., Jacobowitz, D. M. and Kopin, I. J. (1983). A primate model of Parkinson's disease: Selective destruction of substantia nigra pars compacta dopaminergic neurons by N-methyl-4-phenyl-1,2,3,6-tetrahydropyridine. Proc. Nat. Acad. Sci. USA, 80: 4546-4550.
Cleland, B. G. and Levick, W. R. (1974). Properties of rarely encountered types of ganglion cells in the cat's retina. J. Physiol. (Lond.), 240: 457-492.
Cohen, J. L. and Dowling, J. E. (1983). The role of the retinal interplexiform cell: Effects of 6-hydroxydopamine on the spatial properties of carp horizontal cells. Brain Res., 264: 307-310.
Dacey, D., Packer, O., Diller, L., Brainard, D., Peterson, B. and Lee, B. (2000). Center surround receptive field structure of cone bipolar cells in the primate retina. Vis. Res., 40: 1801-1811.
Dacey, D. M. (1993). The mosaic of midget ganglion cells in the human retina. J. Neurosci., 13: 5334-5355.
Davis, G. C., Williams, A. C., Markey, S. P., Ebert, M. H., Caine, E. D., Reichert, C. M. and Kopin, I. J. (1979). Chronic Parkinsonism secondary to intravenous injections of meperidine analogues. Psychiatry Res., 1: 249-254.
De Monasterio, F. M. and Gouras, P. (1975).
Functional properties of ganglion cells of the rhesus monkey retina. J. Physiol. (Lond.), 251: 167-195.
Djamgoz, M. B. A. and Kolb, H. (1993). Ultrastructural and functional connectivity of intracellularly stained neurons in the vertebrate retina: Correlative analyses. Microscopy Res. Techn., 24: 43-66.
Enroth-Cugell, C. and Robson, J. G. (1966). The contrast sensitivity of retinal ganglion cells of the cat. J. Physiol. (Lond.), 187: 517-552.
Fukuda, Y., Hsiao, C. F., Watanabe, M. and Ito, H. (1984). Morphological correlates of physiologically identified Y-, X-, and W-cells in cat retina. J. Neurophysiol., 52: 999-1013.
Fukuda, Y., Hsiao, C. F. and Watanabe, M. (1985). Morphological correlates of Y, X and W type ganglion cells in the cat's retina. Vis. Res., 25: 319-327.
Gerschenfeld, H. M., Neyton, J., Piccolino, M. and Witkovsky, P. (1982). L-horizontal cells of the turtle: network organization and coupling modulation. Biomed. Res., 3: 21-32.
Ghilardi, M. F., Bodis-Wollner, I., Onofrj, M. C., Marx, M. S. and Glover, A. A. (1988). Spatial frequency-dependent abnormalities of the pattern electroretinogram and visual evoked potentials in a Parkinsonian monkey model. Brain, 11: 131-184.
Ghilardi, M. F., Marx, M. S., Bodis-Wollner, I., Camras, C. B. and Glover, A. A. (1989). The effect of intraocular 6-hydroxydopamine on retinal processing of primates. Ann. Neurol., 25: 359-364.
Hampson, E. C., Vaney, D. I. and Weiler, R. (1992). Dopaminergic modulation of gap junction permeability between amacrine cells in mammalian retina. J. Neurosci., 12: 4911-4922.
Hankins, M. W. and Ikeda, H. (1991). The role of dopaminergic pathways at the outer plexiform layer of the mammalian retina. Clin. Vis. Sci., 6: 87-93.
Jensen, R. J. and Daw, N. W. (1986). Effects of dopamine and its agonists and antagonists on the receptive field properties of ganglion cells in the rabbit retina. Neurosci., 17: 837-855.
Kamermans, M. and Spekreijse, H. (1999). The feedback pathway from horizontal cells to cones: A mini review with a look ahead. Vis. Res., 39: 2449-2468.
Kaplan, E. and Shapley, R. M. (1986). The primate retina contains two types of ganglion cells, with high and low contrast sensitivity. Proc. Natl. Acad. Sci. USA, 83: 2755-2757.
Kolb, H., Nelson, R. and Mariani, A. (1981). Amacrine cells, bipolar cells and ganglion cells of the cat retina: A Golgi study. Vis. Res., 21: 1081-1114.
Kolb, H., Linberg, K. A. and Fisher, S. K. (1992). Neurons of the human retina: A Golgi study. J. Comp. Neurol., 31: 147-187.
Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16: 37-68.
Lasater, E. M. and Dowling, J. E. (1985).
Dopamine decreases conductance of the electrical junctions between cultured retinal horizontal cells. Proc. Natl. Acad. Sci. USA, 82: 3025-3029.
Maffei, L. and Fiorentini, A. (1981). Electroretinographic responses to alternating gratings before and after section of the optic nerve. Science, 211: 953-955.
Maffei, L., Fiorentini, A., Bisti, S. and Hollander, H. (1985). Pattern ERG in the monkey after section of the optic nerve. Exp. Brain Res., 59: 423-425.
Maguire, G. W. and Hamasaki, D. I. (1994). The retinal dopamine network alters the adaptational properties of retinal ganglion cells in the cat. J. Neurophysiol., 72: 730-741.
Maguire, G. W. and Smith, E. L. (1985). Cat retinal ganglion cell receptive-field alterations after 6-hydroxydopamine induced dopaminergic amacrine cell lesions. J. Neurophysiol., 53: 1431-1443.
Mangel, S. C. and Dowling, J. E. (1985). Responsiveness and receptive field size of carp horizontal cells are reduced by prolonged darkness and dopamine. Science, 229: 1107-1109.
Mariani, A. P. (1990). Amacrine cells of the rhesus monkey retina. J. Comp. Neurol., 301: 382-400.
Marx, M. S., Podos, S. M., Bodis-Wollner, I., Lee, P. Y., Wang, R. F. and Severin, C. (1988). Signs of early damage in glaucomatous monkey eyes: Low spatial frequency losses in the pattern ERG and VEP. Exp. Eye Res., 46: 173-184.
Matsumoto, N. and Naka, K. I. (1972). Identification of intracellular responses in the frog retina. Brain Res., 42: 59-71.
Nguyen-Legros, J., Versaux-Botteri, C. and Vernier, P. (1999). Dopamine receptor localization in the mammalian retina. Mol. Neurobiol., 19: 181-204.
Nicklas, W. J., Youngster, S. L., Lindt, M. V. and Heikkila, R. E. (1987). Molecular mechanisms of MPTP induced toxicity. IV. MPTP, MPP+ and mitochondrial function. Life Sci., 40: 721-729.
Oliver, P., Jolicoeur, F. B., Lafond, B., Drumheller, A. T. and Brunette, J. R. (1986). Dose-related effects of 6-OHDA on rabbit retinal dopamine concentrations and ERG b-wave amplitudes. Brain Res. Bull., 16: 751-753.
Peppe, A., Antal, A., Tagliati, M., Stanzione, P. and Bodis-Wollner, I. (1998). D1 agonist CY 208-243 attenuates the pattern electroretinogram to low spatial frequency stimuli in the monkey. Neurosci. Lett., 242: 1-4.
Piccolino, M., De Montis, G., Witkovsky, P., Bodis-Wollner, I. and Mirolli, M. (1987). D1 and D2 dopamine receptors involved in the control of electrical transmission between retinal horizontal cells. In G. Biggio, P. F. Spano, G. Toffano, and G. L. Gessa (Eds.), Symposia in Neuroscience. Central and Peripheral Dopaminergic Receptors, pp. 1-12. Liviana Press: Padova.
Piccolino, M., Witkovsky, P. and Trimarchi, C. (1987).
Dopaminergic mechanisms underlying the reduction of electrical coupling between horizontal cells of the turtle retina induced by d-amphetamine, bicuculline, and veratridine. J. Neurosci., 7: 2273-2284.
Regan, D., Kothe, A. C. and Sharpe, J. A. (1991). Recognition of motion-defined shapes in patients with multiple sclerosis and optic neuritis. Brain, 114: 1129-1155.
Rodieck, R. W. and Stone, J. (1965). Analysis of receptive fields of cat retinal ganglion cells. J. Neurophysiol., 28: 833-849.
Rodieck, R. W., Binmoeller, K. F. and Dineen, J. (1985). Parasol and midget ganglion cells of the human retina. J. Comp. Neurol., 233: 115-132.
Schwartz, E. A. (1974). Response of bipolar cells in the retina of the turtle. J. Physiol. (Lond.), 236: 211-224.
Shapley, R. and Perry, V. H. (1986). Cat and monkey retinal ganglion cells and their visual functional roles. Trends Neurosci., 9: 229-235.
Tagliati, M., Bodis-Wollner, I., Kovanesz, I. and Stanzione, P. (1994). Spatial frequency tuning in the monkey retina depends on D2 receptor-linked action of dopamine. Vis. Res., 34: 2051-2057.
Tagliati, M., Bodis-Wollner, I. and Yahr, M. D. (1996). The pattern electroretinogram in Parkinson's disease reveals lack of retinal spatial tuning. Electroenceph. Clin. Neurophysiol., 100: 1-11.
Teranishi, T., Negishi, K. and Kato, S. (1983). Dopamine modulates S-potential amplitude and dye-coupling between external horizontal cells in carp retina. Nature, 301: 243-246.
Teranishi, T., Negishi, K. and Kato, S. (1984). Regulatory effect of dopamine on spatial properties of horizontal cells in carp retina. J. Neurosci., 4: 1271-1280.
Ungerstedt, U. and Arbuthnott, G. W. (1970). Quantitative recording of rotational behavior in rats after 6-hydroxydopamine lesions of the nigrostriatal dopamine system. Brain Res., 24: 485-493.
Vaney, D. I. (1990). The mosaic of amacrine cells in the mammalian retina. Prog. Ret. Res., 9: 49-100.
Vaney, D. I. (1994). Patterns of neuronal coupling in the retina. Prog. Ret. Eye Res., 13: 301-355.
Vardi, N. and Smith, R. G. (1996). The AII amacrine network: Coupling can increase correlated activity. Vis. Res., 36: 3743-3757.
Victor, J. D. (1988). The dynamics of the cat retinal Y cell subunit. J. Physiol. (Lond.), 405: 289-320.
Werblin, F. S. and Dowling, J. E. (1969). Organization of the retina of the mudpuppy, Necturus maculosus. II. Intracellular recording. J. Neurophysiol., 32: 339-355.
Witkovsky, P., Alones, V. and Piccolino, M. (1987). Morphological changes induced in turtle retinal neurons by exposure to 6-hydroxydopamine and 5,6-dihydroxytryptamine. J. Neurocytol., 16: 55-67.
Wong, C., Ishibashi, T., Tucker, G. and Hamasaki, D. (1984). Responses of the pigmented rabbit retina to NMPTP, a chemical inducer of Parkinsonism. Exp. Eye Res., 40: 509-519.
Xin, D. and Bloomfield, S. A. (1999).
Dark- and light-induced changes in coupling between horizontal cells in mammalian retina. J. Comp. Neurol., 405: 75-87.
Part V
Development
18. Improving Abnormal Spatial Vision in Adults with Amblyopia

Uri Polat

18.1 Background
A critical question in spatial vision is how the visual system detects small, local luminance differences (contrast) and groups them into behaviorally relevant objects. According to the classical view of the organization of the visual cortex, neurons mediate visual information from the retina to the visual cortex through a few successive stages, each stage elaborating on the feature selectivity developed at earlier stages. In this view, the classical receptive fields of simple cells in the primary visual cortex, which are tuned selectively for location, orientation, and spatial frequency, form the fundamental units of analysis. Thus, every image location is represented by a population of linear spatial filters that locally analyze the image parts, and their outputs produce a field of local signals that can be integrated at later stages of signal processing. During the last decade it has been demonstrated that the neural response is also determined by lateral interactions in the visual cortex. The contrast detection threshold in humans and the contrast response function of neurons in the primary visual cortex are context dependent and can be modulated by remote image parts. The contrast response can be either enhanced or suppressed by the lateral placement of other images. Our studies have uncovered several rules by which we can predict the sign of the context effect. An important finding was that the facilitatory long-range interactions between neurons are configuration specific, suggesting that contrast summates preferentially along extended contours. Recently, it has been shown that collinear interactions are abnormal in amblyopia, an effect that might underlie the abnormal spatial vision of amblyopes.
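The classical filter-bank view described above can be illustrated with a small sketch. The `gabor` helper and all its parameter values are hypothetical choices for illustration, not taken from the chapter: it builds one localized, orientation- and frequency-tuned linear filter and computes its "local signal" for a matched versus an orthogonal stimulus.

```python
import numpy as np

def gabor(size=32, wavelength=8.0, theta=0.0, sigma=5.0):
    """A linear spatial filter of the kind the classical view posits:
    a Gaussian envelope (localized) times an oriented cosine carrier
    (tuned to orientation theta and spatial frequency 1/wavelength)."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    xr = x * np.cos(theta) + y * np.sin(theta)      # coordinate along theta
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# Local signal = dot product of filter and image patch (linear filtering
# at one location). A patch matching the filter drives it strongly; an
# orthogonal filter responds only weakly to the same patch.
grating = gabor(theta=0.0)                          # stimulus patch
matched = np.sum(gabor(theta=0.0) * grating)
orthogonal = np.sum(gabor(theta=np.pi / 2) * grating)
```

Repeating this over positions, orientations, and wavelengths yields the "field of local signals" that later stages are assumed to integrate.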
18.2 Amblyopia
Amblyopia is a unilateral or bilateral reduction of best-corrected visual acuity that cannot be attributed directly to any structural abnormality of the eye or the posterior visual pathway. It is caused by abnormal binocular visual experience early in life, during the "critical period," which prevents normal development of the visual system.
18.2.1 Abnormal Spatial Vision in Amblyopia
Amblyopia is generally defined by a decrease in visual acuity. This, however, represents only one limit of visual capacity. There are several other functional abnormalities (Hess et al., 1990; Levi, 1991), including reductions in the contrast sensitivity function (Bradley and Freeman, 1981; Gstalder and Green, 1971; Hess and Howell, 1977; Levi and Harwerth, 1977) and in vernier acuity (Bradley and Freeman, 1981; Levi and Klein, 1982), spatial distortions (Bedell and Flom, 1981, 1983; Hess et al., 1978; Lagreze and Sireteanu, 1991), abnormal spatial interactions (Ellemberg et al., 2002; Levi et al., 2002; Polat et al., 1997), and impaired contour detection (Hess et al., 1997; Kovacs et al., 2000). In addition, amblyopic observers suffer from binocular abnormalities such as lack of stereoacuity, abnormal binocular summation, or even monocular vision.
18.2.2 Contrast Sensitivity and Amblyopia
Contrast is one of the most important attributes of the visual stimulus and determines the visual neuronal response. The ability to perceive spatial detail is determined mainly by the ability to detect contrast (Ciuffreda et al., 1991). Most knowledge about early mechanisms in vision is based on threshold data, since the threshold is best suited for exploring independent mechanisms and is considered stable over time. The contrast sensitivity function (CSF) expresses the sensitivity of the visual system to contrast, and it is considered to express the sensitivities of many neurons in the visual cortex, each tuned to respond to a specific combination of spatial frequency, orientation, and location in the visual field. The general assumption is that the sensitivity of each neuron is independent of the responses of other neurons responding to stimuli presented at nearby locations in the visual field. An abnormal CSF occurs in most amblyopic eyes, mainly at high spatial frequencies, with little or no loss of contrast sensitivity at low spatial frequencies (Gstalder and Green, 1971; Levi and Harwerth, 1977). In other words, the contrast sensitivity in amblyopic eyes is usually reduced, an effect that has been explained as increased intrinsic noise, which may form the basis of the abnormal CSF (Levi, 1991; Levi and Klein, 1990).

During the last decade we have demonstrated that the neural response is also determined by lateral interactions in the visual cortex (figure 18.1). The contrast detection threshold in humans (Polat and Norcia, 1996, 1998; Polat and Sagi, 1993, 1994a, b, c, 1995) and the contrast response function of neurons in the primary visual cortex (Mizobe et al., 2001; Polat et al., 1998) are context dependent and can be modulated by remote image parts. The contrast response can be either enhanced or suppressed by the lateral placement of other images. These studies have uncovered several rules by which the sign of the response depends on the spatial configuration of the stimuli. An important finding was that the facilitatory long-range interactions between neurons are configuration specific, suggesting that contrast is summated preferentially along extended contours (Polat, 1999; Polat and Norcia, 1998; Polat and Tyler, 1999).

Figure 18.1: Lateral masking curves. A comparison is shown, exposing the absence of lateral facilitation in the amblyopes' data. The x-axis denotes the target-to-flanker separation in λ (wavelength) units. The y-axis denotes the threshold elevation of the target in log units; zero represents the contrast detection threshold of the target without flankers. Values above zero indicate suppression; values below zero indicate facilitation. Error bars denote ±1 SE. Data from untrained observers were taken during the first lateral-masking session and were averaged across observers (amblyopes: n = 40; nonamblyopes: n = 16). The range of spatial frequencies was 3-12 cpd (mean ± SD: 5.9 ± 2.8, amblyopes; 7.7 ± 3.6, normal group).

Our model of lateral interactions (Polat, 1999) is based on the assumption that excitation and inhibition produce a network of neuronal connectivity that determines the neuronal response depending on the context. In this model (see figure 18.2), each filter receives three types of visual input: (i) direct thalamic-cortical excitatory input (feedforward), (ii) lateral excitation and inhibition, and (iii) top-down input. The lateral excitation is organized along the filter's optimal orientation and is superimposed on a suppressive area surrounding the neuron. The balance of the network may control the contrast response function of the individual neuron. We suggest that facilitative and suppressive center-surround interactions may be organized differently to subserve
different functional roles. Facilitative interactions may form a collinear integration field that may underlie the detection of extended contours. Suppression is a more general phenomenon that may act as a contrast gain control and may serve to enhance surface perception.

Figure 18.2: Model of collinear lateral interactions. Excitation and inhibition produce a network of neuronal connectivity that determines the neuronal response. In this model, each filter receives three types of visual input: (i) direct thalamic-cortical excitatory input (feedforward), (ii) lateral excitation and inhibition, and (iii) top-down input (not shown here). The lateral excitation is organized along the filter's optimal orientation and is superimposed on a suppressive area surrounding the neuron. The balance of the network may control the contrast response function of the individual neuron.
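The three-input scheme of figure 18.2 lends itself to a toy computation. The Python sketch below is our illustrative reading of the model, not the published implementation: the function name, the weights `w_excite` and `w_inhibit`, and the simple linear combination are all assumptions made for the example.

```python
# Toy sketch of the three-input filter model of figure 18.2.
# All weights and the combination rule are illustrative assumptions,
# not parameters of the published model (Polat, 1999).

def filter_response(feedforward, collinear_input, surround_input,
                    w_excite=0.5, w_inhibit=0.3):
    """Net response of one oriented filter.

    feedforward     -- direct thalamo-cortical drive (target contrast)
    collinear_input -- drive from flankers along the optimal orientation
    surround_input  -- drive from the suppressive surround
    """
    lateral = w_excite * collinear_input - w_inhibit * surround_input
    # Firing rates cannot go below zero.
    return max(0.0, feedforward + lateral)

# The same flanker drive is facilitatory when routed through the
# collinear pathway and suppressive when routed through the surround:
alone      = filter_response(0.2, 0.0, 0.0)
collinear  = filter_response(0.2, 0.4, 0.0)   # flankers on the contour axis
orthogonal = filter_response(0.2, 0.0, 0.4)   # flankers in the surround

assert collinear > alone > orthogonal
```

The point of the sketch is only that the sign of the context effect falls out of the balance between the two lateral pathways, as the model text describes.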
18.2.3 Models of Amblyopia
Leading explanations for the abnormal spatial vision in amblyopia invoke passive mechanisms producing an abnormal spatial representation, caused either by a reduction in the number of neurons (undersampling; Levi and Klein, 1986) or by disarray (jitter) in the spatial relationships of neurons (Hess and Field, 1994). An alternative model of active (dynamic) mechanisms (Polat, 1999; Polat et al., 1997) suggests that the lateral interactions between neurons mediating spatial vision are compromised.
18.2.4 Abnormal Spatial Interactions and Crowding
Abnormal spatial interactions have recently been observed in amblyopic observers (Ellemberg et al., 2002; Levi et al., 2002; Polat et al., 1997, 2000, 2001). In amblyopia, the abnormal neuronal interactions exhibit reduced facilitation and increased suppression (see figure 18.1) (Levi et al., 2002; Polat et al., 1997). More recently it has been shown
that amblyopic observers fail in contour integration tasks (Chandna et al., 2001; Hess et al., 1997; Kovacs et al., 2000). The results show a deficit in the amblyopic eye as compared with the fellow eye of amblyopic observers and with the performance of normal subjects. These results support the suggestion that there is a connection between impaired lateral interactions and the grouping of global targets in amblyopia. Crowding is the phenomenon whereby our ability to identify a letter is better when it is presented alone than when it is flanked by other letters in close proximity. It has been suggested that lateral suppression may underlie the crowding effect (Polat and Sagi, 1993), and the increased suppression in amblyopia (Levi et al., 2002; Polat et al., 1997) may be correlated, at least in part, with the increased crowding in amblyopia.
18.3 Perceptual Learning and Neural Plasticity
Perceptual learning has had a major influence on our understanding of the development and plasticity of the visual system. When a person is asked to perform a visual (or any other sensory) task, it is often the case that he or she improves with practice, even on very simple tasks. This improvement occurs without any reinforcement and does not seem to involve conscious effort; rather, it seems to be controlled by some inherent subconscious process. Recent years have yielded a large number of psychophysical and physiological studies demonstrating that practice on specific perceptual tasks results in increased sensitivity to weak visual signals. Altogether, the adult visual system is not immutable and can change according to behavioral demands, which has implications for potential rehabilitation (for review see Fahle, 2002; Fahle and Poggio, 2002; Gilbert et al., 2001; Sagi and Tanne, 1994). One important development in understanding perceptual learning is the finding that learning involves modification of the sensory representation in the brain. Thus perceptual learning is not only a way of training attention to pick up distinctive stimulus features (Dosher and Lu, 1999; Gibson, 1969) or of improving sensory processing through increased alertness (Wolford et al., 1988). Learning involves improvement of stimulus-response associations (correlated activities) within the sensory system. Some insight into the mechanism of learning comes from lateral masking experiments (Polat and Sagi, 1994b, 1995). Learning experiments showed that practice increases the range of the lateral interactions by a factor of six, but only along the collinear directions. A range increase could not be obtained by practicing on the large distances alone; rather, it required practicing with a mixture of distances, including the small ones.
Typically, the improvements on visual tasks revealed in perceptual learning experiments have been shown to hold even when tested after several years without further practice (Karni and Sagi, 1991, 1993; Karni et al., 1994).
18.3.1 Plasticity in Amblyopia
Plasticity in adults with amblyopia has recently been observed (Levi and Polat, 1996; Levi et al., 1997). Repetitive practice led to a substantial improvement in vernier acuity in the amblyopic eyes of adults with naturally occurring amblyopia. In some of
the observers, the improvement in vernier acuity was accompanied by a commensurate improvement in Snellen acuity, up to normal vision. There have been reports of improvement of visual acuity in the amblyopic eye of adults whose previously normal fellow eye had lost vision secondary to age-related macular degeneration (El Mallah et al., 2000) or cataract (Wilson, 1992).
18.4 Treatment of Adult Amblyopia
A generally practiced principle of treatment is that therapy can be effective only during the critical period, generally considered to end around 8-9 years of age (von Noorden, 1981), when the visual system is considered sufficiently plastic for cortical modifications to occur. Available treatment is thus traditionally directed towards children; in adults, the visual deficiencies are thought to be irreparable after the first decade of life, once the developmental maturation window has closed. The vision loss is thought to result from abnormal operation of the neuronal networks within the primary visual cortex, most notably orientation-selective neurons and their interactions (Polat, 1999). The perceptual learning procedure described here was designed to train this network by efficiently stimulating these neuronal populations and effectively promoting their spatial interactions.
18.4.1 Perceptual-Learning-Based Technique for Treating Amblyopia
During treatment the patients' task was to detect the presence of a target (a Gabor patch). The contrast threshold was measured with a two-alternative forced-choice procedure: the target was presented in only one of two images, each lasting 80-320 ms, separated by a 500-ms interval. A visible fixation circle indicated the location of the target between presentations, and the patients activated the presentations at their own pace. They were informed of a wrong answer by auditory feedback after each pair of presentations. The treatment was conducted in a dark cubicle, where the only ambient light came from the display. A standard training session included a contrast detection task with a Gabor patch presented alone or with two Gabor patches at its sides, called flankers. The flankers had the same size and spatial frequency as the target but a higher contrast. In each session the separation between the target and the flankers was varied between one and nine of the grating's wavelengths. Over the training sessions, the size (spatial frequency) and orientation of the stimuli were changed, starting with lower spatial frequencies and progressively moving to higher ones, with four orientations at each size. The first two sessions were devoted to measuring basic spatial functions such as contrast sensitivity and performance on spatial interactions, the latter representing the degree of cortical lateral suppression and lateral facilitation. For each patient, subsequent sessions were individually designed depending on performance in the previous session.
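The stimuli described above can be generated with a few lines of array code. The following Python sketch builds a Gabor patch and places two collinear flankers at a given separation in wavelength units; the function names and all parameter values (patch size, envelope width, contrasts) are illustrative choices for the example, not the settings of the clinical protocol.

```python
import numpy as np

def gabor(size, wavelength, sigma, contrast, theta=0.0):
    """Gabor patch: cosine grating under a Gaussian envelope, in [-c, c]."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along grating axis
    return (contrast
            * np.cos(2.0 * np.pi * xr / wavelength)
            * np.exp(-(x**2 + y**2) / (2.0 * sigma**2)))

def lateral_masking_stimulus(sep_wavelengths, wavelength=8.0,
                             target_contrast=0.05, flanker_contrast=0.4,
                             size=64):
    """Mean-luminance canvas: target Gabor plus two collinear flankers.

    sep_wavelengths -- target-to-flanker separation in wavelength units
    (varied between 1 and 9 in the training sessions described above).
    """
    half = size // 2
    sigma = wavelength / 2.0                 # envelope width (illustrative)
    sep_px = int(round(sep_wavelengths * wavelength))
    h = 2 * (sep_px + half) + 1              # room for flankers above/below
    w = 2 * half + 1
    canvas = np.full((h, w), 0.5)            # mean-luminance background

    def paste(patch, row):                   # add a patch centred on `row`
        canvas[row - half:row + half + 1, :] += patch

    centre = sep_px + half
    paste(gabor(size, wavelength, sigma, target_contrast), centre)
    for row in (centre - sep_px, centre + sep_px):   # collinear flankers
        paste(gabor(size, wavelength, sigma, flanker_contrast), row)
    return np.clip(canvas, 0.0, 1.0)

stim = lateral_masking_stimulus(sep_wavelengths=3)
```

Varying `sep_wavelengths` from 1 to 9 reproduces the separation axis of figure 18.1; in an actual experiment the target contrast would be driven by an adaptive staircase rather than fixed as here.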
Figure 18.3: Individual amblyopes' improvement with the perceptual learning technique: change in best-corrected visual acuity (BCVA) for each patient of the treatment group (arranged by the patients' initial BCVA). The initial BCVA is shown as the bottom of each arrow, and the final BCVA (end of treatment; n = 44) as the top of the arrow. The inset shows changes in BCVA (ETDRS lines ± SE) measured after 12 sessions in the treatment (n = 44) and control (n = 10) groups. The control group was given a similar treatment but with a high spatial frequency.

Treatment Progress and Goals
An inherent goal of the treatment was to improve the thresholds of the amblyopic eyes to a desired range from the standard database (i.e., standard plus some deviation). Once patients achieved this goal, they were presented with the next level of treatment, such as a different orientation of the same spatial frequency, a faster presentation time, a higher spatial frequency, or a spatial alignment task.
Visual Functions Testing
The best-corrected visual acuity (BCVA) was examined by optometrists at baseline, after every four treatment sessions, at the final examination, and at 3, 6, 9, and 12 months after the last treatment session. The BCVA was measured using three different ETDRS charts, randomized to prevent memorization. CSF was measured at baseline and after the treatment using a wall-mounted chart from a distance of 3 m with controlled room lighting. At baseline and final examination, cycloplegic refraction was performed using cyclopentolate 1% with an autorefractometer. The baseline examination also included slit-lamp biomicroscopy and a dilated fundus examination performed by ophthalmologists. Orthoptists performed a comprehensive assessment of ocular motility and alignment. All clinical personnel (independent) were masked as to which patients were in the treatment or control groups.
Figure 18.4: Group data (top) by etiology and (bottom) by age of the patients. Note that all groups improved by about 2.5 ETDRS lines.

Improvement of Visual Acuity
The average BCVA improvement in the treatment group was 2.5 ± 0.2 ETDRS lines (mean ± SE), with no significant improvement in the control group (figure 18.3). After 12 sessions, the improvement in the treatment group was 1.5 ± 0.14 lines, and in the control group 0.07 ± 0.2 lines. This effect indicates that the treatment is effective and specific and that the improvement is not due to the total of 6 hours of patching of the nonamblyopic eye during treatment. At the end of the treatment, 71% of the patients achieved a significant and meaningful improvement of 2 ETDRS lines or more. The BCVA of five patients improved to better than 20/20. No significant difference in the extent of improvement was found between the types
Figure 18.5: Contrast sensitivity function (CSF). CSF (group average) before (dotted line) and after (continuous line) treatment. Note that the CSF improved by a factor of two at all points, with the low spatial frequencies ending well within the normal range after treatment and the higher range entering the norm (shaded area).
of amblyopia (anisometropic, strabismic, or combined; figure 18.4). There was no correlation between the age of the patients and the improvement in BCVA, suggesting that age is not a limiting factor for the efficacy of the treatment. The differences in improvement seem to be related to differences in the average initial BCVA; the amount of improvement was higher in patients with lower initial BCVA.
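Since improvement is reported in ETDRS lines, a brief reminder of the arithmetic may help in reading the acuity figures: one ETDRS line corresponds to 0.1 logMAR (five letters), and lower logMAR values mean better acuity. The helper names below are ours, and the example patient is hypothetical.

```python
def logmar_to_snellen_denominator(logmar):
    """Snellen denominator (feet) for a 20/x acuity: x = 20 * 10**logMAR."""
    return 20.0 * 10.0 ** logmar

def etdrs_lines_gained(logmar_before, logmar_after):
    """One ETDRS line = 0.1 logMAR = 5 letters; lower logMAR is better."""
    return (logmar_before - logmar_after) / 0.1

# Hypothetical patient: 20/50 (logMAR 0.40) improving to about 20/28
# (logMAR 0.15), i.e. a gain of 2.5 ETDRS lines.
gain = etdrs_lines_gained(0.40, 0.15)
assert abs(gain - 2.5) < 1e-6
```

On this scale the reported mean gain of 2.5 lines corresponds to a 0.25 logMAR reduction, i.e. the minimum angle of resolution shrinking by a factor of about 1.8.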
Figure 18.6: Individual improvement and retention. A: The BCVA change in each patient (n = 44) from pretreatment (x-axis) to posttreatment (y-axis). The diagonal dashed line indicates no change; data points above the line indicate better visual acuity after treatment. B: The stability of retention, comparing the posttreatment BCVA (x-axis) to the BCVA at 12 months after the end of treatment (y-axis) (n = 24).

Improvement of CSF
Before treatment, the amblyopic eyes had a lower CSF than normal-sighted eyes, with the low spatial frequencies near normal values and the high spatial frequencies showing the worst deficit. The treatment produced a significant improvement of sensitivity at all spatial frequencies, including the high-spatial-frequency range, raising the function to within the normal range except at the highest frequency, where it became nearly normal (figure 18.5).

Persistence of the Improved Functions
The visual functions were tested 12 months after cessation of the treatment, without any interventions. The patients were instructed to use their optical correction if needed. The results show that most of the patients retained the improved visual functions 12 months after cessation of the treatment (figure 18.6). This result is consistent with the long-lasting improvements found in other studies using perceptual learning. More interesting is the finding that the CSF improvement not only remained at 12 months but had increased further at the high spatial frequencies. This result may indicate that the high spatial frequencies are being used in daily tasks after the treatment and are thus naturally practiced.
18.5 Summary
These findings indicate that plasticity exists in the visual processing of adults with amblyopia and can be used as a foundation to improve visual functions. Perceptual learning thus seems to be applicable to the treatment of adult amblyopia, challenging the
current standard clinical practice of not treating amblyopia in older children and adults. The perceptual-learning-based procedure used here aimed at modifying the abnormal lateral interactions between cortical neurons and thus elevating them into the normal operating range, presumably by reducing suppression and increasing facilitation. This was achieved by repetitive practice of target detection, with and without flankers, covering a sufficient range of spatial frequencies and orientations to produce an improvement in the patients' CSF and thus in their BCVA. While the results do not conclusively establish the putative mechanisms described above, it seems likely that they are operative in achieving the significant improvement in visual functions demonstrated in this study. Improvement of the visual functions to within a close range of normal vision diminishes, if it does not prevent, the suppression of the amblyopic eye by the good eye. Suppression of the amblyopic eye is considered to be one of the main causes of amblyopia, and reducing the amount of suppression is expected to diminish the likelihood of recurrent amblyopia, thus retaining the improved vision. In conclusion, a normal output from the neurons in the early visual cortex is essential for normal visual functions. In amblyopia, deficiencies of some of the neurons, especially those sensitive to high spatial frequencies, may prevent further visual processes from functioning normally. Proper perceptual training that improves neuronal sensitivity provides the basis for more efficient visual processing, enabling the restoration of visual functions that were otherwise dysfunctional.
References
Bedell, H. E. and Flom, M. C. (1981). Monocular spatial distortion in strabismic amblyopia. Invest. Ophthalmol. Vis. Sci., 20: 263-268.
Bedell, H. E. and Flom, M. C. (1983). Normal and abnormal space perception. Am. J. Optom. Physiol. Optics, 60: 426-435.
Bradley, A. and Freeman, R. D. (1981). Contrast sensitivity in anisometropic amblyopia. Invest. Ophthalmol. Vis. Sci., 21: 467-476.
Chandna, A., Pennefather, P. M., Kovacs, I. and Norcia, A. M. (2001). Contour integration deficits in anisometropic amblyopia. Invest. Ophthalmol. Vis. Sci., 42: 875-878.
Ciuffreda, K. J., Levi, D. M. and Selenow, A. (1991). Amblyopia: Basic and Clinical Aspects. Butterworth-Heinemann: Stoneham.
Dosher, B. A. and Lu, Z. L. (1999). Mechanisms of perceptual learning. Vis. Res., 39: 3197-3221.
El Mallah, M. K., Chakravarthy, U. and Hart, P. M. (2000). Amblyopia: Is visual loss permanent? Br. J. Ophthalmol., 84: 952-956.
Ellemberg, D., Hess, R. F. and Arsenault, A. S. (2002). Lateral interactions in amblyopia. Vis. Res., 42: 2471-2478.
Fahle, M. (2002). Perceptual learning: Gain without pain? Nat. Neurosci., 5: 923-924.
Fahle, M. and Poggio, T. (2002). Perceptual Learning. MIT Press: Cambridge, MA.
Gibson, E. J. (1969). Principles of Perceptual Learning. Appleton-Century-Crofts: New York.
Gilbert, C. D., Sigman, M. and Crist, R. E. (2001). The neural basis of perceptual learning. Neuron, 31: 681-697.
Gstalder, R. J. and Green, D. G. (1971). Laser interferometric acuity in amblyopia. J. Pediatr. Ophthalmol., 8: 251-256.
Hess, R. F., Campbell, F. W. and Greenhalgh, T. (1978). On the nature of the neural abnormality in human amblyopia; neural aberrations and neural sensitivity loss. Pflügers Archiv Eur. J. Physiol., 377: 201-207.
Hess, R. F., Field, D. and Watt, R. J. (1990). The puzzle of amblyopia. In C. Blakemore (Ed.), Vision: Coding and Efficiency, pp. 267-280. Cambridge University Press: Cambridge.
Hess, R. F. and Field, D. J. (1994). Is the spatial deficit in strabismic amblyopia due to loss of cells or an uncalibrated disarray of cells? Vis. Res., 34: 3397-3406.
Hess, R. F. and Howell, E. R. (1977). The threshold contrast sensitivity function in strabismic amblyopia: Evidence for a two type classification. Vis. Res., 17: 1049-1055.
Hess, R. F., McIlhagga, W. and Field, D. J. (1997). Contour integration in strabismic amblyopia: The sufficiency of an explanation based on positional uncertainty. Vis. Res., 37: 3145-3161.
Karni, A. and Sagi, D. (1991). Where practice makes perfect in texture discrimination: Evidence for primary visual cortex plasticity. Proc. Nat. Acad. Sci. USA, 88: 4966-4970.
Karni, A. and Sagi, D. (1993). The time course of learning a visual skill. Nature, 365: 250-252.
Karni, A., Tanne, D., Rubenstein, B. S., Askenasy, J. J. and Sagi, D. (1994). Dependence on REM sleep of overnight improvement of a perceptual skill. Science, 265: 679-682.
Kovacs, I., Polat, U., Pennefather, P. M., Chandna, A. and Norcia, A. M. (2000). A new test of contour integration deficits in patients with a history of disrupted binocular experience during visual development. Vis. Res., 40: 1775-1783.
Lagreze, W. D. and Sireteanu, R. (1991). Two-dimensional spatial distortions in human strabismic amblyopia. Vis. Res., 31: 1271-1288.
Levi, D. M. (1991). Spatial vision in amblyopia. In D. M. Regan (Ed.), Spatial Vision, pp. 212-238. Macmillan: London.
Levi, D. M., Hariharan, S. and Klein, S. A. (2002). Suppressive and facilitatory spatial interactions in amblyopic vision. Vis. Res., 42: 1379-1394.
Levi, D. M. and Klein, S. A. (1982). Hyperacuity and amblyopia. Nature, 298: 268-270.
Levi, D. M. and Klein, S. A. (1986). Sampling in spatial vision. Nature, 320: 360-362.
Levi, D. M. and Klein, S. A. (1990). Equivalent intrinsic blur in amblyopia. Vis. Res., 30: 1995-2022.
Levi, D. M. and Polat, U. (1996). Neural plasticity in adults with amblyopia. Proc. Nat. Acad. Sci. USA, 93: 6830-6834.
Levi, D. M., Polat, U. and Hu, Y. S. (1997). Improvement in vernier acuity in adults with amblyopia. Invest. Ophthalmol. Vis. Sci., 38: 1493-1510.
Levi, D. M. and Harwerth, R. S. (1977). Spatio-temporal interactions in anisometropic and strabismic amblyopia. Invest. Ophthalmol. Vis. Sci., 16: 90-95.
Mizobe, K., Polat, U., Pettet, M. W. and Kasamatsu, T. (2001). Facilitation and suppression of single striate-cell activity by spatially discrete pattern stimuli presented beyond the receptive field. Vis. Neurosci., 18: 377-391.
Polat, U. (1999). Functional architecture of long-range perceptual interactions. Spat. Vis., 12: 143-162.
Polat, U., Ma-Naim, T. and Belkin, M. (2001a). Treatment of adult amblyopia by perceptual training. Invest. Ophthalmol. Vis. Sci., 42: S400.
Polat, U., Ma-Naim, T., Sagi, D. and Bonneh, Y. (2001b). Abnormal spatial interactions and their plasticity in human adults with amblyopia. Soc. Neurosci. Abstr., 27: 619.57.
Polat, U., Mizobe, K., Pettet, M. W., Kasamatsu, T. and Norcia, A. M. (1998). Collinear stimuli regulate visual responses depending on cell's contrast threshold. Nature, 391: 580-584.
Polat, U. and Norcia, A. M. (1996). Neurophysiological evidence for contrast dependent long-range facilitation and suppression in the human visual cortex. Vis. Res., 36: 2099-2109.
Polat, U. and Norcia, A. M. (1998). Elongated physiological summation pools in the human visual cortex. Vis. Res., 38: 3735-3741.
Polat, U. and Sagi, D. (1993). Lateral interactions between spatial channels: Suppression and facilitation revealed by lateral masking experiments. Vis. Res., 33: 993-999.
Polat, U. and Sagi, D. (1994a). The architecture of perceptual spatial interactions. Vis. Res., 34: 73-78.
Polat, U. and Sagi, D. (1994b). Spatial interactions in human vision: From near to far via experience-dependent cascades of connections. Proc. Nat. Acad. Sci. USA, 91: 1206-1209.
Polat, U. and Sagi, D. (1994c). Spatial interactions in normal and amblyopic observers: Is there a qualitative difference? Invest. Ophthalmol. Vis. Sci., 35: 1257.
Polat, U. and Sagi, D. (1995). Plasticity of spatial interactions in early vision. In B. Julesz and I. Kovacs (Eds.), Maturational Windows and Adult Cortical Plasticity, XXIV, pp. 1-15. Addison-Wesley: Reading, MA.
Polat, U., Sagi, D. and Norcia, A. M. (1997). Abnormal long-range spatial interactions in amblyopia. Vis. Res., 37: 737-744.
Polat, U. and Tyler, C. W. (1999). What pattern the eye sees best. Vis. Res., 39: 887-895.
Sagi, D. and Tanne, D. (1994). Perceptual learning: Learning to see. Curr. Opin. Neurobiol., 4: 195-199.
von Noorden, G. K. (1981). New clinical aspects of stimulus deprivation amblyopia. Am. J. Ophthalmol., 92: 416-421.
Wilson, M. E. (1992). Adult amblyopia reversed by contralateral cataract formation. J. Pediatr. Ophthalmol. Strabismus, 29: 100-102.
Wolford, G., Marchak, F. and Hughes, H. (1988). Practice effects in backward masking. J. Exp. Psychol. Hum. Percept. Perform., 14: 101-112.
19. Visual Development with One Eye
Martin J. Steinbach and Esther G. González
19.1 Introduction
We have been privileged to work with the patients of an outstanding Toronto scientist and ophthalmologist, Dr. Brenda L. Gallie, over the past 16 years. Dr. Gallie specializes in retinoblastoma, an insidious pediatric tumor that has both inherited and sporadic occurrence and, if left untreated, can kill the child unlucky enough to have the cancer (DiCiommo et al., 2000; Richter et al., 2003). Many of her patients had the unilateral variety (i.e., present in one eye only) and thus were left monocular by a treatment that frequently required the removal of the affected eye soon after birth. This gave us an opportunity to study the effects of total monocular deprivation as compared with the partial deprivation that can occur with strabismus, cataract, ptosis, and anisometropia (Daw, 1995; Day, 1995; von Noorden and Campos, 2002). We have measured a variety of psychophysical and oculomotor characteristics in these children, along the way testing normal children as well, and have uncovered some interesting consequences of one-eyed vision, both literally, with the enucleated children, and figuratively, in considering the cyclopean eye (or egocenter) as the origin for visual direction (Ono and Mapp, 1995; Ono et al., 2002). We review the results of these studies, starting with the psychophysical measures of form, motion, and depth perception, as well as the optokinetic eye movements of children deprived of normal binocular experience. The second half of the review is concerned with the egocenter and how we cope with it in performing monocular viewing tasks.
19.2 Form (Contrast, Texture and Motion Defined), Motion (including OKN), and Depth
The loss of vision is one of the most dreaded of all disabilities. It can, however, give valuable insights into the workings of the normal visual system, which in turn can be used to benefit those affected. In the study of visual plasticity and visual development, one-eyed observers provide a useful model for studying the roles of deprivation and binocular competition. We are particularly interested in examining in humans the psychophysical consequences of the cortical reorganization and recovery of function following visual loss documented in animal studies. The visual system has been shown in animals to exhibit a remarkable plasticity in response to visual deprivation (see reviews in Daw, 1995; Guillery, 1989; Kiorpes and Movshon, 2003; Mitchell and Timney, 1984). There is evidence of recruitment after deprivation, which increases the cortical space innervated by the remaining eye. Cells partially or completely dominated by one eye reorganize after the loss of visual input and become primarily responsive to the other eye (e.g., Gilbert and Wiesel, 1992; Hubel and Wiesel, 1962; Hubel et al., 1977; Kratz and Spear, 1976). In addition, monocular enucleation reduces apoptosis in ganglion cells of the remaining eye and preserves or even expands their central connections (Guillery, 1989). This reorganization can occur within hours, depending on the nature of the lesion (Schmid et al., 1995), and can involve other sensory modalities (Kahn and Krubitzer, 2002; Kujala et al., 2000). In humans, autopsies of enucleated children showed that early monocular enucleation obliterates the ocular dominance columns in the striate cortex (Horton and Hocking, 1998). Although the animal results are complex, and extrapolation to humans is made difficult by both empirical and technical considerations, the anatomical and physiological changes found as a result of deprivation and changes in binocular competition suggest the possibility of psychophysical correlates in humans.
In a variety of visual functions involving contrast-defined form (see Day, 1995, for a review), the remaining eye of enucleated observers exhibits superior levels of visual function compared with the fellow or "non-deprived" eye of strabismic observers. These findings have been replicated in our laboratory for acuity (Gonzalez et al., 2002; Reed et al., 1996), optokinetic nystagmus (OKN) (Reed et al., 1991), and visual alignment (Reed et al., 1995). We have concluded that they are probably the result of the abnormal binocular interactions present in strabismus and other forms of form deprivation. At least three kinds of processes may lie behind the superior performance of enucleated observers in certain visual tasks: (a) monocular practice over the years after enucleation, (b) recruitment of the resources normally assigned to the missing eye, and (c) the absence of binocular inhibitory interactions resulting from the removal of one eye. These three factors must be considered when evaluating the differences between enucleation and other forms of visual deprivation such as strabismic amblyopia,

[Footnote: Just as age at onset is for strabismic children, age at enucleation is not a perfect estimate of the moment at which binocularity was interrupted. There could be a brief period of abnormal binocular competition, even in retinoblastoma children; however, the tumor grows so rapidly that retinal detachment occurs quickly in most cases. The child brought to clinic with a white pupil (the prime presenting symptom) has the eye enucleated, usually within 48 hours.]
Martin J. Steinbach and Esther G. Gonzalez
cataracts, and anisometropia, which have been documented psychophysically as well as physiologically.
19.2.1 Form
Vernier acuity

In our search for evidence of enhancement of visual function we tested medium-contrast vernier acuity in enucleated subjects and found that their thresholds were similar to those of monocularly viewing controls (Gonzalez et al., 2002; figure 19.1). Two other approaches also failed to find supernormal vernier acuity in one-eyed people. One was a study of identical twins, one of whom had a congenital posterior subcapsular cataract (Johnson et al., 1982); the second was a study of the peripheral and foveal acuity of the remaining eye of enucleated observers and of the fellow eye of amblyopic observers (Osuobeni, 1992). It is well documented that vernier acuity shows vast improvement with practice (McKee and Westheimer, 1978; Poggio et al., 1992), and it is possible that the superior foveal vernier acuity in enucleated observers found by Bradley and Freeman (1980) was confounded by the difficulty of controlling for this factor. Various spatial tasks such as vernier acuity, separation discrimination, and orientation discrimination are differentially affected by contrast reduction, and vernier acuity is the least robust (Westheimer et al., 1999). It is possible that only demanding visual tasks involving detection and discrimination at low contrast expose differences in visual abilities due to unilateral enucleation, so we decided to test our subjects in identification tasks at various levels of contrast.

Letter acuity

Foveal

We examined foveal visual acuity using Regan's letter charts (Regan, 1988) at various contrast levels and found superior performance by the enucleated observers when compared with monocularly viewing controls and strabismic observers viewing with their unaffected eye (Reed et al., 1996). The acuity of the enucleated subjects was found to match the binocular thresholds of binocularly normal controls (Reed et al., 1997).
Eccentric

Differences in acuity between the two hemifields are larger in infants than in adults, and the temporal hemifield develops faster than the nasal hemifield (Lewis and Maurer, 1992). For the enucleated subjects we found a small difference in favor of the temporal hemifield (Gonzalez et al., 2002; figure 19.1). This effect may be due to the complete absence of binocular competitive mechanisms during early visual development. Given that the temporal field develops earlier than the nasal field, enucleation may produce a stronger effect on the still-developing nasal field while the temporal field can achieve a higher level of maturity. The 7° eccentric acuity of the enucleated group was better than that of two groups of strabismic subjects (amblyopic and nonamblyopic) tested with their unaffected eye, even though all exhibited comparable foveal decimal acuity of 1.0 or better. The enucleated observers had better eccentric acuity than the binocularly normal controls at the
Visual Development with One Eye
Figure 19.1: Nasal and temporal (7°) acuity for enucleated and control observers at three contrast levels: 4.7%, 13.5%, and 96%. Errors are ±1 SE. Adapted from Gonzalez et al. (2002).

two lower contrast levels (4.7% and 13.5%) and similar acuity at high contrast (96%). Congruent with our previous findings, their foveal acuity was better than that of the controls viewing monocularly and comparable to their binocular performance at all contrast levels. The anatomical consequences of changes in binocular competition can explain the enhanced contrast-defined performance in favor of the remaining eye, but factors other than recruitment could also play a role. Neurons in primary and secondary cortical visual areas are binocular and exhibit an intracortical system of inhibitory interactions. Nicholas et al. (1996) found that the peak contrast sensitivity at 4 cyc/deg of the early-enucleated subjects was greater than the binocular performance of the controls by a factor greater than √2, the theoretical limit attainable if all the cortical cells were driven by the remaining eye (Campbell and Green, 1965). It is possible that the enucleated subjects' performance may be enhanced by the removal of the inhibitory binocular interactions known to underlie the tuning to retinal disparity (Poggio et al., 1988) and binocular rivalry (Fox, 1991; Mueller, 1990). It is also likely that in monocular viewing tests the performance of normally binocular subjects is adversely affected by the binocular rivalry produced by the eye patch commonly used for such tests. Although this view contradicts Levelt's (1965a, b) proposition that a contourless stimulus cannot suppress a patterned one and itself remains suppressed indefinitely, it has received ample support from a number of studies (see Howard, 2002, for a review).
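The √2 ceiling invoked here is the standard gain from ideal summation of independently noisy responses; as a hedged sketch of the argument (an illustration, not the authors' own derivation):

```latex
% Ideal linear summation of N equally sensitive responses, each with
% signal s and independent noise of standard deviation sigma:
% the summed signal grows as N, the summed noise only as sqrt(N).
\mathrm{SNR}(N) \;\propto\; \frac{N\,s}{\sqrt{N}\,\sigma}
              \;=\; \sqrt{N}\,\frac{s}{\sigma}
\qquad\Longrightarrow\qquad
\frac{\mathrm{SNR}(2N)}{\mathrm{SNR}(N)} \;=\; \sqrt{2}
```

On this account, doubling the pool of cortical cells driven by the remaining eye can buy at most a √2 improvement in sensitivity; a gain exceeding that factor, as Nicholas et al. report, therefore implicates an additional mechanism such as released interocular inhibition.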
We hypothesize that if differences in retinal illumination degrade monocular performance, a featureless field of equivalent brightness might improve it relative to a dark one, as was shown by Freeman and Jolly (1994). Alternatively, the superior monocular sensitivity of monocular observers could also be predicted by a model involving simple cortical pooling and a "winner-take-all rule" as proposed by McKee et al. (2003). We find that this model can explain some of
our data but fails to predict the diminished acuity of nonamblyopic and amblyopic strabismic observers using their preferred eye.

Radial frequency

We are currently exploring the possibility that the plasticity resulting from enucleation may affect the performance of enucleated observers at a level of analysis above that of local orientation-tuned components or even local curvature. Sensitivity to small deviations from circularity has been shown to reach hyperacuity levels as small as 2-4 sec of arc (Wilkinson et al., 1998) with stimuli involving the global pooling of contour information. We are in the process of measuring the detection thresholds of enucleated observers and binocularly normal controls for Wilkinson et al.'s stimuli at different levels of contrast. Thresholds in this task seem to exhibit no discernible learning effects, and even subjects with no experience as psychophysical observers attain hyperacuity levels of performance (Wilkinson, personal communication). We hypothesize that the absence of binocular inhibitory interactions will improve the performance of enucleated observers, relative to controls tested monocularly, in the detection and recognition of radial frequency patterns, particularly at low contrast, where the effects of binocular rivalry will be strongest. We are comparing the performance of enucleated observers with that of subjects with normal binocular vision tested binocularly and in two monocular conditions using a stereoscope. Our preliminary data (Steeves et al., 2001) show that the performance of normally binocular subjects is affected when the nonviewing eye sees a dark field, as is the case in most studies where an eye patch has been used. However, even though monocular performance improved when the fellow eye viewed an equiluminant featureless field, it did not reach the level of binocular viewing. The enucleated observers' thresholds were as good as the binocular ones.
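For readers unfamiliar with these stimuli: the Wilkinson et al. (1998) patterns are radial frequency contours, closed shapes whose radius is modulated sinusoidally around a circle. In the conventional notation (reproduced here as background, not from this chapter):

```latex
% r0:    mean radius;  A: modulation amplitude (A = 0 gives a circle)
% omega: radial frequency (cycles per revolution);  phi: phase
r(\theta) \;=\; r_{0}\,\bigl(1 + A \sin(\omega\theta + \varphi)\bigr)
```

Detection threshold is the smallest amplitude A that can be distinguished from a perfect circle; expressed as a length (r₀·A), it reaches the 2-4 arc sec hyperacuity range quoted above.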
It is also possible that the eye patch over the nonviewing eye could produce consensual changes in pupil size, which may degrade acuity (Horowitz, 1949). Although our testing conditions do not result in extremely small or large pupils, this possibility requires verification, which we are in the process of carrying out.

Troxler fading

The relation between Troxler fading (the disappearance of peripheral targets with prolonged central fixation) and binocular rivalry is unclear (Crovitz and Lockhead, 1967). Liu et al. (1992) proposed that rivalry is particularly important at low-contrast and not at high-contrast levels. We found that enucleated observers exhibited times to fading comparable to those of binocular viewing for people with normal stereopsis. Monocular (i.e., patched) viewing produced the shortest times to fading (Gonzalez et al., 2003). For all groups, Troxler fading was a function of contrast but not of polarity. From our normally binocular subjects' phenomenological reports it appears as if Troxler fading and binocular rivalry could be independent. Subjects sometimes report the "black cloud" of the rivaling patched eye even at high contrasts but can experience Troxler fading in its absence. Whether the two are related or not, apart from the effects of plasticity and recruitment,
enucleated subjects have the advantage of no binocular rivalry and longer times to fading, which are particularly advantageous for low-contrast stimuli. Furthermore, our pattern of results cannot be explained by local adaptation alone. Like binocular rivalry, Troxler fading seems to occur at more than one level (Blake and Logothetis, 2002; Ooi and He, 2003), is subject to the effects of voluntary attention (Lou, 1999; Mennemeier et al., 1994), and can be affected by the intracortical system of inhibitory binocular interactions (Lee and Blake, 2002).

Texture-defined form

The perception of form defined by attributes other than contrast does not seem to be affected by enucleation. Unilaterally enucleated observers had thresholds for texture-defined form (Regan and Hong, 1994) similar to those of normal controls viewing monocularly (Steeves et al., 1998).
19.2.2 Motion

Motion coherence

We found that the absence of binocular competition during the development of the motion-processing pathways appears to disrupt the perception of motion (Steeves et al., 2002). In a motion-coherence task we found that for the enucleated group, temporalward motion-coherence thresholds were significantly higher than nasalward thresholds. Binocularly normal controls showed no such asymmetry when viewing either monocularly or binocularly. These findings are consistent with data derived from subjects with weak or absent stereopsis (see Tychsen, 1993, for a review).
Motion-defined form

As in the motion-coherence task, enucleated observers as a group performed significantly more poorly in Regan and Hong's (1990) motion-defined task than the binocularly viewing controls (Steeves et al., 2002). While a few were equivalent to or even better than the controls, these data are consistent with the finding that motion-defined form perception has a longer developmental time course than luminance-defined form perception (Giaschi and Regan, 1997), and the cessation of binocularity will thus affect them differently.

Shear sensitivity

A study of relative motion discrimination (Bowns et al., 1994) found that early monocularly enucleated adults and binocularly normal age-matched controls have similar thresholds for detecting relative motion but exhibit opposite biases in the perceived velocity of stimuli in the upper and lower hemifields. With a textured surface containing a discontinuity between the upper and lower halves, created by moving them at different speeds but in the same direction, controls were more likely to judge the top section as faster, whereas enucleated observers judged the bottom section as faster. Bowns et al. hypothesized that if enucleated and other subjects with weak stereopsis use motion parallax (a system used for far space) as a substitute for stereoscopic information (a system used for near space), a reversal of the normal bias (Previc, 1990; Skrandies, 1987) could occur.
19.2.3 Monocular Practice

Even though monocular practice may be an important component of the superior visual performance of enucleated observers (see the section on vernier acuity above), age at enucleation, rather than years since enucleation, is the better predictor of visual performance. When comparing two early-enucleated (under 2 years of age) and one late-enucleated (in adulthood) patients with two normal controls, we found an advantage for the early-enucleated observers in learning to identify motion-defined letters (Gonzalez et al., 1998). In fact, the learning rate of the early-enucleated observers was higher than that of the controls and of the late-enucleated subject. Our finding agrees with the data of Nicholas et al. (1996), who also found a significant difference in contrast sensitivity in favor of the early- over the late-enucleated group.
19.2.4 Optokinetic Nystagmus (OKN)

Reed et al. (1991) found that 63% of early-enucleated observers had small but significant asymmetries of OKN, favoring nasally directed motion in the visual field. This tendency resembles the well-known nasalward preference for optokinetic stimuli seen in infants (e.g., Atkinson and Braddick, 1981; Naegele and Held, 1983). These data suggest that the infant cortex may have asymmetric motion processing that results in asymmetrical VEP and OKN responses. Furthermore, binocular input during early visual development, including normal levels of binocular competition, may be necessary for the establishment of symmetrical pathways for nasotemporal motion processing. A stronger nasalward asymmetry in the strabismic observers was replicated by Steeves et al. (1999), emphasizing the deleterious effect of abnormal binocular input. Our data from enucleated children show a "double dissociation": enhancement of luminance-defined form processing but impaired motion processing.
19.2.5 Time to Collision

Steeves et al. (2000) found that unilaterally enucleated observers cannot estimate the time to collision of an approaching object based on the monocular cue θ/(dθ/dt) (Hoyle, 1957) better than the controls, and some are actually worse. Five out of six subjects relied on the stimulus's starting size, which was task irrelevant in this study but which, in the real world where objects have familiar sizes, is of significant use. The authors concluded that enucleated observers learn to use as many optical variables as possible to compensate for the lack of binocular information. Loss of binocularity and its disruption of motion processing may also be a contributing factor.
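Hoyle's optical variable is informative because, under a small-angle approximation, it equals the time remaining to contact regardless of the object's physical size. As a sketch (S, D, and v, the object's size, distance, and constant approach speed, are illustrative symbols, not quantities measured in the study):

```latex
% theta = angular subtense of the object at the eye
\theta \;\approx\; \frac{S}{D},
\qquad
\frac{d\theta}{dt} \;\approx\; \frac{S\,v}{D^{2}}
\qquad\Longrightarrow\qquad
\tau \;=\; \frac{\theta}{d\theta/dt} \;=\; \frac{D}{v}
```

Because S cancels, an observer using τ need not know the object's size; reliance on starting size, as five of the six subjects showed, therefore marks a departure from this strategy.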
19.3 Depth
We were also interested in finding out whether enucleated observers showed any superiority in the use of monocular cues for depth, and tested them with a modified version of the standard Howard-Dolman depth test in which the only cue for depth was motion parallax. Much to our surprise, we found that enucleated children do not spontaneously
move their head laterally when determining depth (Gonzalez et al., 1989), and we suggested that they be trained to do so in early childhood. Our findings stand in contrast with those of Marotta et al. (1995), who found larger and faster head movements in enucleated subjects. A significant age gap between our young (mean age = 12 yrs) and their older (mean age = 32.4 yrs) subjects is the likely explanation for this discrepancy. Consistent with this is the fact that the proportion of self-generated lateral and vertical head movements versus forward head movements increases as a function of postenucleation time (Marotta et al., 1995). This suggests that enucleated people learn to increase the proportion of lateral and vertical head movements while reducing the forward movements, which produce less helpful information for estimating depth (Marotta et al., 1995; Simpson, 1993).
19.4 Egocenter: Role of Binocular Experience
We have two eyes yet experience a singular view of the world. It has been known since the second century A.D. (Ptolemy, cited in Howard, 2002, pp. 47-52) that we behave as though there were a single origin for our visual experience and orientation in space, this origin being called the egocenter or cyclopean eye (see also Ono and Mapp, 1995). Figure 19.2 shows how to demonstrate the cyclopean percept where lines falling along the visual axes of the two eyes appear to hit the observer "right between the eyes," i.e., in the egocenter. The signals from the two eyes are integrated and projected to this egocenter without the observer having any conscious eye of origin information (Steinbach et al., 1985). You can see a demonstration by drawing lines on a piece of cardboard, as shown in demonstration 2 on the accompanying CD (see Howard and Rogers (1995), p. 595, for additional explanations).
19.4.1 The Visual Direction of Objects

Alignment to the Head

The laws of visual direction have other consequences (see figures 19.2 and 19.3). In the dark, a horizontal luminescent stick viewed from about 2 or 3 cm below the eyes is aligned to the head's landmarks (ears and nose) following the predictions of the laws of visual direction, for both enucleated and patched normal subjects. A luminescent dot looming towards the observer, seen at the intersection of the visual axes, showed smaller errors when aligned to the same landmarks.

Alignment to the Body

Dengis et al. (1995) found that when binocularly normal and strabismic observers use both hands in everyday tasks like threading a needle or looking down a ruler, they align the objects with their midline if they use both eyes and, like the enucleated observers, with their viewing eye if viewing monocularly.
19.4.2 Hints from a Birthday Party

We were alerted to the possibility of a plastic egocenter in the enucleated children when an observant parent noticed that her five-year-old daughter could not blow out a candle because she aligned the candle with her remaining eye, putting her mouth
Figure 19.2: Cyclopean direction. The tree and the chimney are seen in the same visual direction when the observer fixates on the fixation point on the glass. Adapted from Howard and Rogers' (1995, p. 595) illustration of Hering's (1942/1879) demonstration of the laws of headcentric binocular direction.
Figure 19.3: For a two-eyed observer with a central egocenter, actual and apparent locations of points lying along the visual axis of the viewing eye. All points appear on the common axis, and only the point at the intersection of the visual axes (the fixation point) is seen in its veridical location. This diagram assumes that the distances of the points are correctly perceived. Adapted from Ono and Barbeito (1982).
Figure 19.4: Cyclopean responses over the first five years of life decline in normals, strabismics, and enucleated patients at the same rate. Adapted from Dengis et al. (1993).

to one side of the candle, producing puffs of air that were futile for extinguishing the flame. It appeared as if her midline orienting system had shifted to her remaining eye. We decided to test this observation more formally, using the Roelofs (1959) method of determining egocenter location, adapted to become a child-friendly task. Children were asked to align a nonvisible rod, using their hands, with a visible "fireman's hose" that was aligned with the visual axis of their remaining eye. Binocular children with one eye covered aligned the rod with their midline. The monocular children we tested (five years of age and older) aligned the rod close (75%) to the position of the remaining eye (Moidell et al., 1988).
19.4.3 The "Cyclops Effect"

When we described these results to others in the York vision group, a former graduate student of Hiroshi Ono's, Ralph Barbeito, asked us why we did not see the consequences of what he referred to as the "cyclops effect." He had confirmed a clinical observation that young children will place a tube in the middle of the head when asked to "look through the tube." He tested children in a nursery school and frequently found a cyclops effect in children aged 3-4 years (Barbeito, 1983). We wondered about the roles of age and experience because most adults look through a tube by effortlessly placing it over one eye. How does this skill develop? What role does having normal binocular vision play? Would one-eyed children, or other children with early atypical binocular experience (e.g., those with strabismus), not show a cyclops effect?
Figure 19.5: Shift of the egocenter in the direction of the remaining eye, in agreement with Hering's (1942/1879) prediction.
19.4.4 The Egocenter is "Built In"

Carol Dengis, for her dissertation research, went out into the field (a la Jane Goodall) and tested young children in their homes, looking at the consequences of having a midline orienting system. Using the "look through a tube" technique, she found that the egocenter was essentially built in, i.e., present in the youngest children she could test (1.1 years). More astounding, the presence of normal binocular experience was not necessary for the cyclops response. She found a declining trend from two to five years in the number of cyclopean responses produced by children, irrespective of whether they were binocularly normal, had only one eye from a very early age (e.g., 6 months), or had early-onset strabismus (figure 19.4). The rate of decline was the same for all three groups as well.
19.4.5 The Egocenter is not so "Plastic"

Even though the position of the egocenter can shift in totally monocular young children (figure 19.5), its location is surprisingly robust in those with some remaining binocular vision. It was unchanged in strabismic subjects and unaltered by strabismus surgery (Dengis et al., 1993). This robustness was maintained even though there were shifts in the localization of targets in space (measured by open-loop pointing responses) following this type of surgery (Steinbach et al., 1988). In normal binocular adults who were monocularly patched for a one-month period, there was virtually no change in egocenter location (Dengis et al., 1992). We have some unpublished observations in adults who underwent monocular enucleation after a lifetime of normal binocular vision which show limited plasticity in egocenter location.
Figure 19.6: From left to right and beginning at the top middle, cyclops effect, incomplete cyclops, preadult, and adult. Adapted from Dengis et al. (1996).
19.4.6 Learning to Perform Monocular Tasks

If this cyclopean eye is built in, how does one overcome it and learn to use one eye when performing monocular tasks, like looking through a telescope or microscope? Carol Dengis looked at this in normal children who were tested with a variety of tubes (differing in shape and length) and observed a series of behavioral strategies that lead ultimately to adult performance (figure 19.6). Children went from performing the cyclops response to performing what she referred to as a "partial cyclops" response, i.e., placing the tube between the middle of the head and the preferred eye. The third stage was "preadult," where the tube was placed over the preferred eye but with the other eye left open and with the head turned to bring the viewing eye closer to the body midline (this also had the effect of reducing any double vision because the tube obscured the line of sight of the nonpreferred eye). The final, or "adult," stage occurred when the tube was brought to the eye with the head held facing forward and the other eye closed, prior to looking through the tube with the preferred eye. The ability to wink voluntarily follows a similar developmental trend (Dengis et al., 1997). In children who are strabismic, winking skills develop somewhat later than in those with normal binocular vision, presumably reflecting a lessened need to avoid diplopia when performing some monocular tasks. That is, their amblyopia and/or suppression patterns would ensure they were less troubled by the double vision that would be present in binocularly normal children. We also looked at the head turn (sometimes called "face turn" in the clinical literature) that frequently occurs in one-eyed children (Goltz et al., 1997). We found that it almost always occurred with the head turned so as to bring the remaining eye closer to the midline, i.e., towards the missing eye. This would have the beneficial effect of also
Figure 19.7: The mean distance from midline to viewing eye for control adults and children, and for strabismic and enucleated children. Adapted from Dengis, Simpson, Steinbach and Ono (1998).

moving the nose out of the way of the lower visual field. Other authors have described a head turn in the opposite direction (e.g., Helveston et al., 1985), but these instances are associated with the presence of an abduction nystagmus, which has a null point (a position of the eye in the orbit where the nystagmus dampens) that our patients virtually never exhibited.
19.4.7 Performance on a Monocular Task sans Feedback

Every monocular task we perform includes immediate feedback about its success. If we try to look through a microscope, we know from tactile and visual feedback whether or not the preferred eye is aligned with the eyepiece. What happens if we prevent this feedback? Dengis et al. (1998) gave adults and children a tube and instructed them to look through it. Placed directly in front of the subject's face was a liquid-crystal shutter. As soon as the subject started moving the tube to the eye, the shutter became opaque, preventing any visual feedback. At the same time, the glass plate of the shutter prevented any tactile feedback about where on the face the tube might have touched. The results were surprising: the tube was placed at the midline by adults and children with normal binocular visual development as well as by those who were strabismic. The enucleated subjects all placed the tube over their remaining eye (figure 19.7). These results suggest that our orienting responses, when moving ourselves through space, use the midline egocenter as the origin from which we judge direction. Only when we are forced into a monocular task, and have feedback about how we perform that task, do we use a learned pattern of responses developed with a preferred eye. A person who lost an eye at an early age will have that origin for visual direction moved towards the remaining eye. One additional lesson coming from our studies is the need
Figure 19.8: Dynamic path condition: estimated error (in cm) in aligning a looming dot to the nose and ears. Fixed path condition: estimated error (in cm) in aligning a rod to the nose and ear (errors are ±1 SE). Bold characters show actual mean distances in cm. Adapted from Gonzalez et al. (1999).
to study orienting behavior using more natural tasks. We have tried this in the past with some success (Dengis et al., 1995; Gonzalez et al., 1999, see figure 19.8), and others are appreciating its value as well (e.g., Steinman, 2003).
19.5 Conclusions
David Martin Regan was not the first person to study abnormal visual systems in order to learn about normal functions, but he provides an excellent exemplar of the value of doing so. We have followed in this tradition, learning about plasticity in the developing child who undergoes the removal of an eye. Comparing the results of this complete form of deprivation to that resulting from strabismus, cataract, or anisometropia can provide insights about the role of binocular competition in normal development. What we learn about sensitive or critical periods for the development of different visual functions can have implications for the timing of treatments. The development of novel ways of testing young children can provide baseline information on what typical development is, as well as giving us techniques for rehabilitating visual loss in those instances where some plasticity remains. We were pleased to be part of a symposium honoring Professor Regan and his contributions to our understanding of human vision in health and disease.
Acknowledgments We acknowledge the tremendous debt we owe to the children with retinoblastoma, and their families, all of whom unfailingly agreed to participate in our studies. They cheerfully extended the time they spent in the hospital, and they allowed us to come into their homes so that we could take our measurements. Despite the stresses associated with their condition, they willingly gave of their time and energy to take part in scientific activities unrelated to their treatment. They are a very special and generous group of people. We are very grateful to Linda Lillakas for her comments and editorial assistance. Support for these studies has come from the Medical Research Council of Canada, the Natural Sciences and Engineering Research Council, the National Eye Institute of the U. S. National Institutes of Health, the Hospital for Sick Children Foundation, the Sir Jules Thorn Charitable Trust, the Krembil Family Foundation, Atkinson College, and the Vision Sciences Research Program at the Toronto Western Hospital.
References

Atkinson, J. and Braddick, O. (1981). Development of optokinetic nystagmus in infants: an indicator of cortical binocularity? In D. F. Fisher, R. A. Monty, and J. W. Senders (Eds.), Eye Movements: Cognition and Visual Perception, pp. 53-64. Erlbaum: Hillsdale, NJ.

Barbeito, R. (1983). Sighting from the cyclopean eye: The cyclops effect in preschool children. Percept. Psychophys., 33: 561-564.

Blake, R. and Logothetis, N. K. (2002). Visual competition. Nature Rev. Neurosci., 3: 1-11.

Bowns, L., Kirshner, E. L. and Steinbach, M. J. (1994). Shear sensitivity in normal and monocularly enucleated adults. Vis. Res., 34: 3389-3395.

Bradley, A. and Freeman, R. D. (1980). Monocularly deprived humans: Non-deprived eye has supernormal vernier acuity. J. Neurophysiol., 43: 1645-1653.

Campbell, F. W. and Green, D. G. (1965). Optical and retinal factors affecting visual resolution. J. Physiol. (Lond.), 181: 576-593.

Crovitz, H. F. and Lockhead, G. R. (1967). Possible monocular predictors of binocular rivalry of contours. Percept. Psychophys., 2: 83-85.

Daw, N. W. (1995). Visual Development. Plenum: New York.

Day, S. (1995). Vision development in the monocular individual: Implications for the mechanisms of normal binocular vision development and the treatment of infantile esotropia. Trans. Am. Ophthalmol. Soc., 97: 523-581.

Dengis, C. A., Simpson, T., Steinbach, M. J. and Ono, H. (1998). The cyclops effect in adults: Sighting without visual feedback. Vis. Res., 38: 327-331.

Dengis, C. A., Steinbach, M. J., Goltz, H. C. and Stager, C. (1993). Visual alignment from the midline: A declining developmental trend in normal, strabismic and
monocularly enucleated children. J. Ped. Ophthalmol. and Strabismus, 30: 323-326.
Dengis, C. A., Steinbach, M. J. and Kraft, S. P. (1992). Monocular occlusion for one month: Lack of effect on a variety of visual functions in normal adults. Invest. Ophthalmol. Vis. Sci. Suppl., 33: 1154.

Dengis, C. A., Steinbach, M. J., Ono, H. and Gunther, L. (1997). Learning to wink voluntarily and to master monocular tasks: A comparison of normal vs. strabismic children. Binoc. Vis., 12: 113-118.

Dengis, C. A., Steinbach, M. J., Ono, H., Gunther, L. N., Fanfarillo, R., Steeves, J. K. E. and Postiglione, S. (1996). Learning to look with one eye: The use of head turn by normals and strabismics. Vis. Res., 36: 3237-3242.

Dengis, C. A., Steinbach, M. J., Ono, H., Gunther, L. N. and Postiglione, S. (1995). Eye-hand coordination tasks in normals, strabismics and enucleates. Invest. Ophthalmol. Vis. Sci. Suppl., 36: S645.

Dengis, C. A., Steinbach, M. J., Ono, H., Kraft, S. P., Smith, D. R. and Graham, J. E. (1993). Egocentre location in strabismics is in the median plane and is unchanged by surgery. Invest. Ophthalmol. Vis. Sci., 34: 2990-2995.

DiCiommo, D., Gallie, B. L. and Bremner, R. (2000). Retinoblastoma: the disease, gene and protein provide critical leads to understand cancer. Semin. Cancer Biol., 10: 255-269.

Fox, R. (1991). Binocular rivalry. In D. Regan (Ed.), Vision and Visual Dysfunction, Vol. IX, Binocular Vision, pp. 93-110. CRC Press: Boca Raton, FL.

Freeman, A. W. and Jolly, N. (1994). Visual loss during interocular suppression in normal and strabismic subjects. Vis. Res., 34: 2043-2050.

Giaschi, D. and Regan, D. (1997). Development of motion-defined figure-ground segregation in pre-school and older children, using a letter-identification task. Optom. Vis. Sci., 74: 761-767.

Gilbert, C. D. and Wiesel, T. N. (1992). Receptive field dynamics in adult primary visual cortex. Nature, 356: 150-152.

Goltz, H. C., Steinbach, M. J. and Gallie, B. L. (1997). Head turn in 1-eyed and normally sighted individuals during monocular viewing. Arch.
Ophthalmol., 115: 748-750. Gonzalez, E. G., Steinbach , M. J., Ono, H. and Gallic, B. L. (1999). Localization of facial landmarks in binocular and monocular children. Binoc. Vis., 14: 127-136. Gonzalez, E. G., Sleeves, J. K. E., Kraft, S. P., Gallic, B. L. and Steinbach, M. J. (2002). Foveal and eccentric acuity in one-eyed observers. Behav. Brain Res., 128: 71-80. Gonzalez, E. G., Sleeves, J. K. E. and Steinbach, M. J. (1998). Perceptual learning for motion-defined letters in unilaterally enucleated observers and monocularly viewing normal controls. Invest. Ophthalmol. Vis. Sci. Suppl., 39: S400.
Martin J. Steinbach and Esther G. Gonzalez
Gonzalez, E. G., Steinbach, M. J., Ono, H. and Rush-Smith, N. (1992). Vernier acuity in monocular and binocular children. Clin. Vis. Sci., 7: 257-261.
Gonzalez, E. G., Steinbach, M. J., Ono, H. and Wolf, M. (1989). Depth perception in humans enucleated at an early age. Clin. Vis. Sci., 4: 173-177.
Gonzalez, E. G., Weinstock, M. and Steinbach, M. J. (2003). Monocular observers resist peripheral target (Troxler) fading. Invest. Ophthalmol. Vis. Sci., 44: E4815.
Guillery, R. W. (1989). Competition in the development of the visual pathways. In J. G. Parnavelas, C. D. Stern, and R. V. Stirling (Eds.), The Making of the Nervous System, pp. 319-339. Oxford University Press: Oxford.
Helveston, E. M., Pinchoff, B., Ellis, F. D. and Miller, K. (1985). Unilateral esotropia after enucleation in infancy. Am. J. Ophthalmol., 100: 96-99.
Hering, E. (1942). Spatial Sense and Movements of the Eye (C. A. Radde, Trans.). American Academy of Optometry: Baltimore, p. 38. (Original work published 1879.)
Horowitz, M. W. (1949). An analysis of the superiority of binocular over monocular visual acuity. J. Exp. Psychol., 39: 581-596.
Horton, J. C. and Hocking, D. R. (1998). Effect of early monocular enucleation upon ocular dominance columns and cytochrome oxidase activity in monkey and human visual cortex. Vis. Neurosci., 15: 289-303.
Howard, I. P. (2002). Seeing in Depth, Vol. 1. Porteous Press: Thornhill, ON.
Howard, I. P. and Rogers, B. J. (1995). Binocular Vision and Stereopsis. Oxford University Press: New York.
Hoyle, F. (1957). The Black Cloud, pp. 26-27. Penguin: London.
Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160: 106-154.
Hubel, D. H., Wiesel, T. N. and LeVay, S. (1977). Plasticity of ocular dominance columns in monkey striate cortex. Phil. Trans. Roy. Soc. Lond. B, 278: 377-409.
Johnson, C. A., Post, R. B., Chalupa, L. M. and Lee, T. J. (1982). Monocular deprivation in humans: A study of identical twins. Invest. Ophthal. Vis. Sci., 23: 135-138.
Kahn, D. M. and Krubitzer, L. (2002). Massive cross-modal cortical plasticity and the emergence of a new cortical area in developmentally blind mammals. Proc. Nat. Acad. Sci. USA, 99: 11429-11434.
Kiorpes, L. and Movshon, J. A. (2003). Neural limitations on visual development in primates. In L. M. Chalupa and J. S. Werner (Eds.), The Visual Neurosciences, pp. 159-188. MIT Press: Cambridge, MA.
Kratz, K. E. and Spear, P. D. (1976). Effects of visual deprivation and alterations in binocular competition on responses of striate cortex neurons in the cat. J. Comp. Neurol., 170: 141-152.
Kujala, T., Alho, K. and Näätänen, R. (2000). Cross-modal reorganization of human cortical functions. Trends Neurosci., 23: 115-120.
Lee, S. H. and Blake, R. (2002). V1 activity is reduced during binocular rivalry. J. Vis., 2: 618-626.
Levelt, W. J. M. (1965a). Binocular brightness averaging and contour information. Br. J. Psychol., 56: 1-13.
Levelt, W. J. M. (1965b). On Binocular Rivalry. Institute for Perception: Soesterberg, The Netherlands.
Lewis, T. and Maurer, D. (1992). The development of the temporal and nasal visual fields during infancy. Vis. Res., 32: 903-911.
Liu, L., Tyler, C. W. and Schor, C. M. (1992). Failure of rivalry at low contrast: Evidence of a suprathreshold binocular summation process. Vis. Res., 32: 1471-1479.
Lou, L. (1999). Selective peripheral fading: Evidence for inhibitory sensory effect of attention. Percept., 28: 519-526.
Marotta, J. J., Perrot, T. S., Nicolle, D. and Goodale, M. A. (1995). The development of adaptive head movements following enucleation. Eye, 9: 333-336.
Marotta, J. J., Perrot, T. S., Nicolle, D., Servos, P. and Goodale, M. A. (1995). Adapting to monocular vision: grasping with one eye. Exp. Brain Res., 104: 107-114.
McKee, S. P., Levi, D. M. and Movshon, J. A. (2003). The pattern of visual deficits in amblyopia. J. Vis., 3: 380-405.
McKee, S. P. and Westheimer, G. (1978). Improvement in vernier acuity with practice. Percept. Psychophys., 24: 258-262.
Mennemeier, M. S., Chatterjee, A., Watson, R. T., Wertman, E., Carter, L. P. and Heilman, K. M. (1994). Contributions of the parietal and frontal lobes to sustained attention and habituation. Neuropsychologia, 32: 703-716.
Mitchell, D. E. and Timney, B. (1984). Postnatal development of function in the mammalian visual system. In Handbook of Physiology, Section I: The Nervous System, Vol. 3, Part I, Sensory Processes, pp. 507-555. American Physiological Society: Bethesda, MD.
Moidell, B., Steinbach, M. J. and Ono, H. (1988). Egocenter location in children enucleated at an early age. Invest. Ophthalmol. Vis. Sci., 29: 1348-1351.
Mueller, T. J. (1990). A physiological model of binocular rivalry. Vis. Neurosci., 4: 63-73.
Naegele, J. and Held, R. (1983). Development of optokinetic nystagmus and effects of abnormal visual experience during infancy. In A. M. Jeannerod (Ed.), Spatially Oriented Behavior, pp. 155-174. Springer-Verlag: New York.
Nicholas, J., Heywood, C. A. and Cowey, A. (1996). Contrast sensitivity in one-eyed subjects. Vis. Res., 36: 175-180.
Ooi, T. L. and He, Z. J. (2003). A distributed intercortical processing of binocular rivalry: psychophysical evidence. Percept., 32: 155-166.
Ono, H. and Barbeito, R. (1982). The cyclopean vs. the sighting-dominant eye as the centre of visual direction. Percept. Psychophys., 32: 201-210.
Ono, H. and Mapp, A. P. (1995). A restatement and modification of Wells-Hering's laws of visual direction. Percept., 24: 237-252.
Ono, H., Mapp, A. P. and Howard, I. P. (2002). The cyclopean eye in vision: The new and old data continue to hit you right between the eyes. Vis. Res., 42: 1307-1324.
Osuobeni, E. P. (1992). Monocular vernier acuity in normally binocular, monocular, and amblyopic subjects. Optom. Vis. Sci., 69: 550-555.
Poggio, T., Fahle, M. and Edelman, S. (1992). Fast perceptual learning in visual hyperacuity. Science, 256: 1018-1021.
Poggio, G. F., Gonzalez, F. and Krause, F. (1988). Stereoscopic mechanisms in monkey visual cortex: Binocular correlation and disparity selectivity. J. Neurosci., 8: 4531-4550.
Previc, F. H. (1990). Functional specialization in the lower and upper visual fields in humans: Its ecological origins and neurophysiological implications. Behav. Brain Sci., 13: 519-575.
Reed, M., Steeves, J. K. E., Kraft, S. P., Gallie, B. L. and Steinbach, M. J. (1996). Contrast letter thresholds in the non-affected eye of strabismic and unilateral eye enucleated children. Vis. Res., 36: 3011-3018.
Reed, M. J., Steeves, J. K. E. and Steinbach, M. J. (1997). A comparison of contrast letter thresholds in unilateral eye enucleated subjects and binocular and monocular control subjects. Vis. Res., 37: 2465-2469.
Reed, M. J., Steinbach, M. J., Anstis, S. M., Gallie, B. L., Smith, D. R. and Kraft, S. P. (1991). The development of optokinetic nystagmus in strabismic and monocularly enucleated subjects. Behav. Brain Res., 46: 31-42.
Reed, M. J., Steinbach, M. J., Ono, H., Kraft, S. and Gallie, B. L. (1995). Alignment ability in strabismic and eye enucleated subjects on the horizontal and oblique meridians. Vis. Res., 35: 2523-2528.
Regan, D. (1988). Low-contrast visual acuity test for pediatric use. Can. J. Ophthalmol., 23: 224.
Regan, D. and Hong, X. H. (1990). Visual acuity for optotypes made visible by relative motion. Optom. Vis. Sci., 67: 49-55.
Regan, D. and Hong, X. H. (1994). Recognition and detection of texture-defined letters. Vis. Res., 34: 2403-2407.
Richter, S., Vandezande, K., Chen, N., Zhang, K., Sutherland, J., Anderson, J., Han, L., Panton, R., Branco, P. and Gallie, B. L. (2003). Sensitive and efficient detection of RB1 gene mutations enhances care for families with retinoblastoma. Am. J. Hum. Genet., 72: 253-269.
Roelofs, C. O. (1959). Considerations on the visual egocenter. Acta Psychologica, 16: 226-234.
Schmid, L. M., Rosa, M. G. P. and Calford, M. B. (1995). Retinal detachment induces massive immediate reorganization in visual cortex. NeuroReport, 6: 1349-1353.
Simpson, W. A. (1993). Optic flow and depth perception. Spatial Vis., 7: 35-75.
Skrandies, W. (1987). The upper and lower visual field of man: Electrophysiological and functional differences. In D. Ottoson (Ed.), Progress in Sensory Physiology, pp. 1-92. Springer: Berlin.
Steeves, J. K. E., Gonzalez, E. G., Gallie, B. L. and Steinbach, M. J. (2002). Early unilateral enucleation disrupts motion processing. Vis. Res., 42: 143-150.
Steeves, J. K. E., Gonzalez, E. G., Steinbach, M. J. and Gallie, B. L. (1998). Detection and recognition of texture and motion-defined letters in unilaterally enucleated observers and monocularly and binocularly viewing normal controls. Invest. Ophthalmol. Vis. Sci. Suppl., 39: S402.
Steeves, J. K. E., Gray, R., Steinbach, M. J. and Regan, D. (2000). Accuracy of estimating time to collision using only monocular information in unilaterally enucleated observers and monocularly viewing normal controls. Vis. Res., 40: 3783-3789.
Steeves, J. K. E., Reed, M. J., Steinbach, M. J. and Kraft, S. P. (1999). Monocular horizontal optokinetic nystagmus in observers with early- and late-onset strabismus. Behav. Brain Res., 103: 135-143.
Steeves, J. K. E., Wilkinson, F., Gonzalez, E. G., Wilson, H. R. and Steinbach, M. J. (2001). Sensitivity to radial frequency patterns at reduced contrast in unilaterally enucleated observers and monocularly and binocularly viewing controls. Invest. Ophthalmol. Vis. Sci. Suppl., 42: S385.
Steinbach, M. J., Howard, I. P. and Ono, H. (1985). Monocular asymmetries in vision: We don't see eye-to-eye. Can. J. Psychol., 39: 476-478.
Steinbach, M. J., Smith, D. R. and Crawford, J. S. (1988). Egocentric localization changes following unilateral strabismus surgery. J. Pediatric Ophthalmol. Strabismus, 25: 115-118.
Steinman, R. M. (2003). Gaze control under natural conditions. In L. M. Chalupa and J. S. Werner (Eds.), The Visual Neurosciences, pp. 1339-1356. MIT Press: Cambridge, MA.
Tychsen, L. (1993). Motion sensitivity and the origins of infantile strabismus. In K. Simons (Ed.), Early Visual Development: Normal and Abnormal, pp. 364-390. Oxford University Press: New York.
von Noorden, G. K. and Campos, E. C. (2002). Binocular Vision and Ocular Motility: Theory and Management of Strabismus, 6th ed. Mosby: St. Louis.
Westheimer, G., Brincat, S. and Wehrhahn, C. (1999). Contrast dependency of foveal spatial functions: Orientation, vernier, separation, blur and displacement discrimination and the tilt and Poggendorff illusions. Vis. Res., 39: 1631-1639.
Wilkinson, F., Wilson, H. R. and Habak, C. (1998). Detection and recognition of radial frequency patterns. Vis. Res., 38: 3555-3568.
Appendix A: Selected Publications of David Regan
Books

Regan, D. (2000). Human Perception of Objects: Early Visual Processing of Spatial Form Defined by Luminance, Color, Texture, Motion, and Binocular Disparity. Sinauer: Sunderland, MA.
Regan, D. (1989). Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine. New York: Elsevier.
Regan, D. (1972). Evoked Potentials in Psychology, Sensory Physiology and Clinical Medicine. London: Chapman and Hall; New York: Wiley. Reprinted 1975.
Regan, D., Shapley, R. M. and Spekreijse, H. (Eds.) (1985). Systems Approach in Vision. New York: Pergamon.
Regan, D. (Ed.) (1991). Binocular Vision. Vision and Visual Dysfunction series, Vol. 9. London: Macmillan.
Regan, D. (Ed.) (1991). Spatial Vision. Vision and Visual Dysfunction series, Vol. 10. London: Macmillan.

Binocular and Monocular Information about Motion in Depth and Time to Collision

Regan, D. and Beverley, K. I. (1973). Disparity detectors in human depth perception: Evidence for directional selectivity. Science, 181: 877-879.
Regan, D. and Beverley, K. I. (1973). Some dynamic features of depth perception. Vision Res., 13: 2369-2379.
Regan, D. and Beverley, K. I. (1973). The dissociation of sideways movements from movements in depth: Psychophysics. Vision Res., 13: 2403-2415.
Beverley, K. I. and Regan, D. (1973). Evidence for the existence of neural mechanisms selectively sensitive to the direction of movement in space. J. Physiol. (Lond.), 235: 17-29.
Richards, W. and Regan, D. (1973). A stereo field map with implications for disparity processing. Invest. Ophthalmol., 12: 904-909.
Beverley, K. I. and Regan, D. (1974). Temporal integration of disparity information in stereoscopic perception. Exp. Brain Res., 19: 228-232.
Beverley, K. I. and Regan, D. (1974). Visual sensitivity to disparity pulses: Evidence for directional selectivity. Vision Res., 14: 357-361.
Beverley, K. I. and Regan, D. (1975). The relation between discrimination and sensitivity in the perception of motion in depth. J. Physiol. (Lond.), 249: 387-398.
Regan, D. and Beverley, K. I. (1978). Looming detectors in the human visual pathway. Vision Res., 18: 415-421.
Regan, D. and Beverley, K. I. (1978). Illusory motion in depth: Aftereffect of adaptation to changing size. Vision Res., 18: 209-212.
Regan, D., Beverley, K. I. and Cynader, M. (1979). Stereoscopic subsystems for position in depth and for motion in depth. Proc. Roy. Soc. Lond. B, 204: 485-501.
Regan, D. and Beverley, K. I. (1979). Visually-guided locomotion: Psychophysical evidence for a neural mechanism sensitive to flow patterns. Science, 205: 311-313.
Beverley, K. I. and Regan, D. (1979). Separable aftereffects of changing-size and motion-in-depth: Different neural mechanisms? Vision Res., 19: 727-732.
Beverley, K. I. and Regan, D. (1979). Visual perception of changing-size: The effect of object size. Vision Res., 19: 1093-1104.
Regan, D. and Beverley, K. I. (1979). Binocular and monocular stimuli for motion-in-depth: Changing-disparity and changing-size feed the same motion-in-depth stage. Vision Res., 19: 1331-1342.
Regan, D., Beverley, K. I. and Cynader, M. (1979). The visual perception of motion in depth. Sci. Am., 241: 136-151.
Beverley, K. I. and Regan, D. (1980). Visual sensitivity to the shape and size of a moving object: Implications for models of object perception. Perception, 9: 151-160.
Regan, D. and Beverley, K. I. (1980). Visual responses to changing size and to sideways motion for different directions of motion in depth: Linearization of visual responses. J. Opt. Soc. Am., 70: 1289-1296.
Beverley, K. I. and Regan, D. (1980). Temporal selectivity of changing-size channels. J. Opt. Soc. Am., 70: 1375-1377.
Regan, D. and Beverley, K. I. (1981). Motion sensitivity measured by a psychophysical linearizing technique. J. Opt. Soc. Am., 71: 958-965.
Petersik, J. T., Beverley, K. I. and Regan, D. (1981). Contrast sensitivity of the changing-size channel. Vision Res., 21: 829-832.
Beverley, K. I. and Regan, D. (1982). Adaptation to incomplete flow patterns: No evidence for "filling in" the perception of flow patterns. Perception, 11: 275-278.
Regan, D. and Beverley, K. I. (1982). How do we avoid confounding the direction we are looking with the direction we are moving? Science, 215: 194-196.
Beverley, K. I. and Regan, D. (1983). Texture changes versus size changes as stimuli for motion in depth. Vision Res., 23: 1387-1400.
Regan, D. and Beverley, K. I. (1983). Visual fields for frontal plane motion and for changing size. Vision Res., 23: 673-676.
Regan, D. (1985). Visual flow and direction of locomotion. Science, 227: 1063-1065.
Regan, D. (1986). Visual processing of four kinds of relative motion. Vision Res., 26: 127-145.
Regan, D., Collewijn, H. and Erkelens, C. J. (1986). Necessary conditions for the perception of motion in depth. Invest. Ophthalmol. Vis. Sci., 27: 584-597.
Regan, D., Erkelens, C. J. and Collewijn, H. (1986). Visual field defects for vergence eye movements and for stereomotion perception. Invest. Ophthalmol. Vis. Sci., 27: 806-819.
Erkelens, C. J. and Regan, D. (1986). Human ocular vergence movements induced by changing size and disparity. J. Physiol. (Lond.), 379: 145-169.
Hong, X. and Regan, D. (1989). Visual field defects for unidirectional and oscillatory motion in depth. Vision Res., 29: 809-819.
Regan, D. and Hamstra, S. (1993). Dissociation of discrimination thresholds for time to contact and for rate of angular expansion. Vision Res., 33: 447-462.
Regan, D. (1993). The divergence of velocity and visual processing. Perception, 22: 497-499.
Regan, D. (1993). Binocular correlates of the direction of motion in depth. Vision Res., 33: 2359-2360.
Regan, D. and Kaushal, S. (1994). Monocular judgement of the direction of motion in depth. Vision Res., 34: 163-167.
Regan, D. and Vincent, A. (1995). Visual processing of looming and time to contact throughout the visual field. Vision Res., 35: 1845-1857.
Gray, R. and Regan, D. (1995). Cyclopean motion perception produced by oscillations of size, disparity and location. Vision Res., 35: 655-666.
Regan, D., Hamstra, S. J., Kaushal, S., Vincent, A., Gray, R. and Beverley, K. I. (1995). Visual processing of an object's motion in three dimensions for a stationary or a moving observer. Perception, 24: 87-103.
Portfors-Yeomans, C. V. and Regan, D. (1996). Cyclopean discrimination thresholds for the direction and speed of motion in depth. Vision Res., 36: 3265-3279.
Vincent, A. and Regan, D. (1996). Judging the time to collision with a simulated textured object: Effect of mismatching rates of expansion of size and of texture elements. Percept. Psychophys., 59: 32-36.
Portfors-Yeomans, C. V. and Regan, D. (1997). Discrimination of the direction and speed of a monocularly-visible target from binocular information alone. J. Exp. Psychol. Hum. Percept. Perform., 23: 227-243.
Portfors, C. V. and Regan, D. (1997). Just-noticeable difference in the speed of cyclopean motion in depth and of cyclopean motion within a frontoparallel plane. J. Exp. Psychol. Hum. Percept. Perform., 23: 1074-1086.
Gray, R. and Regan, D. (1998). Accuracy of estimating time to collision using binocular and monocular information. Vision Res., 38: 499-512.
Gray, R. and Regan, D. (1998). Motion in depth: Adequate and inadequate simulation. Percept. Psychophys., 61: 236-245.
Kohly, R. and Regan, D. (1998). Evidence for a mechanism sensitive to the speed of cyclopean form. Vision Res., 39: 1011-1024.
Gray, R. and Regan, D. (1999). Adapting to expansion increases perceived time to collision. Vision Res., 39: 2602-2607.
Gray, R. and Regan, D. (2000). Estimating the time to collision with a rotating nonspherical object. Vision Res., 40: 49-63.
Gray, R. and Regan, D. (1999). Do monocular time to collision estimates necessarily involve perceived distance? Percept., 28: 1257-1264.
Gray, R. and Regan, D. (2000). Self-motion causes error in judging time to collision: Implications for highway safety. Curr. Biol., 10: 587-590.
Regan, D. and Gray, R. (2000). Visual factors in collision avoidance and collision achievement. Trends Cog. Sci., 7: 99-107.
Regan, D. and Gray, R. (2001). Hitting what one wants to hit and missing what one wants to miss. Vision Res., 41: 3321-3329.
Regan, D. (2002). Binocular information about time to collision and time to passage. Vision Res., 42: 2479-2484.
Gray, R., Macuga, K. and Regan, D. (2004). Long-range interactions between object-motion and self-motion in the perception of movement in depth. Vision Res., 44: 179-195.
Regan, D. (2003). Fast visual judgements in highway driving and sport. Hebb Award lecture. Canadian Psychology, in press.

Early Visual Processing of Spatial Form Defined by Luminance, Color, Motion, Texture, and Binocular Disparity

Regan, D. and Beverley, K. I. (1983). Visual fields described by contrast sensitivity, by acuity and by relative sensitivity to different orientations. Invest. Ophthalmol. Vis. Sci., 24: 754-759.
Regan, D. and Beverley, K. I. (1983). Spatial frequency discrimination and detection: Comparison of postadaptation thresholds. J. Opt. Soc. Am., 73: 1684-1690.
Burbeck, C. A. and Regan, D. (1983). Independence of orientation and size in spatial discriminations. J. Opt. Soc. Am., 73: 1691-1694.
Regan, D. and Beverley, K. I. (1984). Figure-ground segregation by motion contrast and by luminance contrast. J. Opt. Soc. Am. A, 1: 433-442.
Wilson, H. R. and Regan, D. (1984). Spatial frequency adaptation and grating discrimination: Predictions of a line element model. J. Opt. Soc. Am. A, 1: 1091-1096.
Regan, D. and Beverley, K. I. (1985). Postadaptation orientation discrimination. J. Opt. Soc. Am. A, 2: 147-155.
Regan, D. and Beverley, K. I. (1985). Visual responses to vorticity and the neural analysis of optic flow. J. Opt. Soc. Am. A, 2: 280-283.
Regan, D. (1985). Masking of spatial frequency discrimination. J. Opt. Soc. Am. A, 2: 1153-1159.
Regan, D. (1985). Storage of spatial-frequency information and spatial-frequency discrimination. J. Opt. Soc. Am. A, 2: 619-621.
Regan, D. (1986). Form from motion parallax and form from luminance contrast: Vernier discrimination. Spatial Vis., 1: 305-318.
Regan, D. and Price, P. (1986). Periodicity in orientation discrimination and the unconfounding of visual information. Vision Res., 26: 1299-1302.
Morgan, M. J. and Regan, D. (1987). Opponent model for line interval discrimination: Interval and vernier performance compared. Vision Res., 27: 107-118.
Regan, D. (1989). Orientation discrimination for objects defined by relative motion and objects defined by luminance contrast. Vision Res., 29: 1389-1400.
Regan, D. and Hong, X. (1990). Visual acuity for optotypes made visible by relative motion. Optom. Vis. Sci., 67: 49-55.
Regan, D. and Hamstra, S. (1991). Shape discrimination for motion-defined and contrast-defined form: Squareness is special. Perception, 20: 315-336.
Regan, D. (1991). Prentice medal lecture. Specific tests and specific blindness: Keys, locks and parallel processing. Optom. Vis. Sci., 68: 489-512.
Regan, D. and Hamstra, S. (1992). Shape discrimination and the judgement of perfect symmetry: Dissociation of shape from size. Vision Res., 32: 1845-1864.
Regan, D. and Hamstra, S. (1992). Dissociation of orientation discrimination from form detection for motion-defined bars and luminance-defined bars: Effects of dot lifetime and presentation duration. Vision Res., 32: 1655-1666.
Regan, D., Nakano, Y. and Kaiser, P. K. (1993). Dissociation of chromatic and achromatic processing of spatial form by the titration method. J. Opt. Soc. Am., 10: 1314-1328.
Regan, D. and Hamstra, S. (1994). Shape discrimination for rectangles defined by disparity alone, disparity plus luminance and by disparity plus motion. Vision Res., 34: 2277-2291.
Regan, D. and Hong, X. H. (1994). Recognition and detection of texture-defined letters. Vision Res., 34: 2403-2407.
Hamstra, S. and Regan, D. (1995). Orientation discrimination in cyclopean vision. Vision Res., 35: 365-374.
Regan, D., Gray, R. and Hamstra, S. J. (1995). Evidence for a neural mechanism that encodes angles. Vision Res., 36: 323-330.
Simpson, T. L. and Regan, D. (1995). Test-retest variability and correlations between tests of texture processing, motion processing, visual acuity and contrast sensitivity. Optom. Vis. Sci., 72: 11-16.
Vincent, A. and Regan, D. (1995). Parallel independent processing of orientation, spatial frequency and contrast. Perception, 24: 491-499.
Regan, D. (1995). Orientation discrimination for bars defined by orientation texture. Perception, 24: 1131-1138.
Gray, R. and Regan, D. (1996). Accuracy of reproducing angles: Is a right angle special? Perception, 25: 531-542.
Regan, D., Hajdur, L. V. and Hong, X. H. (1996). Two-dimensional aspect ratio discrimination for shape defined by orientation texture. Vision Res., 36: 3695-3702.
Gray, R. and Regan, D. (1996). Vernier step acuity and bisection acuity for texture-defined form. Vision Res., 37: 1713-1723.
Gray, R. and Regan, D. (1998). Spatial frequency discrimination and detection characteristics for gratings defined by orientation texture. Vision Res., 38: 2601-2617.
Giaschi, D. and Regan, D. (1997). Development of motion-defined figure-ground segregation in preschool and older children, using a letter-identification task. Optom. Vis. Sci., 74: 761-767.
Kwan, L. and Regan, D. (1998). Orientation-tuned spatial filters for texture-defined form. Vision Res., 38: 3849-3855.
Kohly, R. P. and Regan, D. (2000). Coincidence detectors: Visual processing of a pair of lines: Implications for shape discrimination. Vision Res., 40: 2291-2306.
Kohly, R. P. and Regan, D. (2000). Long-distance interactions in cyclopean vision. Proc. Roy. Soc. Lond. B, 268: 213-219.
Kohly, R. and Regan, D. (2002). Fast long-range interactions in the early processing of luminance-defined form. Vision Res., 42: 49-63.
Kohly, R. and Regan, D. (2002). Fast long-range interactions in the early processing of motion-defined form and of combinations of motion-defined, luminance-defined, and cyclopean form. Vision Res., 42: 661-668.
Grove, P. M. and Regan, D. (2002). Spatial frequency discrimination in cyclopean vision. Vision Res., 42: 1837-1846.

Colour Vision Dynamics

Regan, D. and Tyler, C. W. (1971). Wavelength-modulated light generator. Vision Res., 11: 43-56.
Regan, D. and Tyler, C. W. (1971). Some dynamic features of colour vision. Vision Res., 11: 1307-1324.
Regan, D. and Tyler, C. W. (1971). Temporal summation and its limit for wavelength changes: An analog of Bloch's law for color vision. J. Opt. Soc. Am., 61: 1414-1421.

Theoretical: Parallel Sets of Filters

Regan, D. (1982). Visual information channeling in normal and disordered vision. Psychol. Rev., 89: 407-444.

Basic Research on Human Visual Evoked Potentials and Visually-Evoked Magnetic Fields of the Brain

Regan, D. (1966). An effect of stimulus colour on average steady-state potentials evoked in man. Nature, 210: 1056-1057.
Regan, D. (1966). An apparatus for the correlation of evoked potentials and repetitive stimuli. Med. Biol. Eng., 4: 168-177.
Regan, D. (1966). Some characteristics of average steady-state and transient responses evoked by modulated light. Electroenceph. Clin. Neurophysiol., 20: 238-248.
Regan, D. (1968). A high frequency mechanism which underlies visual evoked potentials. Electroenceph. Clin. Neurophysiol., 25: 231-237.
Regan, D. (1968). Chromatic adaptation and steady-state evoked potentials. Vision Res., 8: 149-158.
Regan, D. (1968). Evoked potentials and sensation. Percept. Psychophys., 4: 347-350.
Regan, D. (1970). Evoked potentials and psychophysical correlates of changes in stimulus colour and intensity. Vision Res., 10: 163-178.
Regan, D. (1970). Objective method of measuring the relative spectral luminosity curve in man. J. Opt. Soc. Am., 60: 856-859.
Regan, D. and Cartwright, R. F. (1970). A method of measuring the potentials evoked by simultaneous stimulation of different retinal regions. Electroenceph. Clin. Neurophysiol., 28: 314-319.
Regan, D. and Spekreijse, H. (1970). Electrophysiological correlate of binocular depth perception in man. Nature, 225: 92-94.
Regan, D. and Sperling, H. G. (1971). A method of evoking contour-specific scalp potentials by chromatic checkerboard patterns. Vision Res., 11: 173-176.
Regan, D. and Richards, W. (1971). Independence of evoked potentials and apparent size. Vision Res., 11: 679-684.
Spekreijse, H., van der Tweel, L. H. and Regan, D. (1972). Interocular sustained suppression: Correlations with evoked potential amplitude and distribution. Vision Res., 12: 521-526.
Regan, D. (1973). An evoked potential correlate of colour: Evoked potential findings and single-cell speculations. Vision Res., 13: 1933-1941.
Regan, D. (1973). Evoked potentials specific to spatial patterns of luminance and colour. Vision Res., 13: 2381-2402.
Regan, D. and Richards, W. (1973). Brightness contrast and evoked potentials. J. Opt. Soc. Am., 63: 606-611.
Regan, D. and Beverley, K. I. (1973). Relation between the magnitude of flicker sensation and evoked potential amplitude in man. Perception, 2: 61-65.
Regan, D. and Beverley, K. I. (1973). Electrophysiological evidence for the existence of neurones sensitive to the direction of depth movement. Nature, 246: 504-506.
Cartwright, R. F. and Regan, D. (1974). Semi-automatic, multi-channel Fourier analyzer for evoked potential analysis. Electroenceph. Clin. Neurophysiol., 36: 547-550.
Regan, D. (1974). Electrophysiological evidence for colour channels in human pattern vision. Nature, 250: 437-449.
Regan, D. (1975). Colour coding of pattern responses in man investigated by evoked potential feedback and direct plot techniques. Vision Res., 15: 175-183.
Regan, D. (1975). Recent advances in electrical recording from the human brain. Nature, 253: 401-407.
Regan, D. (1976). Latencies of evoked potentials to flicker and to pattern speedily estimated by simultaneous stimulation method. Electroenceph. Clin. Neurophysiol., 40: 654-660.
Regan, D. and Spekreijse, H. (1977). Auditory-visual interactions and the correspondence between perceived auditory space and perceived visual space. Perception, 6: 133-138.
Regan, D. (1977). Steady state evoked potentials. J. Opt. Soc. Am., 67: 1475-1489.
Regan, D. (1978). Assessment of visual acuity by evoked potential recording: Ambiguity caused by temporal dependence of spatial frequency selectivity. Vision Res., 18: 439-445.
Regan, D. (1978). Investigations of normal and defective colour vision by evoked potential recording. Mod. Probl. Ophthal., 19: 19-28.
Regan, D. (1979). Electrical responses evoked from the human brain. Sci. Am., 241: 134-146.
Regan, D. (1982). Comparison of transient and steady-state methods. Proc. N.Y. Acad. Sci., 388: 46-71.
Spekreijse, H., Dangelie, G., Maier, J. and Regan, D. (1985). Flicker and movement constituents of the pattern reversal response. Vision Res., 25: 1297-1304.
Regan, D. and Spekreijse, H. (1986). Evoked potentials in vision research: 1961-1985. Vision Res., 26: 1461-1480.
Regan, D. and Regan, M. P. (1987). Nonlinearity in human visual responses to two-dimensional patterns and a limitation of Fourier methods. Vision Res., 27: 2181-2183.
Regan, D. and Regan, M. P. (1988). Objective evidence for phase-independent spatial frequency analysis in the human visual pathway. Vision Res., 28: 187-191.
Regan, D. (1989). Magnetic fields generated by the human brain. Can. Res., 22: 11-15.
Regan, D. (1995). Spatial vision in children and adults: A tribute to Russell Harter. Internat. J. Neurosci., 80: 153-172.
Regan, D. and He, P. (1995). Magnetic and electrical responses of the human brain to texture-defined form and to textons. J. Neurophysiol., 74: 1167-1178.
Regan, D. and He, P. (1995). Magnetic and electrical brain responses to chromatic contrast in human. Vision Res., 36: 1-18.
Regan, M. P., He, P. and Regan, D. (1995). An audio-visual convergence area in human brain. Exp. Brain Res., 106: 485-487.
Regan, M. P. and Regan, D. (2002). Orientation characteristics of a mechanism in the human visual system sensitive to cyclopean form. Vis. Res., 42: 661-668.

Visual Psychophysical and Electrophysiological Research in Ophthalmology and Neuro-ophthalmology

Regan, D. and Heron, J. R. (1969). Clinical investigation of lesions of the visual pathway: A new objective technique. J. Neurol. Neurosurg. Psychiat., 32: 479-483.
Regan, D. (1973). Rapid objective refraction using evoked brain potentials. Invest. Ophthalmol., 12: 669-679.
Regan, D. and Spekreijse, H. (1974). Evoked potential indications of colour blindness. Vision Res., 14: 89-95.
Selected Publications of David Regan
Heron, J. R., Regan, D. and Milner, B. A. (1974). Delay in visual perception in unilateral optic atrophy after retrobulbar neuritis. Brain, 97: 755-772.
Milner, B. A., Regan, D. and Heron, J. R. (1974). Differential diagnosis of multiple sclerosis by visual evoked potential recording. Brain, 97: 755-772.
Heron, J. R., Milner, B. A. and Regan, D. (1975). Measurement of acuity variations within the central visual field caused by neurological lesions. J. Neurol. Neurosurg. Psychiat., 38: 356-362.
Regan, D., Milner, B. A. and Heron, J. R. (1976). Delayed visual perception and delayed visual evoked potentials in the spinal form of multiple sclerosis and in retrobulbar neuritis. Brain, 99: 43-66.
Regan, D., Varney, P., Purdy, J. and Kraty, N. (1976). Visual field analyzer: Assessment of delay and temporal resolution of vision. Med. Biol. Eng., 14: 8-14.
Galvin, R. J., Regan, D. and Heron, J. R. (1976). A possible means of monitoring the progress of demyelination in multiple sclerosis: Effect of body temperature on visual perception of double light flashes. J. Neurol. Neurosurg. Psychiat., 39: 861-865.
Galvin, R. J., Regan, D. and Heron, J. R. (1976). Impaired temporal resolution of vision after acute retrobulbar neuritis. Brain, 99: 255-268.
Regan, D. (1977). Speedy assessment of visual acuity in amblyopia by the evoked potential method. Ophthalmologica, 175: 159-164.
Galvin, R. J., Heron, J. R. and Regan, D. (1977). Subclinical optic neuropathy in multiple sclerosis. Arch. Neurol., 34: 666-670.
Regan, D., Silver, R. and Murray, T. J. (1977). Visual acuity and contrast sensitivity in multiple sclerosis: Hidden visual loss. Brain, 100: 563-579.
Regan, D., Murray, T. J. and Silver, R. (1977). Effect of body temperature on visual evoked potential delay and visual perception in multiple sclerosis. J. Neurol. Neurosurg. Psychiat., 40: 1083-1091.
Regan, D. (1978). Investigations of normal and defective colour vision by evoked potential recording. Mod. Probl. Ophthal., 19: 19-28.
Regan, D. and Milner, B. A. (1978). Objective perimetry by evoked potential recording: Limitations. Electroenceph. Clin. Neurophysiol., 44: 393-397.
Regan, D., Whitlock, J., Murray, T. J. and Beverley, K. I. (1980). Orientation-specific losses of contrast sensitivity in multiple sclerosis. Invest. Ophthalmol. Vis. Sci., 19: 324-328.
Regan, D. (1980). Speedy evoked potential methods for assessing vision in normal and amblyopic eyes: Pros and cons. Vision Res., 20: 265-269.
Raymond, J., Regan, D. and Murray, T. J. (1981). Abnormal adaptation of visual contrast sensitivity in multiple sclerosis patients. Can. J. Neurol. Sci., 8: 221-234.
Regan, D., Raymond, J., Ginsburg, A. and Murray, T. J. (1981). Contrast sensitivity, visual acuity and the discrimination of Snellen letters in multiple sclerosis. Brain, 104: 333-350.
Regan, D., Regal, D. M. and Tibbles, J. A. R. (1982). Evoked potentials during recovery from blindness recorded serially from an infant and his normally sighted twin. Electroenceph. Clin. Neurophysiol., 54: 465-468.
Regan, D., Bartol, S., Murray, T. J. and Beverley, K. I. (1982). Spatial frequency discrimination in normal vision and in patients with multiple sclerosis. Brain, 105: 735-754.
Regan, D. (1983). Visual psychophysical tests in demyelinating disease. Bull. Soc. Belg. Ophthal., 208: 303-321.
Regan, D. and Neima, D. (1983). Low-contrast letter charts as a test of visual function. Ophthalmology, 90: 1192-1200.
Neima, D. and Regan, D. (1984). Pattern visual evoked potentials and spatial vision in retrobulbar neuritis and multiple sclerosis. Arch. Neurol., 41: 198-201.
Regan, D. and Neima, D. (1984). Low contrast letter charts in early diabetic retinopathy, ocular hypertension, glaucoma and Parkinson's disease. Br. J. Ophthalmol., 68: 885-889.
Neima, D., LeBlanc, R. and Regan, D. (1984). Visual field defects in ocular hypertension and glaucoma. Arch. Ophthalmol., 102: 1042-1045.
Regan, D. and Neima, D. (1984). Visual fatigue and VEPs in multiple sclerosis, glaucoma, ocular hypertension and Parkinson's disease. J. Neurol. Neurosurg. Psychiat., 47: 673-678.
Regan, D. and Neima, D. (1984). Balance between pattern and flicker sensitivities in the visual fields of ophthalmological patients. Br. J. Ophthalmol., 68: 310-315.
Regan, D. (1985). Evoked potentials and their application to neuro-ophthalmology. Neuro-ophthalmology, 5: 73-108.
Regan, D. and Maxner, C. (1986). Orientation-dependent loss of contrast sensitivity for pattern and flicker sensitivity in multiple sclerosis. Clin. Vision Sci., 1: 1-23.
Regan, D. and Maxner, C. (1987). Orientation-selective visual loss in patients with Parkinson's disease. Brain, 110: 239-271.
Apkarian, P., Tijssen, R., Spekreijse, H. and Regan, D. (1987). Origin of notches in CSF: Optical or neural? Invest. Ophthalmol. Vis. Sci., 28: 607-612.
Regan, D. (1988). Low contrast letter charts and sinewave grating tests in ophthalmological and neurological disorders. Clin. Vis. Sci., 2: 235-250.
Regan, D. (1988). Low-contrast visual acuity test for paediatric use. Can. J. Ophthalmol., 23: 224-227.
Regan, D. (1990). High and low contrast acuity. Optom. Vis. Sci., 67: 650-653.
Kothe, A. C. and Regan, D. (1990). Crowding depends on contrast. Optom. Vis. Sci., 67: 283-286.
Kothe, A. and Regan, D. (1990). The component of gaze selection/control in the development of visual acuity in children. Optom. Vis. Sci., 67: 770-778.
Regan, D., Kothe, A. C. and Sharpe, J. A. (1991). Recognition of motion-defined shapes in patients with multiple sclerosis and optic neuritis. Brain, 114: 1129-1155.
Regan, D. (1991). Do letter charts measure contrast sensitivity? Clin. Vis. Sci., 6: 401-408.
Regan, D., Giaschi, D., Sharpe, J. A. and Hong, X. H. (1992). Visual processing of motion-defined form: Selective failure in patients with parietotemporal lesions. J. Neurosci., 12: 2198-2210.
Giaschi, D., Regan, D., Kraft, S. and Hong, X. H. (1992). Defective processing of motion in the fellow eye of unilateral amblyopes. Invest. Ophthal. Vis. Sci., 33: 2483-2489.
Regan, D., Giaschi, D., Kraft, S. and Kothe, A. C. (1992). Method for identifying amblyopes whose reduced line acuity is caused by defective selection and/or control of gaze. Ophthal. Physiol. Opt., 12: 425-432.
Giaschi, D., Regan, D., Kothe, A. C., Sharpe, J. A. and Hong, X. H. (1992). Motion-defined letter detection and recognition in patients with multiple sclerosis. Ann. Neurol., 31: 621-628.
Regan, D., Giaschi, D. and Fresco, B. (1993). Measurement of glare susceptibility in cataract patients using low-contrast letter charts. Ophthal. Physiol. Opt., 13: 115-123.
Giaschi, D., Regan, D., Kraft, S. and Kothe, A. C. (1993). Crowding and contrast in amblyopia. Optom. Vis. Sci., 70: 192-197.
Regan, D., Giaschi, D. and Fresco, B. (1993). Measurement of glare susceptibility using low-contrast letter charts. Optom. Vis. Sci., 70: 969-975.
Regan, D. and Simpson, T. L. (1995). Multiple sclerosis can cause visual processing deficits specific to texture-defined form. Neurology, 45: 809-815.
Giaschi, D. E., Trope, G. E., Kothe, A. C. and Hong, X. H. (1996). Loss of sensitivity to motion-defined form in patients with primary open angle glaucoma and ocular hypertension. J. Opt. Soc. Am. A, 13: 707-716.
Giaschi, D. E., Lang, A. and Regan, D. (1997). Reversible dissociation of sensitivity to dynamic stimuli in Parkinson's disease: Is magnocellular function essential to reading motion-defined letters? Vision Res., 37: 3531-3534.
Steeves, J. K. E., Gray, R., Steinbach, M. J. and Regan, D. (2000). Accuracy of estimating time to collision using only monocular information in unilaterally enucleated observers and monocularly viewing normal controls. Vision Res., 40: 3783-3789.
Regan, D. (2002). An hypothesis-based approach to clinical psychophysics and to the design of visual tests. Proctor Lecture. Invest. Ophthalmol. Vis. Sci., 43: 1311-1323.
Theoretical Modeling

Regan, M. P. and Regan, D. (1988). A frequency domain technique for characterizing nonlinearities in biological systems. J. Theoret. Biol., 133: 293-317.
Regan, M. P. and Regan, D. (1989). Objective investigation of visual function using a nondestructive zoom-FFT technique for evoked potential analysis. Can. J. Neurol. Sci., 16: 168-179.
Regan, D. and Hong, X. H. (1995). Two models of the recognition and detection of texture-defined letters compared. Biol. Cybern., 72: 389-396.

Vision in Aviation, Highway Safety, and Sport

Beverley, K. I. and Regan, D. (1980). Device for measuring the precision of eye-hand coordination while tracking changing size. Aviat. Space Environ. Med., 51: 688-693.
Kruk, R., Regan, D., Beverley, K. I. and Longridge, T. (1981). Correlations between visual test results and flying performance on the Advanced Simulator for Pilot Training (ASPT). Aviat. Space Environ. Med., 52: 455-460.
Regan, D. (1992). Visual judgements and misjudgements in cricket, and the art of flight. Perception, 21: 91-115.
Kruk, R., Regan, D., Beverley, K. I. and Longridge, T. (1983). Flying performance on the advanced simulator for pilot training and laboratory tests of vision. Human Factors, 25: 457-466.
Kruk, R. and Regan, D. (1983). Visual test results compared with flying performance in telemetry-tracked aircraft. Aviat. Space Environ. Med., 54: 906-911.
Regan, D. (1995). Spatial orientation in aviation: Visual contributions. J. Vestib. Res., 5: 455-471.
Kruk, R. and Regan, D. (1995). Collision avoidance: A helicopter simulator study. Aviat. Space Environ. Med., 67: 111-114.
Regan, D. (1996). Visual factors in catching and hitting. J. Sports Sci., 15: 533-538.
Voisin, A., Elliott, D. B. and Regan, D. (1997). Babe Ruth: With vision like that, how could he play baseball? Optom. Vis. Sci., 74: 144-146.
Gray, R. and Regan, D. (2000). Risky driving behaviour: A consequence of visual motion adaptation for visually-guided goal-directed motor action. J. Exp. Psychol.: Hum. Percept. Perf., 26: 1721-1732.

Visual Single Unit and Slow-Wave Electrophysiology in Animals

Regan, D., Schellart, N. A. M., Spekreijse, H. and van den Berg, T. J. T. P. (1975). Photometry in goldfish by electrophysiological recording: Comparison of criterion response method with heterochromatic flicker photometry. Vision Res., 15: 799-807.
Cynader, M. and Regan, D. (1978). Neurones in cat parastriate cortex sensitive to the direction of motion in three-dimensional space. J. Physiol. (Lond.), 274: 549-569.
Regan, D. and Cynader, M. (1979). Neurons in area 18 of cat visual cortex selectively sensitive to changing size: Nonlinear interactions between responses to two edges. Vision Res., 19: 699-711.
Cynader, M. and Regan, D. (1982). Neurons in cat visual cortex tuned to the direction of motion in depth: Effect of positional disparity. Vision Res., 22: 967-982.
Regan, D. and Cynader, M. (1982). Neurons in cat visual cortex tuned to the direction of motion in depth: Effect of stimulus speed. Invest. Ophthalmol. Vis. Sci., 22: 535-550.
Regan, D. and Lee, B. B. (1993). A comparison of the human 40 Hz response with the properties of macaque ganglion cells. Vis. Neurosci., 10: 439-445.

Basic and Clinical Research on Hearing

Regan, D. and Tansley, B. W. (1979). Selective adaptation to frequency-modulated tones: Evidence for an information-processing channel selectively sensitive to frequency changes. J. Acoust. Soc. Am., 65: 1249-1257.
Tansley, B. W. and Regan, D. (1979). Separate auditory channels for unidirectional frequency modulation and unidirectional amplitude modulation. Sensory Proc., 3: 132-140.
Noseworthy, J., Miller, J., Murray, T. J. and Regan, D. (1981). Auditory brainstem responses in postconcussion syndrome. Arch. Neurol., 38: 275-278.
Tansley, B. W., Regan, D. and Suffield, J. B. (1982). Measurement of the sensitivities of information processing channels for frequency change and for amplitude change by a titration method. Can. J. Psychol., 36: 723-730.
Quine, D. B., Regan, D. and Murray, T. J. (1983). Delayed auditory tone perception in multiple sclerosis. Can. J. Neurol. Sci., 10: 183-186.
Quine, D. B., Regan, D., Beverley, K. I. and Murray, T. J. (1984). Patients with multiple sclerosis experience hearing loss specifically for shifts of tone frequency. Arch. Neurol., 41: 506-508.
Quine, D. B., Regan, D. and Murray, T. J. (1984). Degraded discrimination between speech-like sounds in multiple sclerosis and in Friedreich's ataxia. Brain, 107: 1113-1122.
Regan, D. and Regan, M. P. (1988). The transducer characteristic of hair cells in the human ear: A possible objective measure. Brain Res., 438: 363-365.
Regan, M. P. and Regan, D. (1993). Nonlinear terms produced by passing amplitude-modulated sinusoids through Corey and Hudspeth's hair cell transducer function. Biol. Cybern., 69: 439-446.
Regan, M. P. and Regan, D. (1993). Rectification of amplitude-modulated sinusoids with particular reference to a hair cell transducer function. Proc. IEEE Conf. Systems, Man and Cybernetics, 495-500.
Regan, M. P. and Regan, D. (2001). Simulated hair cell transduction of amplitude-modulated, frequency-modulated and quasi-FM tones. Hearing Res., 158: 65-70.
Author Index Achtman, R. L., 48,53 Addams, R., 139,147, 322, 327 Adelson, E. H., 91, 92, 98, 318, 321, 326, 327, 330, 331 Aglioti, S., 344 Ahlfors, S. P., 336, 343 Mais, D., 290, 292,306,309, 325,327 Alborzian, S., 325, 331 Albrecht, D. G., 273 Albright, T. D., 285, 289, 292, 300, 302, 303, 307, 317,331 Mho, K., 402 Allen, D., 275 Allman, J., 285-287, 296,297, 306 Alones, V, 367 Anand,S.,113,117 Andersen, R. A., 279,285, 296,299, 301, 303, 306, 307, 312, 313, 317, 318, 321, 326, 328,331, 332 Anderson, B. L., 99, 301, 373 Anderson, E., 186 Anderson, J., 403 Anderson, S. J., 287,306 Angilletta, M., 322,330 Anstis, S. M., 48,53, 97, 99,403 Antal, A., 364,366 Araujo, C., 174, 785 Arbuthnott, G. W., 353,367 Arend, L., 16,18, 32 Arguin, M., 115 Aristotle, 322, 327 Arsenault, A. S., 381 Aseltine, J. A., 257,273 Ashkenazy, R., 210 Askensay, J. J., 382 Assad, W., 56 Assoku, E., 118 Atkinson, J., 115,116, 344, 391, 399 Altmann, C. G., 54 Au Young, S., 115
Augath, M., 54,56,312 Baccus, S. A., 310 Badcock, D. R., 45, 51, 52, 53, 55, 56 Bagrow, L, 154, 765 Bahcall, D. O., 181, 185 Bahill, A. T., 140-143,147,151 Bair, W., 44,48, 55 Baker, C.L., 292,306 Bollard, D. H., 170, 785 Baker, R., 12,30 Bammes, G., 17, 30 Banks, S. A., 151 Barbeito, R., 394,399,403 Barlow, H. B., 55, 97, 99, 252, 273, 323, 327, 363 Barrett, B., 106,115 Bartels, A., 314 Barton, J. J. S., 336, 337, 339, 341, 344 Barton-Howard, W., 322,330 Batteli, L., 115 Baylor, D. A., 348, 363 Beall, A. C., 123, 124,147 Beckett, P., 128, 148 Bedell, H. D., 372, 387 Belhumeur, P., 24,30 Belkin, M, 383 Bell, H. H., 287, 309 Belliveau, J. W., 343 Bender, D. B., 285, 304, 307 Bennet, R. W., 263,273 Berezovskii, V. K., 287, 306 Berman, R. A., 323, 327 Bernstein, N. A., 143, 147 Bertone, A., 325, 332 Bertulis, A. V., 95,99 Best, M., 115 Beusmans, J. M., 308 Beverley, K. I., 43, 55, 57, 66, 134, 136, 139, 147, 149, 150, 252, 273, 292, 293, 311,
422 315,327,331 Bex, P. J., 45,52,53, 117, 316,327 Bhalla, M., 310 Biddle, K., 119 Bienfang, D. C., 346 Binmoeller, K. E, 366 Bischof, W. E, 54 Bishop, P. O., 48,54 Bisti, S., 365 Bjornson, B., 115,116 Blake, R., 88, 199, 277, 291, 301, 309, 310, 372, 316, 318, 323, 325, 327, 333, 390, 399,402 Blakeslee, B., 92, 99 Blanke,O.,34l,344 Blaquiere, A., 257, 273 Blaser, E., 186, 324-326, 327, 332 Bloomfield, S. A., 349, 367 Bobak, P., 115 Boden, C., 117 Bodis-Wollner, I., 110,115,351, 352,363-367 Bonneh, Y., 383 Booth, J. R., 199, 209, 210 Bootsma, R., 132, 133,142, 146,147 Borghuis, B. G., 285, 306 Bormans, G., 345 Born, R. T., 285-287, 289, 290, 296, 299, 305, 306,316,322,328,329 Bouman, M A., 281, 306 Bowd,C.,322,328,330 Bowers, P., 119 Bowns, L., 390, 399 Boycott, B. B., 349,364 Boynton, G. M, 775, 321, 325, 329,331 Bracewell, R., 257,273 Braddick, O., 59, 66,115, 775, 776, 234, 273, 285, 291, 292, 306, 316, 318, 321, 328, 336, 344, 391, 399 Bradley, A., 115, 372, 381, 387, 399 Bradley, D. C., 285, 296, 297, 299, 301, 303, 306,307,317,328 Bradshaw, M., 116 Brady, J., 194,212 Brady, T.J., 316, 323,332 Brainard, D., 364 Branca, P., 403 Braun, J., 54 Braunstein, M. L., 300, 301, 307, 313 Breitmeyer, B., 86 Brelstaff, G., 166
Author Index Bremner, R., 400 Brentano,E, 11,30 Bresson, R, 300, 373 Brincat, S., 404 Brindley,G.S.,281,307 Britten, K. H., 317,330 Brunett, J. R., 366 Buckingham, R., 109, 117 Buckley, D., 86 Bullock, T. H., 308 Buracas, G. T., 285, 289, 300, 302, 303, 307, 325, 337 Burdett, E, 109, 118 Burns, R. S., 352,364 Burr, D. C., 45, 53, 55, 181, 785, 287, 306, 316,330 Burrell, G. J., 273 Caine, E. D., 364 Cal, K., 118 Calford, M. B., 404 Calvert, E. S., 123, 148 Camisa,J.M., 351,363 Campbell, F. W., 382, 388, 399 Campos, E. C., 385,404 Camras, C. B., 365 Cao, A., 285, 307 Carandini, M., 308 Cardinal, K.S.,51,53 Carl Zeiss Jena, 12 Carr, R. E., 274 Carter, L. P., 402 Cartwright, R. F., 239,273 Casas, J., 313 Cavanagh, P., 51, 54, 88, 115, 115, 117, 292, 311,338,342,346 Chakravarthy, U., 381 Chalupa, L. M, 407 Chandna, A., 117, 375, 387, 382 Chang, G. C., 306 Chatterjee, A., 402 Chaudhry, S., 136, 748 Chaudhuri, A., 325, 328 Chen, C.-C., 114, 115 Chen, L., 118 Chen, N., 403 Chiuech, C. C., 364 Christou, C., 17, 18, 20, 21, 32 Chubb, C., 59,66,66 Cioni,G., 116, 316,330
De Bruyn, B., 307, 345 De Monasterio, F. M., 364 De Montis, G., 366 de Ridder, H., 15,16,18, 22, 26,30, 33 De Valois, R. L., 53,56, 232, 252, 273,277 De Weerd, P., 308 Deiber, M. P., 54 Delabarre, E. B., 170, 785 Dell'Osso, L. R, 200, 201, 204-207, 210, 211 Demanins, R., 117 Demb, J,, 109,110, 115, 321,329 Dengis, C. A., 392, 395-398,399, 400 Denton, G. G., 138,139, 748 Denys, K., 313 Depriest, D., 345 Derrington, A. M, 290, 307 Desimone, R.,5l, 55, 308, 325,328 Deubel, H., 181, 185 Deyoe, E., 104, 116, 336, 344 Dick. M., 292, 307 DiCiommo, D., 385, 400 Diller, L., 364 Dineen, J., 366 DiPietro, S., 45,55 Djamgoz, M. B. A., 348,364 Do Carmo, M., 14, 31 Dobmeyer, S., 344 Dominici, L,, 116 Donahue, S., 102, 116 Doner, J. F., 309 Donnelly, M, 322, 328 Dasher, B., 172,181, 185, 186, 375,381 Dougherty, R., 116,117, 344 Douglas, R., 116, 323,328 Dowling, J. E., 348, 349, 353, 359,364-367 Downie, A., 102, 776 Downing, P., 325,330 Draovitch, P., 151 Dreher, B., 117 Da Vinci, L, 13,30 Droulez, J., 307 Dacey, D., 349, 350, 364 Drumheller, A. T., 366 Dakin, S. C., 45,48, 52,53 Dale, A. M, 54,118, 311, 316, 323, 332, 343, Du, A., 118 Duffy, C. J., 318, 328 346 Dukelow, S. P., 323, 328 Davidson, B. J., 325,330 Duncan, J., 325, 328 Davidson, R. M., 285, 304, 307 Dupont, P., 297,307,345 Davis, G. C., 352, 364 Dursteler, M. R., 336, 344, 345 Daw, N. W., 362,365, 385, 386, 399 Duzel, E., 312,346 Dawson, G. D., 217, 273 Day, S., 385, 386,399 Ebert,M.H.,364 De Angelis, G. C., 296, 301,307 Ciuffreda, K. J., 372,381 Claparede, E., 30 Clarke, D. D., 136,148 Cleland, B. G., 349,364 Clifford, C. W. G., 51, 52,53,56,93,99 Clifford, W. K., 24,30 Cline,R.,117 Cfynes, M., 252, 273 Cobb, W. A., 217, 273 Cobo, A., 325,332 Cohen, E. R., 323, 329 Cohen, J. L., 353, 364 Colby, C. L, 323,327 Collewijn, H., 188,190,209-212 Conway, T. E., 325, 330 Cook, F. F., 151 Cope, P., 54 Corbetta, M., 337,344 Cornells, E. V. 
K., 18, 22, 26, 30 Cornelissen, P., 109,115 Cornilleau-Peres, V., 300, 307 Cory, E., 116 Cowey, A., 48, 54, 295, 297, 307, 309, 346, 402 Coxeter, H. S. M., 24,30 Craft, W. D., 279, 281, 282, 284, 285, 300, 302, 303,309 Crawford, J. S., 404 Crist, R. E., 382 Croner, L. J., 292, 307 Crooks, L. E., 122,148 Crovitz, H. F., 390, 399 Cudworth, C. J., 138, 150 Culham, J. C., 323, 328 Cunningham, V.J.,54 Curran, W., 298, 312, 318, 321,328 Cutzu,F., 181,185 Cynader, M., 43,55, 252, 273
Edelman, S., 116, 403 Fredericksen, R. E., 323, 324, 332 Edgell, D.,116 Freeman, A. W., 387, 388, 400 Edwards, M, 45, 53,185, 194, 210 Freeman, R. D., 372, 381, 387, 399 Edwards, V., 109, 110, 116, 117 Fresco, B. B., 131,150 Egelhaaf, M, 304,307 Friedman, H. S., 166 Frisby, J. P., 86 Ehrenstein, W., 118,154, 765, 766 Eifuku, S., 285, 307 Frisk, V., 116 Eighmy, B. B., 193, 277 Frisian, K., 54, 311 Einstein, A., 36, 40 Frost, B. J., 285, 307, 313 El Mallah, M. K., 376, 381 Fujita, I., 345 Fukada, Y., 313 Elizondo, M. I., 308 Ellemberg, D., 372, 375,381 Fukuda, Y., 350, 364 Ellis, B., 232, 273 Fuortes, M. G. K, 363 Ellis, F. D., 401 Enroth-Cugell, C., 348-351, 357, 364 Gabor,D.,219, 245, 274 Epelboim, J., 170, 174, 185, 189, 194, 195, Galanis, G., 128, 148 199, 205, 208, 209, 209-272 Gallant, J. L., 48, 51,54 Erchul, D. M, 277 Gallie, B. L., 400, 403, 404 Erickson, R. G., 312, 321, 332 Garness, S. A., 148 Erkelens, C. J., 177, 185,188, 209-211 Gati, J., 56, 323, 328 Eskew, R., 346 Gautama, T., 285, 289, 308 Eskin,T.,113,117 Gegenfurtner, K. R., 288, 308 Evans, L., 138,148 GOT, G.A., 136, 148 Everatt, J., 109, 110,116 Gersch,T.,181,185 Gerschenfeld, H. M., 365 Fahle, M., 375, 381, 382, 403 Ghilardi, M. K, 352, 353, 363, 365 Fallah, M., 325, 330 Giaschi, D., 7, 101-103, 105, 106, 109, 110, Fan, S., 118 115-118, 131, 750, 299, 30S, 377, 323, Fanfarillo, R., 400 328, 339, 344, 345, 390, 400 Faugeras, O., 194, 211 Gibson, E. J., 375, 382 Favreau, O.E.,51,54 Gibson, J.J.,11, 29, 31, 122, 148, 299, 308 Feher, A., 325, 330 Gilbert, C. D., 44, 54, 305, 375, 382, 386, 400 Felmingham, K., 102, 110, 116 Gilroy,L.A.,312 Ferber, S., 336, 343, 344 Gizzi, M. S., 330 Ferrera, V. P., 45, 53, 55, 56, 323, 333, 336, Glass, L., 44, 48, 54 344 Goddard, P. A., 290, 307 Field, D., 44, 48, 53,117, 374, 382 Goebel, R., 321,330 Fiorentini, A., 116, 316, 330, 351, 365 Goltz, H. C., 397, 399, 400 Fisher, S. K., 365 Gombrich, E. H., 24, 37 Fitch, H. L, 143, 748 Gonzalez, E. 
G., 386-388, 390-392, 398, 400, Fize, D., 313 401, 403, 404 Flach, J. M., 127, 128, 138,148, 149 Goodale, M. A., 5, 7, 56, 323, 325, 337, 344, 345, 402 Flanagan, P., 51, 54 Flam, M. C., 372, 381 Gorea, A., 86, 88, 325, 330 Foley, J. M., 252,274 Gossweiler, R., 310 Gouras, P., 364 Forofonova, T. I., 272 Graham, J. E., 400 Forsyth, D., 16,24,31 Fowler, S., 115 Graham, M., 300, 311 Fox, R., 388, 400 Granit, R.,216, 274 Frackowiak, R. S., 54 Gray, R., 122, 124, 127, 130, 133, 136, 138,
Author Index 139, 141, 143, 145, 147,148-150, 404 Green, D. G., 372, 382, 388, 399 Green, D. M., 283, 308 Greenhalgh, T., 382 Greenlee, M. W., 316, 328 Gregory, R. L., 84, 86 Grieve, K. L., 308 Grill-Spector, K., 115,116 Gmeger,J.A.,138,149 Groh, J. M., 306 Grosse-Ruyken, M. L., 323, 332 Grossman, E., 316, 327 Gruber, O., 323, 332 Grunewald, A., 317,323, 324,328 Griisser, O., 308 Grusser-Comehls, U., 285, 308 Gstalder, R. J., 372, 382 Guillery, R. W., 386, 401 Gulyas, B., 309 Gunn, A., 105, 106,116 Gunther, L., 400 Guzetta, A., 116 Habak, C., 404 Hadjikhani, N., 48,54 Hagedorn, P., 257,274 Hagen, B. A., 128,148 Hamasaki, D., 362, 365, 367 Homer, R. D., 275 Hammond, P., 285,308, 323,328 Hampson, E. C., 361, 365 Hamstra, S., 292, 298, 311, 315, 331 Han, L., 403 Hanes, D. P., 312 Hankins, M. W., 359,365 Hansen, P., 118 Hariharan, S., 382 Harris, L. R., 5, 7 Harris, M. G., 318, 323, 329 Hart, P. M., 381 Harwerth, R. S., 372, 383 Hashimoto, T., 345 Hassard, F. A., 323, 328 Hassenstein, B., 57, 66 Hatton, R. G., 17, 31 Hausen, K., 277,307 Hautzel, H., 323, 329 Hawken, M. J., 311 Hawkins, T. D. F., 273 Hayashi, C., 257, 274
425 Hayes, A., 44, 53 He, P., 233,236,250,276, 277 He, S., 323,329 He, Z. J., 402 Head, H., 84, 87 Hecht, H., 16,18, 32 Hecht, 5., 281,308 Hedgd,J., 51,54 Heeger, D. J., 115, 308, 317, 321, 323, 329, 344 HeikMla, R., 366 Heilman, K. M., 402 Heino,A., 138,151 Heinze, H. J., 312, 346 Held, R.,39l, 402 Helmholtz, H., 37,40 Helveston, E. M., 397,401 Henaff, M.-A., 115 Hendler, T., 116 Hendry,S.,104,116 Hering, E., 401 Heron, J. R., 238, 239, 276 Herst, A., 211 Herzon, H., 323,329 Hess, R. F., 44, 48, 53, 75, 87, 109, 777, 778, 372, 374, 375, 387, 382 Heywood, C. A., 48, 54,402 Hong, X. H., 7 Hibbard, P., 116 Hikosaka, K., 313 Hildebrand, A., 13, 31 Hill, R. M, 323, 327,363 Hirsch, M., 257, 274 Ho, C., 109, 117 Hoag, R., 113, 117 Hobson,E.W.,26l,274 Hocking, D. R., 386,401 Hoffman, D., 166 Hoffman, J. E., 181, 185, 325, 329 Hoffmann, E. R., 127,139,149 Hoi, K., 313,318,332 Holcombe, A. O., 325, 327 Hollander, H., 365 Holliday, 1. E., 75, 87 Homes, G. M., 87 Honda, Y., 345 Hong, X. H., 101,116,118,252,274,308,311, 344, 345, 390, 403 Hooge, I. T. C., 117, 185 Hooven, T. A., 211
426 Horowitz, M. W., 389, 401 Horridge, G. A., 304,308 Horton, J. C., 386, 401 Hotson,J.,ll3,lI7 Howard, I. P., 385, 388, 392, 401, 403, 404 Howell, E. R., 372,382 Hoyte, F., 130,149, 391, 401 Hsiao, C. F., 364 Hu, X., 323, 329 Hu, Y. S., 383 Hubbard, A. W., 122,128, 140,149 Hubel, D. H., 3, 7,44, 54, 336, 345, 386, 401 Hughes, H., 384 Huk, A. C., 288,308, 317, 323,329, 341,344 Humphrey, G. K., 337, 344 Humphreys, G., 325, 328 Hurlbert, A. C., 297, 298, 310 lawai, E., 342, 345, 346 Ikeda, H., 359, 365 Intrilligator, J., 115, 345, 346 Ishibashi, T., 367 /to, H., 364 /to, M, 342,345 Itzchak, K, 116 Iwai, E., 313 Jacobowitz, D. M, 364 Jacobs, J. B., 201,211 Jacobs, T. S., 18, 31 Jaglom, I. M., 25, 26, 31 Jakobson, L., 102,110,116 James, T. W., 56 Jansen, T. C,, 210 Jeffcoat, G. O., 136,149 Jenkin, M., 5, 7 Jenkins, G. M., 257,274 Jennings, A., 128,148 Jensen, R. J., 362, 365 Jobe, F. W., 150 Johnson, C. A., 387,401 Jolicoeur, F. B., 366 Jolly, N., 388,400 Jones, A., 191, 211 Jones, H. E., 285, 308 Jones, J., 136,148 Julesi, B., 45, 51, 54, 74, 80, 84, 86, 87, 236, 274, 318, 329 Kaas, J. H., 286, 306
Author Index Kahn, D. M, 386, 401 Kaiser, M, 142, 150, 316, 329 Kamermans, M., 349, 365 Kanizsa, G., 70, 71,87 Kanwisher, N., 54, 325, 330 Kapadia, M. K., 287, 288, 308 Kapin, J., 364 Kaplan, E., 350, 365 Kappers, A. M. L, 11, 12, 14-16, 18, 19, 21, 22, 26,27, 29, 31-33 Karnavas, W. J., 141-143, 147 Kami, A., 375, 382 Kasamatsu, T., 383 Kashii, S., 345 Kastner, S., 285, 308 Kato, S., 367 Katz, L., 117 Kaufman, L., 233, 274 Kaufmann, F., 117 Kaufmann-Hayoz, R., 105,117 Kaushal, S., 132, 133, 142, 150 Kawato, M., 16, 32 Kelly, D. M., 52, 54 Kelly, L., 148 Kelly, S., 109,117 Kemper, T., 346 Kennard, C.,54 Kikuchi, R., 342, 345 Killingsworth, E. A., 309 Kim, J., 290,308, 323,333, 338, 342,345, 346 King, L. V., 266, 274 Kingdom, F., 92, 99, 100 Kiorpes, L., 386, 401 Kiper, D.C., 5l, 53, 308 Kirshner, E. L, 399 Klein, F., 31 Klein, S., 87, 372, 374, 382, 383 Kleiss, J. A., 128, 149 Klitz, T. S., 211 Koch, C., 311 Koenderink, J. J., 11, 12, 14-22, 24, 26-29, 31-33, 75, 88, 282, 300, 303, 308, 3/4 Koffka, K., 154, 159, 165 Kohly,R.,315,329 Kohn, A., 323,329 Kohn, M., 273 Kolb, H., 348-350, 364, 365 Kolers, R A., 97,100, 199,211 Kontsevich, L. L, 74-76, 81, 82, 87, 88 Kopin, I. J., 364
Legge, G. E., 199,211,252,274 Lemay, M., 346 Lennie, P., 104, 117, 232, 274, 372 Leutgeb, S., 312 LeVay,S.,401 Levelt, W. J. M., 388,402 Leventhal, A., 104, 117, 297, 309, 372 evi, D. M, 75, 87, 106, 117, 372, 374-376, 381-383, 402 Levick, W. R., 44,55,252,273,349,364 Levinson, E., 277, 318, 329 Levinson, J. Z., 194, 200,212 Levitt, J. B., 287, 288, 309 Lewis, M. R, 128, 149 Lewis, T., 387,402 Lifshitz, K., 273 Likova, L. T., 76, 78,87 Lindberg, K. A., 365 Lindt, M. V., 366 Lisberger,S.G.,317,330 Lishman, R. R., 138, 750 Liu, A. K., 54, 343 Liu, L., 88, 285, 289, 309, 390, 402 Livingstone, M. S., 322, 329, 336, 345 Lockhead, G. R., 390,399 Logothetis, N. K., 54,56,312, 390,399 Logvinenko, A. D., 209, 211 Longridge, T., 134, 749 Lafond, B., 366 Look, R..B.,316,323,332 Lagae, L., 285,309, 316, 318, 329 Loomis, J. M., 123, 124, 747, 285, 289, 291, Lagreze, W. D., 372, 382 299,310 Lammertsma, A. A., 54 Lorenceau, J., 290, 292, 293, 309, 372 Land, M., 173,186 Lou, L., 390, 402 Landis, T., 55, 344 Lovegrave, W., 51, 55 Landy, M. S., 88 Lu, Z. L., 181, 185, 186, 233, 274, 327, 375, Lang, A., 116 387 Langewiesche, W., 122, 749 Luce, R. D., 282, 308 Lankheet, M. J., 306, 323-325, 328, 329 Lappin, J. S., 17,18,20, 21,32, 279,281-285, Lueck, C. J., 48, 54 287, 290-294, 300, 302, 303, 309, 310, Lukasewycz, S., 306 Lund, J. S., 287, 288, 309 372 Luneburg, R. K., 39, 40,41 Larish, J. K, 128,138,148,149 Luner, V., 237, 277 LaRitz, T., 140, 147 Lyons, C., 116,117 Lasater, E. M., 349, 359,365 Lauwers, K., 297, 309 Ma-Nairn, T., 383 Ledgeway, T.,118 Lee, B., 238,276, 364 Macho, K. A., 4, 7 Maes, H., 310, 316,329 Lee, D. N., 45,54,133,138, 749 Lee, P. Y., 366 Maffei, L., 116, 351, 357,365 Lee, S. H., 291, 309, 390,402 Maguire, G. W., 362, 365 Lee, T. J., 401 Makous,W., 316,327
428 Malach, R., 116, 316, 323,332 Malinov, I. V., 189, 208, 277 Malladi,R., 71,88 Maloney, R. K., 45,55 Mandl, G., 285, 309 Mangel, S. C., 348, 366 Mangun, G. R., 312, 346 Mansfield, J. S.,211 Mapp, A. P., 385, 392, 403 Marcar, V. L., 297, 299, 309-311, 313, 314 Marchak, F., 384 Marchal, G., 307, 310, 313, 346 Marchel, G., 118 Mariani, A., 349,365, 366 Markey, S., 364 Marotta, J. J., 392,402 Man, D., 72, 74, 79, 80, 86, 87, 194, 211, 310 Marrocco, R. T., 55 Marsh, G., 123, 149 Martin, P. R., 51, 53 Martinez-Trujillo, J. C., 325, 332 Martins, A. J., 197,211 Marx, M., 775, 352, 363, 365, 366 Mason, A., 115 Mather, G., 97, 99, 323, 329 Matsui, Y., 345 Matsumoto, N., 349, 366 Maunsell, J. H., 104, 117, 277, 312, 317, 329, 336,344, 345 Maurer, D., 387, 402 Meaner, C., 110,118 May hew, J. E., 86 Mayhoe, M. M, 185 Mazer, J. A., 54 McBeath, M. K., 142, 749 McCollough,C.,51,55 McCourt, M. E., 92, 99 McCree,K.J.,2l6,274 McGraw, P., 52, 55,118 McGuiness, E., 306 Mcllhagga, W., 117,382 McKee, S. P., 87, 387, 389, 402 McPeek, R. M., 177, 186 Meese, T. S., 318, 329 Meister, M, 370 Meizin, E, 306 Melara, R. D., 325, 330 Melcher, D., 174,186 MelvillJones, G., 193,212 Meng, M., 118
Author Index Mennemeier, M. S., 390,402 Mennie, N., 173,186 Menon, R., 56, 323, 328 Merigan, W., 104,113, 117 Mermoud, C., 344 Mertens, H. W., 128, 149 Metha,A.B.,3l6,327 Metzger, W., 165 Michiels, J., 307 Midgett, J., 310 Miezin, F. M., 312, 344 Mikami, A., 321,322,330,345 Miles, F. A., 193,211 Miller, J., 84, 87 Miller, K., 401 Milner, A. D., 5, 7, 337, 344, 345 Mingolla,E., 15,32 Mirolli, M., 366 Mishkin,M.,4,7 Mitchell, D. E., 386, 402 Mitchell, J. F., 325, 330 Mitchison, G. J., 55, 87 Mitra, S., 115 Miyogl, S., 118 Mizobe, K., 373,383 Moidell, B., 394, 402 Mbller, P., 297, 298,310 Monasterio, F. M., 55 Montanaro, D., 316, 330 Montegut, M. J., 316, 329 Morgan, M., 57, 59, 66, 66,75, 87 Morinaga, S., 157, 165 Morrill, P., 115 Morrone, M. C., 45, 55, 185, 316,330 Morrow, M. J., 336, 341, 345 Mortelmans, L, 307, 345 Mortimer, R. G., 127,139, 149 Mouat, G. S., 323,328 Moulden, B., 92, 99, 700 Mounts, J. R., 181, 186, 325, 330 Movshon, J. A., 44, 48, 55, 56, 252, 274, 308, 317, 318, 323, 327, 329, 330, 402 Movson, S. P., 386,401 Muckli,L.,32l,330 Mueller, T. J., 388,402 Mueller-Gaertner, H. W., 323, 332 Mukai, L, 325, 330 Mulder, M., 130, 149 Muller-Gartner, H. W., 323, 329 Mund, T., 312
Murakami, I., 290, 310 Murray, S. O., 301, 304, 310 Mylin, L., 115 Näätänen, R., 402 Naegele, J., 391, 402 Nagamine, T., 345 Naka, K. I., 349, 366 Nakamura, H., 336, 345 Nakayama, K., 73, 87, 165, 186, 279, 285, 289, 291, 299, 307, 310 Nawrot, M., 118, 291, 292, 301, 310 Nealey, T. A., 336, 344, 345 Needham, T., 28, 32 Negishi, K., 367 Nelson, B., 325, 329 Nelson, J. I., 242, 274 Nelson, R., 365 Newsome, W. T., 296, 301, 307, 317, 321, 329, 331, 336, 345 Neyton, J., 365 Nguyen-Legros, J., 361, 366 Nicholas, J., 388, 391, 402 Nicklas, W. J., 352, 366 Nicolle, D., 402 Nishida, S., 322, 323, 330 Niyogi, S., 311 Norcia, A. M., 80, 87, 117, 242, 274, 275, 373, 381-384 Norman, J. F., 73, 87 Nothdurft, H. C., 308 Nurnberg, W., 21, 32 O'Brien, J., 115, 344 O'Bryan, P., 363 O'Connell, D., 300, 313 O'Craven, K., 325, 330 O'Donnell, H. L., 325, 331 O'Regan, J. K., 170, 186, 199, 211 Ogle, K. N., 133, 150 Oliver, P., 353, 366 Olshausen, B. A., 310 Olveczky, B. P., 285, 310 Ono, H., 385, 392, 399-404 Onofrj, M. C., 365 Ooi, T. L., 390, 402 Orban, G. A., 118, 252, 275, 286, 297, 301, 304, 307, 309-311, 313, 314, 316, 329, 337, 345, 346 Osuobeni, E. P., 387, 403
Oudejans, R. R., 133, 147 Pacey, I., 115 Pacioli, L., 87 Pack, C. C., 322, 328 Packer, O., 364 Pakneshan, P., 325, 332 Palmer, C., 211 Pantle, A., 277 Ponton, R., 403 Papathomas, T. V., 324-326, 330, 332 Pare, E. B., 336, 345 Parkin, B., 52, 56 Parrish, E., 106, 117 Patel, S. S., 309 Patterson, R., 322, 328, 330 Paul, P., 109, 117 Pavel, M., 172, 174, 185 Peirce, J. W., 312 Pek, J., 185 Pennefather, P., 117, 381, 382 Peper, C. E., 142, 146, 147 Peppe, A., 364, 366 Perez, R., 44, 54 Perge, J. A., 306 Perotti, V. J., 300, 303, 310 Perrot, T. S., 402 Perry, J., 150 Perry, R. J., 314 Perry, V., 104, 118, 350, 366 Petersen, S., 312, 344 Peterson, B., 364 Pettet, M. W., 383 Peuskens, H., 313 Phelps, M. T., 52, 55 Phillips, R., 310 Phinney, R., 322, 330 Piccolino, M., 116, 349, 353, 359, 365-367 Pieron, H., 216, 275 Pigarev, I. N., 308 Pinchoff, B., 401 Pinilla, T., 325, 332 Pink, M., 150 Pinna, B., 154, 155, 162, 164, 165, 166 Pinsk, M. A., 308 Pirenne, M. H., 27, 32, 308 Pizlo, Z., 185, 209-212 Pleijsant, J. M., 149 Podos, S. M., 366 Poggio, G. F., 252, 275, 387, 388, 403
Poggio, T., 74, 80, 87, 194, 211, 257, 277, 375, 382 Pohndorf, R., 322, 330 Pokorny, J., 274 Polat, U., 117, 372-376, 382-384 Pollick, F. E., 16, 32 Ponce, J., 16, 24, 31 Portfors-Yeomans, C. V., 133, 150 Posner, M. I., 325, 330 Post, R. B., 401 Postiglione, S., 400 Poston, T., 16, 32 Prazdny, K., 48, 55 Previc, F. H., 390, 403 Proffitt, D., 142, 150, 300, 303, 310, 316, 329 Puckett, J. D., 208, 211 Pylyshyn, Z. W., 325, 327
Qian, N., 307, 318, 321, 326, 331 Qiu, M., 118 Raiguel, S., 285-288, 299, 309-311, 313, 314, 316, 329 Ramachandran, V. S., 70, 83, 84, 87, 88 Rauber, H. J., 313, 318, 332 Rawson, P., 14, 32 Raymond, J., 110, 115, 323, 325, 331, 344 Recanzone, G. H., 317, 322, 331 Reed, M., 109, 118, 386, 387, 391, 403, 404 Rees, G., 288, 311 Regan, D., 3, 5, 7, 8, 35, 37, 41, 43, 55, 57, 66, 67, 88, 101-106, 110, 113, 116, 118, 122, 124, 127, 130-134, 136, 138-142, 147-150, 154, 166, 210, 216, 217, 223, 225, 226, 228, 229, 231, 233, 234, 236-239, 242, 243, 245, 247, 249, 250, 252, 253, 255, 273-277, 280, 285, 291-293, 295, 296, 298, 299, 308, 311, 315, 316, 326, 327, 329, 331, 339, 341, 343, 344-346, 351, 366, 387, 390, 400, 403, 404 Regan, M. P., 220, 245, 247, 249, 250, 252-255, 266, 276, 277 Reichardt, W., 57, 66, 252, 257, 277, 307 Reichert, C. M., 364 Reid, R., 95, 100, 104, 116 Rentschler, I., 44, 51, 55 Reppas, J., 114, 118, 297, 311, 316, 323, 332 Ress, D., 308, 323, 329 Reynolds, J. H., 325, 330, 331 Richardson, A., 115 Richter, S., 385, 403 Riddoch, G., 87 Riemann, B., 38-40, 41 Rieth, C., 106, 118 Ringach, D. L., 311 Rivers, W. H., 87 Rivest, J., 292, 311, 340, 345, 346 Rizzo, M., 102, 118 Roberts, W. A., 52, 55 Robson, J. G., 348-351, 357, 364 Rodieck, R. W., 117, 348, 350, 351, 366 Rodman, H. R., 317, 331 Rodriguez, A. M., 166 Roelofs, C. O., 394, 403 Rogers, B., 300, 311, 392, 401 Rorschach, H., 12, 33 Rosa, M. G. P., 404 Rosen, B. R., 316, 323, 332 Rosier, A., 307 Ross, J., 45, 53, 185 Rubenstein, B. S., 382 Rubin, E., 154, 165, 166 Ruddle, R. A., 139, 150 Rush-Smith, N., 401 Rusted, J., 173, 186 Ryan, J., 110, 118 Sachs, H., 26, 33 Sachtler, W. L., 290, 311 Saenz, M., 325, 331 Safran, A. B., 344 Sagi, D., 307, 373, 375, 382-384 Saito, H., 313 Salzman, C. D., 317, 330 Sandell, J. H., 277 Sansbury, R. V., 212 Sarti, A., 71, 88 Sáry, G., 292, 311, 346 Sato, T., 322, 323, 330 Saudargene, D. S., 95, 99 Saunders, R., 309 Sceniak, M. P., 287, 288, 311 Schall, J. D., 312 Scheich, H., 312, 346 Schein, S. J., 48, 51, 55 Schenk, T., 295, 311, 343, 346 Schiffman, H. R., 199, 211 Schiller, P. H., 252, 253, 277, 285, 297, 307, 312
Schmid, L. M., 386, 404 Schmitz, N., 323, 329, 332 Schmolesky, M. T., 299, 309, 312 Schneider, W. X., 181, 185 Schoenfeld, M. A., 299, 312, 336, 343, 346 Schor, C. M., 402 Schrauf, M., 102, 105, 118 Schumann, F., 68, 88 Schwartz, E. A., 349, 366 Schwarz, J., 312 Schwarz, U., 322, 331 Sclar, G., 288, 312 Seeck, M., 344 Seidemann, E., 321, 329, 331 Seiffert, A. E., 336, 346 Seiple, W. H., 274 Sekuler, R., 199, 211, 252, 277, 291, 310, 316, 318, 327, 329 Selenow, A., 381 Seng, C. N., 122, 140, 149 Sereno, M. E., 301, 312 Sereno, M. I., 304, 311, 312 Sereno, M., 118 Sethian, J. A., 71, 88 Seu, L., 45, 55 Severin, C., 366 Shaffer, B., 143, 150 Shah, N. J., 323, 329, 332 Shannon, C. E., 282, 312 Shannon, E., 310 Shapiro, L. S., 194, 212 Shapley, R., 95, 100, 104, 118, 311, 350, 365, 366 Sharpe, J. A., 7, 101, 116, 118, 308, 311, 341, 344-346, 366 Shaw, M. L., 172, 180, 186 Shaw, P., 172, 180, 186 Shepard, R. N., 84, 88 Sherren, J., 87 Shiffrar, M., 290, 293, 309, 312 Shih, S. I., 325, 331 Shimojo, S., 73, 87, 165, 290, 310 Shipley, E. F., 283, 312 Shlaer, S., 308 Shorter, S., 322, 328 Shoup, R. E., 54 Shulman, G. L., 297, 312, 344 Shy, M., 92, 100 Siegel, R. M., 301, 312 Sigman, M., 382
Sillito, A. M., 308 Silverman, G. H., 73, 87 Simpson, G. V., 343 Simpson, T., 399 Simpson, W. A., 392, 404 Singer, W., 67, 88, 321, 330 Sireteanu, R., 106, 118, 372, 382 Skavenski, A. A., 186, 192, 197, 212 Skelton, N., 136, 149 Skelton, R. A., 154, 165 Skillen, J., 52, 55 Skrandies, W., 390, 404 Slaghuis, W., 110, 118 Smale, S., 257, 274 Smeed, R. J., 136, 149 Smith, A. T., 285, 298, 308, 312, 323, 328 Smith, D. R., 400, 403, 404 Smith, E. L., 362, 365 Smith, M. A., 44, 48, 55, 56 Smith, R. G., 361, 367 Smith, S., 52, 56 Smith, V. C., 274 Snowden, R. J., 45, 56, 139, 150, 297, 312, 316, 321, 331, 332 Snyder, C. R., 325, 330 Sohn, W., 325, 326, 332 Solomon, S. G., 51, 53, 287, 312 Soloviev, S., 346 Somers, D. C., 346 Sorensen, R., 110, 118 Spear, P. D., 386, 401 Spehar, B., 51, 53, 92, 93, 99, 100 Spekreijse, H., 223, 226, 229, 234, 237, 252, 257, 276, 277, 349, 365 Sperling, G., 57, 59, 65, 66, 172, 186, 325, 327, 331 Sperling, H., 226, 276 Spetch, M., 54 Spillmann, L., 154, 166 Spinetti, L., 344 Stager, C., 399 Staller, J. D., 283, 309 Stanard, T., 148 Stanek, K., 116 Stanzione, P., 366, 367 Steeves, J. K. E., 389-391, 400, 403, 404 Stefanelli, M. A., 45, 55 Stein, J., 115, 118 Steinbach, M. J., 392, 395, 399-404 Steinman, R. M., 185, 189, 191, 192, 194, 200,
208, 209-212, 398, 404 Stepanov, M. R., 210 Sereno, M. I., 316, 323, 332 Stewart, D., 138, 150 Stewart, I., 16, 32 Stimpson, N., 139, 150 Stoker, J. J., 257, 277 Stone, J., 348, 366 Stone, L., 172, 186, 290, 313 Stoner, G. R., 325, 330, 331 Strubecker, K., 24, 26, 33 Stucki, M., 117 Subramaniam, B., 181, 185 Sunaert, S., 118, 310, 313, 336, 346 Suppes, P., 170, 185, 282, 308 Sutherland, J., 403 Sutherland, N. S., 323, 332 Suzuki, S., 338, 342, 345, 346 Swets, J. A., 283, 308 Switkes, E., 48, 51, 53, 54, 56
Tadin, D., 284, 287, 288, 290, 293, 294, 305, 309, 312 Tagliati, M., 352, 354, 364, 366, 367 Takeuchi, T., 290, 312 Talbot, W. H., 252, 275 Talcott, J., 110, 118 Taleghani, A., 210 Tamura, H., 345 Tanaka, K., 285, 286, 313, 345 Tanne, D., 375, 382, 384 Tansley, B. W., 8 Taylor, J. G., 323, 329, 332 Teghtsoonian, M., 138, 150 Teghtsoonian, R., 138, 150 Teller, D. Y., 281, 313 Tellmann, L., 323, 329 Teranishi, T., 359, 367 Théry, M., 291, 313 Thibos, L., 44, 55, 115 Thompson, H. T., 87 Thompson, I. D., 274 Thompson, K. G., 312 Thorell, L., 273 Tigges, J., 250, 277 Tigges, M., 250, 277 Timney, B., 386, 402 Tipper, S. P., 325, 331 Tittle, J. S., 301, 313 Tjan, B. S., 211 Todd, J. T., 15, 18, 19, 26, 32, 33, 73, 87, 142, 150, 300, 310, 313 Todorovic, D., 92, 100 Toet, A., 75, 88 Tolhurst, D. J., 48, 53, 274 Tolias, A. S., 54 Tootell, R. B., 54, 118, 285, 287, 296, 306, 311, 316, 323, 328, 332, 346 Tosetti, M., 316, 330 Tramo, M., 115 Treisman, A., 115 Tresilian, J. R., 147, 151 Treue, S., 297, 303, 312, 313, 318, 321, 325, 332 Treutwein, B., 55 Trevarthen, C., 117 Trimarchi, C., 116, 366 Trinath, T., 56, 312 Trope, G., 116 Tse, P. U., 51, 56, 68, 69, 88 Tsotsos, J. K., 181, 185 Tucker, G., 367 Turano, K., 287, 313 Turner, R., 115, 344 Turvey, M. T., 143, 148 Tversky, A., 282, 308 Tychsen, L., 390, 404 Tyler, C. W., 74-76, 78, 80-82, 84, 86, 87, 88, 216, 242, 274-277, 373, 384, 402 Tzelepi, A., 352, 364 Ullman, S., 194, 212, 307, 310 Underwood, J., 140, 145, 151 Ungerleider, L. G., 4, 7, 308 Ungerstedt, U., 353, 367 Usycky, I., 116 Vaina, L. M., 45, 55, 295, 307, 316, 330, 336, 341, 346 Vajda, I., 306 Valdes-Sosa, M., 325, 332 van de Grind, W., 306, 318, 323, 324, 332 van den Berg, A. V., 306 van der Mark, F., 210 van der Smagt, M. J., 306, 318, 332 Van der Steen, J., 210 van der Tweel, L. H., 223, 225, 277 van der Vaart, H., 149 Van der Zwan, R., 52, 53, 56
van Doorn, A. J., 11, 12, 14-22, 24, 26, 29, 50-53, 282, 300, 303, 308 van Ee, R., 301, 313 Van Essen, D. C., 51, 54, 104, 116, 117, 317, 329, 336, 344 Van Hecke, P., 118, 310, 313, 346 Van Hulle, M. M., 285, 289, 308-310 Van Oostende, S., 114, 118, 297, 313 van Santen, J. P. H., 57, 59, 65, 66 van Wezel, R. J. A., 306 van Wieringen, P., 149 van Winsum, W., 138, 151 Vandenberghe, R., 307 Vandenbussche, E., 309 Vandezande, K., 403 Vanduffel, W., 301, 304, 313 Vaney, D. I., 349, 365, 367 Vardi, N., 361, 367 Verghese, P., 290, 313 Verlinde, R., 324, 332 Vernier, P., 366 Versaux-Botteri, C., 366 Verstraten, F. A., 318, 321, 322, 324, 325, 329, 332, 333 Victor, J. D., 351, 367 Vidnyánszky, Z., 324-326, 332 Vilis, T., 323, 328, 344 Vishwanath, D., 170, 186 Viviani, P., 170, 186 Vogels, R., 309, 311, 345 von der Heydt, R., 166 von Grunau, M., 285, 313, 325, 332 von Noorden, G. K., 376, 384, 385, 404 von Rohr, M., 12, 33 von Uexküll, J., 29, 33 Wade, N. J., 322, 333 Wall, M., 116 Wallach, H., 300, 313 Wang, H., 87 Wang, J., 113, 114, 118 Wang, R. R., 366 Wang, W., 308 Wang, Y., 309, 312 Wang, Z., 118 Ward, P. J., 136, 148 Ward, R., 325, 328 Warren, R., 127, 148 Warren, W. H., 138, 151, 285, 313 Wässle, H., 117
Wässle, H., 349, 364 Watanabe, M., 364 Watanabe, T., 325, 330 Watson, A. B., 287, 313 Watson, R. T., 402 Watt, R. J., 75, 87, 382 Wattam-Bell, J., 102, 105, 115, 116, 119, 344 Watts, D. G., 257, 274 Watts, R. G., 140-142, 151 Weaver, W., 282, 312 Wehrhahn, C., 307, 404 Weiler, R., 365 Weinshall, D., 80, 88 Weinstock, M., 401 Welch, C. M., 143, 151 Wenderoth, P., 52, 56 Werblin, F. S., 349, 367 Werner, J. S., 166 Wertheimer, M., 154, 157, 159, 166 Wertman, E., 402 Westheimer, G., 281, 288, 308, 313, 387, 402, 404 Whart, J. H., 273 Whitaker, D., 52, 55 White, M., 91, 92, 100 Wiener, E. L., 123, 151 Wiesel, T., 3, 7, 44, 54, 386, 400, 401 Wiesenfelder, H., 323, 333 Wilkinson, F., 43-45, 48, 51, 52, 56, 389, 404 Williams, A. C., 364 Williams, T., 140, 145, 151 Williamson, S. J., 233, 274 Wilson, H. R., 43, 45, 48, 52, 56, 290, 308, 323, 333, 404 Wilson, J. A., 53, 56 Wilson, M. E., 376, 384 Wilson, V. J., 193, 212 Wist, E., 118 Witkovsky, P., 365-367 Wittgenstein, L., 29, 33 Woldorff, M., 312, 346 Wolf, M., 109, 110, 119, 401 Wolford, G., 375, 384 Wollschlager, D., 154, 166 Wong, C., 352, 367 Wong-Wylie, D. R., 54 Woods, D. L., 310 Wright, W. D., 216, 277 Wurger, S. M., 88 Wurtz, R. H., 285, 307, 317, 318, 322, 328,
331, 336, 344, 345
Xiao, D. K., 285, 286, 289, 302, 303, 309-311, 313, 314, 316, 329 Xin, D., 349, 367 Yahr, M., 115, 367 Yamamoto, T. S., 232, 277 Yang, K., 88 Yarbus, A., 170, 186 Yilmaz, E. H., 138, 151 Yo, C., 53, 56, 323, 333 Youngster, S. L., 366 Yukie, M., 313 Zago, L., 290, 309 Zaidi, Q., 51, 53, 92, 100, 290, 308, 311 Zanella, F. E., 321, 330 Zanker, J. M., 318, 333 Zeiss, 33 Zeki, S., 48, 51, 54, 56, 297, 314, 346 Zhang, K., 403 Zhao, R., 306 Zhou, C., 118 Zhou, H., 165, 166 Zhou, T., 118 Zhou, Y., 309 Zhuo, Y., 118 Zhur, G. J., 211 Ziemons, K., 323, 329, 332 Zihl, J., 118, 295, 311, 346 Zingale, C. M., 177, 186
Subject Index
absolute speed, 138 achromatopsia, 337 adaptation, 315 alignment acuity, 62 altitude splay angle, 127 amacrine cells, 347 amblyopia, 101, 106, 242, 371, 372, 376, 396 anisometropia, 106, 385 anisotropia, 387 apparent motion, 96, 233 Aristotle, 322 assimilation, 92, 98 astigmatism, 239 asymmetric surrounds, 300 attention, 183 attentional enhancement, 81 attentional selection, 315, 324, 326 attentional shroud, 74, 85, 86 attentive motion tracking, 109 audiovisual convergence, 250 auditory cortex, 250 Automap model, 80 automatic averager, 217 aviation, 122 Barbie Doll, 196 base-to-final turn, 123 baseball, 121, 143, 144, 296 batsman, 280 bed comforter, 85 binding problem, 67, 86 binocular completion, 386 binocular disparity, 300 binocular rivalry, 388, 390 biological motion, 292 biotype, 29 bipolar cells, 347 bivectorial motion, 318
body schema, 84 cartography, 153 cataracts, 376, 385, 387 center-surround antagonism, 284, 287, 288, 291 center-surround contrast dependency, 288 center-surround interactions, 279, 284 center-surround mechanisms, 293 center-surround neurons, 280, 287 CFIT, 131 chromatic borders, 153 closing speed aftereffect, 139 closure, 154 coherence thresholds, 109 coincidence detector, 57 collateral sulcus, 337 color matching, 216 color-defined form, 229, 232 common fate, 154 contour integration, 375 contrast, 386 contrast-defined form, 386 contrast discrimination model, 65 contrast sensitivity, 372 controlled flight into terrain, 131 cortical area hMT, 336 cortical area MST, 104, 285, 287, 318, 321, 336, 341 cortical area MT, 104, 114, 280, 284, 285, 295-297, 299-301, 317, 318, 321, 336, 341, 343 cortical area V1, 43, 48, 104, 316, 321, 336, 341 cortical area V2, 51, 104, 297, 299, 336, 341 cortical area V3, 336 cortical area V4, 48, 51 cortical area V5, 104, 114, 280, 317, 321, 341, 343 cortical area V5+, 336 cortical feedback, 299 cortical pooling, 389 cricket, 43, 169, 280, 296
fast pitch, 145 slow pitch, 145 critical period, 376 cross-modulation, 249 crossing distance, 132 cyclopean cleaning, 74 cyclopean eye, 385, 396 cyclopean gaze, 197, 198, 209 cyclops effect, 394 Da Vinci, Leonardo, 13, 19, 67 depth-defined form, 233 depth map, 75 depth perception, 385, 386, 391 Descartes, R., 36 differentiation of motion, 292 direction discrimination, 290 direction-selective motion processing, 109 direction-selective neurons, 323 disparity detectors, 74 distribution-shift theory, 323 dominant eye, 208 dopamine depletion, 352 dopamine receptors, 347 dorsal processing stream, 286, 336, 337 driving, 121, 136 dyslexia, 109 Ebenbreite, 157 edge detector, 72 EEG, 217 egocenter, 385, 392 egocentric coordinates, 84 Einstein, A., 36 enucleated children, 385 equiluminance, 226 essential nonlinearity, 256 Euclid, 36 Euclidean plane, 25 extrafoveal attention, 182 extrafoveal visual analysis, 177 eye movements, 81, 171, 189 face recognition, 44, 73 face turn, 397
Fechner, 37 figure-ground
organization, 161, 292 segregation, 105, 293 fixation, 200 flankers, 376 flicker, 236 flying, 121, 122 form-from-watercolor, 153 form perception, 340, 385 frontal cortex, 84 Gabor patches, 62, 75 Gabor-Pauli-Heisenberg limit, 245 gain control, 242 ganglion cells, 347 ganglion cell receptive fields, 350 gauge figures, 14 Gauss, C. F., 37 Gaussian blobs, 75 general relativity, 71 geometry, 35 Gestalt psychology, 154 Glasperlenspiel, 24 Glass patterns, 44, 51 color tuning, 51 isoluminant, 51 perception in pigeons, 52 glaucoma, 101 global form detection, 43 global motion direction discrimination, 106, 109 good continuation, 154, 159 grating-grating interactions, 249 grouping, 161 half-Reichardt detector, 59 half-wave rectification, 253 Head, Sir Henry, 84 head movement, 191 head turn, 397 Helmholtz, H., 37, 101 Helmholtz field coil, 191 heterochromatic flicker photometry, 238 highway hypnosis, 140 Hildebrand, 13 homunculi, 21 horizontal cells, 347 human brain electrophysiology, 215
hyperacuity, 281, 284 hyperbolic space, 39 illusory overlay, 68 induced motion, 290 jump effect, 257 Kanizsa, G., 68 Kant, I., 36 Keplerian array, 74 kinetic boundaries, 297, 299 kinetic occipital area, 297 koniocellular pathway, 105 landing flare, 130 laws of optics, 29 letter acuity, 387 linking propositions, 281 lipstick, 204 local motion, 316 local motion neuron, 287 local motion signals, 109, 316 local sign model, 61 local stimulus orientation, 43 long-range interpolation problem, 75 luminance grating, 287 macular degeneration, 376 magnetic field search coil, 191 magnification factor, 25 magnocellular motion, 343 magnocellular pathway, 101, 238, 336, 337 maps, 153 Marr, D., 72 Maryland Revolving Field Monitor, 194 meaning, 199 mental movements, 22 metamorphopsia, 335, 337 migraine, 238 mipmapping, 136 monocular deprivation, 385 monocular stereopsis, 12 motion, 385, 386 motion adaptation, 316 motion aftereffect, 109, 287, 290, 316, 322, 323, 326 motion assimilation, 291
motion-based segmentation, 316, 320 motion coherence, 390 motion contrast, 291 motion-defined boundaries, 297 motion-defined form, 101, 292, 299, 300, 341, 343, 386, 390 motion direction discrimination, 339, 341 motion discrimination, 290 motion processing, 324 motion, relative, 299 motion transparency, 316 motion-based figure-ground segregation, 292 multiple sclerosis, 101, 103, 223, 299, 351 natural scenes, 29 neural networks, 71 neural plasticity, 375 Newton, Sir I., 36 nondestructive zoom-FFT, 245 nonessential nonlinearity, 256 nonlinear models, 251 nystagmus, abduction, 397 nystagmus, congenital, 200 object perception, 67 object-centered coordinates, 84 ocular dominance columns, 386 oculomotor control, 188 oculomotor slop, 193 opponent mechanism, 59 opponent model, 65 optic flow, 318 optokinetic nystagmus, 385, 386, 391 orientation bandwidth, 247 overtaking, 139 Panum's fusional area, 209 parietal lobe, 86 Parkinson's disease, 101, 110, 352 parvocellular color/form, 343 parvocellular pathway, 238, 337 pattern electroretinogram, 351 perceptual coordinate frame, 86 perceptual learning, 375, 376 perceptual states, 281
phantom limbs, 83, 84 photograph, 11 photic driving, 217 photoreceptors, 347 physiological states, 281 pictorial relief, 11, 21 pictorial space, 11 pilots, 122 population code, 297 Prägnanz, 159 preganglionic processing, 351 prehensile matrix, 86 prehensile representation, 85 prehensile vision, 84 primal sketch, 72 priming effect, 82 Princess Margaret, 239 probability multiplication, 57, 59 prosopagnosia, 51 proximity, 154 Ptolemy, 392 ptosis, 385 radial frequency, 389 Rayleigh, Lord, 37 reading, 170, 199, 200 real forms, 187 real life, 187 recruitment, 387 Reichardt detector, 57, 61 relief space, 13 retinal disparity, 388 Riemannian surface, 71 Robinson's search coils, 191 Roelofs method, 394 Rorschach test, 12 rotation, 191 saccade, 170, 173 saccade braking, 200, 201 saccade, catch-up, 208 saccade decision, 183 saccade, get ahead, 208 saccade, intrusive, 200 saturation, 216 search, 170, 172, 183 second-order alignment, 62
second-order global motion, 109 second-order structure, 300 segmentation, 315 segregating surfaces, 291 self-organizing surfaces, 74 shape, 337 shape discrimination, 82 shape distortion effect, 340 shear, 18 shear sensitivity, 390 short-range motion, 233, 234 silicone annulus sensor coil, 190 similarity, 154 simple cells, 48 simultaneous contrast, 92, 98 SKALAR, 190 slant, haptic responses, 300 smooth-pursuit eye movements, 208, 341 SMS algorithm, 72 Snellen acuity, 376 space, 35 space perception, 84 spatial binding, 67 spatial frequency bandwidth, 247 spatial integration, 105, 292 spatial integration of motion signals, 288 spatial processing, 347 spatial summation, 287 spatial vision, 371 speed thresholds, 106, 109 splay angle, 123 static flow, 45 steady-state evoked potential, 216, 226, 236 stereo-blind observers, 19 stereo-defined motion, 62 stereopsis, 12 stereoscopic attention, 81 strabismic amblyopia, 387 strabismus, 106, 385, 386 striate cortex, 316, 386 structures of consciousness, 11 Stuart's rings, 91 summation, 288 surface perception, 153 surface reconstruction, 83, 85 surface representation, 68 sweep method, 239 sweep-averaging technique, 243
symmetry, 154 synopter, 12, 19 T-junctions, 92 tapping, 205 tau, 133 tea, 173 temporal bandwidth, 247 temporal binding, 67 temporal-imbalance model, 323 TEO, 342 texton change, 234 texture, 386 texture-defined form, 234, 236, 386, 390 texture density, 127, 128 time invariance, 257 time to collision, 133, 391 time to contact, 130 time to passage, 133 translation, 191 transparency perception, 80, 82 transparent motion, 315, 323, 325 transparent plane, 82 Troxler fading, 389, 390 twins, 387 Umwelt, 29 velocity field, 300 velocity gradient, 300 ventral stream, 336, 337 Verant, 12, 19 vergence, 81, 199 veridicality, 17 vernier acuity, 62, 376, 387 vestibulo-ocular reflex, 193, 201 visual acuity, 372 visual attention, 169, 177 visual development, 385 visual direction, 385, 392 visual evoked potentials, 217 visual motion processing, 279 visual scission, 92 visual search, 169 visual-auditory convergence, 250
visual-motor control, 123 watercolor effect, 153, 157 waterfall illusion, 322 wavelength discrimination, 216 what pathway, 336 where pathway, 336 White's effect, 91, 98 wide-field neurons, 287 zero-memory system, 257 zograscope, 12