HUMAN MENTAL WORKLOAD
ADVANCES
IN PSYCHOLOGY 52 Editors: G. E. STELMACH
P. A. VROON
NORTH-HOLLAND AMSTERDAM * NEW YORK * OXFORD * TOKYO
HUMAN MENTAL WORKLOAD
Edited by
Peter A. HANCOCK Department of Safety Science, ISSM University of Southern California Los Angeles, CA, U.S.A. and
Najmedin MESHKATI Human Factors Department, ISSM University of Southern California Los Angeles, CA, U.S.A.
1988
NORTH-HOLLAND AMSTERDAM * NEW YORK * OXFORD * TOKYO
ELSEVIER SCIENCE PUBLISHERS B.V., 1988 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
ISBN: 0 444 70388 8
Publishers: ELSEVIER SCIENCE PUBLISHERS B.V. P.O. Box 1991 1000 BZ Amsterdam The Netherlands
Sole distributors for the U.S.A. and Canada: ELSEVIER SCIENCE PUBLISHING COMPANY, INC. 52 Vanderbilt Avenue New York, N.Y. 11017 U.S.A.
PRINTED IN THE NETHERLANDS
PREFACE
It is now almost a decade since the publication of the Proceedings of the NATO symposium on Mental Workload (Moray, 1979), and as Jahns (1987) has noted in a recent editorial, a look at changes that have occurred with respect to mental workload assessment in the interim decade may prove both refreshing and informative. It is our major purpose in the present volume to fulfill this function. Within the contributed chapters is information not only on the contemporary status of mental workload but also a number of insightful glimpses into the future of the area. The volume was conceived in association with several colleagues at the 1984 meeting of the Human Factors Society, in San Antonio, Texas. We asked several prominent researchers to survey their respective areas of expertise with respect to recent developments. In order to limit the overlap that can occur when different individuals comment on the same area of investigation, we asked authors to provide a contribution that focused on their own particular research endeavors rather than a general survey of particular workload topics or methodologies. While we have attempted to elicit contributions from a wide range of acknowledged experts, we are only too aware of the number of individuals whom, because of space and time limits, we were unable to invite. Also, as our work has progressed, we have become familiar with many more colleagues from whom, had it been possible, we would have also liked to solicit contributions. Indeed, such is the rate of progress in this area that a decade is perhaps too long a period to cover adequately. Some prominent workers had completed, or were completing, extensive chapters on this issue and are consequently missing from the present work. The reader is directed to the works of Kantowitz (1987) and of Gopher and Donchin (1986) for further elaboration of alternative views on the mental workload question. We have organized the volume into a series of coherent sections.
These include a section for each of the current dominant methodologies, a further section on individual differences, and final contributions concerning unanswered questions and future directions for the mental workload issue together with a listing of contemporary research reports. The text begins with a contribution by John Senders. His poetical offerings provide a creative view of the state of mental workload as represented at the NATO conference alluded to above. It is followed by Henry Jex's chapter, which represents the written version of the Franklin V. Taylor lecture he presented as the 1980 winner of the award given by the Engineering Psychology Society of the American Psychological Association. To capture the essence of this work, the chapter is, with minor amendments, a written reproduction of the original verbal presentation. It is both instructive and informative to compare Jex and Senders' assessments and aspirations of approximately a decade ago with the reality of contemporary developments as represented in the chapters which follow. The first of these contemporary perspectives is provided by Tom Eggemeier in his chapter on the properties of workload assessment techniques. It is followed by a chapter from Glenn Wilson and Robert O'Donnell, who survey the growing field of physiological measures, and continues with the work of Najmedin Meshkati, which focuses specifically on heart rate variability as a measure of mental load. Aasman, Wijers, Mulder, and Mulder have chosen to explore the concept of effort and fatigue in relation to the workload experienced during normal daily routines. The largest section of the text concerns the use of subjective assessment techniques, and the originators of two of the most widely employed techniques, i.e., SWAT and NASA TLX, give a detailed account of
these procedures and the knowledge upon which they are founded. In conclusion to the section Michael Vidulich provides a discussion of the cognitive psychology of subjective workload. An area that has often been acknowledged as of vital importance is the effect of individual differences. In the first paper of the section, Diane Damos emphasizes the paucity of experimental information on this topic. However, the chapters by Peter Hancock and by Najmedin Meshkati and Alex Loewenthal present some data on individual characteristics that appear to influence the experience of workload. In the concluding section of the volume Najmedin Meshkati offers a preliminary proposal for a cohesive model of mental load, and Walter Wierwille discusses some remaining questions and future issues which surround the investigation of mental workload. As an appended element to the work, we have collected a listing of workload-related literature which provides a sampling of the many citations in the area. The criteria for selection of this listing and its breakdown according to several characteristics are given in the final chapter. In generating any volume there are a number of individuals who have made significant contributions and whose efforts it is a pleasure to acknowledge. First and foremost, we must thank the authors who have provided prompt and complete copy. Our appreciation goes to Cuong Chu, who provided considerable help to a number of authors in generating final copy, and also to Nancy Knabe, George Rodenburg, and Eric DiGiovanni, who were instrumental in producing the finished text.
Najmedin Meshkati
P.A. Hancock
REFERENCES
Gopher, D., & Donchin, E. (1986). Workload: An examination of the concept. In: K. Boff, L. Kaufman, and J.P. Thomas (Eds.). Handbook of perception and human performance, New York: Wiley.
Jahns, D.W. (1987). Editorial. Human factors bulletin, 30, 3.
Kantowitz, B.H. (1987). Mental workload. In: P.A. Hancock (Ed.). Human factors psychology, North-Holland: Amsterdam.
Moray, N. (1979) (Ed.). Mental workload: Its theory and measurement, New York: Plenum Press.
ACKNOWLEDGEMENT
Part of my editorial efforts with respect to the present volume were supported by Grant NCC 2-379, ( I EH) from NASA, Ames Research Center, Moffett Field, California. Michael Vidulich and Sandra Hart were the technical monitors for the grant. The contributions contained should not necessarily be construed as representative of the position of this agency.
P.A. Hancock
TABLE OF CONTENTS
PREAMBLE

1. MENTAL WORKLOAD . . . J.W. Senders
   APPLICATIONS
   PSYCHOLOGICAL ISSUES
   PSYCHOLOGICAL MEASURES
   CONTROL THEORY
   MATHEMATICAL MODELS

2. MEASURING MENTAL WORKLOAD: PROBLEMS, PROGRESS, AND PROMISES . . . H.R. Jex
   INTRODUCTION
   THE WORKLOAD PROBLEM IS MULTIFACETED
   PROBLEMS IN DEFINING MENTAL WORKLOAD
   PROGRESS TOWARDS A USABLE DEFINITION
   CRITERIA FOR WORKLOAD MEASUREMENT
   TYPES OF MEASUREMENT
      Objective Measures
      Subjective Measures
   SEQUENTIAL SUBJECTIVE RATING SCALES
   PROGRESS IN SUBJECTIVE WORKLOAD RATINGS
   PROGRESS IN AUXILIARY-TASK TECHNIQUES
   AUTOMATIC MEASUREMENT OF WORKLOAD MARGIN
   CORRELATION OF SUBJECTIVE RATINGS WITH WORKLOAD MARGINS
   PROGRESS ON A THEORY FOR DIVIDED ATTENTION
      Finite Dwell Sampling Theory
      Sampling Effects on Control Performance
      Discrete Task Interference
      Combining Continuous and Cognitive Tasks
   PSYCHOPHYSIOLOGICAL MEASURES OF WORKLOAD
   PROGRESS
   PROMISES
      Standard Tasks for Calibrating Mental Workload
      Tracking Task
      Discrete Tasks
      Divided Attention Tasks
      Event Related Potentials
   WORKLOAD SPECIFICATIONS
   CONCLUSION
   REFERENCES

3. PROPERTIES OF WORKLOAD ASSESSMENT TECHNIQUES . . . F.T. Eggemeier
   INTRODUCTION
   SENSITIVITY
      Sensitivity as a Function of Level of Capacity Expenditure
      Sensitivity as a Function of the Locus of Processing Demands
   INTRUSIVENESS
      Intrusion With Secondary Task Techniques
      Intrusion With Subjective and Physiological Techniques
   IMPLICATIONS OF PROPERTIES
   WORKLOAD METRIC EVALUATION METHODOLOGY
      The Criterion Task Set
      Applications of the CTS Battery
   SUMMARY AND CONCLUSIONS
   REFERENCES

4. MEASUREMENT OF OPERATOR WORKLOAD WITH THE NEUROPSYCHOLOGICAL WORKLOAD TEST BATTERY . . . G.F. Wilson and R.D. O'Donnell
   INTRODUCTION
   MEASURES OF BRAIN FUNCTION
      General Introduction
      Epoch Analysis
      Cortical Evoked Potential
      The Transient Cortical Evoked Response
      P300 to Primary Tasks
      P300 to Secondary Tasks
      The "Probe" Technique
      Steady State Evoked Responses
      Brain Stem Evoked Responses
   MEASURES OF HEART RATE
   EYE BLINK MEASURES
   COMBINED PHYSIOLOGICAL, PERFORMANCE AND SUBJECTIVE MEASURES
   THE NEUROPSYCHOLOGICAL WORKLOAD TEST BATTERY (NWTB)
      Odd-Ball Test
      Memory-Scanning Test
      Continuous Performance Test
      Flash Evoked Response
      Monitoring Task
      Tracking Task
      Brain Stem Evoked Response
      Checkerboard Steady State Evoked Response
      Sinewave Grating Steady State Evoked Response
      Unpatterned Steady State Evoked Potential
      Electrocardiograph
      Electrooculograph
      Electromyograph
      Operating Procedures
   OVERVIEW OF CURRENT STATUS
   GUIDELINES FOR APPLICATION OF PHYSIOLOGICAL MEASURES
   SUMMARY
   REFERENCES

5. HEART RATE VARIABILITY AND MENTAL WORKLOAD ASSESSMENT . . . N. Meshkati
   ABSTRACT
   INTRODUCTION
   HEART RATE VARIABILITY
      Significant Relationship Between Heart Rate Variability and Mental Workload
      Parameters of HR Data and Scoring Methods
      Spectral Analysis of HR Data
      Combination of Calculated Parameters of HR Data and Spectral Analysis
      Absence of Significant Relationships Between Heart Rate Variability and Mental Workload
      Parameters of HR Data and Scoring Methods
   REFERENCES

6. MEASURING MENTAL FATIGUE IN NORMAL DAILY WORKING ROUTINES . . . J. Aasman, A.A. Wijers, G. Mulder and L.J.M. Mulder
   INTRODUCTION
   METHOD
      Subjects
      Data Analysis
   RESULTS
      Effects of Task Conditions
      Effects of Workload
      The Effects of the Preceding Day
      Individual Differences
   DISCUSSION
   REFERENCES

7. DEVELOPMENT OF NASA-TLX (TASK LOAD INDEX): RESULTS OF EMPIRICAL AND THEORETICAL RESEARCH . . . S.G. Hart and L.E. Staveland
   ABSTRACT
   INTRODUCTION
      Conceptual Framework
      Information Provided by Subjective Ratings
      Evaluating Ill-Defined Constructs
      Individuals' Workload Definitions
      Sources of Rating Variability
      Research Approach
      Research Objectives and Background
   OVERALL RESULTS
      Weights
      Ratings
   EXPERIMENTAL CATEGORIES
      SINGLE-COGNITIVE Category
      SINGLE-MANUAL Category
      DUAL-TASK Category
      FITTSBERG Category
      POPCORN Category
      SIMULATION Category
   CONSTRUCTING A WORKLOAD RATING SCALE
      Subscale Selection
      Task-Related Scales
      Behavior-Related Scales
      Subject-Related Scales
      Overall Workload Ratings
      Weighted Workload Score
      Verification of Selected Subscales
      Combination of Subscales
      Quantification
      Reference Tasks
      Validation
      Weights
      Ratings
   SUMMARY
   REFERENCES

8. THE SUBJECTIVE WORKLOAD ASSESSMENT TECHNIQUE: A SCALING PROCEDURE FOR MEASURING MENTAL WORKLOAD . . . G.B. Reid and T.E. Nygren
   INTRODUCTION
   SUBJECTIVE MEASUREMENT OF WORKLOAD
   MENTAL WORKLOAD OPERATIONALLY DEFINED
   CONJOINT MEASUREMENT AND CONJOINT SCALING
      Axiom Tests for Conjoint Measurement
      Conjoint Scaling
   SCALE DEVELOPMENT
      Analyzing Card Sort Data
      Stability of Subjects' Judgments
   EVENT SCORING
      Simulation Studies
   SUMMARY AND CONCLUSIONS
   REFERENCES

9. THE COGNITIVE PSYCHOLOGY OF SUBJECTIVE MENTAL WORKLOAD . . . M.A. Vidulich
   INTRODUCTION
   DISSOCIATION IN SINGLE-TASK TRACKING
   DISSOCIATION IN DUAL-TASK ENVIRONMENTS
      Dual-Task Experiment 1
      Dual-Task Experiment 2
   DISSOCIATION CAUSED BY MOTIVATIONAL DIFFERENCES
   GENERAL DISCUSSION
   REFERENCES

10. INDIVIDUAL DIFFERENCES IN SUBJECTIVE ESTIMATES OF WORKLOAD . . . D.L. Damos
   INTRODUCTION
   PERSONALITY TRAITS AND BEHAVIORAL PATTERNS
      Personality Traits
      Behavioral Patterns
   RESPONSE STRATEGY
   INDIVIDUAL DIFFERENCES IN RESOURCE CAPACITY
   DISCUSSION
   REFERENCES

11. THE EFFECT OF GENDER AND TIME OF DAY UPON THE SUBJECTIVE ESTIMATE OF MENTAL WORKLOAD DURING THE PERFORMANCE OF A SIMPLE TASK . . . P.A. Hancock
   ABSTRACT
   INTRODUCTION
   METHOD
      Subjects
      Procedure
      Tasks
      Design
      Physiological Measurement
   RESULTS
      Workload Evaluation
      Weighted Responses
      Unweighted Responses
      Gender Differences in Scale Weightings
      Time of Day Differences in Scale Weightings
   DISCUSSION
   REFERENCES

12. AN ECLECTIC AND CRITICAL REVIEW OF FOUR PRIMARY MENTAL WORKLOAD ASSESSMENT METHODS: A GUIDE FOR DEVELOPING A COMPREHENSIVE MODEL . . . N. Meshkati and A. Loewenthal
   ABSTRACT
   INTRODUCTION
   REVIEW
      Remarks on Secondary Task Methods
      Remarks on Subjective Rating Methods
      Remarks on Performance Measure Methods
      Remarks on Physiological Methods
   EPILOGUE TO THE DISCUSSION OF MENTAL WORKLOAD ASSESSMENT METHODS
   REFERENCES

13. THE EFFECTS OF INDIVIDUAL DIFFERENCES IN INFORMATION PROCESSING BEHAVIOR ON EXPERIENCING MENTAL WORKLOAD AND PERCEIVED TASK DIFFICULTY: A PRELIMINARY EXPERIMENTAL INVESTIGATION . . . N. Meshkati and A. Loewenthal
   ABSTRACT
   INTRODUCTION
   DECISION STYLE MODEL
   THE CONCEPTUAL MODEL AND METHOD
      Experimental Design
      Independent and Dependent Variables
      Experimental Method and Procedures
   RESULTS
      Dominant Decision Style Grouping
      Results of Variables for Each Dominant Decision Style Group
      Behavior of Each Dominant Decision Style
   DISCUSSION AND CONCLUSIONS
      Sinus Arrhythmia Measure
      Subjective Rating Measure
   REFERENCES

14. FUZZY ANALYSIS OF SKILL AND RULE-BASED MENTAL WORKLOAD . . . N. Moray, P. Eisen, L. Money and I.B. Turksen
   INTRODUCTION
   METHOD
      Development of the Skill and Rule Base Curves
   RESULTS
   DISCUSSION
   REFERENCES

15. TOWARD DEVELOPMENT OF A COHESIVE MODEL OF WORKLOAD . . . N. Meshkati
   ABSTRACT
   PRESENT STATUS OF MENTAL WORKLOAD THEORIES
   COHESIVE MENTAL WORKLOAD MODEL AND CONCURRENT TASKS
   CRITERIA FOR A COHESIVE WORKLOAD MODEL
   REFERENCES

16. IMPORTANT REMAINING ISSUES IN MENTAL WORKLOAD ESTIMATION . . . W.W. Wierwille
   INTRODUCTION
   THE IMPORTANCE OF MULTIPLE EXPERIMENTS
   THE CONCEPT OF FULL MENTAL LOAD AND ITS IMPLICATIONS FOR SYSTEM DESIGN
   TASK ANALYTIC METHODS AND MOMENTARY WORKLOAD
   WORKLOAD ESTIMATION BASED ON NORMAL OPERATING RECORDS
   EFFECTS OF LEARNING AND PROFICIENCY ON WORKLOAD
   REFERENCES

17. A BIBLIOGRAPHIC LISTING OF MENTAL WORKLOAD RESEARCH . . . P.A. Hancock, M. Rahimi, T. Mihaly and N. Meshkati
   INTRODUCTION
   RESOURCES USED TO CREATE THE BIBLIOGRAPHIC LISTING
   DESCRIPTION OF PUBLICATION SOURCES
   PUBLICATION GROWTH
   CONTENT AREA DECOMPOSITION OF BIBLIOGRAPHIC LISTING
   SUMMARY
   REFERENCE LISTING
PREAMBLE
In assembling the present work we were faced with the problem of how and where to begin. We felt that the text edited by Neville Moray was a most appropriate point of departure for this work, but could we summarize in a few succinct phrases the state-of-the-art that it represented some decade ago? This was proving a most thorny endeavor until happily the problem was solved for us by the timely appearance of John Senders' poetical offerings. By kind permission of the author and the copyright holder we are able to reproduce these verses. We leave it to the reader to assess their pertinence to contemporary progress as represented by the contributions in the following pages.
MENTAL WORKLOAD
John W. Senders Department of Mechanical Engineering University of Maine, Orono Orono, ME 04473
APPLICATIONS
For a task in real life, like grinding a knife,
The workload can hardly be found
A useable measure would be a great treasure
For the chap with his nose to the ground.1

PSYCHOLOGICAL ISSUES
The effects of emotion are sought with devotion
For they clearly relate to the matter
But problems arise when you focus your eyes
On all the statistical scatter.2

PHYSIOLOGICAL MEASURES
Mental workload can serve as a goad
For the skin and the lungs and the heart.
These organs reply with a trace hard to spy
On the physiological chart.3

CONTROL THEORY
For that rarest of praxes: control in two axes,
Optimal Control Theory's great.
It's really quite practical for missions galactical;
Though there haven't been many of late.4

MATHEMATICAL MODELS
Mental workload is a gyrating vector
In multidimensional space.
With an input detector and output selector
One can fit any possible case.5

Notes
1 The "ground" can, of course, refer both to the traditional place for a nose and to the very surface of the grindstone, ugh!
2 It is in fact very difficult to focus on statistical scatter. That's why it is so difficult to make any sense of the effect of psychological variables on workload. One should still try, even if only to find a place in the workload structure for motivation, boredom, drive, and such things.
3 Chart records have, of course, been replaced in recent years by computer records and FFTs and the like. Still it is always instructive to look at the traces on the chart if only to reassure oneself that what the FFT seems to tell you is really there.
4 Finding a relevant rhyme for axes gave me a wonderful sense of accomplishment. The statement is, in my opinion, true: one can solve all the problems of space flight but the difficult earthbound ones are really tough!
5 This was intended to describe my own contribution to the volume on mental workload of the NATO series, edited by Moray. The more I look at it the more I think that it fits a great many other theories of mental workload as well.
This extract is taken from the Human Factors Bulletin, 1987, 30, p. 6. Copyright (1987), by the Human Factors Society Inc., and reproduced by permission. We would also like to thank John Senders for his kind permission to reproduce the series of verses.
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1988
MEASURING MENTAL WORKLOAD: PROBLEMS, PROGRESS, AND PROMISES
Henry R. Jex Systems Technology, Inc. Hawthorne, California 90250
An overview is given of the problems of defining, quantifying, and measuring mental workload during interactive human/machine tasks, based on the author's work in the areas of: aircraft handling qualities; human operator modelling, measurement, and prediction; multi-display scanning; and psychophysiological correlates of mental workload. The frustrating cycle of promises-then-problems with psychophysiological measures of mental effort is assessed, and the importance of workload rating techniques is emphasized. The lack of a unifying theoretical approach is identified as the main impediment to progress, and an approach is suggested that can handle both continuous and discrete task loads. A review is given of some "new" (c. 1980) workload measurement concepts such as: multi-dimensional ratings, the "imbedded surrogate" auxiliary task method, and the measurement of "workload margin" via the Cross-Coupled-Instability Task (CCIT).
INTRODUCTION1
The technology for measuring the task demands of human operators during their interactions with machines has been of abiding interest to engineering psychologists. The performance of man-machine systems has a usually nonlinear, often precipitous, and sometimes catastrophic decrement with increased task loading. Consequently, the conditions for incipient operator overload are difficult to predict, despite the importance of doing so for both the safety of the operator and the consequences of task errors. The human subsystems involved include the perceptual, neuromotor and biomechanical ones, in which the field of ergonomics has an extensive data base and fairly well-established prescriptions for successful designs or remedies. But also involved are the more psychological attributes such as motivation, anticipation, skill, and fatigue; these greatly complicate the picture and often bring the level of applied workload technology from "good standard practice" to "an erratic art." References [1] and [2] discuss these issues comprehensively.
1 This is the author's 1980 Franklin V. Taylor Award Lecture, given to the Engineering Psychology Society of the American Psychological Association in August 1981, at Los Angeles, California.
My interest and research in this field stems from over twenty years of research and applications, in attempting to quantify the "handling qualities" of automobiles, aircraft and manned spacecraft via engineering analyses, empirical ergonomic rules, and ad-hoc pilot-vehicle simulation experiments. In this fairly successful work, we were able, through the great power of McRuer's Rules, to predict the likely human visual-motor behavior (in a control-law sense) and show by experiments (which included carefully developed subjective rating schemes) that the subjective impressions of vehicle handling qualities are related primarily to the mental workload involved in creating and executing the control-law(s) appropriate to the given task. (See [3] or [4]; a version of McRuer's Rules is given in the Appendix.) This was an often frustrating process which led us into many problems, with slow and halting progress, always led on by the promise of a more rational and useable workload technology. From a retrospective, but optimistic viewpoint, this lecture attempts to make the following points regarding the measurement of mental workload:
The PROBLEMS are characterized by:
-- multifaceted definitions, attributes and criteria
-- vaguely defined, and poorly understood mechanisms
-- indirect measures of mental workload
-- embryonic levels of analytic models and computational methods
PROGRESS is being made in terms of:
-- more concordant definitions of the mental workload factors
-- improved measurement techniques
-- better and validated theory and models
There remain the still elusive PROMISES of:
-- Psychophysiological co-variates of mental workload, such as evoked-response-potentials of the brain, and covert measures of neuromuscular signatures
-- More standardized and validated workload tests and measurements, to match the spectrum of needs
-- Evolution and acceptance of Standardized Workload Design and Evaluation Specifications, of similar usefulness to those applied for decades as the vehicle handling qualities Specifications and Guidelines (e.g., MIL-F-8785C, which is in its third edition). Whole departments of major aerospace companies work effectively under these carefully flight-validated Guidelines to evolve stability augmented aircraft which place minimal control demands on their pilots in given mission phases. However, when it comes to overall aircrew task workload, there are no proven and agreed-on guidelines.
Developing a general technology for mental workload is a multifaceted problem, covering a broad range of situations, time scales, influences, and applications (Figure 1).
Figure 1. MULTIFACETED PROBLEM
SITUATIONS: Pilots; Aircrew (Radar, AWACS); Ground crew (A.T.C., ...)
TIMES (days, on a scale running from about .01 to 1000): Air combat & attack; Long missions, ATC, CIC; Sortie surge; Logistics; Career burnout
INFLUENCED BY: Skill, training, practice; Motivation, risk; Fatigue
APPLICATION TO: Basic research; Clinical evaluations; Design and development
The situations of interest may cover any task or situation in which mental effort is expended more or less continuously. Here, our interests center on the operators of vehicles and systems (drivers, ship crews, aircrew, astronauts), and highly interactive and dynamic systems (e.g., video game players, weapon systems operators, and command-and-control operators).
The time scales involved in general mental workload problems vary widely: acute operations such as air combat or landings having durations of a few minutes to a few hours (.01 - .10 days); long stressful mission phases, such as flights near guarded borders, combat-information-centers and antisubmarine pursuit operations (.1 to 1.+ days); sustained ensembles of acute operations such as "sortie-surges" of attack or defense aircraft (1 - 10 days); intensive logistics supply operations (10 - 100 days); and finally, executive and officer "career burnout" (100 - 1,000 days). Here, we focus on the acute workload conditions having intense workload periods from minutes to hours.
The most troublesome problems are due to the diverse and primarily psychophysiological influences on one's mental workload, such as: the degree of training, practice, and application skills one brings to the task; the motivation towards performing at high mental efforts and the perceived risks of doing so; and the acute or long term "fatigue" of such continuous mental effort. It is difficult to even find useful definitions of such factors as fatigue and motivation, let alone to measure or model their influence on mental workload!
Finally, the diverse applications for applying mental workload technology, combined with a lack of a connecting theoretical basis, make it necessary to gather and collate an enormous empirical and ad hoc data base for each problem application. Because many of the covariances among the workload variables are nonlinear and multifactored, most of the commonly used statistical procedures become inefficient or inappropriate. For example, Analysis-of-Variance, properly used to test the significance of each tested factor, is useless for reconstructing the highly nonlinear "performance surface" connecting the several variables and levels needed to interpolate results over the application of interest. The research data-base needs differ widely from the clinical application prescriptions. Nevertheless, system design and development demands a mature combination of approaches, each properly validated in field trials.
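The time-scale bands listed above can be written down as a small lookup, which makes the decade-wide spacing of the bands easy to see. The following is only an illustrative sketch: the band limits and labels are taken from the text, but the function name `classify_duration`, the table name `TIME_SCALE_BANDS`, and the choice of half-open intervals at the band edges are assumptions made here, not part of the lecture.

```python
# Time-scale bands in days, as listed in the text. Treating each band as a
# half-open interval [low, high) is an assumption of this sketch; the text
# does not say which band a boundary value belongs to.
TIME_SCALE_BANDS = [
    (0.01, 0.1, "acute operations (air combat, landings)"),
    (0.1, 1.0, "long stressful mission phases"),
    (1.0, 10.0, "sustained ensembles of acute operations (sortie surges)"),
    (10.0, 100.0, "intensive logistics supply operations"),
    (100.0, 1000.0, "career burnout"),
]


def classify_duration(days):
    """Return the label of the band containing `days`, or None if the
    duration lies outside the range of bands named in the text."""
    for low, high, label in TIME_SCALE_BANDS:
        if low <= days < high:
            return label
    return None


if __name__ == "__main__":
    # A two-hour landing phase is 2/24 of a day and falls in the acute band,
    # which is the band the lecture concentrates on.
    print(classify_duration(2 / 24))
    print(classify_duration(5))
```

Each band spans one decade of duration, so a logarithmic time axis (as Figure 1 appears to use) spaces the five categories evenly.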
"Workload" covers a broad spectrum of human activity, but in "mental workload" we limit these activities to the primarily mental and physical coordination ones, such that muscular fatigue is not an important factor. Unfortunately, like the terms happiness, love, and fatigue, the term mental workload is a primitive construct which "everybody knows, but hardly anybody can define in precise, operationally useful terms."
All of the following involve mental-workload-like activities, but who can define the "workload" in a measurable sense?
A battleship running "in harm's way" . . .?
A time-shared computer facility . . .?
A company's financial department at tax time . . .?
An adaptive autopilot during maneuvers . . .?
An aircrewman tracking an air-to-air target or a submarine via multiple sensors . . .?
The besieged battleship most clearly illustrates our dilemma.2 If you were asked, as an outside observer, to evaluate the "workload activity" of that ship, what would you do? You might first ask what sort of workload is of interest: the frequency of activities such as turning valves to control the speed, power generation, rudders, or stabilizers? . . . the "perceptual" activity in watching, evaluating and interpreting inputs from radio, radar, sonar and infrared sensors (sometimes impaired)? . . . the "data-base management" activities in keeping track of ship's status, available resources, and staff skills and their availability? . . . the command and control activities in assimilating the above information against expectations based on prior knowledge, evolving a strategy for response, allocating available resources, delegating certain activities to the appropriate (and available) party (or, when necessary, taking over direct control)? . . . the actions of the captain in monitoring and managing these activities to leave some margin for contingencies? Of course, the correct answer is: Any and all of the above, as is appropriate to the needs of the questioner.
2 I first heard of this "battleship" illustration from Walter Schnieder, at the Carmel-IV Conference on Evoked-Response-Potentials.
One measurement attribute could be the information transmission traffic between all of these groups, if one could define the nebulous pathways and measure the complex signals. But that might neglect, for example, the relative importance of the computerized radar/threat/decision signals versus the captain's single command to "launch missiles." After measuring everything transducible, one soon realizes that the absolute levels of these workload activities are meaningless without reference to some norms of scale, skill and motivation. Does twice the rate of targets detected, decisions made, or weapons launched (even when compared with a companion ship nearby) imply: a larger crew, more experienced crew, more technically advanced apparatus, or temporarily frenzied activity inspired by an incoming cruise-missile? Are the crew's increasing groans and gripes about being "overworked" any clue to incipient overload of the ship's capabilities? In this example, you can see the many analogies with the activities of a human in an interactive control task and, especially, in concurrently running tasks. The questions, concepts and solutions exemplified in this approach to the battleship workload problem are directly applicable to our approach toward measuring mental workload in an individual operator. One thing is clear: no single activity, or signal, or measure, or evaluation is adequate for the whole problem. Mental workload is intrinsically complex and multifaceted. A concept model for building a more comprehensive computational model of the ebb and flow of interactions like these is given in Figure 2. It builds upon the control theory concepts of hierarchical control of multi-input multi-output (MIMO) dynamic systems, a field in which Systems Technology, Inc. has been active for decades, albeit for more rational processes.
Besides the dominant influence of McRuer's Rules, the similar views of Warren Clement at STI and Dieter Jahns [6] also bolstered my thinking in the early 1970's. In the man-machine control tasks of primary interest here, there are a number of competing objectives that must be achieved by the operator:

1) Stable operation is of prime importance (e.g., the direction of travel must be under some control).

2) Performance results must satisfy the operator's goals and mission criteria (e.g., stay within safe landing parameters, provide accurate weapon delivery).

3) The achieved perceptual-motor workload must lie within the operator's current limits, as set by intrinsic abilities, state of practice, and as influenced by motivation.
H.R. Jex
Figure 2. BLOCK DIAGRAM SHOWING INTERACTIONS AMONG MAN-MACHINE STABILITY, PERFORMANCE, AND WORKLOAD
[Block diagram labels: achieved workload; workload margin (excess control capacity); performance priorities; variability; disturbances; achieved stability; achieved performance.]
The block diagram illustrates the strong interactions involved as an operator seeks an acceptable compromise (i.e., optimum) among these often conflicting criteria. The inherent feedback nature of the ongoing tradeoffs is designated by the comparison operations. (For a detailed discussion of each element and concept, see [5].) The key points to note are: The man-machine interface is represented by the adjacent "sensorimotor-control" and "controlled element" blocks. The strategies, adjustments, and allocation of attention to concurrent tasks are handled by the "metacontroller" block, a sort of workload supervisory system (initially proposed by Professor Tom Sheridan, of M.I.T. [7]). There are continual and interactive waxing-and-waning activities among these blocks (reflecting the variations in environment, disturbances, and occurrence of events). The hierarchy of loop priorities is: 1) stability; 2) adequate performance; 3) acceptable workload.
The key problem blocking the development of a comprehensive and usable technology of mental workload is the lack of a proper theoretical framework, along with analytical models and their data bases (in compatible terms), to flesh out the concept model shown above.
Most operational definitions of mental workload observe task activity as a measure of workload, but there are serious difficulties in this approach. Try a thought-experiment: a man and a woman are walking diagonally down the broad steps of (say) their office building, concurrently buttoning up their similar overcoats and chatting about where to eat lunch. Observe their perceptual scanning, motor functions, and verbal activity. Now suddenly change only one or two things: let their coats be switched (so the buttons are on opposite sides) and put some thin ice on the steps. Suddenly the chatting stops as their metacontrollers cope with the new situations: the subconscious buttoning subroutines must be replaced by conscious sensorimotor actions, and the practiced stepping routines are replaced by concurrent perceptual-motor guidance and balancing. The same measurements would show only slight changes in buttoning performance and barely detectable impairment of balancing, but a near stopping of the chatter. Their metacontrollers (mental workload) have switched from minor activity to furious activity, but where are the observable manifestations? The primary activities are maintained at the expense of the secondary ones, as the excess workload capacity is utilized by the metacontroller. This phenomenon is characteristic of interacting closed-loop feedback systems, and it renders human/machine performance measures insensitive to many disturbances and variations of ambient conditions. Only their auxiliary tasks (e.g., talk of restaurant choice; a lower priority loop of the metacontroller) show noticeable effects. They could also tell you (if asked) that they "were busy buttoning and balancing," all without changing their pace! Humans are aware of their metacontroller activity.
It seems clear that modern concepts of defining and measuring mental workload should focus on the metacontroller's activities. As noted in Figure 2, the metacontroller: directs perceptual attention; sets performance priorities and "indifference thresholds;" copes with interacting goals, expectations, strategies, and subroutines, as well as unforeseen events; and reserves margins for contingencies. It is easy enough to postulate such a functional subsystem, but it is hard to find, being diffusely located throughout the central nervous system and not easily accessible for observation or measurement. Nevertheless, based on years of research, it is our conviction that the human operator is subjectively aware of his metacontroller activity, and he can introspectively evaluate its "workload margin" (the excess capacity between the current demands and current metacontroller capability limits). A definition which embodies these concepts is:
Mental Workload is the operator's evaluation of the attentional load margin (between their motivated capacity and the current task demands) while achieving adequate task performance in a mission-relevant context.

Let us clarify this definition. Referring to Figure 3, the time course of mission phases and tasks imposes varying demands on the metacontroller's activity (mental workload). Some complex tasks, like a procedural steep-turn during landing an aircraft, take little attention; while other tasks, like weapon aiming, take much attention. The metacontroller's "capacity" has some physiological limit (seldom approached), and a fuzzy band of "motivated capacity," which can vary with mission-phase importance and urgency. The "mental workload" of which the subject is aware is the margin between the current task demands and motivated capacity. As time progresses, the capabilities may change due to: practice (increase), fatigue (decrease), or boredom (decrease). Further, one person's capabilities may differ from another's due to: different psychophysiological endowments, different training, and recent practice. These factors complicate the measurement but do not change our definition.

Figure 3. CONCEPT OF WORKLOAD MARGIN
[Sketch: task workload (mission-relevant units) versus time, showing task demands and physiological capacity.]
I feel that many attempts to measure mental workload fail to meet the basic criteria that must apply to any measurement for building a valid, empirical data base. The criteria in Figure 4 should be kept in mind when measuring human-machine interactions and activity of the metacontroller.

TYPES OF MEASURES

Workload measures can be objective (measurable events, scores, or activity levels) or subjective (introspective evaluations of effort or margins). Considering again the earlier block diagram, it is apparent that both types of measure should be taken for each test or evaluation, and that multiple measures may often be required. These group as follows:
Figure 4. CRITERIA FOR MENTAL WORKLOAD MEASURES

1. Relevant
   direct connection with the mental workload or its components

2. Sensitive
   monotonic trend with respect to mental workload (as defined above)
   high test-power with respect to the workload variable, i.e., high (covariance/residual error)
   insensitive to other variables or ambient environment

3. Concordant
   ubiquitous trends in target population

4. Reliable
   proven test-retest repeatability
   "differential stability" (parallel trends) among subjects with practice on a task
   validated means and variance statistics, with norms for the target population

5. Convenient
   easy to learn and administer
   portable, for use in field trials and evaluations
   low cost, for a given level of measurement reliability
Objective Measures:
   Task Demands: The characteristics of the imposed tasks; types (continuous control, number of axes in parallel, discrete decisions, etc.) as well as their criteria for adequate performance, rates of onset, and priorities.
   Task Results: Performance measures, errors, achieved task loads, etc.
   Correlated Measures: Gross motor activity, gross psychophysiological (PP) measures (heart and breathing rates, muscle tension, eye scans and blinks, speech fluency properties, etc.), subtle PP measures (electrodermal responses, electromyograph, electroencephalograph, voice effects).

Subjective Measures:
   On-line reports of mental workload levels (verbal)
   Post-test evaluations (questionnaires, rating scales)
   Explanations of high-workload events

Remember one important point: In the (current) absence of any single objective measure of the diffuse metacontroller's activity, the fundamental measure, against which all objective measures must be calibrated, is the individual's subjective workload evaluation in each task.
It is not my intent to review all workload measurement methods, but to discuss a few in which we have made significant progress, or on which I have some comments. Figure 5 lists some of the methods and the properties which are best revealed by them. Only those with an asterisk will be discussed here. Let me comment, first, on the AUXILIARY TASKS methods. As mentioned earlier, a human operator tries to maintain acceptable performance on those concurrent tasks which are of primary priority in the mission, at the expense of attention to tasks of auxiliary or secondary priority. This holds right up to the point of incipient (and occasional) overload. Slight relaxation of one can greatly ease the other. So, primary task performance decrements are usually not sensitive to reduced workload margins unless the auxiliary task is somewhat artificially increased in its attentional demand. (Later I show how.) If the main task's priority and performance criteria are fixed, then the decrement in attention to the auxiliary task makes the latter a sensitive measure. Finally, it is possible to adaptively adjust the side task difficulty to maintain a given primary task performance, using the side task difficulty as a more sensitive measure than primary task performance alone would be.
Figure 5. WORKLOAD MEASUREMENT METHODS

METHOD:                                ATTRIBUTES MEASURED:

PRIMARY TASKS:
   Task/Time Analyses                  Input load
   Eye Scanning Traffic                Attention allocation
   Operator Dynamics                   Changes in adaptive model parameters

AUXILIARY TASKS *
   Fixed Auxiliary Task                Main task decrements
   Constant Main Task                  Auxiliary task decrements
   Adaptive Auxiliary Tasks            Main task performance held; auxiliary task decrements

PSYCHOPHYSIOLOGICAL CORRELATES         "Workload nerve" activity

SUBJECTIVE RATINGS *                   Metacontroller activity: final arbiter

* Discussed herein
SEQUENTIAL SUBJECTIVE RATING SCALES

If subjective evaluations of mental workload are the ultimate arbiter of all objective workload measures, why aren't they readily available and highly refined? They are; but in a technology not often used (or appreciated) by many engineering psychologists! I refer to the empirically developed methods for rating the "handling qualities" of aircraft. Over two decades ago, faced with similar evaluation problems regarding the suitability of the pilot-aircraft control response for various mission tasks, engineering test pilots (notably George Cooper of NASA-ARC and Bob Harper of Calspan Corporation) developed a flying qualities rating scale which (when done correctly) is much more sophisticated than commonly understood; one which has proven its usefulness in hundreds of validated, real-world aircraft problems.

The Cooper-Harper Rating procedure, shown in simplified form in Figure 6, is a sequential-decision process, with criteria and priorities remarkably close to the interactive man-machine loops discussed earlier. First comes "controllability;" then "performance" with respect to mission-derived criteria; then any modifications needed with respect to the pilot workload optimization to achieve that performance. Three levels of shading are allowed within each major region defined by the sequential decisions. We have found that the last two questions (3 and 4) are dominated by workload issues, so the Cooper-Harper Rating paradigm is quite relevant to workload evaluation.

Figure 6. SEQUENTIAL-DECISION RATING (COOPER-HARPER RATING SCALE)
[Flow chart of questions in the mission-required context, each with Yes/No branches: 1. Controllable system? 2. Achievable mission requirements? 3. System modifications required? 4. Shadings.]

In an important bit of research, seldom referred to outside the flying qualities community, Jack McDonnell (then at Systems Technology, Inc.) and I investigated the psychometric properties of the Cooper-Harper Scale (CHS). Factor analysis procedures and non-parametric scaling techniques pioneered by Osborne and Thurstone (see [8] for details) were used to expose the dominant factors on which a variety of aircraft handling qualities were rated. These methods allowed us to "linearize" the raw ordinal rating scores to the level of an interval scale, suitable for parametric statistical analyses. The latter process is shown in Figure 7, and was based on questionnaires returned by 80 test pilots located in North America and Europe. The method, which distorts raw ratings to maintain an approximately equal "subjective discriminal dispersion" across the majority of the "psi-scale," provides sufficient homoscedasticity and equal discriminations to allow powerful ANOVA techniques to be used on rating scale data. Similar procedures could be easily applied to a sequential-type Workload Margin Scale.
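To make the idea of "linearizing" ordinal ratings concrete, here is a minimal sketch in the spirit of Thurstone-type successive-interval scaling. It is not the procedure of [8]: it pools all ratings into one distribution, assumes the underlying "psi" variable is normal, and places each category boundary at the normal deviate of the cumulative proportion of ratings below it. The function name and the example counts are hypothetical.

```python
from statistics import NormalDist

def interval_boundaries(category_counts):
    """Map ordinal category boundaries to normal deviates (z-scores).

    category_counts[i] is the number of ratings that fell in ordinal
    category i. Assumes the first and last categories are non-empty,
    so every cumulative proportion lies strictly between 0 and 1.
    """
    total = sum(category_counts)
    cumulative = 0
    boundaries = []
    # The last category has no upper boundary, so it is skipped.
    for count in category_counts[:-1]:
        cumulative += count
        boundaries.append(NormalDist().inv_cdf(cumulative / total))
    return boundaries

# Hypothetical counts for a five-category rating scale.
boundaries = interval_boundaries([10, 20, 40, 20, 10])
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
```

With these made-up counts, the interior categories come out with unequal widths on the z-scale even though the raw ordinal steps are all "one unit," which is the kind of distortion the psi-scale correction is meant to remove.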
The above rating scale research also included attempts to tease out the "principal factor" dimensions underlying some 90 descriptive phrases commonly used in evaluating aircraft handling qualities. Our interpretation of these (less conclusive) results indicated that three key aspects were dominant: "attentional demand," "difficulty of control," and "adequacy for the (specified) task." In fact, a five-level attentional demand scale was evolved from the array of 90 descriptors which would have nearly equal subjective psi-scale properties (i.e., equal subjective discriminance variance; see [8] for details).
Figure 7. Ranking and concordance of raw ranks of 90 handling quality descriptors by 80 test pilots (worldwide). [Ordinal scales, non-equal variance.] From: J. D. McDonnell, 1968, AFFDL TR 68-76. [Similar developments in Conjoint Rating Scales.]
About a decade later, Tom Sheridan and his colleagues at M.I.T. proposed that the dominant factors in mental workload were "busyness" (rate of coping), "complexity" (difficulty of component tasks), and "anxiety" (about consequences of actions) [9]. These are similar to those given above, so a comparison was made among these three and the embryonic work of (then) Major Robert O'Donnell and associates at Wright Patterson AFB, on the Subjective Workload Assessment Technique (SWAT) [10]. The comparison is shown in Figure 8 (with one slightly later Carmel-IV Conference source added [11]). There is a remarkable concordance evident among the three principal factors and different investigators. (There was some acknowledged influence of the M.I.T. work on the SWAT categories.) This is an encouraging development which fulfills one of the heretofore missing criteria in subjective workload assessment, that of concordant definitions.
Figure 8. CONCORDANCE AMONG THE PRINCIPAL FACTORS OF SUBJECTIVE WORKLOAD

SOURCE: Psychometric Handling Qual. Scale, J. McDonnell, 1968
   FACTORS: "Attentional Demand"; "Difficulty of Control" (Equalization); "Adequacy for (Specified) Task"

SOURCE: M.I.T.: Sheridan et al., 1979
   FACTORS: "Busyness" (rate of coping); "Complexity" (difficulty of component tasks); "Anxiety" about consequences

SOURCE: USAF/AMRL, Wright St. U., Subjective Workload Assessment Technique, 1981
   FACTORS: ...; ...; "Psychological Stress" ("Pucker Factor")

SOURCE: Carmel IV Conference, "Evoked Response Potential," 1982
   FACTORS: "Frequency" of cognitive involvements; "Level" of cognitive involvement; "Consequences" of cognitive involvements
Summarizing, I feel that the three dominant factors in subjective workload are:
A. Busyness -- the rate of coping with control actions or decisions; the frequency of attentional demands, whether simple or complex

B. Complexity -- the cognitive difficulty of the component tasks or strategy; the degree or depth of attention required

C. Consequences -- the concern or importance of the task's performance to mission success or personal safety
This is a useful set, because the first two can be conveniently found from ground simulation tests and/or data-base interpolations, while the third correctly allows for the ubiquitous difference between laboratory and field (e.g., in-flight) ratings. Rating scales are often criticized with respect to "concordance" and test-retest "reliability," but with proper techniques (such as the Cooper-Harper sequential decision scale, or the SWAT conjoint rating technique), both criteria can be met, using experienced, well-practiced test pilots (or test drivers) who are representative of the target population of pilots (or drivers). Appropriate "relevance" and good "sensitivity" are assured by selection of rating-term definitions to suit the situation, and it is hard to beat the "convenience" and cost of logging a check-off table or tape recorded rating. Let us evaluate such a properly developed Mental Workload Rating technique by the measurement criteria given earlier in Figure 4. This evaluation is based on long experience with the Cooper-Harper Scale:

Criterion:        Subjective Workload Rating
Relevant          ✓✓
Sensitive         ✓✓
Concordant        ✓
Reliable          ✓
Convenient        ✓✓

Compare your favorite workload method against this!

PROGRESS IN AUXILIARY-TASK TECHNIQUES
In the late 1960's Systems Technology, Inc. developed the Critical-Instability Tracking Task (CTT), in which the operator, using a compensatory display, stabilizes a first-order unstable controlled element by closing the control loop with proportional control corrections [12]. The controlled element is, in transfer function form: $Y_c(s) = K\lambda/(s - \lambda)$, where $\lambda$ is the degree of instability. As the instability is progressively increased (using a carefully developed autopacing algorithm), a "critical instability" is reached where the operator can no longer stabilize the loop, and control is lost very precipitously. The critical instability, denoted as $\lambda_c$, is dominated by the perceptual-motor delays of the operator, including any scanning of attention away from the display. The CTT is one of the most thoroughly validated tasks in the psychomotor test tool box. Its task-induced behavior is uniform in all subjects and is well understood and validated (both theoretically and empirically); its score statistics are monotonic, near-gaussian, and display differential stability; and score norms are available for typical operators and conditions (e.g., see [13], [14], and [27]).
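The link between operator delay and the critical instability can be illustrated with a deliberately simplified operator model: a pure gain followed by a pure time delay $\tau$, closing the loop around $K\lambda/(s - \lambda)$. This model, the symbol `loop_gain` (the product of the operator gain and $K$), and the function names are assumptions made for illustration, not taken from [12]. The closed-loop characteristic equation is then $s - \lambda + g\lambda e^{-s\tau} = 0$, and setting $s = j\omega$ at the stability boundary gives $\cos\omega\tau = 1/g$ and $\lambda\tau = \omega\tau/(g\sin\omega\tau)$.

```python
import math

def max_lambda_tau(loop_gain):
    """Largest stable product lambda*tau for a gain-plus-delay operator.

    Assumed model: closed loop  s - lambda + g*lambda*exp(-s*tau) = 0,
    where g = loop_gain must exceed 1 to overcome the divergence.
    """
    if loop_gain <= 1.0:
        raise ValueError("loop gain must exceed 1 to stabilize the divergence")
    theta = math.acos(1.0 / loop_gain)   # omega*tau at the stability boundary
    return theta / (loop_gain * math.sin(theta))

def critical_instability(tau, loop_gain):
    """Instability level (rad/s) at which the assumed loop goes unstable."""
    return max_lambda_tau(loop_gain) / tau

# Hypothetical operator: 0.2 s effective delay, two loop-gain choices.
tight = critical_instability(tau=0.2, loop_gain=2.0)
gentle = critical_instability(tau=0.2, loop_gain=1.0001)
```

In this sketch the limit approaches $1/\tau$ as the loop gain is lowered toward 1, so the attainable instability is bounded by the reciprocal of the effective delay, which is one way to read the statement that $\lambda_c$ is dominated by the operator's perceptual-motor delays.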
In 1970 we (Wade Allen, Warren Clement, and I) developed a "subcritical" version of the CTT to investigate divided attention effects, such as: display scanning, sampled signal reconstruction, and interactions between conflicting attentional demands [15]. We were able to show, in an elegant series of experiments involving an eye tracker, two tasks and displays, and various controlled elements (one with variable subcritical levels of $\lambda$), that the attention to the main task could be systematically, but naturally, varied by requiring the attention to a subcritical side task to be controlled by using different levels of $\lambda$. The experimental setup is shown in the upper portion of Figure 9 (without the Cross-Coupling Algorithm, discussed later). Some typical results are shown in Figure 10, where the main task attention dwell fraction, $T_d/T_s$ (measured from the eye scanning data), is shown to vary inversely with the side task difficulty, $\lambda$, in accordance with the theoretical limit shown. (The formula reflects the observation that the time away from $Y_{c2}$ must be less than $T_\lambda = 1/\lambda$. See [15].)

The primary task performance (error) is impaired less than 30 percent despite the great increase in workload from the increased divided attention. This is because the side task's $\lambda$ (0.5 to 2.0 rad/s) is a small fraction of the subject's Critical Instability limit of $\lambda_c \approx 5$. The excess control capacity is still adequate, so incipient loss of performance is barely reached, in accord with earlier comments.

Beyond a level of $\lambda \approx 2$ the whole system performance deteriorated significantly, making continuous runs impractical with 30 degrees of display separation. Summarizing: a subcritical tracking side task (SCT) can be used to force divided attention in a natural and predictable fashion, using up excess control capacity and, thereby, to vary the workload margin in an efficient manner. Detailed instructions for using this SCT technique are given in [5] and [27]. Three important insights resulted from this work:
1. Subcritical auxiliary task loading was a natural way to utilize the remaining workload margin in a primary task, and its level, $\lambda_s$, was a good indicator of that margin in a meaningful way. A good rule of thumb, for cases where parafoveal viewing of the main task is minimal, is that the time away from a subcritical side task ($T_d$) is about one half of the instability time constant, $T_\lambda = 1/\lambda$; that is: $T_d \approx 0.5/\lambda$. (See [5] and [15].)

2. The degree of secondary task loading, $L$, can be normalized and non-dimensionalized by dividing $\lambda_s$ by the individual's current critical instability limit, $\lambda_c$: $L = \lambda_s/\lambda_c$. This helps to account for the individual differences in skill and practice which often confound workload measurements. The long-sought nondimensional task loading is here!

3. The nonlinear growth of main task errors, as the excess control capacity is absorbed by increased side task difficulty, shows a sudden growth beyond the near-fully-loaded operating point. A main task error growth of about 30 to 40 percent (relative to the unloaded condition) seems to be the necessary just-noticeable-difference on which to base incipient overload. However, the absolute levels of unloaded (and thus the incipient overloaded) performance vary from one individual to another, and from day to day. This precludes the use of primary task error as a workload measure, per se.

Figure 9. ELEMENTS OF THE SUB-CRITICAL AND CROSS-COUPLED INSTABILITY TASKS
[Block diagram: displays, operator, and controlled elements for the primary task(s) and the side task. The cross-coupling algorithm (filtering, initializing, comparing, adjusting, timing, and scoring) sets the cross-coupled instability level $\lambda_x$ from the primary task(s) performance, using an error-increase criterion (1.0 < $E_c$ < 1.3).]

Figure 10. CONTROL OF MAIN TASK ATTENTION BY SUBCRITICAL SIDE TASK
[Plot: main task dwell fraction and main task error versus side task instability $\lambda$; display separation 30°.]
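The two rules of thumb from insights 1 and 2 above can be written out as a short worked example. This is only a sketch: the function names are hypothetical, and the numerical inputs (a side task at 2.0 rad/s and a critical limit of 5.0 rad/s) are illustrative values chosen to fall in the ranges quoted in the text.

```python
def normalized_loading(lambda_s, lambda_c):
    """Nondimensional side-task loading L = lambda_s / lambda_c."""
    if lambda_c <= 0:
        raise ValueError("critical instability must be positive")
    return lambda_s / lambda_c

def time_away_allowance(lambda_s, fraction=0.5):
    """Rule-of-thumb time away from the side task, T_d ~ fraction / lambda_s.

    The default fraction of 0.5 follows the 'one half of the instability
    time constant' rule quoted in the text.
    """
    if lambda_s <= 0:
        raise ValueError("side-task instability must be positive")
    return fraction / lambda_s

# Illustrative values: side task at 2.0 rad/s, critical limit 5.0 rad/s.
loading = normalized_loading(2.0, 5.0)
allowance = time_away_allowance(2.0)
```

For these inputs the loading is 0.4 of the individual's limit and the allowable look-away time is a quarter of a second; normalizing by each subject's own $\lambda_c$ is what lets loadings be compared across operators of different skill.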
AUTOMATIC [...] OF WORKLOAD MARGIN
The problem was how to adjust $\lambda_s$ to keep workload near the incipient overload point but not to exceed it. The solution, which took much evolution during 1968-1972 by Jack McDonnell, Wayne Jewell, Wade Allen, Ron Hess, and me, was the "Cross-Coupled Instability Task" (CCIT). Here, the degree of subcritical side task difficulty, $\lambda_x$, was adaptively adjusted to maintain the primary-task-ensemble's error within 1.3 to 1.4 times the unloaded level measured near the (unloaded) start of the scoring run. The somewhat complex algorithms for on-line error scoring, $\lambda_x$ adjusting, and final $\lambda_x$ scoring are beyond the scope of this lecture (see [17] for details). In careful and experienced hands the CCIT can give excellent results, but it requires careful test preparations, well practiced subjects, and some control theory background to apply successfully; at least in its present (1981) state of refinement. Call me first, if you plan to use the CCIT!

An exemplary CCIT application was to investigate the effects of combined variations in display quantization and controlled element order, both involving complex perceptual and control-law behavior by the operator (i.e., from no-lead to full-lead equalization) and, consequently, showing workload-performance tradeoffs [18]. The "Performance Penalty" metric proposed in [17] was used, in which the rms error normalized by the rms input was summed with the weighted (here, 1.0) ratio of the inverse workload margin noted earlier, i.e.: $\lambda_c/\lambda_x$. (Here, $\lambda_c$ is the ensemble average for all test sessions by an individual.) The results, shown in Figure 11, illustrate the following points:

The error measure (white) is insensitive to quantization, but the workload measure (dark) is sensitive to it.

The errors increase only with plant order, while the workload increases even more, such that the overall performance metric, P, is a strong function of plant order, i.e., of the degree of lead equalization (cognitive difficulty).

For the acceleration control case ($K/s^2$), the coarser quantization appears to help the operator to produce the lead equalization required (lower mental workload) but at the expense of more error.

There is good agreement with some of McDonnell's earlier workload data for corresponding inputs and elements [8].
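The adaptive principle behind the CCIT (raise the side task instability while the primary task error stays under the tolerated rise, back off when it is exceeded) can be sketched with a simple integral adjustment rule. This is not the algorithm of [17], whose on-line scoring and filtering are more elaborate; the update rule, the gain, and the quadratic `toy_error_ratio` standing in for a real operator are all hypothetical. The 1.3 target is taken from the error criterion quoted in the text.

```python
def adapt_side_task(error_ratio, target=1.3, gain=0.5, steps=200, lambda_x=0.0):
    """Integral-type adjustment of side-task instability lambda_x.

    error_ratio(lambda_x) returns primary-task error divided by its
    unloaded level. lambda_x is raised while the ratio is below the
    target and lowered when it is above, and is never negative.
    """
    for _ in range(steps):
        lambda_x = max(0.0, lambda_x + gain * (target - error_ratio(lambda_x)))
    return lambda_x

def toy_error_ratio(lambda_x):
    """Hypothetical operator: error grows quadratically with side-task load."""
    return 1.0 + 0.1 * lambda_x ** 2

settled = adapt_side_task(toy_error_ratio)
```

For this toy operator the adjustment settles where the error ratio equals the target, and the settled $\lambda_x$ (rather than the error itself) becomes the score, which is the sense in which the side task difficulty serves as the workload measure.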
Figure 11. TYPICAL APPLICATION OF ADAPTIVE-WORKLOAD TESTING USING THE CCIT (from Hess and Teichgraber, 1974)
[Bar chart: Performance Penalty = f(Error + Workload), P, for Subject A (bars = means) versus display quanta level (0.254 and 0.508 cm) and operator lead requirement (none, some, much), with workload data from McDonnell, AFFDL TR 68-76.]

An early version of the Cross-Coupled Instability Task was used to check the workload margins for a variety of controlled elements characteristic of a range of aircraft and spacecraft, spanning the 10-point Cooper-Harper Scale from 2 to 9. The fixed base simulation task was compensatory tracking of a randomly moving target (like gunnery or re-entry orientation). The pilot was well practiced in each case, and the controlled-element gains (response sensitivities) were pre-adjusted to select the optimum set for this comparison (for details see [8]).

The most relevant results to this lecture are shown in Figure 12, where the Cooper-Harper Ratings (made after an unloaded run) are compared with the Cross-Coupled Instability $\lambda_x$. (Here, the level of $\lambda_x$ was manually adjusted to keep a smoothed-error-magnitude measure under a 1.3 rise over the unloaded case.) Also shown are the critical instability range (average $\lambda_c \approx 5.5$), the normalized Excess Control Capacity or Workload Margin $\lambda_x/\lambda_c$, and its complement, the "Attentional Workload" of the primary task.

Figure 12. CROSS-ADAPTIVE MEASURE OF EXCESS CONTROL CAPACITY FOR SEVERAL EXAMPLES OF PRIMARY CONTROLLED ELEMENTS
[Plot: Cooper-Harper rating versus cross-coupled instability $\lambda_x$ (rad/sec), with parallel scales for excess control capacity (workload margin) $\lambda_x/\lambda_c$ and attentional workload (operator demands) $1 - (\lambda_x/\lambda_c)$; the day-to-day range of $\lambda_c$, the critical, limiting score, is marked. Controlled elements: $K/s$ (rate control); $K/[s^2 + 2(.7)(16)s + (16)^2]$; $K/[s^2 + 2(.7)(7.8)s + (7.8)^2]$; longitudinal A/C; lateral A/C; $K/[s(s + \ldots)]$; $K/s^2$ (acceleration control); $K/(s - 2)$ (unstable vehicle).]

It is well known that the "rate-control" elements ($K/s$-like) are easy to use for tracking, and this is true for the other elements with Cooper-Harper Scale (CHS) ratings < 3.5. As the controlled elements approach a $K/s^2$-like response, lead-equalization (rate detection and generation) must be present to stabilize the loops; hence attentional demand and mental workload increase and the CHS ratings deteriorate. The unstable case shown would be nearly impossible to fly as an aircraft, except for short-term emergencies, and it gets an appropriate CHS Rating of 9, while the Attentional Workload measured over 90 percent (under 10 percent Workload Margin). The fact that there is a monotonic trend between the subjective CHS ratings (dominated, as discussed earlier, by mental workload) and the CCIT scores is very significant and important, because it fulfills one of the key criteria of Figure 4. The apparently linear correlation shown by these data must be considered fortuitous, since raw CHS ratings were shown to have a nonlinear psi-scale. Much more work remains to be done in following up on this promising start with the CCIT, convenience and concordance being key issues.

PROGRESS ON A THEORY FOR DIVIDED ATTENTION
Finite Dwell Sampling Theory
One of the key obstacles to progress is the lack of a comprehensive theory for the interfering effects of tasks performed concurrently, i.e.: a) quasi-continuous tasks (as in multi-axis control), b) discrete tasks (as in decision-action pairs), or, as commonly occurs, a mixture of both. Substantial progress has been made by Warren Clement and others at S.T.I., which builds on the "sampled-data noise theory" of Bergen [19], and extends and validates this theory for human operator display scanning, sampling and signal reconstruction [15]. Concurrently, a parallel effort by Bill Levison at Bolt, Beranek and Newman, Inc. was being developed, based on the Optimal Control Model of the Human Operator, and assuming quasi-parallel control loops with a (Weber-Law-like) noise-ratio in each observer channel. Several of the corresponding results (e.g., the scanning "remnant" spectrum) are similar for both approaches [20].

Let us review Finite Dwell Sampling and its consequences; see Figure 13. Consider a human randomly scanning and sampling one or more displays with an average intersample period $T_s$ (sec) and variability (standard deviation) $\sigma_{T_s}$. Each signal is perceived for a finite dwell time $T_d$, for an average dwell-time fraction of $\eta = T_d/T_s$. The perceptually reconstructed signal thus consists of the actual signal $x(t)$ over $T_d$ and 0.0 over $T_s - T_d$, as sketched for one sinusoid in Figure 13. Now, subtract that portion of the signal linearly correlated with the actual signal (thus given by its "describing function," shown by the dashed line). The shaded difference represents the scanning and sampling noise, or "remnant." From the early work of Bergen, and its extension in [15], it can be shown that, when sampling is not periodic (it has rms variations $\sigma_{T_s}$), this circulating remnant becomes wideband noise and is "demodulated" to frequencies well below the average scanning and sampling frequency. This noise can be characterized at these lower frequencies by a first-order power-spectral density in terms of circular frequency ($\omega = 2\pi f$), in units of (signal units)$^2$/(rad/sec):

[Eq. 1]

Figure 13. EFFECTS OF FINITE DWELL SAMPLING ON THE RECONSTRUCTED SIGNAL'S DESCRIBING FUNCTION AND REMNANT
[Sketch: a sinusoid $x$ over successive sample intervals with finite dwell sampling, its describing function component, and the remnant.]
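The gating picture just described (the signal is seen during each dwell and is zero otherwise) can be checked numerically. The sketch below is not the analysis of [15]: it assumes a fixed dwell time, uniformly jittered sample intervals, and a single sinusoid well below the sampling rate, and all parameter values and names are hypothetical. It estimates the describing-function gain as the part of the gated signal correlated with the input, and the remnant power as what is left over.

```python
import math
import random

def finite_dwell_remnant(eta=0.4, mean_ts=1.0, jitter=0.4, freq_hz=0.13,
                         duration=2000.0, dt=0.01, seed=1):
    """Gate a sinusoid with randomly timed finite dwells.

    Returns (gain, remnant_ratio): the describing-function gain of the
    gated signal and the remnant power relative to the input power.
    Requires eta * mean_ts <= mean_ts - jitter so each dwell fits.
    """
    rng = random.Random(seed)
    dwell = eta * mean_ts
    start = 0.0
    length = rng.uniform(mean_ts - jitter, mean_ts + jitter)
    sxx = sxy = syy = 0.0
    for i in range(int(duration / dt)):
        t = i * dt
        while t >= start + length:          # move on to the next sample interval
            start += length
            length = rng.uniform(mean_ts - jitter, mean_ts + jitter)
        x = math.sin(2.0 * math.pi * freq_hz * t)
        y = x if (t - start) < dwell else 0.0   # seen only during the dwell
        sxx += x * x
        sxy += x * y
        syy += y * y
    gain = sxy / sxx                        # linearly correlated part
    remnant_ratio = (syy - gain * sxy) / sxx
    return gain, remnant_ratio

gain, remnant_ratio = finite_dwell_remnant()
```

Under these assumptions the gain comes out close to the dwell fraction $\eta$, and the remnant power is proportional both to the input power and to the undwell fraction $(1 - \eta)$, which matches the scaling properties listed for the sampling remnant.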
Without going into details (see [5] and [15]), the following important features are implied for sampling remnant:

Amplitude is Weber-Law-like: $n^2 \sim x^2$

Amplitude varies with sampling variability: $n^2 \sim \sigma_{T_s}$

Amplitude varies with "undwell fraction": $n^2 \sim (1 - \eta)$

Spectral shape is like first-order-filtered white noise

An important point is that the undwell portion of a sampled display can be used to sample/reconstruct another display, or to perform a discrete task. As we will see, this is the key that opens a way to treat diverse effects of intermittent attention.

Sampling Effects on Control Performance

We have applied this finite dwell random sampling theory to the modeling and measurement of tracking displays [15], [21]. The details are too complex to give here, but the key effects can be seen in the sketch of Figure 14 and are as follows:

Finite-dwell quasi-random scanning and sampling reduces the loop gain (tightness), but adds little to the effective loop delay. The optimum operator gain is less, too.

Because the scanning and sampling noise is multiplicative, lightly damped modes are greatly excited by sampling noise, and the closed-loop errors can blow up as gain is increased, before the loop becomes dynamically unstable. (This is termed "error instability in the mean-square sense" and it is akin to the well-known phenomenon of conversational noise "blow-up" at a cocktail party.)
Figure 14. Sketch of Scanning Implications on Loop Closure and Performance. [Sketch: Error/Input (log scale) versus loop gain ("tightness" of operator's control) for continuous and sampled loop closures, each with its optimum marked; annotations show the sampling remnant, reduced gain, reduced optimal gain, the stability margin, and the regions of error instability and dynamic instability.]
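The distinction between "error instability in the mean-square sense" and dynamic instability can be illustrated with a toy model. This is only a sketch under stated assumptions, not the multiloop computation of [15]: it assumes a first-order discrete-time loop e[k+1] = a·e[k] with a = 1 - K(1 + n), where K is the loop gain and n is zero-mean multiplicative noise with standard deviation s. The mean error decays when |1 - K| < 1, whereas the mean-square error decays only when E[a²] = (1 - K)² + K²s² < 1, i.e., when K < 2/(1 + s²). The function names are hypothetical.

```python
def mean_square_growth(gain, noise_sd):
    """E[a^2] for the toy loop e[k+1] = a*e[k], with a = 1 - gain*(1 + n)
    and n zero-mean multiplicative noise of standard deviation noise_sd."""
    return (1.0 - gain) ** 2 + (gain * noise_sd) ** 2


def gain_limits(noise_sd):
    """Return (mean-square gain limit, dynamic gain limit) for the toy loop."""
    mean_square_limit = 2.0 / (1.0 + noise_sd ** 2)
    dynamic_limit = 2.0   # |1 - gain| < 1 for the noise-free (mean) dynamics
    return mean_square_limit, dynamic_limit


for s in (0.0, 0.5, 1.0):
    ms_limit, dyn_limit = gain_limits(s)
    print(f"noise sd {s}: mean-square limit {ms_limit:.2f}, dynamic limit {dyn_limit:.2f}")
```

In this toy loop, any nonzero multiplicative noise places the mean-square limit below the dynamic limit, so the error variance diverges at a gain for which the noise-free loop would still be stable.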
Various types of signal "reconstruction" during the undwell period (e.g., state- and/or rate-extrapolation) can reduce the sampling noise, but at the expense of increased attentional demand (mental workload) and, often, added signal processing delays (which can destabilize the man-machine loop). Thus, it is difficult to overcome the detrimental effects of scanning in busy situations. That is why pilots take hundreds of runs to learn instrument landings, which require skilled scanning. Because this theoretical model has proven valid to date, and is applicable to discrete tasks as well, further work on developing a useable set of procedures and supporting data base is being pursued.

Discrete Task Interference

Although derived for display scanning, the foregoing model and results have a far wider application. Any task sharing which requires the operator to divert attention more-or-less periodically will produce similar effects. Such situations include: internal sharing of attention among various control axes, concurrent discrete tasks (e.g., communications, configuration, or navigation procedures), cognitive tasks, and workload measurement tasks. The type and degree of interference could be computed (predicted or analyzed) by this approach, if a properly measured data base were available.

The breakthrough came in recognizing that each discrete task acts as an interruption to the display scanning, i.e., in the "undwell" period noted earlier. A stream of attentional demands is formed by pooling the demands from the concurrent scanning and cognitive tasks, each of whose arrival times (A, B, C, ...) has some sort of quasi-periodic distribution of inter-arrival times. The distribution of several pooled demand sources is approximately random over periods longer than the mean inter-arrival interval from any one source (see Figure 15).

Consequently each (or some combination) of discrete or cognitive demands can be considered as a sampling channel interacting like an undwell period on the control signal channels. Among the more interesting implications of this theory for discrete task situations are the following:

- Task interference will be proportional to the average discrete demand duty cycle (via 1 - η, the undwell fraction) and to the randomness of cognitive demands (via σTs, the sampling variability); see Eq. 1.
- The quality of the ongoing control task loop closure must always suffer, albeit not very much if the closure is near optimum and the sampling remnant is small.
- Paradoxically, dynamic stability margins (gain and phase margin) may increase with sampling (see Figure 14), although "error instability" may be incipient. Our experiments, e.g., [15], bear out this implication.
Figure 15. Sketch Illustrating Randomness of Pooled Parallel Demands. [Sketch: demand arrival times from several sources (Source A, ...), each with its own average periodicity; the pooled stream of all demands is approximately random over time.]
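The claim that pooled quasi-periodic demands look approximately random can be checked with a small simulation. The sketch below is illustrative only: the five source periods, the 10 percent interval jitter, and the function names are assumptions, not values from the experiments cited. It compares the coefficient of variation (CV) of inter-arrival intervals for one source against that of the pooled stream; a CV near 0 indicates a nearly periodic stream, and a CV of 1 is what a purely random (Poisson) stream would give.

```python
import math
import random


def arrival_times(period, jitter_frac, duration, rng):
    """Quasi-periodic arrivals: intervals drawn from Normal(period, jitter_frac*period)."""
    t = rng.uniform(0.0, period)          # random starting phase
    times = []
    while t < duration:
        times.append(t)
        # Clip at a small positive value so that time always advances.
        t += max(0.05 * period, rng.gauss(period, jitter_frac * period))
    return times


def interval_cv(times):
    """Coefficient of variation (sd / mean) of successive inter-arrival intervals."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return math.sqrt(var) / mean


rng = random.Random(3)
periods = [1.0, 1.3, 1.7, 2.3, 3.1]       # hypothetical mean periods, seconds
sources = [arrival_times(p, 0.1, 2000.0, rng) for p in periods]
pooled = sorted(t for s in sources for t in s)

cv_single = interval_cv(sources[0])
cv_pooled = interval_cv(pooled)
print(f"single-source CV = {cv_single:.2f}, pooled CV = {cv_pooled:.2f}")
```

Under these assumptions the single source stays close to periodic while the pooled stream is far more irregular, which is the sense in which the pooled demands can be treated as a random sampling channel.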
Discrete task interference can be reduced by proper mental signal reconstruction (extrapolation) during the diversion of attention, but only at the expense of additional mental workload and possibly additional loop delays, if the human operator is restricted to the "compensatory" level in the Successive Organization of Perception (SOP) paradigm [4]. If, instead, the "pursuit" or "precognitive" levels of SOP can be adopted via changes in the operator's loop "architecture," discrete task interference may well be reduced without as great a cost in additional workload. See the discussions in [15].
The potential power of this finite-dwell-sampling theory to model and compute such effects provides a basis for a comprehensive theory of divided attention, to be discussed next.

Combining Continuous and Cognitive Tasks
Normally, the excess control capacity of the pilot or operator is, by design, sufficient to handle discrete and monitoring tasks. When a lengthy discrete task or a series of discrete tasks, intrusions, distractions, or a system failure occurs, the pilot must postpone some of his discrete monitoring tasks or compromise his tracking performance. It is appropriate to seek models and measures for such situations from unsteady queueing theory, for example, as pioneered by Senders, et al. [22].

Before we consider queueing theory, however, it is worthwhile to mention another "quasi-steady-state equivalent" analytical technique, Average Duty Cycle, which has been used for years. It is especially useful for incorporating the average time required to perform discrete tasks and has its origins in the numerous time line methods, e.g., see [23]. If we identify the total average time allowed for a short segment of the flight profile as T and the total average time used for discrete tasks as Tu, we can define the average or "steady-state equivalent" discrete task duty cycle, Tu/T. This discrete task measure is commensurate with (and utilizes, but is not equal to) excess control capacity, λx/λc. If Tu/T < λx/λc for the operator, presumably he is, in an average sense, less than fully occupied with both steady-state tracking and discrete tasks over that segment of the
flight profile (i.e., unsaturated). A useful combined measure is the combined Average Duty Cycle, ηDC:

$$\eta_{DC} = \frac{1}{T}\left[\sum_i T_{DC,i}\ (\text{control display}) + \sum_j T_{u,j}\ (\text{discrete dwells})\right] \qquad \text{Eq. 2}$$
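As a worked illustration of the duty-cycle bookkeeping, the sketch below reads Eq. 2 as the total time spent on the control display plus the total time spent in discrete dwells, divided by the segment time T. All numbers and names are hypothetical, including the value used for the excess control capacity.

```python
def combined_duty_cycle(segment_time, control_dwells, discrete_dwells):
    """Combined average duty cycle: total attended time divided by segment time."""
    return (sum(control_dwells) + sum(discrete_dwells)) / segment_time


def discrete_duty_cycle(segment_time, discrete_dwells):
    """Discrete task duty cycle Tu/T for one segment of the profile."""
    return sum(discrete_dwells) / segment_time


# Hypothetical 60-second segment of a flight profile (all times in seconds).
T = 60.0
control_dwells = [20.0, 15.0]       # time attending the control display
discrete_dwells = [5.0, 4.0, 6.0]   # time spent on discrete tasks
excess_capacity = 0.4               # assumed excess control capacity

tu_over_t = discrete_duty_cycle(T, discrete_dwells)
eta_dc = combined_duty_cycle(T, control_dwells, discrete_dwells)
unsaturated = tu_over_t < excess_capacity

print(f"Tu/T = {tu_over_t:.2f}, combined duty cycle = {eta_dc:.2f}, "
      f"unsaturated = {unsaturated}")
```

With these assumed numbers the discrete duty cycle is below the assumed excess capacity, so the operator would be, in an average sense, unsaturated over the segment.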
There is no problem as long as the duration of attention to each demand is sufficiently short so that none interferes with another. However, when the ensemble-demand openings become shorter than the attention time needed for a task, other tasks must wait and a queue develops. As the average demand duty cycle approaches 100 percent, the process becomes oversaturated and a queue starts to grow. Since an oversaturated queueing process is not in equilibrium, only transient characteristics can be computed. An approach to this problem has been made by Warren Clement and is summarized in [5]. It is too detailed to give here, but builds on a "renewal theory," e.g., [24], which considers the rates of steady-state demand as imposed upon by an emergency event of top priority, thereby causing a queue to develop. Provided the unattended demands can wait, the buildup and recovery statistics can be estimated by the formulas given in [5].

These ingredients, namely a) the finite-dwell display scanning theory; b) closed-multiloop control performance computations based upon it; c) discrete (cognitive) task ensemble demand statistical models; and d) queueing theory results for the occasional overloads, are being combined into a comprehensive theory for modelling complex tasks, computing parametric-study implications, and fitting data gathered in past and future experiments. I hope the currently (1981) disappointing funding of such efforts increases, so that more useable procedures will soon be available.

PSYCHOPHYSIOLOGICAL MEASURES OF WORKLOAD PROGRESS
Many researchers, myself included, have sought the elusive "workload nerve," or evidence of its activity, via psychophysiological (PP) measurements. But, because the metacontroller is diffuse and largely cognitive, its activity is seldom directly observable. Therefore one looks for correlated or co-varying autonomic system activity which can be measured, such as: heart-rate or breathing-rate variations, muscle tension, and eye-scanning or blinking. The most sophisticated measure is a head-surface electroencephalogram (EEG), from which the microvolt brainwave activity accompanying distinct events often can be extracted. For background see Andreassi [28].

The sine qua non of all psychophysiological measures is a monotonic sensitivity to mental workload and insensitivity to other, irrelevant, ambient variables, as discussed under "criteria for workload measurements." Utilizing such PP measures involves much empirical calibration of each one versus every likely variation in task, task variables, and task loading, repeated for a variety of individuals, ages, and installations.

We have done some PP research along these lines. In one series of tests, a set of subcritical-instability tasks of increasing order (and, hence, of increasing mental workload) was investigated [25]. The PP measures included: 1) ECG, from which average and variations in heart-rate could be computed; 2) a nasal flow thermistor, yielding breathing frequency; 3) two forearm (carpi muscle) electromyographs (EMG), yielding
both the "active" and "passive-limb" muscle tension; 4) the finger stick grip pressure (thumb to fingers); and 5) a trans-palmar impedance transducer giving "palmar skin resistance" (similar to the galvanic-skin-response of classic polygraph fame). Six-electrode, subdermal scalp EEGs were also measured concurrently. (See [25] for details.)

Typical results are shown in Figure 16, where the averages of the four subjects at each order of controlled element are shown along with their "resting" levels between runs. Here the "mission" criterion was simply to keep the error on scale for 100 seconds. The ensemble-averaged Cooper-Harper ratings were progressively worse: CHR ≈ 3, 6, and 9 for first-, second- and third-order tasks respectively, so the subjective mental workload clearly ranged from "best" to "worst." Yet, of all the measures, only the group median breathing frequency and group median passive-limb EMG show a (slightly) monotonic correlation with workload (Cooper Rating), and that is neither sensitive nor concordant. The EMG, purporting to measure residual body tension, shows a slight decrease with workload despite the subjective impression that tension increased during the harder tasks.
Not shown is the heart-rate variability (HRV) proposed by Kalsbeek. A cross-spectral analysis of heart-rate vs. breath flow showed that nearly all of the HRV was due to the well-known sinus arrhythmia effect [26], which is easier to measure directly. These data show generally conflicting trends, levels, or lack of concordance, even when normalized in various ways. The resting values (no workload) often show more variability than the tracking values.

Figure 16. Psychophysiological Measurements During Rest Periods and Tracking Runs. [Plot: PP measures versus Cooper-Harper ratings. Annotation: these PP data are NOT ubiquitous, NOT coherent (w/ workload), NOT artifact free, NOT model-structured; therefore NOT worth modeling (an ignis fatuus).]
Nearly all of our psychophysiological workload results, and those of many other researchers, show similar effects, i.e., they are:

1) not directly relevant;
2) often not monotonically sensitive to strong workload variations, but sensitive to non-related variables (e.g., body movement artifacts plagued these tests);
3) obviously not concordant, with no clear norms, and with resting levels among subjects that are not meaningful;
4) seldom repeatably reliable (not shown here);
5) and, with their requirements for elaborate application procedures by trained personnel, for shielding, and for expensive equipment, not convenient to use under field conditions.
Since PP measures of workload fail most of the criteria of Figure 4, why do we use them? Because there is a "gut-feeling" among researchers in this arena that there must be some measurable PP effects of mental workload, if only because the subject is aware of his workload margin, as discussed above. We must keep trying to perfect PP workload measures, but don't expect validated clinical techniques to be available for a long time to come.

PROMISES
Standard Tasks for Calibrating Mental Workload

One promising, and much overdue, development in workload research is the development of a set of standardized tasks for calibrating and validating workload measures. I think that this could easily be done with little risk and excellent payoff, based on the subcritical instability task and divided attention theory presented earlier. The objectives would be to provide a lab tracking task ensemble and procedures which could:

1) be systematically varied over a "nearly unloaded" to "fully loaded" mental workload,
2) be well understood in terms of concordant operator behavior, repeatable performance descriptive parameters, and known sources of statistical variations,
3) be easy to mechanize or acquire, prepare administrators, and train subjects,
4) have well-established norms and training regimens,
5) permit convenient calibration of candidate workload measures against these norms, and
6) most importantly, have correlations of subjective workload with the adjusted workload variables.

Tracking Task

The Critical-Instability Task (CTT), and its quasi-stationary variation, subcritical instability tracking (SCT), already meet the above Objectives 1, 2, and 3; and a significant data base exists at STI and elsewhere, toward Objective 4. The subject's behavior while doing SCT or CTT is well documented and is ubiquitous, because the task dynamics constrain operators to a simple (proportional) control action. Consequently, the task demands for attention (busyness factor) and finesse (cognitive involvement) are well controlled by the single variable λ, the level of instability. A CTT with its adaptive autopacer can be used to train subjects at a maximum
learning rate, and as noted above, CTT scores have excellent statistical properties. First proposed in 1978, the combined use of CTT and SCT has another basic advantage: as noted earlier, a truly nondimensional and individualized task workload loading can be characterized by the ratio L = λi/λc, where λi is the SCT level and λc is the critical limit on the same apparatus and date. Data [14] prove that the operator is reliably fully loaded as L approaches 1.0, but only lightly loaded for L near 0.1, so Objective 1, above, is easily met. See [16] and the Appendix for further details and recommendations.
There are many aspects to this proposal that need research. Only meager data are available on the detailed functional form of Subjective Workload Ratings vs. L, although it seems to be monotonic, concave upward. Modern workload scales such as the modified Cooper-Harper scale, or SWAT scale, should be employed, giving careful attention to practice effects and randomly repeated conditions. Effects of input spectrum on the SCT performance need to be explored, as well as subjective workload consequences thereof. Training regimens and criteria, scoring procedures, and rating procedures all have to be carefully evolved before the Subcritical/Critical Task Ensemble can be confidently used for calibrating workload measures.

Discrete Tasks

In the category of intermittent stimulus-response tasks, the Standard Workload Task Battery should include the well known Sternberg Reaction-Time Task, or Sternberg Paradigm [29]. Here, the subject memorizes and/or has available a list of from 2 to 6 "target" numerals or icons in the range of 0 to 9. Whenever a numeral from 0 to 9 is presented (visually or auditorily), the subject must answer as quickly as possible (e.g., by pressing a button or speaking) whether or not the numeral is in the target list. The Sternberg Reaction Time has been found to be primarily dependent on the number of target numerals (more targets take more mental comparisons), and the workload margin of any concurrent control tasks [30]. More primary task workload results in an increase in the Sternberg Reaction Time. Interpreting these results, we would say the latter component is dependent on the allowable time-away-from the control tasks, as explained earlier.

The Sternberg Task is relevant to many vehicle control tasks (such as radio tuning, call-letter answering, procedure check-offs, etc.). The operator responses seem to be fairly ubiquitous in trends, albeit sensitive to training and idiosyncratic skill levels. Repeatability is fairly good, once trained, and the apparatus and procedures are simple to mechanize and to learn. On the converse side: the need to do numerous repeats with different target list lengths is a severe inconvenience; norms for control task workload are largely lacking; and very few correlations with subjective workload have been made. Nevertheless, the Sternberg Reaction Time Test is felt to be a good discrete task for a calibrator because its demand can be controlled (via the target-list size) and it can be easily integrated with continuous control tasks. It can also be disguised as an operational task surrogate for relatively unobtrusive workload measurement.
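The mechanics of the Sternberg task described above are simple enough to sketch. The following is a minimal, hypothetical mechanization: the function names, the random draw of targets and probes, and the scoring rule (mean reaction time over correct responses only) are assumptions made for illustration, not the procedure of [29] or [30].

```python
import random


def make_sternberg_trials(list_size, n_trials, seed=0):
    """Draw a target list of `list_size` distinct numerals (0-9) and a
    sequence of probe numerals, each tagged with the correct yes/no answer."""
    rng = random.Random(seed)
    targets = set(rng.sample(range(10), list_size))
    trials = []
    for _ in range(n_trials):
        probe = rng.randrange(10)
        trials.append((probe, probe in targets))
    return targets, trials


def score_trials(trials, responses, reaction_times):
    """Return (fraction correct, mean reaction time over correct responses)."""
    correct_rts = [rt for (_, truth), resp, rt
                   in zip(trials, responses, reaction_times) if resp == truth]
    accuracy = len(correct_rts) / len(trials)
    mean_rt = sum(correct_rts) / len(correct_rts) if correct_rts else float("nan")
    return accuracy, mean_rt


targets, trials = make_sternberg_trials(list_size=4, n_trials=20, seed=1)
# A hypothetical error-free subject answering every probe in 0.5 sec:
perfect = [truth for _, truth in trials]
accuracy, mean_rt = score_trials(trials, perfect, [0.5] * len(trials))
print(sorted(targets), accuracy, mean_rt)
```

Repeating such a block for several target-list sizes, with and without a concurrent tracking task, would give the reaction-time-versus-list-size data the text refers to.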
Divided Attention Tasks
Real-world situations involving high mental workload usually involve some degree of divided attention, in the form of multiaxis continuous control, concurrent discrete tasks (such as radio calls, attacking missile warnings, etc.) or a mixture of both. What is needed is a simpler set of such tasks that can validly emulate a more complex set of operational tasks, to permit workload measures to be calibrated in the lab, validated across labs, and compared with better statistics than can be obtained from few operational task results. As a start I recommend a combination of the two foregoing tasks: the Subcritical Tracking Task (SCT) and the Sternberg Reaction-Time Task (SRT). Performing both tasks concurrently at various levels of L and/or of target-list size, NT, could lead to a very robust complex task ensemble on which to test or calibrate various workload measures. Subjective Workload and Attentional Demand Ratings versus L and NT would have to be obtained for all experiments, and eye-scanning measurements would be an important aid to interpretation of results (Figure 17).

This combined SCT/SRT task can be set up as a concurrent "piloting" task (e.g., terrain following) with "missile attack warnings." If the controls, displays, and functions of the simple SCT and SRT are made similar to a real-world scenario, and the criteria for performance are mission-related, then we have what has been termed an "imbedded surrogate task" situation, wherein one of the concurrent tasks (e.g., SCT or SRT) is actually a workload-margin measurement tool for the others, taken as a primary task ensemble.
Figure 17. One Promise: Standard Tasks for Workload Calibration

- TRACKING: Subcritical Instability, L = λi/λc (Ref: Jex, in "Mental Workload," Plenum Press, 1979)
- COGNITIVE: Sternberg Paradigm
- DIVIDED ATTENTION: Cross-Coupled Instability Task with "Imbedded Surrogate" Auxiliary Tasks or Probes
- NEEDS: "Catalog;" procedures; norms; funding
The ultimate in automated workload-margin measurement would be to use one or more subcritical or realistic tasks for tracking in 2 axes (say, pitch and yaw) while the third axis (e.g., roll) is really a Cross-Coupled Instability Task (CCIT) acting as a surrogate for the roll-control axis (e.g., "the aircraft has an unstable spiral roll mode needing frequent attention"). One successful example of applying this is described in [33], wherein a CCIT in the roll axis of a STOL aircraft clearly showed the waxing and waning workload margin during STOL landing approaches.

These are some very promising developments, needing only funding and experience to accomplish with minimal risk and high payoff. The major needs are a "catalog" of mechanizations for any lab to replicate; a set of procedures for training, administering and analysis of results; and a normative data base by which to judge new results and/or actual incipient overloads in the lab or in the sky.

Event Related Potentials
Will the "workload nerve," the long-sought Holy Grail of mental-workload researchers, ever be discovered? Some psychophysiologists think that it already has been revealed in the form of brain-wave Event Related Potentials (or Evoked Response Potentials; take your pick; either one is called ERP). For decades, EEGs have been investigated and correlated with those cognitive activities characteristic of mental workload, such as decision making, visual-motor activity, perception of "interesting" events, etc. (see Callaway, et al. [31] for background). A methodology has been developed for measuring the extremely small signals sought, which are hidden in the surrounding cacophony of other brain activity and the electromagnetic noise of the ambient environment, both of which are higher (in an rms sense) than the signal. A distinct signal (say an audible click, tone or visual event) is input to the ears or eyes in a repeated manner, and the brain's surface potential waves in certain areas (e.g., between the central and parietal electrode locations on the scalp) are measured for the few seconds following the event. As the event is repeated, the correlated waveform components add up, while the uncorrelated ones average out to near zero, yielding, after tens to hundreds of repeats, the waveform of the desired Event Related Potential.

As a cardiogram wave is "interpreted" by cardiologists, the "signature" of the ERP waveform often can be interpreted by psychophysiologists. While the ECG wave has a well understood causal connection to the heart muscle's innervation and contraction activities (and thereby relates to heart problems), the ERP wave has a much more tenuous connection to cognitive events. It is much less intense, has vague origins in the electromagnetic dipoles which accompany massive synaptic activity deep in the reticular region of the brain, and often has an idiosyncratic signature among individuals.
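The ensemble averaging described above can be illustrated with synthetic data. In the sketch below, the "ERP" is a hypothetical unit-amplitude bump, the background activity is modeled as independent Gaussian noise five times larger (rms) than the peak, and the function name is made up for illustration. Under these assumptions the residual error of the averaged waveform should fall roughly as the noise rms divided by the square root of the number of repeats.

```python
import math
import random


def averaged_erp_error(n_trials, noise_rms=5.0, n_samples=200, seed=0):
    """Average `n_trials` noisy epochs of a known template and return the
    rms difference between the average and the template."""
    rng = random.Random(seed)
    # Hypothetical event-locked waveform: a unit-amplitude positive bump.
    template = [math.exp(-((i - 100) / 15.0) ** 2) for i in range(n_samples)]
    acc = [0.0] * n_samples
    for _ in range(n_trials):
        for i in range(n_samples):
            # Each epoch = correlated waveform + uncorrelated background noise.
            acc[i] += template[i] + rng.gauss(0.0, noise_rms)
    avg = [a / n_trials for a in acc]
    return math.sqrt(sum((a - s) ** 2 for a, s in zip(avg, template)) / n_samples)


for n in (4, 40, 400):
    print(f"{n:4d} repeats: rms error of averaged waveform = {averaged_erp_error(n):.2f}")
```

Real EEG background activity is neither white nor independent from epoch to epoch, so this sketch shows only the idealized benefit of repetition, not the practical difficulties discussed in the text.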
Nevertheless, one ERP waveform characteristic seems more repeatable than many others. It is a positive (P) peak which seems to occur near 0.300 sec (300 milliseconds) after a particularly "interesting" event has occurred, so it has been named the "P300" component, even though it can occur anytime from 250 to 500 ms. If a train of similar numerals or tones is given to most subjects, their P300 ERP will decrease, but when an "oddball" change occurs which the subject has been instructed to detect, a sharp increase in the P300 component is evident (of course, after dozens of the
ensemble-averaged oddball-event responses). The characteristics of this P300 component often seem to be roughly correlated with the degree of alerting, or "cognitive involvement" of the event. The latter correlation promises ERP investigators a route to the long-sought objective measurement of mental workload, incipient cognitive overload, and breakdown of alertness; the activity of the workload synapses.
We have done some of this EEG research ourselves, with additional emphasis on continuous-process frequency-domain correlations (using cross-spectral analysis) with the human operator's visual motor processes during high workload control tasks [25]. I think it is safe to say that we and most other researchers in this arena have found that ERP measurements are:

- only loosely correlated with one component of mental workload (the cognitive involvement) and not with busyness, or the "pucker factor" (i.e., not clearly relevant, per the criteria of Figure 4).
- seldom strongly correlated with subjective mental workload in a monotonic fashion, but usually influenced by many extraneous variables (i.e., not monotonically correlated, low test power, and not insensitive to other variables).
- inconsistent in the optimum electrode locations for ERP signal detection; a significant fraction (about 20-30 percent) of subjects seem to give anomalous or idiosyncratic ERP characteristics (i.e., not concordant in the target population).
- vexingly inconsistent in test-retest reliability. Despite some promising early results in one lab, they have often failed to repeat in other labs or after long practice. No clear norms have been established, partly because the data base is difficult to acquire and nearly impossible to encode, e.g., by a math model with fitted parameters, as has been done for ECG signatures (i.e., not reliable).
- difficult to measure (whether discrete or continuous), because: a) one needs an electromagnetically shielded environment (at least local to the head), b) many repeated measures are needed to produce a usable signal-to-noise ratio, c) complicated and expensive apparatus and computerized on-line data processing are needed, and d) their interpretation is often an "expert" process (i.e., not convenient).
So, while I admire efforts to discover better ERP measures, procedures, transducers, understanding and interpretation, I feel that at present (c. 1981), the ERP is an "ignis fatuus": a tantalizing will-o'-the-wisp; a dancing swamp-gas light that leads one into ever deeper morasses while seeking solid goals. Researchers need to establish a much better causal chain (relevance) between mental workload per se and the ERP measures, while the measure itself must be made more sensitive, concordant, reliable and convenient. Until then, what is needed is high risk funding for the possible promises, but application efforts should not be wasted while it remains an ignis fatuus.
WORKLOAD SPECIFICATIONS
If my earlier comparison of the established "standard practice" of evaluating aircraft handling qualities versus the embryonic status of mental workload evaluation were completely valid, then one might soon expect workload-margin specifications to be as commonplace as stability-margin specifications. It would be a great help to human-machine interface designers if they had specified performance criteria and allowable workload margins for various levels of mission completion and pilot safety. For example, the Aircraft Handling Qualities Guidelines (MIL-F-8785C) specify three such levels (paraphrased):

Level 1: full mission completion with performance and safety goals all met;
Level 2: primary mission achieved with increased pilot workload and reduced performance;
Level 3: mission incomplete, safe return and landing possible.

An analogous set of such criteria can (and should) be evolved for operator/system mental workload. I advocate the use of operator workload margins, rather than absolute levels of workload demands, to allow for variations among operators, their skills, and their training. Such a set of guidelines would permit: comparison between radically different operator control/display configurations at the preliminary design stage; systematic evaluation and evolution of better designs; spotting of potential control workload-problem areas early in the design; and help in diagnosing and curing the numerous problems which will still occur concerning person/machine overload. To some degree this has been practiced for years (e.g., the use of time-line analyses), but seldom as part of an official design specification, and never with the sophistication and solid data base of handling qualities evaluations.

Before such Workload-Margin Specifications can become as real and useful as aircraft Handling Qualities Specifications, two difficult conditions must be achieved, each in a chicken vs. egg role with respect to the other: a) a systematic procedure, with a useable (extrapolatable, predictive) data base, must be available and have demonstrated predictions and cures of potentially dangerous situations, and b) the user agencies must be willing to sponsor the necessary research, data analysis, procedure development, simulations and field validations needed to demonstrate item a).

Unfortunately, I think that these may be a long time coming despite their obvious need. In contrast to the vehicle handling-qualities field, where good models of system response and operator adaptive behavior were achievable within the available theoretical infrastructure (i.e., systems and control theory), analogous mental-workload models are simply not yet available, and won't be until some of the measurement problems discussed earlier can be resolved. The key obstacle is the lack of a comprehensive, "causal-chain" theory and analysis procedure to account for the complex, adaptive, and multi-faceted behavior of the human operator in a mental workload context. Correlations, however sophisticated, among variables in a massive data base will not suffice, nor will the currently available human operator models used mainly for treating the control behavior required of the pilot.

As a start, government sponsors of aircraft and spacecraft could start requesting and funding the acquisition of workload data on a common set of subjective scales such as those described here. Parallel analytical efforts to evolve predictive, dynamic models for the operator's workload optimizing behavior come next, couched in terms fittable to the data base (i.e., subjective, as well as objective data). A standardized set of
workload calibration procedures such as given earlier is another essential stepping-stone to the process. The user agencies should start to apply the different scales and measures in joint experiments to evolve the best, rather than picking one to advocate at the expense of others. Extensive, systematic (and expensive) in-flight or on-the-road validations of numerous laboratory simulations (fixed- and moving-base) are required before any agency will be willing to depend on and pay for workload-margin specifications, analyses, and evaluations. Eventually, the longer-term dynamic effects of intense mental workload on an operator's chronic fatigue and health must be faced and treated in a similar comprehensive manner.

CONCLUSION

Upon reflection, the above needs are our goals and our promises as we work towards a useable technology in measuring, modelling and predicting the mental workload of complex human/machine systems in the 1980-1990 decade.
BEpEBgACEs
[l]
Spyker, D. A., Stackhouse, S. P., Khalafalla, A. S . . et al., Development of Techniques for Measuring Pilot Workload, NASA CR-1888, Nov. 1971.
[3]. McRuer, D. T., and Jex, H. R., "A Review of Quasi-Linear Pilot Models," IEEE Trans., Vol. HFE-8, No. 3, Sept. 1967, pp. 231249. [4]. McRuer, D. T., 'Human Dynamics in Man-Machine Systems," Automatica, Vol. 16, No. 3, May 1980, pp. 237-253. [5]. Jex, H. R., and Clement, Warren F., "Defining and Measuring Perceptual-Motor Workload in Manual Control Tasks," Mental Workload: Its Theory and Measurement, Neville Moray (ed.), Plenum Press, NY, 1979, pp. 125-177. [ 6 ] . Jahns, Dieter W., "Operator Workload:
What is it and How Should It Be Measured?," Management and Technology in the Crew System Design Process Conference, Los Angeles, CA, Sept. 1972,
[7]. Sheridan, T. B., "The Human Operator in Control Instrumentation," Progress in Control Engineering, Vol. 1, R. H. Macmillan, et al., eds., Academic Press, N.Y., 1962, pp. 141-187 (81. McDonnell, J . D., Pilot Rating Techniques for the Estimation and Evaluation of Handling Qualities, AFFDL-TR-68-76, Dec. 1968.
Measuring Mental Workload
31
[9]. Sheridan, Thomas B., and Simpson, R. W., Toward the Definition and Measurement of the Mental Workload of Transport Pilots, Final Report, Contract DOT-0s-70055, Jan. 1979.
[lo]. Reid, Gary B., Shingledecker, Clark A., Nygren, Thomas E., et al., "Development of Multidimensional Subjective Measures of Workload," Proc. 1981 International Conference on Cybernetics and Society, Oct. 1981, pp. 403-406. [ll]. Anonymous, Notes taken by H. R. Jex during the Carmel Conference on "Cognitive Psychophysiology and Man-Machine Systems", Carmel , CA., Jan 1982. [12]. Jex, H. R., McDonnell, J . D., and Phatak, A. V., Tracking Task for Manual Control Research, " Vol. HFE-7, No. 4, Dec. 1966, pp. 138-145.
"A 'Critical' IEEE Trans. ,
[13]. Jex, H. R., McDonnell, J. D., and Phatak, A. V., A "Critical" Tracking Task for Man-Machine Research Related to the Operator's Effective Delay Time: Part I. Theory and Experiments with a First-Order Divergent Controlled Element, NASA CR-616, Oct. 1966. McDonnell, J. D., and Jex, H . R., A "Critical" Tracking Task for Man-Machine Research Related to the Operator's Effective Delay Time: Part 11. Experimental Effects of System Input Spectra, Control Stick Stiffness, and Controlled Element Order, NASA CR-674, Jan. 1967. Allen, R. W., Clement, W. F., and Jex, H. R., Research on Display Scanning, Sampling. and Reconstruction Using Separate Main and Secondary Tracking Tasks, NASA CR-1569, July 1970. [16]. Jex, H. R., "A Proposed Set of Standardized Sub-critical Tasks for Tracking Workload Calibration," In Mental Workload: Its Theory and Measurement," Neville Moray (ed.), Plenum Press, NY, 1979, pp. 179-188. [17]. Jex, H. R., Jewell, W. F., and Allen, R. W., "Development o f the Dual-Axes and Cross-Coupled Critical Tasks," 8th Annual Conference on Manual Control, AFFDL-TR-72-92, Jan. 1973, pp. 529-552. [ M I . Hess, Ronald A., and Teichgraber, Walter M., "Error Quantization Effects in Compensatory Tracking Tasks," IEEE Trans. , Vol. SMC-4, NO. 5, July 1974, pp. 343-349. (191. Bergen, A. R., "On the Statistical Design of Linear Random Sampling Schemes, Proc. IFAC, Vol. 1 , Butterworth. London, 1961, pp. 430-436. [20]. Baron, S., and Levision, W. H., "An Optimal Control Methodology for Analyzing the Effects of Display Parameters o n Performance and Workload in Manual Flight Control," IEEE Trans., Vol. SMC-5, NO. 4, 1975, pp. 423-430.
[21]. Clement, W. F., Allen, R. W., and Graham, D., Pilot Experiments for a Theory of Integrated Display Format, JANAIR Rept. No. 711107, Oct. 1971.
[22]. Senders, John W., Carbonell, Jaime R., and Ward, Jane E., Human Visual Sampling Processes: A Simulation Validation Study, NASA CR-1258, Jan. 1969.
[23]. Parks, D. L., "Current Workload Methods and Emerging Challenges," in Mental Workload: Its Theory and Measurement, Neville Moray (ed.), Plenum Press, NY, 1979, pp. 387.
[24]. Cox, D. R., and Smith, W. L., Queues, Methuen, London, 1961.
[25]. Jex, H. R., and Allen, R. W., "Research on a New Human Dynamic Response Test Battery. Part II: Test Development and Validation," 6th Annual Conference on Manual Control, AFIT, Wright-Patterson AFB, OH, Apr. 1970.
[26]. Clynes, Manfred, "Respiratory Control of Heart Rate: Laws Derived from Analog Computer Simulation," IRE Trans., Vol. ME-7, No. 1, Jan. 1960, pp. 2-14.
[27]. Jex, H. R., "The Critical-Instability Tracking Task: Its Background, Development, and Application," in Advances in Man-Machine Systems Research, Wm. B. Rouse (ed.), Vol. 5, forthcoming, 1988.
[28]. Andreassi, John L., Psychophysiology: Human Behavior and Physiological Response, Oxford University Press, New York, 1980.
[29]. Sternberg, S., "Memory-Scanning: Mental Processes Revealed by Reaction-Time Experiments," American Scientist, 57, 1969, pp. 421-457.
[30]. Schiflett, S. C., Evaluation of a Pilot Workload Assessment Device to Test Alternate Display Formats and Control Handling Qualities, NATC SY-33R-80, July 1980.
[31]. Callaway, E., Tueting, P., and Koslow, S. H. (eds.), Event Related Brain Potentials in Man, Academic Press, 1978.
[32]. Donchin, E., Ritter, W., and McCallum, W. C., "Cognitive Psychophysiology: The Endogenous Components of the ERP," in Event Related Brain Potentials in Man, Academic Press, 1978.
[33]. Clement, Warren F., Investigation of the Use of an Electronic Multifunction Display and an Electromechanical Horizontal Situation Indicator for Guidance and Control of Powered-Lift Short-Haul Aircraft, NASA CR-137922, Aug. 1976.
APPENDIX
McRUER'S RULES (See [3] and [4])
In a well-defined control task, the human operator learns, and can adopt, the behavior of an "optimal controller" subject to constraints on perception, computation, and servomechanical execution. This is a well-posed problem in systems control theory with computable solutions, using well-developed analytical methods, sophisticated measurement techniques, and a generalizable data base. If the control task is single-loop "compensatory" (error correcting) with quasi-random forcing functions, then the operator adopts a lead- or lag-compensated control law, such that the combined human/machine open-loop frequency response approaches that of a simple integrator in series with an effective delay time. The operator's mental workload (cognitive difficulty) increases for adoption of large lead (prediction) or small lags (smoothing). The operator's control law is adjusted for several criteria such as stability, satisfactory performance, and minimum mental workload. More complex interactions occur for multi-axis control tasks.

SUBCRITICAL TASK FOR WORKLOAD CALIBRATION
The "Standard Sub-Critical Tracking Task" (SCT) specifications include a compensatory subcritical-instability task, which should have a horizontally moving error cursor on a CRT display of at least 15 cm diameter, placed about 50 cm from the eye. The control-display path must have negligible transport delays compared to the human visual-motor delays (i.e., less than 0.025 seconds from control action to display motion, or faster than 40 updates per second). The standard control is a freely moving "isotonic" control knob with a sensitivity of about 1 cm of cursor movement per 10 degrees of knob twist. Use a first-order subcritical tracking element with dynamics $Y_c = \lambda/(s - \lambda)$, where $\lambda$ can range from 1.0 to 10. No input is needed, but there must not be a perceivable deadband near zero error. Let $\lambda_c$ be the autopaced critical-instability limit for the subject, with that control and display. Define the task relative loading as $L = \lambda/\lambda_c$. Measure $\lambda_c$ at the beginning and end of each session, and interpolate it between, if it varies. With task loading $L$ = 0.2, 0.4, 0.6, 0.8 of the limit, measure the subjective ratings and psychophysiological correlates until stable values are achieved. Plot the results and various workload indices versus the loading $L$.
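The calibration procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and linear interpolation of the critical limit between the start and end of a session is an assumption, since the text says only that the limit should be interpolated if it varies.

```python
def interpolate_critical_limit(limit_start, limit_end, fraction_elapsed):
    """Estimate the critical-instability limit partway through a session.

    Assumption: linear interpolation between the limits measured at the
    beginning and end of the session (the source does not name a method).
    """
    if not 0.0 <= fraction_elapsed <= 1.0:
        raise ValueError("fraction_elapsed must lie between 0 and 1")
    return limit_start + (limit_end - limit_start) * fraction_elapsed


def relative_loading(instability, critical_limit):
    """Task relative loading L = lambda / lambda_c."""
    if critical_limit <= 0.0:
        raise ValueError("critical_limit must be positive")
    return instability / critical_limit


def instability_for_loading(loading, critical_limit):
    """Instability (lambda) to set so that the task runs at loading L."""
    return loading * critical_limit


# Hypothetical session: limit measured as 5.0 at the start and 6.0 at the end.
limit_mid_session = interpolate_critical_limit(5.0, 6.0, 0.5)
schedule = {
    loading: instability_for_loading(loading, limit_mid_session)
    for loading in (0.2, 0.4, 0.6, 0.8)
}
```

The `schedule` dictionary then maps each loading level named in the text to the instability value that would be set for a subject with that interpolated limit.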
HUMAN MENTAL WORKLOAD
P.A. Hancock and N. Meshkati (Editors)
Elsevier Science Publishers B.V. (North-Holland), 1988
PROPERTIES OF WORKLOAD ASSESSMENT TECHNIQUES
F. Thomas Eggemeier
University of Dayton and Armstrong Aerospace Medical Research Laboratory
Dayton, Ohio U.S.A.
Workload measurement techniques vary with respect to certain properties that determine the utility of a technique for individual applications. Two particularly critical properties are the sensitivity and intrusiveness of a technique. Present theory and supporting evidence suggest that these properties can be influenced by a number of factors, including the level and type of information processing demands that are imposed on an operator. Such factors emphasize the need for more extensive comparative information regarding the sensitivity and intrusiveness of the major classes of techniques. This chapter discusses theoretical bases of these properties, and reviews some current data that address the sensitivity and intrusiveness of several techniques. The development of a standard evaluation methodology which is designed to provide the required comparative data and refine present workload metric application guidelines is also discussed.
INTRODUCTION

Applications of sophisticated control and display technologies to modern systems can impose heavy demands on operator information processing capabilities. Such technologies often require the rapid sampling and integration of large volumes of information, and the resulting demands can approach or exceed the limited information processing capacities of the operator. Consequently, the need to assess the load imposed on operator processing capacities is particularly critical in high technology systems. Mental workload refers to the degree of processing capacity that is expended during task performance, and a large number of workload measurement techniques have been developed for application during system design and evaluation (O’Donnell & Eggemeier, 1986; Wierwille & Williges, 1978, 1980).

Workload assessment procedures can be categorized according to the type of response used to derive the index of capacity expenditure. The resulting major classes of measurement techniques include subjective, physiological, and performance-based measures. Although various individual assessment techniques have been developed within each category, all subjective procedures use some report (e.g., rating scale) of experienced effort or capacity expenditure to characterize workload levels, while physiological techniques derive a capacity expenditure estimate from the operator’s physiological response (e.g., variations in heart rate) to task demand. Performance-based procedures, which include primary and secondary task measures, are based on operator performance levels. Primary task procedures use the adequacy of performance on the task or system function of interest to characterize capacity expenditure, while secondary task measures are typically derived from the levels of performance on a concurrent or secondary task. Techniques from each major category of procedure have been employed in a range of applications with varying degrees of success (O’Donnell & Eggemeier, 1986).
The capability to assess effort or capacity expenditure with a variety of approaches raises fundamental questions regarding the utility of both classes of measurement procedures and individual techniques. Measurement techniques vary with respect to a number of properties that can be used to evaluate their usefulness for individual applications (Eggemeier, 1984; Eggemeier, Shingledecker, & Crabtree, 1985; Shingledecker, 1983; Wickens, 1984a; Wierwille & Williges, 1978). In addition to validity and reliability, two of the most important properties are the sensitivity and intrusiveness of a technique. Sensitivity refers to the capability of a technique to reflect differences in the levels of processing capacity expenditure that are associated with performance of a task or combination of tasks. Intrusiveness, on the other hand, refers to the tendency of a measurement technique to cause unintended degradations in ongoing primary task performance. Because of their importance in determining the utility of a workload measurement procedure, sensitivity and intrusiveness have been the subject of considerable recent research. This work has identified a number of variables which appear to affect the sensitivity and intrusiveness of several metrics, and has provided the basis for some initial general guidelines regarding the application of measurement techniques.

This chapter describes theoretical bases for both sensitivity and intrusiveness, and discusses a number of factors which appear to influence these properties in an assessment technique. Data which address the sensitivity and intrusiveness of several assessment procedures are reviewed, and general application guidelines outlined. The development of a standard metric evaluation methodology for refinement of the comparative data base related to both properties is discussed, as are directions for future research.
SENSITIVITY
Workload assessment techniques differ in their sensitivity to variations in primary task loading (O’Donnell & Eggemeier, 1986), and such differences significantly affect the utility of a technique for various applications. Current evidence suggests that sensitivity is a complex property that can be influenced by a number of variables. One such variable is the degree of capacity expenditure associated with task performance. A second variable with the potential to affect the sensitivity of some measures is the locus of the demands placed on individual capacities/resources within the human processing system.
Sensitivity as a Function of Level of Capacity Expenditure

At a general theoretical level, sensitivity can be described in terms of a hypothetical function which relates level of effort/capacity expenditure to the adequacy of primary task performance. Figure 1 depicts a function that consists of two regions which are defined by the relationship of capacity expenditure to a theoretical threshold for unimpaired performance. The first or non-overload region spans those levels of expenditure which do not exceed operator capacity, and is therefore characterized by adequate levels of primary task performance in which both errors and reaction time are relatively low. In this region, the operator has sufficient spare processing capacity to deal with increased levels of demand, and can maintain performance by expending more effort or capacity. Consequently, no direct relationship exists between capacity expenditure and primary task errors or reaction time. Therefore, the increase in capacity expenditure from “A” to “B” noted in Figure 1 will not be reflected by changes in performance levels. In the second or overload region, expenditure levels surpass the capacity of the operator to compensate for increases in demand, and the threshold for unimpaired performance is exceeded. A direct relationship between performance and capacity expenditure is hypothesized in this region, and takes the form of increased reaction time and/or errors with increased demand. Consequently, the increase in capacity expenditure from “C” to “D” will be reflected in performance, even though it is equivalent in magnitude to the previously undetected increase. One important implication of the hypothesized relationship is that while primary task performance measures will be sensitive to differences in capacity expenditure under overload conditions, they can be relatively insensitive to such differences in the non-overload region.
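The two-region relationship can be illustrated with a minimal Python sketch. The chapter specifies only the qualitative shape of the function, so the threshold value, the linear form above the threshold, and the numeric positions assigned to points A through D are all hypothetical.

```python
THRESHOLD = 0.7  # hypothetical threshold for unimpaired performance


def primary_task_decrement(expenditure, threshold=THRESHOLD, slope=1.0):
    """Hypothetical primary task decrement (errors and/or reaction time).

    Zero in the non-overload region; assumed to grow linearly with
    expenditure once the threshold is exceeded (overload region).
    """
    return slope * max(0.0, expenditure - threshold)


# Hypothetical expenditure levels; B - A and D - C are equal in magnitude.
A, B, C, D = 0.2, 0.4, 0.8, 1.0

change_a_to_b = primary_task_decrement(B) - primary_task_decrement(A)
change_c_to_d = primary_task_decrement(D) - primary_task_decrement(C)
```

Under these assumptions `change_a_to_b` is zero while `change_c_to_d` is positive, which mirrors the claim that equal increases in expenditure are visible in primary task performance only beyond the threshold.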
Figure 1. Hypothesized relationship between operator capacity expenditure and primary task performance. (The horizontal axis shows operator information processing capacity expenditure from low to high, with points A, B, C, and D marked; a threshold for unimpaired performance separates the non-overload region from the overload region.)

Workload measurement in this latter region is, therefore, dependent upon alternative techniques which can reflect capacity expenditure differences at levels below the threshold for performance breakdown. Subjective, physiological, and secondary task measures represent alternative assessment procedures which can provide the required capability. Expectations regarding the sensitivity of subjective and physiological techniques are based on the assumption that increased capacity expenditure in either of the noted regions will be accompanied by physiological changes and feelings of exertion or effort that will be reflected in appropriate indices (Johannsen, Moray, Pew, Rasmussen, Sanders, & Wickens, 1979). Secondary task methodology (Knowles, 1963) is based on the expectation that the addition of concurrent secondary task processing demands will be sufficient to shift total capacity usage into the region where performance and expenditure are directly related.

Differences in sensitivity between primary task and alternative measures which are consistent with the noted expectations have been demonstrated in a number of instances (e.g., Bahrick, Noble, & Fitts, 1954; Bell, 1978; Dornic, 1980; Eggemeier, Crabtree, & LaPointe, 1983; Eggemeier, Crabtree, Reid, Zingg, & Shingledecker, 1982; Eggemeier & Stadler, 1984; Schifflet, Linton, & Spicuzza, 1982). Eggemeier et al. (1983), for example, compared the capability of primary task errors and workload ratings obtained from the Subjective Workload Assessment Technique (SWAT) (Reid, Shingledecker, & Eggemeier, 1981; Reid, 1985) to reflect differences in task demand manipulations in a short-term memory update task. Subjects monitored a display and mentally updated the status of several information categories that changed periodically. Categories of information were three letters of the alphabet that were presented in twenty-item sequences, and subjects retained a count of the number of times that each letter occurred. Task demand was manipulated by varying the time interval between the presentation of items, and intervals of 1.0, 2.0, and 3.0 seconds were used. Figure 2 illustrates the effect of the time interval manipulation on both mean SWAT ratings and errors in the memory task. As is clear from Figure 2, SWAT ratings varied substantially with the time interval manipulation, and discriminated the three levels of task difficulty. On the other hand, errors failed to vary systematically with the time manipulation, and demonstrated no significant differences as a function of the demand levels.
Figure 2. Mean subjective workload ratings and mean memory task errors as a function of interstimulus interval in seconds (3.0, 2.0, and 1.0). (Redrawn from Eggemeier, Crabtree, & LaPointe, 1983. Reprinted with permission. Copyright 1983, Human Factors Society, Inc.)

A similar pattern of results was obtained by Eggemeier and Stadler (1984), who evaluated the sensitivity of SWAT ratings and both primary task reaction time and error measures to demand manipulations in a spatial short-term memory task. In this task, histogram patterns which had been memorized were compared with a test pattern to determine if a match existed. Demand was manipulated by varying both the complexity of the histogram patterns and the length of the memory retention interval. Both SWAT ratings and reaction time to the test pattern discriminated the differences in histogram complexity. However, SWAT ratings also varied significantly as a function of retention interval, while reaction time failed to do so. Errors in the memory task were not significantly affected by either the retention interval or complexity manipulations.

These and similar patterns of disagreement or dissociation between primary task and alternative measures can be interpreted within the previously described framework by assuming that demand levels in the noted instances fell within that region of expenditure which affords sufficient spare processing capacity to maintain primary task performance. However, maintenance of performance was achieved at the cost of greater effort/capacity expenditure, and this was reflected in the subjective workload ratings.

The proposed framework also suggests that primary task measures should demonstrate increased sensitivity at those higher levels of capacity expenditure which fall within the region characterized by a monotonic relationship between expenditure and performance. This type of pattern has been reported by Eggemeier et al. (1982), who manipulated both the number of information categories to be retained and the time interval between information status updates in the short-term memory task described above. Primary task error measures were again less sensitive than subjective workload ratings at lower levels of task demand. However, at the highest level of time demand, error measures equaled the sensitivity of the subjective measure, and actually demonstrated greater sensitivity at the highest level of memory load.

Comparable differences in the sensitivity of primary task and secondary task measures of capacity expenditure have also been noted in a number of instances (e.g., Bahrick et al., 1954; Bell, 1978; Dornic, 1980; Schifflet et al., 1982). Schifflet et al. (1982), for instance, reported that a secondary task version of the Sternberg (1966) memory search paradigm discriminated differences in the workload associated with two aircraft display options, even though primary flight task performance was equivalent with both displays. These results are consistent with the previously discussed rationale for secondary task methodology, which is to provide a more sensitive index of primary task workload by shifting total task loading into the region where performance and capacity expenditure are related.

The noted framework and data therefore suggest that alternative techniques can provide greater sensitivity than primary task measures in some instances. Ideally, the framework and supporting data should be extended to examine the relative sensitivity of subjective, physiological, and secondary task measures in the region of their maximum sensitivity. However, such data are quite limited (O’Donnell & Eggemeier, 1986), and factors which influence the relative sensitivity of alternative measures have not yet been fully documented by workload metric research. One such factor that has been identified by recent work is related to the locus of demands placed on different capacities within the human processing system, and work related to this factor is discussed in the next section.
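The rationale for secondary task methodology described in this section can be sketched in Python as follows. The numbers are hypothetical, and the sketch assumes a single undifferentiated capacity, a linear decrement above the threshold, and an operator who protects the primary task so that any overload appears entirely in secondary task performance; none of these assumptions is stated in that form in the text.

```python
CAPACITY_THRESHOLD = 1.0  # hypothetical threshold for unimpaired performance


def decrement(total_expenditure, threshold=CAPACITY_THRESHOLD):
    """Hypothetical performance decrement: zero up to the threshold."""
    return max(0.0, total_expenditure - threshold)


def secondary_task_decrement(primary_demand, secondary_demand):
    """Decrement observed on the secondary task under dual task conditions.

    Assumption: the primary task is protected, so the whole overload
    produced by the combined demand falls on the secondary task.
    """
    return decrement(primary_demand + secondary_demand)


# Two hypothetical display options whose primary task demands both fall in
# the non-overload region, so primary task performance is equivalent.
display_demands = {"display_1": 0.5, "display_2": 0.7}
SECONDARY_DEMAND = 0.5  # hypothetical demand of the added secondary task

primary_only = {name: decrement(d) for name, d in display_demands.items()}
with_secondary = {
    name: secondary_task_decrement(d, SECONDARY_DEMAND)
    for name, d in display_demands.items()
}
```

Here `primary_only` is zero for both displays, while `with_secondary` separates them, because the added demand moves the more demanding display past the threshold.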
Sensitivity as a Function of the Locus of Processing Demands

A theoretical basis for differences in the sensitivity of some workload assessment techniques can be derived from the multiple resources approach to capacity limitations within the human system (Navon & Gopher, 1979; Wickens, 1979, 1980, 1984b). Essentially, this theory holds that the human processing system can be described as consisting of a number of separate capacities, each with a limited capability to process information. According to this theory, it is possible to exhaust the capacity associated with one processing function (e.g., central processing), while maintaining sufficient independent processing capacity to perform other functions (e.g., motor output). Current multiple resources theory (Wickens, 1984b) suggests that separate capacities may be defined on the basis of three principal dimensions: (1) stages of processing (perceptual/central processing vs. motor output); (2) codes of processing and output (spatial/manual vs. verbal/vocal); and (3) modalities of input (visual vs. auditory). An adequate characterization of workload in this approach is dependent upon the capability to specify the pattern of capacity expenditure associated with each of the proposed processing functions.
Present evidence indicates that some techniques may be capable of discriminating the levels of loading imposed on separate capacities. Such techniques are considered diagnostic (Wickens, 1984a; Wickens & Derrick, 1981) in that they are sensitive to some types (e.g., motor output) of capacity expenditure, but exhibit little or no sensitivity to demands placed on other (e.g., central processing) capacities. Other techniques appear to be less diagnostic, and exhibit relatively uniform levels of sensitivity across different types of capacity expenditure. In general, secondary task methodology and some physiological measures (e.g., the P300 component of the evoked cortical response) can be classified as diagnostic, while primary task measures, subjective procedures, and other physiological techniques (e.g., pupil dilation) appear to be less diagnostic and more globally sensitive to capacity expenditure throughout the human processing system (Eggemeier, 1984). Secondary task measures provide a clear example of an assessment technique which can exhibit a very selective pattern of sensitivity to different forms of capacity expenditure. As noted above, the basic assumption of secondary task methodology is that additional processing requirements imposed by the concurrent task will shift total loading into the region of the capacity expenditure-performance function which demonstrates a monotonic relationship between the variables. If the concurrent task draws from the same capacity as the primary task, the assumption of an increase in total processing demand can be met for that capacity. Decrements in concurrent task performance relative to single task performance baselines should result in this instance. However, if a mismatch exists between the capacities required by the two tasks, the addition of concurrent processing demands will not shift capacity-specific expenditure into the more sensitive region.
In this case, no significant differences between single and dual task performance levels may be evident.

Differences in single to dual task decrements that are consistent with the processing functions outlined by multiple resources theory (Wickens, 1984b) have been reported by several investigators (e.g., Stadler & Eggemeier, 1985; Wickens & Kessel, 1980; Wickens, Mountford, & Schreiner, 1981). Stadler and Eggemeier (1985), for instance, investigated levels of dual task performance as a function of overlap in codes of processing as specified in the current theory. Subjects performed a version of the Sternberg (1966) memory search paradigm which required that a letter probe be compared with items in a memory set. This task was considered predominantly verbal in its coding demands, since it required that letters of the alphabet be processed and retained. The memory search task was performed either singly, or during the retention interval of a concurrent memory task that was either predominantly verbal or spatial in its processing demands. The concurrent verbal memory task required that a list of words be retained and matched with a subsequently presented comparison list. The procedure for the concurrent spatial task was identical, except that word lists were replaced by histogram patterns. Figure 3 shows the percentage of correct responses under single and dual task conditions in the memory search task as a function of concurrent memory task type. As is evident, the addition of a concurrent demand to retain words led to decrements in memory search performance relative to single task baselines, while the addition of spatial retention demands was not associated with significant performance impairments. These results can be interpreted within the multiple resources and secondary task frameworks outlined above by assuming that verbal and spatially coded tasks draw upon different processing capacities. Under this assumption, the addition of the word task retention demands to memory search requirements was sufficient to overload verbal processing capacity, while the addition of functionally separate spatial demands failed to result in capacity-specific overload. Consequently, performance decrements resulted in the first case but not in the latter. Wickens and Kessel (1980) have demonstrated similar differences in single to dual task decrements that are consistent with the perceptual/central processing and motor output stages of processing proposed by multiple resources theory.

Figure 3. Percent correct responses in a verbal memory search task as a function of single vs. dual task performance conditions and the type of concurrent task (V/S: verbal/spatial combination; V/V: verbal/verbal combination). (Redrawn from Stadler & Eggemeier, 1985.)
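The capacity-specific interpretation of these dual task results can be sketched in Python. The resource names follow the verbal and spatial codes discussed above, but the demand values, the common capacity limit, and the function name are hypothetical and are chosen only to illustrate the logic.

```python
CAPACITY_LIMIT = 1.0  # hypothetical limit, assumed equal for every resource


def overloaded_resources(*tasks, limit=CAPACITY_LIMIT):
    """Return the resources whose summed demand exceeds the limit.

    Each task is a dict mapping a resource name to the demand it places
    on that resource. Demands on different resources are assumed not to
    interact, as in the multiple resources account.
    """
    totals = {}
    for task in tasks:
        for resource, demand in task.items():
            totals[resource] = totals.get(resource, 0.0) + demand
    return sorted(name for name, total in totals.items() if total > limit)


# Hypothetical demand profiles for the three tasks described in the text.
memory_search = {"verbal": 0.6}
word_retention = {"verbal": 0.6}
histogram_retention = {"spatial": 0.6}

verbal_verbal = overloaded_resources(memory_search, word_retention)
verbal_spatial = overloaded_resources(memory_search, histogram_retention)
```

With these values only the verbal/verbal combination exceeds a capacity limit, so only that combination would be expected to show a dual task decrement.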
These types of dual task results suggest that the assumptions of the secondary task paradigm will be most readily met in those instances involving substantial overlap in processing demands between the primary and secondary tasks. Consequently, the sensitivity of a secondary task can vary as a function of the locus of processing demands in the primary task, and such variations can be diagnostic of primary task loading patterns. Use of a secondary task which emphasizes a particular form of capacity expenditure can therefore permit some specification of the locus of primary task demands.

The results of work reported by Shingledecker, Acton, and Crabtree (1983) very clearly illustrate the diagnostic sensitivity that can be associated with secondary task applications. Shingledecker et al. used a secondary task version of the Michon (1966) interval production task (IPT) in a series of experiments that involved three primary tasks that differed in their information processing demand patterns. The IPT requires the production of a series of regular finger movements by subjects, and can therefore be assumed to impose demands on motor output functions. The primary tasks used in the three experiments included a probability display monitoring task that had been adapted from Chiles, Alluisi, and Adams (1968), a version of the Sternberg (1966) memory search paradigm, and an unstable tracking task that was similar to the critical tracking task of Jex, McDonnell, and Phatak (1966). The display monitoring task required that subjects detect the occurrence of visually presented signals. Demand levels were manipulated by varying both the number of displays that were to be monitored (1 vs. 3 vs. 4) and the discriminability of the signals. Each display included a pointer which moved randomly with respect to a center marker under non-signal conditions. A signal occurred when the pointer movement became biased, such that a disproportionate percentage (i.e., 95%, 85%, or 75%) of moves occurred on one side of the center marker. A 95% bias was more discriminable than an 85% bias, which could be more easily discriminated than a 75% bias. Three levels of perceptual loading were achieved with the following combinations of dials and bias levels: (1) one dial at a 95% bias, (2) three dials at an 85% bias, and (3) four dials at a 75% bias. The procedure in the Sternberg memory search task was similar to that which was described previously, and required that subjects determine if a probe letter was a member of a specified memory set. Different loading levels were achieved by manipulating the size (1 vs. 4 items) of the memory search set.

The tracking task required that subjects control the movement of a visual target with a joystick controller. Demand was manipulated by varying the instability (lambda levels of 2.4, 3.6, and 6.0) of the target element. Manipulations of demand in the display monitoring and memory tasks were therefore designed to principally involve perceptual/central processing functions, while demand variations in the unstable tracking task were predominantly related to motor output loading. The results of the three experiments are illustrated in Figure 4, which shows levels of IPT performance as a function of demand level in each of the primary tasks. The IPT workload score was based on the variability of interval durations, and was derived for individual subjects in each demand condition by subtracting a baseline single task score from the dual task score and dividing by the baseline. Therefore, higher scores are associated with larger decrements in performance relative to single task baselines. As is clear, secondary IPT performance varied systematically with manipulations of tracking task demand. However, IPT performance was not significantly affected by demand variations in either the display monitoring or memory search tasks. These results can be interpreted as indicating that the IPT is sensitive to manipulations of motor output demand, but is relatively insensitive to such variations in perceptual/central processing demand. Similar patterns of differential sensitivity that can be related to the stages of processing dimension have been reported for the P300 component of the evoked cortical response (Isreal, Chesney, Wickens, & Donchin, 1980; Isreal, Wickens, Chesney, & Donchin, 1980). These patterns of specific sensitivity suggest that although selected secondary task and physiological metrics can reflect levels of expenditure within particular capacities of the human system, they can be relatively insensitive to other forms of capacity expenditure.
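The IPT workload score described above is a baseline-normalized difference, which can be written as a short Python sketch. The text states only that the score was based on the variability of interval durations, so the use of the population standard deviation here is an assumption, as are the function names and the sample intervals.

```python
from statistics import pstdev


def interval_variability(intervals):
    """Variability of produced intervals.

    Assumption: population standard deviation stands in for the
    unspecified variability measure used in the original study.
    """
    return pstdev(intervals)


def ipt_workload_score(baseline_score, dual_task_score):
    """(dual task score - single task baseline) / baseline."""
    if baseline_score <= 0.0:
        raise ValueError("baseline_score must be positive")
    return (dual_task_score - baseline_score) / baseline_score


# Hypothetical interval durations in seconds for one subject.
single_task_intervals = [0.50, 0.52, 0.48, 0.50]
dual_task_intervals = [0.50, 0.60, 0.40, 0.50]

score = ipt_workload_score(
    interval_variability(single_task_intervals),
    interval_variability(dual_task_intervals),
)
```

A score of zero would indicate no change from the single task baseline, and larger positive scores indicate proportionally more variable interval production under dual task conditions.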
Figure 4. Performance in a secondary interval production task as a function of demand levels in three primary tasks emphasizing different processing functions: psychomotor demand (tracking; lambda levels of 2.4, 3.6, and 6.0), central processing demand (memory search; set sizes of 1 and 4), and perceptual demand (monitoring; displays/% bias of 1/95, 3/85, and 4/75). (Redrawn from Shingledecker, Acton, & Crabtree, 1983. Reprinted with permission. Copyright 1983, Society of Automotive Engineers, Inc.)

Such diagnostic measures therefore provide a workload index for selected processing functions, and cannot be assumed to reflect general levels of loading throughout the processing system. There are data, however, which indicate that primary task measures and some subjective and physiological procedures may be generally sensitive to capacity expenditure anywhere within the human system. These techniques may, therefore, provide more global measures of load. Current data which support the global sensitivity of subjective measures are primarily derived from programs that were designed to systematically evaluate the sensitivity of a particular subjective metric (e.g., Hart & Staveland, in press; Reid, 1985; Wierwille & Casali, 1983a). The SWAT development program (Reid, 1985), for example, has included sensitivity evaluations in laboratory, simulator, and field-based environments. A number of the laboratory studies employed tasks designed to place heaviest processing demands on several of the capacities identified by multiple resources theory (Wickens, 1984b), and SWAT has demonstrated its sensitivity across the range of processing functions represented in these experiments. A subset of the tasks and associated processing functions to which SWAT has demonstrated its sensitivity include: visual display monitoring (Eggemeier & Amell, 1987; Notestine, 1984) which was designed to heavily load perceptual input capacity; verbal (Eggemeier et al., 1982; Eggemeier et al., 1983) and spatial short-term memory (Eggemeier & Stadler, 1984) which primarily loaded two major central processing coding dimensions; and unstable tracking (Eggemeier & Amell, 1987; Reid et al., 1981) which exerted heavy demands on motor output capacity. Further references to work which supports the sensitivity of SWAT to various forms of capacity expenditure can be found in Reid (1985).

Similar patterns of general sensitivity have been reported by Wierwille and Casali (1983a) using a modified version of the Cooper-Harper (1969) aircraft handling characteristics scale, and by Hart and Staveland (in press) with a multidimensional workload rating technique developed by the NASA-Ames Research Center. The modified Cooper-Harper (MCH) scale requires direct estimates of workload and effort expenditure by subjects, and proved sensitive to a number of different demand manipulations in a series of flight simulator experiments (Wierwille & Casali, 1983a). Likewise, workload ratings derived from application of the NASA multidimensional procedure demonstrated sensitivity in a variety of laboratory and simulator studies that were conducted as part of the program to develop the technique (Hart & Staveland, in press). The pattern of sensitivity which has emerged from systematic work with rating scale procedures such as SWAT, the MCH scale, and the NASA multidimensional technique suggests that subjective measures are capable of reflecting variations in effort expenditure across a variety of processing functions, and indicates that these rating scale approaches should be considered global rather than diagnostic in their sensitivity.
Although they exhibit high degrees of sensitivity only at levels of capacity expenditure that exceed the threshold for unimpaired performance, primary task measures appear to represent global indices of workload under such conditions. Theoretically, an overload of any capacity (e.g., central processing, motor output) should lead to performance degradations, since successful performance is dependent on the variety of capacities required by the task. Primary task measures have demonstrated the anticipated sensitivity to a variety of manipulations that would be expected to heavily load perceptual, central processing, and motor output functions within the human system (O'Donnell & Eggemeier, 1986). As
Workload Assessment Techniques
a consequence, it appears that such measures should be considered global in their sensitivity. Likewise, those physiological measures with the potential to index levels of activation throughout the processing system could be expected to exhibit global rather than diagnostic sensitivity. Beatty (1982), for example, has reviewed the literature which supports the capability of a pupil dilation measure to reflect levels of loading across a range of processing functions. Multiple resources theory and the noted data therefore provide a framework which supports a distinction between global and diagnostic metrics. It is probable that results derived from global and diagnostic measures will exhibit some dissociation in those situations that involve a mismatch between primary task demand and the sensitivity area of a diagnostic metric. Therefore, the distinction suggests some caution in interpretation of capacity expenditure estimates derived from application of the two types of measurement procedures.
INTRUSIVENESS
Intrusiveness (Eggemeier, 1984; Shingledecker, 1983; Wickens, 1984a; Wierwille & Williges, 1978), the tendency to cause unintended degradations in ongoing primary task performance, can pose potentially serious problems in application of a workload measurement technique. Such problems are primarily related to the interpretation of results obtained with an assessment procedure, and with application of techniques to operational environments. Significant intrusiveness can produce difficulties in interpreting capacity expenditure estimates derived from an assessment procedure. A technique whose use leads to primary task performance decrements would not be expected to accurately reflect the expenditure levels that would be associated with unimpaired performance. The tendency to intrude on primary task performance can also lead to problems in application of a measurement procedure. Levels of intrusiveness which could be accepted in the laboratory might not be tolerable in operational environments where any compromises in system safety would be unacceptable. Although systematic evidence regarding the intrusion associated with individual assessment techniques is not extensive, it appears likely that intrusion does not represent a static property of a technique, but may vary as a function of factors such as the type and level of primary task loading. One of the few systematic efforts to compare intrusiveness among techniques (Casali & Wierwille, 1983, 1984; Wierwille & Casali, 1983b; Wierwille & Connor, 1983; Wierwille, Rahimi, & Casali, 1985), for example, demonstrated different patterns of intrusion with a secondary time estimation task in a series of investigations that involved different types (e.g., central processing, motor output) and levels of primary task loading. The potential for variations in intrusiveness as a function of primary task type is consistent with the multiple resources approach to capacity limitations discussed previously.
If some forms of intrusion represent the re-allocation of primary task capacity/resources to information processing requirements that are associated with a measurement technique, then levels of primary task decrement should vary as a function of the degree of overlap in the capacities demanded by the primary task and the assessment procedure. The differences in secondary time estimation intrusiveness reported by Wierwille and Casali (1983b) can be viewed as at least partially related to such overlap if it is assumed that the time estimation task drew heavily on central processing capacities. Time estimation interfered significantly with a flight simulator navigation task which was designed to load central processing capacities, but not with other flight simulator tasks that emphasized perceptual, motor output, or auditory monitoring functions. Within this framework, intrusiveness is similar to sensitivity, in that both properties can vary to some extent with the overlap in the capacities required by the primary task and the measurement procedure. It is important to note, of course, that such capacity-specific interference does not represent the only potential cause of primary task degradation that can be associated with use of an assessment procedure. To the extent that use of a measurement technique is occasioned by distraction or other
F.T. Eggemeier
general interference with the primary task, intrusion that is not attributable to specific capacities will be observed. However, when these general factors are equivalent, the framework predicts relatively more interference in instances of capacity overlap than in those situations where minimal overlap exists. Intrusion can also be related to the amount of capacity expenditure associated with the combination of the primary task and the assessment procedure. Re-allocation of resources to the measurement technique should be more obvious under high as opposed to low levels of loading. For example, if a subjective rating scale requires the use of central processing capacity to judge and retain the amount of effort experienced during performance of a primary task, this additional capacity expenditure should be more obvious if primary task levels are already near the threshold for degraded performance outlined in Figure 1. The foregoing discussion is based on the assumption that the degree of intrusion can be significantly affected by the amount and pattern of operator capacity expenditure associated with use of a measurement technique. In this view, secondary task methodology should be the most intrusive of the major categories of techniques, since the capacity expenditure associated with its use should be substantial and would overlap temporally with the demands of the primary task. In fact, secondary task methodology has the potential to suffer not only from such capacity interference, but also from so-called peripheral interference (Wickens, 1984b), which stems from physical input or output constraints (e.g., the inability to generate simultaneous responses to two tasks with the same hand) within the human system. Subjective techniques, whose demands are typically imposed after the completion of primary task performance, and physiological techniques, which would usually minimize processing demands, should demonstrate lower levels of intrusion.
Data derived from individual applications of each class of technique are generally consistent with these expectations.
Intrusion With Secondary Task Techniques
First, it is evident that there has been a high incidence of intrusion in laboratory applications of secondary task methodology (O’Donnell & Eggemeier, 1986; Ogden, Levine, & Eisner, 1979; Rolfe, 1971; Wierwille & Williges, 1978). The most common application of the methodology is the subsidiary task paradigm (Knowles, 1963), which requires that subjects maintain concurrent primary task performance at single task baseline levels. The intrusion problem in this paradigm has led to application of several techniques (Casali & Wierwille, 1983, 1984; Hart, 1978; Kelly & Wargo, 1967; Shingledecker, 1980a; 1983) which are designed to protect primary task performance. One such approach (Casali & Wierwille, 1983, 1984; Hart, 1978; Shingledecker, 1980a) has involved investigating the utility of secondary tasks that minimize either perceptual input or response output requirements. This approach attempts to limit or control the degree of peripheral interference by minimizing the input and/or output requirements of a secondary task. The IPT (Michon, 1966; Shingledecker, 1980a; Shingledecker et al., 1983), which was discussed previously, represents an approach which limits the perceptual input requirements of the secondary task. Because it requires a continuous series of regular motor responses which are independent of external cues, this task minimizes the potential for peripheral interference problems associated with stimulus input. As noted above, Shingledecker et al. (1983) have demonstrated the utility of this task in indexing the motor output load imposed by a primary task. A second approach to protecting primary task performance, which was designed to limit intrusion by controlling allocation of processing resources to the secondary task, is the embedded task procedure (Shingledecker, 1980a; Shingledecker, Crabtree, Simons, Courtright, & O’Donnell, 1980).
This approach uses a task from normal system operational procedures as the secondary task, and is applicable to simulation and operational environments as well as to the laboratory. The technique is designed to minimize intrusion by identifying secondary tasks from system operation functions with a lower priority than primary tasks, thereby controlling the capacity/resource allocation policy of the subject. Use of normal system tasks affords the additional advantages of minimizing secondary task instrumentation requirements, and increasing the likelihood of operator acceptance of the measurement procedure.
Shingledecker et al. (1980) investigated the feasibility of using radio communications as an embedded
secondary task. Specifications of input messages and response requirements from sample aircraft communications tasks were obtained through interviews with pilots. The tasks were scaled to derive estimates of the loading associated with each so that quantified levels of subsidiary task demand could be produced. In order to assess the sensitivity of the scaled tasks, Shingledecker and Crabtree (1982) conducted an experiment in a laboratory analog of a flight simulator. The secondary communications tasks were performed both singly and in combination with a primary tracking task that was intended to represent flight control activities of varying degrees of difficulty. Aircraft communications panels were installed in a fixed-based cockpit with a controller for the primary tracking task. Performance of several communications tasks varied with the presence or absence of the tracking task, and as a function of tracking task difficulty. Results of the study therefore supported the use of some embedded radio communications tasks to assess workload. Additional research is required with operational pilots to further evaluate the sensitivity of these tasks, and to investigate the degree of intrusion that would be associated with them in a high fidelity flight simulator.
Intrusion With Subjective and Physiological Techniques
As predicted by the framework outlined above, the reported incidence of intrusion with subjective and physiological techniques has been minimal (O'Donnell & Eggemeier, 1986). Current evidence regarding subjective techniques (Casali & Wierwille, 1983, 1984; Eggemeier & Amell, 1987; Wierwille & Connor, 1983; Wierwille et al., 1985) indicates that when applied after the completion of primary task performance, none of the rating scales employed in the experiments conducted to date resulted in significant levels of intrusion. Eggemeier and Amell (1987), for example, performed two experiments in which the SWAT procedure was used to gather subjective estimates of the workload imposed by several conditions in an unstable tracking task and in a display monitoring task. The first experiment required that subjects perform an unstable tracking task similar to that used by Jex et al. (1966). Several difficulty levels were achieved by varying the instability (lambda levels of 1, 2, and 3) of the target element. SWAT ratings were completed by subjects on one-half of the trials, but were not required on the remaining trials. Root mean square (RMS) tracking error and the number of times that subjects lost control of the target element served as the measures of tracking performance. The results are illustrated in Figure 5, which shows
Figure 5. Root mean square tracking error and control losses as a function of task demand and workload rating condition. (Redrawn from Eggemeier & Amell, 1987.) [Axis: tracking task instability, lambda level 1, 2, 3; curves: workload rating, no workload rating.]
RMS tracking error and the mean number of control losses as a function of task demand under the two rating conditions. As is clear from Figure 5, the requirement to provide SWAT ratings had no significant effect on either RMS error or the mean number of times that subjects lost control of the target element. Subjective workload estimates obtained from the SWAT procedure on those trials which required ratings increased systematically with increases in task demand. The second experiment followed an identical procedure, except that a display monitoring task replaced the tracking task. The display monitoring task was similar to the previously described variant of the Chiles et al. (1968) procedure. Demand was manipulated by varying the number of displays (1, 2, or 3) to be monitored for the occurrence of signals. The requirement to provide SWAT ratings failed to affect any of the performance indices that were recorded, including mean time to detect signals, the number of missed signals, and the number of false alarms. The SWAT ratings did, however, discriminate the three levels of loading in the monitoring task. The pattern of results from these experiments is therefore consistent with the expectation that a subjective opinion measure completed subsequent to primary task performance should not be associated with substantial levels of intrusion. It should be noted, however, that the results apply only to the perceptual and motor functions emphasized in the display monitoring and tracking tasks, respectively. It is possible, for instance, that intrusion would occur in a task emphasizing memory functions, since subjective techniques require that judgments regarding experienced levels of effort or capacity expenditure be retained until they are reported at the completion of task performance. Work is currently underway to evaluate this possibility.
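The two tracking performance measures named in this experiment can be illustrated with a minimal Python sketch. The sample values, the `limit` parameter, and the rule that a control loss is an excursion of the error beyond a fixed bound are assumptions made for illustration only; the source does not specify how control losses were scored.

```python
import math

def rms_error(errors):
    """Root mean square of a sequence of tracking error samples."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def count_control_losses(errors, limit):
    """Count excursions beyond +/- limit (assumed scoring rule).

    Consecutive out-of-bounds samples are treated as a single loss.
    """
    losses = 0
    was_out = False
    for e in errors:
        is_out = abs(e) > limit
        if is_out and not was_out:
            losses += 1
        was_out = is_out
    return losses

# Hypothetical error samples from one trial (arbitrary units).
trial = [3.0, -4.0, 0.0, 0.0]
print(rms_error(trial))  # 2.5
print(count_control_losses([0.0, 5.0, 6.0, 0.0, -7.0], limit=4.0))  # 2
```

Computing both measures separately for rated and unrated trials at each lambda level would give the kind of comparison summarized in Figure 5.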
Finally, although the Eggemeier and Amell (1987) results were obtained with the SWAT procedure, the same pattern of nonintrusiveness has been reported with the MCH workload rating scale in the previously noted flight simulator experiments (Casali & Wierwille, 1983, 1984; Wierwille et al., 1985). Current information regarding physiological techniques essentially parallels that which is available for subjective assessment procedures. Physiological procedures typically do not require expenditure of operator processing capacity, and for the most part, appear to involve minimal risk of intrusion. Any potential for intrusion from application of physiological techniques would appear to come from possible operator distraction or discomfort that might be associated with recording equipment, but present evidence suggests that this has not represented a significant problem in applications to date (e.g., Wierwille & Casali, 1983b).
IMPLICATIONS OF PROPERTIES
The theoretical positions and data outlined above indicate that sensitivity and intrusiveness represent complex properties that can be affected by several factors. Techniques differ with respect to both properties, and these differences suggest that no individual metric is capable of meeting the range of sensitivity and intrusion requirements that can be associated with various workload measurement applications. The noted sensitivity and intrusion patterns, when coupled with instrumentation requirements, can be used to guide the selection of a metric for specific applications (Eggemeier, 1984). Primary task measures should be employed, for instance, when the objective is to determine the adequacy of performance that can be expected with a particular design option. Such measures do, however, require the capability to acquire and record time and error information, and have the potential disadvantage of not discriminating capacity expenditure differences that are below the threshold for unimpaired performance. Consequently, a problem requiring a more sensitive workload evaluation in an operational environment that necessitates minimal intrusion and precludes performance measurement might be more appropriately addressed by subjective techniques. These techniques could meet the objectives and constraints of the noted problem, since they appear to provide global sensitivity, incur little likelihood of intrusion, and also minimize instrumentation requirements. Current data suggest, however, that alternatives to subjective measures would be required for an evaluation conducted to specify the locus of an overload which had been identified with a global metric. This type of application would call for use of more diagnostic secondary task or physiological techniques. The potential capability of such measures to identify the particular processing function or functions (e.g., perceptual, motor) which are
most heavily loaded can be useful in specifying the type of design modification that might alleviate the overload. Perceptual overloads, for instance, might suggest reductions in the information content of displays, while high motor output levels would indicate the possible need for modified controls. In many instances, such diagnostic work could be conducted in a simulator or laboratory environment, facilitating the use of physiological recording equipment, and minimizing the practical consequences of any secondary task intrusion. Considering the variety of objectives and constraints that can be associated with application of workload metrics, it is clear that a comprehensive workload assessment methodology will require the complementary use of several measurement procedures. In fact, the objectives of a particular problem will frequently lead to application of more than one type of technique. It would be typical, for instance, to use primary task measures and one or more additional metrics in an evaluation of alternative designs or operating procedures. Since specification of the operator performance levels that are associated with a design or procedural option is central to most evaluations, primary task measures would be applied to gather such information. Depending upon the objectives and practical constraints of an evaluation, selected subjective, secondary task, or physiological techniques would be employed to provide additional capacity expenditure information. The capacity/effort expenditure data derived from these techniques represent very important supplements to primary task information, since equivalent levels of primary task performance do not provide a strong basis to infer that the workload imposed by design alternatives or tasks is equivalent. The global versus diagnostic capability afforded by the potentially more sensitive alternative measures also suggests complementary application of techniques which differ on this dimension.
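The selection guidance in this section can be read as a small set of rules. The sketch below encodes that reading in Python; the function name, its four boolean parameters, and the rule ordering are illustrative assumptions, not a procedure given in the source, and a real selection would weigh instrumentation and safety constraints in more detail.

```python
def suggest_metrics(need_performance_adequacy, performance_measurable,
                    minimal_intrusion_required, need_locus_of_overload):
    """Return candidate technique categories for an evaluation (illustrative)."""
    suggestions = []
    # Primary task measures address adequacy of performance, but need
    # the capability to record time and error information.
    if need_performance_adequacy and performance_measurable:
        suggestions.append("primary task")
    # Subjective techniques: global sensitivity, little intrusion,
    # minimal instrumentation.
    if minimal_intrusion_required or not performance_measurable:
        suggestions.append("subjective")
    # Locating an overload calls for more diagnostic techniques.
    if need_locus_of_overload:
        suggestions.extend(["secondary task", "physiological"])
    return suggestions

# An operational setting that precludes performance measurement.
print(suggest_metrics(False, False, True, False))
```

The function can return several categories at once, which matches the point that most evaluations combine primary task measures with one or more additional metrics.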
Globally sensitive techniques might be initially applied, for example, to determine if high levels of loading exist anywhere within a particular design or procedural option. This global evaluation could be followed by use of more diagnostic techniques to pinpoint the locus of any high levels of loading identified in the overall screening. In addition to differences in the objectives to be satisfied by a measurement technique, methodological considerations can also lead to concurrent application of multiple techniques. Proper interpretation of secondary task results, for example, requires measurement of primary task performance under both single task and dual task conditions so that the degree of any intrusion can be assessed (O’Donnell & Eggemeier, 1986). Although current data provide the basis to evaluate the utility of measurement techniques at the general levels that have been noted, further refinement of selection and application guidelines requires more extensive comparative information on the sensitivity and intrusion properties of individual techniques. As indicated above and in several reviews of the workload measurement literature (O’Donnell & Eggemeier, 1986; Wierwille & Casali, 1983b; Wierwille & Williges, 1978), the data base comparing individual techniques within major categories along these dimensions is quite limited. Available data suggest that differences exist between techniques within some categories, but not in others. Current information comparing alternative subjective techniques (Vidulich & Tsang, 1985; Wierwille & Casali, 1983b), for example, indicates that a high degree of correspondence has been obtained under the conditions that have been evaluated. However, more extensive work is required before firm conclusions can be drawn regarding the degree of comparability among rating scale techniques.
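The single task versus dual task comparison mentioned above can be sketched as a simple difference in mean primary task error. The function name and the scores are hypothetical, and a real analysis would test the difference for statistical reliability rather than inspect the raw value.

```python
from statistics import mean

def intrusion_decrement(single_task_errors, dual_task_errors):
    """Mean increase in primary task error when the secondary task is added.

    A value near zero suggests little intrusion; a large positive value
    suggests that the secondary task degraded primary task performance.
    """
    return mean(dual_task_errors) - mean(single_task_errors)

# Hypothetical primary task error scores for the same subjects.
single = [10.0, 12.0, 14.0]
dual = [15.0, 17.0, 19.0]
print(intrusion_decrement(single, dual))  # 5.0
```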
In contrast to the results with subjective techniques, current secondary task data demonstrate some differences in sensitivity between techniques (e.g., Wetherell, 1981; Wierwille & Casali, 1983b). Similar patterns of differential sensitivity have been obtained with some physiological metrics (Wierwille & Casali, 1983b; Wilson & Heinrich, 1987). These differences emphasize the need for programmatic research to investigate the sensitivity of individual measures from each of these categories, and suggest that batteries which include a number of both secondary task and physiological techniques might be required to meet the sensitivity requirements of various applications (Eggemeier, 1981; Knowles, 1963; O’Donnell, 1983; Shingledecker, 1983). Programmatic sensitivity and intrusion research at both the individual technique and category levels requires a standard workload evaluation methodology (Acton, Crabtree, & Shingledecker, 1983; Eggemeier & Reid, 1986; Shingledecker et al., 1983) which will permit comparison of these properties across techniques. The next section describes several elements that are necessary in such a methodology, and also reviews the development of a standardized battery of primary loading tasks which represents the central feature in the recommended methodology.
WORKLOAD METRIC EVALUATION METHODOLOGY
In order to refine current guidelines for selection and application of workload assessment techniques, systematic research must be conducted to specify the relative sensitivity and intrusiveness that are associated with individual techniques. Without such data, neither a standard set of assessment techniques nor the required guidelines can be developed. The key elements in a methodology designed to permit comparisons of properties among techniques include standardized testing procedures and a standard set of primary loading tasks which can provide a uniform basis for metric evaluation. The inability to draw detailed comparative data from the existing literature stems largely from the fact that when individual metrics have been applied to evaluate workload in more than one setting, there have typically been variations in the testing procedures, primary tasks, or levels of loading across studies. Therefore, apparent differences in the sensitivity and intrusiveness between techniques cannot be properly interpreted. Since it is likely that both sensitivity and intrusion will vary as a function of the locus and level of primary task demand, development of an adequate comparative data base requires that these properties be evaluated across a range of information processing functions and loading levels. A standard set of primary loading tasks with known demand levels on each of several processing functions therefore represents an essential component of a workload metric evaluation methodology. Given such a battery, loading levels could be manipulated in individual tasks that emphasize particular processing functions, and the capability of workload metrics to reflect these manipulations assessed. The pattern of sensitivity to the processing functions represented in the battery would provide evidence of the global versus diagnostic nature of a metric, and would specify areas of maximum sensitivity for diagnostic measures.
The potential for intrusion as a function of type and level of processing demand could also be evaluated in such an approach. The Criterion Task Set (CTS) (Shingledecker, 1984; Shingledecker et al., 1983; Shingledecker, Crabtree, & Acton, 1982) is a battery of primary tasks that was developed to provide the required capabilities for comparative evaluation of workload assessment techniques. The original or baseline version of the battery has been instrumented on a microcomputer system (Acton & Crabtree, 1985), and a number of initial applications have been completed. The following sections describe the battery and its development in more detail, and discuss its application to metric evaluation and other performance assessment areas.
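The idea that a pattern of sensitivity across processing functions reveals whether a metric is global or diagnostic can be sketched as follows. The three stage labels follow the stages dimension discussed in this chapter; the boolean sensitivity flags and the classification rule are simplifying assumptions for illustration, since real evidence would come from graded statistical results across battery tasks.

```python
STAGES = ("perceptual input", "central processing", "motor output")

def classify_metric(sensitive_to):
    """Classify a metric from a dict mapping stage name -> bool (illustrative)."""
    covered = [s for s in STAGES if sensitive_to.get(s, False)]
    if len(covered) == len(STAGES):
        return "global"
    if covered:
        return "diagnostic: " + ", ".join(covered)
    return "insensitive"

# A rating scale sensitive to manipulations at every stage.
print(classify_metric({s: True for s in STAGES}))
# A secondary task sensitive mainly to motor output load.
print(classify_metric({"motor output": True}))
```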
The Criterion Task Set
The baseline version of the CTS (Shingledecker, 1984) included nine primary loading tasks intended to represent a range of human information processing functions involved in performance of complex tasks. The current battery (e.g., Amell, Eggemeier, & Acton, 1987) includes some modifications to the original versions of the same nine tasks. Choice of tasks for the CTS was guided by a model/framework of the human information processing system (Shingledecker, 1984) that had been derived from theoretical positions regarding human processing functions and limits. In developing the model, emphasis was placed on multiple resources approaches (Navon & Gopher, 1979; Wickens, 1980; 1984b) to processing functions. Figure 6 is adapted from Shingledecker (1984), and depicts the CTS processing framework. As illustrated, three major dimensions of information processing have been incorporated into the model. These include stages of processing, modalities and codes of processing, and functions of central processing. A number of individual processing functions are identified within each dimension. The stages dimension includes perceptual input, central processing, and motor output functions. Within the modality/codes dimension, visual input is distinguished from auditory input, manual output from verbal output, and verbal/symbolic processing from spatial processing. Finally, the central processing dimension differentiates working memory as the locus of central activity from three processing functions: (1) information manipulation or transformation (e.g., pattern analysis, mathematical computation); (2) reasoning activities, which center on extraction of relational rules from information (e.g., logical analysis, problem solving); and (3) planning and scheduling activities involving multi-attribute decision analyses (e.g.,
Figure 6. A descriptive model/framework of human information processing functions and resources. (Adapted from Shingledecker, 1984.) [Diagram title: Criterion Task Set processing function/resource framework. Recoverable labels: stage/structure; mode/code (auditory, symbolic); activity/function (encoding, storage, information manipulation, reasoning, planning and scheduling).]
system supervision). Each task in the battery was chosen to place its heaviest demands on one of the processing functions of the model. Table 1 is adapted from Shingledecker (1984), and lists the tasks and associated processing functions that are included in the current battery.
Table 1 CTS TASKS AND ASSOCIATED PROCESSING FUNCTIONS
Task                          Processing Function
Visual Display Monitoring     Visual Perceptual Input
Continuous Recognition        Working Memory Encoding/Storage
Memory Search                 Working Memory Storage/Retrieval
Linguistic Processing         Symbolic Information Manipulation
Mathematical Processing       Symbolic Information Manipulation
Spatial Processing            Spatial Information Manipulation
Grammatical Reasoning         Reasoning
Unstable Tracking             Manual Response Speed/Accuracy
Interval Production           Manual Response Timing
(Adapted from Shingledecker, 1984.)
Parametric evaluations have been conducted (Amell et al., 1987; Eggemeier & Amell, 1986; Shingledecker, 1984) with each of the tasks in the battery to determine the amount of training required to attain stable
performance levels and to establish standard task loading levels. Stable performance levels were considered a prerequisite to use of the tasks to evaluate the sensitivity and intrusiveness of workload measures. Likewise, standard levels of loading were essential to comparisons between metrics, since there is reason to expect that both sensitivity and intrusiveness can vary as a function of primary task demand levels. In these evaluation experiments, loading parameters (e.g., size of the memory search set; number of displays to be monitored) appropriate for each of the tasks were manipulated. Analyses were conducted on both speed and accuracy measures to select three loading levels that were associated with reliably different levels of performance on each task that was evaluated. Eggemeier and Amell (1986), for example, evaluated a CTS version of the probability display monitoring task (Chiles et al., 1968) that was discussed previously. An initial parametric study was conducted to examine the effects of variations in the number of displays and discriminability of signals on both reaction time and errors. The results of this study indicated that reliably different levels of performance could be obtained by manipulating the number of displays to be monitored (1, 2, or 3) within the condition of highest signal discriminability (95% bias). A subsequent validation study was conducted to verify the effectiveness of this manipulation, and to specify the amount of training that would be required to reach stable levels of performance on this variant of the task. Figure 7 is drawn from the validation experiment (Eggemeier & Amell, 1986), and illustrates mean reaction time and the mean percentage
Figure 7. Mean reaction time and mean percentage of missed signals as a function of the number of displays to be monitored. (Redrawn from Eggemeier & Amell, 1986.) [Axis: number of displays to be monitored, 1, 2, 3.]
of missed signals as a function of the number of displays to be monitored. As is clear from Figure 7, the mean reaction time to signals varied systematically with increases in the number of displays, as did the percentage of missed signals. The differences in reaction time between all three conditions were reliable, and the differences in missed signals between the lowest and highest display conditions were significant. On the basis of these results, standard loading levels of one, two, and three displays were established for the current version of the CTS display monitoring task.
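The two performance indices plotted in Figure 7 can be computed per loading level as in the minimal Python sketch below. The trial format (a display count paired with a reaction time, or `None` for a missed signal) and the data values are hypothetical; the reliability tests used to confirm that levels differ are not shown.

```python
from statistics import mean

def summarize(trials):
    """Summarize trials given as (n_displays, reaction_time or None if missed)."""
    summary = {}
    for level in sorted({n for n, _ in trials}):
        rts = [rt for n, rt in trials if n == level]
        hits = [rt for rt in rts if rt is not None]
        summary[level] = {
            "mean_rt": mean(hits) if hits else None,
            "pct_missed": 100.0 * (len(rts) - len(hits)) / len(rts),
        }
    return summary

# Hypothetical signal trials at one, two, and three displays.
trials = [(1, 1.0), (1, 2.0), (2, 2.0), (2, None),
          (3, 4.0), (3, None), (3, None), (3, None)]
for level, stats in summarize(trials).items():
    print(level, stats)
```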
Similar results (Acton et al., 1983; Amell et al., 1987; Shingledecker, 1984; Shingledecker et al., 1982) have permitted specification of three loading levels for seven of the remaining eight CTS tasks. As currently configured, the IPT does not incorporate a difficulty manipulation.
Applications of the CTS Battery
Elements from the CTS have been employed to investigate properties of several workload measurement techniques (e.g., Eggemeier & Amell, 1987; Potter & Acton, 1985; Shingledecker et al., 1983; Wilson & Heinrich, 1987). The previously cited work on IPT sensitivity (Shingledecker et al., 1983), for example, used variants of elements from the baseline version of the battery as primary loading tasks.
Workload Assessment Techniques
Likewise, the sensitivity and intrusion analyses of the SWAT technique (Eggemeier & Amell, 1987) that were referenced above used the current versions of the CTS unstable tracking and display monitoring tasks to provide primary task loading of motor output and perceptual input functions, respectively. Potter and Acton (1985) recently investigated the sensitivity of SWAT to demand manipulations in the CTS continuous recognition task, and the technique proved capable of reflecting demand manipulations in this task. Wilson and Heinrich (1987) used the CTS display monitoring and mathematical processing tasks to investigate the sensitivity of the SWAT technique and physiological workload measures derived from heart rate and evoked cortical response indices. SWAT proved sensitive to demand manipulations in each task, while differential patterns of sensitivity were obtained with the heart rate and cortical response measures. This type of result is consistent with the previously outlined framework which suggests that subjective techniques represent global measures of loading, while other techniques may exhibit more restricted patterns of diagnostic sensitivity. Use of the CTS in these types of evaluations provides a basis to generate systematic sensitivity and intrusion patterns for individual techniques, and can also provide the capability to build a data base comparing classes of assessment techniques on relevant properties.

Although application to workload metric evaluation research constitutes a principal use of the CTS, the battery can also be applied to assess the effects of a variety of stressors (e.g., extreme environmental conditions, drugs, fatigue) on operator performance. Evaluations of stress effects typically require a range of loading levels to properly assess potential impacts on performance, since such effects are sometimes detected only at high levels of task demand.
While it is not possible to ensure that the range of task demand will be sufficient to detect interactive effects between demand levels and stressors, the multiple loading levels incorporated into CTS tasks increase the likelihood of such sensitivity. Likewise, the variety of processing functions represented in the battery increases its potential sensitivity in such applications. It is quite possible, for instance, that a particular stressor might significantly affect one information processing function (e.g., motor output, working memory), while leaving other functions unimpaired. The choice of several tasks from the CTS to represent a range of processing functions for initial evaluation of a stressor can increase the likelihood of detecting any effects which are present, thereby increasing the sensitivity of the evaluation. Finally, the capability to detect the potential effects of any given variable on performance can also be facilitated by the stable levels of primary task performance produced by use of the training procedures that have been specified for each of the CTS tasks. Schlegel, Gilliland, and Schlegel (1986) have reported an initial application of the CTS to evaluate the effects of sleep loss and noise stressors on performance. The noise levels employed in the experiment had no reliable effect on performance of tasks from within the battery. However, sleep loss did significantly impair response times in the central processing tasks, and also degraded both interval production capability and certain levels of tracking performance. The Schlegel et al. (1986) experiment demonstrates the use of the CTS to compare the effects of different types and levels of stressors on performance across a range of processing functions, and illustrates the pattern of stress sensitivity that can result from application of the battery as a primary task assessment device.
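The point that a stressor effect may appear only at high task demand can be made concrete with a small arithmetic sketch. The response times below are invented and do not come from Schlegel et al. (1986); the function names `stressor_effect_by_level` and `interaction_contrast` are hypothetical, and a real evaluation would test the interaction statistically instead of inspecting raw differences.

```python
# Hypothetical mean response times (ms) at three loading levels.
# Invented for illustration only; not data from any cited study.
baseline = {"low": 450.0, "medium": 520.0, "high": 610.0}
sleep_loss = {"low": 455.0, "medium": 535.0, "high": 700.0}


def stressor_effect_by_level(control, stressed):
    """Return the stressed-minus-control difference at each loading level."""
    return {level: stressed[level] - control[level] for level in control}


def interaction_contrast(effects, low="low", high="high"):
    """Return how much larger the stressor effect is at high than at low demand."""
    return effects[high] - effects[low]


effects = stressor_effect_by_level(baseline, sleep_loss)
contrast = interaction_contrast(effects)
print(effects, contrast)
```

With only the low loading level available, the stressor effect in this invented data set would be a few milliseconds and easily missed; the contrast between levels is what multiple loading levels make observable.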
SUMMARY AND CONCLUSIONS

Sensitivity and intrusiveness are important properties that significantly affect the utility of workload assessment techniques, and current theory and data indicate that a number of variables can affect these properties in an assessment procedure. Present sensitivity and intrusiveness data support a number of general application guidelines for metrics, but a more advanced selection and application methodology will require further evaluation and refinement of these guidelines. Current theoretical frameworks which propose factors that can influence properties of assessment techniques must be tested more extensively, and more definitive comparative information regarding the sensitivity and intrusiveness of individual procedures must also be developed. Present information makes it likely that an advanced metric selection and application methodology will require the complementary use of physiological, subjective, and performance-based techniques.
F. T. Eggemeier
In addition to programmatic work to develop more extensive comparative data on existing techniques, future research should evaluate procedures that demonstrate the potential to overcome possible deficiencies in the sensitivity or intrusiveness of present techniques. For example, the central role of primary task measurement in workload and performance assessment was discussed previously. An important disadvantage of such measures is their potential inability to reflect capacity expenditure differences below the threshold for unimpaired performance. Therefore, an important area for future research would be to examine approaches which could increase primary task sensitivity prior to actual performance breakdowns. One such approach which has been discussed by several investigators (Eggemeier, 1980; Shingledecker, 1980b; Williges & Wierwille, 1979) involves examining changes in operator performance strategies which occur with increases in task demand. Traditional primary task measures index the adequacy of performance, but do not examine the approaches or strategies used to achieve those levels of performance. The principal rationale for invariance of primary task performance with increases in demand is that the operator compensates for such increases and is therefore able to maintain performance. If operator compensation involves modifications of the way in which the primary task is performed, these changes could be used as possible indicants of increased demand. Several types of compensatory strategies which permit maintenance of performance have, in fact, been identified (Meister, 1976; Shingledecker, 1980b; Sperandio, 1978; Welford, 1978). Development of primary task metrics which reflect such strategies could potentially increase the sensitivity of these measures, and would represent a significant augmentation of current workload assessment procedures.
A second important area for future research is the investigation of methodological issues associated with applications of the secondary task technique. Because it is designed to measure the spare processing capacity afforded by the primary task, the secondary task technique represents the most direct index of workload as defined within the capacity expenditure framework outlined above. Consequently, secondary task measures represent an important and potentially useful tool for workload assessment applications. The comparative research suggested above will provide a more extensive basis to evaluate differences in intrusiveness among different secondary tasks, but it is likely that intrusion will continue to represent a problem for some applications of the procedure. Since intrusiveness represents a potentially greater problem in operational environments than in simulation or laboratory settings, it is important that this property be evaluated across a range of applications. With relatively few exceptions (e.g., Brown, 1968; Brown, Simmonds, & Tickner, 1967; Schifflet et al., 1982; Wetherell, 1981), secondary task experiments have been conducted in the laboratory, and current intrusion data apply principally to that setting. Additional work of the type reported by Brown (1968), Brown et al. (1967), Schifflet et al. (1982), and Wetherell (1981) should be conducted to assess the intrusion potential of traditional secondary tasks in operational applications, thereby complementing the comparative laboratory research outlined above.
If intrusion does represent a problem in operational and simulation environments, the embedded task method which was discussed earlier represents one promising means of dealing with this difficulty in some situations. However, more extensive testing is required in order to evaluate the general applicability and other essential properties of the technique. Silverstein, Gomer, Crabtree, and Acton (1984) have applied embedded task scaling procedures (Shingledecker et al., 1980) to commercial aviation communications activities, but additional investigations of the applicability of these procedures to other tasks are required. These investigations should be supplemented with research to document levels of intrusiveness and sensitivity that are experienced with the embedded task technique. Investigation of techniques (e.g., the embedded task procedure; analyses of operator strategies) that demonstrate the potential to address sensitivity and intrusiveness problems of existing metrics can build on information gained from a refined comparative data base, and should contribute to the development of a more advanced and comprehensive workload assessment methodology.
ACKNOWLEDGEMENTS

William H. Acton, Herbert A. Colle, Mark S. Crabtree, and Donald J. Polzella made very helpful comments on an earlier version of this manuscript.
REFERENCES
[1] Acton, W.H., & Crabtree, M.S., User's guide for the criterion task set, Harry G. Armstrong Aerospace Medical Research Laboratory Technical Report, (AAMRL-TR-85-034), (Wright-Patterson Air Force Base, Ohio, 1985).
[2] Acton, W.H., Crabtree, M.S., & Shingledecker, C.A., Development of a standardized workload evaluation methodology, Proceedings of the IEEE National Aerospace and Electronics Conference (1983) 1086-1089.
[3] Amell, J.R., Eggemeier, F.T., & Acton, W.H., The criterion task set: an updated battery, Paper prepared for presentation at the Thirty-First Annual Meeting of the Human Factors Society (1987).
[4] Bahrick, H.P., Noble, M., & Fitts, P.M., Extra-task performance as a measure of learning in a primary task, Journal of Experimental Psychology (1954) 48, 298-302.
[5] Beatty, J., Task evoked pupillary responses, processing load, and the nature of processing resources, Psychological Bulletin (1982) 91, 276-292.
[6] Bell, P.A., Effects of noise and heat stress on subsidiary task performance, Human Factors (1978) 20, 749-752.
[7] Brown, I.D., Some alternative methods of predicting performance among professional drivers in training, Ergonomics (1968) 11, 13-21.
[8] Brown, I.D., Simmonds, D.C.V., & Tickner, A.H., Measurement of control skills, vigilance, and performance on a subsidiary task during twelve hours of car driving, Ergonomics (1967) 10, 665-673.
[9] Casali, J.G., & Wierwille, W.W., A comparison of rating scale, secondary task, physiological, and primary task workload estimation techniques in a simulated flight task emphasizing communications load, Human Factors (1983) 25, 623-642.
[10] Casali, J.G., & Wierwille, W.W., On the measurement of pilot perceptual workload: a comparison of assessment techniques addressing sensitivity and intrusion issues, Ergonomics (1984) 27, 1033-1050.
[11] Chiles, W.D., Alluisi, E.A., & Adams, O.S., Work schedules and performance during confinement, Human Factors (1968) 10, 143-196.
[12] Cooper, G.E., & Harper, R.P., Jr., The use of pilot rating scales in the evaluation of aircraft handling qualities, (Report No. NASA TN-D-5153), (Moffett Field, California: Ames Research Center, National Aeronautics and Space Administration, 1969).
[13] Dornic, S., Language dominance, spare capacity, and perceived effort in bilinguals, Ergonomics (1980) 23, 366-377.
[14] Eggemeier, F.T., Some current issues in workload assessment, Proceedings of the Human Factors Society Twenty-Fourth Annual Meeting (1980) 669-673.
[15] Eggemeier, F.T., Development of a secondary task workload assessment battery, Proceedings of the IEEE International Conference on Cybernetics and Society (1981) 410-414.
[16] Eggemeier, F.T., Workload metrics for system evaluation, Proceedings of the Defense Research Group Panel VIII Workshop "Applications of System Ergonomics to Weapon System Development," Shrivenham, England (1984) C/5-C/20.
[17] Eggemeier, F.T., & Amell, J.R., Visual probability monitoring: effects of display load and signal discriminability, Paper presented at the Thirtieth Annual Meeting of the Human Factors Society, Dayton, Ohio (1986).
[18] Eggemeier, F.T., & Amell, J.R., On the sensitivity and intrusiveness of subjective workload assessment techniques, Manuscript in preparation, Armstrong Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base, Ohio (1987).
[19] Eggemeier, F.T., Crabtree, M.S., & LaPointe, P.A., The effect of delayed report on subjective ratings of mental workload, Proceedings of the Human Factors Society Twenty-Seventh Annual Meeting (1983) 139-143.
[20] Eggemeier, F.T., Crabtree, M.S., Zingg, J.J., Reid, G.B., & Shingledecker, C.A., Subjective workload assessment in a memory update task, Proceedings of the Human Factors Society Twenty-Sixth Annual Meeting (1982) 643-647.
[21] Eggemeier, F.T., & Reid, G.B., Standardization of workload metrics for system design, in D.J. Oborne (ed.), Contemporary Ergonomics (London, Taylor & Francis, 1986).
[22] Eggemeier, F.T., Shingledecker, C.A., & Crabtree, M.S., Workload measurement in system design and evaluation, Proceedings of the Human Factors Society Twenty-Ninth Annual Meeting (1985) 215-219.
[23] Eggemeier, F.T., & Stadler, M.A., Subjective workload assessment in a spatial memory task, Proceedings of the Human Factors Society Twenty-Eighth Annual Meeting (1984) 680-684.
[24] Hart, S.G., Subjective time estimation as an index of workload, Proceedings of the Airline Pilots Association Symposium on Man-System Interface: Advances in Workload Study, (Washington, D.C., 1978) 115-131.
[25] Hart, S.G., & Staveland, L.E., Development of a multidimensional workload rating scale: results of empirical and theoretical research, in P.A. Hancock and N. Meshkati (eds.), Human Mental Workload (Amsterdam, North Holland Publishers, in press).
[26] Isreal, J.B., Chesney, G.L., Wickens, C.D., & Donchin, E., P300 and tracking difficulty: Evidence for multiple resources in dual-task performance, Psychophysiology (1980) 17, 259-273.
[27] Isreal, J.B., Wickens, C.D., Chesney, G.L., & Donchin, E., The event-related brain potential as an index of display-monitoring workload, Human Factors (1980) 22, 211-244.
[28] Jex, H.R., McDonnell, J.D., & Phatek, A.V., A critical tracking task for man-machine research related to operator's effective delay time, Proceedings of the Second Annual NASA-University Conference on Manual Control, (Report No. NASA-SP-128), (Massachusetts Institute of Technology, 1966).
[29] Johannsen, G., Moray, N., Pew, R., Rasmussen, J., Sanders, A., & Wickens, C., Final report of the experimental psychology group, in N. Moray (ed.), Mental Workload: Its Theory and Measurement (New York, Plenum Press, 1979).
[30] Kelly, C.R., & Wargo, M.J., Cross-adaptive operator loading tasks, Human Factors (1967) 9, 395-404.
[31] Knowles, W.B., Operator loading tasks, Human Factors (1963) 5, 151-161.
[32] Meister, D., Behavioral Foundations of System Development (New York, Wiley, 1976).
[33] Michon, J.A., Tapping regularity as a measure of perceptual motor load, Ergonomics (1966) 9, 401-412.
[34] Navon, D., & Gopher, D., On the economy of the human processing system, Psychological Review (1979) 86, 214-255.
[35] Notestine, J., Subjective workload assessment in a probability monitoring task and the effect of delayed ratings, Proceedings of the Human Factors Society Twenty-Eighth Annual Meeting (1984) 685-689.
[36] O'Donnell, R.D., The U.S. Air Force neurophysiological workload test battery: concept and validation, Proceedings of the AGARD (AMP) Symposium on Sustained Intensive Air Operations: Physiological and Performance Aspects, (AGARD-CP-338), (November, 1983).
[37] O'Donnell, R.D., & Eggemeier, F.T., Workload assessment methodology, in K. Boff, L. Kaufman, & J. Thomas (eds.), Handbook of Perception and Human Performance, Vol. II: Cognitive Processes and Performance, (New York, John Wiley & Sons, Inc., 1986).
[38] Ogden, G.D., Levine, J.M., & Eisner, E.J., Measurement of workload by secondary tasks, Human Factors (1979) 21, 529-548.
[39] Potter, S.S., & Acton, W.H., Relative contributions of SWAT dimensions to overall subjective workload ratings, Proceedings of the Third Symposium on Aviation Psychology, (Columbus, Ohio, Ohio State University, 1985) 231-238.
[40] Reid, G.B., The systematic development of a subjective measure of workload, in I.D. Brown, R. Goldsmith, K. Coombes, & M.A. Sinclair (eds.), Ergonomics International 85, (London, Taylor & Francis, 1985).
[41] Reid, G.B., Shingledecker, C.A., & Eggemeier, F.T., Application of conjoint measurement to workload scale development, Proceedings of the Human Factors Society Twenty-Fifth Annual Meeting (1981) 522-526.
[42] Rolfe, J.M., The secondary task as a measure of mental load, in W.T. Singleton, J.C. Fox, and D. Whitfield (eds.), Measurement of Man at Work, (London, Taylor & Francis, 1971).
[43] Schifflet, S.G., Linton, P.M., & Spicuzza, R.J., Evaluation of a pilot workload assessment device to test alternative display formats and control handling qualities, Proceedings of the AIAA Workshop on Flight Testing to Identify Pilot Workload and Pilot Dynamics (1982) 222-233.
[44] Schlegel, R.E., Gilliland, K., & Schlegel, B., Development of the criterion task set performance data base, Proceedings of the Thirtieth Annual Meeting of the Human Factors Society (1986) 58-62.
[45] Shingledecker, C.A., Enhancing operator acceptance and noninterference in secondary task measures of workload, Proceedings of the Twenty-Fourth Annual Meeting of the Human Factors Society (1980a) 674-677.
[46] Shingledecker, C.A., Operator strategy: a neglected variable in workload assessment, Paper presented at the Eighty-Eighth Annual Meeting of the American Psychological Association (1980b).
[47] Shingledecker, C.A., Behavioral and subjective workload metrics for operational environments, Proceedings of the AGARD (AMP) Symposium on Sustained Intensive Air Operations: Physiological and Performance Aspects, (AGARD-CP-338), (November, 1983), 6/1-6/10.
[48] Shingledecker, C.A., A task battery for applied human performance assessment research, Air Force Aerospace Medical Research Laboratory Technical Report, (Report No. AFAMRL-TR-84-071), (Wright-Patterson Air Force Base, Ohio, November, 1984).
[49] Shingledecker, C.A., Acton, W.H., & Crabtree, M.S., Development and application of a criterion task set for workload metric evaluation, (Paper No. 831419), (Warrendale, Pennsylvania, Society of Automotive Engineers, SAE Technical Paper Series, October, 1983).
[50] Shingledecker, C.A., & Crabtree, M.S., Subsidiary radio communications tasks for workload assessment in R&D simulations: II. Task sensitivity evaluation, Air Force Aerospace Medical Research Laboratory Technical Report, (Report No. AFAMRL-TR-82-57), (Wright-Patterson Air Force Base, Ohio, 1982).
[51] Shingledecker, C.A., Crabtree, M.S., & Acton, W.H., Standardized tests for the evaluation and classification of workload metrics, Proceedings of the Human Factors Society Twenty-Sixth Annual Meeting (1982) 648-651.
[52] Shingledecker, C.A., Crabtree, M.S., Simons, J.C., Courtright, J.F., & O'Donnell, R.D., Subsidiary radio communications tasks for workload assessment in R&D simulations: I. Task development and workload scaling, Air Force Aerospace Medical Research Laboratory Technical Report, (Report No. AFAMRL-TR-80-126), (Wright-Patterson Air Force Base, Ohio, 1980).
[53] Silverstein, L.D., Gomer, F.E., Crabtree, M.S., & Acton, W.H., A comparison of analytic and subjective techniques for estimating communications related workload during commercial transport flight operations, Report prepared under Contract No. NAS2-11562, (Dayton, Ohio, General Physics Corporation, 1984).
[54] Sperandio, J.C., The regulation of working methods as a function of workload among air traffic controllers, Ergonomics (1978) 21, 195-202.
[55] Stadler, M.A., & Eggemeier, F.T., Codes of processing and timesharing performance, unpublished manuscript, Wright State University, Dayton, Ohio (1985).
[56] Sternberg, S., High-speed scanning in human memory, Science (1966) 153, 652-654.
[57] Vidulich, M.A., & Tsang, P.S., Assessing subjective workload assessment: a comparison of SWAT and the NASA-bipolar methods, Proceedings of the Human Factors Society Twenty-Ninth Annual Meeting (1985) 71-75.
[58] Welford, A.T., Mental work-load as a function of demand, capacity, strategy, and skill, Ergonomics (1978) 21, 151-167.
[59] Wetherell, A., The efficacy of some auditory-vocal subsidiary tasks as measures of the mental load of male and female drivers, Ergonomics (1981) 24, 197-214.
[60] Wickens, C.D., Measures of workload, stress, and secondary tasks, in N. Moray (ed.), Mental Workload: Its Theory and Measurement, (New York, Plenum Press, 1979).
[61] Wickens, C.D., The structure of attentional resources, in R. Nickerson (ed.), Attention and Performance VIII, (Hillsdale, New Jersey, Erlbaum Press, 1980).
[62] Wickens, C.D., Engineering Psychology and Human Performance, (Columbus, Ohio, Charles E. Merrill Publishing Company, 1984a).
[63] Wickens, C.D., Processing resources in attention, in R. Parasuraman and R. Davies (eds.), Varieties of Attention, (New York, Academic Press, 1984b).
[64] Wickens, C.D., & Derrick, W., Workload measurement and multiple resources, Proceedings of the IEEE Conference on Cybernetics and Society (1981) 600-603.
[65] Wickens, C.D., & Kessel, C., The processing resource demands of failure detection in dynamic systems, Journal of Experimental Psychology: Human Perception and Performance (1980) 6, 564-577.
[66] Wickens, C.D., Mountford, S.J., & Schreiner, W., Multiple resources, task-hemispheric integrity, and individual differences in timesharing, Human Factors (1981) 23, 211-229.
[67] Wierwille, W.W., & Casali, J.G., A validated scale for global mental workload measurement applications, Proceedings of the Human Factors Society Twenty-Seventh Annual Meeting (1983a) 129-133.
[68] Wierwille, W.W., & Casali, J.G., The sensitivity and intrusion of mental workload estimation techniques in piloting tasks, (Report No. 8309), (Blacksburg, Virginia, Virginia Polytechnic Institute and State University, Vehicle Simulation Laboratory, Department of Industrial Engineering and Operations Research, September, 1983b).
[69] Wierwille, W.W., & Connor, S.A., Evaluation of 20 workload measures using a psychomotor task in a moving-base aircraft simulator, Human Factors (1983) 25, 1-16.
[70] Wierwille, W.W., Rahimi, M., & Casali, J.G., Evaluation of 16 measures of mental workload using a simulated flight task emphasizing mediational activity, Human Factors (1985) 27, 489-502.
[71] Wierwille, W.W., & Williges, R., Survey and analysis of operator workload assessment techniques, (Report No. S-78-101), (Blacksburg, Virginia, Systemetrics Corporation, September, 1978).
[72] Wierwille, W.W., & Williges, B.H., An annotated bibliography of operator mental workload assessment, (Report No. SY-27R-80), (Patuxent River, Maryland, Naval Air Test Center, March, 1980).
[73] Williges, R.C., & Wierwille, W.W., Behavioral measures of aircrew mental workload, Human Factors (1979) 21, 549-574.
[74] Wilson, G., & Heinrich, T., Steady-state evoked responses used to measure task difficulty in three performance tasks, Technical report in preparation, (Wright-Patterson Air Force Base, Ohio, Armstrong Aerospace Medical Research Laboratory, 1987).
HUMAN MENTAL WORKLOAD
P.A. Hancock and N. Meshkati (Editors)
Elsevier Science Publishers B.V. (North-Holland), 1988

MEASUREMENT OF OPERATOR WORKLOAD WITH THE NEUROPSYCHOLOGICAL WORKLOAD TEST BATTERY

Glenn F. Wilson, Ph.D.
Human Engineering Division
Armstrong Aerospace Medical Research Laboratory
Wright Patterson AFB, Ohio 45433

Robert D. O'Donnell, Ph.D.
NTI, Inc.
4130 Linden Avenue
Dayton, Ohio 45432
INTRODUCTION

Successful physiological measurement of operator workload has, unfortunately, been much easier to conceptualize than to achieve. Workload intuitively would seem to require the expenditure of physiological effort and resources, and it is reasonable to assume that some central or peripheral measure could be found which would index this expenditure. Early attempts to find such an index were, however, remarkably unsuccessful. The lack of correlation between physiological measures and states of consciousness was noted by Johnson (1970), and the frequent failure of specific physiological measures to correlate with imposed workload has been pointed out (O'Donnell and Eggemeier, 1986; Wierwille and Connor, 1983). Indeed, such failures led to an early feeling that physiological measures might have questionable value as valid, reliable measures of mental workload. However, theoretical and laboratory work continued to include and refine a variety of such techniques. Perhaps because of the inherent simplicity and attractiveness of the relationship between physiology and work, researchers continued to obtain heart rate, muscle spectra, eye blink and even electroencephalographic (EEG) measures in situations where various workload factors were manipulated. These efforts met with mixed success. Occasional reports produced unexpected results which could not have been predicted from other sources. For example, Roman, Older, and Jones (1967) found that heart rate in U.S. Navy combat pilots was highest when they returned to the ship after a combat mission, and not during the mission as would have been predicted. Sem-Jacobsen (1961) found that pilots' EEGs revealed unconsciousness and seizure-like activity during high G-forces. Such effects were not revealed subjectively, nor from behavioral data. They were also not found at the same G-loading in aircraft simulators.
Thus, the added "workload" of the aircraft produced an effect revealed only by the physiological measure. Similarly, locomotive operators, while on duty, have been found to show stage I EEG sleep patterns, even when maintaining
performance in a timing task designed to show that they were awake (Frustorfer, et al, 1977). While results such as these tended to stimulate interest in physiological measures, the preponderance of data continued to be negative, ambivalent, or contradictory. Most consistently, it was found that no physiological measure seemed to change reliably over a number of types of workload manipulation (e.g., changes in cognitive load versus changes in tracking load). Even under the same type of workload manipulation (e.g., different levels of tracking) changes in most physiological measures were not systematically related to task or perceived workload. This situation existed into the mid-1970s, with occasional successful applications of physiological measures (e.g., Spyker, et al, 1971), but with a general lack of enthusiasm and even open distrust of the measures (Chiles, 1982). During this period, however, significant changes were occurring in the theoretical conceptualization of workload itself. It was becoming clearer that workload could not be considered as a unitary construct, but was actually multidimensional (Shingledecker, 1983). Workload came to be seen conceptually as that portion of the operator's capacity which was actually required to complete a task (Gopher and Donchin, 1986; O'Donnell and Eggemeier, 1986). These views clarified a very important fact regarding any attempt to measure the elusive construct "workload." Obviously, no single measure would ever suffice as the "holy grail" (Shingledecker, Crabtree and Acton, 1982). The multidimensional nature of workload demanded that multiple measures be used to cover the entire construct. Thus, while a single metric may assess a specific causal factor, it would be a mistake to demand, as a criterion of acceptance, that any such measure generalize over all kinds of workload, or even over different levels of the same kind of workload. Further developments reinforced this view.
Theoretical positions appeared which postulated separate, relatively independent resources within the person (Norman & Bobrow, 1976; Navon and Gopher, 1979; Wickens, 1980). These resources can be depleted independently by a given task demand, and the overall pattern of such depletion constitutes the workload. It was suggested that resources were differentiated by input or output modality (visual, auditory, etc.), stages of processing (input, central processing, output), or type of processing (memory, reasoning, decision, etc.) (Wickens and Kessel, 1979). Clearly, if these theoretical positions are true, the goal of workload assessment methodology is to develop reliable, valid, and sensitive measures of each resource "pool" or capacity. To the extent that this is done, a set of measures will be created that will be "diagnostic" in the sense that they will yield a detailed evaluation of the pattern as well as the amount of workload for any given task. In fact, such diagnosticity has been suggested as a major criterion for evaluating the usefulness of a proposed
workload assessment technique (Shingledecker, 1983). It can also provide the basis for practical decisions concerning the selection of a measure for particular applications (O'Donnell and Eggemeier, 1986). Early attempts to utilize physiological measures as workload assessment devices did not theoretically require such diagnosticity in the measures. In fact, just the opposite was generally true. Hassett (1978) has pointed out that activation level theory led many to conclude that physiological measures all tended to reflect an underlying dimension of arousal and, therefore, were relatively interchangeable. Expectations that there would be high correlations between such measures when they were used in the same situations reflected this view. In the study of workload, activation level theory suggested that perceived or actual effort should be correlated with an increased physiological demand at either a central or peripheral level, or both. Such demand should be measurable in many ways as increased activation. It now appears reasonable to conclude that increased workload does not always result in increased overall activation. A task may deplete one or more resources without creating generalized arousal. This might lead to a situation where a more diagnostic measure would show changes, while a general activation measure would not. If physiological measures could be shown to have such diagnostic specificity, the apparent contradictory results and lack of correlations between various measures would be explainable. Early indications of such specificity for at least some physiological measures had come from studies such as those by Lacey and Lacey (1958). These investigators demonstrated the remarkable ability of cardiac measures to differentiate between various types of task-related ECG deceleration patterns, depending on the direction of the subject's attention.
Further, various aspects of the cardiac cycle were shown to be dependent on the task activity and involvement of the person. Even though studies such as these suggest that physiological measures could be more diagnostic than simple activation theory would suggest, the field in general still failed to grasp the significance of these efforts. A powerful force in demonstrating diagnostic specificity arose in the Cognitive Psychophysiology Laboratory at the University of Illinois, headed by Donchin. This group focused on the transient cortical evoked response, and particularly the P3 or P300 peak of this response. Many studies (summarized later in this chapter; see also Pritchard, 1981) convinced these investigators that the P300 latency and/or amplitude reflected central processing load independent of motor load. These findings moved the field even further from the simplistic workload-effort-activation
view of physiological measurement, and set the stage for merging the multiple resources approach with the new-found diagnosticity of the metrics.
At this point, a number of very practical concerns led the U.S. Air Force to launch a major thrust into the development of workload assessment techniques. As part of the Workload and Ergonomics Branch program of the Human Engineering Division at the Armstrong Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base, Ohio, a strong physiological measurement effort directed to workload assessment was begun (O'Donnell, 1983). Basic and applied efforts concentrated on defining the state-of-the-art in the area, and on the development of a test battery which could be used in the design and evaluation stages of system development. This effort resulted in the production of the Neuropsychological Workload Test Battery (NWTB) described later in this chapter. The separate measures comprising this battery are discussed in the following sections.
MEASURES OF BRAIN FUNCTION
General Introduction. Of all physiological measures, the EEG intuitively seems as if it should be the most productive and diagnostic. If central nervous system activity can be tapped nonobtrusively, one should be able to detect subtle changes in the person's involvement in a task, and perhaps even determine the brain structures which relate to the resources being used. The brain's electrical activity detected at the scalp is, however, a composite signal reflecting the activity of many neurons and even many brain structures. Therefore, the techniques required to analyze this apparently simple signal can become quite complex. If one is to interpret the EEG, it is necessary to separate the "noise" contributed by unwanted structures from the desired "signal" generated in relevant structures.
Much of the history concerning the use of EEG in workload assessment revolves around the development of such analysis techniques, and the sections below are organized around these developments.
Epoch Analysis.
If the EEG is recorded for an extended period of time, the spectrum of all activity can be determined by any of a number of techniques. Typically, the Fourier components of this "epoch" of EEG are calculated, and power at each frequency is determined. This power can be calculated for traditional bands of the EEG which have been related to various behavioral activity levels (e.g., delta, theta, alpha, and beta). From a general and rather simplistic view, it could be postulated that as task involvement rises, there will be a tendency for the gross EEG to shift to high frequencies, with increased power in the beta or upper alpha
bands. Such a view would be consistent with the activation-level position, and would predict that EEG epoch analysis would reveal those tasks which cause overall activation due to high workload. Several attempts were made, with limited success, to utilize epoch analysis as an activation indicator of the left or right hemisphere (see Donchin, Kutas, and McCarthy, 1976, for a critical review). In one of the few successful applications of this technique to workload assessment, Sterman (1986) has reported reciprocal theta and alpha changes in EEG frequency spectra when subjects were engaged in a flight simulation task, versus when they were not performing the task. These data show at least a gross correlation with the subject’s performance-vs-resting states. Further efforts may reveal even more precise relationships between level of workload and EEG spectra.
This relatively crude measure has not become very popular as a candidate workload assessment technique. However, it is still possible that in situations where one is only interested in determining the overall activation level during an extended task, or in comparing activation generated by one task as opposed to another, this measure could be of value.
Cortical Evoked Potential. The cortical evoked potential (EP) represents the brain’s response to a discrete stimulus and, ideally, is distinct from other cortical activity unrelated to that specific response. It can be isolated from the ongoing EEG through any of a number of techniques. Most commonly, a stimulus is presented many times, and the EEG signal occurring for a brief period of time after each stimulus is sampled. These samples are then averaged, point for point, to produce a composite picture of the brain’s response to that stimulus. This is possible because the evoked response (the signal) is temporally and spatially constant for each stimulus, whereas the extraneous EEG activity (the noise) occurs randomly with respect to the stimulus.
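The point-for-point averaging just described can be illustrated with a minimal sketch. It assumes a single-channel EEG record held as a list of samples and a list of stimulus onset indices; the function name, the synthetic waveform, and the noise level are hypothetical choices made for illustration and are not taken from any study cited here.

```python
import math
import random


def average_evoked_response(eeg, stimulus_onsets, window):
    """Average the EEG segments that follow each stimulus, point for point."""
    # Keep only epochs that fit completely inside the recording.
    epochs = [eeg[t:t + window] for t in stimulus_onsets if t + window <= len(eeg)]
    if not epochs:
        raise ValueError("no complete epochs available")
    n = len(epochs)
    return [sum(epoch[i] for epoch in epochs) / n for i in range(window)]


def rms_error(x, y):
    """Root-mean-square difference between two equal-length waveforms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Synthetic demonstration (all values hypothetical).
random.seed(0)
window = 50
template = [math.sin(math.pi * i / window) for i in range(window)]  # assumed evoked waveform
onsets = list(range(0, 200 * 100, 100))                             # 200 stimuli, 100 samples apart
eeg = [random.gauss(0.0, 2.0) for _ in range(onsets[-1] + window)]  # background "noise"
for t in onsets:
    for i in range(window):
        eeg[t + i] += template[i]                                   # time-locked "signal"

average = average_evoked_response(eeg, onsets, window)
single_trial = eeg[onsets[0]:onsets[0] + window]
print("single-trial RMS error:", rms_error(single_trial, template))
print("averaged RMS error:    ", rms_error(average, template))
```

In this synthetic case the averaged waveform lies much closer to the assumed template than any single trial does, because the random background activity partially cancels across trials while the time-locked component does not.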
Therefore, this time-locked averaging tends to increase the signal-to-noise ratio, and isolate the specific response buried in the EEG. Because of this specificity, it is reasonable to expect that the evoked potential will be more diagnostic than other, more noisy measures of brain functions.
Other techniques for isolating the evoked response are available. Donchin and Herning (1975) utilized linear stepwise discriminant analysis (LSDA) to isolate specific features of the evoked response from a single stimulus presentation, without the need to average in the actual experiment. This technique first identifies features of the EP on theoretical or empirical grounds. Features which differentiate between different situations (e.g., high or low workload) can then be determined, and the EEG is scanned to detect these features and classify them. The result is
that an evoked response can often be detected and classified on a single trial. More complex techniques have been proposed to achieve the same goal. For instance, the quadratic discriminant function (Aunon, McGillem, and O’Donnell, 1982) has shown some increased ability to detect and classify evoked responses to visual stimuli. For other types of EPs, such as the steady-state response described below, the frequency of the response is known. In that case, spectral analysis can be used to isolate the response from the ongoing EEG (Regan, 1972).
In any case, several types of evoked potentials (sometimes called event related potentials, or ERPs) can be identified. These include: the transient response, which is obtained from a single stimulus, or from one which is presented repetitively at a fairly slow rate (slower than 1 per second); the steady-state response, obtained when the visual stimulus is presented rapidly (faster than 4 or 5 per second); and the auditory brain-stem evoked response (BSER), obtained from very rapid presentation of a click stimulus (faster than 5 per sec). These techniques are discussed separately below.
The Transient Cortical Evoked Response. The transient response typically consists of a number of relatively consistent positive and negative peaks occurring within 750 msec. after stimulus presentation. It can be obtained from stimulation of virtually any sense modality, but is most commonly derived visually or aurally. Peaks which occur within the first 250 msec. have been related to sensory characteristics of the stimulus, or to early cognitive events (O’Donnell, 1979; Hillyard & Kutas, 1983). A major peak typically occurs between 300 and 600 msec. after stimulus presentation if the subject is actively engaged in performing a task in response to the stimulus. This P3 or P300 peak, first described by Sutton, Tueting, Zubin, and John (1967), has attracted considerable attention as an indicator of specific cognitive events.
Many experiments, primarily coming from Donchin’s laboratory at the University of Illinois, have revealed that P300 amplitude indexes the degree of subjective surprise to a stimulus. Put differently, the amplitude appears to index the occurrence of a mismatch between a subject’s expectations and the content of a stimulus (Duncan-Johnson and Donchin, 1977). On the other hand, P300 latency appears to be more directly related to the difficulty the subject has in centrally evaluating a stimulus (Donchin, 1981). Thus, the more difficult the task, and the longer it takes for the person to determine how to react to it, the longer the P300 latency.
Clearly, then, the transient evoked response, and particularly the P300, is a prime candidate as a measure of workload. The specificity of the response, which has been well documented in the laboratory, suggests that it can
provide a relatively uncontaminated index of central processes or resources, and could form an important component of a workload test battery. From a theoretical point of view, one would expect that the EP would show differences in P300 amplitude or latency as the decision making workload increased, as the memory load made information retrieval more difficult, and perhaps as the general cognitive load prevented the person from forming well documented expectancies based on previous experience. In fact, each of these results has been seen in basic laboratory settings as workload was increased. Some specific applied techniques for obtaining the P300 in workload studies are discussed below.
P300 to Primary Tasks. It is possible to assess the workload of certain tasks by determining the P300 elicited by the task itself. In this context, the memory scanning paradigm suggested by Sternberg (1969) has been used in several studies to demonstrate the sensitivity of the transient evoked response. In the Sternberg task, subjects are required to memorize a set of stimuli (usually letters of the alphabet). Probe stimuli are then presented, and subjects must indicate whether the probe stimuli are members of the memorized set. By manipulating the number of stimuli in the memorized set, the memory workload can be controlled. Behaviorally, this paradigm produces a function in which reaction time increases with memory load. Sternberg interprets the slope of this function as indexing the memory scanning and identification capacity of the subject, whereas the intercept of the function indicates the perceptual input and motor output time. Gomer, Spicuzza & O’Donnell (1976) obtained visual evoked responses to the probe items, as one to six letters were held in memory. The P300 latency showed a linear increase as a function of the number of items in memory.
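The slope and intercept interpretation of the Sternberg function can be made concrete with an ordinary least-squares fit. The sketch below is illustrative only: the function name is an assumption, and the reaction times are hypothetical values invented for the example, not data from Sternberg or from Gomer, Spicuzza & O’Donnell. The same fit could be applied to P300 latencies in place of reaction times.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mean reaction times (msec) for memory sets of one to six letters.
set_sizes = [1, 2, 3, 4, 5, 6]
reaction_times_ms = [441, 476, 522, 559, 598, 640]

slope, intercept = fit_line(set_sizes, reaction_times_ms)
# Slope: added time per memorized item (memory scanning).
# Intercept: time attributed to perceptual input and motor output.
print("scanning time per item (msec):", slope)
print("input/output time (msec):     ", intercept)
```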
In addition, the linear component of this function was greater than that of the reaction time measure, and therefore represented a more consistent measure of cognitive workload. The P300 thus appears able to index memory scanning workload, at least in this memory load range. Presumably, in view of the evidence relating P300 latency to the time taken to evaluate a stimulus, this sensitivity is due to the increased difficulty of memory scanning and decision as workload increases.
A practical technique could be developed which would utilize the primary task of interest as the stimulus which generates the evoked response. The stimuli would then be meaningful discrete items which occur as a natural part of the subject’s usual task. For instance, in an aircraft simulation, the appearance of visual stimuli which require a response dependent on memory (e.g., a selection of data from a multi-function display) could be used to trigger an evoked response. The amplitude or latency of the P300 could then be used to index the workload of that particular multifunction display. This approach has been used by Biferno
(1985a; 1985b) in a simulated piloting experiment in which EP’s were recorded in response to the pilot’s “call sign”. Differences in the EP’s were found to depend upon the level of difficulty of the simulated flying task. This procedure has the advantage of being relatively non-obtrusive and transparent to the operator, and could provide extremely specific and highly diagnostic workload data on the task. However, it must be pointed out that complete data bearing on the sensitivity of this approach, and even its validity at higher memory loads, are still lacking. Thus, it can only be recommended with considerable caution, although it appears to be a reasonable experimental technique which offers advantages well worth the required cautions.
P300 to Secondary Tasks. It would be very desirable if a physiological analog to the secondary task paradigm for workload assessment could be developed. As noted in other chapters of this volume, secondary task paradigms present several advantages to the workload researcher. They can be very diagnostic and, with proper control, can be extremely sensitive. Ideally, a physiological analog would utilize a stimulus which, while it was not really an intrinsic part of the primary task, would be relatively non-obtrusive and could be integrated easily into the primary task.
Gopher and Donchin (1986) have described a technique which satisfies these criteria. This procedure utilizes stimuli of at least two classes. One class is presented frequently and the other less frequently. If the situation is set up so that the subject attends to (e.g., counts) either class of stimuli, then the P300 elicited by the rare class of stimuli shows some consistent relationships with workload (Donchin, 1981). This technique has been termed the “oddball” paradigm, and has been used in several laboratory and simulator studies of workload. In one study, a visual tracking task was used in which subjects were required to track in either one or two dimensions (Wickens, Isreal, and Donchin, 1977; Isreal, Wickens, and Donchin, 1979). An auditory oddball task was presented simultaneously as a secondary task, and the P300 showed a clear reduction from baseline conditions with the imposition of either tracking task. The pattern of results suggested that the P300 may not be sensitive to the response load, but may be specific to the perceptual/central processing load of the task.
A subsequent experiment confirmed this hypothesis. The perceptual workload of a display monitoring task was manipulated without requiring different response loads (Isreal, Wickens, Chesney, and Donchin, 1980). Subjects were required to monitor various numbers of targets and detect changes in their course. An auditory oddball response was obtained during the task performance. In this case, P300 amplitude was monotonically related to the number of elements to be monitored. These results have been confirmed in a number of subsequent studies. In fact, the P300 has even shown enhanced sensitivity to the number of dimensions to be tracked when a
multidimensional (second order) system was used. This was interpreted to mean that in such systems, the perceptual/central processing component of the task is so overloaded that it interacts with the motor load of the tracking task (Sirevaag, Kramer, Coles, and Donchin, 1984). It was suggested that such results could be used to assist in determining the optimal load level of the various resources for tasks involving complex tracking. Natani and Gomer (1981) used an embedded form of this task in a low-fidelity simulation, and reported significant differences attributable to workload. Kramer, Wickens and Donchin (1985) reported using a visual oddball paradigm (intensification of a visual stimulus) in conjunction with a visual tracking task. They found significant effects on the P300 as a function of the workload of the task, as determined by both the types of tracking (acquisition vs. alignment) and the control order (first vs second order). An aspect of this study worthy of note is that a visual stimulus was used with a visual primary task. This finding, added to previous evidence that the oddball paradigm works between modalities (auditory oddball with visual primary task), indicates that the paradigm has great flexibility with respect to inter- and intra-modality use. In another application of the oddball technique an F16 simulator was used by Thiessen, Lay, and Stern (1986) to provide different levels of pilot workload, and the P300 was recorded. It was found that even at moderate levels of workload the pilots ignored the auditory oddball stimuli, and no P300’s were obtained. Yet, in another simulator study by the same authors, significant differences were found. Electronic warfare officers were employed as subjects and visual stimuli were used in the oddball test. The subject’s task was primarily visual and was located inside of the cockpit. In this case, P300’s were found to vary with the level of workload of the primary task. 
Similarly, Kramer, Sirevaag and Braune (1987) have reported decreases in P300 amplitude as a function of workload in simulated instrument aircraft flights. Subjects flew two flights with different degrees of difficulty. The more difficult flight was associated with increased deviations in performance, increased subjective estimates of workload, and decreased P300 amplitudes.
While the P300 in general, and the oddball paradigm in particular, appears to hold great promise for becoming a standard workload assessment technique, some theoretical cautions are important. It is reasonable to postulate that the reason the technique is valid as a workload measure is that it taps any interference with a subject’s normal ability to establish a pattern or expectancy based on immediate past experience. As the primary task becomes more “loading” (i.e., occupies more of the central processing resources) there are fewer resources left to develop a set of expectancies based on the pattern of stimuli in the oddball task, as has been demonstrated by Wickens, et al.
(1983). It is then critical to assure that the importance of the oddball task remains constant throughout the experiment. If this importance to the person fluctuates from causes other than workload, results may be contaminated. The embedding of stimuli within the primary task can be used to assure such relevance, but not guarantee it. Wickens et al. (1977) have discussed this problem, and have pointed out an additional difficulty. Since the P300 is typically based on an ensemble average, moment-to-moment fluctuations in the workload of a task may cause unwanted variability in the average and reduce its sensitivity. Single trial evaluations may reduce this problem, but these techniques have yet to demonstrate their total reliability and sensitivity. Until such techniques are validated, the above factors must be controlled experimentally.
It is probably safe to recommend this oddball paradigm for use as a non-obtrusive, secondary task workload assessment technique in relatively simple environments. It is appropriate where the goal is to assess the central processing load of a task. In some cases, it can be used in the same form as was used in the laboratory, where a separate auditory oddball task is presented while the operator is performing a visual primary task. The subject would be instructed to attend to the auditory tone, and to perform some response to it which would assure attention (e.g., count the tones, respond after 5 tones, etc.). Alternatively, if it is desirable to have the task completely embedded within the primary task, signals or other stimuli which are a normal part of the primary task can be used to elicit the P300, much as was described under the primary task section above. If these are structured so that they mimic the oddball paradigm (two classes of stimuli with one less frequent than the other) then the data should be interpretable in the same way as the laboratory results.
"Probe" Technique.
In this procedure single tones or flashes are presented to the subject while they are performing a primary task. These probe stimuli are not attended to, nor is the subject required to keep track of them. In this way they are critically different from the stimuli in the oddball paradigm. They are presented at a comfortable intensity, and occur randomly during the primary task. The procedure has been used successfully to determine hemispheric involvement in various tasks (see Papanicolaou and Johnstone, 1984, for a review). It has been reported that the EP's to these "probes" decrease in amplitude in the hemisphere primarily responsible for processing the stimulus information (i.e., left hemisphere during a language task). Bauer, Goldstein and Stern (1987) have also demonstrated that the amplitude of the probe EP changes as a function of the number of memorized items in the Sternberg task. Probe stimuli were presented at fixed times during the processing of the stimuli in the Sternberg experiment. The amplitude of the probe EP's varied as a function of when the probes were presented, and also with the size of the memory set. However, it appears that the probe stimuli must be presented
during times when the subject is actively processing the primary task stimuli. Wilson, McCloskey and Davis (1986) presented probe stimuli while subjects performed a linguistic processing task. The primary task stimuli were discrete and appeared every several seconds, and the probe stimuli appeared at random intervals that were independent of the linguistic stimuli. They did not find any changes in the probe EPs as a function of the difficulty level of the primary task; this was attributed to the non-overlapping nature of the discrete stimuli. The probe did not occur during times of actual primary task processing often enough to produce any effects. McCloskey (1987), using a spatial processing task, presented probe stimuli that were time locked to the primary task stimuli. The probe stimuli followed the primary task stimuli by 250 msec or 750 msec. Significant effects on the probe EPs were reported that were related to the difficulty of the primary task.
The AAMRL Magnetoencephalograph (MEG) laboratory has reported that magnetically recorded auditory probe responses were significantly reduced in the left hemisphere while subjects performed a linguistic monitoring task. They were instructed to detect and keep a count of the number of times a particular syllable was heard while listening to a recording in Greek. Auditory tones were superimposed on the Greek and the evoked fields were recorded (Wilson, et al., 1987).
These results suggest that the probe technique may be a valid and sensitive workload assessment technique if the timing of the probe is controlled carefully and if the subject maintains a constant level of workload during the time that the probe EPs are collected.
Steady-State Evoked Responses (SSER).
The visual/neural system can respond to stimuli which are presented at a rate higher than those used for transient responses. At stimulus rates faster than approximately four or five per second, the responses to single stimuli are no longer discernible. The succeeding stimuli arrive before the response to the preceding one is completed. With visual stimuli, the response is a sinusoid at the frequency of the stimulus and its harmonics. Several seconds of stimulation are required for the brain’s response to stabilize and achieve the steady state condition (Spekreijse, 1966; Regan, 1966; 1972). The stimuli may be patterned, such as checkerboards or sine wave gratings, or they may be unpatterned. Responses to patterned stimuli seem to be restricted to flicker rates up to about 20 Hz, while responses to unpatterned stimuli have been recorded to flicker rates well above flicker fusion (Regan, 1977; Moise, 1980).
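Because the steady-state response is a sinusoid at a known stimulus frequency, its amplitude and phase can in principle be read from a single Fourier component. The following is a minimal sketch under stated assumptions: the recording spans a whole number of stimulus cycles, the stimulus is taken as a cosine of zero phase, and the function name, sampling rate, and synthetic response are hypothetical. It is not a description of any particular laboratory's procedure.

```python
import math


def response_at_frequency(signal, sample_rate_hz, freq_hz):
    """Amplitude and phase (degrees) of the sinusoidal component at freq_hz.

    Assumes the signal spans a whole number of cycles of freq_hz and that
    freq_hz lies strictly between zero and half the sampling rate.
    """
    n = len(signal)
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    re = sum(s * math.cos(omega * k) for k, s in enumerate(signal))
    im = -sum(s * math.sin(omega * k) for k, s in enumerate(signal))
    amplitude = 2.0 * math.hypot(re, im) / n
    phase_deg = math.degrees(math.atan2(im, re))
    return amplitude, phase_deg


# Synthetic response (hypothetical values): a 10 Hz sinusoid that lags the
# stimulus by 60 degrees, sampled at 200 Hz for 2 s (exactly 20 cycles).
sample_rate_hz = 200.0
stimulus_hz = 10.0
true_lag_deg = 60.0
synthetic = [
    3.0 * math.cos(2.0 * math.pi * stimulus_hz * k / sample_rate_hz
                   - math.radians(true_lag_deg))
    for k in range(400)
]

amplitude, phase_deg = response_at_frequency(synthetic, sample_rate_hz, stimulus_hz)
phase_lag_deg = -phase_deg  # a response that lags the stimulus has negative phase
print(amplitude, phase_lag_deg)
```

With real recordings the component at the stimulus frequency would be embedded in background EEG, so longer epochs or averaging across epochs would presumably be needed before the estimate stabilizes.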
On the basis of amplitude and phase lag four distinct response ranges have been isolated. The low range includes the frequencies between 7 Hz and 13 Hz; the medium range is
from approximately 14 Hz to 25 Hz; an intermediate range extends from 30 to 40 Hz, and the high range is from 40 Hz to 59 Hz (Regan, 1972; Wilson and Ritter, 1987). The amplitudes and apparent latency of the SSER responses decrease from the low to high ranges. Apparent latency is derived from the phase lag between the input waveform and the brain's response. If the phase lags are plotted as a function of frequency, a linear relationship is observed. The slope of the linear regression line to the phase lag data is divided by 360 degrees to arrive at an estimate of the latency of transmission through the visual system (Regan, 1972). Since the nervous system is able to resolve individual frequencies in a complex waveform, it is possible to arrive at an estimate of the apparent latency very quickly. Three or more frequencies can be conveniently mixed and used to modulate the stimulus lamps. A single 20 to 30 second epoch can then be used to derive an estimate of the conduction latency (Regan, 1976).
The SSER has been found to be correlated with performance in the Sternberg memory scanning task. A significant correlation exists between subjects' apparent latency of transmission in their high frequency SSER and the speed of the input/output aspect of this task. Subjects with shorter transmission times of the SSER also showed faster input/output components of their reaction time data. Further, the medium range transmission times were found to be correlated across subjects with the speed of the memory scanning aspect of the task. Subjects with shorter conduction times of the medium range SSER also scanned through the list of memory items faster (Wilson & O'Donnell, 1986). Thus, the SSER can be used to study differences between subjects and to predict performance in certain tasks.
These findings suggest that the unpatterned SSER is related to several processing mechanisms in the brain, and that one must test the SSER in the various frequency ranges with a variety of tasks which use or emphasize different aspects of the human information processing system. As an example of this, Wilson and Heinrich (1987) found changes in the SSER phase lags in the medium frequency range as subjects performed an easy or difficult tracking task. This was not true of the SSER in the other frequency ranges, or during the performance of probability monitoring or mental math tasks, which tapped different aspects of the information processing system.
The SSER response to unpatterned stimuli has been proposed as a candidate measure of operator workload (O'Donnell, 1979). The unpatterned SSER would be especially useful since the apparent latency can be estimated in approximately 20 seconds, which is much shorter than many other methods of estimating workload effects. Further, since stimulus frequencies above flicker fusion can be used to produce SSERs, it may be possible to have a nonintrusive measure of workload.
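The apparent latency calculation described in this section (slope of the phase lag versus frequency regression, divided by 360 degrees) can be sketched as follows. The function name and the phase lag values are hypothetical, and the sketch assumes the phase lags are cumulative, that is, already unwrapped beyond 360 degrees, and expressed in degrees with frequency in Hz.

```python
def apparent_latency_s(frequencies_hz, phase_lags_deg):
    """Estimate apparent latency (seconds) from phase lag versus frequency.

    The least-squares slope is in degrees per Hz; dividing by 360 degrees
    per cycle gives seconds.
    """
    n = len(frequencies_hz)
    if n < 2:
        raise ValueError("at least two stimulus frequencies are required")
    mean_f = sum(frequencies_hz) / n
    mean_p = sum(phase_lags_deg) / n
    sxx = sum((f - mean_f) ** 2 for f in frequencies_hz)
    sxy = sum((f - mean_f) * (p - mean_p)
              for f, p in zip(frequencies_hz, phase_lags_deg))
    slope_deg_per_hz = sxy / sxx
    return slope_deg_per_hz / 360.0


# Hypothetical unwrapped phase lags for three mixed stimulus frequencies
# in the high range.
frequencies_hz = [44.0, 48.0, 52.0]
phase_lags_deg = [950.0, 1038.0, 1122.0]

latency_s = apparent_latency_s(frequencies_hz, phase_lags_deg)
print("apparent latency (msec):", 1000.0 * latency_s)
```

A constant offset in the phase lags does not affect the estimate, since only the slope enters the calculation.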
The data supporting the SSER as a measure of workload are still sketchy. It has been shown to be sensitive to drugs (Rizzuto, et al., 1985; Rizzuto, 1985) and fatigue (Purvis, et al., 1984). There are indications that it can be used to distinguish between a resting level and performance on a task. However, recent unpublished data from the Wright-Patterson AFB laboratory suggest that the SSER may not be a sensitive indicator of different levels of workload. Since not all frequency ranges have been tested with a wide range of tasks, this question is still unanswered.
More research obviously needs to be accomplished to determine the usefulness of the unpatterned SSER to workload measurement. Due to the ease of application, its low intrusiveness and speed of collection it deserves further attention. Even if it is found to be applicable in a small number of situations, its positive features make it a good candidate for inclusion into the human factors practitioner’s battery of tests.
Brain Stem Evoked Response. The brain stem evoked response (BSER) consists of seven waves that occur during the 10 msec following a click. The latency of each wave is very stable, and each has been associated with generators in the brain stem. The putative generators of waves I through VII are the acoustic nerve, cochlear nuclei, superior olive, lateral lemniscus, inferior colliculus, medial geniculate and the thalamocortical radiations. The click stimuli are presented at a rate between about 5 and 30 Hz, and typically the averaged responses to 2000 or more stimuli are recorded. The stability of the BSER has made it very useful in clinical medicine, where it is used to detect and localize various lesions in the brainstem. Delayed latency of a few tenths of a millisecond can be clinically significant. Since each wave is associated with a particular brain structure, the location of a lesion or other problem can often be determined.
Current thinking holds that the BSER is insensitive to cognitive cortical activity. If this is true, it would seem that the utility of the BSER in non-clinical applications would be limited to assessing drug effects. However, recent findings suggest that the BSER may be sensitive to at least some higher cortical functions. Wave VI in particular has been found to increase in latency during cognitive activity. The latency shift was noted between a pretest baseline and BSER recorded when the subjects were performing three levels of difficulty of a grammatical reasoning task (Gilliland, et al., 1984). Further, there were no differences between the BSERs collected during performance of the various levels of the task and the BSER taken 5 minutes after the task. This suggests that cognitive activity of this type results in longer latency wave VI components, and that recovery to baseline must be a long-term process. Unpublished data from research in the Air Force laboratory (also obtained by
Gilliland) showed that it took 40 minutes of subject inactivity for the latency to return to its pre-task value. These results need replication and extension to determine the strength of the effect and how general it is in other tasks. It may be that the BSER can be used as a measure of attention, fatigue or general level of arousal.
MEASURES OF HEART RATE
The heart beat is a relatively easy and unobtrusive measure to obtain. The basic ‘QRS’ complex is a large biological signal, and there is typically little electrical noise with which to contend. The time between beats is calculated as the interbeat interval, and can be converted to beats-per-minute. Generally, increased heart rates (HR) are associated with increased levels of workload. Such increases have been found to vary with workload in pilots flying regular airline flights (Ruffel Smith, 1967), during landings with different gradients of approach (Roscoe, 1976), during instrument landing approaches (Hasbrook and Rasmussen, 1970), and during a variety of mission segments in experimental aircraft (Roscoe, 1976). Simulator missions have also been found to produce changes in heart rate as a function of task workload (Opmeer and Kral, 1973; Lindholm and Cheatum, 1983; and Lindqvist, et al., 1983). Further, heart rate was one of the measures used in the certification of workload levels for the BAe 146 aircraft (Roscoe, 1984) and workload measurement in the development of the Airbus A310 (Speyer, et al., 1987). However, not all investigators report consistent findings of heart rate changes with differences in workload (Wierwille and Connor, 1983; Wierwille, Rahimi and Casali, 1985). This inconsistency of results has caused a number of investigators to abandon simple heart rate measures and to look instead at the variability of the heart rate as a possible measure of cognitive workload.
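The conversion from interbeat interval to beats-per-minute mentioned at the start of this section is simple arithmetic, sketched below. The function names and the R-wave times are hypothetical and serve only to illustrate the calculation.

```python
def interbeat_intervals_s(r_wave_times_s):
    """Time between successive beats, in seconds."""
    return [later - earlier
            for earlier, later in zip(r_wave_times_s, r_wave_times_s[1:])]


def heart_rate_bpm(ibi_s):
    """Instantaneous heart rate for each interbeat interval."""
    return [60.0 / interval for interval in ibi_s]


def mean_heart_rate_bpm(ibi_s):
    """Mean rate over the record: number of beats divided by elapsed minutes."""
    return 60.0 * len(ibi_s) / sum(ibi_s)


# Hypothetical R-wave detection times (seconds from the start of recording).
r_wave_times_s = [0.00, 0.82, 1.66, 2.45, 3.27, 4.12]

ibi = interbeat_intervals_s(r_wave_times_s)
print(heart_rate_bpm(ibi))
print(mean_heart_rate_bpm(ibi))
```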
A number of methods for calculating heart rate variability have been proposed. Opmeer (1973) reports 26 different methods. A number of these measures give contradictory results, which leads to a great deal of confusion for the investigator wishing to use this method. An entire issue of Ergonomics was devoted to this topic (1973, 15, 1-112). Van Dellen, et al. (1985) have compared a number of these methods with one another, and to a newer method which uses spectral analysis techniques. They reported that, for the most part, the older methods did not correlate with one another, nor with the levels of difficulty of cognitive tasks. However, the results of the spectral analysis technique were correlated with cognitive task difficulty.
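A spectral approach to heart rate variability can be sketched by summing the power of an interbeat-interval series within a frequency band. The sketch below is an illustration under several assumptions and is not the method of Van Dellen, et al.: the interbeat-interval series is assumed to have already been resampled at a uniform rate (2 Hz here, a hypothetical choice), the function name is invented, and the band limits of 0.07 to 0.14 Hz are those quoted in this section for the component that varies with mental workload. The series itself is synthetic.

```python
import math


def band_power(series, sample_rate_hz, low_hz, high_hz):
    """Power of a uniformly sampled series between low_hz and high_hz.

    The mean is removed and a direct discrete Fourier transform is evaluated
    for each bin inside the band. A sinusoid of amplitude A that falls on a
    bin contributes A**2 / 2.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [s - mean for s in series]
    power = 0.0
    for k in range(1, (n + 1) // 2):  # positive-frequency bins below Nyquist
        freq = k * sample_rate_hz / n
        if low_hz <= freq <= high_hz:
            re = sum(c * math.cos(2.0 * math.pi * k * i / n)
                     for i, c in enumerate(centered))
            im = sum(c * math.sin(2.0 * math.pi * k * i / n)
                     for i, c in enumerate(centered))
            power += 2.0 * (re * re + im * im) / (n * n)
    return power


# Synthetic interbeat-interval series (seconds), 100 s sampled at 2 Hz:
# a 0.8 s mean with oscillations at 0.10 Hz and 0.30 Hz.
rate_hz = 2.0
ibi_series = [
    0.8
    + 0.05 * math.sin(2.0 * math.pi * 0.10 * i / rate_hz)
    + 0.02 * math.sin(2.0 * math.pi * 0.30 * i / rate_hz)
    for i in range(200)
]

mid_band = band_power(ibi_series, rate_hz, 0.07, 0.14)
high_band = band_power(ibi_series, rate_hz, 0.15, 0.50)
print(mid_band, high_band)
```

Comparing the mid-band value between task conditions would be one way to look for the kind of reduction in variability under load that the spectral studies report, although windowing, detrending, and resampling choices are left out of this sketch and would matter with real data.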
Using spectral analyses, several investigators have found three components of the heart rate variability that have been associated with different biological control mechanisms. The lowest, centered at about 0.03-0.06 Hz, has been related to temperature regulation mechanisms. The middle component, at approximately 0.07 Hz to 0.14 Hz, is
believed to be associated with blood pressure regulation, while the third, in the range of 0.15-0.50 Hz, is due to the effects of respiration upon the heart rate (respiratory sinus arrhythmia) (Hyndman, Kitney and Sayers, 1971). The middle component, 0.07-0.14 Hz, has been shown to vary with the mental workload of a task. The power in this component decreases with increased workload, which means that heart rate variability decreases under high workload levels (Mulder, 1979; Mulder and Mulder, 1980; Aasman, Mulder and Mulder, 1987; Vicente, Thornton and Moray, 1987). Heart rate measures therefore appear to be good candidates for investigating workload. However, it is clear that further research must be done to determine the best method to be used, and the situations which are appropriate for their use. The strength of heart rate variability as a measure may be in its specificity to measure particular effects. Heart rate may be a good measure of general arousal or physical work, but may not yield information about other variables such as mental workload. Heart rate variability may be useful as an index of specific levels or types of mental workload. Several labs are currently evaluating methods of determining heart rate variability to ascertain the appropriate use of each one. Based upon the results of this evaluation, new methods of determining variability may be recommended as standard workload measures.

EYE BLINK MEASURES

Since most of our information about the world comes to us through our eyes, their functioning can tell us a great deal about operator state. Embryologically, the eye is an extension of the brain, and performs a great deal of processing of visual information before this information reaches the brain (Ranson and Clark, 1959). A number of eye-related variables are candidates for measures of workload, including: eye point of regard, eye movements, electroretinogram, pupil size, and eye blinks.
We have used eye point of regard in one study to determine the pattern of eye and head movements in emergency situations in a single seat aircraft simulator. Our purpose was to evaluate the cockpit layout as it affected quick reactions to control system malfunctions, which included visually locating an emergency panel and making a determination as to the appropriate course of action. A head mounted oculometer was used to measure eye point of regard inside the simulator cockpit, or outside the simulator during target acquisition and weapon delivery. Using these methods it was demonstrated that the layout of the instruments and actuator panels was not a problem when these emergencies arose. The oculometer and performance data showed that the pilot's reactions to the emergencies were very fast and accurate (Wilson, O'Donnell, & Wilson, 1983). However, the head mounted oculometer, while useful, proved to be a bit cumbersome to use and calibrate.
The type of eye-related measure which has shown great promise, and which was included in the Air Force NWTB, is the eye blink. Its measurement is easy to implement, and it has been shown to be a useful measure in several situations. Laboratory studies have demonstrated that tasks requiring attention, especially visual attention, are associated with fewer blinks and shorter duration blinks (Goldstein et al., 1985; Bauer et al., 1985; see Stern, Walrath and Goldstein, 1984, for an overview of eye blink theory). Eye blink measures have been used in a limited number of applied studies. For example, in a 4.5 to 5 hour aircraft simulation with pilot and copilot, Stern & Skelly (1984) report that the pilot in charge of the "aircraft" exhibited fewer blinks, and that the blinks were of shorter duration than those of the noncontrolling pilot. When the pilot and copilot switched roles (i.e., when the copilot was "flying") the pattern was reversed. The copilot blinked less often and with shorter duration blinks when he was flying than when he was acting as copilot. Further, blink rate was lower and blinks were of shorter duration in visual vs. auditory segments of the experiment. These effects were superimposed on an overall increase in blink rate and duration over the 4.5 hr to 5 hr mission. Data such as these indicate that eye blinks can be used to show not only attentional effects but also changes due to operator fatigue.
COMBINED PHYSIOLOGICAL, PERFORMANCE AND SUBJECTIVE MEASURES

Several studies have been carried out at AAMRL in which simultaneous physiological, performance and/or subjective data were collected from subjects in situations involving several levels of workload. In three studies, information processing tasks (Shingledecker, 1984) provided three levels of central processing difficulty. Wilson, McCloskey and Davis (1986; 1987) examined changes in the EPs, HR, eyeblink, reaction times, error scores and subjective estimates of difficulty in a linguistic processing task having different levels of demand. It was found that two components of the EP (P200 and P300) varied as a function of task difficulty. HR and eyeblink measures were not significantly different, even though reaction times and subjective estimates of workload were significantly affected by task difficulty. The failure of HR to show significant effects may be explained by the fact that the subjects were highly practiced, and were very familiar with the difficulty levels of the task. Yolton et al. (1987) used the same measures with the mathematical reasoning task of the CTS (Shingledecker, 1984). Their results were basically the same. The EPs varied as a function of workload, while the HR and eyeblink did not. McCloskey (1987) repeated the procedures using the spatial processing task of the CTS. Again, changes in the
EP with task workload were seen, but there were no significant effects in the HR and eyeblink measures except for a decline in HR over time. Reaction times and subjective measures were significant in all three studies. These results demonstrate the value of making simultaneous multiple physiological measures. While one measure may show changes in response to certain workload manipulations, others may not. Further, this information has great value when applying these techniques in other environments. Multiple physiological measures have also been found useful in a study in which EEG, HR and eyeblinks were monitored while pilots flew 90 minute missions in A7 aircraft and an A7 simulator (Wilson et al., 1987; Skelly, Purvis and Wilson, 1987). Each pilot flew the same mission in the lead or wing position of a four ship formation. They also flew a similar mission in an A7 simulator. More difficult segments of the mission were associated with higher HR, fewer blinks, and increased EEG activity in the aircraft and the A7 simulator. Further, the lead position was associated with the same basic pattern of mission segment results when compared to wing position. The simulator flights were lowest in HR, highest in blinks and showed lower levels of EEG arousal. Two emergency incidents were recorded during the in-flight recording portion of this study. Both were associated with a 50 per cent increase in HR and decreased HR variability, but no reliable changes in blink activity or EEG epoch analysis. These data indicate that physiological measures are useful indices of pilot workload in actual flight conditions, and can be used to compare flight vs. simulator missions. In addition to individual indices, the patterns of change in multiple physiological measures can also be important in the evaluation of workload in these situations.

THE NEUROPSYCHOLOGICAL WORKLOAD TEST BATTERY (NWTB)

The need for a psychophysiological battery of workload tests seemed obvious to the U.S.
Air Force and, based upon the existing literature, AAMRL undertook the task of developing such a battery beginning in 1979. The product of this effort is the Neuropsychological Workload Test Battery (NWTB). The NWTB was designed to operate in laboratory and simulator settings. It was to be as flexible as possible in terms of the number of tests available and ease of use by the operator. These tests were selected from those in the literature which had been used to evaluate workload, or ones which seemed to be useful to assess workload and other operator states. The operator interaction with the NWTB was designed to be as easy as possible. In order to foster this simplicity, it was decided to limit user options to a minimum. This would avoid confusion and help to standardize testing. The NWTB is a computer based physiological test system that currently has 13 different tests. It is shown in Figure 1. The central processing unit is a PDP 11/73 with 128 K bytes
of memory. This amount of memory has proven sufficient since there is a 30 megabyte disk for storage. Only 5 megabytes are required for program storage, leaving 25 megabytes of storage capacity for long term retention of the data. There is a 10 megabyte removable disk which can be used to store data, and which allows rapid changing of disks so data collection can continue when one 10 megabyte disk is filled. Since analog physiological signals are to be processed, eight channels of analog to digital conversion are provided. The multiplexer permits 4 gain level settings for each channel. This feature permits the simultaneous digitization of data having different amplitudes, since each input channel can be independently adjusted. For convenience, and to reduce the opportunity for error, A/D channel assignments are set such that the first three are for EEG, the fourth EOG, the fifth EMG, and the eighth ECG. Channel 6 is currently not used, and channel 7 is used for joystick input in the tracking task. A set of specially designed amplifiers is provided which have computer controlled gain and filter characteristics for each channel. Commercially available biological amplifiers may also be used. A calibration routine is used to establish gain settings for each amplifier.
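The fixed channel assignments described above can be written down as a small lookup table. The following is only an illustrative sketch in Python (the NWTB software itself is written in FORTRAN); the names `Channel`, `default_channels` and `set_gain`, and the numbering of the four gain levels as 1 to 4, are hypothetical.

```python
from dataclasses import dataclass, replace

GAIN_LEVELS = (1, 2, 3, 4)  # four settings per channel; this numbering is hypothetical


@dataclass(frozen=True)
class Channel:
    number: int          # A/D channel number as given in the text
    signal: str          # signal carried on this channel
    gain_level: int = 1  # hypothetical default setting


def default_channels():
    """Fixed assignments from the text; channel 6 is unused and so omitted."""
    signals = {1: "EEG", 2: "EEG", 3: "EEG", 4: "EOG", 5: "EMG",
               7: "joystick", 8: "ECG"}
    return {n: Channel(n, s) for n, s in signals.items()}


def set_gain(channels, number, gain_level):
    """Return a new map with one channel's gain changed; the others are untouched."""
    if gain_level not in GAIN_LEVELS:
        raise ValueError("gain level must be one of %r" % (GAIN_LEVELS,))
    updated = dict(channels)
    updated[number] = replace(channels[number], gain_level=gain_level)
    return updated
```

Keeping the assignments fixed and adjusting only the gain per channel mirrors the stated design aim of reducing the opportunity for operator error.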
Figure 1 Neuropsychological Workload Test Battery. The disk is in the upper left corner and the printer/plotter in the lower left corner. The head phones are used for auditory stimuli.

Two of the four digital channels are used to record subject response switch closures so that yes-no type responses may be collected. The output ports are used to provide synchronization pulses that coincide with the beginning of individual trials in the tests using transient stimuli. These synch pulses can be recorded on FM tape recorders for back up, sent to other analysis devices, or even used to
trigger peripheral devices which provide nonstandard NWTB stimuli. A block diagram of the components of the NWTB is presented in Figure 2. The stimulus presentation devices consist of a video monitor, fluorescent tube unit, voice synthesizer and headphones. Alpha-numeric and other graphic stimuli are displayed on the video monitor. The scan of the video display is synchronized with the onset of digitizing. A light box contains two fluorescent lamps whose intensities are modulated by a sinewave input from a separate driver device. Flicker of these lights can be at a single frequency, or the sum of two to four sinewaves whose intensity and modulation depth are controlled from the driver unit. A voice synthesizer is used to generate letters, numbers and words that are presented to the subjects via head phones. Tone stimuli in the auditory oddball, and clicks for the brain stem evoked response task, are also presented via the head phones.
[Figure 2 diagram: COMPONENTS OF THE NWTB. Labeled blocks: LSI 11/73 system; 8 A/D channels; 2 D/A channels; 4 digital I/O; lamp driver and lamps; subject; response.]
Figure 2 Block diagram of the major components of the NWTB.

Experimental parameter selection and data display are accomplished with a graphics terminal. The operator chooses the tests to be used and determines the parameters for each test using this graphics terminal. Results, in the form of average curves, etc., are displayed on this terminal. A printer/plotter is used to provide a hard copy of the information displayed on the graphics terminal. This provides a permanent copy of the data, in addition to the stored form on the disk. The software is programmed in FORTRAN, using the RT 11 operating system. The software is user friendly in that
operator options have been kept to a necessary minimum. Upon installing the system, the operator has the option of analyzing previously collected data, calibrating the system, or selecting and running tests. The heart rate and eye blink tests can be used simultaneously with any of the evoked potential tests except the brain stem test. This enables the operator to measure central and peripheral nervous system activity simultaneously. The tests in the battery are as follows:

1. Odd-ball task (auditory and visual forms)
2. Memory scanning task (auditory and visual forms)
3. Continuous performance task
4. Flash evoked response
5. Monitoring task
6. Tracking task with evoked response to cursor
7. Auditory brain stem response
8. Checkerboard evoked response
9. Sine wave grating evoked response
10. Unpatterned steady state evoked response
11. Electrocardiograph
12. Electrooculograph
13. Electromyograph
Each test has its own menu of options to choose from and each option is pre-set to a default value. When options are changed, the new information is saved onto a disk file so that these values are in force the next time the test is used during a given session.
Odd-ball Test

The odd-ball test used in the battery is capable of presenting either auditory or visual stimuli. The auditory stimuli are in the form of pure tones whose frequencies are determined by the operator. The visual stimuli are squares of two different sizes presented on the video monitor. The presentation and analysis in the visual form of this test are identical to those of the auditory form. The probability of occurrence of the rare event can be 10%, 20%, 30% or 40% of the total number of trials, with an operator option to vary each of these by 5%. For example, 20% (plus or minus 5%) of the tones could be chosen as the "rare" category. This is useful in order to prevent the subjects from learning the total number of rare stimuli in the sequence. The audio rare event parameter selection menu can be seen in Figure 3. The stimuli are presented randomly, with the restriction that a rare stimulus cannot immediately follow another (no "strings" are allowed). As an alternative to asking the subjects to count the rare stimuli, key presses may be required in response to each stimulus. In the case of key presses to each rare stimulus, the reaction times and error scores are recorded. The length of each block of trials may be determined by the total number of rare events presented, or by specifying the total length of time that the test is to run.
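The stimulus ordering rules just described (a nominal rare-event probability, an optional variation of 5%, and no two rare stimuli in succession) can be sketched as follows. The text does not describe how the NWTB actually randomizes its sequences, so the placement scheme, the function name `oddball_sequence` and its parameters are assumptions made for illustration.

```python
import random


def oddball_sequence(n_trials, p_rare=0.20, jitter=0.05, fixed=False, rng=None):
    """Return a list of 'rare'/'common' labels with no two rares adjacent."""
    rng = rng or random.Random()
    # With fixed=False the actual proportion is drawn from p_rare +/- jitter,
    # so the subject cannot learn the total number of rare stimuli.
    p = p_rare if fixed else rng.uniform(p_rare - jitter, p_rare + jitter)
    n_rare = round(n_trials * p)
    if n_rare > n_trials - n_rare + 1:
        raise ValueError("too many rare stimuli to keep them apart")
    # Choose n_rare distinct slots among n_trials - n_rare + 1, then shift the
    # i-th chosen slot by i. This yields every non-adjacent placement with
    # equal probability.
    slots = sorted(rng.sample(range(n_trials - n_rare + 1), n_rare))
    rare_at = {s + i for i, s in enumerate(slots)}
    return ["rare" if k in rare_at else "common" for k in range(n_trials)]
```

A call such as `oddball_sequence(100, p_rare=0.20)` gives a block of 100 trials with between 15 and 25 rare stimuli, none of which immediately follows another.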
AUDIO RARE EVENT MONITORING

 1  FREQ OF RARE TONE (500-2000 HZ)       1500
 2  FREQ OF COMMON TONE (500-2000 HZ)     1200
 3  INTERSTIMULUS INTERVAL (SEC)          2
 4  PROB OF RARE EVENT (10,20,30,40)      20
 5  FIXED PROB FOR RARE TONE (Y/N)        Y
 6  DURATION OF TEST (1-30 MIN)           3
 7  REQ NO OF RARE EVENTS (0-100)         5
 8  REJECTION THRESHOLD (EOG 0-2048)      262
 9  REJECTION THRESHOLD (EMG 0-2048)      2048
10  TONE INTENSITY (1-10)                 5
11  EXTERNAL TRIGGER (Y/N)                N
12  SAVE RAW DATA? (Y/N)                  Y

21  END
22  ABORT

ENTER PARAMETER NO, SPACE, AND NEW PARAMETER
Figure 3 NWTB menu for the auditory odd ball test.

Evoked potential averages for the rare and frequent stimuli are separately determined and displayed. The EOG and EMG channels are also averaged for the rare and frequent stimuli. A 150 msec baseline prior to stimulus onset is used to adjust each average to a zero voltage base line. Since 1000 msec of data are digitized, at a 200 Hz rate, 850 msec of post stimulus response are used. The program finds the P300 by determining the most positive post-stimulus point, labels it on the waveform displayed on the terminal screen, and indicates its latency, peak amplitude and an area measure in the region of the P300. The averages calculated for both frequent and rare stimuli are displayed side by side on the screen, or they may be displayed separately. An example of this display is presented in Figure 4.
[Figure 4 plot: averaged EP waveforms in two panels, Frequent and Rare; time axis from -150 to 850 msec.]
Figure 4 EPs from the auditory odd-ball test. Note the larger P300 component to the rare stimuli. Amplitude in microvolts and latency in milliseconds.
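The averaging steps described for the odd-ball test can be sketched as follows, using the figures given in the text: a 200 Hz digitizing rate, a 150 msec pre-stimulus baseline (30 samples) and 850 msec of post-stimulus data (170 samples). This is only an illustration in Python, not the NWTB's FORTRAN code; the function names are hypothetical and the area measure is omitted.

```python
FS_HZ = 200        # digitizing rate given in the text
BASELINE_MS = 150  # pre-stimulus baseline given in the text


def average_epochs(epochs):
    """Point-by-point ensemble average of equal-length single-trial epochs."""
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]


def baseline_correct(avg, fs=FS_HZ, baseline_ms=BASELINE_MS):
    """Subtract the mean of the pre-stimulus samples so the baseline sits at zero."""
    n_base = fs * baseline_ms // 1000
    base = sum(avg[:n_base]) / n_base
    return [v - base for v in avg]


def find_p300(avg, fs=FS_HZ, baseline_ms=BASELINE_MS):
    """Return (latency in msec after stimulus onset, amplitude) of the most
    positive post-stimulus point, the criterion described in the text."""
    n_base = fs * baseline_ms // 1000
    post = avg[n_base:]
    idx = max(range(len(post)), key=post.__getitem__)
    return idx * 1000 / fs, post[idx]
```

Running `find_p300(baseline_correct(average_epochs(rare_epochs)))` separately on the rare and frequent trials gives the two latency and amplitude values that are displayed beside each waveform; `rare_epochs` here stands for a hypothetical list of 200-sample trials.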
If filtering is desired, a box car filter is used to smooth the waveform. A movable cursor can be positioned anywhere on either EP waveform, and the latency and amplitude at that point are displayed. This permits one to measure peaks other than P300. The displayed waveforms may be plotted on the printer/plotter. Since each data file has its own unique label, it can be easily retrieved at a later time for further analysis. The digitized single trial EEG can be retained on the disk and/or only the averaged waveform can be saved. EOG and EMG signals that are digitized simultaneously with the EEG signals are used to reject trials from being included in the ensemble average. These can also be saved and/or averaged. The actual value of the EOG and/or EMG signal that determines rejection is selectable by the operator. Since the responses to single trials are saved, it is possible to change the rejection values and re-average the data based upon these new values. Single trial records can be scanned on the terminal and each trial can be manually selected to be included in the average or rejected. In this and most other tests, an external synchronization signal may be used to start data collection on each trial. This feature is useful if another device such as a simulator or other computer is providing the stimuli, or if the data have been recorded on analog tape. In effect, this mode transforms the battery into a general purpose averaging device, with two categories of EPs, which can be used for a variety of other physiological data recording paradigms. A calibration routine is used at the beginning of each day's
testing. Sine wave signals of known amplitude and frequency are introduced into the system at each amplifier input. For example, 20 microvolt, 10 Hz sinewaves for each EEG channel are recorded. These signals are then used for each channel to determine the amplitude scales for the averaged data display and print out.
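The trial rejection, re-averaging and box car smoothing described for the odd-ball test can be sketched as follows. The rejection criterion used here (the peak absolute EOG or EMG value within a trial compared against an operator-set limit), the trial layout as a dictionary with "eeg", "eog" and "emg" lists, and the function names are all assumptions; the text states only that the rejection values are operator selectable.

```python
def accept(trial, eog_limit, emg_limit):
    """Keep a trial only if its EOG and EMG stay within the operator's limits."""
    return (max(abs(v) for v in trial["eog"]) <= eog_limit and
            max(abs(v) for v in trial["emg"]) <= emg_limit)


def ensemble_average(trials, eog_limit, emg_limit):
    """Average the EEG of the accepted trials; returns (average, number kept)."""
    kept = [t["eeg"] for t in trials if accept(t, eog_limit, emg_limit)]
    if not kept:
        return [], 0
    n = len(kept)
    return [sum(column) / n for column in zip(*kept)], n


def boxcar(wave, width=5):
    """Moving-average (box car) smoothing; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(wave)):
        window = wave[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out
```

Because the single trials are retained, changing the rejection values amounts to calling `ensemble_average` again on the same stored trials with new limits.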
Memory Scanning Test

The memory scanning test uses the Sternberg paradigm (Sternberg, 1969) with fixed memory set ("M-SET") sizes of one, three and five letters. The stimuli that make up the M-SET are displayed on the screen so that the subject can memorize them. Then, individual letters are presented on the subject's video monitor or by voice synthesizer, one at a time. For all set sizes, 50% of the stimuli are those of the memorized set and the rest are non-set letters. The subject responds as quickly as possible to indicate if the stimulus was a member of the M-SET (positive set items) or was not (negative set items). Reaction times and error scores are recorded for each category. The evoked response from up to three EEG channels is digitized and included in the ensemble average for positive and negative sets separately. The data from each trial may be saved and/or
only the average may be saved. EOG and EMG signals can be used to reject any trial containing significant artifact, as described above. The total number of stimuli or the duration of the test can be used to determine the length of a block of trials. The averaged EEG data for the positive and negative set trials are displayed at the end of each block, along with the reaction times and error scores. The display, filtering, cursor, and plotting routines are the same as for the odd-ball task.
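The trial structure and scoring of the memory scanning test can be sketched as follows. The 50% split between positive and negative set items is taken from the text; the function names, the use of the 26 upper-case letters as the stimulus pool, and the choice to compute mean reaction time from correct responses only are assumptions made for illustration.

```python
import random
import string


def make_trials(m_set, n_trials, rng=None):
    """Return shuffled (probe_letter, is_member) pairs, half drawn from the M-SET."""
    rng = rng or random.Random()
    members = sorted(m_set)
    others = [c for c in string.ascii_uppercase if c not in members]
    n_pos = n_trials // 2
    trials = [(rng.choice(members), True) for _ in range(n_pos)]
    trials += [(rng.choice(others), False) for _ in range(n_trials - n_pos)]
    rng.shuffle(trials)
    return trials


def score(trials, responses):
    """responses holds one (said_yes, reaction_time_ms) pair per trial."""
    result = {}
    for label, flag in (("positive", True), ("negative", False)):
        rts, errors = [], 0
        for (_, is_member), (said_yes, rt) in zip(trials, responses):
            if is_member != flag:
                continue
            if said_yes == is_member:
                rts.append(rt)  # assumption: only correct responses enter the mean
            else:
                errors += 1
        result[label] = {"mean_rt_ms": sum(rts) / len(rts) if rts else None,
                         "errors": errors}
    return result
```

For example, `make_trials("BKR", 20)` builds a block for a three-letter M-SET, and `score` then reports reaction time and error count separately for the positive and negative set items, as the test does.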
Continuous Performance Test

The continuous performance test (Friedman, Vaughan and Erlenmeyer-Kimling, 1981) taps both short and long term memory. There are three levels of this task. In the first level (which is analogous to the Sternberg condition with an M-SET of one letter) the subject is to detect the occurrence of a particular letter of the alphabet. The subject is told before the test which letter is to be responded to as the positive item. Individual letters are then presented on the video monitor and the subject is to respond by pressing one of two keys to indicate if that letter was or was not the positive item. EPs, reaction times and error scores are recorded. Fifty percent of the presented stimuli are the positive item while the remaining stimuli are other letters. In the second level, which uses short term memory, the subject is told to respond by pressing a key when any letter immediately follows itself. Any letter can be the target letter, and the subject has to remember each item until the next one is presented in order to determine whether or not it was repeated. In this task 60% of the items are repeated. The evoked responses to the repeated and nonrepeated stimuli are recorded, as are the reaction times and error scores. In the third level, the subject is instructed to look for the occurrence of a particular letter which immediately follows another particular letter. For example, they are told to respond only when the letter B is presented following the letter M. Evoked responses, reaction times and error scores are collected to these stimuli. The evoked potentials to the positive stimuli and to the negative stimuli are displayed at the end of each block. The display, analyses and plotting routine are the same as for the previous tests.

Flash Evoked Response

On this task, a square is presented at the center of the subject's video screen. Artifact rejection routines as used in the other tasks are in force.
In this simple reaction time task, the subject is only required to detect the stimulus and respond with a button press to each one. The number of trials is preset or is determined by the length of time of data collection. Evoked potentials from up to three channels of EEG are recorded along with the averaged eye and
muscle activity. The EPs and reaction times are displayed at the end of the block of trials. Data display and plotting are the same as for the previously described tests.

Monitoring Task

During the monitoring task the subject watches a visual display in which triangular and square shapes move across the screen. In one form of the test, the subject is required to detect when a designated shape changes direction in its travel across the screen. Most of the time the shapes enter the screen and continue in a straight line across the screen. The subject is to detect and press a key when a change of course is detected in the target shape. In the other form, the subject is required to detect and respond to the increase in intensity of the target shape. The targets enter the screen from anywhere on the periphery of the screen in a random order. The difficulty of the task is varied by increasing the number of objects on the screen at any one time from 4 to 8. These target events occur 50% of the time during a block of trials. That is, of all of the course changes or intensifications during a block of trials, 50% would be of the target. The number of trials is determined by the total block length. The evoked responses to the target changes are averaged and may be displayed at the end of the block. EOG and EMG data may be used to reject trials containing artifacts. The EOG and EMG data are also averaged and displayed. The analysis, display, and plotting are the same as for the previously described tasks.
Tracking Task

The tracking task is designed to provide both EP and performance scores while subjects are engaged in a compensatory tracking task. The subject's goal is to keep the moving cursor in the center of the video screen by manipulating a joy stick. The difficulty of this task is determined by the gain on the feedback to the controlling program (the lambda level). RMS error and the number of off-screen excursions are recorded. EPs are derived from brief offsets of the tracking cursor. This brief turning off of the cursor is sufficient to elicit an evoked response but does not interfere with the subject's performance. The cursor is turned off for approximately 200 msec during the performance of the task. As with the other transient evoked response tasks, the digitized data from each trial can be retained or not. EOG and EMG levels are used to reject trials containing artifact contamination. The average is displayed and the P300 amplitude and latency are determined. Filtering, cursor movement and plotting are all accomplished in the same way as with the other transient EP tests.

Brain Stem Evoked Response

Click stimuli of one msec duration and 66 dB(A) intensity can be presented at rates ranging from 5 to 11 per second. Typically 1000 to 2000 stimuli are presented for each
average. Rarefaction, compression or both rarefaction and compression clicks may be used as stimuli. Only one channel of EEG is used in this task. The average of 10 msec samples is stored on disk and displayed on the graphics terminal. The seven peaks of the BSER are found by the software and their latencies are displayed. Each peak may be identified manually by the operator if necessary. The BSER, with peak latency values, can be plotted as a permanent visual record.
Checkerboard Evoked Response

In this test a black and white checkerboard pattern is displayed on the subject's video monitor. The black and white checks alternate at either 4 Hz or 7 Hz. The size of the checks can be determined by the operator prior to testing. The length of the analysis epoch can be selected to be either 200, 400 or 600 msec. The longer epoch is typically used with the lower frequency stimulation. During averaging, each analysis epoch starts with the onset of movement of the checks on the screen. The number of epochs is also selected by the operator. The averaged responses for up to 3 channels are displayed, one at a time, on the graphics terminal. The first peak and trough are identified and marked on the averaged waveform by the computer, and their latencies and amplitudes are displayed. The latencies and amplitudes of other components can be measured by positioning a movable cursor from the keyboard. This information, along with the EPs, can be plotted.
Sinewave Grating Steady State Evoked Response

This test is essentially the same as the checkerboard evoked response except that a vertically oriented sinewave grating is used as the stimulus. A spatial frequency between 1 and 125 cycles per screen is selected by the operator, and the display alternates horizontally by 180 degrees at either 4 Hz or 7 Hz. The evoked response analysis is the same as for the checkerboard stimulus. The averaged curves and cursors are displayed along with the amplitude and latency values for the first peak and trough. The averages are stored on the disk for later retrieval.

Unpatterned Steady State Evoked Potential

In this test flickering lamps are used to evoke a response. In contrast to the two previous tests, the stimulus field is not patterned. In its simplest form the subject fixates on the center of a white 40 by 25 cm field. Two fluorescent tubes 18 cm long are mounted horizontally 10 cm apart, and are used to flicker this field. The intensity of the lamps is modulated by the sum of up to four sine waves. Three frequency ranges are used: the low range frequencies are 8, 9 and 12 Hz, the medium range frequencies are 14, 17 and 20 Hz, and the high range frequencies are 42, 46 and 50 Hz. A fast Fourier transform is used to find the energy at each frequency of the EEG, and from the output of a photocell
mounted inside of the light box. Since the actual light input to the visual system is known from the photocell response, and the response from the visual/neural system is known from the EEG, it is possible to calculate the coherence at any flicker frequency and the phase lag between these signals. The coherence is used as a criterion for acceptance of the data. High coherence suggests that the brain response is due to the light flicker. The phase lags are used to calculate the apparent latency of transmission of these signals through the nervous system. Using the phase lags to the three flicker frequencies, linear regression is used to calculate the slope of the best fitting straight line through the three points on the phase lag vs. frequency plot. By dividing this slope by 360 degrees one arrives at an estimate of the apparent latency of transmission (Regan, 1972). The plot of the three phase lags and the best fitting straight line are displayed along with the RMS amplitude values, phase lags, and coherence values for each stimulus frequency. These data are stored on disk and may be plotted on the printer/plotter. The operator selects one of the three frequency ranges and the number of two second epochs to be included in the test. The intensity and modulation depth of the stimulus are set on the separate lamp driver. When the test is begun, the lights flicker for 10 seconds before the EEG data collection starts in order to permit the visual system to reach a steady state condition.
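The apparent latency calculation described above can be worked through in a few lines. With phase lag measured in degrees and frequency in Hz, the slope of the regression line has units of degrees per Hz, and dividing by 360 degrees gives a latency in seconds. The sketch below is an ordinary least-squares fit written out by hand; the function name is hypothetical, and it assumes the phase lags have already been unwrapped so that they increase with frequency.

```python
def apparent_latency_ms(freqs_hz, phase_lags_deg):
    """Least-squares slope of phase lag (degrees) against frequency (Hz),
    divided by 360 degrees and converted from seconds to milliseconds."""
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_p = sum(phase_lags_deg) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_hz)
    sxy = sum((f - mean_f) * (p - mean_p)
              for f, p in zip(freqs_hz, phase_lags_deg))
    slope_deg_per_hz = sxy / sxx
    return slope_deg_per_hz / 360.0 * 1000.0
```

As a check on the arithmetic, a pure transmission delay of 100 msec produces a phase lag of 360 x 0.1 = 36 degrees per Hz, so lags of 504, 612 and 720 degrees at the medium range frequencies of 14, 17 and 20 Hz give a slope of 36 degrees per Hz and an apparent latency of 100 msec. A constant phase offset added to all three points changes the intercept but not the slope.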
For the cardiac test the mean interbeat interval (IBI) and its variance are calculated. The operator specifies the length and number of epochs to be recorded. The data are digitized at a rate of 1000 Hz. Each R wave of the ECG is identified by the software, and the IBI between successive R waves is noted. For each epoch, the mean IBI and its variance are reported, and the mean IBI is converted into mean beats per minute and displayed with the mean IBI and variance. The grand means of these parameters are also calculated and displayed for all of the epochs analyzed. In order to eliminate the effects of muscle and movement artifacts, criteria defining the R wave in terms of slope, amplitude and minimum and maximum IBI are used. Epochs containing "bad beats" which do not meet the current criteria are noted on the operator's terminal. Since the digitized data are still available, the operator can view the epochs and determine whether the artifacts indeed occurred. It is also possible to change the acceptance criteria at any time so that ECG data may be correctly accepted or rejected. The ECG data can be collected at the same time that EEG data are collected in the previously discussed tests. EOG and EMG may also be simultaneously collected with the ECG and evoked response tests. This permits one to determine the effects of the same situation upon a number of psychophysiological parameters.
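The per-epoch cardiac statistics can be sketched as follows, starting from a list of R wave times in seconds. The minimum and maximum IBI limits shown are hypothetical values, the slope and amplitude criteria are not modeled, and excluding the out-of-range intervals from the mean is an assumption; the text says only that epochs containing "bad beats" are noted for the operator.

```python
def ibi_stats(r_times_s, min_ibi=0.3, max_ibi=2.0):
    """Mean IBI, its variance, mean beats per minute and flagged intervals
    for one epoch of R wave times (seconds)."""
    ibis = [b - a for a, b in zip(r_times_s, r_times_s[1:])]
    bad = [i for i, v in enumerate(ibis) if not (min_ibi <= v <= max_ibi)]
    good = [v for v in ibis if min_ibi <= v <= max_ibi]
    mean = sum(good) / len(good)
    if len(good) > 1:
        variance = sum((v - mean) ** 2 for v in good) / (len(good) - 1)
    else:
        variance = 0.0
    return {"mean_ibi_s": mean,
            "variance": variance,
            "bpm": 60.0 / mean,      # beats per minute from the mean interval
            "bad_beats": bad}        # indices of intervals outside the limits
```

A mean IBI of 0.8 seconds, for instance, converts to 60 / 0.8 = 75 beats per minute.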
Electrooculograph
Eye blinks are recorded from electrodes placed above and below the eye. The data from these electrodes are digitized at a 100 Hz rate in 10 second epochs and stored on the disk for later analysis. The operator can use default parameters for blink identification, or can change them to tailor the selection parameters for the unique blink pattern of a given subject. Once the program determines that a blink has occurred, the maximum closure amplitude is determined and the "half amplitude" (half the closing and opening excursion of the eyelid) is determined. This half amplitude point is used to measure the "half amplitude duration" (the time between the half amplitude point on the closing portion of the blink to the same voltage value on the opening portion). The EOG data are displayed, and the selected blink points are marked. If the operator disagrees with the program, blinks which the program rejected or accepted can be added or deleted. For each blink, the operator selects the number At the end of the of 10 second epochs to be analyzed. analysis the number of blinks, mean closure duration and mean blink interval is displayed and may be saved and/or printed. As with the ECG and EMG tests, this one can be used concurrently with one of the evoked potential tests. Electromuograph. The EMG test has two forms at present. The EMG data is digitized and stored on disk in both cases. The analysis can take the form of the variance about the mean voltage or the centroid frequencies of 4 0 epochs. The centroid frequency analysis is best for situations in which muscle fatigue occurs because of periods of maximal contraction. Since this situation occurs in very few instances involving mental workload it will hot be described here. In the variance procedure, the number of times that the rectified EMG activity exceeds three amplitude standard Epochs with small amplitude EMG deviates is recorded. 
activity would have almost all counts in the smaller level category while epochs with a great deal of muscle activity will have counts in all categories including the higher levels. Operating Procedures,. A data collection session is begun with calibration of the amplifiers that will be used in the session's testing. This is accomplished by providing calibration signals in the form of sine waves of known amplitude. For example, 2 0 microvolts for the EEC, 100 microvolts for the ECQ and BOG. This is done so that the output graphs and the saved data will all be calibrated. Commercially available amplifiers or amplifiers that are available with the NWTB may be used. The NWTB amplifiers are computer controlled; gain, highpass, low-pass and 60 Hz notch filters may be set by the NWTB under operator control.
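Returning to the eye blink test described above, the "half amplitude duration" can be sketched as follows. This is a minimal illustration under stated assumptions, not the NWTB algorithm: the function name `half_amplitude_duration`, the explicit `baseline` argument, and the decision to measure at sample resolution without interpolation are all hypothetical. The segment passed in is assumed to contain exactly one blink.

```python
def half_amplitude_duration(samples, baseline, sample_rate_hz=100):
    """Return the half-amplitude duration of one blink in milliseconds.

    samples: EOG values for a segment containing a single blink.
    baseline: resting EOG level (an assumed input for this sketch).
    """
    # Deflection of each sample away from the resting level.
    deflection = [abs(value - baseline) for value in samples]

    # Maximum closure amplitude and the half-amplitude level derived from it.
    peak_index = max(range(len(deflection)), key=deflection.__getitem__)
    half_level = deflection[peak_index] / 2.0

    # Walk back from the peak to the earliest closing-side sample still
    # at or above the half-amplitude level.
    start = peak_index
    while start > 0 and deflection[start - 1] >= half_level:
        start -= 1

    # Walk forward from the peak to the latest opening-side sample still
    # at or above the half-amplitude level.
    end = peak_index
    while end < len(deflection) - 1 and deflection[end + 1] >= half_level:
        end += 1

    # Convert the sample count to milliseconds (10 ms per sample at 100 Hz).
    return (end - start) * 1000.0 / sample_rate_hz
```

At the 100 Hz digitizing rate given in the text the result is resolved only to the nearest 10 ms; interpolating between samples around the half-amplitude crossings would be one way to refine it.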
Next, the operator selects the tests to be run. For each test, appropriate values are selected for stimulus parameters, number of trials, length of each block, etc. The default values may be used or the appropriate parameters changed. Once parameters are set they will be used for subsequent data collection runs unless the operator chooses to change them. All of the tests measuring peripheral activity (ECG, EOG, EMG) can be run simultaneously with either the transient or steady state evoked response tasks. The BSER test is unique in that it can only be run by itself; this is due to the very high sampling rate required by this test. Once the test battery has been set up, the electrodes are applied to the subject and the tests are run after appropriate training and practice. If data from a number of blocks are to be collected, the operator can check the validity of each block, or can quickly proceed from block to block without looking at the data. Following data collection, the results may be viewed, stored on disk and/or plotted. The data are stored on removable disks, so that there is essentially no data-imposed volume limit. If another device, such as a simulator, is controlling the experiment and providing the discrete stimuli for transient evoked response averaging, the battery can operate as an averaging and storage device. In other situations, it is desirable to measure only peripheral responses such as heart rate and eye blinks. In this situation the subject is instrumented and the battery only needs to receive a synch signal when digitizing of the data is to begin. The above synch pulse methodology also applies to data that have been recorded on analog tape at another location, such as in an aircraft.
OVERVIEW OF CURRENT STATUS
The NWTB has provided a good start on the development of physiological workload assessment techniques which are usable by persons not specifically trained in electrophysiology.
However, it is recognized that this technology was, of necessity, outdated almost from the time it was first conceptualized. Progress in this area is so rapid that it is essential to carry out refinement and evolution of the techniques on a continuing basis. For instance, the results of validation studies and attempts at field use must be fed into these refinements, so that tests which are not practical in real-life situations can be eliminated, and those which are valid can be enhanced. Continued efforts at miniaturizing, standardizing, and field-hardening the test battery must be carried out. Finally, new techniques such as spectral analysis of HR, the probe EP, and others mentioned above must be evaluated in laboratory situations, and candidate versions of those which were successful should be incorporated into the new test battery on a trial basis. In this way, an increasingly refined battery will be developed which should eventually become standardized as a usable workload assessment approach.
GUIDELINES FOR APPLICATION OF PHYSIOLOGICAL MEASURES
The ways in which a system such as the NWTB could be used depend on the stage of system development in which the workload assessment is carried out, and the level of diagnosticity desired. Basically, it would be expected that physiological measures would be most useful in the development stage, where part-task mock-ups and full system simulators can be used, and in the final test and evaluation stages, where the actual system is available for test or certification. In the former, it is necessary to evaluate candidate designs in order to make selections between competing systems. Many times this can be done with subjective or behaviorally-based measures. At other times, less intrusive or more indirect measures are desired for specific resource evaluations. For instance, where "traditional" systems may be favored by a manufacturer or experienced operator even though they may not be in fact better, it is desirable not to rely on subjective techniques. Similarly, where the system may already tax the person’s limits, or where the introduction of an artificial secondary task would be undesirable, physiological techniques may provide an ideal option. Specifically, in these situations one may be able to utilize such techniques as eye blink analysis, pupilometry, or cardiac variability to provide an overall screening of workload levels. This general survey will not be diagnostic with respect to the source of the workload, but should determine if a workload problem exists at all, or if one design option is better than another. Selection of a specific physiological technique will depend on the requirements for sensitivity, and on practical constraints (O’Donnell and Eggemeier, 1986). Most importantly, even when the goal is to provide a general workload screening, it is necessary to assure that the evaluation techniques be matched to the task to be evaluated.
Clearly, one should not use pupilometry in a task which requires a great deal of eye movement at varying illumination levels. Similarly, if it is suspected that the source of workload is in the motor output resources, one should not choose cardiac measures, which should be insensitive to these and may be contaminated by the motor activity. On the other hand, eye blinks (or even use of epoch analysis of the EEG) may provide the desired level of screening, particularly if sufficient attention is paid to the requirements of experimental design. In the early development phases of a system, of course, it may be desirable to do a highly diagnostic workload assessment once it is established that a workload problem exists. In such a case, measures such as the cortical evoked response may be used to probe the central processing resources, the perceptual input stages, or various other
specific resources. At the present time, this may require laboratory studies which would be done outside of the simulators, and which would attempt to isolate the relevant resources involved in the task in question. The goal of these studies would be to reveal the "choke points" within the limited resources which were being depleted by the task. Hopefully, redesign could then develop a more efficient system. Application of physiological measures during the final system test, or in any redesign or certification, would have the advantage of having the actual system available. Most often, such applications would involve general questions designed to determine whether a given system is acceptable from a workload viewpoint. It is anticipated that answers will be required in relative terms. Thus, the question will be whether a new system has higher or lower workload than an existing system which has already been proven to be safe. To answer this, it will be necessary to test both the old and the new systems with the same measures, and to provide relative workload measures on several dimensions. Such measures should be sensitive and have high operator acceptance. Physiological (NWTB) measures which fit these criteria are the steady-state evoked response, heart rate variability, and eye-blink analysis. Under carefully designed conditions, the transient evoked response might also be used. In any case, such evaluations must take advantage of the actual system and procedures to be used. Measures should be obtained with as little interference in the normal operation of the system as possible, and should at least examine several expected levels of workload (e.g., from average to extreme).
A final area of application for physiological measures will require considerably more development in the state-of-the-art, but will perhaps prove most valuable in the long run. This is the area of on-line monitoring of workload. Conceivably, it will be possible to utilize non-obtrusive measures to determine the moment-to-moment workload variations in the operator. These data could then be used to warn the person of impending overload, or even to call in automated systems to reduce the load (Stern, Wilson, and Thiessen, 1986). Obviously, as should be clear from the above review, the techniques are not yet mature enough to be used for this purpose. However, the potential is clearly present, and attempts at such application are being made. If such attempts validate the feasibility of physiological measures as on-line metrics for workload, they would provide an ideal set of field-usable techniques.
SUMMARY
This chapter has attempted to provide a general overview of some of the physiological techniques which might prove valuable in the assessment of workload. It has been limited to descriptions of those procedures which have been used extensively in laboratory, simulator, and field studies.
Several other techniques exist, of course, which might be of equal or even greater value than those described here. The present techniques, however, have been incorporated into a specific test battery which is being applied in a number of settings. They therefore represent a cross-section of techniques which are of current interest to the general question of physiological measures. This chapter has also argued that physiological measures should be differentiated on the basis of whether they tap one or several information processing resources. Rather than adopting an overall activation-level view of such measures, it is proposed that some measures are capable of targeting specific resource pools. If such measures are used in assessing tasks which do not load those resource pools, they will yield negative results, even though other measures may be positive. Thus, the appropriate use of physiological measures requires attention not only to the usual criteria of validity and reliability, but also to the questions of diagnosticity and sensitivity. Attention to such factors should result in optimum use of the correct measure. If the above factors are taken into account, then physiological measurement should provide a useful adjunct to subjective and behavioral measures in the assessment of workload. One cannot expect any single approach to be sufficient in itself, due to the multi-dimensional nature of the workload construct and to the many environments in which it must be used. However, all three techniques can be combined to form an exhaustive and, in some cases, overlapping set of procedures which can be adapted to many different workload questions. Continued definition and refinement of these techniques will result in standardization and wide utilization of physiological procedures for workload assessment.
REFERENCES
Aasman, J., Mulder, G. and Mulder, L. J. M., Operator effort and the measurement of heart-rate variability, Human Factors, 29 (1987) 161-170.
Aunon, J. I., McGillem, C. D. and O'Donnell, R. D., Comparison of linear and quadratic classification of event-related potentials on the basis of their exogenous and endogenous components, Psychophysiology, 19 (1982) 531-537.
Bauer, L. O., Goldstein, R. and Stern, J. A., Effects of information processing demands on physiological response patterns, Human Factors, 29 (1987) 213-234.
Bauer, L. O., Strock, B. D., Goldstein, R., Stern, J. A. and Walrath, L. C., Auditory discrimination and the eyeblink, Psychophysiology, 22 (1985) 636-641.
Biferno, M. A., Mental workload measurement: event-related potentials and ratings of workload and fatigue (1985a) Final Report, NASA Contract NAS2-11860.
Biferno, M. A., Mental workload measurement in aircraft systems with event-related potentials, Psychophysiology, 22 (1985b) 524.
Chiles, W. D., Workload, task, and situational factors as modifiers of complex human performance, in: Alluisi, E. A. and Fleishman, E. A. (eds.), Human Performance and Productivity (Erlbaum, Hillsdale, N. J. 1982).
Donchin, E., Event-related brain potentials: A tool in the study of human information processing, in: Begleiter, H. (ed.) Evoked Potentials in Psychiatry (Plenum, New York 1981).
Donchin, E. and Herning, R. I., A simulation study of the efficiency of stepwise discriminant analysis in the detection and comparison of event-related potentials, Electroencephalography and Clinical Neurophysiology, 38 (1975) 51-68.
Donchin, E., Kutas, M. and McCarthy, G., Electrocortical indices of hemispheric utilization, in: Harnad, S. (ed.) Lateralization in the Nervous System (Academic, New York 1976).
Duncan-Johnson, C. C. and Donchin, E., On quantifying surprise: The variation in event-related potentials with subjective probability, Psychophysiology, 14 (1977) 456-467.
Friedman, D., Vaughan, H. G. and Erlenmeyer-Kimling, L., Multiple late positive potentials in two visual discrimination tasks, Psychophysiology, 18 (1981) 636-649.
Frustorfer, H., Langanke, P., Munzer, K., Peter, J. H. and Pfaff, A., Neurophysiological vigilance indicators and operational analysis of a train vigilance device: a laboratory and field study, in: Mackie, R. R. (ed.) Vigilance: Theory, Operational Performance, and Physiological Correlates (Plenum, New York 1977).
Gilliland, K., Shingledecker, C. A., Wilson, G. F. and Peio, K., Effect of workload on the auditory evoked brainstem response, Proceedings of the Human Factors Society annual meeting (1984) 37-39.
Goldstein, R., Walrath, L. C., Stern, J. A. and Strock, B. D., Blink activity in a discrimination task as a function of stimulus modality and schedule of presentation, Psychophysiology, 22 (1985) 629-635.
Gomer, F. E., Spicuzza, R. J. and O’Donnell, R. D., Evoked potential correlates of visual item recognition during memory scanning tasks, Physiological Psychology, 4 (1976) 61-65.
Gopher, D. and Donchin, E., Workload: An examination of the concept, in: Boff, K., Kaufman, L. and Thomas, J. P. (eds.) Handbook of Perception and Human Performance (Wiley, New York 1986) 41-1-41-49.
Hasbrook, A. H. and Rasmussen, P. C., Pilot heart rate during in-flight simulated approaches in a general aviation aircraft, Aerospace Medicine, 41 (1970) 1148-1152.
Hassett, J., A Primer of Psychophysiology (Freeman, San Francisco 1978).
Hillyard, S. A. and Kutas, A. M., Electrophysiology of cognitive processing, Annual Review of Psychology, 34 (1983) 31-61.
Hyndman, B. W., Kitney, R. I. and Sayers, B., Spontaneous oscillations in physiological control systems, Nature, 233 (1971) 339-341.
Isreal, J. B., Wickens, C. D., Chesney, C. L. and Donchin, E., The event-related brain potential as an index of display-monitoring workload, Human Factors, 22 (1980) 211-244.
Isreal, J. B., Wickens, C. D. and Donchin, E., The event-related brain potential as a selective index of display load, Proceedings of the twenty-third annual meeting of the Human Factors Society (1979) 558-562.
Johnson, L. C., A psychophysiology for all states, Psychophysiology, 6 (1970) 501-516.
Kramer, A. F., Sirevaag, E. J. and Braune, R., A psychophysiological assessment of operator workload during simulated flight sessions, Human Factors, 29 (1987) 145-160.
Kramer, A. F., Wickens, C. D. and Donchin, E., Processing of stimulus properties: Evidence for dual-task integrality, Journal of Experimental Psychology: Human Perception and Performance, 11 (1985) 393-408.
Lacey, J. I. and Lacey, B. C., The relationship of resting autonomic activity to motor impulsivity, The Brain and Human Behavior, Vol 36 (Williams and Wilkins, Baltimore 1958).
Lindholm, E. and Cheatum, C. M., Autonomic activity and workload during learning of a simulated aircraft carrier landing task, Aviation, Space and Environmental Medicine, 54 (1983) 435-439.
Lindqvist, A., Keskinen, E., Antela, K., Halkola, L., Peltonen, T. and Valimoki, I., Heart rate variability, cardiac mechanics, and subjectively evaluated stress during simulated flight, Aviation, Space and Environmental Medicine, 54 (1983) 685-690.
McCloskey, K., Evaluating a spatial processing task using EEG and heart rate measurement, Proceedings of the Human Factors Society (1987).
Moise, Samuel, L., Development of neurophysiological and behavioral metrics of human performance, Armstrong Aerospace Medical Research Laboratory Technical Report, AFAMRL-TR-80-39 (1980).
Mulder, G., Mental load, mental effort and attention, in: Moray, N. (ed.) Mental Workload: Its Theory and Measurement (Plenum, New York 1979).
Mulder, G. and Mulder, L. J. M., Coping with mental workload, in: Levine, S. and Ursine, H. (eds.) Coping and Health (Plenum, New York 1980).
Natani, K. and Gomer, F. E., Electrocortical activity and operator workload: A comparison of changes in the electroencephalogram and in event-related potentials, MacDonnell-Douglas Astronautics Co. Report MDC E2427 (1981).
Navon, D. and Gopher, D., On the economy of the human processing system, Psychological Review, 86 (1979) 214-255.
Norman, D. A. and Bobrow, D. G., On data-limited and resource-limited processes, Cognitive Psychology, 7 (1975) 44-64.
O’Donnell, R. D., Contributions of psychophysiological techniques to aircraft design and other operational problems, AGARD-AG-244 (1979) NATO Advisory Group for Aerospace Research and Development, Neuilly sur Seine, France.
O’Donnell, R. D., The USAF neuropsychological workload test battery: Concept and validation, Proceedings 338 of the NATO Advisory Group for Aerospace Research and Development, Paris (1983) 5/1-5/9.
O’Donnell, R. D. and Eggemeier, F. T., Workload assessment methodology, in: Boff, K. R., Kaufman, L. and Thomas, J. P. (eds.) Handbook of Perception and Human Performance Vol II (Wiley, New York 1986).
Opmeer, C. H. J. M., The information content of successive RR interval times in the ECG, Preliminary results using factor analysis and frequency analyses, Ergonomics, 16 (1973) 105-112.
Opmeer, C. H. J. M. and Kral, J. P., Towards an objective assessment of cockpit workload: 1. Physiological variables during different flight phases, Aerospace Medicine, 44 (1973) 527-532.
Papanicalaou, A. C. and Johnstone, J., Probe evoked potentials: Theory, methods and applications, International Journal of Neuroscience, 24 (1984) 107-131.
Pritchard, W. S., Psychophysiology of P300, Psychological Bulletin, 89 (1981) 506-540.
Purvis, B., Skelly, J., Simons, J. and Detro, S., Aircrew workload assessment in a sustained environment: B-52 operations, AFAMRL-TR (1984), Armstrong Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base, Ohio.
Ranson, S. W. and Clark, S. L., The Anatomy of the Nervous System (Saunders, Philadelphia 1959).
Regan, D., Some characteristics of average steady-state and transient responses evoked by modulated light, Electroencephalography and Clinical Neurophysiology, 20 (1966) 238-248.
Regan, D., Evoked Potentials in Psychology, Sensory Physiology and Clinical Medicine (Chapman and Hall, London 1972).
Regan, D., Latencies of evoked potentials to flicker and to pattern speedily estimated by simultaneous stimulation method, Electroencephalography and Clinical Neurophysiology, 40 (1976) 654-660.
Regan, D., Steady-state evoked potentials, Journal of the Optical Society of America, 67 (1977) 1475-1489.
Rizzutto, A. P., Diazepam and its effects on psychophysiological and behavioral measures of performance, Ph.D. Thesis, Dept. of Psych., Bowling Green (1985).
Rizzutto, A. P., Wilson, G. F., Yates, R. E. and Palmer, R., Diazepam and its effects of psychophysiological measures of performance, AFAMRL-TR-85-036, Armstrong Aerospace Medical Research Laboratory (1985).
Roman, J., Older, H. and Jones, W. L., Flight Research Program: VII. Medical Monitoring of Navy Carrier Pilots in Combat, Aerospace Medicine (1967) 133-139.
Roscoe, A. H., Heart rate monitoring of pilots during steep gradient approaches, Aviation, Space and Environmental Medicine, 46 (1975) 1410-1415.
Roscoe, A. H., Use of pilot heart rate measurement in flight evaluation, Aviation, Space and Environmental Medicine, 47 (1976) 86-90.
Roscoe, A. H., Assessing pilot workload in flight, AGARD Proceedings No. 373, Flight test techniques (1984) 12/1-12/7.
Ruffel-Smith, H. P., Heart rate of pilots flying aircraft on scheduled airline routes, Aerospace Medicine, 38 (1967) 1117-1119.
Sem-Jacobsen, C. W., Blackout and unconsciousness revealed by airborne testing of fighter pilots, Aerospace Medicine, 32 (1961) 247.
Shingledecker, C. A., Behavioral and subjective workload metrics for operational environments, Proceedings of the AGARD (AMP) symposium on sustained intensive air operations: Physiological and performance aspects, AGARD-CP-338 (1983) 6/1-6/10.
Shingledecker, C. A., A Task Battery for Applied Human Performance Research, AFAMRL-TR-84-071, Air Force Aerospace Medical Research Laboratory (1984).
Shingledecker, C. A., Crabtree, M. S. and Acton, W. H., Standardized tests for the evaluation and classification of workload metrics, Proceedings of the Human Factors Society annual meeting (1982) 648-651.
Sirevaag, E., Kramer, A. F., Coles, M. G. H. and Donchin, E., P300 amplitude and resource allocation, Psychophysiology, 2 (1984) 598-599.
Skelly, J. J., Purvis, B. and Wilson, G. F., Fighter pilot performance during airborne and simulator missions: physiological comparisons, AGARD Symposium (in press) Electric and magnetic activity of the central nervous systems: research and clinical applications in aerospace medicine, Trondheim, Norway (1987) 23/1-23/15.
Spekreijse, H., Analysis of EEG responses in Man (Junk, The Hague 1966).
Speyer, J. J., Fort, A., Fouillot, J. P. and Blomberg, R. D., Assessing workload for minimum crew certification, in: Roscoe, A. H. (ed.) The Practical Assessment of Pilot Workload, AGARDograph No. 282 (1987) 90-115.
Spyker, D. A., Stackhouse, S. P., Khalafalla, A. S. and McLane, R. C., Development of techniques for measuring pilot workload (Report No. NASA CR-1888) NASA, Washington, D. C. (1971).
Sterman, M. B., Measurement and modification of sensory system characteristics during visual-motor performance, AFOSR Annual Report (1986).
Stern, J. A. and Skelly, J. J., The eye blink and workload considerations, Proceedings of the Human Factors Society (1984) 942-944.
Stern, J. A., Walrath, L. C. and Goldstein, R., The endogenous eyeblink, Psychophysiology, 21 (1984) 22-33.
Stern, J. A., Wilson, G. F. and Thiessen, M., Closing the man-machine loop: on the use of physiological measures to affect computer-controlled devices, AGARD-CP-414, Neuilly sur Seine, France, NATO Advisory Group for Aerospace Research and Development (1986).
Sternberg, S., The discovery of processing stages: Extension of Donder’s method, in: Koster, W. G. (ed.) Attention and Performance II (North-Holland, Amsterdam 1969).
Sutton, S., Tueting, P., Zubin, J. and John, E. R., Information delivery and the sensory evoked potential, Science, 155 (1967) 1436-1439.
Thiessen, M. F., Lay, J. E. and Stern, J. A., Neuropsychological Workload Test Battery validation study, Final report on Air Force Contract F 33615-82-C-0517, Armstrong Aerospace Medical Research Laboratory, Wright-Patterson AFB, Ohio (1986).
Van Dellen, H. J., Aasman, J., Mulder, L. J. M. and Mulder, G., Time domain versus frequency domain measures of heart-rate variability, in: Orlebeke, J. F., Mulder, G. and van Doormen, L. J. P. (eds.) Psychophysiology of Cardiovascular Control; Models, Methods and Data (1985).
Vincente, K. J., Thornton, D. C. and Moray, N., Spectral analyses of sinus arrhythmia: a measure of mental effort, Human Factors, 29 (1987) 171-182.
Wickens, C. D., The structure of attentional resources, in: Nickerson, R. (ed.) Attention and Performance VIII (Erlbaum, Hillsdale, N. J. 1980).
Wickens, C. D., Isreal, J. and Donchin, E., The event-related cortical potential as an index of task workload, Proceedings of the twenty-first annual meeting of the Human Factors Society (1977).
Wickens, C. D. and Kessel, C., The effect of participatory mode and task workload on the detection of dynamic system failures, IEEE Transactions on Systems, Man, & Cybernetics, 13 (1979) 21-31.
Wickens, C., Kramer, A., Vanasse, L. and Donchin, E., Performance of concurrent tasks: a psychophysiological analyses of the reciprocity of information processing resources, Science, 221 (1983) 1080-1082.
Wierwille, W. W. and Connor, S. A., Evaluation of 20 workload measures using a psychomotor task in a moving-base aircraft simulator, Human Factors, 25 (1983) 1-16.
Wierwille, W. W., Rahimi, M. and Casali, J. G., Evaluation of 16 measures of mental workload using a simulated flight task emphasizing mediational activity, Human Factors, 27 (1985) 489-502.
Wilson, G. F. and Heinrich, T., Steady state evoked responses used to measure task difficulty in three performance tasks. In preparation, Armstrong Aerospace Medical Research Laboratory Technical Report.
Wilson, G. F., McCloskey, K. and Davis, I., Evoked response, performance and subjective measures in a linguistic processing task, Proceedings of the fourth International Symposium of Aviation Psychology (1987).
Wilson, G. F., McCloskey, K. and Davis, I., Linguistic processing: physiological, performance and subjective correlates, Proceedings of the Human Factors Society annual meeting (1986).
Wilson, G. F. and O’Donnell, R. D., Steady-state evoked responses: Correlations with human cognition, Psychophysiology, 23 (1986) 57-61.
Wilson, G. F., O’Donnell, R. D. and Wilson, L., Neuropsychological measures of A-10 workload simulated in low altitude missions, AFAMRL-TR-83-0003, Armstrong Aerospace Medical Research Laboratory, Wright-Patterson AFB, Ohio (1983).
Wilson, G. F., Papanicalaou, A., Busch, C., DeRego, P., Orr, C. and Davis, I., Hemispheric asymmetries in phonetic processing assessed with probe magnetic fields, Proceedings of the 6th International Conference on Biomagnetism (1987).
Wilson, G. F., Purvis, B., Skelly, J., Fullenkamp, P. and Davis, I., Physiological data used to measure pilot workload in actual flight and simulator conditions, Proceedings of the Human Factors Society (1987).
Wilson, G. F. and Ritter, M., Steady state evoked responses in the intermediate stimulus range (1987) unpublished data, Armstrong Aerospace Medical Research Laboratory, Wright-Patterson AFB, Ohio.
Yolton, R. L., Wilson, G. F., Davis, I. and McCloskey, K., Physiological correlates of behavioral performance on the mathematical processing subtest of the CTS battery, Proceedings of the Human Factors Society (1987).
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1988
HEART RATE VARIABILITY AND MENTAL WORKLOAD ASSESSMENT
N. Meshkati Human Factors Department Institute of Safety and Systems Management University of Southern California Los Angeles, CA 90089
ABSTRACT
Heart rate variability is probably the most used physiological method in mental workload measurement experiments. This work, from a new perspective, attempts to review studies which have employed heart rate variability. It has been noted that there are two basic approaches in quantification of heart rate variability: the use of scoring methods and spectral analysis. It is concluded that, regardless of the quantification method and reported insensitivity of the measure, heart rate variability remains one of the most promising physiological measures of mental workload and operator effort.
1. Introduction
One of the most widely investigated topics in the quest for accurate mental workload assessment is the utility of heart rate and heart rate variability data. Traditionally, there have been two major quantification approaches: scoring methods and the use of spectral analysis of the heart rate variability. Users of these approaches disagree as to whether heart rate variability is consistently sensitive enough to reflect changes in the level of operator effort or mental workload. This study attempts to analytically review, from a new perspective, these traditional approaches and their respective findings.
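The two quantification approaches named above can be illustrated with a minimal Python sketch. It is not taken from any of the studies reviewed: the function name `band_power`, the assumption that the heart rate series has already been resampled at even intervals, the 1 Hz sampling rate, and the band limits used in the usage note are all hypothetical choices made for illustration. The standard deviation stands in for a simple parameter of the HR data, and a direct discrete Fourier transform stands in for spectral analysis.

```python
import cmath
import math
from statistics import pstdev


def band_power(series, sample_rate_hz, low_hz, high_hz):
    """One-sided periodogram power of an evenly sampled series within a band.

    The series mean is removed first, so the zero-frequency term is excluded.
    """
    n = len(series)
    average = sum(series) / n
    centered = [value - average for value in series]

    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n          # frequency of DFT bin k in Hz
        if low_hz <= freq <= high_hz:
            # Direct DFT coefficient for bin k (adequate for short series).
            coeff = sum(
                value * cmath.exp(-2j * math.pi * k * i / n)
                for i, value in enumerate(centered)
            )
            power += abs(coeff) ** 2 / n ** 2
    return power


def simple_variability(series):
    """A simple parameter of the HR data: the population standard deviation."""
    return pstdev(series)
```

As a usage example, a synthetic series of 100 samples at 1 Hz that oscillates around 70 beats per minute with amplitude 5 at 0.1 Hz gives `simple_variability` of about 3.54 and concentrates essentially all of its `band_power` between 0.07 and 0.14 Hz. The direct transform is quadratic in the series length; a fast Fourier transform would be the usual choice for longer recordings.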
2. Heart Rate Variability
The underlying hypothesis of the relationship between mental workload and heart rate variability was developed by Lacey (1967). Martin and Venables (1980) considered Lacey's hypothesis as the most influential hypothesis concerning the 'directional fractionation' of cardiac activity of situational stereotype. Frith (1973) also regarded Lacey's hypothesis as a general theory about the cardiovascular system which allows interpretation of both short-term and long-term changes in heart rate. Essentially, the original Lacey hypothesis related the directional fractionation of cardiac activity according to the type of situation in which information occurred. It was argued that the situation may be appraised by the subject as one which required either environmental intake or environmental rejection; heart rate was said to decelerate in situations which required environmental intakes and to accelerate in situations in which environmental rejection was involved. Martin and Venables (1980) considered the importance of this hypothesis not only because it was "solely descriptive," but also because it was extended to a functional explanation of the relationship between cardiac activity, cortical activity and behavior.
The presence of cardiac deceleration was not just associated with environmental intake per se, but was said to be instrumental in the facilitation of sensory processing. Similarly, cardiac acceleration was said to lead to an inhibition of sensory processing. Firth (1973) interpreted Lacey's theory in a relatively similar manner and wrote: "The theory states that the cardiovascular system exerts some control over the bulbar inhibitory area within the brain, an area which appears to control the duration of stimulus-evoked cortical activation. In this way, it is hypothesized, heart rate (or more specifically pressure which is detected by sensors within the cardiovascular system) may affect the amount and duration of sensory-evoked potentials within the brain. Thus, the higher the heart rate, the more sensory effects are inhibited." She continued: "As a direct consequence of such a theory, Lacey (1967) suggests that short-term cardiac deceleration occurring both prior to and during a stimulus event could be a physiological mechanism facilitating stimuli detection. Similarly, cardiac acceleration would inhibit the effects of sensory stimuli during 'environmental rejection' or parts of a task with little or no perceptual requirement." Lacey's hypothesis was disputed later, particularly by Elliott (1972) who reported "relatively direct tests of the (Lacey's) hypothesis seem not to support it well." This important hypothesis has been verified by Graham and Clifton (1966) and many other investigators whose works were reported by Martin and Venables (1980). There are pros and cons of employing heart rate variability (HR) as an indicator of mental workload. Some researchers managed to find statistically significant changes of HR as a function of the operator's mental workload while others did not. This study reviews the reported relationship between heart rate variability (sinus arrhythmia) and mental workload.
Until then, all discussion of heart rate variability will reflect the points of view of both those researchers who found a correlation between mental workload and the HR parameters and those who did not. The following review of the literature on heart rate variability and mental workload is divided into two sections according to the (i) presence or (ii) absence of a significant relationship between them. Each section is further classified on the basis of the method employed in measuring HR variability: (a) parameters of HR data (e.g., standard deviation) or scoring, (b) spectral analysis, and (c) a combination of the two methods.

2.1. Significant Relationship Between Heart Rate Variability and Mental Workload
2.1.1. Parameters of HR Data and Scoring Methods

Among those who found significant results, Kalsbeek is the best known researcher. First, Kalsbeek and Ettema (1963) referred to a "gradual" suppression of heart rate irregularity due to increasing task difficulty, and they concluded that it could possibly be used for measuring perceptual load. Later, Kalsbeek and Sykes (1967) tested two groups of subjects, each consisting of seven members. One group was motivated and the other was not. The experimental results showed that the motivated group remained at a constant level of suppression of sinus arrhythmia while the neutral group started at a lower level of suppression and followed the decreasing trend. In general, at a lower fraction of the subject's maximum performance, there was a systematic trend of decreasing sinus arrhythmia as a function of increasing performance. Later, Kalsbeek (1968), after testing 30 healthy subjects on auditory binary choice tasks, concluded: "An increase of
HR V and Mental Workload Assessment
mental load consisting of the number of binary choices per minute is reflected by a decrease in the score irregularity of the heart pattern." In another study, Kalsbeek (1973) referred to a study by Opmeer and Krol (1973) that found a significant difference in different phases of a flying task performed by junior pilots. Sinus arrhythmia scores decreased respectively with level flight, holding, take off and approach. Kalsbeek (1973), with reference to Welford (1959), established the relationship between single-channel capacity and sinus arrhythmia. With this assumption, he finally concluded that sinus arrhythmia is an indicator of proportional occupation of an individual's single-channel capacity during rest and work. Ettema and Zielhuis (1971) tested 24 subjects on a binary choice task as the mental load and found suppression of the sinus arrhythmia score during mental load and some significant correlations between various physiological effects of mental load (e.g., between systolic and diastolic blood pressure, between breathing rate and systolic blood pressure, and heart rate). Steptoe (1981) has referred to the above study as evidence of a relationship between cardiovascular activation and psychological or behavioral load. Rohmert and Laurig (1971) chose a mental task similar to the air traffic control task. They referred to the Kalsbeek and Ettema (1963) study and its calculated arrhythmia as having "certain correlation to mental load." Later, Rohmert, Laurig, and Luzak (1973) examined three parameters of heart rate variability:

1. The measure of amplitude variations, which is the difference between two successive heart rate values. They called this measure "Delta Heart Rate (δHR_i)."
2. The measure of frequency as recorded by counting the number of changes from increasing to decreasing values and vice versa. They called this measure "change of the sign (a)."
3. The number of negative delta heart rate values, which is a linear function of the δHR_i.

They used a binary choice task with 18 male subjects, 19-57 years old. The parameters of heart rate variability showed the expected tendencies but, owing to only small effects from the different loads, the results did not show significant differences between the means. However, they presumed that emotional stress and more complex tasks would provoke stronger effects in the parameters of heart rate variability than the binary choice task. Finally, they concluded that, in the analysis of heart rate variability as a measure of strain, there is the risk of under-assessing the strain because of increased variation of the heart rate with a typical tendency towards increasing mean values.

Meers and Verhagen (1972) conducted an experiment on two groups of subjects. The first group consisted of 20 psychology students between 20 and 25 years of age. The second group consisted of 26 technicians. Their ages ranged from 19 to 35 years, and they were told that the results would have some significance for their future careers. Both groups were given a binary choice task. Sinus arrhythmia was defined as the sum of absolute differences between succeeding instantaneous heart rates. Only 12 out of 20 of the subjects showed the expected decrease of sinus arrhythmia in the condition of maximal
load, but, in general, there was no definite relationship between the scores in rest and in the condition of maximum load. In the motivated group, 24 out of 26 subjects showed a decrease in sinus arrhythmia under the condition of maximal load compared with the rest condition. The authors concluded that the decrease of sinus arrhythmia was more marked in older subjects. Their explanation was that the psychological examination may be subjectively more important and more stressful for older subjects. Finally, they suggested that besides a high rate of information transmission, some emotional tension is indispensable to get a decrease of sinus arrhythmia. The impact of emotional tension referred to by the above authors is consistent with Sheridan and Stassen (1979), Pasmooij et al. (1976), and Hopkin (1979), who all acknowledge the existence of an emotional workload parallel to mental workload.

Among the supporters of heart rate variability as a viable measure of mental workload, Zawaga (1973) considered the work of Kalsbeek and Ettema (1963) as an indication of the existence of 'long-term adjustment'. He referred to 'short-term adjustment' as the decrease of heart rate during the first minute of a mental arithmetic task. He also cited a study by Brunia and Diesfeldt (1971) which was directly aimed at long-term changes of indices of mental effort, and in which subjects performed a binary choice task for 20 minutes at 80% of their personal maximum. This task period was preceded and followed by a rest period of 10 minutes. "As could be expected, sinus arrhythmia was lower during the task than during rest" (Zawaga, 1973). He also suggested: "A plausible explanation of short-term as well as long-term adjustment can be offered when using the concept of arousal (or activation) and stress." Moreover, he was able to explain the change in the level of activation and the contradicting findings of Meers and Verhagen (1971).
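As an illustration only, the time-domain scores described above can be sketched in a few lines of Python. The three functions mirror the parameters of Rohmert, Laurig, and Luzak (1973), and `sum_abs_diff` follows the Meers and Verhagen definition of sinus arrhythmia as the sum of absolute differences between succeeding instantaneous heart rates. The function names, the handling of zero differences, and the sample heart rate values are assumptions made for this sketch; they are not taken from the original studies.

```python
def delta_hr(hr):
    """Differences between successive heart rate values (delta heart rate)."""
    return [b - a for a, b in zip(hr, hr[1:])]

def sign_changes(deltas):
    """Count changes from increasing to decreasing values and vice versa.

    Zero differences are skipped here; that is an assumption of this sketch.
    """
    signs = [1 if d > 0 else -1 for d in deltas if d != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def negative_count(deltas):
    """Number of negative delta heart rate values."""
    return sum(1 for d in deltas if d < 0)

def sum_abs_diff(hr):
    """Sum of absolute differences between succeeding heart rate values."""
    return sum(abs(d) for d in delta_hr(hr))

# Hypothetical instantaneous heart rates in beats per minute.
hr = [72, 75, 71, 74, 74, 70, 73]
deltas = delta_hr(hr)
print(deltas, sign_changes(deltas), negative_count(deltas), sum_abs_diff(hr))
```

A lower `sum_abs_diff` or fewer sign changes under load than at rest would correspond to the "suppression" of sinus arrhythmia that these authors describe.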
Boyce (1974) designed an experiment to provide conditions in which the physical and mental loads were independently variable. The mental load included two levels. The first consisted of a series of arithmetic subtractions of single-digit numbers and the second of two-digit numbers. Ten male graduate workers in the age range of 20-40 years were used in the experiment. The standard deviation of the distribution of the interbeat interval was used as a measure of sinus arrhythmia. After performing an analysis of variance, he reported that: "There is a significant decrease in the sinus arrhythmia, i.e., less variability for an increase in mental load .... The conclusion from this experiment should be, therefore, that sinus arrhythmia is affected by mental load."

Strasser (1977) examined changes in tracking performance and sinus arrhythmia under hypoxia. He defined sinus arrhythmia as "the sum of absolute differences between succeeding R-top intervals." He exposed 10 young, healthy male subjects to an O2-N2 gas mixture by means of a respiration mask. There were four test trials, each run on a different day. Each test session consisted of 3 sections lasting about 45 minutes each. The sections included tracking tasks and tasks with adaptively changing difficulty level. From the results he concluded: "With an increasing degree of hypoxia lasting for about 45 minutes, elevations of the heart rate and suppressions of arrhythmia can be expected." In this work, he referred to another of his studies in which he was able to detect a "decrease in the amplitude of changes in heart rate (sinus arrhythmia) during the work, while heart rate did even show a slightly lower level during the load of the tracking task." Later, Strasser (1979) considered sinus arrhythmia as only the result of the indirect influence of mental workload on the "peripheral physiological indicators," but he still
acknowledged the high value of physiological data for measuring strain. Sheridan and Stassen (1979), in their review of different models and measures of workload, reported a decrease of heart rate variability under both kinds of mental load (i.e., information processing and emotional load). Opmeer (1973) reviewed 26 different sinus arrhythmia scoring methods. He concluded that one should avoid the word "measure" in assessing Heart Rate Irregularity (HI) and recommended the use of "scoring" because of the nature of HI. Meshkati (1983), and Robertson and Meshkati (1985) used Kalsbeek's (1968) sinus arrhythmia scoring method in their mental workload studies. They observed and recorded any significant suppression of SA due to imposed mental workload.
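A minimal sketch of the standard-deviation score used by Boyce (1974), assuming R-wave times are available in milliseconds, is given below. The function names and the sample R-wave times are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics

def interbeat_intervals(r_times_ms):
    """Intervals between successive R-wave times, in milliseconds."""
    return [b - a for a, b in zip(r_times_ms, r_times_ms[1:])]

def sa_score_sd(r_times_ms):
    """Sinus arrhythmia scored as the standard deviation of the interbeat
    intervals (sample form, n - 1; an assumption of this sketch)."""
    return statistics.stdev(interbeat_intervals(r_times_ms))

# Hypothetical R-wave times (ms) for a rest period and a mental-load period.
rest_r = [0, 800, 1660, 2440, 3290, 4080]
load_r = [0, 700, 1410, 2105, 2810, 3510]

print(sa_score_sd(rest_r), sa_score_sd(load_r))
```

With these made-up data the load period yields the smaller score, which is the direction of effect reported in the studies reviewed in this section.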
2.1.2. Spectral Analysis of HR Data

Luzak and Laurig (1973) conducted an experiment on 12 male subjects under 4 different laboratory conditions: 3 different types of load using a Kalsbeek binary choice generator (Kalsbeek, 1968), giving 20 or 60 signals per minute, and one with no mental load in a recumbent position. They developed eight arrhythmia measures based on interbeat intervals and conducted extensive spectral and time-series analysis. They concluded that the partial variance of the amplitude spectrum is a reliable measure of heart rate variability and a more exact indicator of strain.

Hyndman and Gregory (1975) employed spectral analysis of electrocardiogram signals in their experiment. To induce mental load in subjects, two kinds of tasks were set. The first was an adapted version of the binary choice task (nine subjects); the second was a tracking task (five subjects). The sinus arrhythmia was scored by calculating the area under the spectral power density curve to obtain the 'Average Total Power' (ATP) of the Low Pass Filtered Cardiac Event Sequence (LPFCES). After data reduction, they were able to show that "the ATP of sinus arrhythmia decreases substantially during the performance of perceptual tasks that required physical responses to implement a decision (decision-making tasks)." They concluded that "the greater ATP during a rest period than during the preceding task period appears to be task-intensity dependent. Thus, a sinus arrhythmia during the rest period immediately following a task may be a measure of the degree of mental loading produced by that task and might provide a means for scaling the effect of different tasks."

Rompelman, Kampen, and Backer (1980) conducted an investigation on 2 groups of subjects characterized by a large difference in psychic state: 30 medium- and long-stay psychotic patients and 10 staff members of the psychiatric center.
They recorded about 10 minutes of electrocardiogram and respiration of each subject during physical rest. With the help of cluster analysis methods applied to parameters extracted from the heart rate variability power spectra, it was found that there was a relationship among physiological factors underlying the heart rate variability spectra, age, and psychological factors of the subject. Furthermore, the authors concluded: "This result is in agreement with those of Hyndman (1980) who found a marked reduction in heart rate variability power during mental experiments."

Recently, Aasman, Mulder and Mulder (1987) used spectral analysis of sinus arrhythmia as the indicator of operator effort. They reported that the 0.10 Hz component of the "cardiac interval" (i.e., the R-R interbeat interval) signal systematically decreased as the
load on working memory increased. Moreover, Vicente, Thornton and Moray (1987) also employed spectral analysis of sinus arrhythmia as a measure of operator effort. According to their findings, the power in the heart rate variability spectrum between 0.06 Hz and 0.14 Hz is an accurate measure of the amount of effort being invested by the operator.
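The band-power idea behind these spectral scores can be sketched as follows. This is not the procedure of any of the cited authors: it is a plain-Python illustration, under stated assumptions, that linearly interpolates an R-R interval series onto a regular grid, removes the mean, and reports the fraction of spectral power lying between 0.06 Hz and 0.14 Hz. The simulated beat series, the 2 Hz resampling rate, and all function names are hypothetical.

```python
import cmath
import math

def simulate_beats(f_mod, duration=120.0, mean_rr=0.8, depth=0.05):
    """Hypothetical test signal: R-R intervals (s) oscillating at f_mod Hz."""
    times, rrs, t = [], [], 0.0
    while t < duration:
        rr = mean_rr + depth * math.sin(2 * math.pi * f_mod * t)
        t += rr
        times.append(t)   # time of the beat that ends this interval
        rrs.append(rr)
    return times, rrs

def resample(times, values, fs):
    """Linearly interpolate an unevenly sampled series onto a regular grid."""
    n = int((times[-1] - times[0]) * fs)
    out, j = [], 0
    for i in range(n):
        t = times[0] + i / fs
        while times[j + 1] < t:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out

def band_fraction(x, fs, lo, hi):
    """Fraction of non-DC spectral power between lo and hi Hz (plain DFT)."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    total = band = 0.0
    for k in range(1, n // 2):
        coef = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(x))
        power = abs(coef) ** 2
        total += power
        if lo <= k * fs / n <= hi:
            band += power
    return band / total

FS = 2.0  # resampling rate in Hz (an arbitrary choice for this sketch)
slow = band_fraction(resample(*simulate_beats(0.10), FS), FS, 0.06, 0.14)
fast = band_fraction(resample(*simulate_beats(0.25), FS), FS, 0.06, 0.14)
print(round(slow, 3), round(fast, 3))
```

In the simulated series, an oscillation at 0.10 Hz falls almost entirely inside the band, while one at a typical breathing frequency of 0.25 Hz falls outside it. A suppression of the 0.10 Hz component under load would therefore appear as a drop in absolute band power.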
2.1.3. Combination of Calculated Parameters of HR Data and Spectral Analysis

Mulder and Mulder (1973) evaluated nine different heart rate variability measures based on R-R interval data. They concluded that in forced choice reaction tasks the number of reversal points in the cardiotachogram is the most sensitive measure of the load of the task. They also reported that the "spectral analysis of the heart rate variability revealed the existence of a frequency component at about 0.10 Hz." Six years later, Mulder (1979) related the sensitivity of this frequency component to discriminating between the levels of difficulty of an operator's loading task. Wildervanck, Mulder, and Michon (1978) also suggested that: "It is necessary to distinguish between changes in the tonic¹ level of heart rate and phasic² changes." A phasic response would typically have a relatively rapid onset and a return to baseline within a period characteristic of the different response systems (Martin and Venables, 1980). They proposed that tonic heart rate reflects both the task demands and habituation to the experimental situation, and they concluded that continuous processing of information is an important determinant of the level of tonic heart rate, with phasic heart rate changes reflecting the momentary cognitive and motor demands of the task. In the same context, they referred to the occurrence of cardiac deceleration before the presentation of a signal and an immediate heart rate acceleration after the presentation. This last finding is very much consistent with Firth's (1973) interpretation of Lacey's theory.

Hitchen, Brodie, and Harness (1980) carried out an experiment using subjects with a modal age of 19 years and a range of 18-24 years. The task was to listen to a series of pre-recorded digits through stereo headphones. The digits were presented sequentially in random order.
The subject was required to listen for a specific 'odd-even-odd' sequence and, upon hearing this sequence, to depress a microswitch held in the left hand. Increasing the rate of presentation of the signals made the task more difficult. This task required minimal response movement. They calculated inter-beat intervals and several other statistical properties (e.g., mean, standard deviation, variance, number of reversal points) and also performed spectrum analysis using the Fast Fourier Transform (FFT) in order to get fine frequency resolution. An analysis of variance on the inter-beat intervals showed a significant effect during the more demanding task only. By analyzing the power spectra obtained from the inter-beat intervals, they found that "during mental loading there was a shift of power from one spectral band to a higher level spectral band, but this could not be directly attributed to a change in the respiration." Finally, they concluded that "heart rate variability decreased during the tasks."
¹ Tonic refers to ongoing physiological activity, which may show slow changes, i.e., 'slow' relative to the speed of shorter term 'phasic' responses (Martin and Venables, 1980).

² Phasic refers to short-term change in physiological activity, often following an identifiable stimulus, which can be distinguished against the background, ongoing ('tonic') level of activity.
2.2. Absence of Significant Relationships Between Heart Rate Variability and Mental Workload

2.2.1. Parameters of HR Data and Scoring Methods

Gaume and White (1975, I and II) conducted a mental workload experiment on 10 male subjects. The task involved decision making and the time sharing and self scheduling of a multiple-task situation. Stimuli consisted of from three to seven light-emitting diodes (LEDs) displaying single numerals which increased in value at varying rates selected by the experimenter. The subject's task was to monitor the LED displays and prevent their values from increasing beyond the numeral 9 by taking reset actions on a 12-button keyboard. Reaching to select buttons was not necessary. The researchers recorded pulse rate, systolic and diastolic blood pressures, respiration rate and basal skin response. After three tests on each subject and data analysis, they concluded that "no consistent relationships were found between pulse rate or heart rate and mental workload." Their findings were consistent with Mulder and Mulder (1973), who found that the mean heart rate does not change significantly under different levels of load.

The findings of Gaume and White did not dispute the validity of heart rate variability because, in their experiment, only heart rate was considered and not heart rate variability. The latter has to be measured by some sort of feature extraction from the heart rate data. Even if heart rate variability had been properly measured and no significant change detected, this still would not dispute the validity of the heart rate variability method, because of the nature and duration of each test trial. According to Kalsbeek and Ettema (1964), sinus arrhythmia is the predictor of the reserve capacity of an individual. Kalsbeek (1973) cites a case where the subject's reserve capacity was utilized, as indicated by a period of suppressed sinus arrhythmia, for a period of three minutes.
He concluded that "the recovery time after a peak load is relatively long." According to an experiment of Lille et al. (1968): "It took about 12 minutes after a demanding task involving 3 minutes of EEG variables to return to their initial values." In the experiment performed by Gaume and White (1975, II), each test session consisted of eight test trials, each lasting two minutes. After each trial, subjects were given a two-minute rest. The probable reasons for not finding significant heart rate variability could be the short duration of the test trial itself or the short period of rest following it. There is another argument about using the values of the first minute of each test in view of "artifacts caused by orientation reactions, etc." (Ettema and Zielhuis, 1971). As mentioned before, the nature of the task plays an important role in causing suppression of sinus arrhythmia. It is possible that the task during each trial does not really call for utilization of reserve capacity and, consequently, the sinus arrhythmia suppression does not happen. In their conclusions and recommendations Ettema and Zielhuis (1971) acknowledged the short rest between successive trials: "There was evidence of incomplete recovery from stress between trials." As a result, they recommended that in future tests, subjects should be tested "with adequate rest intervals between trials." In another study, Gaume and White (1975, III) used the Integrated Pulse Volume (IPV) as a measure of mental load. They found the IPV score differed significantly under
resting and workload conditions and under low and high levels of mental workload.

Pasmooij, Opmeer, and Hyndman (1976) referred to an experiment by Opmeer and Krol (1973) on air traffic controllers with a flight simulator. They calculated the correlation coefficient, on the basis of minute-by-minute values, between traffic density and heart rate irregularity as an indicator of workload in terms of information processing. "The correlation comes to 0.16 and is not significant in spite of the fact that increasing density of traffic, according to the controllers' own subjective rating, leads to an increasing difficulty of the task and to a higher workload." They tried to relate the heart rate irregularity to the subjective ratings, which themselves may not be viable indicators of mental workload.

Rault (1976) conducted an experiment with 10 pilots on simulations of test flight. The task was an instrument landing procedure on a transport airplane. The difficulty was controlled by the injection of different levels of perturbations and engine stall. The mean value of cardiac rhythm (beats/min) was computed every 15 seconds to eliminate the respiratory variations. The findings were that "the mean value of cardiac rhythm varies in the same manner as the programmed difficulty," but "to consider the cardiac rhythm as a workload indicator is not appropriate. Indeed, even with such a homogeneous population as test pilots, personal dispersion appears to be large." Later, Rault (1979) reaffirmed his original idea and regarded cardiac rhythm based on "Moving Average" analysis as too sensitive to interpersonal variations. However, he considered the "actual findings" of the cardiac rhythm measurement based on "variability" analysis as "quite fair."
The major theme of Rault's research deals with "inter-person variations," even in an apparently homogeneous group of test pilots. This could be a valid reason for the lack of convergence in cardiac rhythm analysis. As will be discussed in the next chapter, homogeneity in the decision-making behavior of different subjects is not easily achieved.

Hacker et al. (1978) conducted an experiment in which two or five choice reactions were made to signals consisting of dot patterns of varying size over a period of 60 minutes. They reported that the heartbeat intervals tended to become larger during the first half of the experiment while their variance remained constant, "presumably indicating a habituation effect .... In the second half, the intervals remained constant while the variance increased, indicating a fatigue effect." With these results in hand, they avoided drawing any explicit conclusions on the validity of variance either as a measure of heart rate variability or as an indicator of mental workload.

Ursin and Ursin (1979) refer to an experiment by Blix et al. (1974) in which heart rate and oxygen consumption of helicopter and transport aircraft pilots were measured. They reported that during flight operations, the heart rate accelerated without a corresponding increase in oxygen consumption. "This heart rate increase(d) beyond that expected from the oxygen uptake, i.e., 'additional heart rate' is therefore used as an indicator of psychological activation." Furthermore, they noticed that changes in stimulus conditions and psychological challenge did not always produce heart rate acceleration. They also found that heart activation did not always depend on the stimulus characteristics but, above all, on the individual himself, how he perceives the situation, and how he responds to his environment. This is why Blix et al. considered
both experience and responsibility as important factors in determining the level of 'additional heart rate.' A lack of significant difference in the heart rate of parachutist trainees after their period of tower training (Stromme et al., 1978) has been reported, and it is consistent with the findings of Blix. This fact can explain why there is no significant change in the heart rate level of some subjects after some test trials, due to learning and gaining experience.

Sharit and Salvendy (1982) used 32 subjects in a study aimed at assessing differences in mental workload between Machine-paced (MP) and Self-paced (SP) work. In the experiment, two tasks having contrasting attentional demands were performed both MP and SP by all subjects. These tasks were called 'the external task' and 'the internal task,' and took 10.5 and 11 minutes respectively. In the first task, the emphasis was predominantly on visual detection and was based on the "suspected direction of attentional demands." This task was characterized as 'external.' The 'internal' task required mental solutions of arithmetic problems.
The authors considered S² (the sample variance statistic based on heart rate data) an inflated estimator of population variance. Therefore, they preferred to use the Mean Square Successive Difference (MSSD) statistic as a measure of sinus arrhythmia (see Appendix A for formula). According to Heslegrave et al. (1979), this is an appropriate measure of variability. After the proper statistical analysis, the authors reported that both the S² and MSSD measures of Sinus Arrhythmia (SA) are more sensitive to pacing conditions than to informational load. They proposed that the failure of SA to detect differences in the informational load implicit in the two tasks was due to the attentional characteristics associated with the tasks. Thus, they suggested that "the effects of attentional mobility on SA were capable of obscuring those of informational processing." The nature and duration of the external and internal tasks can also be regarded as factors leading to the lack of detection of informational load content by sinus arrhythmia. The total durations of the tasks were much longer (10.5 and 11 minutes) than those in the Kalsbeek study (1973), where suppression and reappearance of sinus arrhythmia during three-minute tests were observed. The authors used the Mean Square Successive Difference (MSSD) as the scoring method and measure of sinus arrhythmia. It should be noted that: "Sometimes one scoring method reflects a supposed increase or decrease in mental loads, whereas another one shows no change" (Kalsbeek, 1973).

(b) Spectral Analysis of HR Data

Sayers (1973) studied the effect of mental load on heart rate variability, and employed spectral analysis of the inter-beat interval data. He declared that the mean and variance of heart rate are unreliable measures, but he did acknowledge that "in both laboratory and industrial conditions, imposing a mental workload on the subject provokes an effect on the cardiac inter-beat interval signal."
He attributed this phenomenon to the respiration pattern that affects heart rate. As his conclusion he proposes: "All the present indications are that respiratory-vasomotor interactions are predominantly responsible for the part of the effect."
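For reference, the two time-domain statistics compared by Sharit and Salvendy (1982) earlier in this section, the sample variance and the MSSD, can be contrasted in a short sketch. The Appendix A formula is not reproduced here, so the definition below (the sum of squared successive differences divided by n - 1) is an assumption based on the usual form of the statistic, and the drifting interval series is invented for illustration.

```python
import statistics

def mssd(x):
    """Mean square successive difference.

    Assumed form: sum of squared differences between successive values,
    divided by n - 1 (one divisor per successive pair).
    """
    diffs = [b - a for a, b in zip(x, x[1:])]
    return sum(d * d for d in diffs) / (len(x) - 1)

# Hypothetical interbeat intervals (ms) with a steady upward drift.
drifting = [800, 810, 820, 830, 840, 850]

print(statistics.variance(drifting), mssd(drifting))
```

In this made-up series the slow drift enlarges the sample variance, while the MSSD reflects only the beat-to-beat changes. This is one way to read the authors' remark that the sample variance can be an inflated estimator.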
This hypothesis is disputed by Hitchen et al. (1980), who suggest: "For the spectral analysis method to be used accurately on heart rate values, some method of interpolation must be performed to provide a wave which can be sampled regularly; Luzak et al. (1973) explained three commonly used methods." Hitchen et al. continue: "Sayers (1973) derived the spectrum without using any interpolation methods, and where the variation in heart rate is small, this deviation represents little error." With this improved method, Hitchen et al. (1980) are able to conclude that "the results show that the frequency component attributed to respiration rate is small and of no significance." Also, there are two other equally logical alternative hypotheses to Sayers's hypothesis, which are discussed in detail in Hyndman and Gregory (1975) and which relate heart rate variability directly to mental workload.

Jex and Clement (1979) designed an experiment consisting of a series of 100-second alternating rests and tracking runs with first-, second- and third-order tasks in succession. There were four subjects whose heart rate, passive arm EMGs, breathing frequency and palmar skin resistance were monitored. They performed cross spectral analysis on breath flow and heart rate, and calculated the mean (HR) and the standard deviation (σHR) from the heart rate power spectrum data. They found that the heart rate variability (σHR/HR), which went from 0.066 to 0.422, correlated very well with breathing frequency. Due to this high correlation, they raised the question: "Why not use the simpler instrumentation and measure breathing frequency in the first place?" Finally, the authors express their uncertainty about the interpretation of heart rate variations in the absence of a theoretical basis for scoring heart rate variability. In the authors' view, respiration (and its frequency) plays a significant role in heart rate variability.
The evidence for this was shown when the breath-flow correlated portions of the heart-rate spectra were subtracted. This leaves a wide band, low pass spectrum, which "show(s) little difference between resting and tracking" (Jex and Clement, 1979). One of the most dominant influences on cardiac activity is respiration.
Its effect on a resting heart rate demonstrates a phasic cycling known as Respiratory Sinus Arrhythmia (RSA). However, in the foregoing research, heart rate variability was under the shadow of RSA. Melcher (1976) regards RSA as the "manifestation of the mechanisms which regulate the performance of the heart." He also states that RSA "represents an adjustment of the heart rate to cyclic changes in the preload of the heart. This adjustment allows the heart to increase its output by increasing its rate and prevents the systemic arterial baroreflex from counteracting the tachycardia." He explains the cardiac reflexes which control the heart rate during the respiratory cycle by "the reflex control of the heart rate elicited from the heart itself." By examining Jex and Clement's (1979) arguments against Melcher's findings, it can be concluded that they underestimate the importance of heart rate variability as a factor independent of respiration and RSA. Luzak and Laurig's (1973) conclusion is in accordance with this argument: "Therefore, those theories of respiratory arrhythmia that advocate a dependency of heart rate variability on respiration alone cannot totally explain the phenomenon."
Lacey (1967) also contends that the cardiac response patterns produced by external and internal attentional environments are independent of respiratory influences. Looking at RSA, as Jex and Clement (1979) suggest, and deriving heart rate variability from it, can also be misleading since, according to Hellman and Stacey (1976), RSA is
age-dependent and "there is indeed an age-dependent degradation of the mechanism producing sinus arrhythmia associated with respiration." Another misleading factor in studying breathing frequency as the sole cause of RSA, and consequently of heart rate variability, as reported by Jex (1979), is disregarding breath depth. Stroufe (1971) reported that deep breathing produces faster, more variable heart rate while shallow breathing has the opposite effect. Also, Jex and Clement (1979) refer to the work of Sayers (1973) as the "exemplary summary of past work." Given this declaration, the argument presented by Sayers ("...imposing mental workload on the subject provokes an effect on the cardiac inter-beat-interval signal") can be applied to them too.

3. REFERENCES

Aasman, J., Mulder, G. and Mulder, L.J.M. (1987). Operator effort and the measurement of heart-rate variability. Human Factors, 29(2), 161-170.
Blix, A.S., Stromme, S.B. and Ursin, H. (1974). Additional heart rate - an indicator of psychological activation. Aerospace Medicine, 45, 1219-1222.
Boyce, P.R. (1974). Sinus arrhythmia as a measure of mental load. Ergonomics, 17(2), 177-183.
Brunia, C.H.M. and Diesfeldt, H. (1971). Onderdrukking van de sinus aritmie tijdens een taak. Tijdschrift Voor Sociale Geneeskunde, 49(5), 130-132.
Elliott, R. (1972). The significance of heart rate for behavior: A critique of Lacey's hypothesis. Journal of Personality and Social Psychology, 22, 398-409.
Ettema, J.H. and Zielhuis, R.L. (1971). Physiological parameters of mental load. Ergonomics, 14(1), 137-144.
Firth, P.A. (1973). Psychological factors influencing the relationship between cardiac arrhythmia and mental load. Ergonomics, 16(1), 5-16.
Gaume, J.G. and White, R.T. (1975). Mental Workload Assessment, I. Laboratory Investigation of Decision Making and Short-Term Memory in a Multiple-Task Situation. McDonnell Douglas Corporation, Long Beach, CA, Report No. DAC-I 1-75-R2 17.
Gaume, J.G. and White, R.T. (1975). Mental Workload Assessment, II. Physiological Correlates of Mental Workload: Reports of Three Preliminary Laboratory Tests. McDonnell Douglas Corporation, Long Beach, CA, Report No. DAC-1 I-75-R2 17.
Gaume, J.G. and White, R.T. (1975). Mental Workload Assessment, III. Laboratory Evaluation of One Subjective and Two Physiological Measures of Mental Workload. McDonnell Douglas Corporation, Long Beach, CA, Report No. MDC-J702410 I.
Graham, F.K. and Clifton, R.K. (1966). Heart-rate change as a component of orienting response. Psychological Bulletin, 65, 305-320.
Hacker, W., Plath, H.E., Richter, P. and Zimmer, K. (1978). Internal representation of task structure and mental load of work: Approaches and methods of assessment. Ergonomics, 21(3), 187-194.
Hellman, J.B. and Stacey, R.W. (1976). Variation of respiratory sinus arrhythmia with age. Journal of Applied Physiology, 41, 734-738.
Heslegrave, R.J., Ogilvie, J.C. and Furedy, J.J. (1979). Measuring base-line treatment differences in heart rate variability: Variance versus successive difference mean square and beats per minute versus interbeat intervals. Psychophysiology, 16, 151-157.
Hitchen, M., Brodie, D.A. and Harness, J.B. (1980). Cardiac responses to demanding mental load. Ergonomics, 23(4), 379-385.
Hopkin, V.D. (1979). General discussion based upon interactive group sessions. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 484-487.
Hyndman, B.W. (1980). Cardiovascular recovery to psychological stress: A means to diagnose man and task? In R.I. Kitney and O. Rompelman (Eds.), The Study of Heart Rate Variability. Oxford: Clarendon Press, 191-224.
Hyndman, B.W. and Gregory, J.R. (1975). Spectral analysis of sinus arrhythmia during mental loading. Ergonomics, 18(3), 255-270.
Jex, H.R. and Clement, W.F. (1979). Defining and measuring perceptual-motor workload in manual control tasks. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 125-277.
Kalsbeek, J.W.H. (1973). Do you believe in sinus arrhythmia? Ergonomics, 16(1), 99-104.
Kalsbeek, J.W.H. (1968). Measurement of mental workload and of acceptable load: possible application in industry. The International Journal of Production Research, 7(1), 33-45.
Kalsbeek, J.W.H. and Ettema, J.H. (1964). Physiological and psychological evaluation of distraction stress. Proceedings of the 2nd International Congress on Ergonomics. Dortmund, West Germany, 443-447.
Kalsbeek, J.W.H. and Ettema, J. (1963). Scored regularity of the heart rate pattern and the measurement of perceptual or mental load. Ergonomics, 6, 306.
Kalsbeek, J.W.H. and Sykes, R.N. (1967). Objective measurement of mental load. Acta Psychologica, 27, 253-261.
Lacey, J.I. (1967). Somatic response patterning and stress: Some revisions of activation theory. In M.H. Appley and R. Trumbull (Eds.), Psychological Stress: Issues in Research. New York: Appleton-Century-Crofts, 14-37.
Lille, F., Pottier, M. and Scherrer, J. (1968). Influence chez l'homme des niveaux d'activité mentale sur les potentiels évoqués. Revue Neurologique, 118, 476-480.
Luzak, H. and Laurig, W. (1973). An analysis of heart rate variability. Ergonomics, 16(1), 85-97.
Martin, I. and Venables, P.H. (1980). Techniques in Psychophysiology. John Wiley and Sons.
Meers, A. and Verhaegen, P. (1972). Sinus arrhythmia, information transmission and emotional tension. Psychologica Belgica, 45-53.
Melcher, A. (1976). Respiratory sinus arrhythmia in man. Acta Physiologica Scandinavica, Supplement 435.
Meshkati, N. (1983). A conceptual model of the assessment of mental workload based upon individual decision styles. Unpublished Ph.D. dissertation, University of Southern California, Los Angeles, CA.
Mulder, G. (1979). Sinus arrhythmia and mental workload. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 327-343.
Mulder, G. and Mulder-Hajonides van der Meulen, W.R.E.H. (1973). Mental load and the measurement of heart rate variability. Ergonomics, 16(1), 69-83.
Opmeer, C.H.J.M. (1973). The information content of successive RR-Interval times in the ECG. Preliminary results using Factor Analysis and Frequency Analysis. Ergonomics, 16(1), 85-97.
Opmeer, C.H.J.M. and Krol, J.P. (1973). Towards an objective assessment of cockpit workload: Physiological variables during different flight phases. Aerospace Medicine, 44, 527-532.
Pasmooij, C.K., Opmeer, C.H.J.M. and Hyndman, B.W. (1976). Workload in air traffic control, a field study. In T.B. Sheridan and G. Johannsen (Eds.), Monitoring Behavior and Supervisory Control. New York: Plenum Press, 107-117.
Rault, A. (1979). Measurement of pilot workload. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 417-422.
Rault, A. (1976). Pilot workload analysis. In T.B. Sheridan and G. Johannsen (Eds.), Monitoring Behavior and Supervisory Control. New York: Plenum Press, 139-155.
Rhomert, W. and Laurig, W. (1971). Work measurement, psychological and physiological techniques for assessing operator and workload. International Journal for Production Research, 9(1), 157-168.
Rhomert, W., Laurig, W., Phillip, V. and Luzak, H. (1973). Heart rate variability and workload measurement. Ergonomics, 16(1), 33-44.
Robertson, M.M. and Meshkati, N. (1985). Analysis of the effects of two individual differences classification models on experiencing mental workload of a computer-generated task: A new perspective to job design and task analysis. Proceedings of the 29th Annual Meeting of the Human Factors Society, Human Factors Society, Santa Monica, CA.
Rompelman, O., Van Kampen, W.H.A. and Backer, E. (1980). Heart rate variability in relation to psychological factors. Ergonomics, 23(12), 1101-1115.
Sayers, B. McA. (1973). Analysis of heart rate variability. Ergonomics, 16(1), 17-32.
Sharit, J. and Salvendy, G. (1983). External and internal environments, II. Reconsideration of the relationship between sinus arrhythmia and information load. Ergonomics, 25(2), 121-132.
Sheridan, T.B. and Stassen, H.G. (1979). Definitions, models and measures of human workload. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 219-233.
Steptoe, A. (1981). Psychological Factors in Cardiovascular Disorders. New York: Academic Press.
Strasser, H. (1979). Measurement of mental workload. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 345-348.
Strasser, H. (1977). Physiological measures of workload - correlations between physiological parameters and operational performance. AGARD-CP-216, A8-1 - A8-8.
Stromes, S., Wilkeby, P., Blix, A.S. and Ursin, H. (1978). Additional heart rate. In H. Ursin, E. Baade and S. Levine (Eds.), Coping Men: A Study in Human Psychophysiology. New York: Academic Press.
Ursin, H. and Ursin, R. (1979). Physiological indicators of mental workload. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 349-364.
Vicente, K.J., Thornton, D.C. and Moray, N. (1987). Spectral analysis of sinus arrhythmia: A measure of mental effort. Human Factors, 29(2), 171-182.
Welford, A.T. (1959). Evidence of a single-channel decision mechanism limiting performance in a serial reaction task. Quarterly Journal of Experimental Psychology, 11, 193.
Wildervanck, C., Mulder, G. and Michon, J.A. (1978). Mapping mental load in car driving. Ergonomics, 21(3), 225-229.
Zwaga, H.J.G. (1973). Psychophysiological reactions to mental tasks: Effort or stress? Ergonomics, 16, 61-67.
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) Elsevier Science Publishers B.V. (North-Holland), 1988
MEASURING MENTAL FATIGUE IN NORMAL DAILY WORKING ROUTINES

J. Aasman, A.A. Wijers, G. Mulder, L.J.M. Mulder
Institute for Experimental Psychology & Traffic Research Center
University of Groningen
Groningen
The Netherlands

The present experiment investigated the effects of workload and work stress, possibly present in the occupation of city bus driver, on mental efficiency and physiological state. Twenty-seven bus drivers served as subjects in short laboratory sessions on working days and days-off. In these sessions we measured performance on a number of standardized laboratory tasks (so-called 'QRST' tasks), and in addition recorded a number of physiological variables (blood pressure, heart rate, heart-rate variability). The results showed effects of workload on task performance and physiology. However, it appeared to be difficult to separate the effects of time-of-day from the effects of workload accumulating during the day. Nevertheless we tentatively concluded that workload resulted in less effective and efficient mental task performance.
1. INTRODUCTION

Jahns (1973) has argued that mental workload involves at least three major components: Input load -> Operator Effort -> Performance. The input load consists of the environmental and task demands placed on the operator. Human operator effort reflects the operator's reaction to the input load. The amount of effort invested by subjects is determined by internal goals, motivation, and the task criteria adopted, much like the decision criterion parameter in signal detection theory (Vicente, Thornton and Moray, 1987). The intensity of effort is probably one of the most important components of mental workload. The final stage is the level of performance achieved by the user-machine system. In laboratory conditions it has been shown that physiological indices such as the amplitude of components in brain evoked potentials, pupil dilatation and heart rate variability are sensitive to the cognitive demands of mental tasks (see Mulder, 1986 for a review). In a recent study Aasman, Mulder and Mulder (1987) showed that heart rate variability (HRV) systematically decreases as the load on working memory increases and that HRV is mainly sensitive to resource-limited processes and insensitive to data-limited processes (Norman and Bobrow, 1975). If the
demands increase beyond the limits of working memory, the subjects give up coping with the task and this is visible in an increase in heart rate variability. An important aspect of mental effort is efficiency. Mental efficiency refers to the amount of effort the subject has to invest in order to keep performance within acceptable limits. If more effort has to be invested, mental efficiency decreases; if the same level of task performance can be achieved with less effort, mental efficiency is said to increase. The present study aims to discover the presence of mental fatigue in a daily-life task: city bus driving. However, it has been particularly difficult to demonstrate aftereffects (in terms of long-term impairment on other tasks) of prolonged work. Holding (1983) suggests a number of possible reasons for this. First, change per se appears to play a significant role in overcoming effects of fatigue. Second, subjects may be able to compensate for a reduction in task performance, for example by choosing another strategy. Finally, and related to the second point, Holding argues that tests of aftereffects have not examined the most central feature of the tonic fatigue state, that of aversion to effort. Many of the effects of prolonged work may be seen in terms of less active control over behavior and the selection of easy but risky alternatives (Hockey, 1986).
However, the analysis of possible mental fatigue effects in real tasks is severely hampered by another effect: diurnal variation in performance. Many studies have shown variation in the efficiency of performance over the normal working day (see Folkhard, 1983 for a review). One of the main problems with all these studies is that they failed to separate time-of-day effects from those of fatigue. Of particular interest for the present study are the studies of Kleitman (1963) and Blake (1967). Kleitman found evidence of a peak in performance in the middle of the day (normally during the afternoon) on tasks such as RT, calculations, and others involving rapid decision making. If this observation is valid, then a drop in performance at this moment of the day during a normal working day may indicate that the effects of fatigue are stronger than the effects of time-of-day. Five of the six tasks used by Blake show a general rise in performance through the day. These tasks (vigilance, card sorting, sequential responding, letter cancellation, and calculation) all require speeded decision making for effective performance. The only task which did not show this trend was digit span, a task involving a component of working memory. The performance on this task was optimal around 10:00 and gradually decreased after that time.
Hockey and Colquhoun (1972) suggested that the afternoon and evening superiority only applied to tasks requiring fast processing with little or no "holding" requirement. Tasks which involve both speeded processing and a high dependence on the use of working memory display a time-of-day effect with a peak in the middle of the day, suggesting a compromise between the two "pure" forms (Hockey, 1986).
Together this suggests that a decrease in performance over the day in tasks requiring speeded performance and which are carried out immediately after working conditions indicates the presence of fatigue. Similarly, a less than optimal performance on tasks requiring both speeded processing and immediate memory at the middle of the day again suggests effects of fatigue due to the preceding working conditions. There are also time-of-day effects in physiological activity. Of particular interest in the present study are changes in pulse rate. Hildebrandt (1961) showed that the average daily pulse rate is highest around 10:00 (66 beats per minute, i.e. a mean cardiac interval time of 909 msec) and lowest at about 12:00 (62 beats per minute, i.e. a mean cardiac interval time of 968 msec). Heart rate then either stabilizes until 18:00 or increases somewhat. This suggests that a decrease in heart rate, especially after the middle of the day, cannot be the result of a time-of-day effect. In the present paper an attempt is made to measure possible effects of workload and work stress using standard laboratory techniques applied before and after work. The tasks are applied after several hours of work and require speeded processing, immediate memory and the ability to time-share. During these tasks and rest periods heart rate is recorded.
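The relation between a pulse rate in beats per minute and the corresponding mean cardiac interval time is simple arithmetic (60,000 msec divided by the rate). A minimal sketch follows; the function names are illustrative and not taken from the study.

```python
def bpm_to_ibi_ms(bpm: float) -> float:
    """Mean cardiac interval time (msec) for a pulse rate in beats per minute."""
    if bpm <= 0:
        raise ValueError("pulse rate must be positive")
    return 60_000.0 / bpm


def ibi_ms_to_bpm(ibi_ms: float) -> float:
    """Pulse rate (beats per minute) for a mean cardiac interval time in msec."""
    if ibi_ms <= 0:
        raise ValueError("interval must be positive")
    return 60_000.0 / ibi_ms


if __name__ == "__main__":
    # The two pulse rates quoted from Hildebrandt (1961).
    for rate in (66, 62):
        print(rate, "bpm ->", round(bpm_to_ibi_ms(rate)), "msec")
```

Because the two quantities are reciprocal, a lower pulse rate always corresponds to a longer mean interval.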
We shall try to determine possible effects of mental fatigue. The present experiment is part of a long series of studies on the causes and consequences of occupational stress. The ultimate aim is to obtain a database involving different occupations. Comparability can only be achieved if measurement techniques are kept constant across the different occupations. In our approach we decided to measure before and after work periods using neuro-endocrine, physiological, subjective and performance measures. City bus drivers seem to experience an unusual burden of occupational stress, as is evidenced by their average absenteeism (17-25%), or twice the Dutch yearly industrial mean. An epidemiological study of absenteeism and turnover over a 15-year period (1964-1978) revealed that only one out of ten drivers leaving the company during that period had reached the official retirement age, 60 years. Six out of ten drivers retired early for reasons of medical disability, at an average age of 47 (Meyman et al., 1983; Mulders et al., 1982).
The experiment which will be reported here was conducted in 1983/1984 with 27 bus drivers as subjects. Effects of occupational workload and stress were assessed on three different levels of measurement. First, it was investigated how effects of work were experienced on the subjective level. The subjects completed a number of self-report scales. Second, as mentioned above, we tried to estimate mental efficiency by task performance on a number of laboratory tasks. Additional information about the efficiency of functioning on the behavioral level was obtained by closely monitoring the driving performance of the subjects during their
daily duty. Third, we evaluated effects of work on the physiological level by measuring hormonal levels in urine samples, blood pressure and cardiovascular variables. The present paper describes part of this vast data set, namely the performance measures and physiological variables (blood pressure, heart rate, heart-rate variability) obtained in the laboratory sessions. The effects on other measures (driving performance, self-report scales, hormonal levels) will be reported elsewhere. Using performance measures as an index of mental fatigue requires a number of precautions. First, the tasks should be well known in terms of the processes and strategies involved. We designed the tasks after Massaro (1975; see Mulder and Mulder, 1981a). Second, the subjects should be well practiced in order to avoid confounding the possible effects of fatigue with practice. Finally, the whole procedure should not interfere too much with the daily working routines; otherwise wide applicability cannot be expected and the measures will no longer reflect the effects of work alone. Stimulus presentation is shown in fig. 1.
[Figure 1 panel labels: MEMORY-SET (8 sec), shown as QRST for load 4; DISPLAY-SETS (4 sec each), 40 presentations, after each presentation subjects responded by pushing either a "yes" or "no" button; TOTAL FREQUENCY OF LETTER OCCURRENCE (4 sec each), shown as Q? R?, numerical response on keyboard.]
Fig 1. Schematic representation of the QRST task (Figure: Pruyn, 1986).
Before each block of stimulus frames a memory set is presented, i.e. a to-be-memorized target set consisting of a variable number of letters (1-5), dependent on condition. Then a series of stimulus frames, each consisting of a single letter or digit, is presented. Half of the stimulus frames consist of targets (letters from the memory set), half consist of distractors (nontargets). Of the nontarget frames, 60% consist of a distractor letter and 40% consist of a distractor digit. Since the memory set always contained only letters, and digits were always distractors, this provides a consistent mapping (CM) situation (Schneider & Shiffrin, 1977). Consistent mapping conditions facilitate automatic processing. Since there is an a priori difference in features between letters and digits, the subjects will only minimally process the digits. Already at an early level of evidence digits can be rejected as nontargets.
According to Schneider, Dumais and Shiffrin (1984) automatic processing is less affected by changes in state. Consequently we should expect a smaller effect of work on the RT to digit distractors than on the RT to letter distractors, which require controlled processing. The subjects' task was to push as fast as possible a 'YES'-button on target trials and a 'NO'-button on nontarget trials. Until now we have described the simple conditions of the QRST task, which is a memory search task (Sternberg, 1969; Schneider & Shiffrin, 1977). This task is thought to be composed of the following elementary processes: 1) encoding of stimulus frames, 2) memory search for the presence of presented letters, 3) decision whether or not a target was present and 4) selection, preparation and execution of the motor response. In dual task conditions, there was a secondary task in addition to YES-NO responding. This task consisted of keeping a running mental count of the number of times each memory set letter was presented. After each block of stimuli the subject had to report how often each member of the memory set was presented. In these conditions the processes 5) remembering counters and 6) updating counters had to be time-shared with the processes mentioned above. It was stressed to the subjects that both tasks were equally important. This time-sharing aspect poses a heavy load on working memory.
Figures 2 and 3 show the typical pattern of results obtained with this kind of task (Aasman et al., 1987).
In the simple task conditions, an increase in RT as a function of memory load is found, due to an increase in the duration of the memory search process.
[Figure 2 panels: TASKS WITHOUT COUNTING (SIMPLE) and TASKS WITH COUNTING (DOUBLE); vertical axis: REACTION TIME (MSEC).]
Fig 2. Reaction times, Reaction time errors (PE) and Counting Errors (CE) for 8 subjects in the QRST task (Aasman et al., 1987).
[Figure 3 labels: ENERGY BP BAND plotted against LOAD.]
Fig 3. Inter Beat Interval times (IBI) and energy in the blood-pressure related band of the power spectrum for 8 subjects in the QRST task (Aasman et al., 1987).
YES-responses are faster than NO-responses, which is attributable to the process of binary decision, but there is no interaction between load and YES-NO responding. Altogether, this pattern of results is taken as evidence for serial exhaustive memory search (Sternberg, 1969; Schneider & Shiffrin, 1977; Treisman & Gelade, 1980). RT to nontarget digits (RT-OUT) does not increase significantly with memory load. This has been taken as evidence for automatic processing (Schneider & Shiffrin, 1977). No effects of load on heart-rate variability were found in these simple task conditions. In dual task conditions the slopes of the functions relating RT to load are much steeper than in the simple conditions, especially for YES-responses. This shows that the time-sharing aspects of these tasks (memorizing counters, and especially updating of counters involved in YES-responding) heavily interfere with the process of memory search. Therefore, the difference in RT between simple and dual task conditions is an index of 'time-sharing capability'. If this difference is small, subjects are able to maintain a high level of performance, despite the heavy load that time-sharing imposes on working memory. Reaction time errors (pressing the wrong response button, i.e. making a Pressing Error, PE) and Counting Errors (reporting the wrong number of target occurrences, CE) show a continuous increase as a function of memory load. In dual-task conditions, heart rate variability (especially the 0.10 Hz component of the cardiac interval spectrum) decreases strongly with increases in memory load, showing that time-sharing
requires increased mental effort. If mental fatigue is present after several hours of driving a bus, then it should become evident as a decrease in 'time-sharing capability', and/or in a general increase of RT, and/or in an increased number of errors. It is also possible that none of these effects occurs because the subject invested more effort. In that case the 0.10 Hz component should be more suppressed after than before the working period. It is also conceivable that drivers with a high frequency of sickness (a high sickness rate) are more vulnerable than drivers with a low frequency. In order to test this hypothesis we used 27 drivers divided into three different subgroups with a high, medium and low sickness rate respectively.
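The two performance indices discussed above, the slope of the function relating RT to memory load and the RT difference between simple and dual task conditions, can be sketched as follows. This is only an illustration: the function names are hypothetical and the reaction times are made-up values, not data from the study.

```python
from statistics import mean


def rt_slope(loads, rts_ms):
    """Least-squares slope of RT (msec) against memory load (items)."""
    if len(loads) != len(rts_ms) or len(loads) < 2:
        raise ValueError("need two or more matching load/RT pairs")
    mx, my = mean(loads), mean(rts_ms)
    sxx = sum((x - mx) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("loads must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(loads, rts_ms))
    return sxy / sxx


def time_sharing_cost(rt_simple_ms, rt_dual_ms):
    """RT difference (dual minus simple); smaller means better time-sharing."""
    return rt_dual_ms - rt_simple_ms


# Made-up illustrative values, NOT data from the study.
loads = [1, 2, 3, 4]
rt_simple = [600.0, 640.0, 680.0, 720.0]
rt_dual = [650.0, 750.0, 850.0, 950.0]

if __name__ == "__main__":
    print("simple slope (msec/item):", rt_slope(loads, rt_simple))
    print("dual slope (msec/item):", rt_slope(loads, rt_dual))
    print("cost at load 4 (msec):", time_sharing_cost(rt_simple[-1], rt_dual[-1]))
```

Under the fatigue hypothesis stated above, the cost measured after work would be larger than the cost measured before work, unless the subject compensates by investing more effort.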
The last question concerned the effect of the preceding day. The question was how fast possible negative effects of workload on mental efficiency and/or physiology 'build up' and decrease. Drivers were investigated on two working days and two days-off. For one day of each type the preceding day was a working day; for the other it was a day-off. If there is build-up of effects of workload, working days preceded by working days should show the largest aversive effects. If there is abatement, the first day-off should show more residual aversive effects of workload than the second day-off.

2. METHOD

2.1 Subjects
Twenty-seven bus drivers were selected from the local population (N=220). All volunteered to become subjects when they were individually asked to participate. They were 30-45 years of age and had worked more than 5 years as drivers for the same company. Three different groups of subjects were chosen on the basis of frequency of absenteeism for medical reasons in the preceding year. In computing this frequency only short periods of absence (less than 2 weeks) were considered. The following groups were chosen:
Low sickness-rate group (LS): less than 15 calendar days.
Medium sickness-rate group (MS): more than 15 and less than 60 days.
High sickness-rate group (HS): 60 and more days.

2.2.1 Design - general aspects

A schematic representation of the design is shown in fig. 4a. A week before the experiment we trained the subjects for 2 1/2 hours on the tasks and experimental procedures to minimize learning effects. The actual experiment consisted of four experimental days, two working days and two days-off. Each subject was investigated on a working day following another working day (W1) and a working day following a day-off
(W2). On working days we tested the drivers in three 20-minute sessions: a morning session (08.40 h), a midday session (12.40 h) and an evening session (17.40 h). Between the morning and the midday sessions, and between the midday and the evening sessions, the subjects worked 3.5 hours. We also investigated a day-off after a working day (F1) and a day-off after another day-off (F2). On days-off the subjects were tested once, at 13.00 h. These four days were balanced.
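The grouping rule from section 2.1 and the session schedule described above can be written down compactly. The sketch below is illustrative only: the names are hypothetical, and because the text leaves exactly 15 days of absence unassigned, the code places that value in the medium group as an explicit assumption.

```python
def sickness_group(days_absent: int) -> str:
    """Assign a driver to the LS, MS or HS group from days of short-term absence.

    ASSUMPTION: exactly 15 days is treated as MS; the text defines LS as
    "less than 15" and MS as "more than 15", leaving 15 itself undefined.
    """
    if days_absent < 0:
        raise ValueError("days_absent must be non-negative")
    if days_absent < 15:
        return "LS"
    if days_absent < 60:
        return "MS"
    return "HS"


# Laboratory session start times per experimental day, as given in the text.
SESSIONS = {
    "W1": ["08.40", "12.40", "17.40"],  # working day after a working day
    "W2": ["08.40", "12.40", "17.40"],  # working day after a day-off
    "F1": ["13.00"],                    # day-off after a working day
    "F2": ["13.00"],                    # day-off after another day-off
}

if __name__ == "__main__":
    print([sickness_group(d) for d in (3, 15, 40, 75)])
    print("sessions per subject:", sum(len(v) for v in SESSIONS.values()))
```

With this schedule each subject contributes eight 20-minute laboratory sessions in addition to the training day.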
[Figure 4 labels. Left: TRAINING-DAY, WORKING-DAYS, RELIEF-DAYS (after working, after relief). Right: BLOOD PRESSURE MEASUREMENT, REST (3 minutes), TASK (4 DOUBLE, 4 SIMPLE), BLOOD PRESSURE MEASUREMENT.]
Fig 4. Left: The general design. Except for the training day all boxes represent 20-minute laboratory sessions. The right figure shows the design of a 20-minute session.

2.2.2 Design - 20 minute sessions

Fig. 4b shows the design of a 20-minute session. A session started with a blood pressure measurement using the normal arm-cuff method. Subjects were seated before the video monitor and ECG electrodes were connected to a registration unit. A session consisted of four tasks and two rest periods. Each task and rest period lasted 3 minutes. The order in which the subjects performed the counting task and dual task was balanced; the simple tasks were always performed at the times shown in the figure. The task conditions will be explained in the next section (2.3.1).

2.3.1 Stimuli and task conditions

Before each block of 40 stimulus frames, a memory set of 2 or 4 letters was presented for x seconds. This memory set consisted of four successive letters from the alphabet. Stimulus frames consisted of a single letter (randomly chosen from the alphabet) or digit (randomly chosen from the
set of digits 1-9) presented for 3.5 sec, followed by a fixation dot for .5 sec. 50% of the stimulus frames were targets, 50% were nontargets. Of the nontargets 40% were digits and 60% were letters. In simple task conditions, the subjects had to press a YES-button as fast as possible when targets were presented, and a NO-button when nontargets were presented. In the counting condition, no overt response was required, but after each block the subject wrote down how often each target letter was presented (e.g. Q=6, R=4, S=7, T=3). In the dual task condition, subjects simultaneously performed both tasks. It was stressed that both tasks were equally important. This all resulted in the following four task conditions:

2S - Simple condition, memory load 2.
4S - Simple condition, memory load 4.
4C - Count condition, memory load 4.
4D - Dual task condition, memory load 4.
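The composition of a stimulus block described above can be made concrete with a small sketch. It is only an illustration, not the authors' stimulus software: the function names are hypothetical, and it assumes that the stated percentages hold exactly within each block of 40 frames and that target letters are drawn uniformly from the memory set, neither of which the text specifies.

```python
import random
import string


def make_block(memory_set, n_frames=40, seed=None):
    """Build one block of stimulus frames as (stimulus, kind) pairs.

    kind is 'target', 'nontarget_letter' or 'nontarget_digit'.
    Assumption: proportions are exact per block (50% targets; of the
    nontargets 40% digits and 60% letters).
    """
    rng = random.Random(seed)
    n_targets = n_frames // 2                 # 50% targets
    n_nontargets = n_frames - n_targets       # 50% nontargets
    n_digits = round(0.4 * n_nontargets)      # 40% of nontargets are digits
    n_letters = n_nontargets - n_digits       # 60% of nontargets are letters

    other_letters = [c for c in string.ascii_uppercase if c not in memory_set]
    frames = (
        [(rng.choice(memory_set), "target") for _ in range(n_targets)]
        + [(rng.choice(other_letters), "nontarget_letter") for _ in range(n_letters)]
        + [(rng.choice("123456789"), "nontarget_digit") for _ in range(n_digits)]
    )
    rng.shuffle(frames)
    return frames


def target_counts(frames, memory_set):
    """Correct answer for the counting task: presentations per target letter."""
    counts = {letter: 0 for letter in memory_set}
    for stimulus, kind in frames:
        if kind == "target":
            counts[stimulus] += 1
    return counts
```

With a memory set such as "QRST", a block then holds 20 targets, 12 nontarget letters and 8 nontarget digits, and `target_counts` gives the tally a subject would have to report in the 4C and 4D conditions.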
2.4.1 Dependent variables (Performance)
In the reaction time task and in the dual task we computed reaction times for targets (RT-YES), nontarget letters (RT-NO) and nontarget digits (RT-OUT). Reaction time errors were computed as the total number of times the subjects pressed the wrong button. In the dual task and counting task, counting errors were obtained by computing the absolute difference between the reported and the actual number of targets.

2.4.2 Dependent variables (Cardiovascular indices)
ECG was recorded during laboratory sessions from precordial electrodes and stored on magnetic tapes. R-R interval times were obtained by a Schmitt triggering device and a PDP 11/34 computer. Spectral analysis was performed on these data (Mulder, L., 1985). For each 3-minute task or rest period we computed the mean inter-beat interval time (IBI) and the spectral energy in the .06-.14 Hz range (SP-BP). This measure reflects heart-rate variability related to the short-term regulation of arterial blood-pressure (Axelrod, Gordon, Ubel, Shannon, Barger and Cohen, 1981). It is well known that these cardiovascular variables may show substantial individual differences and variations over time (Mulder, G., 1980). Therefore, in some analyses, the mean values over both rest periods were used as a baseline value; differences between task conditions and rest periods provide an estimate of task-related changes in physiology, independent of individual differences and time-dependent fluctuations.

2.5 Data analysis
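As an illustration of the two cardiovascular measures defined in 2.4.2, the sketch below computes the mean IBI and the energy in the .06-.14 Hz band for one 3-minute period. It is not the procedure of Mulder, L. (1985), which is not described here. It assumes a simple approach: the IBI series is linearly interpolated onto an even grid (2 Hz, an arbitrary choice) and the band energy is summed from a plain discrete Fourier transform. All function names are hypothetical.

```python
import math


def mean_ibi(ibis_ms):
    """Mean inter-beat interval in ms."""
    return sum(ibis_ms) / len(ibis_ms)


def resample_ibi(ibis_ms, fs=2.0):
    """Linearly interpolate an IBI series onto an even time grid (fs in Hz)."""
    times, t = [], 0.0
    for ibi in ibis_ms:            # beat times (s) as cumulative intervals
        t += ibi / 1000.0
        times.append(t)
    n = int((times[-1] - times[0]) * fs) + 1
    out, j = [], 0
    for k in range(n):
        tk = times[0] + k / fs
        while j < len(times) - 2 and times[j + 1] < tk:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (tk - t0) / (t1 - t0)
        out.append(ibis_ms[j] + w * (ibis_ms[j + 1] - ibis_ms[j]))
    return out


def band_energy(series, fs, f_lo=0.06, f_hi=0.14):
    """Variance (ms^2) carried by DFT components between f_lo and f_hi."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]          # remove the mean before the DFT
    energy = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
            im = sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
            energy += 2.0 * (re * re + im * im) / (n * n)
    return energy
```

A task-related change as described in 2.4.2 would then be the value in a task period minus the mean of the two rest-period values computed the same way.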
The following planned comparisons between tasks and rest periods were performed:

Table 1.
comparison            dependent variables
c1: 4S - 4D           [RTs, RT-errors, SP-BP and IBI]
c2: 4C - 4D           [counting errors, SP-BP and IBI]
c3: 4S - 4C           [SP-BP and IBI]
c4: Rb - all tasks    [SP-BP and IBI]
c5: Rb - 4S           [SP-BP and IBI]
c6: Rb - 4D           [SP-BP and IBI]
Rb is the mean of both rest periods.

A comparison of special interest is c1, which reflects, as mentioned in the introduction, the degree of 'time-sharing capability'. Comparisons c4-c6 reflect differences in physiology related to task performance (see 2.4.2). We performed ANOVAs (SPSS) on four different designs. Sickness rate constituted the Between-Subject factor, the other variables Within-Subject factors.

(i) Average working-day versus day-off - 3 x 2 x 2 x 2 design. This design investigates the effects of workload as the difference between working days and days-off. The average performance and physiology on working days (morning, midday and evening sessions) is compared to days-off (midday session only).
between: - sick-rate group
within:  - work vs day-off
         - effect of the preceding day (working day (W1) vs day-off (W2))
         - comparisons between tasks (c1-c6)
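The planned comparisons of Table 1 are simple differences between condition means, which a few lines of code can spell out. The sketch below is illustrative only: the function name and the input layout are hypothetical, "all tasks" in c4 is assumed to mean the three memory-load-4 tasks (4S, 4C, 4D), and the dual task in c6 is taken to be 4D.

```python
def planned_comparisons(means):
    """Return c1..c6 for one dependent variable.

    `means` maps condition labels to mean values: '4S', '4C', '4D',
    plus 'R1' and 'R2' for the two rest periods (hypothetical labels).
    """
    rb = (means["R1"] + means["R2"]) / 2.0                     # rest baseline Rb
    tasks = (means["4S"] + means["4C"] + means["4D"]) / 3.0    # assumed "all tasks"
    return {
        "c1": means["4S"] - means["4D"],
        "c2": means["4C"] - means["4D"],
        "c3": means["4S"] - means["4C"],
        "c4": rb - tasks,
        "c5": rb - means["4S"],
        "c6": rb - means["4D"],
    }


# Made-up IBI means (ms), for illustration only.
example = planned_comparisons({"4S": 840, "4C": 810, "4D": 800, "R1": 830, "R2": 850})
```

In the ANOVAs each of these difference scores enters as a two-level within-subject factor, which accounts for the final "x 2" in the designs.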
(ii) Sessions on working days versus day-off - 3 x 2 x 2 x 2 design. In this design the morning, midday and evening sessions on working days are compared separately to the day-off session.

(iii) Differences between sessions on working days - 3 x 3 x 2 x 2 design. In this design morning, midday and evening sessions are compared to each other.
between: - sick-rate group
within:  - sessions - 3 levels - morning, midday or evening
         - effect of preceding day (W1 vs W2)
         - comparisons between tasks (c1-c6)

(iv) Effects within days-off - 3 x 2 x 2 design. This design compares within days-off the effects of the preceding day (day-off (F2) or working day (F1)).
between: - sickness rate group
within:  - preceding day (F1 vs F2)
         - comparisons between tasks (c1-c6)
3. RESULTS
In our design, condition 2S formed an exception, in that this task was not balanced within a session; it was always performed as the first task within a session. We believe that this is why the task showed some aberrations from the pattern of results in the rest of the data.
To limit the length of this section, and to avoid having to describe too complex a pattern of results, it was therefore decided to omit this task from the present discussion. The results are divided into four sections. The first section provides an overall comparison between the different task conditions (simple, count and dual tasks). The second section deals with effects of workload: the different sessions within working days are compared, as well as working days and days-off. The third section discusses effects of the preceding day (working day or day-off). The final section deals with differences between the three sickness rate groups.

3.1 Effects of task conditions
3.1.1 Performance
Table 2 shows that there was a considerable increase in RTs from the simple task condition 4S to the dual task condition 4D. This increase is largest for the YES-responses, and smaller for nontarget digits than for nontarget letters (with the exception of the day-off session). The number of RT errors also increased. Comparison c1 (see table 1) showed significant effects for all performance variables (all p<.001). RT to targets was faster than to nontarget letters in the 4S condition. RT to nontarget digits was faster than to nontarget letters in all conditions. The number of counting errors was larger in the dual task condition 4D than in the count-only condition 4C (c2: F(1,24)=5.1, p<.05, see table 3). These results nicely replicated and extended the findings of Aasman et al. (1987) and Pruyn, Aasman & Wijers (1985). In the simple task (4S), nontarget letters were responded to more slowly than targets, reflecting the extra decision time for nontargets (binary decision). RTs to nontarget digits are faster than to nontarget letters, due to automatic processing in these CM trials. In the dual-task condition, on the other hand, clear 'time-sharing costs' were found, especially for target stimuli. Time-sharing of multiple processes in working memory (memory search, memorizing counters, updating counters) resulted in a deterioration of performance: slower responding and worse counting performance.

3.1.2 Physiology
Heart-rate was fastest in the dual task condition and slowest in the simple task 4S and rest periods. Mean IBI was 808, 809 and 843 ms in conditions 4D, 4C and 4S respectively, averaged over all sessions (working days + days-off). In rest periods the mean IBI was 838 ms. Comparison c1 showed a significant difference between 4S and 4D (F(1,24)=53.6, p<.0001). c4 showed a difference between rest periods and task conditions (F(1,24)=7.77, p<.05).
Spectral energy in the 0.10 Hz region showed least variability in the dual task condition, more variability in the count and simple conditions, and most energy in the rest periods. The spectral energy in 4D, 4C, 4S and rests was 1090, 1531, 1365 and 2129 respectively. c2 showed a difference between 4D and 4C (F(1,24)=22.25, p<.0001), but there was no difference between conditions 4S and 4C. c4 showed a significant difference between rests and task periods (F(1,24)=29.32, p<.0001). These data were as could be expected from previous research (Aasman et al., 1987; Pruyn et al., 1985). The cardiovascular variables indicated increased mental effort in the dual task condition as compared to the single-task conditions 4S and 4C. Also, the investment of extra mental effort in tasks as compared to rest periods was nicely reflected in the physiology. The measure of spectral energy (SP-BP) seems to be most sensitive in this respect, as could be expected from Van Dellen et al. (1985) and Aasman et al. (1987).

3.2 Effects of workload

3.2.1 Effects of workload on task performance
Reaction times and reaction time errors in the 4S and the 4D dual task are presented in table 2. Table 3 gives the counting errors in conditions 4C and 4D.

Table 2. Reaction times for YES, NO and OUT stimuli, and reaction time errors for NO-responses in the 4S and 4D task on working days and on days-off.

            day-off        morning        midday         evening
            4S     4D      4S     4D      4S     4D      4S     4D
RT-YES      643    938     699    1012    620    969     641    1001
RT-NO       723    776     706    869     665    873     719    862
RT-OUT      603    681     628    734     592    694     628    729
RT-err.     .24    .17     .07    .17     .04    .70     .12    .54
First we will discuss overall differences between days-off and working days, then we will take a look at effects of workload within working days. Some evidence was found for a decrease in the speed of task performance on working days as compared to days-off. RT-YES did not show a significant difference between days-off and working days, RT-NO showed a trend (F(1,24)=3.7, p=.067), and RT-OUT did show a significant difference (F(1,24)=4.18, p<.05). The difference in RT between the 4D and 4S conditions, thought to reflect
'time sharing capability' (see 2.4.1), was significantly larger on working days than on days-off for RT-YES (341 vs 295 ms, interaction with c1 - F(1,24)=6.33, p<.05) and RT-NO (171 vs 43 ms, interaction with c1 - F(1,24)=13.64, p<.01). Although the same pattern of results was found for RT-OUT (102 vs 78 ms), this effect did not reach significance. Fig. 6 and 7 show this effect for RT-YES.
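The time-sharing costs quoted above are 4D minus 4S differences, averaged over the three working-day sessions and compared with the day-off session. The sketch below redoes that arithmetic from the cell values of Table 2. It is a check on the arithmetic only, under the assumption that the table values as recovered from the printed page are correct; the names are hypothetical.

```python
# Mean reaction times (ms) as read from Table 2: (4S, 4D) per session.
TABLE_2 = {
    "RT-YES": {"day-off": (643, 938), "morning": (699, 1012),
               "midday": (620, 969), "evening": (641, 1001)},
    "RT-NO": {"day-off": (723, 776), "morning": (706, 869),
              "midday": (665, 873), "evening": (719, 862)},
    "RT-OUT": {"day-off": (603, 681), "morning": (628, 734),
               "midday": (592, 694), "evening": (628, 729)},
}
WORKING_SESSIONS = ("morning", "midday", "evening")


def time_sharing_cost(rt_4s, rt_4d):
    """Dual-task cost: how much slower 4D is than 4S (ms)."""
    return rt_4d - rt_4s


def summarize(table):
    """Return {measure: (mean working-day cost, day-off cost)} in ms."""
    out = {}
    for measure, sessions in table.items():
        day_off = time_sharing_cost(*sessions["day-off"])
        working = sum(
            time_sharing_cost(*sessions[s]) for s in WORKING_SESSIONS
        ) / len(WORKING_SESSIONS)
        out[measure] = (round(working), day_off)
    return out


costs = summarize(TABLE_2)
```

This gives 341 vs 295 ms for RT-YES, matching the text. For RT-NO it gives 171 vs 53 ms and for RT-OUT 103 vs 78 ms, which differ slightly from the quoted 43 and 102 ms; whether the table or the text carries the misprint cannot be decided from the material here.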
[Figures 5 and 6: panels plotting reaction time (msec), press errors and counting errors for the simple (4S) and double (4D) task against time of day (8.40, 12.40, 17.40), with the level on free days (13.00) as reference.]
Fig 5. Left: Reaction times (RT), reaction time errors (Err) and counting errors (Count err) in the 4D task during a working day compared with day-off level. Deviation scores computed as the difference between a working day session and a day-off session, divided by the overall standard deviation (over working day sessions and day-off sessions).
Fig 6. Middle and Right: Middle: Reaction time on a working day and a day-off in the 4S and the 4D task. Right: The difference between the 4S and the 4D task plotted both for reaction times and reaction time errors during a working day with the day-off level as a reference.

Accuracy of performance did not differ between working days and days off, either for the reaction time errors or for the counting errors. However, the 4D-4S difference in reaction time errors was larger on working days than on days-off (interaction with c1 - F(1,24)=24.14, p<.0001). See fig. 6.

Second, we look at differences between morning, midday and evening sessions within working days. In general, performance speed appeared to be fastest in the midday, and slowest in the morning. For RT-YES there was a significant difference between sessions (F(2,48)=3.24, p<.05). For RT-NO, there were no differences between sessions in the 4D task, but in
the 4S task RTs were fastest in the midday session. This was evidenced by an interaction between sessions and c1 (F(2,48)=3.73, p<.05). For RT-OUT, again RTs were fastest in the midday (F(1,24)=4.18, p<.05). The 4D-4S difference increased in the course of the day for RT-YES, and was largest in the midday for RT-NO. These effects were not significant, however. Accuracy of YES-NO responding was best in the morning sessions, and worse in the midday and evening (F(2,48)=13.41, p<.0001). The 4D-4S difference in the number of errors was largest in the midday, and minimal in the morning (interaction with c1 - F(2,48)=12.91, p<.0001). The accuracy of counting was worst in the midday, and comparable in the morning and evening (F(2,48)=5.61, p<.01). See table 3.

Table 3. Counting errors on working days and days-off. The counting errors were computed as the absolute difference between reported and actual number of targets.

        day-off   morning   midday   evening
4C      2.04      1.52      2.83     2.04
4D      2.28      3.07      3.07     2.46
3.2.2 Physiology

3.2.2.1 Blood-pressure
In the course of working days blood-pressure increased (Diastolic, F(4,48)=5.73, p
Blood-pressure levels showed the same values in the morning session on working days as in the 13.00 session on a day-off. On working days, systolic blood-pressure showed an interaction between time of measurement (before/after laboratory sessions) and time of day. In the morning and
midday session systolic blood pressure decreased during a session. In the evening session there was a slight increase (F(2,48)=4.65, p<.05). Although the effects reported here were statistically significant, it should be noted that the magnitude of the effects is quite small.
3.2.2.2 Cardiovascular variables
In order to deal with the general changes in the cardiovascular state as a result of workload we aggregated IBI and SP-BP over all tasks and rest periods. These values are shown in table 5.
During a working day IBI gradually increased (heart-rate decreases). The morning level was comparable to the day-off level. Figure 5 (right side) shows the mean IBI values and spectral energy (SP-BP) in the three laboratory sessions separately for rest periods and the 4D tasks.

[Figure 7 panels: x-axis time of day (8.40, 12.40, 17.40); reference line: level on free days (13.00); right panel: difference rest-task.]
Fig 7. Left: IBI and SP-BP (or BP Band) in the rest periods and in the 4D task during a working day compared with day-off level. Right: The difference between the rest periods and the dual task during a working day for IBI and SP-BP.
The left side shows the rest-4D difference. The difference in IBI between rests and the 4D task increased in the course of the day (interaction with c4 - F(2,48)=13.93, p<0.00001). This difference reflects the physiological reaction to task performance as a deviation from the momentary baseline (values in rest periods). The other task conditions showed the same pattern of results (although less pronounced than 4D). SP-BP showed about the same pattern of results as IBI. There was a trend for SP-BP to rise during a day. There was a maximum at 12.40 h. The SP-BP value in the morning was comparable to the 13.00 h value on a day-off. In the course of a working day, the difference between the rest periods and the 4D task increased (interaction with c4 - F(2,48)=2.70, p=0.07). The rest-4D difference tended to be larger on working days than on days off for SP-BP (interaction with c4 - F(1,24)=3.21, p=0.07), but not for IBI.
3.3. The effects of the preceding day
It seems plausible that effects of workload on mental efficiency might 'build up' when people work several days in succession, and decline in the course of days-off. It was therefore expected that effects of work on task performance or physiology would be stronger on working days preceded by another working day, as compared to working days preceded by a day-off. On the other hand, it was expected that performance and physiology would show stronger recovery on the second day-off than on a day-off preceded by a working day. However, no such effects were found.
3.4 Individual differences
Subjects were divided into a low, medium and high sickness rate group. It was expected that high sickness rate subjects would show the largest susceptibility to occupational stress. Sickness rate was included as a between-subjects factor in all our analyses. However, no more effects were found than could be expected on the basis of chance alone, and none of the small number of effects found were readily interpretable. We included a summarizing figure for the three groups. In figure 8 the difference between working days and days-off for the three groups is plotted for several variables. The most surprising finding was that there was absolutely no continuity from the low to the high illness group. On the contrary, it is the medium sickness rate group which is deviant from the other two groups. These differences never reached significance in an ANOVA, although a discriminant analysis on these difference-scores resulted in a function which classified 90 percent of the subjects in the correct groups.
[Figure 8: bar chart of relative scores, with separate bars for the low, medium and high sick rate groups.]
Fig 8. The relative difference between a working day and a day-off for a number of variables. SD = Standard Deviation, E = Reaction time Errors, C.E = Counting Errors, Rel BP-Band = the difference between a 4D task period and the rest periods for SP-BP or BP-Band. Rel IBI = the difference between a 4D task period and the rest periods for IBI. All measures are normalized to fit in the same picture.

The medium illness group tended to respond to work stress with increased reaction times and standard deviations and decreased numbers of counting errors. This may indicate that they adopted a more cautious strategy on working days. Since heart-rate variability also tended to be larger on working days, there seems to be a shift towards a less effortful strategy. The other two groups of subjects showed more speeded processing on working days, but at the expense of more errors. Though these results suggest different strategies to cope with the working conditions, they should be considered with much caution. We would only like to emphasize the importance of individual differences. These differences could in an indirect way be related to other individual differences such as sickness rate.

4. DISCUSSION
First of all, it should be noted that the standardized laboratory tasks showed the same pattern of results as was observed in other studies using students as subjects. Also, the effects of these tasks on heart-rate variability were exactly as was expected. As the reader may remember, it is difficult to distinguish between the effects of time-of-day and work per se. However, if we consider the 4S condition as a task mainly affected by speed of processing, we should expect increasing speed of task performance from morning to midday to evening. If there is no difference between the morning and the midday session, the effects of work might have interfered: the effects of
fatigue counterbalanced the effects of time-of-day. If we consider the conditions 4C and 4D as being also dependent on working memory, we should expect an improvement from morning to midday and a decrease from midday to evening. Finally, the midday value on the day-off is only affected by time-of-day, and not by workload. Now consider the results. All tasks show a decrease in processing time from the morning to midday and an increase thereafter. This is not what would be expected, at least not for the simpler task (4S). One is tempted to conclude that especially the afternoon session, i.e. the additional 3.5 hours of work after the first working period, counteracted the effects of the diurnal rhythm. Now consider the differences between midday on the day-off and midday on the working day. RT is increased in the most difficult condition: 4D. The difference in RT between 4S and 4D, reflecting the effectiveness of time-sharing, is also increased. This suggests that mental fatigue diminishes the possibility of combining two different tasks. Heart-rate variability tends to be larger on working days, suggesting that less effort is invested, but see also below. Finally, heart rate decreases steadily across the day, while blood-pressure steadily increases. The level of blood-pressure on the morning of a working day is comparable with the level in the midday of the day-off, suggesting a gradual change in base-level. It is interesting that during morning and midday sessions systolic BP decreased during a session while in the evening session there was a small rise.
It seems as if the blood-pressure regulating system is able to relax only during the morning and midday sessions and is less able to deal with the experimental situation in the evening. However, one should realize that the absolute differences are quite small. Both IBI and SP-BP showed in this respect the same pattern of results as BP: an increase in the course of the working day, and values in the morning session on working days comparable to values on days-off at 13.00. Possibly, the increases in IBI (i.e. decreases in heart-rate) and SP-BP in the course of a day can be explained as effects of increased fatigue. It is known that fatigue results in lower heart-rate due to a greater parasympathetic control. According to Axelrod et al. (1981), a higher parasympathetic control results in higher energy in the mid-frequencies of the spectrum. Thus, we found effects of working both as a difference between working days and days-off, and as changes within working days. These effects were found for the cardiovascular variables and blood-pressure. It is interesting to speculate about a possible mechanism which might explain both the increase in blood-pressure and the decrease in heart-rate in the course of working days. Blood-pressure can be increased either by an increased heart-rate or by an increased peripheral resistance. Since heart rate decreases and diastolic blood-pressure increases, one is tempted to conclude that peripheral resistance increases during the day and that, due to the working of the baroreflex, heart-rate decreases in order to keep
blood-pressure within homeostatic levels. The most interesting result is the increasing rest minus dual task difference in heart-rate and heart-rate variability (SP-BP) in the course of working days. This difference reflects the physiological reaction to task performance within a laboratory session, corrected for the slower trends (in rest) over the day. It may be argued that this change in physiology, especially for the SP-BP variable, reflects the effort invested to maintain an acceptable level of performance. The results show that as the driver becomes tired in the course of the working day, he has to invest more effort to cope with task demands. The effects of work on these measures are not large. It may be that the present measures are not sensitive enough to measure mental fatigue. It is equally possible that the job of driving a bus in a city is not really as heavy as one would have expected. It is possible that the job is more physically demanding. However, these questions can only be answered if we compare other occupations with the same methodology. Such experiments are currently under way in our Institute.
ACKNOWLEDGEMENT
The research reported was supported by the Netherlands Organization for the Advancement of Basic Research (Z.W.O.).
REFERENCES
Aasman,J., Mulder,G., Mulder,L.J.M., O p e r a t o r E f f o r t an t h e Measurement o f H eart Rate V a r i a b i l i t y , 1987. Human Fa c t o r s , 29(2),161-170. Axelrod,S., Gordon,D., Abel,F.A., Shannon,D.C., Barger,A.C. and Cohen, R.J.,1981. Power spectrum a n a l y s i s o f h e a r t r a t e f l u c t u a t i o n : A q u a n t i t a t i v e probe o f b e a t - t o - b e a t c a r d i o v a s c u l a r c o n t r o l . Science, 213, 220-222. Blake,M.S.F (1967) Time o f day e f f e c t s on a range o f t a s k s . Psychonomic Science, 9, 349-350 Folkard,S. (1983). D i u r n a l V a r i a t i o n . G.R.J. Hockey (Ed.) S t r e s s and F a t i g u e i n Human Performance.Chisester: W iley. H i l d e b r a n d t , (1961) Rhythmus und Regulation.Med.Welt.,S.,
73-81.
Hockey,G.R.J.& Colquhoun,W.P.(1972) D i u r n a l v a r i a t i o n i n human p e r formance. I n W.P.Colquhoun (Ed.). Aspects o f human e f f i c i e n c y . London: E n g l i s h U n i v e r s i t y Press. Ho1ding.D.H. (1983) Fa t i g u e . I n G.R.J.Hockey i n Human Performance. C h i s e s t e r : Wi l e y .
(Ed.). S t r e s s and F a t i g u e
Jahns,D.W. (1973). A concept o f o p e r a t o r w orkload i n manual v e h i c l e o p e r a t i o n s . (T e c hn i c a l Report n o 4 ) . Meckingheim, West Germany: F orschungsinstitut Antropotechnik. Kleitman,N. (1963). Sleep and Wakefulness ( r e v .ed). Chicago: Chicago University. Massar0,D.W. (1975). Experimental Psychology and I n f o r m a t i o n Processing. Chi gago : Rand-McNal 1y
.
Meyman,T., Linden,A. v . d . , Kompier,M., B u t t er, E. (1983) Trends and i n t e r r e l a t i o n s o f absence f i g u r e s , psychosomatic c o m p l a i n t s and s l e e p c o m p l a i n t s o v er a 5 - y e a r p e r i o d i n c i t y bus d r i v e r s . Symposium working environment i n urban p u b l i c t r a n s p o r t . Stockholm. (Heymans B u l l e t i n s , HB-83 - 654 -EX, In s t it u t e f o r Experimental and O ccupat ional Psycho1 ogy , U n i v e r s i t y o f Groningen), Groningen, The Nether1 ands. Meyman,T.F., Zijlstra,F.R.H.(1986). The measurement o f p e r c e i v e d e f f o r t . Annual Congress Ergonomic S o c i e t y , i n Oborne D.J. Ed: Contemporary Ergonomics, 1986. London: T a y l o r & F r a n c i s . Mulder,G. (1980). The H e a r t o f Mental E f f o r t . Unpublished d o c t o r a l d i s s e r t a t i o n . U n i v e r s i t y o f Groningen, Groningen, The Net herlands. Mulder,G. (1986). Mental e f f o r t and i t s measurement. I n G.R.J.Hockey, A.W.K.Gaillard, and M.Coles (Eds.), E n e r g e t i c s i n Human I n f o r m a t i o n Processing. D ord re c h t : M a r t i n u s N i j h o f f .
Mulder, G. and Mulder, L.J.M. (1981). Task-related cardiovascular stress. In J. Long and A.D. Baddeley (Eds.), Attention and Performance, IX. Hillsdale, NJ: Erlbaum.
Mulders, H., Meijman, T., O'Hanlon, J.F., and Mulder, G. (1982). Differential psychophysiological reactivity of city bus drivers. Ergonomics, 25, 1003-1011.
Norman, D. and Bobrow, D.J. (1975). On data and resource limited processes. Cognitive Psychology, 7, 44-66.
Pruyn, A.T.H. (1986). Performance and activation under social evaluation. Delft: Eburon.
Pruyn, A.T.H., Aasman, J., and Wyers, A.A. (1985). Social influences on mental processes and cardiovascular activity. In J.F. Orlebeke, G. Mulder, and L.J.P. van Doornen (Eds.), The Psychophysiology of Cardiovascular Control. New York: Plenum Press.
Schneider, W. and Shiffrin, R.M. (1977). Controlled and automatic information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.
Schneider, W., Dumais, S.T., and Shiffrin, R.M. (1984). Automatic and control processing and attention. In R. Parasuraman and D.R. Davies (Eds.), Varieties of Attention. New York: Academic Press.
Sternberg, S. (1969). Memory scanning: Mental processes revealed by reaction time experiments. American Scientist, 57, 421-457.
Treisman, A.M. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 19, 1-18.
Van Dellen, H.J., Aasman, J., Mulder, L.J.M., and Mulder, G. (1985). Time domain versus frequency domain measures of heart-rate variability. In J.F. Orlebeke, G. Mulder, and L.J.P. van Doornen (Eds.), The Psychophysiology of Cardiovascular Control. New York: Plenum Press.
Zijlstra, F.R.H. and Doorn, L. van (1987). Construction of a scale to measure subjective effort (in press).
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) Elsevier Science Publishers B.V. (North-Holland), 1988
Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research
Sandra G. Hart
Aerospace Human Factors Research Division
NASA-Ames Research Center
Moffett Field, California

Lowell E. Staveland
San Jose State University
San Jose, California
ABSTRACT
The results of a multi-year research program to identify the factors associated with variations in subjective workload within and between different types of tasks are reviewed. Subjective evaluations of 10 workload-related factors were obtained from 16 different experiments. The experimental tasks included simple cognitive and manual control tasks, complex laboratory and supervisory control tasks, and aircraft simulation. Task-, behavior-, and subject-related correlates of subjective workload experiences varied as a function of difficulty manipulations within experiments, different sources of workload between experiments, and individual differences in workload definition. A multi-dimensional rating scale is proposed in which information about the magnitude and sources of six workload-related factors is combined to derive a sensitive and reliable estimate of workload.
INTRODUCTION
This chapter describes the results of a multi-year research effort aimed at empirically isolating and defining factors that are relevant to subjective experiences of workload and to formal evaluation of workload across a variety of activities. It includes information on how people formulate opinions about workload and how they express their subjective evaluations using rating scales. Despite much disagreement about its nature and definition, workload remains an important, practically relevant, and measurable entity. Workload assessment techniques abound; however, subjective ratings are the most commonly used method and are the criteria against which other measures are compared. In most operational environments, one of the problems encountered with the use of subjective rating scales has been high between-subject variability. We propose a rating technique by which variability is reduced. Another problem has been that the sources of workload are numerous and vary across tasks. The proposed rating technique, which is multidimensional, provides a method by which specific sources of workload relevant to a given task can be identified and considered in computing a global workload rating. It combines information about these factors, thereby reducing some sources of between-subject variability that are experimentally irrelevant, and emphasizing the contributions of other sources of variability that are experimentally relevant.
Conceptual Framework
We began with the assumption that workload is a hypothetical construct that represents the cost incurred by a human operator to achieve a particular level of performance. Thus, our definition of workload is human-centered, rather than task-centered (refs. I-12, I-22). An operator's subjective experience of workload summarizes the influences of many factors in addition to the objective demands imposed by the task. Thus, workload is not an inherent property, but rather it emerges from the interaction between the requirements of a task, the circumstances under which it is performed, and the skills, behaviors, and perceptions of the operator. Since many apparently unrelated variables may combine to create a subjective workload experience, a conceptual framework was proposed (ref. I-12) in which different sources and modifiers of workload were enumerated and related (Figure 1).
Imposed workload refers to the situation encountered by an operator. The intended demands of a task are created by its objectives, duration, and structure and by the human and system resources provided. The actual demands imposed by a task during its performance by a specific operator may be modified by a host of factors (e.g., the environment, system failures, operator errors) that are unique to that occurrence. These incidental factors may contribute either subtle or substantial sources of variability to the workload imposed by the task from one performance to the next.
[Figure 1 (diagram not reproduced). Recoverable labels: imposed workload, comprising task variables (objectives, temporal structure, system resources, environment) and incidental variables (system failures, operator errors, environmental changes, state of the operator); selection of strategies; operator capabilities (sensory/motor skills, cognitive skills, knowledge base); operator's perception of task goals, structure, and performance; preconceptions and biases; subjective experience; physiological consequences; direct feedback and knowledge of results.]
Figure 1. Conceptual framework for relating variables that influence human performance and workload.
System response refers to the behavior and accomplishments of a man-machine system. Operators are motivated and guided by the imposed demands, but their behavior also reflects their perceptions about what they are expected to do and the strategies, effort, and system resources expended to accomplish the task objectives. Operators exert effort in a variety of ways. Physical effort is the easiest to conceptualize, observe, and measure, yet its importance in advanced systems is diminishing. Mental effort serves as a potent intervening variable between measurable stimuli and measurable responses, but it is difficult to quantify directly. System performance represents the product of an operator's actions and the limitations, capabilities, and characteristics of the system controlled. Performance feedback provides operators information about their success in meeting task requirements, allowing them to adopt different strategies or exert different levels of effort to correct their own errors. Experienced workload and physiological consequences reflect the effect on an operator of performing a task. It is the subjective experience of workload that is the legitimate domain of subjective ratings. However, it is not likely that an operator's experience of workload is a simple combination of the relevant factors. Moreover, ratings may be biased by preconceptions. Since operators are unlikely to be aware of every task variable or the processes that underlie their decisions and actions, their experiences will not reflect all relevant factors. In addition, they are influenced by preconceptions about the task and their definition of workload. Thus, we draw a distinction among the level of workload that a system designer intends to impose, the responses of a specific man-machine system to a task, and operators' subjective experiences. The importance of subjective experiences extends beyond their association with subjective ratings.
The phenomenological experiences of human operators affect subsequent behavior, and thus affect their performance and physiological responses to a situation. If operators consider the workload of a task to be excessive, they may behave as though they are overloaded, even though the task demands are objectively low. They may adopt strategies appropriate for a high-workload situation (e.g., shedding tasks, responding quickly), experience psychological or physiological distress, or adopt a lower criterion for performance.
Information Provided by Subjective Ratings
In comparison with other workload assessment methods (refs. I-15, I-22), subjective ratings may come closest to tapping the essence of mental workload and provide the most generally valid and sensitive indicator. They provide the only source of information about the subjective impact of a task on operators and integrate the effects of many workload contributors. However, there are practical problems associated with translating a personal experience of workload into a formalized workload rating. People often generate evaluations about the difficulty of ongoing experiences and the impact of those experiences on their physical and mental state. However, they rarely quantify, remember, or verbalize these fleeting impressions. In fact, they may not identify their cause or effect with the concept of "workload" at all. They are aware of their current behavior and sensations and the results of cognitive processes, although they are not aware of the processes themselves (refs. I-8, I-18). Only the most recent information is directly accessible for verbal reports from short-term or working memory. Thus, a great deal of information may be available as an experience occurs; however, the experience of each moment is replaced by that of the next one. The workload of an activity may be recalled or re-created, but the evaluation is limited to whatever information was remembered, incidentally or deliberately, during the activity itself. For these and other reasons, subjective ratings do not necessarily include all of the relevant information and they may include information that is irrelevant. Workload is experienced as a natural consequence of many daily activities. However, a formal requirement to quantify such an experience using experimentally-imposed rating scales
is not a natural or commonplace activity and may result in qualitatively different responses. For this reason, Turksen and Moray (ref. I-25) suggested that the less precise "linguistic" approach provided by fuzzy logic might be appropriate for workload measurement because people naturally describe their experiences with verbal terms and modifiers (e.g., "high", "easy", or "moderate") rather than with numerical values. If workload is a meaningful construct, however, it should be possible to obtain evaluations in a variety of ways either while a task is being performed or at its conclusion. A formal requirement to provide a rating does encourage subjects to adopt a more careful mode of evaluation, to express their judgements in a standardized format, and to adopt the evaluation criteria imposed by the experimenter. Workload evaluations are typically given with reference to arbitrary scales labeled with numbers or verbal descriptions of the magnitudes represented by extreme values. These often have no direct analog in the physical world. Since it is unlikely that individuals remember specific instances of low, medium or high workload to serve as a mental reference scale labeled "workload", absolute judgements or comparisons across different types of tasks are not generally meaningful. For features that can be measured in physical units, it is possible to distinguish among absolute, relative and value judgements from the objective information available. For workload ratings, it is relatively more difficult to distinguish between an "objective" magnitude estimate and a judgement made in comparison to an internal reference. Rating formats might include discrete numeric values, alternative descriptors, or distances marked off along a continuum. Finally, rating scales might be single-dimensional or multi-dimensional, requiring judgements about several task-related or psychological variables.
Evaluating Ill-Defined Constructs
It is likely that the cognitive evaluation processes involved when people make workload assessments are similar to those adopted when they evaluate other complex phenomena. Evaluation is typically a constructive process, operating on multiple attributes of available information. It relies on a series of inferences in which the weight and value that an individual places on each piece of information may be unique and refers to their existing knowledge base (ref. I-1). Some evaluations are relatively direct, based on immediate sensory or perceptual processes, whereas others involve organization of background knowledge, inference, and relating existing knowledge to different aspects of the current situation. We feel that the experience of workload represents a combination of immediate experiences and preconceptions of the rater and is, therefore, the result of constructive cognitive processes. In making many judgements, people apply heuristics that are natural to them and seem to be appropriate to the situation. Heuristics simplify evaluation and decision processes
because they can be applied with incomplete information, reducing the parameters that must be considered by relating the current situation to similar events in the rater's repertoire. However, their use may lead to systematic biases (ref. I-26). Different components of a complex construct may be particularly salient for one individual but not for another and for one situation but not another. Thus, different information and rules-of-thumb may be considered. The heuristics used to generate evaluations of various physical features can be determined systematically. This is done by varying different features of an object and comparing the evaluations to the objective magnitudes of the components. If there is a direct mapping between an increase in a relevant physical dimension and the obtained evaluation, the nature of the relationship can be identified. These relationships are not likely to be linear, however. Rather, noticeable differences in one or more dimensions are proportional to the magnitude of the change. In addition, by varying the wording of written or verbal instructions, or presenting different reference objects, the basis and magnitude of judgements can be manipulated (refs. I-10, I-11).
When people evaluate the workload of a task there is no objective standard (e.g., its "actual" workload) against which their evaluations can be compared. In addition, there are no physical units of measurement that are appropriate for quantifying workload or many of its component attributes. This absence of external validation represents one of the most difficult problems encountered in evaluating a candidate workload assessment technique or the accuracy of a particular rating. There is no objective workload continuum, the "zero" point and upper limits are unclear, and intervals are often arbitrarily assigned. The problem of a "just noticeable difference" is particularly acute in workload assessment, since rating dimensions are often indirectly related to objective, quantifiable, physical dimensions. The attributes that contribute to workload experiences vary between tasks and between raters because workload is not uniquely defined by the objective qualities of the task demands; workload ratings also reflect an operator's response to the task. Thus, the workload experiences of different individuals faced with identical task requirements may be quite different because the relationship between objective changes in a task and the magnitudes of workload ratings is indirect rather than direct. This factor distinguishes workload ratings from many other types of judgements. Furthermore, if workload is caused by one particularly salient source or by very high levels of one or more factors, then it is likely that other factors will not be considered in formulating a workload judgement. Specific workload-related dimensions might be so imperative, or so imbedded in a particular context, that they contaminate other, less subjectively salient factors. Conversely, less salient factors cannot be evaluated without also considering those that are more salient.
Individuals' Workload Definitions
Two facets of subjective workload experiences are of interest: the immediate, often unverbalized impressions that occur spontaneously, and a rating produced in response to an experimental requirement. It is unlikely that the range of ratings that subjects typically give for the same task reflects misinterpretation of the question; most people have some concept of what the term workload means. However, they use the most natural way to think about it for themselves. Individuals may consider different sets of variables (which may be identical to those the experimenter intended) because they define (and thus experience) workload in different ways. The amount of "work" that is "loaded" on them, the time pressure under which a task is performed, the level of effort exerted, success in meeting task requirements, or the psychological and physiological consequences of the task represent the most typical definitions. Thus, one individual's "workload" rating may reflect her assessment of task difficulty while another's might reflect the level of effort he exerted. It is impossible to identify the source or sources of a workload rating from the magnitude of the numeric value. In general, people are unaware of the fuzziness of their own definitions or the possibility that theirs might be different than someone else's. Given more information about what factors they should consider, they can evaluate these factors (e.g., they can rate stress, fatigue, frustration, task demands, or effort) even though they might not naturally include them in a subjective experience of workload. However, it seems to be intuitively unlikely that their global, personal experiences of workload would be affected by instruction to consider only one or two aspects of a situation.
Thus, we assume that workload represents a collection of attributes that may or may not be relevant in controlling assessments and behavior. They depend on the circumstances and design of a given task and the a priori bias of the operator. The natural inclinations of different individuals to focus on one task feature or another may be overwhelmed by the types and magnitudes of factors that contribute to the workload of a specific task. For example, the workload of one task might be created by time pressure, while that of another might be created by the stressful conditions under which it was performed. The workload of each task
can be evaluated, but the two apparently comparable ratings would actually represent two different underlying phenomena.
Sources of Rating Variability
Workload ratings are subject to a variety of task- and operator-specific sources of variability, some of which have been mentioned above (e.g., identifiable biases held by the raters or the objective manipulations of task parameters). Others represent the less predictable, but measurable, behavioral responses of operators to the task. The remainder are more difficult to identify: differences in sensitivity to the types and magnitudes of task manipulations, motivation, expectations, and subjective anchor points and interval values. The large between-subject variability characteristic of subjective ratings does not, therefore, occur exclusively as a consequence of random error or "noise". Instead, many of the sources of variability can be identified and minimized through giving instructions, calibrating raters by demonstrating concrete examples, providing reference tasks, and identifying subjective biases and natural inference rules. The workload experiences of operators are difficult to modify, but the procedures with which evaluations are obtained can be designed to reduce unwanted between-subject sources of variability.
Research Approach
The goal of the research described below was to develop a workload rating scale that provides a sensitive summary of workload variations within and between tasks that is diagnostic with respect to the sources of workload and relatively insensitive to individual differences among subjects. We formulated a conceptual framework for discussing workload that was based on the following assumptions: workload is a hypothetical construct; it represents the cost incurred by human operators to achieve a specific level of performance and is not, therefore,
uniquely defined by the objective task demands; it reflects multiple attributes that may have different relevance for different individuals; and it is an implicit combination of factors. Although the experience of workload may be commonplace, the experimental requirement to quantify such an experience is not. Nevertheless, subjective ratings may come closest to tapping the essence of mental workload and provide the most generally valid, sensitive and practically useful indicator. The ability of subjects to provide numerical ratings has received limited theoretical attention because ratings are subject to "undesirable" biases. In fact, these biases may reflect interesting and significant cognitive processes (ref. I-1). In addition, although there may be wide disagreement among subjects in the absolute values of ratings given for a particular task, the rank-ordering of tasks with respect to workload is quite consistent and the magnitudes of differences in ratings among tasks are reasonably consistent. There is a common thread that unites subjective ratings that can be termed "workload". The problem is how to maximize the contribution of this unifying component to subjective ratings, and to identify and minimize the influences of other, experimentally irrelevant, sources of variability.
To accomplish this, a set of workload-related factors was selected and subjective ratings were obtained in order to determine the following: (1) What factors contribute to workload? (2) What are their ranges, anchor points, and interval values? (3) What subset of these factors contributes to the workload imposed by specific tasks? and (4) What do individual subjects take into account when experiencing and rating workload? The following sections review the results of a series of experiments that were undertaken to provide such a data base. The goal was to provide empirical evidence about which factors individuals do, or do not, associate with the experience of workload and the rules by which these factors are combined to generate ratings of overall workload. First, we analyzed the data within each experiment to determine the sensitivity of individual scales, overall workload (OW) ratings, and weighted workload (WWL) scores to experimental manipulations. Next, the data from similar experiments were merged into six
categories. Correlational and regression analyses were performed on these data, as well as on the entire data base, to determine (1) the statistical association among ratings and (2) the degree to which these scales, taken as a group, predicted OW ratings. The results of these analyses were then used to select a limited set of subscales and the weighting procedure for a new multi-dimensional workload rating technique. We found that, although the factors that contributed to the workload definitions of individual subjects varied as predicted, task-related sources of variability were better predictors of global workload experiences than subjective biases. A model of the psychological structure of the subjective workload estimation process evolved from the analyses performed on this data base. It is presented in Figure 2. This model represents the psychological structure of subjective workload evaluations. It is adapted from a similar structure proposed by Anderson (ref. I-1) for stimulus integration, since the process of workload assessment is almost certainly an integrative process in which external events are translated into subjective experiences and overt responses. The objective mental, physical, and temporal demands (MD, PD, and TD) that are imposed by a task are multi-dimensional and may or may not covary. They are characterized by objective magnitudes (M) and levels of importance (I) specific to a task. When the requirements of a task are perceived by the performer, their significance, magnitudes, and meaning may be modified somewhat depending on his level of experience, expectations, and understanding. These psychological variables, which are counterparts to the objective task variables, are represented by md, pd, and td. They yield emotional (e.g., FR), cognitive, and physical (e.g., EF)
[Figure 2 (diagram not reproduced): task-related factors and subject-related factors leading to an overt response. Legend:
PD, MD, TD: Objective physical, mental and temporal task demands
M, I: Objective magnitudes and importance of sources of demands
pd, md, td: Psychological representations of task demands
BR: Behavioral responses to task demands
OP, EF, FR: Subjective responses/evaluations of behavioral responses
w: Subjective weighting of factors
Ewl: Integrated subjective experience of workload
Rwl: Formal numeric or verbal evaluation of workload]
Figure 2. A model of the subjective workload estimation process.
responses that may be evidenced as measurable overt behaviors (BR). The results of the individual's actions may be self-evaluated (e.g., OP), thereby leading to adjustments in the levels or types of responses or a re-evaluation of task requirements. These subjective evaluations, too, may or may not covary with each other and, although they are related to the objective demands, specific stimulus attributes may differentially influence behavior under different circumstances. Subjectively weighted (w) combinations of such variables can be integrated into a composite experience of workload (Ewl). This implicit experience may be converted into an explicit workload rating (Rwl) in response to an experimental requirement. The resulting values do not represent inherent properties of the objective demands. Rather, they emerge from their interaction with a specific operator. In order to predict and understand the relationship between objective task manipulations and rated workload, the salient factors and the rules by which they are objectively and subjectively combined must be identified and an appropriate procedure developed to obtain an accurate summary evaluation. Thus, two types of information are needed about each factor included in a multidimensional workload scale: (1) its subjective importance as a source of loading for that type of task (its weight), and (2) its magnitude in a particular example of the task (the numerical value of a rating). For example, the mental demands of a task can be the most salient feature of its demand structure, although the amount of such demands can vary from one version of the task to another. Conversely, the value of one might vary at different levels of the other: time pressure might become relevant only when it is high enough to interfere with performance.
A rating scale is proposed, the NASA-Task Load Index (NASA-TLX), that consists of six component scales. An average of these six scales, weighted to reflect the contribution of each factor to the workload of a specific activity from the perspective of the rater, is proposed as an integrated measure of overall workload. Finally, the results of a validation and reliability study are described. See Reference Section III for a listing of recent experimental uses of the NASA-TLX.
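As a reading aid, the weighted average just described can be written out explicitly. This is only a sketch under stated assumptions: the symbols r_i (the rating on component scale i) and w_i (the weight the rater assigns to that scale), and the normalization by the sum of the weights, are introduced here for illustration and are not notation taken from the chapter.

```latex
% Assumed notation: r_i = rating on component scale i, w_i = its weight.
\mathrm{WWL} \;=\; \frac{\sum_{i=1}^{6} w_i \, r_i}{\sum_{i=1}^{6} w_i}
```

Under this form, a factor with a weight of zero drops out of the score entirely, and a rater who weights all six scales equally obtains the simple mean of the six ratings.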
Research Objectives and Background
Our first step was to ask people engaged in a wide range of occupations to identify which of 19 factors were subjectively equivalent to workload, related to it, or unrelated (ref. I-13). Surprisingly, none of the factors was considered to be irrelevant by more than a few raters, and at least 14 of the factors were considered to be subjectively equivalent to workload by more than 60% of them. No relationship between the response patterns and the evaluators' educational or occupational backgrounds was found. Our next step was to ask several groups of subjects to evaluate their experiences with respect to the 14 most salient factors following a variety of laboratory and simulated flight tasks (refs. I-2, I-14, I-29). Different concepts of workload were identified by determining which component ratings covaried with an overall workload rating that was provided by each subject after each experimental condition. Several factors (e.g., task difficulty and complexity, stress, and mental effort) were consistently related to workload across subjects and experiments. Other factors (e.g., time pressure, fatigue, physical effort, and own performance) were closely related under some experimental conditions, and not under others. Again, the most salient factors were selected and a set of 10 bipolar rating scales was developed (Figure 3): Overall Workload (OW), Task Difficulty (TD), Time Pressure (TP), Own Performance (OP), Physical Effort (PE), Mental Effort (ME), Frustration (FR), Stress (ST), Fatigue (FA), and Activity Type (AT). AT represented the levels of behaviors identified by Rasmussen (ref. I-19): skill-based, rule-based, and knowledge-based. It has been suggested that the three levels of behavior are associated with increasing levels of workload (refs. I-16, I-
FIGURE 3. RATING SCALE DESCRIPTIONS

Title (Endpoints): Description

OVERALL WORKLOAD (Low, High): The total workload associated with the task, considering all sources and components.
TASK DIFFICULTY (Low, High): Whether the task was easy or demanding, simple or complex, exacting or forgiving.
TIME PRESSURE (None, Rushed): The amount of pressure you felt due to the rate at which the task elements occurred. Was the task slow and leisurely or rapid and frantic?
PERFORMANCE (Failure, Perfect): How successful you think you were in doing what we asked you to do and how satisfied you were with what you accomplished.
MENTAL/SENSORY EFFORT (None, Impossible): The amount of mental and/or perceptual activity that was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.).
PHYSICAL EFFORT (None, Impossible): The amount of physical activity that was required (e.g., pushing, pulling, turning, controlling, activating, etc.).
FRUSTRATION LEVEL (Fulfilled, Exasperated): How insecure, discouraged, irritated, and annoyed versus secure, gratified, content, and complacent you felt.
STRESS LEVEL (Relaxed, Tense): How anxious, worried, uptight, and harassed or calm, tranquil, placid, and relaxed you felt.
FATIGUE (Exhausted, Alert): How tired, weary, worn out, and exhausted or fresh, vigorous, and energetic you felt.
ACTIVITY TYPE (Skill Based, Rule Based, Knowledge Based): The degree to which the task required mindless reaction to well-learned routines or required the application of known rules or required problem solving and decision making.
25). Each scale was presented as a 12-cm line with a title (e.g., MENTAL EFFORT) and bipolar descriptors at each end (e.g., HIGH/LOW). Numerical values were not displayed, but values ranging from 1 to 100 were assigned to scale positions during data analysis. This set of scales was used to evaluate the experiences of subjects in 25 different studies. The ratings were obtained after each experimental task. The results obtained in 16 of these experiments are the focus of the current chapter. Since the research questions and environments differed from one experiment to the next, the data base includes a broad set of experiences in which the associations among workload-related factors, global ratings of workload, and measures of performance could be evaluated.

The relative importance of the nine component factors to each subject's personal definition of workload was determined in a pretest. All possible pairs (n = 36) of the nine factors were presented in a different random order to each subject. The member of each pair selected as most relevant to workload was recorded and the number of times each factor was selected was computed. The resulting values could range from 0 (not relevant) to 8 (more important than any other factor). The more important a factor was considered to be, the more weight the ratings of that factor were given in computing an average weighted workload score (WWL) for each experimental condition. These data were obtained for two reasons: (1) to examine the relationship between the expressed biases of subjects about each factor and the associations between the magnitude of the ratings for the same factors and rated OW, and (2) to use these as weights in combining the nine bipolar ratings to produce a workload score that emulated the heuristics that subjects reported using.
In computing the weighted workload scores, we assumed the following:
(1) The factors considered in formulating a single OW rating varied from one subject to the next, contributing to between-subject (B-S) variability.
(2) Subjects would be able to evaluate all of the factors (even though they might not normally consider them in evaluating workload).
(3) The subjects could judge the magnitudes of the component factors more accurately and with less B-S variability than they could the fuzzier concept of OW.
(4) The ratings the subjects made might represent the "raw data" for subjects' natural inference rules.
(5) By combining these component judgements according to each subject's own inference rules (as reflected in the workload weights), an estimate of workload could be derived (WWL) that would be less variable between subjects than ratings of OW.
(6) The combination rules would be linear.
(7) The weighted averaged ratings would reflect the general importance of the factors to individual subjects and their rated magnitudes in a given task.

Our goal was to determine which scales best reflected experimental manipulations within experiments, differentiated among different types of activities, provided independent information, and were subjectively and empirically associated with global workload ratings. To accomplish this, we attempted to obtain information about the individual and joint relationships among the nine factors, OW, and experimental manipulations from many perspectives to obtain the most complete understanding of the underlying functions.
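The weighting procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example subject, and the example ratings are hypothetical, and dividing by the sum of the weights (36) is assumed to be what "average weighted workload score" means.

```python
from itertools import combinations

FACTORS = ["TD", "TP", "OP", "PE", "ME", "FR", "ST", "FA", "AT"]


def tally_weights(choices):
    """Count how often each factor was picked across the pairwise comparisons.

    `choices` maps each unordered pair (a frozenset of two factor names)
    to the factor the subject judged more relevant to workload.
    """
    pairs = [frozenset(p) for p in combinations(FACTORS, 2)]
    if set(choices) != set(pairs):
        raise ValueError("expected one choice for each of the 36 pairs")
    weights = {f: 0 for f in FACTORS}
    for pair, winner in choices.items():
        if winner not in pair:
            raise ValueError("winner must be a member of its pair")
        weights[winner] += 1
    return weights  # each value lies in 0..8 and the values sum to 36


def weighted_workload(ratings, weights):
    """Weighted average of the nine ratings (a linear combination rule)."""
    total = sum(weights.values())
    return sum(weights[f] * ratings[f] for f in FACTORS) / total


# Hypothetical subject who always prefers the factor listed earlier in FACTORS.
choices = {frozenset(p): p[0] for p in combinations(FACTORS, 2)}
weights = tally_weights(choices)

# Hypothetical ratings (1-100 scale) for one experimental condition.
ratings = {"TD": 60, "TP": 70, "OP": 40, "PE": 10, "ME": 65,
           "FR": 30, "ST": 35, "FA": 20, "AT": 50}
wwl = weighted_workload(ratings, weights)
```

Because every one of the 36 comparisons awards exactly one point, each subject's nine weights always sum to 36, so a higher weight for one factor necessarily comes at the expense of the others.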
OVERALL RESULTS

The experiments included in the data base described in this chapter are listed in Reference Section II. Each one was analyzed individually and the relationships among performance measures, ratings, WWL scores, and experimental variables have been reported elsewhere. Thus, specific experimental results will not be described below. Instead, more global statements germane to the definition and evaluation of workload in general will be made for categories of similar experiments and the entire data base. Although many of the same subscales and the weighting technique were used in other experiments, these were not included either because the raw data were not readily available or because one or more subscales were not used (refs. I-5, I-17, I-27, I-28).
The data were divided into two "population" data bases. The rating data base contained 3461 entries for each of the 10 scales and WWL. The weight data base contained the workload biases given by the same 247 subjects. Figure 4 presents the average weights given to the nine factors (4a) and the average ratings (4b). Tables 1a and 1b show the correlations among the weights placed on each factor and among the ratings, respectively. Figure 5 presents the relative frequency distributions of obtained ratings and WWL scores.

A variety of statistical analyses were performed within individual experiments to demonstrate the effectiveness of the experimental manipulations. They included analyses of variance and correlations among measures of workload and performance. In addition, multiple correlations among individual rating scales were performed, the coefficients of variation (SD/Mean) for OW and for WWL were computed for individual experimental conditions, and sensitivity tests were conducted to compare the percentages of variance accounted for by the OW rating scale and the WWL score.

Additional analyses were also performed on the groups of data in each category and for the entire data base. Non-parametric Kolmogorov-Smirnov tests (ref. I-23) were performed to compare distributions of ratings given for each scale among the categories of experiments and against the "population" data base. Standard multiple correlations were performed among the scales and among the workload-importance weights. The individual scales were correlated with OW to determine the associations of each one with the more global construct across all categories and within each category. In addition, all nine scales were regressed against OW to determine the percent of variance in OW ratings for which their linear combination accounted.
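For readers unfamiliar with the two-sample Kolmogorov-Smirnov comparison mentioned above, the following sketch computes its test statistic, the largest vertical gap between two empirical cumulative distributions. The function name and both samples are hypothetical; the sketch does not compute significance levels and is not a reproduction of the analysis actually run.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical ratings (1-100 scale) from one category and from the pooled data.
category = [10, 15, 20, 25, 30, 35]
population = [20, 30, 40, 50, 60, 70]
d = ks_statistic(category, population)
```

A statistic near 0 means the two rating distributions are nearly identical; a statistic near 1 means they barely overlap.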
Stimulus attributes were under only limited experimental control and may have been too inter-correlated to discriminate among the range of individual dimensions represented in either individual or collective experiments. Furthermore, the variability in generating workload ratings may not have depended solely on the experimentally imposed tasks (ref. I-1) because raters may or may not have perceived the task parameters in the same way (which could lead to a subject by task interaction). Finally, the fact that there was multi-collinearity among the component scales suggests that the beta weights for individual factors may not have reflected their individual and joint predictive power. Nevertheless, the beta weights (Table 2a) taken in conjunction with the correlations between each factor and OW enabled us to identify the primary sources of workload in each type of task. For simplicity's sake, any correlation that accounted for more than 50 percent of the variance will be considered. The squared correlation coefficients for each factor with OW are presented in Table 2b.
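The 50 percent criterion can be made concrete with a short sketch. The squared correlations below are the POPULATION values as read from Table 2b (the scan is degraded, so treat them as a best reading); the function name and variable names are hypothetical.

```python
# Squared correlations with OW for the pooled data (POPULATION row, Table 2b),
# as read from a degraded scan.
R2_WITH_OW = {"TD": .69, "TP": .36, "OP": .25, "PE": .27, "ME": .53,
              "FR": .39, "ST": .38, "FA": .16, "AT": .09}

THRESHOLD = 0.50  # "more than 50 percent of the variance"


def factors_above(r2_by_factor, threshold=THRESHOLD):
    """Return the factors whose r-squared exceeds the threshold, largest first."""
    kept = [f for f, r2 in r2_by_factor.items() if r2 > threshold]
    return sorted(kept, key=lambda f: r2_by_factor[f], reverse=True)


primary = factors_above(R2_WITH_OW)

# A correlation must exceed this magnitude for r-squared to pass 0.50.
min_abs_r = THRESHOLD ** 0.5
```

Since the criterion is stated in terms of variance, it corresponds to a correlation of roughly 0.71 or more in absolute value, which is why several clearly significant correlations in the tables still fall short of it.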
Weights

Although there was considerable disagreement among subjects about which combinations of factors best represented their concept of workload, some consistent trends were observed (Figure 4a). TP was considered the most important variable, followed by FR, ST, ME and TD. PE was considered the least important variable and FA and AT were also relatively unimportant. The importance assigned to each factor appeared to be relatively independent of that assigned to any other (Table 1a). To some extent this is an artifact of the pairwise comparison technique with which the weights were obtained; every decision in favor of one member of a pair of factors was made at the expense of whatever factor was not selected. The greatest statistical association was found between AT and ST (-0.50) or FR (-0.40); if the type of activity performed was considered particularly important, feelings of ST or FR were not considered relevant, and vice versa. The next highest degree of association was found between OP and FA (-0.46) or ST (-0.35); subjects who equated workload with success or failure on a task did not consider their feelings of FA or ST to be relevant and vice versa. This suggests that there may be at least two patterns of workload definition: one based on task and
Table 1a: POPULATION
Correlations among subjective importance values of 9 workload-related factors

        TD     TP     OP     PE     ME     FR     ST     FA
TP    -.24
OP    -.31    ...
PE    -.24    ...   -.07
ME     .05    .16   -.01   -.05
FR     .07   -.37   -.21   -.26   -.30
ST    -.03   -.21   -.24   -.35   -.28    .32
FA    -.17   -.21   -.46    .03   -.36    .10    .24
AT    -.08    .08    .08    .17    .30   -.40   -.34   -.50
Table 1b: POPULATION
Correlations among raw bipolar ratings and OW

       TD    TP    OP    PE    ME    FR    ST    FA    AT
TP    .64
OP    .58   .50
PE    .53   .57   .38
ME    .76   .58   .53   .41
FR    .65   .60   .68   .45   .61
ST    .63   .66   .48   .56   .60   .71
FA    ...   .33   .40   .40   .37   .51   .52
AT    ...   .29   .11   .20   .30   .21   .21   .11
OW    ...   .60   .50   .52   .73   .63   .62   .40   .30

Table 2a
Beta weights for ratings regressed on OW (* = p < .01)

                    r2     TD     TP     OP     PE     ME     FR     ST     FA     AT
SINGLE-COGNITIVE   .75   .50*   .02    .13*   .06    .16   -.03    .09*   .07*   .06
SINGLE-MANUAL      .81   .47*   .13*  -.14*   .11*   .28*  -.02    .26*  -.03   -.02
DUAL-TASK          .85   .49*   .11*  -.11*   .13*   .34*   .01    .03    .10*  -.01
FITTSBERG          .80   .56*   .03    .05    .04    .18*   .04    .10*   .02    .06
POPCORN            .65   .48*   .23*  -.12*   .02   -.07*   .17*   .09*  -.08*   .07*
SIMULATIONS        .77   .79*   .03    .05    .04    .22*  -.10*   .05   -.10*   .09*
POPULATION         .73   .55*   .09*  -.02    .07*   .21*   .01    .10*  -.01    .01
Table 2b
Variance in OW accounted for by each factor for each experimental category

                    TD    TP    OP    PE    ME    FR    ST    FA    AT
SINGLE-COGNITIVE   .69   .26   .25   .14   .52   .41   .30   .17   .14
SINGLE-MANUAL      .69   .36   .19   .26   .58   .48   .52   .20   .05
DUAL-TASK          .77   .58   .34   .36   .71   .49   .50   .19   .18
FITTSBERG          .74   .44   .15   .26   .58   .48   .38   .18   .16
POPCORN            .59   .55   .29   .19   .40   .37   .37   .09   .09
SIMULATIONS        .74   .13   .14   .18   .42   .11   .20   .04   .01
POPULATION         .69   .36   .25   .27   .53   .39   .38   .16   .09
performance related factors and another based on the subjective and physiological impact of tasks on the performer.

Ratings

The grand means of the 10 scales across all of the experiments were not equivalent (Figure 4b). This suggests either that the range of tasks was not sufficiently representative of the possible ranges for different scales, or that the bipolar descriptions used to anchor the scales were not subjectively equivalent. Average ratings given for the 10 scales ranged from 25 (PE) to 42 (ME). Overall rating variability was relatively consistent across the ten scales (SDs ranged from 20 to 24). As expected, the WWL scores were less variable (SD = 17).
Figure 5 depicts the frequency distributions of ratings obtained across all experiments and subjects for each factor. The relative frequencies represent the average magnitude of ratings on each factor scaled in 10 point increments. The distributions of individual scales were quite different. TD, OP, ME, and OW ratings, and WWL scores were normally distributed across subjects and experiments. TP, ST, FA, and PE distributions were skewed; most of the ratings were relatively low, but there were instances in which very high values were given. AT ratings were bimodally distributed. The peaks centered between the points designated "skill-based" and "rule-based" and between those designated as "rule-based" and "knowledge-based". Each distribution was compared to every other using the Kolmogorov-Smirnov test. Significant differences were found among all of the distributions except among OW, TD, and TP. The greatest differences were found between WWL scores (which combine elements from all of the other scales weighted to reflect the individual subject's biases) and the individual scales. The rank-order correlation between mean OW ratings and WWL scores within each experiment and across all experiments was very high (0.99). However, the coefficients of variation were substantially less for the WWL scores (0.39) than for OW ratings (0.48). Thus, the reduction in variability found for WWL scores was not simply due to the smaller magnitudes of these scores (mean = 35) compared to OW ratings (mean = 39) but represented a meaningful reduction of unwanted "noise". Thus, the linear combination of ratings, weighted according to the information available about each subject's natural inference rules, discriminated among experimental conditions at least as well as a single OW rating. More significant,
Figure 4. Summary of a priori importance (4a) and task-related magnitudes (4b) assigned to ten factors by all subjects (Ns = 247) and for all experimental conditions (Ns X Nc = 3461). Panel 4a: rated importance of workload-related factors (TD, TP, OP, PE, ME, FR, ST, FA, AT). Panel 4b: average subjective ratings of workload-related factors (TD, TP, OP, PE, ME, FR, ST, FA, AT, OW).
however, was the finding that B-S variability was less for WWL scores than for OW ratings in every experiment. The coefficients of variation were computed for each experimental condition and averaged for each experiment. They ranged from 0.19 to 0.73 for OW ratings and from 0.17 to 0.60 for WWL scores. The average reduction in variability was 20% between OW ratings and WWL scores, although it was as great as 46% for some experiments. Also, in all cases, differentially weighting the bipolars to produce WWL reduced B-S variability and increased sensitivity to experimental manipulations beyond that which could be obtained by computing a simple average of individual scales. The B-S variability of the equal weighting scheme fell between that of WWL and the OW ratings. Thus, we were able to synthesize a workload estimate from the elemental values given by the subjects (the bipolar ratings) by combining them according to an approximation of their own inference rules (the weights). This derived score appeared to reflect a common factor in each experimental condition (its overall workload), but with less variability among subjects than OW ratings.
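The variability comparison above rests on the coefficient of variation (SD/Mean). The sketch below shows the calculation and the relative reduction it implies. The two rating lists are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which form was used; only the pooled coefficients 0.48 and 0.39 are taken from the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """SD divided by the mean (sample SD assumed here)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / m


def relative_reduction(cv_before, cv_after):
    """Fraction by which variability drops when moving from one score to another."""
    return (cv_before - cv_after) / cv_before


# Pooled coefficients quoted in the text: 0.48 for OW ratings, 0.39 for WWL scores.
pooled_reduction = relative_reduction(0.48, 0.39)  # about 0.19

# Hypothetical ratings from five subjects in one experimental condition.
ow_ratings = [20, 35, 50, 65, 80]
wwl_scores = [30, 38, 45, 52, 60]
cv_ow = coefficient_of_variation(ow_ratings)
cv_wwl = coefficient_of_variation(wwl_scores)
```

Dividing by the mean is what lets the comparison stand even though WWL scores were smaller on average than OW ratings: a lower coefficient reflects less spread relative to the size of the scores, not merely smaller numbers.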
A significant, positive association was found among many of the rating scales (Table 1b). Most of the correlations were significant, because so many data points were included, but not all of them accounted for a meaningful percentage of variance. The highest correlations were found between ME and TD (0.76) and between ST and FR (0.71); however, only the correlations between TD and OW and between ME and OW accounted for more than 50 percent of the variance (Table 2b).
TD, ME, and ST had the highest loadings in the regression equation that related ratings on the nine component factors to OW (0.55, 0.21, and 0.10, respectively) (Table 2a). Although FR was significantly correlated with OW, it contributed nothing to the OW regression equation. This could reflect the fact that it was so highly correlated with most of the other factors (e.g., TD, TP, OP, ME, ST, FA) that it did not contribute independently to OW. TP, often considered to be a primary component of workload, contributed surprisingly little to the regression equation (loading = 0.09). It is possible that this occurred because TP was not deliberately manipulated as a source of loading in many of the experiments. AT was notably unrelated to the other factors and did not contribute significantly to the OW regression equation. FA, also, was relatively unrelated to the other scales, most likely because the effects of fatigue were counterbalanced across experimental conditions (by varying the order of presentation for different levels) in most of the studies.

It is interesting to compare the associations between the nine factors and workload as expressed in the preliminary pairwise comparisons to the empirical relationships observed between ratings on the same factors and OW ratings. Table 3 summarizes the a priori evaluations (the weights), the loadings for each factor in the OW regression equation, and the correlations between ratings on each scale and OW ratings across all subjects and experimental conditions. As you can see, there were some discrepancies.

Table 3
A priori rank-order of factors (weights) compared to empirical associations with OW ratings

       Weight   Loading   Correlation with OW
TP      4.75      .09        .60
TD      4.50      .55        .83
ME      4.36      .21        .73
OP      3.95     -.02        .50
ST      4.56      .10        .62
FR      4.51      .01        .63
FA      3.56     -.01        .40
AT      3.60      .01        .30
PE      2.21      .07        .52
Most notably, TP was judged to be more closely related to OW (it was given the highest weight) than was apparent from the experimental results. The same was true for OP. On the other hand, PE was rarely selected as an important component of workload (it was given the lowest
weight), but ranked 5th in the regression equation. These results, taken in combination with the success of the derived workload score in reducing B-S variability without substantially improving sensitivity to experimental manipulations, suggest that other factors influenced the association between component factors and OW in addition to the differences among subjects' workload definitions.
FIGURE 5. RELATIVE FREQUENCY DISTRIBUTIONS OF RATINGS AND WWL SCORES FOR ALL SUBJECTS AND EXPERIMENTAL CONDITIONS (Nc X Ns = 3461). Panels include Task Difficulty, Time Pressure, Own Performance, Physical Effort, Frustration, Fatigue, Overall Workload, and Workload, each plotted against rating interval (0 to 100); the mean frequency is marked in each panel.

EXPERIMENTAL CATEGORIES

The data from similar types of tasks were grouped into six categories to determine whether different sources of loading (e.g., mental or physical effort, time pressure, task difficulty) did in fact contribute to the workload of different kinds of activities. Some studies provided data from different experimental conditions for more than one category. The categories are:
(1) Simple, discrete tasks that emphasized SINGLE-COGNITIVE activities (refs. II-2, 6, 7, 10, 11, 13, 14),
(2) Continuous SINGLE-axis MANUAL control tasks (refs. II-2, 14),
(3) DUAL-TASK experiments pairing concurrent but unrelated cognitive and manual control activities (refs. II-2, 15),
(4) FITTSBERG tasks where response selection and execution elements were functionally integrated and sequentially executed (refs. II-6, 7, 11, 13, 16),
(5) POPCORN task supervisory control simulations (refs. II-1, 4, 5),
(6) SIMULATIONS conducted in a motion-base, single-pilot simulator (refs. II-3, 8, 19).
The same analyses that were performed on the "population" data bases were performed for each experimental category. In addition, each category was compared to the "population". The presence of task-related sources of variability in workload was determined by examining the correlation matrices of factors, the correlation tables of factors by categories, and the regressions of the subscales on OW (Table 2a). Our expectation was that different factors would contribute in different amounts to the overall workload of various types of tasks. For example, ME should be more salient for the SINGLE-COGNITIVE tasks, whereas PE should be more important for the SINGLE-MANUAL tasks. TP should be a particularly important source of workload for the POPCORN tasks, as this was the primary factor that was experimentally manipulated, whereas it should play a minor role in the FITTSBERG tasks, as TP was not deliberately manipulated there.

We assumed that the subjects included in each category represented a random sampling from the population as a whole and that there would be no systematic differences in workload biases of subjects who participated in one category of experimental tasks as compared to another. Since the workload biases were obtained in advance of each experiment, they should represent relatively stable opinions held by the subjects, rather than the effects of specific experimental manipulations. In fact, this was what we found. However, considerable variability was expected within each category due to the individual differences that are the focus of the weighting technique. Because the weights given by the subjects in each category were not significantly different from the population, the specific values obtained for each category will not be presented.
SINGLE-COGNITIVE Category

The SINGLE-COGNITIVE category included data from seven experiments. Each experimental task generally presented one stimulus and required one response for each trial. The primary source of loading was on cognitive processes. Five groups of experimental conditions were the single-task baseline levels for other experiments. The tasks included (1) a spatial transformation task presented visually or auditorily and performed vocally or manually; (2) variants of the Sternberg memory search task presented visually or auditorily; (3) choice reaction time; (4) same/different judgements; (5) mental arithmetic; (6) time estimation; (7) greater/less than judgements; (8) entering a number or a number plus a constant with
FIGURE 6. Summary of ratings for each experimental category, by workload-related factor (TD, TP, OP, PE, ME, FR, ST, FA, AT, OW). An asterisk indicates the grand mean of the population (N = 3461).
FIGURE 6A. SINGLE-COGNITIVE CATEGORY: SUMMARY OF RATINGS (Ns X Nc = 554).
FIGURE 6B. SINGLE-MANUAL CATEGORY: SUMMARY OF RATINGS (Ns X Nc = 240).
FIGURE 6C. DUAL-TASK CATEGORY: SUMMARY OF RATINGS (Ns X Nc = 732).
FIGURE 6D. FITTSBERG CATEGORY: SUMMARY OF RATINGS (Ns X Nc = 918).
FIGURE 6E. POPCORN CATEGORY: SUMMARY OF RATINGS (Ns X Nc = 504).
FIGURE 6F. SIMULATION CATEGORY: SUMMARY OF RATINGS (Ns X Nc = 396).
different input devices; (9) memory span; (10) flight-related heading calculations; and (11) mental rotation. Performance was evaluated by percent correct and reaction time (RT). The typical finding was that accuracy decreased and RT increased as the difficulty of the information processing requirements was increased. In addition, performance differences were found between alternative display (e.g., auditory versus visual) and response modalities (e.g., voice, keyboard, microswitch, touch-screen, joystick). For every experimental task, workload ratings tended to follow the same patterns as performance measures: higher levels of subjective workload accompanied poorer performance. In addition, stimulus and response modalities that degraded performance were also rated as having higher workload.

The ratings obtained for the SINGLE-COGNITIVE tasks were either equal to or lower than the overall means (Figure 6a). PE in particular was considered to be very low, reflecting the task characteristics. The ratings were somewhat more variable than the norm, possibly reflecting the diversity of tasks with which they were obtained. Despite this, only three of the rating distributions differed significantly from the "population" distributions: OW, TD and PE. Relatively few scales demonstrated strong statistical relationships with each other. However, TD was highly correlated with ME and FR, and FR was also highly correlated with TP and ST (Table 4). Only TD and ME had correlations that accounted for more than 50 percent of the variance in OW (Table 2b).
SINGLE-MANUAL Category

A variety of one and two-axis tracking tasks were included in this category. As with SINGLE-COGNITIVE, these tasks represented the single-task baseline levels for other categories. The primary source of loading was the physical demands imposed by different experimental manipulations: (1) the bandwidth of the forcing function (three levels in each experiment), (2) order of control (constant or variable), and (3) the number of axes controlled (1 or 2). The display modality was visual, the response modality, manual. Performance and workload levels covaried with the bandwidth manipulations; as bandwidth increased, subjective workload and tracking error increased. In addition, the variable order of control tasks were performed more poorly and were rated as having higher workload. Finally, two-axis tracking was considered to be more loading than one-axis tracking. In general, SINGLE-MANUAL ratings were higher than the "population" ratings (Figure 6b). FR and ST ratings in particular were higher than for any other tasks, possibly
Table 4: SINGLE-COGNITIVE

       TD    TP    OP    PE    ME    FR    ST    FA    AT
TP    .47
OP    .41   .40
PE    .34   .29   .13
ME    .74   .49   .40   .36
FR    .64   .60   .59   .29   .57
ST    .50   .55   .37   .39   .45   .71
FA    .34   .43   .28   .35   .28   .52   .54
AT    .34   .17   .17   .08   .31   .20   .19   .16
OW    .83   .51   .50   .37   .72   .64   .55   .41   .37
reflecting the subjects' perceptions that some of the conditions were relatively uncontrollable. ME was rated relatively higher than might be expected by the nature of the tasks. AT was rated as "skill-based". The subjects thought their own performance was generally poorer than on other tasks. Most of the rating distributions were significantly different from the "population" distributions except for WWL, ME, PE, and ST. Particularly high correlations among the scales were found between TD and ME, among FR, TP and PE, and among ST, ME, FA and FR (Table 5). As might be expected from the nature of these tasks, a relatively high correlation was found between OW and PE. However, only TD, ME and ST had correlations that accounted for more than 50 percent of the variance (Table 2b).
DUAL-TASK Category

The data from two experiments were included in this category. In each one, continuous one- and two-axis tracking tasks were combined with a discrete, cognitively loading task. Difficulty on the tracking task was manipulated by varying the order of control and bandwidth of the forcing function. For one experiment, the discrete task was three levels of difficulty of an auditory Sternberg memory search task, presented as a pilot's call-sign; responses were vocal. For the other, a spatial transformation task was presented visually or auditorily; responses were vocal or manual. Each task was presented in its single-task form first. The data from these baseline conditions are included in the SINGLE-COGNITIVE and SINGLE-MANUAL categories. The DUAL-TASK conditions represented different combinations of difficulty levels for the two tasks.

Time-on-task was manipulated as well (ref. II-2) to determine the relationships among fatigue, workload, and event-related cortical potentials in response to the call-signs. For one experiment, performance on both task components was degraded by time-on-task. Tracking performance was also related to bandwidth. OW, FA, tracking error, and the amplitude of the positive component of the event-related potential were all significantly and positively correlated. For the second experiment (ref. II-15), the visual input modality for the spatial transformation task imposed less workload and interfered less with tracking performance. Speech output resulted in better performance (on both tasks) and less workload than manual output because the latter interfered more with the manual responses required for the tracking task. Subjective ratings were less sensitive to output modality manipulations than to input modality manipulations, and less sensitive to task combinations than to individual task levels.
Table 5: SINGLE-MANUAL
Correlations among bipolar ratings

       TD    TP    OP    PE    ME    FR    ST    FA    AT
TP    .49
OP    .57   .32
PE    .39   .78   .20
ME    .75   .39   .44   .29
FR    .72   .47   .69   .39   .69
ST    .61   .54   .50   .43   .65   .78
FA    .39   .34   .35   .32   .42   .54   .67
AT    .15   .25   .02   .31   .26   .15   .23   .14
OW    .83   .60   .44   .51   .76   .69   .72   .45   .22
DUAL-TASK ratings were higher, on the average, than the "population" means (Figure 6c). It is not surprising they were higher than the component single task ratings, but it is somewhat surprising that they were higher than the ratings that were given for apparently more complex simulated flying tasks. DUAL-TASK distributions were significantly different from the corresponding "population" distributions for TD, PE, FR, ST, and FA. Among the scales, a few high correlations were notable (Table 6): TD with TP and ME; TP with ME, FR and ST; OP with FR; and FR with ST. These patterns were almost identical to those observed for the "population". Again, TD, ME and ST were all highly correlated with OW, accounting for more than 50 percent of its variance, reflecting a pattern similar to that found for SINGLE-MANUAL. In addition, TP also accounted for more than 50 percent of the variance in OW.
FITTSBERG Category

The FITTSBERG paradigm provides an alternative to the traditional dual-task paradigm in which two unrelated tasks are performed within the same interval. With the FITTSBERG paradigm, the component tasks are functionally related and performed serially: the output or response to one serves to initiate or provide information for the other. A target acquisition task based on FITTS' Law (ref. I-9) is combined with a SternBERG memory search task (ref. I-24). Two identical targets are displayed equidistant from a centered probe. Subjects acquire the target on the right if the probe is a member of the memory set and the target on the left if it is not. A wide variety of response selection tasks have been used in addition to the Sternberg memory search task: (1) choice reaction time, (2) mental arithmetic, (3) pattern matching, (4) rhyming, (5) time estimation, and (6) prediction. Workload levels for one or both components of the complex task were either held constant or systematically increased or decreased within a block of trials. In addition, the stimulus modality of the two components was the same (visual/visual) or different (auditory/visual).

Response selection performance was evaluated by reaction time (RT) and percent correct. Target acquisition performance was evaluated by movement time (MT). MT but not RT increased as target acquisition difficulty was increased. RT but not MT increased as the cognitive difficulty of response selection was increased. Information sources, processing requirements, and workload levels of the first stage (response selection) appeared to be relatively independent of those for the second stage (response execution), even though some or
Table 6: DUAL-TASKS
Correlations among bipolar ratings

      TD    TP    OP    PE    ME    FR    ST    FA    AT
TP   .72
OP   .65   .57
PE   .52   .66   .43
ME   .83   .70   .59   .46
FR   .69   .74   .79   .52   .69
ST   .65   .73   .54   .57   .69   .77
FA   .33   .42   .50   .40   .34   .59   .49
AT   .39   .42   .37   .35   .48   .47   .41   .36
OW   .88   .76   .58   .60   .84   .70   .71   .44   .43
many of the processing stages were performed in parallel, and the activities required for one simultaneously satisfied some of the requirements of the other. Performance decrements were not found for one task component in response to an increase in difficulty of the other. Instead, performance and workload ratings for the combined tasks integrated the component load levels. FITTSBERG ratings and RTs were less than the sum of those for the component tasks performed individually. There was only a small "concurrence" cost of about 40 msec for RT and a 14% increase in ratings for the combined task over single-task baseline levels. FITTSBERG ratings were generally low except for AT (Figure 6d). The component tasks were not individually difficult, and subjects integrated them behaviorally and subjectively, with a consequent "savings" in experienced workload. In addition, rating variability was less than usual. Consequently, all of the rating distributions were significantly different from the "population" distributions. The following ratings were highly correlated with each other: TD, TP, ME, ST, and FR (Table 7). The association between TP and TD is somewhat surprising, as TP is not deliberately manipulated in the FITTSBERG paradigm. The fact that RT was the primary performance metric may have influenced subjects to respond as quickly as possible--a self-imposed time pressure. However, the design of the experimental task did not itself impose time constraints or limits. The low association between OP and OW is also surprising because performance feedback was given frequently. Although TD, TP, ME, and FR were highly correlated with OW, only the correlations between TD and OW and between ME and OW accounted for more than 50 percent of the variance.
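For readers unfamiliar with the target acquisition component, Fitts' Law relates movement time to an index of difficulty that depends on target distance and width. The sketch below uses the classic formulation, ID = log2(2D/W) and MT = a + b * ID. The intercept and slope values are hypothetical placeholders chosen for illustration, not parameters reported for the FITTSBERG experiments.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts' index of difficulty in bits: log2(2D / W)."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def predicted_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted MT in seconds; a and b are hypothetical coefficients."""
    return a + b * index_of_difficulty(distance, width)


if __name__ == "__main__":
    # Narrower targets at the same distance raise ID, and MT with it.
    for width in (4.0, 2.0, 1.0):
        bits = index_of_difficulty(16.0, width)
        mt = predicted_movement_time(16.0, width)
        print(f"W = {width}: ID = {bits:.1f} bits, MT = {mt:.2f} s")
```

In this formulation, target acquisition difficulty is varied through distance and width alone, which is consistent with the finding above that MT, but not RT, tracked that manipulation.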
POPCORN Category

The POPCORN task is a dynamic, multi-task, supervisory control simulation. It represents operational environments in which decision-makers are responsible for semiautomatic systems. Its name, "POPCORN," reflects the appearance of groups of task elements waiting to be performed (they move around in a confined area and "pop" out when selected for performance). Operators decide which tasks to do and which procedures to follow based on their assessment of the current and projected situation, the urgency of specific tasks, and the reward or penalty for performing or failing to perform them. Simulated control functions provide alternative solutions to different circumstances. They are selected with a magnetic pen and graphics pad and executed by automatic subsystems.
Table 7: FITTSBERG
Correlations among bipolar ratings

      TD    TP    OP    PE    ME    FR    ST    FA    AT
TP   .68
OP   .38   .39
PE   .50   .56   .16
ME   .76   .54   .34   .47
FR   .69   .67   .45   .44   .63
ST   .60   .75   .19   .51   .52   .70
FA   .41   .39   .20   .25   .38   .46   .52
AT   .36   .17   .05   .23   .42   .20   .15   .13
OW   .86   .66   .39   .51   .76   .69   .62   .42   .40
Thus, control activities are intermittent and discrete. Task difficulty can be varied by changing the number of tasks, elements/task, scheduled arrival times for successive groups of task elements, speed with which elements move, and penalties imposed for procrastination. The penalties include imposing additional operations or accelerated rates for delayed tasks, deducting points from the score, and losing control over when deferred tasks could be performed.
Experiments conducted with this simulation determined the contributions of different task variables to workload and their behavioral and physiological consequences. Performance was evaluated by examining the score, number of unperformed elements, and completion time. Strategies were evaluated by analyzing the functions selected. Schedule complexity, number of different tasks (rather than the number of elements in each one), and time-pressure-related penalties for procrastination were significantly reflected in the subjective, behavioral, and physiological responses of subjects. Average rating magnitudes were higher for this group of experiments than for any other (Figure 6e), and their variability was greater. FA was the only factor rated as lower, even though experimental sessions often lasted as long as 5 hours. Distributions of ratings were significantly different from the "population" distributions for every factor except OP. Because TP was the primary way in which workload levels were manipulated, TP ratings were highly correlated with TD, ME, FR, ST, and OW ratings (Table 8) and were considerably higher than the grand mean (46 vs 32). This task was considered to be the most unpredictable and knowledge-based of the experimental categories (AT = 43 vs 34). PE ratings were higher as well. Even though the computer actually performed the requested functions, virtually continuous selections were required to activate the appropriate functions. This was reflected in a significant correlation between OW and TP. However, PE ratings were not highly correlated with OW across different manipulations. FA and AT were not highly correlated with OW, either, because FA levels were counterbalanced across conditions and AT was relatively constant across all conditions. In this category, only TD and TP accounted for more than 50 percent of the variance in OW.
SIMULATION Category

Three aircraft simulations were combined for this category. Each was conducted in a motion-base general aviation trainer. They were designed to determine the contributions of
Table 8: POPCORN
Correlations among bipolar ratings

      TD    TP    OP    PE    ME    FR    ST    FA    AT
OP   .68   .69
PE   .51   .57   .55
ME   .77   .82   .65   .53
FR   .65   .66   .74   .51   .58
ST   .69   .71   .65   .59   .71   .68
FA   .39   .41   .43   .55   .37   .42   .53
AT   .27   .25   .16   .22   .30   .26   .24   .14
OW   .77   .74   .54   .44   .63   .61   .61   .30   .30
individual flight-task components to overall workload and to compare the obtained levels of workload to those predicted by a model. Workload was evaluated by performance on concurrent secondary tasks and ratings. The first experiment (ref. 11-8) required control over one (e.g., heading), two (e.g., heading, speed), or three (e.g., heading, altitude, speed) components, with irrelevant dimensions "frozen." As expected, workload increased as the difficulty and complexity of each maneuver increased. The second experiment (ref. 11-9) coupled more complex flight-task maneuvers, building up to simulated instrument approaches. Again, workload levels increased as the complexity of flight-task components increased. In the final experiment (ref. 11-3), two scenarios, one "easy" and one "hard," were flown. Ratings were obtained during and immediately after each flight. For all three experiments, the various workload measures that were obtained reflected the same underlying phenomena, although the subjective ratings were consistently the most sensitive. With two exceptions (TP and AT ratings were considerably lower), SIMULATION ratings were similar to the "population" means (Figure 6f). This is surprising, considering the apparently greater magnitude and complexity of task demands imposed on the pilots. In addition, the variability among ratings was the lowest of any category. This might reflect the fact that all of the experimental subjects were instrument-rated pilots familiar with the types of tasks performed. AT was considered to be the most "skill-based" of all of the tasks included in the 16 experiments. Statistical associations among individual scales were lower for this category of experiments than for the rest (Table 9). The highest correlations were found among ME, TD, and OP, and among PE, TD, TP, and ST. TD was the only factor that had a strong correlation with OW (accounting for more than 50 percent of its variance).
CONSTRUCTING A WORKLOAD RATING SCALE

Several key points emerged about the subjective experience and evaluation of workload:
(1) A phenomenon exists that can be generally termed workload, but its specific causes may differ from one task to the next.
(2) Ratings of component factors are more diagnostic than global workload ratings.
(3) Subjects' workload definitions differ (thereby contributing to B-S variability); however, the specific sources of loading imposed by a task are more potent determinants of workload experiences than such a priori biases.
(4) A weighted combination of the magnitudes of factors that contribute to subjects' workload experiences during different tasks provides an integrated measure of overall workload that is relatively stable between raters.
Table 9: SIMULATION
Correlations among bipolar ratings

      TD    TP    OP    PE    ME    FR    ST    FA    AT
OP   .41   .25
PE   .46   .61   .25
ME   .64   .20   .42   .31
FR   .43   .35   .63   .29   .38
ST   .53   .64   .38   .60   .36   .58
FA   .32   .24   .43   .26   .28   .50   .39
AT   .19   .33  -.13   .24   .02  -.01   .20  -.04
OW   .86   .36   .38   .42   .65   .33   .45   .21   .08
One of our goals in gathering workload and workload-related ratings, in addition to the information they provided about experimental manipulations, was to amass a data base which would allow us to examine the relationships among different task, behavior, and psychological factors in order to create a valid and sensitive rating technique for subjective workload assessment. Our assumption was that the scale would be multi-dimensional, but that the number of subscales should be less than the number used for research purposes. Thus, the first step was to select the most appropriate set of subscales. The second step was to determine how to combine these subscales to derive a workload score sensitive to different sources and definitions of workload between tasks and raters. The final step was to determine the best procedure for obtaining numeric values for these subscales.

Subscale Selection

We reviewed the information provided by each scale used in the 16 experiments to select the subscales. They should represent the types of phenomena that influence subjective workload experiences in a broad range of tasks (e.g., task-related, subject-related, and performance-related factors), although the importance of individual factors might vary from one type of task to the next. Our goal was to select no more than six factors, so ratings could be obtained during, as well as following, activities performed in operational environments. The following information was considered: (1) sensitivity to differences between tasks (Figure 7), (2) sensitivity to experimental manipulations within tasks (Table 2a), (3) association with subjective ratings of OW (Tables 1b, 3, 4-9), (4) independence from other factors (Tables 1b, 3, 4-9), and (5) subjective importance to raters (Tables 1a, 3; Figure 4a). The following statements about the factors include information drawn from individual experiments, categories of experiments, and the entire data base.
Task-Related Scales

Three of the original scales focused on the objective demands imposed by the experimental tasks. They were TD, TP, and AT.
Task Difficulty. A rating of TD provides the most direct information about subjects' perceptions of the demands imposed on them by a task. TD was considered to be moderately relevant to individual subjects' definitions of workload in the preliminary pairwise comparisons. However, the empirical relationship found between TD and OW ratings was substantially greater than its a priori association. In all but one of the 16 experiments, this scale reflected the same experimental manipulations as OW; TD contributed significantly to the OW regression equations in all six categories of experiments. TD was not statistically independent of the other factors that were also found to be important, however. This reduced the information it provided about the workload of different tasks. Although the TD scale was quite sensitive to differences between categories of experiments, its diagnostic value might have been improved if different sources of TD had been distinguished (e.g., mental versus physical).

Time Pressure. TP has been included as a primary factor in most operational definitions and models of workload, where it is quantified by comparing the time required for a series of subtasks to the time available, and it was selected as the factor most closely related to workload in advance of the experiments. However, TP ratings proved to be generally insensitive to manipulations within these experiments. TP ratings were only moderately correlated with OW ratings for individual experiments and categories of experiments. They did discriminate among different types of tasks, however. These findings are due, in part, to the fact that TP was not explicitly manipulated as an experimental variable in many of the experimental tasks. Nevertheless, TP was highly related to more than half of the other variables (the correlation coefficients were greater than 0.70) in 60% of the experiments. It was most closely associated
[Figure: four panels, each with an axis labeled RELATIVE FREQUENCY]
with PE, ME, FR, and ST--subject-related variables--rather than with the other task-related variables, however. This suggests that perceptions of high or low TP occur because of (and may, in turn, affect) subject-dependent rather than other task-related variables.

Activity Type. Subjects selected AT as a more important contributor to workload than it appeared to be from the empirical results. Furthermore, although AT did discriminate well among categories of tasks, these differences had little or no relationship with their workload levels; the predicted association between skill-based activities and low workload or knowledge-based activities and high workload was not found. AT ratings never correlated significantly with OW, and they contributed little to the OW regression equations. Although the type of task performed should have some association with the workload it imposes, this scale did not succeed in identifying such a relationship.
Summary of Task-Related Scales. We found that only two task-related scales, TD and TP, provided significant information about workload. Furthermore, we propose dividing the TD scale into two subscales (mental and physical) to identify the specific sources of imposed workload within and between tasks. Thus, three task-related factors were selected: Physical Demands (PD), Mental Demands (MD), and Temporal Demands (TD). These three factors represent the most common ways that workload differences are manipulated across a broad range of activities. They do not represent the cost of achieving task requirements for the operators, however, nor how successful operators were in doing so.

Behavior-Related Scales

The three scales in this category (PE, ME, and OP) provided subjective evaluations of the effort that subjects exerted to satisfy task requirements and opinions about how successful they were in doing so.

Physical Effort. Although PE is a component of most traditional definitions of workload, most of the subjects considered it a priori to be essentially unrelated to workload. Empirically, however, this factor discriminated among the different types of experiments and reflected experimental manipulations for tasks with physical demands as a primary workload component. PE ratings were generally low, reflecting the typical nature of laboratory and simulation tasks. Heavy physical exertion was never required in any of these experiments. PE was not highly correlated with OW within most experiments, however, and did not contribute significantly to the OW regression equation in half of them. It did provide an independent source of information about the subjects' experiences, as PE ratings were not highly correlated with ratings of other factors. Its strongest association was with TP (for tasks in which higher levels of imposed TP required higher response rates) and ST (for more complex tasks).

Mental Effort. ME has become an important contributor to the workload of an increasing number of operational tasks because operators' responsibilities are moving away from direct physical control to supervision. A priori, ME was considered moderately important by our subjects. Empirically, however, ME ratings were highly correlated with OW ratings in every experimental category and were significantly related to the independent variables in most experiments. ME ratings discriminated among different types of experimental tasks as well, and ME was the second most highly correlated factor with OW. ME ratings were highly correlated with many other task and subject-related variables (e.g., TD, FR, and ST). Thus, the information ME provided was somewhat reduced by its lack of independence.
Own Performance. Success or failure in meeting task requirements was considered a priori to be moderately related to workload. Although OP ratings did not discriminate between types of experimental tasks, they did provide useful and significant information
about how the subjects perceived the quality of their performance. OP ratings were significantly correlated with OW ratings in half of the experiments and categories of experiments, and they were relatively independent of other ratings, in comparison to the general finding of high statistical associations.

Summary of Behavior-Related Scales. Although PE and ME each provided significant and relatively independent information about the workload of many experimental tasks, we feel that a single Effort (EF) scale might be sufficient to represent this aspect of workload. This was an arbitrary decision, considering the useful information PE and ME contributed to workload ratings. However, since one of our goals was to reduce the number of bipolar scales, we felt that a combined EF scale could capture the information provided by PE and ME. The additional information in the original PE and ME scales not captured by EF (e.g., the specific source of the load) would be provided by the new MD and PD scales. Information about the specific source of demands (e.g., physical or mental) can be obtained more directly by asking subjects to evaluate the objective demands that are placed on them than by asking them to introspect about the amount of mental or physical effort exerted. Furthermore, subjective evaluations of task demands can be compared with objective task manipulations for the purpose of validation and prediction. In addition, the B-S variability of ratings for task-related factors should be lower (because the only source of variability would be differences in individuals' sensitivity and understanding), whereas there are at least two interactive sources of variability for behavior-related ratings (the actual levels of effort exerted by each subject, as well as their ability to evaluate these levels introspectively).
The subjects' evaluations of the success or failure of their efforts to accomplish task requirements provided a valuable source of information about workload, because a subject's appraisal of performance during a task affects subsequent levels and types of effort exerted. Furthermore, performance decrements observed in operational environments often prompt workload analyses. Thus, some information about performance should be included in any workload assessment technique, even if it is only in the form of a subjective evaluation.

Subject-Related Scales

These scales focused on the psychological impact on the subjects of task demands, behavior, and performance. They included FR, ST, and FA.

Frustration. Subjects reported, a priori, that FR was the third most relevant factor to workload. Empirically, FR ratings were significantly correlated with OW ratings in most individual experiments and all categories of experiments. FR did not contribute significantly to the OW regression equations, however. This could reflect the fact that FR was not an independent factor: it was strongly correlated with every other factor except AT and PE. FR was only moderately sensitive to experimental manipulations, yet it discriminated among five out of the six categories of experiments. The range of FR ratings across categories was substantial, further suggesting that they provide useful information in distinguishing among types of activities.

Stress. ST has been included in many other subjective rating techniques and is often equated with elevated levels of workload in operational environments. Subjects in these experiments rated ST as the second most important factor in the pretest. Within experiments, ST ratings reflected the same manipulations that influenced OW ratings.
However, ST ratings did not discriminate among different types of tasks, ST was rarely associated with objective measures of performance, and it was the least independent scale (it was highly correlated with every other scale except AT). For this reason, it contributed relatively less to the OW regression equation than its high degree of correlation with OW would suggest.
Fatigue. FA was relatively unrelated to workload in both a priori opinions and empirical ratings. Even though the range of FA ratings was the greatest for any scale across categories of experiments (it ranged from 24 to 42), FA ratings rarely covaried with objective performance measures, OW ratings, or other factors. One explanation for this lack of relationship could be that fatigue was not manipulated as an experimental variable in most of the studies. In general, it appeared that subjects regarded fatigue as a separate phenomenon from workload.

Summary of Subject-Related Scales. In a multi-dimensional rating technique, it is important to retain some information about the psychological impact on subjects of performing the tasks. Workload, especially the subjective experience of workload, reflects more than the objective demands imposed on an operator. It is apparent from their high intercorrelation, however, that the FR and ST scales are not both necessary. ST might be too global a dimension. This term, like workload itself, can mean many different things. The term has been applied to task, environmental, and human phenomena (e.g., heat stress, time stress, emotional stress, physical stress, physiological stress). In fact, an excess of almost any dimension can be termed "stress". FR, in a relatively less ambiguous way, relates task requirements, exerted effort, and success or failure. It provides information about how comfortable operators felt about the effectiveness of their efforts relative to the magnitude of the task demands imposed on them. Although FA can be an experimentally and operationally relevant variable, it was not found to be related to the experience of workload; thus, it was not included as a component of the multi-dimensional rating scale.
Overall Workload Ratings

Although OW ratings were significantly associated with experimental manipulations in most experiments, and distributions of OW ratings were significantly different from one experimental category to the next, the B-S variability within experimental conditions was high; coefficients of variation were often as great as 0.50. In addition, OW ratings appear to reflect different variables in different tasks. Although it is not likely that this contributed to B-S variability within experimental conditions (all subjects experienced the same experimental difficulty manipulations), it does suggest that global workload ratings cannot be compared between tasks. Even though OW ratings provide the most direct and integrated information about the issue in question -- workload -- they may reflect time pressure for one task, variations in effort in another, and different levels of decision making complexity in yet another. Each level of integration has a simplifying effect, reducing complex attributes to progressively more global summaries. There is a point where higher levels of integration cease to provide useful summarization and begin to mask important underlying phenomena. A global workload rating may represent such a point. The component scales can identify variations in sources of loading, as well as their magnitudes, and a weighted combination of them was shown to provide a more stable measure of OW than the global scale itself. This suggests that it is not necessary to obtain a specific OW rating as long as the appropriate components are rated and can be combined.

Weighted Workload Score

The weighted averaging procedure succeeded in reducing B-S variability for all experimental conditions.
However, the general information that was obtained in the pretest about differences in workload definition was not sufficient to characterize the specific experiences of subjects that were unique to individual experimental situations. Thus, the WWL score did not achieve the desired level of improvement in statistical sensitivity to experimental variables. Subjective estimates of weighting parameters would have been more useful had they been obtained with reference to a specific experience (e.g., the experimental
task) than in the abstract. Self-evaluations obtained in a context are preferable because they provide direct information about the interaction of factors within that context (ref. 1-1), and it is this that determines the level of workload.
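Between-subject (B-S) variability is expressed in this section as a coefficient of variation, that is, the standard deviation of ratings divided by their mean. The following is a minimal sketch of that comparison between global OW ratings and a weighted score. The ratings are invented for illustration and do not come from the experiments described here.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return statistics.stdev(values) / mean


if __name__ == "__main__":
    # Hypothetical ratings from six subjects in one experimental condition.
    global_ow = [20, 35, 50, 65, 30, 70]
    weighted = [38, 42, 47, 52, 40, 51]
    print(f"global OW CV: {coefficient_of_variation(global_ow):.2f}")
    print(f"weighted CV:  {coefficient_of_variation(weighted):.2f}")
```

A lower coefficient for the weighted score would indicate reduced B-S variability. As the text notes, that reduction alone does not make the score more sensitive to experimental manipulations.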
Verification of Selected Subscales

The high correlations between many of the factors and OW within different categories indicate that multiple dimensions are required to represent the workload of different types of tasks. There is a generic component of workload across tasks, as reflected in the correlations of TD, FR, ST, and ME with OW in each experimental category. The task-specific component of workload that is present in some tasks and not in others is reflected in TP and PE. One factor (OP) is moderately related throughout the different types of tasks but is never a primary contributor to workload. The other two factors (FA and AT) are generally unrelated within and between tasks, and consequently were excluded from the new set of subscales. Before selecting the final set of subscales, several additional analyses were performed. The scales were rank-ordered from most to least relevant: TD, FR, TP, ME, PE, OP, ST, FA, AT. Three scales were eliminated (ST, FA, and AT), and two were combined (EF = ME and PE). The five remaining scales were regressed on OW (Table 10). The percent of variance accounted for by these five scales did not decrease by more than .02 from the variance accounted for by the original nine scales for any of the six categories. The proposed division of TD into Mental (MD) and Physical Demands (PD) could not be simulated with the existing data base. We examined the three subscales in our data base that are similar to those used in another popular multi-dimensional rating scale, the Subjective Workload Assessment Technique (SWAT), to determine whether these factors alone might provide sufficient information.
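The regression step described above, in which a reduced set of subscales is regressed on OW and the proportion of variance accounted for is inspected, can be sketched with an ordinary least-squares fit. This is only an illustration using the Python standard library. The data are hypothetical, only three predictors are shown, and the chapter does not state which regression procedure or software the authors used.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_ols(predictors, outcome):
    """Least-squares fit with intercept; returns (coefficients, r_squared)."""
    rows = [[1.0] + list(p) for p in predictors]
    width = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)]
           for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(width)]
    coefficients = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefficients, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    if ss_tot == 0:
        raise ValueError("outcome has no variance")
    return coefficients, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical (TD, TP, EF) ratings and global OW ratings for eight trials.
    subscales = [(30, 20, 25), (55, 40, 50), (70, 65, 60), (45, 30, 55),
                 (80, 75, 70), (25, 35, 20), (60, 50, 45), (40, 60, 35)]
    overall = [28, 52, 71, 44, 83, 27, 58, 45]
    coefficients, r_squared = fit_ols(subscales, overall)
    print("intercept and weights:", [round(c, 2) for c in coefficients])
    print("proportion of OW variance accounted for:", round(r_squared, 2))
```

Fitting once with the full set of scales and once with the reduced set, then comparing the two r-squared values, would mirror the check that the reduced set lost no more than .02 of the variance accounted for.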
With the SWAT technique, a preliminary card-sort is performed by each subject to rank-order 27 combinations of three levels (low, medium, high) of the three factors (time load, psychological stress, and mental effort) with respect to the importance they place on them in their personal definition of workload (refs. 1-6, 1-7, 1-21). Conjoint analysis techniques are applied to provide an interval scale of overall workload tailored for individual differences in definition. Subjects provide ratings of low, medium, or high for the three factors following the performance of each experimental task. A single rating of overall workload is obtained by referring to the position on the interval scale identified by that combination of values. It appears that one of the key assumptions of conjoint analysis (i.e., statistical independence among the components) was not supported by the data from these experiments; ratings of TP, ME, and ST were highly interrelated. Correlations between TP ratings and ST ratings
Table 10: A subset of rating scales regressed on OW (* = p < .01)

Columns: r2, TD, TP, OP, EF, FR
Rows: SINGLE-COGNITIVE, SINGLE-MANUAL, DUAL-TASKS, FITTSBERG, POPCORN, SIMULATION

.74 .59* .06* .14* .18* .28* -.12* .10* .79 .54* .32* -.10* .10* .84 .54* .22* .04 .78 .60* .04 .00 -.15* .25* .64 .52* .18* .06 .75 .77* .04
FR: .04 .15* .11* .10* .22* -.10*
were 0.50 or greater, between TP and ME were 0.65 or greater, and between ME and ST were 0.45 or greater in all experiments. For many experiments, correlations were 0.70 or higher. Furthermore, it appears that these three factors alone are not sufficient to represent the range of factors that contribute to workload for a broad range of experimental and operational tasks, as mentioned above. From a practical, rather than a psychometric, point of view, the independence of workload-related factors presents less of a problem. First, for factors that are both highly related to each other and reflect experimental manipulations, their shared contribution to a weighted estimate of overall workload is simply enhanced, reflecting the actual situation. Second, behavior-related and subject-related factors necessarily reflect task-related factors. Yet task-related factors alone do not provide information about the behavioral and psychological responses of individuals to imposed demands, each of which is an important contributor to overall workload. For example, the demand imposed on subjects may be extremely high, yet they may mitigate the levels of workload actually experienced by shedding tasks, lowering their performance standards, or refusing to exert greater and greater levels of effort as task demands increase beyond a certain level. Thus, evaluation of subjects' responses to a task can provide additional information (even though the behavior occurred in response to these demands) as well as highly correlated information. Finally, these scales can be driven independently, even though there is often no experimental reason to do so.
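The size of the SWAT card-sort mentioned earlier in this section follows directly from crossing three levels with three factors. As a small illustration, the snippet below enumerates the 27 combinations a subject would rank-order. The function name and data layout are hypothetical, and the conjoint scaling step that follows the card-sort is not attempted here.

```python
from itertools import product

LEVELS = ("low", "medium", "high")
FACTORS = ("time load", "psychological stress", "mental effort")


def swat_cards():
    """Enumerate every combination of three levels on three factors."""
    return [dict(zip(FACTORS, combo)) for combo in product(LEVELS, repeat=3)]


if __name__ == "__main__":
    cards = swat_cards()
    print(len(cards), "cards to rank-order")
    print(cards[0])
    print(cards[-1])
```

Because every card crosses all three factors, the rank-ordering treats them as separable, which is where the independence assumption questioned above comes in.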
Combination of Subscales

Each of the selected subscales provides useful and relevant information about different aspects of subjects' experiences. However, a summary estimate of the overall workload of a task is often needed. Since single OW ratings have been found to be quite variable among subjects and may reflect different factors across tasks, the idea of combining weighted ratings on subscales was suggested as an alternative. However, the weighting procedure adopted for this set of experiments succeeded only in reducing B-S variability. It did not provide estimates of workload that were substantially more sensitive to experimental manipulations than the global OW ratings. Similar sensitivity problems have been found with the SWAT technique. It, too, relies on a priori, global judgements about the importance of different factors rather than on the subjective importance of specific variables within the target activity to reduce B-S variability. However, B-S variability is often very high for SWAT ratings. Standard deviations that are greater than 50% of the average magnitudes of ratings have been reported in a number of experiments (ref. 1-4, 11-14, 11-15). Despite the relative success of both techniques in identifying variations in workload associated with most experimental manipulations and obtained performance, neither scale has been able to account for a substantial percentage of the variance. For example, a tracking task bandwidth manipulation resulted in highly significant differences in performance, yet accounted for only 8.96% of the WWL score variance and 6.16% of the variance in SWAT ratings (ref. 11-14). Even though the former was statistically significant and the latter was not, neither represents the level of sensitivity required for a valid workload assessment technique.
Quantification
Taking into account the results of these and other experiments, it is clear that using the a priori biases of subjects about workload to weight or organize subscale ratings into a single workload value may not provide a sufficiently sensitive subjective rating technique. The element missing from both SWAT and the WWL score is information about the sources of workload for the specific task to be evaluated. Regardless of how individuals might personally define workload, workload is caused by different factors from one task to the next and subjects are sensitive to factors that are included in, as well as excluded from, their workload
definition. These may take precedence over their natural inclinations to weigh one factor more heavily than another. Since the workload of a task represents the weighted combination of factors that are subjectively relevant during the performance of that task, the weighting function must include information about the sources of loading specific to that task, as well as a priori subjective biases. The task-related drivers of subjective experiences should be consistent across individuals who perform the same task. Thus, they should not increase B-S variability within experimental conditions. They do, however, affect the meaning of workload ratings from one task to the next. By enhancing the contribution of factors that are most salient in a particular task to the summary score, its sensitivity should be enhanced.
Figure 8: NASA-TLX RATING SCALE DEFINITIONS

Title: MENTAL DEMAND
Endpoints: Low/High
Description: How much mental and perceptual activity was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving?

Title: PHYSICAL DEMAND
Endpoints: Low/High
Description: How much physical activity was required (e.g., pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?

Title: TEMPORAL DEMAND
Endpoints: Low/High
Description: How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic?

Title: PERFORMANCE
Endpoints: Good/Poor
Description: How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals?

Title: EFFORT
Endpoints: Low/High
Description: How hard did you have to work (mentally and physically) to accomplish your level of performance?

Title: FRUSTRATION LEVEL
Endpoints: Low/High
Description: How insecure, discouraged, irritated, stressed and annoyed versus secure, gratified, content, relaxed and complacent did you feel during the task?
Using the set of six subscales proposed earlier (Figure 8) to represent the possible sources of workload, the following approach might be taken based on the model of the psychological structure of subjective workload estimation presented in Figure 2. For each task (or set of similar tasks), the contribution of each factor to its overall workload could be determined. Although these values could be assigned by an experimenter, the information that is needed relates to the subjective importance of the factors (w), rather than simply their objective contribution (I), as it is the former that influences workload experiences most directly. The simplest way to obtain information about subjective importance would be to ask subjects to assign values to each of the six scales (MD, PD, TD, FR, OP, EF) after a task or set of similar tasks is performed. The same pair-wise comparison technique used in computing the weights for the WWL score could be adopted. Fifteen comparisons would be required to decide which member of each pair of the six factors was most significant in creating the level of workload experienced in performing a particular task. The decision-making process is relatively simple from the subject's perspective and is less tedious than the 36 comparisons used for the 9-factor scale or the 27-factor rank-order used with SWAT. These values would be used to weight the magnitude ratings obtained for the six scales after each experimental condition. The advantage of task-specific weights is that the two sources of variability in ratings that have been identified within tasks (subject's workload definitions) and between tasks (task-related differences in workload drivers) would be represented from the perspective of the raters.
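The pair-wise procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are invented here, and the "subject" is a hypothetical rater who always prefers whichever factor appears earlier in a fixed ordering. The pair counts (15 for six factors, 36 for nine) follow from n(n-1)/2.

```python
from itertools import combinations

FACTORS = ["MD", "PD", "TD", "OP", "FR", "EF"]


def tally_weights(factors, pick):
    """Count how often each factor is chosen across every pair of factors."""
    weights = {factor: 0 for factor in factors}
    for first, second in combinations(factors, 2):
        chosen = pick(first, second)
        if chosen not in (first, second):
            raise ValueError(f"{chosen!r} is not a member of the pair")
        weights[chosen] += 1
    return weights


# Hypothetical subject: always picks the factor listed earlier in this order.
HYPOTHETICAL_ORDER = ["TD", "MD", "EF", "FR", "OP", "PD"]


def hypothetical_pick(first, second):
    return min(first, second, key=HYPOTHETICAL_ORDER.index)


pair_count_six = len(list(combinations(FACTORS, 2)))     # six-factor scale
pair_count_nine = len(list(combinations(range(9), 2)))   # nine-factor scale
weights = tally_weights(FACTORS, hypothetical_pick)

print(pair_count_six, pair_count_nine, weights)
```

Because every comparison awards exactly one point, the weights always sum to the number of pairs, and each weight lies between 0 and 5 for six factors.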
The alternatives of using weights provided by the creator of the task to represent the intended sources of loading, or weights that represent nonspecific subject biases, each ignore one potential source of rating variability. A specific example of the proposed rating scale may be found in Appendix B. It summarizes the rating scale descriptions and format, the pairwise technique for determining the subjective importance of each factor in a specific task, and a numerical example of the weighting procedure applied to ratings for two difficulty levels of one task.
Rating scales typically consist of an ordered sequence of response categories that are closed at both ends. End anchors are usually given to provide a frame of reference and to define the correspondence between stimuli (workload experiences) and responses (rated levels). Thus, ratings represent comparative judgements against these extreme values. Our approach has been to ask subjects to provide ratings along a 12-cm line bounded by bipolar adjectives. The anchors are designed to have natural psychological meaning rather than arbitrary values, and to exceed the likely range of rated experiences to avoid the nonlinearities observed for extreme values. Anderson (ref. I-1) and others have suggested that this type of "graphical" format is preferable to discrete categories. The responses were quantified during data analysis by assigning values that ranged from 1 to 100. The resulting values did not represent a ratio scale, and may not have provided even interval data. However, rating variability was acceptably small, most of the scale range was used across tasks, and the numerical values were reliably correlated with experimental manipulations. The SWAT technique allows only three discrete values to be assigned to each factor (low, medium, or high), although reference to a scale provided by the conjoint analysis procedure gives interval workload ratings that range from 1-100.
The use of only three scale values is understandable from a practical point of view (a greater number would make the initial sorting procedure nearly impossible); however, it significantly reduces the sensitivity of this technique. The workload of most tasks lies somewhere in the mid-range, and subjects often avoid giving extreme values. Furthermore, scales with fewer than six or seven increments are particularly susceptible to response nonlinearities near the endpoints and, in addition, there are distribution effects (ref. I-1). Furthermore, SWAT uses word labels for each interval, which may be risky because each may connote unequal subjective category widths (ref. I-1). The strength of the SWAT technique lies in the fact that it provides an interval scale of workload by virtue of the conjoint analysis technique employed. Although the benefits of this are clear
from a psychometric point of view, the practical cost of the procedure and the limitations it imposes on the range of rating values limit its utility. This is particularly true given the high B-S variability observed in the ratings. Thus, our recommendation is that a fairly wide range of increments is desirable. Anderson (ref. I-1) suggested that the optimal range of rating steps is from 10 to 20. With more steps, ratings tend to cluster because subjects provide ratings in round numbers and are not sensitive to very fine distinctions. Furthermore, graphic ratings that are quantified on a scale from 1-100 with 1-point increments suggest greater sensitivity to experimental manipulations than subjects are likely to be capable of producing. Discrete numeric ratings could be obtained verbally (e.g., 0-20) during an operational task where it is not practically possible to present an analog scale for rating each factor on a computer display or paper-and-pencil form. However, graphic scales, represented by an unmarked continuum bounded by extreme anchor values, are preferable. This continuum can be divided into equal intervals during data analysis for scoring.
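Dividing an unmarked line into equal intervals for scoring can be sketched as follows. The 12-cm length comes from the text; the function name, the default of 20 steps, and the convention of numbering intervals from 1 are assumptions made for this illustration, not the authors' scoring software.

```python
import math


def quantify_mark(position_cm, line_length_cm=12.0, steps=20):
    """Map a mark on the rating line to the number of the interval it falls in.

    Intervals are numbered 1..steps from the low anchor to the high anchor.
    """
    if not 0.0 <= position_cm <= line_length_cm:
        raise ValueError("mark must lie on the line")
    # Multiply before dividing so that exact interval boundaries stay exact.
    index = math.floor(position_cm * steps / line_length_cm) + 1
    # A mark exactly on the high anchor belongs to the last interval.
    return min(index, steps)


print(quantify_mark(3.1))             # coarse scoring with 20 steps
print(quantify_mark(3.1, steps=100))  # fine scoring with 100 steps
```

The same mark yields a coarser or finer score depending only on the number of intervals chosen at analysis time, which is why the choice can be deferred until the data are scored.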
Reference Tasks
A final point will be considered briefly: the additional reduction in B-S variability that can be obtained with the introduction of a reference task. It is unlikely that workload ratings are given absolutely or in reference to a global internal scale of workload that can be applied equally to all tasks. Rather, subjects compare the current situation with similar experiences and evaluate its workload with reference to the ranges and magnitudes of common features; each subject may select different reference activities unless one is explicitly provided. Furthermore, experimental conditions are often presented in a counter-balanced order, and the progression of task difficulties from easy to hard or vice versa may influence the subjective anchor points used in providing ratings differently. This source of rating variability is not obvious from the ratings that are provided. Thus, even without an explicit reference task, presenting experimental subjects with illustrative examples of the range and average difficulties of the tasks to be evaluated helps provide a stable judgemental set and orients the subject to the types of tasks to be performed (ref. I-1). The use of reference tasks for workload ratings was suggested by Gopher (refs. I-10, I-11). His initial suggestion was that a single task could be presented as a common reference within and between experiments. It could be assigned an arbitrary value and the workload levels of the remaining tasks rated with respect to this task. The initial hope was that one task could be used as a reference for a wide range of different tasks. The goal was to discover an underlying psychophysical function analogous to that existing for many perceptual processes involving objective, physical stimuli. He found, as we did, that the workload of different tasks may be caused by different factors. Thus, reference tasks must be selected that share elements in common with the experimental tasks.
When this is done, ratings can be assigned to similar tasks in comparison with a common activity. This approach could be coupled with the rating technique suggested above. The reference task could be used to obtain subjective estimates of the importance of the six workload-related factors for that type of activity. These weights could be applied to each member of a set of experimental tasks in which the magnitudes of different factors were experimentally varied. This would have the practical advantage of reducing the number of times importance weights would have to be obtained, and it would emphasize the salient characteristics of the reference task. The disadvantage of obtaining factor weights for groups of tasks is the possibility that the subjective importance of the factors might interact with variations in their magnitudes from one task to the next. This procedure would still be preferable to unweighted ratings or a priori weights based on abstract features or levels.
The great success of the Cooper-Harper Rating Scale for Aircraft Handling Qualities (refs. I-3, I-29) suggests the additional value of providing concrete examples of scale values. Test pilots use this rating procedure to provide subjective evaluations of the handling qualities of aircraft and aircraft simulations. They are "calibrated" by experiencing different levels of aircraft handling qualities in variable stability aircraft. This provides concrete experiences as references for each of the 10 scale values. By providing examples of tasks designated as low or high workload, B-S rating variability could be reduced.
Validation
An extensive validation study was completed recently to determine (1) whether the six NASA-TLX subscales are adequate to characterize variations in the sources of workload among different tasks, (2) whether the weights obtained from subjects are diagnostic with respect to the source of workload unique to each task, and (3) whether the task-related weighting procedure provides a global workload score that is sensitive to workload variations within and between tasks. Thirteen different experimental tasks were presented to a group of six male subjects. Blocks of experimental trials were repeated at least eight times per task, although many were repeated more often to present different experimental manipulations within a task. The tasks included manual control (one axis compensatory tracking, subcritical instability tracking, step tracking, target acquisition), perception (iconic memory, pattern recognition), short-term memory (the Sternberg task, serial pattern matching), cognitive processing (mental rotation, logical reasoning, serial arithmetic, time production), parallel and serial dual-tasks (variations of FITTSBERG, two axis compensatory tracking), and the POPCORN supervisory control task. The experimental tasks were grouped according to the categories in the initial data base: (1) SINGLE-COGNITIVE, (2) SINGLE-MANUAL, (3) DUAL-TASK, (4) FITTSBERG, and (5) POPCORN. The SIMULATION category was not included. The initial results will be discussed very briefly to illustrate the success of the proposed rating scale in meeting its objectives. A more complete description of the experimental tasks, procedure, and results is in progress.

FIGURE 9. EXAMPLES OF TASK RELATED WEIGHTS FROM THE VALIDATION STUDY.
[Panels legible in this copy: SINGLE-MANUAL, DUAL-TASK, FITTSBERG, POPCORN. Horizontal axis: WORKLOAD FACTORS (MD, PD, TD, OP, EF, FR). Vertical axis ticks run from -2.5 to 2.5.]
Weights
Subjects were able to specify which factors contributed most (and least) to the workload they experienced during each type of task. As an example, the weights given for one task selected from each category are depicted in Figure 9. The workload sources for one of the tasks in each category (weights) are represented as deviations from an "average" weight of 2.5. The values each weight could attain ranged from 0 to 5 (not at all important to more important than any other factor, respectively). The subjective evaluations of the contribution of different sources of workload varied significantly among the different types of tasks. These evaluations reflected the objective experimental manipulations (e.g., MD, PD, and TD) as well as the subjects' individual responses to them (e.g., OP, EF, FR). For example, MD was the most significant contributor to the workload of the logical reasoning task, while PD was the most significant contributor to the workload of the subcritical instability tracking. For different tasks that shared common sources of loading, similar patterns of weights were found. For example, MD was the primary source of workload for SINGLE-COGNITIVE tasks that had no time constraints, whereas both MD and TD were equally important for SINGLE-COGNITIVE tasks that placed time limits on information gathering, processing, or response. When weights were obtained several times for the same task, the relative importance of task-related factors did not change significantly, although the importance of the subjects' emotional responses to the task (e.g., FR) was reduced as task performance improved through training. When weights were obtained for different components of a complex task, they distinguished among the sources of load unique to each task component as well as for the combined tasks. It is clear from the results of analyses performed on the weights that the sources of load do, indeed, vary among tasks (at least, from the perspectives of the raters). Although these weights still reflect some individual differences in the subjective importance of different factors, the variations in sources of workload characteristic of different types of activities provide a more potent description of the task characteristics than could the a priori weights obtained from each rater. It is likely that these differences should be taken into account when computing a weighted average. Furthermore, the values assigned to each factor averaged across subjects provided a diagnostic tool. By identifying the specific source of workload in a task, it provides a basis for deciding how to modify unacceptably high levels of workload in operational environments.

Table 11: Validation Study
Correlations among bipolar ratings
[Matrix layout lost in this copy. Legible coefficients, in extraction order: .57, .50, .54, .84, .44, .70, .52, .67, .40, .57, .46, .69, .84, .70]

Table 12: Validation Study
Beta weights for the six rating subscales regressed on OW (* = p < .01)

                    MD     PD     TD     OP     EF     FR     r-squared
SINGLE-COGNITIVE    .33*   .01    .04    .15*   .43*   .13*   .88
SINGLE-MANUAL       .21*   .12*   .11*   .39*   .38*   .00    .78
DUAL-TASKS          .29*   .02    .09*   .19*   .41*   .20*   .82
FITTSBERG           .16*   .09*   .17*   .24*   .32*   .19*   .86
POPCORN             .19*   .03    .22*   .23*   .34*   .10*   .90
OVERALL             .24*   .05    .08    .22*   .38*   .16*   .86
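The "deviation from an average weight of 2.5" representation used for Figure 9 follows directly from the pair-wise procedure: fifteen selections shared among six factors average 15 / 6 = 2.5. A minimal sketch, using the single-subject tally from the chapter's sample application (MD 3, PD 0, TD 5, OP 1, FR 3, EF 3) purely as illustrative input; Figure 9 itself plots weights averaged across subjects, and the function name is an assumption.

```python
# Tally of importance selections from the chapter's sample application.
sample_weights = {"MD": 3, "PD": 0, "TD": 5, "OP": 1, "FR": 3, "EF": 3}


def weight_deviations(weights):
    """Return the average weight and each factor's deviation from it."""
    average = sum(weights.values()) / len(weights)
    deviations = {factor: w - average for factor, w in weights.items()}
    return average, deviations


average_weight, deviations = weight_deviations(sample_weights)

for factor, deviation in deviations.items():
    print(f"{factor}: {deviation:+.1f}")
```

Positive deviations mark factors chosen more often than average as sources of load, and the deviations necessarily sum to zero.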
Ratings
As we found with the initial set of nine scales, ratings on some of the six NASA-TLX subscales were significantly correlated (Table 11); however, the six subscales appeared to be somewhat more independent than were the original nine scales. For some factors (e.g., TD and FR) magnitude ratings were highly correlated with the subjective importance placed on that factor as a source of workload. For example, time pressure was a significant source of workload only when it was high. When MD or PD was a primary source of workload, however, the magnitude ratings were not necessarily high. For example, PD was considered to be the primary source of load for the subcritical tracking task, yet PD ratings were quite low (26). Many tasks were thought to have MD as a primary source of workload, yet MD ratings ranged from 20 to 66, depending on the magnitude of the mental demands each task placed on the subjects. EF was considered to be a moderately important source of workload (weights varied from 1.2 to 2.8) for every task and EF ratings were consistently highly correlated with OW ratings. The importance of OP varied widely across tasks (weights varied from .8 to 3.3), yet OP ratings were relatively unrelated to OW ratings. As expected, the sensitivity of individual scales to experimental manipulations varied depending on the sources of load and ranges of levels in each task. As with the initial data base, ratings on the six NASA-TLX subscales were regressed against OW ratings within each category and across categories. Table 12 shows that these six scales were able to account for a highly significant percentage of the variance in OW ratings (r-squared values ranged from 0.78 to 0.90), even though their number was reduced from the original nine.
In addition, the correlations among the regression coefficients were rarely significant, providing additional evidence that these six scales represent relatively independent sources of information about the workload imposed by different tasks. Within each experiment, the B-S variability in the magnitude of the WWL ratings for the six subscales was generally less than the B-S variability of global OW ratings. In contrast to the subject-related weights used in the previous set of experiments, however, the task-related weights provided workload estimates that were more sensitive to experimental manipulations than the global workload OW ratings were. When TD, MD or PD was varied within a task the ratings obtained for these factors were significantly different. Since these factors
were also weighted more heavily in computing the averaged weighted workload score, the sensitivity of the summary value was enhanced as well. Highly significant differences in subjective workload ratings were found within each experiment that reflected meaningful experimental manipulations which covaried with objective performance measures. Using the POPCORN tasks as an example, both the rate of movement of task elements and the inter-arrival rate of groups of elements resulted in highly significant differences among scores. Average scores ranged from 200 to 700 between the most difficult and the easiest versions while average workload ratings ranged from 47 to 73 for the same experimental conditions. On the other hand, where performance differences were not found (e.g., among replications once asymptotic performance levels were reached), subjective workload measures were not significantly different.
In a different study, we looked at the effect of administering the NASA-TLX either verbally, by paper-and-pencil, or by computer. Subjects provided TLX ratings following asymptotic performance of two levels (E, H) of three tasks (target acquisition, grammatical reasoning, and unstable tracking) using the three methods. On the average, ratings obtained by the computer method were 2 points higher than by the verbal method, and 7 points higher than by the paper-and-pencil method. Although the ratings obtained by the computer method were significantly different than those obtained by the paper-and-pencil method, the absolute differences in numbers are less important than the fact that the patterns in the magnitudes of the ratings were extremely consistent for all tasks. The correlations among the three methods were very high: computer vs verbal = .96, computer vs paper/pencil = .94, and verbal vs paper/pencil = .95. This study was conducted again four weeks later to evaluate the test/retest reliability in the rating techniques. The relationships among the three methods were the same in the initial test as in the retest: there were no significant differences between ratings given for a task in the initial test and ratings for that same task in the retest, for any of the three methods. The correlation between the test/retest ratings was .83. Despite the consistency in the patterns of ratings in the three methods, we feel the verbal method is the least desirable method, even though it is the easiest to administer. In particular, confusion can arise due to population stereotypes about whether one's own performance should have a high number associated with good performance and a low number associated with bad performance. In the TLX scale, good performance is associated with a low number, as lower workload is usually accompanied by better performance.
SUMMARY
This chapter has presented the rationale behind the design of the NASA-TLX for subjective workload assessment based on the results of a three-year research effort. Given the many problems outlined above, the ability of subjects to give meaningful ratings is remarkable. Because this area has received relatively little theoretical attention, our goal was to provide a data base containing examples of a wide variety of activities from which general principles and relationships could be drawn. Until recently, subjective ratings have been treated as tools that are subject to undesirable biases and that represent the discredited practice of introspection. Instead, it appears that the biases observed in workload ratings, as for subjective evaluations of other factors, may actually reflect interesting and significant cognitive processes (ref. I-1). At least five sources of rating variability were identified: (1) variations in the objective and subjective importance of different features to the workload of different tasks; (2) experimental variations in the magnitudes of different factors; (3) differences in the rules by which individuals combine information about the task, their own behavior, and psychological responses to the task into subjective workload experiences; (4) difficulties associated with translating a subjective experience into an overt evaluation; and (5) lack of sensitivity to experimental manipulations
or psychological processes. To some extent, these variables are under experimental control. However, the subjective experience of workload represents the intersection between objective task demands and each individual's response to them. Thus, uncontrolled sources of variability are necessarily introduced. Differences in workload associated with the specific composition of a task and its psychological counterpart can be identified through subjective reports about specific (rather than abstract or general) activities. This information is included in the proposed multi-dimensional rating scale, NASA-TLX, in the form of weights applied to ratings for specific factors. The last two sources of variability, those related to psychometric and sensitivity problems, are likely to remain as uncontrolled and undesirable sources of rating variability. However, by soliciting appropriate subscales, weighting factors, scale designs, and reference tasks, there should be a sufficient improvement in sensitivity and stability so that these other sources of variability should only add "noise" rather than compromise the utility of subjective ratings as a significant and practical source of information about workload. From all of the information obtained in the initial analysis of the original data base and from the preliminary analysis of the set of experiments included in the validation study, it appears that the NASA-TLX scale is more sensitive to experimental manipulations of workload than either a global rating or a combination of subscales weighted to reflect the a priori biases of the subjects only. Furthermore, each of the six subscales was found to be the primary source of loading in at least one experiment and to contribute to the workload of others. Each factor was, therefore, able to contribute independent information about the structure of different tasks.
Thus, NASA-TLX provides additional information about the tasks that is not available from either SWAT or the original, nine-factor scale.
NASA-TLX ratings were obtained quickly (it took less than one minute to obtain the six ratings after each experimental condition). In addition, it took no more than two minutes to obtain the weights for each different type of task. This suggests that the proposed multidimensional rating scale would be a practical tool to apply in operational environments (which the nine-factor scale was not) and data analysis is substantially easier to accomplish than it is with SWAT, which requires a specialized conjoint analysis program. The weighted combination of factors provides a sensitive indicator of the overall workload between different tasks and among different levels of each task, while the weights and the magnitude of the ratings of the individual scales provide important diagnostic information about the specific source of loading within the task.
APPENDIX A: Sample Application of the NASA-TLX.

EXAMPLE: COMPARE WORKLOAD OF TWO TASKS THAT REQUIRE A SERIES OF DISCRETE RESPONSES. THE PRIMARY DIFFICULTY MANIPULATION IS THE INTER-STIMULUS INTERVAL (ISI) (TASK 1 = 500 msec, TASK 2 = 300 msec).

PAIR-WISE COMPARISONS OF FACTORS:
INSTRUCTIONS: SELECT THE MEMBER OF EACH PAIR THAT PROVIDED THE MOST SIGNIFICANT SOURCE OF WORKLOAD VARIATION IN THESE TASKS.
[The grid of fifteen pairs and the circled selections did not survive in this copy; the legible pairs are TD / FR, TD / EF, OP / EF, and EF / FR.]

TALLY OF IMPORTANCE SELECTIONS:
  MD    III      =  3
  PD             =  0
  TD    IIIII    =  5
  OP    I        =  1
  FR    III      =  3
  EF    III      =  3
  SUM            = 15

RATING SCALES:
INSTRUCTIONS: PLACE A MARK ON EACH SCALE THAT REPRESENTS THE MAGNITUDE OF EACH FACTOR IN THE TASK YOU JUST PERFORMED.

RATINGS FOR TASK 1:
  DEMANDS   ENDPOINTS     RATING       WEIGHT       PRODUCT
  MD        LOW / HIGH      30     x     3      =      90
  PD        LOW / HIGH      15     x     0      =       0
  TD        LOW / HIGH      30     x     5      =     150
  OP        EXCL / POOR     40     x     1      =      40
  FR        LOW / HIGH      30     x     3      =      90
  EF        LOW / HIGH      40     x     3      =     120
  SUM                                           =     490
  WEIGHTS (TOTAL)                               =      15
  MEAN WWL SCORE = 490 / 15 = 32.7

RATINGS FOR TASK 2:
  DEMANDS   ENDPOINTS     RATING       WEIGHT       PRODUCT
  MD        LOW / HIGH      30     x     3      =      90
  PD        LOW / HIGH      25     x     0      =       0
  TD        LOW / HIGH      70     x     5      =     350
  OP        EXCL / POOR     50     x     1      =      50
  FR        LOW / HIGH      50     x     3      =     150
  EF        LOW / HIGH      30     x     3      =      90
  SUM                                           =     730
  WEIGHTS (TOTAL)                               =      15
  MEAN WWL SCORE = 730 / 15 = 48.7

RESULTS: SUBSCALES PINPOINT SPECIFIC SOURCE OF WORKLOAD VARIATION BETWEEN TASKS (TD). THE WWL SCORE REFLECTS THE IMPORTANCE OF THIS AND OTHER FACTORS AS WORKLOAD-DRIVERS AND THEIR SUBJECTIVE MAGNITUDE IN EACH TASK.
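The arithmetic of the sample application can be checked with a brief sketch. The weights and ratings are the ones given in the example; Task 1's TD rating is taken as 30, the value implied by its printed product of 150 and weight of 5. The function and variable names are assumptions introduced for illustration.

```python
WEIGHTS = {"MD": 3, "PD": 0, "TD": 5, "OP": 1, "FR": 3, "EF": 3}

TASK_1_RATINGS = {"MD": 30, "PD": 15, "TD": 30, "OP": 40, "FR": 30, "EF": 40}
TASK_2_RATINGS = {"MD": 30, "PD": 25, "TD": 70, "OP": 50, "FR": 50, "EF": 30}


def weighted_workload(ratings, weights):
    """Return (sum of rating x weight products, that sum divided by total weight)."""
    total = sum(ratings[factor] * weights[factor] for factor in weights)
    return total, total / sum(weights.values())


sum_1, wwl_1 = weighted_workload(TASK_1_RATINGS, WEIGHTS)
sum_2, wwl_2 = weighted_workload(TASK_2_RATINGS, WEIGHTS)

print(f"Task 1: sum = {sum_1}, mean WWL = {wwl_1:.1f}")
print(f"Task 2: sum = {sum_2}, mean WWL = {wwl_2:.1f}")
```

Most of the difference between the two scores comes from the TD product (150 versus 350), which is the factor tied to the inter-stimulus interval manipulation and the one weighted most heavily.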
REFERENCES I

[1] Anderson, N. H. (1982). Methods of Information Integration Theory. New York: Academic Press.
[2] Childress, M. E., Hart, S. G., & Bortolussi, M. R. (1982). The reliability and validity of flight task workload ratings. Proceedings of the Human Factors Society 26th Annual Meeting. Santa Monica, CA: Human Factors Society, 319-323.
[3] Cooper, G. E. & Harper, R. P. (1969). The Use of Pilot Rating in the Evaluation of Aircraft Handling Qualities (NASA TN-D-5153). Washington, D.C.: National Aeronautics and Space Administration.
[4] Courtright, J. F. & Kuperman, G. (1984). Use of SWAT in USAF System T & E. Proceedings of the Human Factors Society 28th Annual Meeting. Santa Monica, CA: Human Factors Society, 700-704.
[5] Damos, D. L. (1984). Classification systems for individual differences in multiple-task performance and subjective estimates of workload. Proceedings of the 20th Annual Conference on Manual Control. (NASA-CP 2341) Washington, D.C.: National Aeronautics and Space Administration, 97-104.
[6] Eggemeier, F. T. (1981). Current issues in subjective assessment of workload. Proceedings of the Human Factors Society 25th Annual Meeting. Santa Monica, CA: Human Factors Society, 513-517.
[7] Eggemeier, F. T., Crabtree, M. S., Zingg, J. J., Reid, G. B., & Shingledecker, C. A. (1982). Subjective workload assessment in a memory update task. Proceedings of the Human Factors Society 26th Annual Meeting. Santa Monica, CA: Human Factors Society, 643-647.
[8] Ericsson, K. A. & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87 (3), 215-251.
[9] Fitts, P. M. & Peterson, J. R. (1964). Information capacity of discrete motor responses. Journal of Experimental Psychology, 67, 103-112.
[10] Gopher, D., & Braune, R. (1984). On the psychophysics of workload: Why bother with subjective measures? Human Factors, 26 (5), 519-532.
[11] Gopher, D., Chillag, N. & Arzi, N. (1985). The psychophysics of workload - A second look at the relationship between subjective measures and performance. Proceedings of the Human Factors Society 29th Annual Meeting. Santa Monica, CA: Human Factors Society, 640-644.
[12] Hart, S. G. (1986). Theory and measurement of human workload. In J. Zeidner (Ed.) Human Productivity Enhancement. New York: Praeger, 496-555.
[13] Hart, S. G., Childress, M. E., & Hauser, J. R. (1982). Individual definitions of the term "workload". Proceedings of the 1982 Psychology in the DOD Symposium. USAFA, CO, 478-485.
[14] Hauser, J. R., Childress, M. E. & Hart, S. G. (1983). Rating consistency and component salience in subjective workload estimation. Proceedings of the 18th Annual Conference on Manual Control. (AFWAL-TR-83-3021) Wright-Patterson Air Force Base, OH, 127-149.
[15] Johanssen, G., Moray, N., Pew, R., Rasmussen, J., Sanders, A. & Wickens, C. (1979). Final report of experimental psychology group. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 101-116.
[16] Madni, A. & Lyman, J. (1983). Model-based estimation and prediction of task-imposed mental workload. Proceedings of the Human Factors Society 27th Annual Meeting. Santa Monica, CA: Human Factors Society, 314-318.
[17] Mane, A. M. (1985). Adaptive and part-whole training in the acquisition of a complex perceptual-motor skill. Unpublished thesis.
[18] Nisbett, R. E. & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84 (3), 231-259.
[19] Rasmussen, J. (1983). Skills, rules, and knowledge; Signals, signs, and symbols, and other distinctions in human performance models. IEEE Systems, Man, and Cybernetics, New York: Institute of Electrical and Electronic Engineers, 257-266.
[20] Reid, G. B., Shingledecker, C. A., Nygren, T. E. & Eggemeier, F. T. (1981). Development of multidimensional subjective measures of workload. Proceedings of the International Conference on Cybernetics and Society. New York: Institute of Electrical and Electronic Engineers, 403-406.
[21] Reid, G. B., Eggemeier, F. T., & Nygren, T. E. (1982). An individual differences approach to SWAT scale development. Proceedings of the Human Factors Society 26th Annual Meeting. Santa Monica, CA: Human Factors Society, 639-642.
[22] Sheridan, T. B., and Stassen, H. (1979). Toward the definition and measurement of the mental workload of transport pilots. (Final report DOT-OS-70055) Cambridge, MA: MIT.
[23] Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.
[24] Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276-315.
[25] Turksen, I. B. & Moray, N. & Fuller, K. (in press). A linguistic rule-based expert system for mental workload. In H. J. Bullinger & H. J. Warnecke (Eds.) Toward the Factory of the Future.
[26] Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.
127
('5 Arm) Combat Developments Cxperimentation Center. (1984). Srout-Observer (inzt
Test 11. Scout I 1 Test Report. (CI)l:('- TH- 84-015) 128' White, S. A , . Mcfiinnon, D. P., & Lyman, J . ( i n press). Modified petri net sensitivity t o workload manipulations. Proceedings o j the Pld Annual Conjerence on Manual Control. Columbus, OH: Ohio State University. 1291 Wierwille, W. W. (1984). Comparztzue Evaluation o j Workload Estimation Techniques in Piloting tasks. (NASA CR-166496) Washington D.C. : National Aeronautics and Space Administration. 1301 Wierwille. W. W,.Skipper, J. H. & Rieger, C. A. (1984). Derision tree rating scales for workload estimation: Theme and variations. Proceedings o j the 20th Annual Conference on Manual Control. (NASA C P 2341) Washington, D.C.: Kational Aeronautics and Space Administration, 73-84.
REFERENCES II
[1] Battiste, V. & Hart, S. G. (1985). Predicted versus experienced workload and performance on a supervisory control task. Proceedings of the Third Biannual Symposium on Aviation Psychology. Columbus, OH: Ohio State University, 255-262.
[2] Biferno, M. A. (1985). Mental Workload Measurement: Event-Related Potentials and Ratings of Workload and Fatigue. (NASA CR-177354) Washington, D.C.: National Aeronautics and Space Administration.
[3] Bortolussi, M. R., Kantowitz, B. H. & Hart, S. G. (1985). Measuring pilot workload in a motion base trainer: A comparison of four techniques. Proceedings of the Third Biannual Symposium on Aviation Psychology. Columbus, OH: Ohio State University, 263-270.
[4] Hart, S. G., Battiste, V., Chesney, M. A., Ward, M. M. & McElroy, M. (in press). Type A vs Type B: comparison of workload, performance and cardiovascular measures.
[5] Hart, S. G., Battiste, V. & Lester, P. T. (1984). POPCORN: A supervisory control simulation for workload and performance research. Proceedings of the 20th Annual Conference on Manual Control. (NASA CP-2341) Washington, D.C.: National Aeronautics and Space Administration, 431-454.
[6] Hart, S. G., Sellers, J. J. & Guthart, G. (1984). The impact of response selection and response execution difficulty on the subjective experience of workload. Proceedings of the 28th Annual Meeting of the Human Factors Society. Santa Monica, CA: Human Factors Society, 732-736.
[7] Hart, S. G., Shively, R. J., Vidulich, M. A. & Miller, R. C. (in press). The effects of stimulus modality and task integrality: Predicting dual-task performance and workload from single task levels. Proceedings of the 21st Annual Conference on Manual Control. Columbus, OH: Ohio State University.
[8] Kantowitz, B. H., Hart, S. G., Bortolussi, M. R., Shively, R. J., & Kantowitz, S. C. (1984). Measuring pilot workload in a moving-base simulator: II. Building levels of load. Proceedings of the 20th Annual Conference on Manual Control. (NASA CP-2341) Washington, D.C.: National Aeronautics and Space Administration, 359-372.
[9] Kantowitz, B. H., Hart, S. G., Bortolussi, M. R., Shively, R. J. & Kantowitz, S. C. Measuring pilot workload in a moving-base simulator: Building levels of workload. (Unpublished manuscript).
[10] Miller, R. C. & Hart, S. G. (1984). Assessing the subjective workload of directional orientation tasks. Proceedings of the 20th Annual Conference on Manual Control. (NASA CP-2341) Washington, D.C.: National Aeronautics and Space Administration, 85-96.
[11] Mosier, K. L. & Hart, S. G. (in press). Levels of information processing in a Fitts Law task. Proceedings of the 21st Annual Conference on Manual Control. Columbus, OH: Ohio State University.
[12] Shively, R. J. (1985). Evaluation of Data Entry Devices. Unpublished masters thesis. West Lafayette, IN: Purdue University.
[13] Staveland, L., Hart, S. G. & Yeh, Y.-Y. (in press). Memory and subjective workload assessment. Proceedings of the 21st Annual Conference on Manual Control. Columbus, OH: Ohio State University.
[14] Vidulich, M. A. & Tsang, P. S. (in press). Techniques of subjective workload assessment: A comparison of two methodologies. Proceedings of the Third Biannual Symposium on Aviation Psychology. Columbus, OH: Ohio State University, 239-246. To appear in a special issue of Ergonomics.
[15] Vidulich, M. A. & Tsang, P. S. (1985). Assessing subjective workload assessment: A comparison of SWAT and the NASA-Bipolar methods. Proceedings of the Human Factors Society 29th Annual Meeting. Santa Monica, CA: Human Factors Society, 71-75.
[16] Yeh, Y.-Y., Wickens, C. D. & Hart, S. G. (1985). The effect of varying task difficulty on subjective workload. Proceedings of the Human Factors Society 29th Annual Meeting. Santa Monica, CA: Human Factors Society, 765-770.
REFERENCES III
Recent Experimental Uses of the NASA-TLX
[1] Battiste, V. (1987). Part-Task vs Whole-Task Training: Twenty years later. Unpublished Master's Thesis. San Jose State University.
[2] Fuld, R., Liu, Y., & Wickens, C. D. (1987). Computer monitoring vs self monitoring: The impact of automation on error detection. (ARL-87-3/NASA-87-4). Champaign: University of Illinois, Department of Aviation.
[3] Johnson, W. W. & Hart, S. G. (in press). Step tracking shrinking targets. In Proceedings of the 31st Annual Meeting of the Human Factors Society. Santa Monica: Human Factors Society.
[4] Liu, Y. & Wickens, C. D. (1987). Mental workload and cognitive task automation: An evaluation of subjective and time estimation techniques. (EPL-87-2/NASA-87-2). Champaign: University of Illinois, Engineering-Psychology Research Laboratory.
[5] Liu, Y. & Wickens, C. D. (in press). The effect of representational code, response modality, and automation on dual-task performance and subjective workload: An integrated approach. In Proceedings of the 31st Annual Meeting of the Human Factors Society. Santa Monica: Human Factors Society.
[6] NASA Task Load Index (TLX): Computerized Version. (1986). Moffett Field, CA: NASA-Ames Research Center, Aerospace Human Factors Research Division.
[7] NASA Task Load Index (TLX): Paper-and-Pencil Version. (1986). Moffett Field, CA: NASA-Ames Research Center, Aerospace Human Factors Research Division.
[8] Nataupsky, M. & Abbott, T. (in press). Comparisons of workload measures on a computer-generated primary flight display. In Proceedings of the 31st Annual Meeting of the Human Factors Society. Santa Monica: Human Factors Society.
[9] Pepitone, D. & Shively, R. J. (in press). Predicting pilot workload. In Proceedings of the 1987 Aerospace Technology Conference and Exposition. Long Beach, CA: Society of Automotive Engineers.
[10] Shively, R. J., Battiste, V., Hamerman-Matsumoto, J., Pepitone, D. D., Bortolussi, M. R., & Hart, S. G. (in press). Inflight evaluation of pilot workload measures for rotorcraft. In Proceedings of the Fourth Symposium on Aviation Psychology. R. Jensen (Ed.). Columbus: Ohio State University.
[11] Tsang, P. S. & Johnson, W. W. (in press). Automation: Changes in cognitive demands and mental workload. In Proceedings of the Fourth Symposium on Aviation Psychology. R. Jensen (Ed.). Columbus: Ohio State University.
[12] Tsang, P. S. & Vidulich, M. A. (in press). Time-sharing visual and auditory tracking tasks. In Proceedings of the 31st Annual Meeting of the Human Factors Society. Santa Monica: Human Factors Society.
[13] Vidulich, M. A. & Pandit, P. (in press). Individual differences and subjective workload assessment. In Proceedings of the Fourth Symposium on Aviation Psychology. R. Jensen (Ed.). Columbus: Ohio State University.
[14] Vidulich, M. A. & Pandit, P. (in press). Consistent mapping and spatial consistency in target detection and response execution. Proceedings of the Fourth Mid-Central Ergonomics/Human Factors Conference. Champaign: University of Illinois.
[15] Vidulich, M. A., & Tsang, P. S. (in press). Rating scale and paired comparison approaches to subjective mental workload assessment. In Proceedings of the 31st Annual Meeting of the Human Factors Society. Santa Monica: Human Factors Society.
[16] Wild, H. M., Stokes, J., Weiland, W. & Harrington, N. (in press). Experimental evaluation of the submarine localization module for the naval air anti-submarine warfare P3 tactical coordination officer (Technical Report). Warminster, PA: Naval Air Development Center.
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) Elsevier Science Publishers B.V. (North-Holland), 1988
THE SUBJECTIVE WORKLOAD ASSESSMENT TECHNIQUE: A SCALING PROCEDURE FOR MEASURING MENTAL WORKLOAD
Gary B. Reid
Harry G. Armstrong Aerospace Medical Research Laboratory
Wright-Patterson AFB, Ohio
Thomas E. Nygren
Department of Psychology
Ohio State University
Mental workload is proposed to be a multidimensional construct that can be largely explained by three component factors: Time Load, Mental Effort Load, and Psychological Stress Load. In this paper, we describe a subjective scaling approach, the Subjective Workload Assessment Technique (SWAT), that captures this multidimensional nature of mental workload. We describe the SWAT procedure as a two-phased method that includes (a) a scale development phase based on conjoint measurement and nonmetric scaling, and (b) an event scoring phase. The development of SWAT and its measurement foundations are discussed. Recent research illustrating SWAT's widespread utility and its sensitivity as a measure of perceived mental workload is summarized.
INTRODUCTION
Mental workload is a construct that has considerable intuitive appeal. Almost everyone can think of examples where two or more individuals perform essentially the same athletic, work-related, or academic task to the same objectively measured performance level. Yet, it is clear to the individuals and to observers that some of these people must expend much more effort than others to achieve this same level of performance. The feeling of expended effort appears to somehow be related to the construct called workload.
Despite the importance that many investigators have attached to mental workload as a measurable construct, it has been perplexing for scientists to study. As the above example illustrates, this is partly because performance measures cannot, of themselves, describe workload. Operators often increase effort as a task becomes more demanding, thus increasing perceived workload, while still maintaining high performance. From the perspective of measurement, this implies that mental workload is only moderately correlated with performance measures, and additional measures must be developed if the construct is to be adequately described.
A great deal of research has been conducted over the past several years to assess the functional relationship between mental workload and a host of physiological, behavioral, and subjective measures. Recently, several substantial papers have presented discussions of ways to both measure mental workload (O'Donnell & Eggemeier, 1986) and adequately describe its multidimensional characteristics (Gopher & Donchin, 1986). Yet, the most fundamental issue related to the study of workload, a precise definition of the term, has remained elusive and has spawned considerable debate among researchers (Moray, 1979).
One reason that the concept of mental workload, despite its definitional elusiveness, is both theoretically interesting and so easily accepted is that the concept of physical workload has been effectively used for decades. Scientists working in the disciplines of ergonomics and work physiology have developed many measures of physical work that relate amount of work accomplished to the energy cost (e.g., oxygen consumption associated with lifting a given mass). A major work describing these research methods is edited by Singleton, Fox, and Whitfield (1973). However, technological innovations have caused an increasing amount of work in our society to be associated with tasks that require little or no physical effort. Increasingly, tasks are heavily loaded with mental activities such as information processing, decision making, and system monitoring. Researchers have, therefore, expanded the concept of workload to include mental work as well as physical work.
The expansion of the construct of workload to include mental activity is closely related to theories of attention and information processing that have been a topic of research for cognitive psychologists (Gopher & Donchin, 1986).
Many models have been proposed and debated but, to oversimplify the argument, the essence of the major theories is that the human information processing system has a finite capacity or capacities, and different task situations require varying degrees of capacity expenditure. If a person is in a high workload situation, then he or she has little "spare capacity." Conversely, in a low workload situation, a substantial portion of the person's capacity is untapped. The exact nature of this capacity or capacities has been the topic of considerable debate (Norman & Bobrow, 1975; Navon & Gopher, 1979; Wickens, 1984; Kantowitz, 1985).
In addition to the study of the theoretical principles that underlie the construct of workload, the consequences of the construct are of concern to systems designers and evaluators. Modern systems are being designed and put into operation that incorporate the virtual explosion of engineering technologies which have become available in recent years. As previously noted, these systems generally place operators in a different type of work environment from what has been true historically. The operator's role is increasingly that of a system monitor, information manager, and decision maker. Many of the manual tasks that operators have previously performed are now being automated.
Due to the advances in computer technology and electronic sensing, more information is available to display to operators. But space to display this vast quantity of information at an operator's work station has become overcrowded. To relieve this overcrowding, multimode displays and multifunction switches have been developed and installed in operator work stations. These advances, although providing operators with more complete information than was previously possible, place new demands on them. Operators not only must assimilate the information presented to them, but they must frequently also decide what information is needed and where it should be displayed.
On occasion, as modern systems have come into operation, operators have complained that the workload associated with operating these systems is excessive. Also, some accidents and near accidents have raised questions about the level of operator workload and system safety. Such questions have had to be addressed in human factors evaluations which, in turn, have demonstrated a need for reliable and effective methods for measuring workload. Thus, a considerable amount of research in recent years has been
directed toward development of sensitive and reliable workload measurement instruments (cf., O'Donnell & Eggemeier, 1986). Other chapters in this book describe the current state of much of this research. Many of these measurement procedures are very promising, but generally they are still largely restricted to research environments. It is clear, however, that each of these reported measures has strengths and weaknesses. It appears equally clear that because of the complexity of the workload construct, it is unlikely that any single measure will be completely adequate in providing the type of applied measurement mechanism that is desired and at the same time be applicable to all kinds of applied work situations.
These various measures of mental workload are customarily divided into three classes: (1) subjective, (2) physiological, and (3) behavioral or performance. One of these classes, subjective measures, has considerable appeal for applied situations. The remainder of this chapter will be focused upon this class of measures and upon one subjective measure in particular.
Subjective Measurement of Workload
Although considerable effort has been expended to develop automated and objective measures of workload, one method that continues to be popular in operational evaluations is simply to ask the operator to report how hard he or she is working. This is, in fact, how we can define a subjective measure of workload.
It is one that is based on a subject's direct estimate or comparison judgment of the workload experienced at a given moment. Williges and Wierwille reported in their 1979 review paper that subjective measures are the most frequently used methods for workload assessment. The intervening eight years of workload research have not dramatically altered the situation.
There are several reasons for the popularity of subjective workload measures. The first and probably the most important is that subjective measures enjoy high face validity. Operators and design engineers can readily accept that if operators think that there is too much work associated with the operation of a certain system, then design alternatives must be found. Secondly, subjective measures are somewhat more direct than many of the other measures. If someone wants to know how much workload is required in a certain instance, measures of physiological and behavioral variables require knowledge of the functional relationship between these variables and workload. A complete understanding of these relationships is not currently available, although progress toward this end is being made (see other chapters in this volume; O'Donnell & Eggemeier, 1986; Gopher & Donchin, 1986). Conversely, in the same situation, if operators are asked to assess the degree of workload, they can describe, in at least a general ordinal way, how hard they are working.
In debating the issue of what is workload, Johanssen, Moray, Pew, Rasmussen, Sanders, and Wickens (1979) have concluded that if an operator thinks he is loaded down and under stress in a situation, then one must conclude that he is, regardless of what other indices might lead you to conclude. This logic implies yet another reason for the popularity of subjective measures. Often the approach used to validate objective measures is to demonstrate that these measures can, in fact, predict or are correlated with subjective measures. Finally, the ease associated with obtaining subjective measures makes them very adaptable to operational environments like the system design evaluations previously mentioned. Instrumentation
requirements are minimal and the timing of data collection can be tailored to fit the particular operational situation.
Despite the popularity and usefulness of subjective measures for operational situations, until recently, they were the least researched class of workload measures. Williges and Wierwille (1979) noted in their review that up to that point, in most cases subjective measures represented "a situation-specific, adjunct measurement instrument with no accompanying validity or reliability data" (p. 552). They also observed that "given the widespread use and general applicability of rating scales as a technique of workload assessment, it is surprising that a rigorously developed workload rating scale has not been developed" (Williges & Wierwille, 1979, p. 552).
In response to this recognized need, researchers at the U.S. Air Force's Harry G. Armstrong Aerospace Medical Research Laboratory have developed SWAT, the Subjective Workload Assessment Technique (Reid, Shingledecker, Nygren, & Eggemeier, 1981). SWAT is a scaling procedure that has been developed for use in applied settings. What distinguishes SWAT from most other subjective rating methods is that it was rigorously developed to be rooted in formal measurement theory, specifically conjoint measurement theory.
The overriding principles that have guided the development of SWAT have been (a) to develop as precise a measure as possible while minimizing the intrusiveness of the data collection procedure on the operational situation, (b) to place minimal measurement constraints on the complexity of the judgmental task that is required of the operators making workload evaluations, and (c) to provide a mechanism for testing the validity of the formal measurement model that is assumed by the underlying additive model in SWAT.
One criticism often made of subjective measures is that they are based on the subjects' ability to report direct numerical estimates of workload or dimensional components of workload. What is often assumed in these approaches, without verification, is that the subjects' judgments have interval or ratio-scale properties. That is, if it is assumed in the scaling procedure that we have interval measurement, then if a subject were to give workload ratings of 2, 4, and 6 to three different tasks (A, B, and C), the differences in perceived workload between A and B and between B and C would be inferred to be equal. In other words, it is assumed that subjects can make accurate equal-interval judgments on the workload scales. In contrast, procedures such as bisection or magnitude estimation assume that subjects can make ratio judgments.
Thus, these techniques assume that a subject can make reliable inferences that B is twice as much work as A and that C is three times as much work as A. Any time a scaling procedure requires subjects to bisect an interval (e.g., find the task that is half as much work as the standard task) or to judge ratios (e.g., twice as large, etc.), it is assuming that the numerical estimates are on a ratio scale. Clearly, these techniques make strong assumptions which must be empirically tested. At the very least, such subjective measurement procedures force a difficult judgment task on the subjects and would require that they be well-trained in the use of the scale.
The SWAT procedure to be described below does not make strong assumptions about subjects' abilities to make judgments. Rather, in SWAT, scale
development is based only on ordinal information that is inferred from rankings or paired comparison judgments. Hence, SWAT only requires that the subject can meaningfully order the alternatives with respect to the level of perceived workload. The SWAT scale was developed on the basis of a minimal scaling method that also has face validity. In SWAT, one obtains actual workload orderings to produce a workload scale, in a manner similar to the way marketing researchers, for example, use observed preference orderings to obtain a strength of preference scale. In the remainder of this chapter, we will describe the development process, the measurement basis, and the philosophy that has been used in the design of SWAT, as well as present some of the data that have been obtained using this procedure.
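The difference between relying on interval properties and relying only on order can be illustrated with a small sketch. The Python fragment below is not part of the SWAT procedure; the function names and the squaring transformation are assumptions chosen purely for illustration. It takes the ratings 2, 4, and 6 for tasks A, B, and C from the example above, applies an arbitrary order-preserving rescaling, and shows that the ordering of the tasks survives while the equality of the differences does not.

```python
def rank_order(ratings):
    """Return task names sorted from lowest to highest rating."""
    return sorted(ratings, key=ratings.get)


def adjacent_differences(ratings):
    """Return rating differences between tasks that are neighbours in rank."""
    order = rank_order(ratings)
    return [ratings[high] - ratings[low] for low, high in zip(order, order[1:])]


# Ratings from the text's example: tasks A, B, and C rated 2, 4, and 6.
ratings = {"A": 2, "B": 4, "C": 6}

# Hypothetical monotone rescaling (squaring); any order-preserving
# transformation would serve equally well for the illustration.
rescaled = {task: value ** 2 for task, value in ratings.items()}

print(rank_order(ratings), adjacent_differences(ratings))    # ['A', 'B', 'C'] [2, 2]
print(rank_order(rescaled), adjacent_differences(rescaled))  # ['A', 'B', 'C'] [12, 20]
```

If subjects' numbers carry only ordinal meaning, both sets of ratings describe the same judgments, so a conclusion that depends on the equal spacing of 2, 4, and 6 would not be warranted without further testing.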
MENTAL WORKLOAD OPERATIONALLY DEFINED
As previously mentioned, a definition of the theoretical construct, mental workload, that all workload scientists can accept does not yet exist. This was even more true in 1980 when SWAT development began. One explanation as to why the precise definition has been so elusive may relate to one aspect of workload that many scientists do agree upon. Many researchers in the field believe that mental workload is not a single unidimensional phenomenon, but is a construct composed of several elements or dimensions. At this point, however, the agreement among researchers appears to end.
In an attempt to develop a consensus definition of mental workload, we conducted a literature review and noted what many scientists believed to be the critical components that go into the perception of mental workload. Table 1 presents the elements that went into over 20 scientists' definitions of mental workload in 1980. These definitions were studied for areas of agreement; in many cases, a translation was performed to capture the essence of an investigator's definition while putting all of the definitions in common terms.
Based on our review and despite the disagreement as to a precise definition of mental workload, it can be observed, if our translation is accurate, that three variables appear in a majority of the definitions. The literature review clearly indicated that almost everyone thought that in some way time pressure is a major component of workload. This is, to some extent, supported by the practice within the aircraft industry of using timeline analysis as the principal way of evaluating the adequacy of a cockpit design with respect to operator workload. As a result of the general agreement regarding time, the conceptual framework that was developed for SWAT included Time Load as the first factor or dimension. Time Load, as operationally defined for our purposes, means both time available and task overlap. Clearly, if the time required to perform a task exceeds the time available, the operator has a time load problem. Another time factor, which on occasion is overlooked, involves task overlap. If an operator is performing a complex task, it may be made up of many component tasks or subtasks. Each of these tasks has its own time demands.
If we assume that the operator has the skills or abilities demanded by the tasks, as long as the tasks can be completed sequentially, the operator can maintain performance at an acceptable level. If, on the other hand, the tasks start to compete for the operator's time resources, he or she will be forced to evaluate the tasks for priority and allow some tasks' performance to deteriorate and/or their completion to be delayed. Under this set of circumstances, we contend that the operator is also under a time load.
G.B. Reid and T.E. Nygren

TABLE 1. ELEMENTS DEFINING MENTAL WORKLOAD
Column headings: TIME LOAD; MENTAL EFFORT LOAD; PSYCHOLOGICAL STRESS LOAD
The Subjective Workload Assessment Technique
The second dimension that tended to emerge from inspection of the opinions of researchers in our literature review is one that deals with task factors such as difficulty, complexity, or effort. This dimension is related to the wealth of research in cognitive psychology where the demands associated with various levels of a task have been manipulated by such things as the number of elements that the subject must process, the forcing function driving a tracking task, inductive reasoning, deductive reasoning, or memory retrieval. This dimension also tends to encompass the concept of mental capacity or capacities referred to previously. In applying this model, one assumes that the human operator has a limited capacity. Performance of one task may consume a certain amount of an operator's resources, while another task may consume other resources. The implication is that the resources that are not expended in task performance are held in reserve to be used for other tasks or as a way to put more effort toward accomplishing a current task. The exact nature of this limited capacity is the subject of a large body of research, but the basic notion of limited capacity for work seems to be inherent in the conceptualization of mental workload (Donchin & Gopher, 1986).
The second dimension postulated for the SWAT framework was, then, called Mental Effort Load. Mental Effort Load involves such processes as performing calculations, making decisions, attending to information sources, placing information in short term memory and retrieving it, retrieving relevant information from long term memory, and estimation. This list offers a suggestion as to the kinds of processes that are associated with this dimension and is not intended to be inclusive. In essence, Mental Effort Load is the dimension that is used to account for most of the capacity effects discussed earlier.

The third commonly observed characteristic of workload deals with the general concept of psychological stress and seems to encompass a number of operator variables such as motivation, training, fatigue, health, and emotional state. This dimension may be represented by such specific stressors as fear of physical harm, fear of failure, tension, unfamiliarity, and disorientation, to name a few. In addition, physical stressors such as temperature, vibration, G-forces, and noise may be included. These are stressors that are known to affect performance when they are present in moderate to high levels. However, at low levels they may only be a source of irritation to the operator.
In these situations, some degree of effort may be required by the operator to manage his or her discomfort and, thus, affect the perceived workload. Presence of variables such as these is defined as being part of the mental workload dimension, psychological stress. Hence, we called the third dimension Psychological Stress Load and defined it as anything that contributes to an operator's confusion, frustration, and/or anxiety.
It is important to emphasize that the above summary definition of mental workload was not proposed to end the theoretical debate concerning a precise definition. Rather, the definition is intended to capture most of the important components that appear to influence people's perception of workload. The purpose of a three-dimensional definition, rather than one that attempted to include all relevant dimensions, was to make the measurement of workload feasible in operational situations. SWAT is intended to be a pragmatic approach to the estimation of mental workload in operational situations. An overriding concern in developing this procedure was to minimize intrusion to operators while providing the best possible
mechanism for discrimination of workload levels, especially differentiating between perceptions of moderate and high levels. These objectives seemed to be most amenable to the use of a subjective scaling procedure. Since the working definition postulated a multidimensional construct, a multidimensional alternative to traditional unidimensional scaling approaches was used. A scaling approach called conjoint measurement/conjoint scaling appeared to be feasible and had been used in closely associated efforts measuring systems operability (Donnell & O'Connor, 1978; Donnell, 1979). It was chosen as an approach with good potential for measuring the complex construct of workload. Because of the relative recency of its development and its fundamental relationship to SWAT, a brief overview of the conjoint measurement methodology will be presented.
CONJOINT MEASUREMENT AND CONJOINT SCALING

In many judgment and decision making situations where a subjective scaling technique seems particularly relevant or useful, it is often assumed that the variable of interest, in this case mental workload, is a complex phenomenon that is actually comprised of several perceptually independent dimensions (i.e., Time Load, Mental Effort Load, Psychological Stress Load).¹ It is also often the case that scientists would like to know the composition rule that people actually use to combine information from these perceived dimensions or factors into the more complex construct. Conjoint measurement theory provides a powerful methodology for accomplishing this. Its power lies in the fact that it uses only observed ordinal or rank order information about the complex construct in order to empirically establish a combination rule that fits a respondent's data.

Axiom Tests for Conjoint Measurement

Although the mathematical foundations for conjoint measurement theory have been in existence for many years (Holder, 1901), procedures for developing scales were impractical until the development of numerical analysis algorithms for use on modern computers.
In 1964, Luce and Tukey published the first article that described a set of sufficient conditions for additive conjoint measurement in two factors. In their classical work on measurement theory, Krantz, Luce, Suppes, and Tversky (1971) built on this work and earlier independent work by Krantz (1964) and Tversky (1967) and extended the theory of additive conjoint structures into a general theory of polynomial conjoint measurement for simple polynomial composition rules in three or more factors.

¹ It is important to distinguish the concept of statistical independence from perceptual independence. What is critical for a valid additive representation of a psychological construct is not that the dimensions or factors are completely uncorrelated in the real world, but rather that the individual decision maker perceives them as being perceptually independent. In other words, the individual can always meaningfully evaluate differences in one factor with the others held constant. (See Krantz et al., 1971, for a further discussion of this property.)

The general theory as outlined by Krantz et al. (1971) provides for a series of axioms, which can be tested on a set of data, to discriminate
among four simple polynomial models to determine which of them best fits the set of data. For example, in our case we let T, E, and S represent the three proposed workload dimensions of Time Load, Mental Effort Load, and Psychological Stress Load. As shown in detail later in our specific discussion of SWAT, three levels for each of these dimensions can be defined and labeled as t_1, t_2, t_3; e_1, e_2, e_3; and s_1, s_2, and s_3, respectively. Finally, let g(t_1), h(e_1), and k(s_1) illustrate the subjective scale values associated with three of the levels for a given individual. These levels of the three factors combine to form a unique workload combination (1,1,1), and its overall judged value, f(t_1, e_1, s_1), can be found via either:

an additive model, if

    f(t_1, e_1, s_1) = g(t_1) + h(e_1) + k(s_1),        (1)

a multiplicative model, if

    f(t_1, e_1, s_1) = g(t_1) * h(e_1) * k(s_1),        (2)

a distributive model, if

    f(t_1, e_1, s_1) = g(t_1) * [h(e_1) + k(s_1)],      (3)

or a dual-distributive model, if

    f(t_1, e_1, s_1) = g(t_1) + [h(e_1) * k(s_1)].      (4)
Note that in the latter three models, the overall value of the combined effect of the three factors, f(t_1, e_1, s_1), could be completely erased if one of the multiplicative factors has a zero level. In this case it would not matter what the levels of the other factors were. For an additive model, of course, this is not the case, since a zero level of a factor would make only that factor irrelevant for the combined stimulus effect. Since in many applications one would not expect to find a multiplicative factor with this zero level property, most theoretical and empirical research in conjoint measurement has focused on the additive model.

The Krantz et al. (1971) axioms define five ordinal properties that are useful in differentiating among the models in Equations 1-4. In addition, all are necessary although not sufficient for the additive model. These are simple or single factor independence, joint factor independence, double cancellation, distributive cancellation, and dual-distributive cancellation. It is clear from the results of a recent Monte Carlo study (Nygren, 1985) that the critical axioms that are used to assess additivity are simple independence, joint independence, and double cancellation.
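The four composition rules and the zero-level property can be made concrete with a small sketch. The scale values below are hypothetical numbers chosen for illustration (they are not taken from the chapter), the names `MODELS` and `judged_values` are invented for this example, and the dual-distributive line assumes the usual form g + h * k.

```python
from itertools import product

# Hypothetical scale values for levels 1-3 of each dimension.
# The numbers are illustrative only and do not come from the chapter.
g = {1: 0.0, 2: 2.0, 3: 5.0}  # Time Load, g(t)
h = {1: 0.0, 2: 1.0, 3: 3.0}  # Mental Effort Load, h(e)
k = {1: 0.0, 2: 1.5, 3: 2.5}  # Psychological Stress Load, k(s)

# The four simple polynomial composition rules.
MODELS = {
    "additive": lambda t, e, s: g[t] + h[e] + k[s],
    "multiplicative": lambda t, e, s: g[t] * h[e] * k[s],
    "distributive": lambda t, e, s: g[t] * (h[e] + k[s]),
    "dual_distributive": lambda t, e, s: g[t] + h[e] * k[s],  # assumed standard form
}


def judged_values(model_name):
    """Return f(t, e, s) for all 27 combinations under the named model."""
    rule = MODELS[model_name]
    return {(t, e, s): rule(t, e, s) for t, e, s in product((1, 2, 3), repeat=3)}


if __name__ == "__main__":
    # Time Load level 1 has a zero scale value in this made-up example.
    for name in MODELS:
        print(name, judged_values(name)[(1, 3, 3)])
```

With g(t_1) set to zero, the multiplicative and distributive values for combination (1,3,3) collapse to zero regardless of effort and stress, whereas the additive value still reflects the other two factors.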
Simple or single factor independence means that the ordering of the levels of one factor (e.g., Time Load) must stay the same at all the levels of the other factors (e.g., Effort Load and Stress Load). Note that this simple independence is an axiom that describes a monotonicity; it does not refer to statistical independence. Hence, theoretically it would be quite possible, for example, to find that Time Load was independent (monotonic) of Effort and Stress, but that Effort was not independent of Stress and Time or Stress was not independent of Effort and Time. To the extent that the monotonicity or simple independence property holds, an additive model is supported. For simplicity, we let (1,1,1), (1,1,2), (1,1,3), ..., (3,3,3) represent the stimuli formed by combining all possible levels of the Time Load, Effort Load, and Stress Load factors. As an illustration of the simple independence axiom, suppose that it was found that an individual ordered the stimulus combinations (1,[1,1]) < (2,[1,1]) but (1,[2,2]) > (2,[2,2]). These orderings would be a violation of single factor independence for Time Load of Effort Load and Stress Load, because the ordering on the Time Load factor is "<" in one case (1 < 2), but ">" in the other case when the combination of Effort and Stress changes from [1,1] to [2,2]. Note that this is very similar to finding an interaction in an analysis of variance where a dependent variable cannot be explained by main effects alone.

Joint factor independence is satisfied when the ordering of all combinations of the levels of any two of the factors (e.g., Time Load and Effort Load) stays the same for all levels of a third variable (Stress Load). In a manner comparable to that for single factor independence, there are three forms of joint factor independence: Time and Effort jointly independent of Stress, Stress and Effort jointly independent of Time, and Time and Stress jointly independent of Effort.² The orderings ([2,2],2) > ([1,1],2) but ([2,2],3) < ([1,1],3) represent a violation of joint factor independence for Time and Effort of Stress, because [2,2] and [1,1] produce opposite orderings in the corresponding pairs when combined with levels 2 and 3 of the Stress Load factor.

Finally, double cancellation is defined for a pair of factors each with three levels and is satisfied if this 3 x 3 matrix is consistent with regard to the order information in its diagonals (Krantz et al., 1971; Krantz and Tversky, 1971).³ The term "cancellation" is used since what the axiom really implies is that the psychological value of a shared level of a factor can be eliminated or "canceled" from each of two stimulus combinations without affecting their ordering with respect to workload. Such a property must, of course, hold in an additive model since it holds algebraically in the addition of real numbers.

In practice, the way the axiom testing procedure works is that, given a complex construct made up of three dimensions as in the case of SWAT, subjects are required to order the stimulus conditions that are generated by forming all 27 combinations of the three levels of each dimension in a 3 x 3 x 3 design. These rank order data are then subjected to the independence and cancellation axiom tests.
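As a rough illustration of how such axiom tests could be run on a card sort, the following sketch checks simple independence and double cancellation on a dictionary of ranks keyed by (time, effort, stress) levels. It is not the SWAT program's own routine: the function names are hypothetical, ties are simply ignored, and joint independence is left out for brevity.

```python
from itertools import combinations, permutations, product

LEVELS = (1, 2, 3)


def place(assignments):
    """Build a (t, e, s) tuple from a {factor_index: level} mapping."""
    return tuple(assignments[i] for i in range(3))


def simple_independence_violations(rank, factor):
    """Return pairs of levels of `factor` whose order reverses across contexts."""
    others = [i for i in range(3) if i != factor]
    violations = []
    for a, b in combinations(LEVELS, 2):
        signs = set()
        for x, y in product(LEVELS, repeat=2):
            context = {others[0]: x, others[1]: y}
            diff = (rank[place({**context, factor: a})]
                    - rank[place({**context, factor: b})])
            if diff != 0:  # ties are ignored in this sketch
                signs.add(diff > 0)
        if len(signs) > 1:  # both "<" and ">" were observed
            violations.append((a, b))
    return violations


def double_cancellation_violations(rank, f1, f2, fixed_level):
    """Check double cancellation for factors f1, f2 with the third held fixed."""
    f3 = 3 - f1 - f2  # index of the factor held constant

    def r(x1, x2):
        return rank[place({f1: x1, f2: x2, f3: fixed_level})]

    violations = []
    for a1, b1, c1 in permutations(LEVELS):
        for a2, b2, c2 in permutations(LEVELS):
            premises = r(a1, b2) > r(b1, c2) and r(b1, a2) > r(c1, b2)
            if premises and not r(a1, a2) > r(c1, c2):
                violations.append(((a1, b1, c1), (a2, b2, c2)))
    return violations


def ranks_from_values(values):
    """Convert judged values into ranks 1..n (1 = lowest workload)."""
    ordered = sorted(values, key=values.get)
    return {stim: i + 1 for i, stim in enumerate(ordered)}
```

Ranks derived from any tie-free additive rule pass both checks, while swapping the ranks of (1,2,2) and (2,2,2) reproduces the Time Load reversal described in the text.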
The obtained three-dimensional data matrix is examined for compliance with the orderings expected among the stimulus combinations of the levels of Time, Effort, and Stress when the axioms are satisfied. If the subject's rank order data are consistent, the properties of the models in Equations 1-4 can be tested to help determine a best-fitting model. Even though this procedure can examine the full range of models in Equations 1-4, in applications associated with the scale development phase of SWAT, the procedure is used essentially to verify the adequacy of an additive model. This is based on the finding that over five years and an estimated number of card sorts that exceeds one thousand, less than one percent have been analyzed that were better represented by one of the nonadditive polynomial models in Equations 2-4.

² Let A_1, A_2, and A_3 represent three factors in a conjoint design. Then we can define simple independence and joint independence in an A_1 x A_2 x A_3 design as follows: A_1 is independent of A_2 and A_3 whenever (a_1, a_2, a_3) > (b_1, a_2, a_3) if and only if (a_1, b_2, b_3) > (b_1, b_2, b_3), and A_1 and A_2 are jointly independent of A_3 whenever (a_1, a_2, a_3) > (b_1, b_2, a_3) if and only if (a_1, a_2, b_3) > (b_1, b_2, b_3).

³ Double cancellation is satisfied if, whenever (a_1, b_2, a_3) > (b_1, c_2, a_3) and (b_1, a_2, a_3) > (c_1, b_2, a_3), it follows that (a_1, a_2, a_3) > (c_1, c_2, a_3).

Conjoint Scaling

It is often the case, however, that in addition to knowing the composition rule that describes the way subjects combine dimensions to form a complex phenomenon, the investigator would also like to have scale values to represent the subjective values of various levels of both the complex phenomenon and its component dimensions. This is precisely the case for SWAT and workload.
The procedure that is used for this purpose is often called numerical conjoint scaling in order to differentiate it from the axiomatic conjoint measurement procedure described above. Before the development of multidimensional scaling computer algorithms, it was essentially impossible to simultaneously find these subjective scale values for the levels of both the component dimensions and their combined effect. The scaling routine in SWAT that is used to establish a scale for mental workload actually contains two such distinct scaling procedures. They are based on modifications of two nonmetric scaling algorithms, MONANOVA (Kruskal, 1965) and NONMETRG (Johnson, 1973).
A nonmetric scaling procedure is one that attempts to find the best-fitting set of interval-scaled values for the levels of the perceptually independent dimensions and their resultant combined effect based only on the rank order relationships that are present in the data. Thus, nonmetric scaling methods differ from metric scaling procedures in that they do not assume a linear relationship between observed data and final scale values. Nonmetric procedures do not need to make the sometimes questionable assumption that the respondent can and will make reliable ratings that have interval-scale properties when judging a complex construct like mental workload. A nonmetric scaling procedure only requires the data to be reliably rank ordered. The comparison of nonmetric to metric is, then, equivalent to finding a best-fitting monotonic function rather than a linear function relating the scaled variables to the observable data.
Given the proposed additive composition rule, each of the scaling algorithms in SWAT finds a set of scale values for the twenty-seven workload combinations (3 x 3 x 3) such that (a) they are additive combinations of the scale values for the three levels of the Time, Effort, and Stress factors, and (b) the 27 scale values are as monotonic as possible with the subject's original rank ordering of the 27 workload combinations. Though it may not seem at first to be intuitively reasonable, the restriction of an additive model, coupled with the overdetermination of orderings among stimulus scale values based on the observable rank orderings, is sufficient to allow the nonmetric scaling algorithms to find a unique, best-fitting set of stimulus values with interval-scale properties. The
definition of best-fitting is what differentiates the MONANOVA-based and NONMETRG-based procedures.

The first scaling algorithm that is used in the SWAT program is based on a modification of Kruskal's monotonic transformation procedure, MONANOVA (Kruskal, 1965). MONANOVA performs a nonmetric scaling of the data via the widely used STRESS-based least-squares approach. The scaling analysis is performed either on each individual data set separately or on an average data matrix as selected by the investigator. This procedure produces scale values for each of the levels of the factors and for the stimulus combinations produced by combining all of the levels of all of the factors. A normalization of the scale for the stimulus combinations rescales the combinations so that the lowest scale value (for stimulus combination (1, 1, 1)) is zero and the highest scale value (3, 3, 3) is 100. This normalization is particularly useful if the stimuli are designed, as in the case of workload, such that the lowest (1, 1, 1) and the highest (3, 3, 3) stimulus combinations are meaningful anchors for the complex phenomena under investigation. The SWAT procedure begins by rank ordering the data from the smallest to the largest, if they are not already in that form. Because the procedure is nonmetric, from this point on only rank orders of the data and not the data values themselves are used.
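The anchoring step described above amounts to a linear rescaling. A minimal sketch, assuming the scale values are held in a dictionary keyed by (time, effort, stress) levels; the function name `normalize_to_anchors` is hypothetical and not part of the SWAT program:

```python
def normalize_to_anchors(scale):
    """Rescale so that (1, 1, 1) maps to 0 and (3, 3, 3) maps to 100."""
    low = scale[(1, 1, 1)]
    high = scale[(3, 3, 3)]
    if high == low:
        raise ValueError("anchor combinations have identical scale values")
    return {stim: 100.0 * (value - low) / (high - low)
            for stim, value in scale.items()}
```

Because the transformation is linear, it preserves the interval properties of the scale while making values comparable across subjects.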
An arbitrary set of initial scale values for the levels of the factors is formed to produce initial estimates of the 27 stimulus combinations. From these initial scale values, a matrix of what are called disparities is formed. Disparities are transformed data values that are monotonic with the original data and as close as possible to the initial set of workload scale values. Next, a badness-of-fit measure, STRESS, is computed to determine how closely the monotonically transformed disparity values match the estimated scale values from the additive model. STRESS is computed by finding the square root of the sum of the squared deviations between the disparity values and the estimated stimulus values. If the original rank data are in perfect agreement with an additive representation, then monotonically transformed disparities will be found that, when suitably normalized, are identical to the estimated stimulus scale values, producing a STRESS value of zero. Subjects' data are, however, generally not without some random error. Typically then, the algorithm will not find a STRESS value of zero. In these cases, the algorithm works iteratively. Following the computation of STRESS, the estimated stimulus scale values are recalculated via a least-squares estimation procedure similar to that employed in standard regression analysis.
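One way to picture the disparity and STRESS computations is the sketch below. It forms disparities with a pool-adjacent-violators monotone regression (a common choice for this step, assumed here rather than confirmed as the SWAT implementation) and then takes the square root of the summed squared deviations, following the verbal description in the text. Kruskal's STRESS is usually reported in a normalized form, so the function is called `raw_stress`; all names are hypothetical.

```python
import math


def monotone_regression(values):
    """Least-squares non-decreasing fit to `values` (pool-adjacent-violators)."""
    blocks = []  # each block is [sum_of_values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def raw_stress(rank, estimate):
    """Unnormalized STRESS between disparities and estimated scale values."""
    order = sorted(rank, key=rank.get)      # stimuli from lowest to highest rank
    est = [estimate[s] for s in order]      # estimates arranged in data order
    disparities = monotone_regression(est)  # monotone with the ranks, close to est
    return math.sqrt(sum((d - e) ** 2 for d, e in zip(disparities, est)))
```

When the estimates are already in the same order as the ranks, the disparities equal the estimates and the value is zero, matching the perfect-fit case described above.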
The partial derivative of STRESS with respect to each scale value is found and a numerical analysis procedure known as the method of gradients is used to find a new set of best-fitting (in the least squares sense) stimulus scale values. New disparities are formed, a new STRESS value is computed, and the iterative process is continued until no improvement in STRESS can be found. Following the last iteration, the estimated scale values for the 27 stimulus combinations are found and are normalized as previously described, so that combination (1, 1, 1) has a scale value of zero and (3, 3, 3) has a scale value of 100.

Scaling employing a modification of Johnson's (1973) nonmetric monotone regression procedure is the final step in SWAT. It may at first seem redundant to perform two scaling procedures in SWAT, since both will yield identical results for perfectly additive data. A problem with the scaling
algorithm described above and STRESS-based algorithms in general, however, is that they are prone, in a number of common nonadditive cases, to produce scaling solutions that force ties in the scale values for the levels of some of the factors. This produces a degenerate solution that has the appearance of a perfect fit to an additive model. Nickerson and McClelland (1984) provide examples of seven such common situations. It is clear from their and our previous work (cf. Nygren, 1985) that unless one examines the data carefully with respect to the conjoint axioms found in SWAT, a zero level of STRESS obtained from MONANOVA-based scaling alone might lead the user to an errant conclusion of additivity among the factors, as well as to poor estimates of the stimulus scale values. The second scaling procedure in SWAT, then, is used to provide another scaling of the data, this time based on a badness-of-fit measure other than STRESS. This measure, THETA, differs from STRESS in that it is based on a pairwise method in which the differences in scale values for all possible pairs of stimuli (351 pairs for the 27 stimuli in SWAT) are compared with the differences in the original ranks. As in the previous scaling algorithm, this routine starts by finding a set of estimates of the stimulus scale values.
For efficiency, it uses the final estimates found by the previous STRESS-based procedure. If the data do, in fact, conform to an additive model, the procedure stops after one iteration, since the scale values have already been determined. If the data are not additive, then the badness-of-fit measure THETA is computed by summing the differences in scale values for all pairs of stimuli for which the original ranks are not in the same order as the estimated scale values. This sum is then normalized by dividing by the sum of all differences in scale values and taking the square root. The numerator of this term, and thus THETA, will be zero if all pairs of ranks and pairs of estimated scale values are in the same order. As in the case of STRESS, the partial derivative of THETA (actually THETA-squared) is taken with respect to each scale value in order to find new estimates that will minimize the differences in scale values for which there are incorrect pairwise orderings. The iterative procedure is then continued until no significant improvement in the estimated scale values that will minimize THETA can be found.
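A minimal sketch of the THETA computation, following the verbal description above, may help fix ideas. The function name and argument layout are hypothetical, and absolute differences are assumed because the text speaks only of "differences in scale values"; an implementation based on Johnson's (1973) procedure may weight the pairs differently (for example, with squared differences).

```python
from itertools import combinations
from math import sqrt


def theta(ranks, scale_values):
    """Pairwise badness-of-fit between original ranks and estimated scale values.

    ranks[i] and scale_values[i] refer to the same stimulus.
    """
    misordered = 0.0  # sum of differences over incorrectly ordered pairs
    total = 0.0       # sum of differences over all pairs
    for i, j in combinations(range(len(ranks)), 2):
        diff = scale_values[i] - scale_values[j]
        total += abs(diff)
        # A pair is misordered when ranks and scale values disagree in sign.
        if (ranks[i] - ranks[j]) * diff < 0:
            misordered += abs(diff)
    return sqrt(misordered / total) if total > 0 else 0.0
```

For three stimuli ranked 1, 2, 3 with scale values 0, 10, 5, only the last pair is misordered, so the ratio is 5/20 and THETA is 0.5. Note that a pair with tied scale values contributes nothing to the numerator.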
It is important to note that this THETA measure is strongly related to Kendall's Tau coefficient, although they are not a simple function of one another. In SWAT for example, for a set of ranks that fit an additive model, Tau will be 1.0, indicating that all 351 pairs of estimated scale values are in the same order as the 351 pairs of ranks. For nonadditive data, it is still possible for THETA to be 0.0 (by producing tied scale values) but for Tau to not be equal to 1.0. The major advantage of THETA over STRESS is found, then, if error (i.e., nonadditivity) occurs in a subject's data; the THETA-based scaling is much more likely to detect error than is the STRESS-based scaling. Generally, however, the two procedures will produce very similar results. In applications of SWAT where they differ significantly in their estimates of scale values, the researcher has a much better chance of diagnosing why the error or nonadditivity occurred than if he or she had used only one of the two scaling methods. Finally, it is, of course, obvious that both scaling procedures will always yield a set of additive scale values that are only as good as the effort that went into the creation of the descriptions of the levels of the factors themselves. In the next section, we discuss this scale development procedure.
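The point about tied scale values can be illustrated with a small Tau computation. The sketch below assumes the simplest variant of the coefficient (often called tau-a), in which a tied pair counts as neither concordant nor discordant; the function name is hypothetical and this is not necessarily the variant used in the SWAT software.

```python
from itertools import combinations


def kendall_tau_a(ranks, scale_values):
    """Kendall's Tau (tau-a variant) between ranks and estimated scale values."""
    pairs = list(combinations(range(len(ranks)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        sign = (ranks[i] - ranks[j]) * (scale_values[i] - scale_values[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie, which counts toward neither total
    return (concordant - discordant) / len(pairs)


# The 27 SWAT stimuli give 27 * 26 / 2 = 351 pairs.
N_SWAT_PAIRS = len(list(combinations(range(27), 2)))
```

With ranks 1, 2, 3 and scale values 0, 5, 5, no pair is in the wrong order, so a measure that sums only misordered differences is zero, yet Tau is 2/3 because the tied pair is not counted as concordant.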
SCALE DEVELOPMENT

Any time an investigator wants to use a rating scale, he or she must (a) develop a set of descriptors to represent the different points on the scale, and (b) train raters as to what the meanings of the descriptors are and how they are to be used to rate some event. In practice, this process is often not given the level of attention that theory requires. In classical unidimensional scaling such as Thurstonian scaling or Likert scaling, the investigator should, with the aid of subject matter experts, write descriptors and then subject these descriptors to an evaluation by a sample of subjects from the population that will ultimately use the scale. The evaluation is used to select which of the candidate scale items have the greatest discriminability. This process continues until a set of descriptors is arrived at for each of the points on the scale. After the scale has been developed, each new sample of subjects required to use the scale must be carefully trained so that they understand the meaning of the descriptors intended by the original group of judges. In SWAT, scale development is an application of the conjoint measurement procedure outlined above.
The process of determining a composition rule means that each time a workload investigation is conducted the subjects define the relative weights and the composition rule that fit their perceptions of workload. This is different from other scaling methods in that (a) the same subjects define the scale and apply it to ratings of events, and (b) the scale development process is used to train subjects to understand the meaning of the descriptors. As described earlier, for the purpose of SWAT, workload is defined as being composed of three levels of each of the three dimensions: Time Load, Mental Effort Load, and Psychological Stress Load. Descriptions of these dimensions are presented in Table 2 and their combination into the three dimensional workload construct is represented in Figure 1. Each of the cells of this matrix in Figure 1 is represented by a combination of one of the descriptors for each of the dimensions, yielding a total of 27 combinations. These descriptors are typed on a set of index cards so that each cell is represented by a separate card. This deck of cards is the medium employed in obtaining the rater's judgment of the relative workload each combination represents to him or her.
Subjects are required to go through a card sort procedure where they place the cards representing the 27 cells of the three-dimensional matrix in rank order beginning with the combination of descriptors that represents the lowest workload situation (1, 1, 1) and ending with the combination that represents the highest workload situation (3, 3, 3), with an ordering of the 25 other stimuli in between. The subjects are encouraged to think of situations from their own experiences that would have been appropriately described by a particular combination. They then compare that situation with a situation recalled for another combination and make a judgment as to which of the situations represents the higher perceived workload. Subjects then place these two cards in the proper order and select another card and repeat the same decision process. The subjects are instructed to try to imagine a situation for each card but if they cannot think of an event that could have been described by a certain combination, they are requested to place the card in their ordering at the point where it would fall if an event did exist that would be properly
TABLE 2. SWAT DIMENSIONS

I. Time Load

1. Often have spare time. Interruptions or overlap among activities occur infrequently or not at all.
2. Occasionally have spare time. Interruptions or overlap among activities occur frequently.
3. Almost never have spare time. Interruptions or overlap among activities are very frequent, or occur all the time.

II. Mental Effort Load

1. Very little conscious mental effort or concentration required. Activity is almost automatic, requiring little or no attention.
2. Moderate conscious mental effort or concentration required. Complexity of activity is moderately high due to uncertainty, unpredictability, or unfamiliarity. Considerable attention required.
3. Extensive mental effort and concentration are necessary. Very complex activity requiring total attention.

III. Psychological Stress Load

1. Little confusion, risk, frustration, or anxiety exists and can be easily accommodated.
2. Moderate stress due to confusion, frustration, or anxiety noticeably adds to workload. Significant compensation is required to maintain adequate performance.
3. High to very intense stress due to confusion, frustration, or anxiety. High to extreme determination and self-control required.

represented by that set of descriptors. The order of the combinations that results from this card sort procedure is then used as the input data for the conjoint measurement analysis. Given the rank order that the subjects have derived for the combinations of the levels of the three dimensional construct, the algorithm is used to search for a set of additive scale values that describes the order of the levels of the three composite dimensions. This analysis can be performed on each subject's ordering or on a consensus ordering obtained by averaging a group of subjects' orderings. The advantages of using an average order for input will be discussed later. The adequacy of the obtained sorts has been evaluated, and continues to be evaluated, by analyzing the number of axiom violations that are present in a set of data. Technically, an axiomatic analysis is deterministic so that one axiom violation is sufficient to invalidate the model being
Figure 1. Three-Dimensional Workload Construct
tested. This criterion is very impractical because people do not give error free data very often. Because of this difficulty, work on an error theory for conjoint measurement is in progress (Nygren, 1985, 1986). In the meantime, "rules of thumb" have been established based on extensive experience with sets of card sort data (Reid, Potter, & Bressler, 1987). Basically, the rules allow for up to approximately a 5 percent to 10 percent violation rate for the independence axioms as long as these inconsistencies involve adjacent or near-adjacent pairs.

Analyzing Card Sort Data

The first step in analyzing card sort data is to determine the level of agreement among a particular group of subjects. A Kendall's Coefficient of Concordance (W) is used for this purpose. If the W-value is sufficiently large (maximum value for perfect intersubject agreement is 1.0), the subjects are placed into a single group analysis where all of their data are averaged. A "rule of thumb" that has been established is that if the W is .75 or higher, there is sufficient agreement to make a single scale that will represent all of the subjects without incurring a large chance of misrepresenting any single subject. An exception to this practice would be for a situation where the focus of the investigation pertains to an individual differences variable.
This type of analysis would probably be best accomplished using scales for each individual subject. In the event that the overall Kendall's Coefficient of Concordance is lower than the proposed cutoff, a procedure called SWAT prototyping (Reid, Eggemeier, & Nygren, 1982) has been developed that incorporates the advantages of an average scale while adjusting to the individuals' weights for the composite dimension. During the card sort procedure, the comparisons that each subject must make between each of the cells of this matrix, in many cases, are very fine discriminations. The degree of variability found for comparisons of particular pairs of cells or inconsistencies found for similar stimulus pairs can be viewed as random or unsystematic error. If the subjects agree as to the basic structure of the construct, then a process of averaging their individual orderings will result in an order that tends to cancel out these random errors. While it is true that this process will also mask some amount of variation that is a result of an individual's unique contribution, this threat is minimized by determining the extent of agreement among individuals and dividing the subjects into homogeneous subgroups, if appropriate.
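The agreement check and the averaging step can be sketched as follows. The code uses the standard formula for Kendall's Coefficient of Concordance without a tie correction, which suits complete card sorts because each subject assigns every rank exactly once. The function names and the data layout (one list of ranks per subject, with position i always referring to the same card) are assumptions made for this illustration.

```python
def kendalls_w(rankings):
    """Kendall's Coefficient of Concordance for m subjects ranking n cards.

    rankings[k][i] is the rank subject k gave to card i (1 = lowest workload).
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(subject[i] for subject in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def mean_ranks(rankings):
    """Average rank per card, the input for a single group scaling solution."""
    m = len(rankings)
    return [sum(subject[i] for subject in rankings) / m
            for i in range(len(rankings[0]))]


def use_group_scale(rankings, cutoff=0.75):
    """Apply the .75 rule of thumb quoted in the text."""
    return kendalls_w(rankings) >= cutoff
```

For three subjects ranking three cards as (1, 2, 3), (1, 2, 3), and (1, 3, 2), the rank sums are 3, 7, and 8, giving W = 168/216, or about .78, which would just pass the .75 cutoff.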
If some subjects' orderings are based on a model of workload that places the greatest weight on time related factors and places a moderate weight on factors related to mental effort, and very little weight on factors that relate to psychological stress, while another subject's ordering is based on relative weightings that are in the reverse order, then the level of agreement in the two orderings will be low. For the purpose of SWAT prototyping, six hypothetical orderings have been developed which are based on a strict compliance to a rule defining the relative importance for each of the three dimensions. The first prototype ordering is based on a relative weighting scheme which places the greatest emphasis on time, the second on effort, and third on psychological stress. If a subject ordered a card deck according to this TES weighting scheme, then the ordering of the combinations would be like the order represented in Table 3 where, as can be seen, the levels of stress change fastest, while the levels of mental effort increase more slowly, and the level of time increases at the slowest rate. In the same manner, an ordering can be established for the other relative weightings TSE, ETS, EST, STE, and SET.
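The six prototype orderings follow mechanically from the rule just stated: the most heavily weighted dimension changes most slowly and the least heavily weighted one changes fastest. A minimal sketch, with hypothetical names and each cell written as a (time, effort, stress) tuple, is:

```python
from itertools import permutations, product

DIMENSIONS = "TES"  # position of Time, Effort, Stress within a cell tuple


def prototype_order(scheme):
    """Return the 27 cells in prototype order for a scheme such as "TES".

    The first letter of `scheme` names the most heavily weighted dimension.
    """
    positions = [DIMENSIONS.index(letter) for letter in scheme]
    cells = product((1, 2, 3), repeat=3)
    # Sort on the most important dimension first, the least important last.
    return sorted(cells, key=lambda cell: tuple(cell[p] for p in positions))


# All six schemes: TES, TSE, ETS, EST, STE, SET.
PROTOTYPES = {"".join(p): prototype_order("".join(p))
              for p in permutations(DIMENSIONS)}
```

Under this rule the TES ordering begins (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), with stress changing fastest, whereas the SET ordering begins (1, 1, 1), (2, 1, 1), (3, 1, 1), with time changing fastest.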
Raters' rank orderings are correlated with each of these prototype orderings using a Spearman's Rho to determine the relative importance each subject places on each of the three dimensions. Usually, prototyping
TABLE 3. TES WEIGHTING SCHEME

Rank     Card      Descriptor Combination      Scale
Order    Label     Time    Effort    Stress    Values
  1      N          1        1         1          0.0
  2      B          1        1         2         24.4
  3      W          1        1         3         51.4
  4      F          1        2         1          7.6
  5      J          1        2         2         32.0
  6      C          1        2         3         59.0
  7      X          1        3         1         27.7
  8      S          1        3         2         52.1
  9      M          1        3         3         79.1
 10      U          2        1         1          6.5
 11      G          2        1         2         30.9
 12      Z          2        1         3         57.9
 13      V          2        2         1         14.1
 14      Q          2        2         2         38.5
 15      ZZ         2        2         3         65.5
 16      K          2        3         1         34.2
 17      E          2        3         2         58.6
 18      R          2        3         3         85.6
 19      H          3        1         1         20.9
 20      P          3        1         2         45.2
 21      D          3        1         3         72.3
 22      Y          3        2         1         28.5
 23      A          3        2         2         52.9
 24      O          3        2         3         79.9
 25      L          3        3         1         48.6
 26      T          3        3         2         73.0
 27      I          3        3         3        100.0

allows a group of subjects to be divided into two or three homogeneous subgroups. Once it has been determined how many groups are needed to reflect the weightings for a particular group of subjects, then the conjoint analysis is performed. A separate analysis is required for each subgroup found in the prototyping analysis. Since the procedure is the same for each group, for the sake of this illustration, we will assume that the subjects have sufficiently high agreement to preclude prototyping. The output from the conjoint analysis then provides a scale, ranging from zero to 100, that lists a scale value for each of the cells of the three dimensional matrix in Figure 1 representing combinations of levels of Time Load, Effort Load, and Stress Load. This scale can then be used to assign scale values to new situations via the portion of the SWAT procedure that is called event scoring.

Stability of Subjects' Judgments

The first two major questions that had to be answered in the development of SWAT were whether (a) subjects could perform the 27 cell card sort, and (b) if they could, whether the card sorts change from day to day as a function of the individuals' current experiences or whether they would be stable within individuals across time. Although the ordering procedure can be rather unexciting and difficult for some individuals, it has been shown to be an effective and reliable way to obtain the needed judgments.
The first assessment of the stability of subjects' workload orderings was performed with 30 Air Force pilots who were participating in a study of air-to-air combat in a high fidelity simulator. This study was run using different simulator configurations over several months. Subjects performed the card sort prior to the beginning of the study and were rechecked four months later at the beginning of phase two of the study. The recheck was conducted using nine paired comparisons. The pairs were formed by placing two of the card combinations on a single sheet of paper. The pairs were selected so that they represented the full range of card combinations. Based on the original card sort, predictions of the selections were made for each subject. These predictions were found to be correct 80 percent of the time. Comparable results were found in two other studies. One of these investigations was done using paid students from a local university. In this study, six subjects performed the card sort on two different occasions separated by a year. The orderings from these two administrations were correlated using a Spearman's Rho coefficient, and five of the six orderings were found to correlate .90 or greater, with the other subject's correlation being .53. A third check was performed on 22 military subjects participating in an investigation of control room designs.
This recheck was performed after two months and again resulted in very high correlations. Twenty-one out of 22 subjects had orderings correlating .90 or greater. The other subject had additive orderings each time, but they indicated a shift in the relative weightings of the three dimensions. The correlations for the subjects in these studies show, in general, remarkable consistency for most subjects over extended periods of time. The occasional inconsistent subject creates some cause for concern since an explanation for the inconsistency has not been established. However, it remains very encouraging for the SWAT procedure that when these low
correlations are sometimes found for subjects, it is almost always due to a shift in the relative importance of the factors in determining workload and not due to excessive error or nonadditivity in the rank data. The additive model holds well for subjects, even if the relative weights of the factors may shift slightly over time.
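Both the test-retest checks reported here and the comparison of a subject's sort with the prototype orderings rest on Spearman's Rho between two orderings of the same 27 cards. A minimal sketch, assuming complete orderings with no ties and using hypothetical names, is:

```python
def spearman_rho(order_a, order_b):
    """Spearman's Rho between two complete orderings of the same cards.

    Each argument lists the cards from lowest to highest judged workload.
    """
    n = len(order_a)
    rank_b = {card: rank for rank, card in enumerate(order_b, start=1)}
    d_squared = sum((rank - rank_b[card]) ** 2
                    for rank, card in enumerate(order_a, start=1))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def best_matching_prototype(subject_order, prototypes):
    """Name of the prototype ordering that correlates highest with a subject.

    `prototypes` maps a scheme name (e.g. "TES") to its ordering of the cards.
    """
    return max(prototypes,
               key=lambda name: spearman_rho(subject_order, prototypes[name]))
```

For a retest, the two arguments would be the same subject's sorts from the two administrations; a value of .90 or greater corresponds to the level of agreement reported for most subjects above.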
EVENT SCORING

Despite the importance of the scale development phase, it is the event scoring phase that people generally are thinking about when they refer to a scaling procedure. Event scoring simply refers to the experiment or condition that an investigator wants to evaluate regarding mental workload. In the experiment, operators are asked to provide judgments that can be converted into scale values representing the degree of mental workload associated with task performance. For example, if SWAT were used to evaluate the mental workload associated with using two alternative designs of a power plant control panel, subjects might be required to perform a simulated scenario using each of the panel configurations. The scenario would be segmented into component tasks and as the subjects performed the tasks they would be asked to apply the previously learned descriptors to evaluate each task with regard to the level of Time Load, Mental Effort Load, and Psychological Stress Load (e.g., 2, 3, 2). During data analysis the investigator would convert these values into an overall workload score (e.g., 64.4) by finding the scale value associated with the combination during the scale development phase. These scores would then be used as the dependent variable in an analysis of the difference in workload associated with task performance as a function of the type of display configuration used.
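Event scoring is, computationally, a table lookup. The sketch below uses the illustrative scale values printed in Table 3 as if they were the scale obtained during scale development; in practice each group of subjects yields its own scale, so the numbers differ from study to study (with the Table 3 values a rating of 2, 3, 2 maps to 58.6 rather than the 64.4 used as an example in the text). The names are hypothetical.

```python
from itertools import product

# Scale values from Table 3, listed from cell (1, 1, 1) to cell (3, 3, 3)
# with stress changing fastest, then effort, then time.
TABLE_3_VALUES = [
    0.0, 24.4, 51.4, 7.6, 32.0, 59.0, 27.7, 52.1, 79.1,
    6.5, 30.9, 57.9, 14.1, 38.5, 65.5, 34.2, 58.6, 85.6,
    20.9, 45.2, 72.3, 28.5, 52.9, 79.9, 48.6, 73.0, 100.0,
]
SCALE = dict(zip(product((1, 2, 3), repeat=3), TABLE_3_VALUES))


def score_event(time_load, effort_load, stress_load, scale=SCALE):
    """Convert one (time, effort, stress) rating into a 0-100 workload score."""
    return scale[(time_load, effort_load, stress_load)]


def mean_workload(ratings, scale=SCALE):
    """Average workload score over the ratings collected for one condition."""
    scores = [score_event(*rating, scale=scale) for rating in ratings]
    return sum(scores) / len(scores)
```

The per-task scores, or condition means such as those returned by `mean_workload`, would then serve as the dependent variable in the comparison of the two panel designs.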
When we decided to develop a new subjective workload measure, the primary motivation was that, although a number of reasonable measures had been proposed, none of them had been extensively tested and evaluated. Therefore, we have attempted to collect a data base that will provide investigators with information relative to SWAT's utility as a measure of operator workload. This data base has been collected both in laboratory and field situations. The laboratory research has been centered on the manipulation of task variables and temporal variables to see if SWAT is sensitive to changes in demand levels. In addition, SWAT has been used in several operational tests. The operational tests have been carried out to more directly evaluate the utility of SWAT in the environment that is most relevant to its intended application. Although these tests were usually designed with other objectives in mind, in many cases the data can be looked at in terms of an evaluation of the degree of success associated with the SWAT application. Laboratory investigations of SWAT have centered around the use of an assessment battery named the Criterion Task Set (CTS). The CTS (Shingledecker, Crabtree, & Acton, 1982; Shingledecker, 1984; Eggemeier, 1987) is a battery of tasks that has been developed to provide a standardized set of tests and procedures to be used in the evaluation of workload measures.
The battery, as currently configured, is composed of nine tasks that have been selected to be sensitive to different components of the human information processing system and to be clearly representative of operational Air Force tasks. The model that guided selection of the nine tasks is based primarily on a multiple resources model of information processing (Navon & Gopher, 1979; Wickens, 1980). The tasks were selected so
that each one primarily places a demand on one of the proposed processing resources. Table 4 lists the tasks that are currently in the battery and the associated processing function.

TABLE 4. CTS TASKS AND ASSOCIATED PROCESSING FUNCTIONS*

Task                           Processing Function
Visual Display Monitoring      Visual Perceptual Input
Continuous Recognition         Working Memory Encoding/Storage
Memory Search                  Working Memory Storage/Retrieval
Linguistic Processing          Symbolic Information Manipulation
Mathematical Processing        Symbolic Information Manipulation
Spatial Processing             Spatial Information Manipulation
Grammatical Reasoning          Reasoning
Unstable Tracking              Manual Response Speed/Accuracy
Interval Production            Manual Response Timing

*From Eggemeier, 1987.

Parametric research has been conducted (Shingledecker, 1984; Eggemeier & Amell, 1986; Amell, Eggemeier, & Acton, 1987) to establish three distinctively different levels for each task. Additionally, this research was used to establish the subject training requirements and other relevant aspects of the standardized test-administration procedures. An advantage of using this battery as the central element of SWAT evaluation is that the tasks were selected systematically so that, when the entire battery is used, different demand characteristics are obtained. A second major advantage is that, because the battery has a standard administration format, other investigators can easily replicate the research or can evaluate other mental workload measures under comparable conditions. Several independent studies have used the CTS to investigate SWAT's metric properties, and data from some of them will be presented here. One study in particular used 104 subjects to perform all eight of the CTS tasks that are composed of multiple levels; the Interval Production Task is not composed of levels (Schlegel & Gilliland, 1987). The procedure that was followed had the subjects report for a one-hour session for four consecutive days. After a weekend, they returned for one final practice day followed by four consecutive days of testing.
All subjects participated in one two-hour session each testing day. The first and third testing days were replications that followed the same testing procedure used during the four training days. The day between the two data runs was a stressor day on which the subjects performed the CTS tasks under one of several stressor conditions. Trials on each day were three minutes long, with the levels of each task being presented in ascending order. A fixed sequence of the nine CTS tasks was used for all data runs and was as follows: (1) Memory Search, (2) Interval Production, (3) Continuous Recall, (4) Linguistic Processing, (5) Probability Monitoring (renamed Display Monitoring), (6) Grammatical Reasoning, (7) Mathematical Processing, (8) Unstable Tracking, and (9) Spatial Processing.

Figure 2 presents plots of SWAT ratings across the two replications of the three difficulty levels. Analysis of variance on the data indicated that the main effect for levels was statistically significant (p < .05) for all eight tasks (Schlegel & Gilliland, 1987). The replication on test day three provided a check for stability (often called reliability) of the SWAT measure. The basically flat shape of all of the curves in Figure 2 demonstrates that generally SWAT is a very stable measure. An analysis of variance indicated that seven of the tasks were not statistically different on day three from day one (p > .05). The one exception was spatial processing, which had significantly lower SWAT ratings on day three, possibly reflecting that learning was still taking place.

Figure 2. SWAT Rating Plots (panels: Unstable Tracking, Visual Display Monitoring, Spatial Processing, Linguistic Processing, Mathematical Processing, Grammatical Reasoning, Memory Search, Continuous Recognition)
Another study that was based on the use of a CTS task consisted of a dual-task experiment in which the unstable tracking task was used to represent a flying task, and a radio communications task was used as a secondary task (Reid, Shingledecker, & Eggemeier, 1981). Results indicated that both the secondary task and the SWAT ratings differentiated among the three difficulty levels of the unstable tracking task. This concurrent result was interpreted as providing supportive evidence that the measures were reflecting actual differences in subjects' mental workload.

Some studies have been undertaken to investigate factors related to procedural aspects of SWAT administration. One such series of experiments was directed toward investigation of the delay between task performance and the time when subjects provide actual workload ratings. Frequently, in operational situations, the portion of a task that is of greatest interest to an investigator occurs at precisely the same time that an operator cannot be interrupted to collect a workload rating. The investigator, then, must wait for a break in the operator's activities in order to obtain a rating for that event.
If the delay between task performance and assignment of the workload ratings is very long, it would seem reasonable to expect an effect of the delay on the assigned ratings. Notestine (1984) investigated a delay in reporting of workload ratings of up to 30 minutes. In this experiment, subjects performed a display monitoring task and supplied SWAT ratings immediately after task completion or after either a 15- or 30-minute delay. Analysis of the SWAT ratings data revealed that the ratings for the three levels of the display monitoring task were significantly different from one another (p < .05), but that they were not statistically different as a function of the response delay. Eggemeier, Crabtree, and LaPointe (1983) used the same experimental design in an experiment in which subjects performed a short-term memory task. Because the Notestine (1984) experiment had used a perceptual task, it was thought that an effect might be observed if the task demand tapped the same processing resource that could be expected to affect retention of the rating. The short-term memory task required subjects to keep track of the number of occurrences of four letters in a sequence of letters.
Task difficulty was manipulated by having the letters appear at 1-second, 2-second, or 3-second intervals. Again, the data indicated a significant main effect for task levels, but the delay conditions did not differ significantly from the immediate rating condition.

In both the Eggemeier et al. (1983) and Notestine (1984) studies, the intervening task was dissimilar from the task to be rated. Therefore, Eggemeier, Melville, and Crabtree (1984) conducted a third experiment in which the primary variable was the type of intervening task. One condition was a no-intervening-task condition, a second was a difficult intervening task condition, a third was an easy intervening task condition, and a fourth was a mixed-difficulty intervening task condition. In this experiment only one delay condition, 14 minutes, was used, and the task was a variation of the short-term memory task used in the previous study. In this variation, subjects kept track of five letters, with the letters in the task sequence presented at a single rate for 500 milliseconds each. Again, SWAT was found to discriminate between the levels of the memory task, but no significant differences were found as a function of the type of intervening task.
Taken as a whole, then, this series of experiments provides support for the sensitivity of SWAT to manipulations of task-induced workload. The data also support the conclusion that in operational situations, where task constraints require that the investigator delay obtaining workload ratings, the delay may not have a large impact on the subjects' ratings. On the other hand, because of potential loss from short-term memory, it would be prudent to obtain ratings as soon as possible after completion of a relevant event.

One attribute that is generally thought to be desirable for a workload measure is its diagnosticity. Diagnosticity is the ability of a measure to reflect the cause or causes that underlie an increase in mental workload. Some have argued that subjective measures are usually poor in their ability to provide diagnostic information (cf. the discussion in Gopher & Donchin, 1986). We argue that the multidimensional characteristic of SWAT provides an opportunity to improve its diagnostic capability.
Because ratings are obtained on the individual dimensions and the scaling algorithm provides scale values for the component scales as well as their overall additive effect, separate individual analyses can be performed to ascertain which dimension is changing the most as task demand increases. Potter and Acton (1985) performed a subscale analysis in a study using the continuous recall task from the CTS. They showed in this experiment that, although all three of the component scales were sensitive to task demand, this effect occurred at different levels of demand. The Mental Effort Load scale increased substantially at the lowest levels of task manipulation and then remained fairly constant through the middle and higher demand manipulations. The Time Load and Psychological Stress Load scales, on the other hand, increased very slowly at the low manipulations and changed most in response to the moderate to high manipulations of task demand. This differential sensitivity was interpreted as being supportive of the appropriateness of the individual dimensions selected for SWAT.
A follow-up effort (Potter, 1986) was conducted to try to independently manipulate the time load dimension through a rate of presentation manipulation and the mental effort load dimension through task demand manipulation. Two tasks were used: a memory search task, and the continuous recall task from the CTS. Task difficulty for the memory search task was manipulated by varying the number of items (letters) that a subject held in memory. Task difficulty in the continuous recognition task was manipulated by varying the number of digits being held in memory as well as how many back in a continuous stream of numbers the subject had to remember. Figure 3 shows the result for the memory search task. As can be seen from this figure, the time load dimension does seem to very clearly reflect the manipulation of rate. Although the effort dimension does clearly reflect the task difficulty manipulation, there is also, apparently, some effect of time present in the scores. Also, the ratings on the psychological stress dimension increase even though no attempted manipulation of this dimension was undertaken.
Since it is impossible to know whether or not psychological stress was being affected in some way or whether increased difficulty actually requires additional processing time, it can only be observed that the pattern present in these curves supports, to some degree, the differential sensitivity of the dimensions. Adding weight to this interpretation is the fact that the type of scale (time, effort, stress) interacted with the experimental manipulations of rate of presentation, difficulty level, and rate by difficulty.
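The subscale comparison described above, asking which dimension changes most as task demand increases, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the input layout (one list of rescaled scores per dimension, ordered from lowest to highest demand) are hypothetical and are not taken from the cited studies, and the sketch does not replace the analysis of variance that those studies report.

```python
def step_changes(scores):
    """Change in one subscale score between successive demand levels."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def most_sensitive_dimension(subscale_scores):
    """For each step up in task demand, name the dimension that rose most.

    subscale_scores maps a dimension name (for example "time", "effort",
    "stress") to its scores ordered from lowest to highest demand.
    The layout is an assumption made for this sketch.
    """
    changes = {dim: step_changes(s) for dim, s in subscale_scores.items()}
    n_steps = len(next(iter(changes.values())))
    return [max(changes, key=lambda dim: changes[dim][i]) for i in range(n_steps)]
```

Given scores in which effort rises early and time rises late, the function would return the effort dimension for the first step and the time dimension for the second, which is the kind of differential pattern the text describes.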
Figure 3. Results for Memory Search Task (panels: Time Load, Psychological Stress Load, Mental Effort Load; horizontal axis: presentation rate, slow, medium, fast; separate curves for task difficulty, memory set size)
SWAT has also been used in two experiments to evaluate the effects of stressors on workload. The first experiment (Albery, Ward, & Gill, 1985) was designed to evaluate the possibility that a high-G environment like that found in a modern fighter aircraft could contribute to operator mental workload. Impairment could occur as a result of reduced blood flow or from the conscious efforts that must be expended in counteracting the blood flow effects. To investigate this phenomenon, subjects were required to solve a two-dimensional maze problem on a CRT while being exposed to G forces of 1.5, 3.0, 5.0, and 6.0 Gs in a human centrifuge. The results of this study indicated that scores associated with maze performance were not affected by the G levels. However, SWAT ratings for moderate and high-G levels (5.0 and 6.0 Gs) were significantly higher than SWAT ratings for the lower G levels (1.5 and 3.0 Gs).

Another study also investigated a potential environmental stressor (Albery, Repperger, Reid, Goodyear, Ramirez, & Roe, 1987). In this study low (90 dB) to moderate (100 dB) noise was used to provide a stressor while the subject performed a single-axis compensatory tracking task that represented a flying task. Task difficulty was manipulated by presenting five different forcing functions for each of three tracking plant dynamics.
Both performance measures and SWAT ratings effectively discriminated between the three different plant dynamics. The noise stressor did not have a measurable effect on the subjects' performance, but the levels of noise did produce a statistically significant effect on the SWAT ratings.

Simulation Studies

As a rule, simulation studies do not have the degree of experimental control that characterizes laboratory experiments. Because the tasks are so much more complex, the same degree of precision usually cannot be achieved. On the other hand, the degree of realism possible in a simulation study provides an environment that can be used to verify generalizations of laboratory results to more "real world" situations. SWAT has been used in two kinds of simulations. The first group of studies was designed and executed for the purpose of evaluating mental workload measures, and the second group consisted of field-type evaluations of operational systems. In the second group of studies, SWAT was used as a dependent variable for considering mental workload in the system evaluation.

The first group of simulations was conducted primarily to investigate a number of physiological workload measures. SWAT was included in the investigations because its ease of application and lack of instrumentation made it a simple and inexpensive addition to the study. The first experiment in this set used B-52 piloting tasks for creating different workload conditions in a B-52 simulator (Thiessen, Lay, & Stern, 1987). The scenario was written to have three levels of workload represented. The levels were defined as follows:

Low       Straight and Level Flight

Medium    Normal Descent and ILS Approach

High      Descent to ILS Approach with Successive Engine Failures, Runaway Trim, and Crosswinds
The scenario was presented in two simulation runs of approximately 15 minutes each. Line pilots from a B-52 squadron served as subjects and flew the scenarios in a Curtis Wright DEHMEL flight simulator at Carswell Air Force Base, Texas. The top left panel of Figure 4 shows that the SWAT ratings clearly differentiated between the three workload conditions. The SWAT scores were a monotonic function of the a priori defined levels of task demand.

The second study evaluated the workload of tasks performed by a B-52 tail gunner (Thiessen et al., 1987). In this study, the simulation scenario was written to create three workload levels for this predominantly perceptual-motor and communication task. The levels of workload defined by the scenario were:

Low       Hostile target encounters at high altitude, enemy territory, automatic target acquisition;

Medium    Hostile target encounters low-level, manual target acquisition;

High      Hostile target encounters at low-level with radar system malfunctions.
The scenarios were implemented on a gunner station trainer consisting of a radar scope, indicators, switches, and an instructor station. The 13 subjects were drawn from operational crews, and each subject "flew" a one-hour mission with approximately 30 minutes of the time dedicated to the actual target presentation segments defined by the workload levels. The bottom panel of Figure 4 illustrates that SWAT was sensitive to this workload manipulation. SWAT ratings for each of the three levels were significantly different from one another.

Figure 4. SWAT Results (SWAT ratings, 0 to 100, by mission type: low, medium, high)

A third simulation study reported by Thiessen et al. (1987) investigated a fighter air defense mission. In this study, 13 subjects flew an F-16 simulator with a 36" x 48" wide angle field of view visual simulation. The cockpit was a fixed base F-160 with major controls and displays functional. The simulation included four defensive counter air scenarios designed to provide workload ranging from low to high, defined as follows:

Low            An F-16 chases three enemy aircraft making an "S" weave escape.

Medium Low     Five enemy aircraft approach the F-16 head-on.

Medium High    One enemy fighter approaches the F-16 head-on. An "S" weave pattern; two enemy fighters approach head-on; four enemy bombers approach the F-16 head-on behind the fighters.

High           Seven enemy aircraft approach the F-16; two of the aircraft split in opposite directions to catch the F-16 in a pincer maneuver.
Posthoc tests following analysis of variance on the SWAT ratings showed a significant effect only for the high workload condition. The means (on the 0 to 100 SWAT scale) for the four levels were 30, 41, 42, and 72. Although the SWAT scores tend to go up as a monotonic function of the workload manipulation, the posthoc tests revealed that the magnitude of the differences between levels was too small to be statistically reliable. This finding was substantiated by five physiological measures and a performance measure. The lack of significant differences in all of these measures must be interpreted as indicating that the manipulation was not as strong as the investigators had intended. However, even in this situation, SWAT proved to be as sensitive as any of the dependent variables used.

As a set, these simulation studies provide substantial support for the sensitivity of SWAT. This was an important set of experiments since it involved the difficult amalgamation of laboratory control with the realism of an operational task. The similarity of these data to the results obtained in the psychology laboratory provides evidence to support the assertion that SWAT is sensitive to variations in workload across a wide variety of tasks and operational conditions. Another important function of these simulation studies is to bridge the gap between the research findings and operational applications.
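The pattern in the four F-16 condition means reported above (30, 41, 42, and 72) can be checked with simple arithmetic. The short Python sketch below is an illustration only: the variable and function names are hypothetical, and it looks at the size of each step between adjacent conditions rather than reproducing the posthoc tests, which require the raw ratings.

```python
# Mean SWAT ratings (0 to 100 scale) for the four F-16 scenarios, as reported in the text.
MEANS = {"low": 30, "medium low": 41, "medium high": 42, "high": 72}


def is_monotonic_nondecreasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def adjacent_steps(means):
    """Difference in mean rating between each pair of neighboring conditions."""
    labels = list(means)
    return {f"{a} -> {b}": means[b] - means[a] for a, b in zip(labels, labels[1:])}


monotonic = is_monotonic_nondecreasing(list(MEANS.values()))
steps = adjacent_steps(MEANS)
```

The means rise monotonically, but the steps are 11, 1, and 30 scale points, so most of the separation lies between the medium high and high conditions.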
As previously stated, the objective of the development of SWAT was to provide a measurement tool for use in operational environments like flight tests and operational test and evaluation (OT&E). In these applications, the objectives of the tests are not related to the evaluation of workload measures. In these instances, the SWAT measure is needed as a dependent variable to evaluate such factors as alternative system configurations, procedures, or various crew factors. Table 5 provides a sample of evaluations where SWAT has been employed. These evaluations typically deal with new or experimental systems and, therefore, even when the systems are commercial rather than defense-related, the data are considered sensitive. Because of the sensitivity of most of these applications, the results of the tests cannot be detailed here. Rather, the breadth of the types of evaluations and the number of applications are presented to demonstrate the utility of SWAT as a workload dependent variable in many types of operational evaluations.
SUMMARY AND CONCLUSIONS

Because of the complexity of the construct known as mental workload, its measurement presents a formidable challenge. In fact, it has been argued that, because of this complexity, no single measure is likely to adequately encompass all components of workload in all applied situations (Eggemeier, 1984). The alternative to a single index of workload is a battery of measures, each of which is sensitive under different conditions of task type or subject type, or is selectively sensitive to particular components that comprise the construct. Just such a battery is under development as part of the research program at the Harry G. Armstrong Aerospace Medical Research Laboratory. The objective of this research program is to develop a battery of metrics and subject them to rigorous testing in order to define (a) the conditions under which each measure is useful as a measure of mental workload, (b) the characteristics of the measures, and (c) the interrelationships among the various measures. The Subjective Workload Assessment Technique (SWAT) is a component of this battery. Metrics representing other classes of measures, including physiological measures and behavioral (performance) measures, are also under development (cf. the chapters by Wilson and O'Donnell, and by Eggemeier in this volume).
TABLE 5. SWAT APPLICATIONS STUDIES

Category            System

Simulation
  Aircraft          F-16/F-15 Air-to-Air
                    KC-135 Flight Deck Modernization
                    A-300 Approach and Landing (Schick & Hahn, 1987)
                    B-52 Long Mission (Skelly & Purvis, 1985)
                    DC-10 Approach and Landing (Biferno & Reid, 1983)
                    B-52 CG/Fuel Level Advisory System
                    Helicopter NOE (Haworth, Bivens, Shilvey, & Delgado, 1987)
                    General Aviation Training (Haskell & Reid, 1985)
  Control Center    Ground Launch Missile (Crabtree, Bateman, & Acton, 1984; Acton & Crabtree, 1985)
                    Nuclear Power Plant Training (Beare & Dorris, 1984)
                    Oil Refinery (Beville Engineering, Inc., 1986)

Operational
  Aircraft          F-16 Flight Test*
                    A-10 Flight Test*
                    Laser Guided Missile Flight Test* (Ossard, Amalberti & Poyot, 1987)
  Control Center    C-141B Air Drop/Air Land**
                    KC-10 Boom Operator** (Dodge, Wong, & Brown, 1984)
                    Command and Control Center** (Courtright & Kuperman, 1984)

*  Flight Test
** Operational Test & Evaluation (OT&E)
SWAT is a scaling procedure that is designed to allow the meaningful assignment of numbers to individuals' subjective impressions of the mental workload associated with performing various tasks. As a subjective measure we see the technique as having the following distinct advantages: (1) it is based on formal properties of conjoint measurement theory, (2) the underlying assumption of additivity of the three workload dimensions is testable for both individual and group data, (3) only ordinal (rank order) data are required, (4) the rank ordering task of the 27 workload combinations has face validity, (5) the SWAT scaling algorithm simultaneously produces interval-scaled estimates of the levels of the three workload dimensions as well as estimates of their combined effects, (6) individual differences estimates of the importance of each dimension for evaluating workload can be obtained, thus allowing subjects to be prototyped, and (7) once the scale has been obtained, various tasks or subtasks can be easily scored via a nonintrusive procedure we call event scoring.
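The structure summarized above, three dimensions with three levels each giving 27 combinations, an additive combination rule, a 0 to 100 scale, and a lookup at event-scoring time, can be sketched as follows. This is a minimal illustration under loud assumptions: the per-level values in PART_WORTHS are invented for the example, and the function names are hypothetical. In SWAT the level values are estimated from each subject's or group's rank ordering by the conjoint scaling algorithm, which this sketch does not implement.

```python
from itertools import product

DIMENSIONS = ("time", "effort", "stress")
LEVELS = (1, 2, 3)

# Hypothetical per-level values, invented for illustration only.
# In practice these come out of the conjoint scaling of rank-order data.
PART_WORTHS = {
    "time": {1: 0.0, 2: 20.0, 3: 40.0},
    "effort": {1: 0.0, 2: 15.0, 3: 35.0},
    "stress": {1: 0.0, 2: 10.0, 3: 25.0},
}


def additive_scale(part_worths):
    """Score all 27 level combinations additively and rescale to 0-100."""
    raw = {
        combo: sum(part_worths[dim][level] for dim, level in zip(DIMENSIONS, combo))
        for combo in product(LEVELS, repeat=len(DIMENSIONS))
    }
    lowest, highest = min(raw.values()), max(raw.values())
    return {combo: 100.0 * (value - lowest) / (highest - lowest)
            for combo, value in raw.items()}


def event_score(scale, time, effort, stress):
    """Look up the scale value for one (time, effort, stress) rating."""
    return scale[(time, effort, stress)]


scale = additive_scale(PART_WORTHS)
```

With a scale table like this in hand, scoring an event is a single lookup of the three level ratings a subject reports, which is why event scoring adds so little intrusion to the task.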
SWAT is intended to be a global measure of workload that is applicable in a large range of situations. That is, it is expected that SWAT should be generally sensitive to increases in workload and therefore be helpful in identifying areas of concern within a task or system design that require more intense investigation. Because SWAT ratings are relatively simple to obtain, SWAT can be used as the measure that provides continuity throughout a system design or family of studies. An early study might identify a particular phase of a system operation as being relatively high in workload. That phase might then be more thoroughly studied using one or more other workload measurement techniques such as one of the physiological measures or a secondary task. In most instances during this more focused study, SWAT could also be obtained to provide supplemental information that can confirm that the subjective impression has not been altered by the change in study conditions while getting the increased focused sensitivity and diagnosticity of the additional measures.

A limitation of subjective measures of workload is that they provide relative information. Under current conditions, we are sometimes restricted to saying only that one task has more or less workload than another.
Research is needed to clearly define the degree of influence of factors such as the number and range of task levels present in an investigation or the order effects of the various conditions in an investigation. Once this aspect of the measurement process is understood then it may be possible to establish what has been labeled a "redline" for mental workload. The term "redline" was chosen to imply that, if an operator's workload exceeds a certain value, the probability of performance breakdown is increased, rather than to imply that there is a value that will definitely result in performance breakdown. In most instances when investigators are concerned about measuring workload, the implied question is, "Is the workload too high?" In order to answer this question, it will be necessary to develop metrics that are absolute measures of a known range of a characteristic of operators. Until this goal can be achieved, it is desirable to build a data base of measurement values that are associated with increased error rates and/or performance breakdown. Considerable data relating each workload measure to performance degradation is needed to establish the location of the tolerance levels indicated by the various measures.
Finally, in order to make the complementary use of multiple measures possible, extensive research will be necessary to define the relationship between subjective and the various other measures of mental workload that are presented in this volume.

ACKNOWLEDGEMENTS

This work, including most of the development of the SWAT scaling algorithm, was supported in part by a contract through the U.S. Air Force and the second author. The authors would like to thank Ms. J. Bressler for the support provided in preparation of this manuscript. We would also like to express appreciation to several colleagues, Dr. H.A. Colle, Dr. D.P. Polzella, Mr. S.S. Potter, and Dr. M.L. Fracker, who provided critical reviews of the manuscript at various stages.
HUMAN MENTAL WORKLOAD
P.A. Hancock and N. Meshkati (Editors)
Elsevier Science Publishers B.V. (North-Holland), 1988
THE COGNITIVE PSYCHOLOGY OF SUBJECTIVE MENTAL WORKLOAD

Michael A. Vidulich
Aerospace Human Factors Research Division
NASA - Ames Research Center
Moffett Field, California
U.S.A.

The trend toward automated systems has created a need for evaluating mental workload in environments with little measurable performance. Subjective workload assessment is reviewed in terms of its suitability for such evaluations. The results reviewed suggest that subjective assessment, as currently practiced, can provide a valid assessment of the overall workload inflicted on an operator's working memory, but is relatively insensitive to demands outside that component of the human information processing system. Also, performing multiple tasks concurrently seems to render subjective workload assessments somewhat insensitive to changes in just one of the tasks.

INTRODUCTION

Today, there is a trend for the operators of human-machine systems to become supervisors or monitors rather than active controllers. This is the natural outcome of the application of sophisticated forms of automation to take over well-defined, repetitive tasks. A side-effect of this trend is that the measurable performance of the operators is severely diminished. Consequently, there is an increasing need for assessing an operator's mental workload independent of performance.

Currently, the most common method used to evaluate mental workload is subjective assessment. There are probably two main causes for the popularity of this technique. First, the face validity of the technique is high. If a task inflicts high workload on an operator, then it is expected that the operator performing the task will feel loaded and be able to report it. Second, in operational environments, workload assessments must often be collected and analyzed very quickly if they are to be utilized in the design cycle of a new system.
While performance or physiological reactions often require sophisticated equipment and analyses that take considerable time to acquire and interpret, assessing subjective workload is quick and easy.

However, suspicion seems to exist in regard to subjective workload assessments. To some degree this suspicion might be a hold-over from behaviorism. For a substantial portion of this century, it was unfashionable for American psychologists to deal with any data that could be considered introspective in origin. A more specific cause of the suspicion might be the disconcerting demonstrations of differences between subjective workload assessments and objective performance. There have been a number of cases where subjective workload assessments predicted different trends than were demonstrated by concurrently collected performance measures (for example, Derrick, 1981; Wickens and Yeh, 1983). Since excessive workload is assumed to induce suboptimal performance, such findings are particularly disconcerting in the operational environments where subjective workload is intended to serve as an indicator of the potential performance of competing system designs. To properly use subjective workload assessments for system evaluation it is necessary to understand what factors determine the relationship between subjective workload and objective performance.
The research reviewed in the present paper adopted an approach that has been called "processing-characteristic research" (Vidulich and Wickens, 1983). This approach was inspired by the work of Ericsson and Simon (1980). In response to critical reviews of the value of verbal reports as data for cognitive theory (e.g., Nisbett and Wilson, 1977), Ericsson and Simon proposed a theory of verbal reports based on human information processing. Based on an extensive literature review, they suggested that subjects would be able to accurately report on information heeded in working memory, but information otherwise processed would be unavailable for report. In the present paper, subjective workload assessments are viewed as essentially verbal reports of the level of information load experienced when performing a task.

Applying the Ericsson and Simon approach in the domain of subjective workload assessment, the following research hypothesis will be examined below. When overloads in the information processing system occur in subsystems that the performer is conscious of, then the performer's subjective assessments will be based on this experience and in good agreement with performance quality. On the other hand, when overloads occur in subsystems that are not well represented consciously, then subjective workload assessments will be based on an impromptu analysis of the task by the performer and may digress considerably from performance quality. When subjective assessments of workload and performance quality indicate different trends, the two measures are said to dissociate. The remainder of the paper is devoted to an examination of some research on causes of dissociation.

DISSOCIATION IN SINGLE-TASK TRACKING
A compensatory tracking task was used to study the interaction between the dominant stage of processing required for a task's performance and the validity of subjective workload assessment (Vidulich and Wickens, 1984). The difficulty of the tracking task was manipulated in one of two ways: the order of control or the bandwidth of the forcing function. Previous research had demonstrated that these two difficulty manipulations have different effects on the information processing load of subjects. Increasing a tracking task's control order from a velocity (first-order) system to an acceleration (second-order) system was found to cause a disproportionately high increase in demand for perceptual/central resources relative to the increase in the demand for response execution resources (Wickens, Gill, Kramer, Ross, and Donchin, 1981). This finding appeared to result from a need to maintain a more complex cognitive model of the tracking task active in working memory. In contrast, increasing the bandwidth of the forcing function of the tracking task primarily increased the demand for response execution resources (Wickens and Derrick, 1981). The higher bandwidth required that responses be made more often, but there was no need for a more complex cognitive model to generate these responses.

Both increasing bandwidth and increasing control order were found to raise the difficulty of a tracking task as indicated by the root mean square error (RMSE) of the tracking task. However, since increasing the order of control has its major impact on the level of processing in working memory, it was expected that this would be well represented consciously and would, therefore, be well represented in subjective workload ratings. On the other hand, increasing the bandwidth has its major effect outside of working memory and would, therefore, be expected to be more poorly represented consciously and less likely to generate workload ratings closely associated with performance. (The debate concerning the relatively poor quality of the conscious representation of movement has a fairly long history. See Judd (1905) for an early discussion of the topic.) It was thus hypothesized that the conscious representations of the two manipulations would be different, and that the difference would influence the validity of the subjective workload assessments relative to the tracking performance.

This hypothesis was tested by having nine subjects perform and rate the workload of six tracking tasks. A comparative rating technique was used. The tracking task with the lowest bandwidth (0.3 Hz) and order of control (first order) was selected as the standard task and arbitrarily assigned a workload rating of 10. Subjects were given a 15-sec exposure to this standard followed by a 2-min test tracking trial. The test tracking trial was first-order control paired with any one of four levels of bandwidth (0.3 Hz, 0.4 Hz, 0.5 Hz, or 0.6 Hz) or 0.3 Hz bandwidth paired with any of three levels of control order [1.0 (pure velocity), 1.5 (a linear combination of velocity and acceleration control), or 2.0 (pure acceleration)]. The workload of all six tracking conditions was rated relative to the 15-sec exposure to the standard, including a "catch" trial of the standard compared to itself. The subjects assigned a workload rating relative to the standard's arbitrary value of 10 (twice as hard would be 20, half as hard would be 5, etc.). More detail concerning the design and procedure of this experiment is available in Vidulich and Wickens (1984).
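The count of six tracking conditions follows from the design just described: four bandwidth levels at first-order control plus three control orders at 0.3 Hz, with the standard task shared by both series. A minimal Python sketch of that bookkeeping, and of how a comparative rating maps to a difficulty ratio against the standard, is given here. The function and variable names are hypothetical and are not taken from the original study.

```python
# (bandwidth in Hz, control order); the standard task is rated 10 by definition.
STANDARD = (0.3, 1.0)
STANDARD_RATING = 10


def tracking_conditions():
    """List the unique test conditions from the two difficulty series."""
    bandwidth_series = [(bw, 1.0) for bw in (0.3, 0.4, 0.5, 0.6)]
    order_series = [(0.3, order) for order in (1.0, 1.5, 2.0)]
    # The standard appears in both series; dict.fromkeys drops the duplicate
    # while preserving the order in which conditions were listed.
    return list(dict.fromkeys(bandwidth_series + order_series))


def implied_ratio(rating):
    """Difficulty ratio relative to the standard implied by a rating."""
    return rating / STANDARD_RATING


print(tracking_conditions())
print(implied_ratio(20), implied_ratio(5))
```

A rating of 20 corresponds to "twice as hard" as the standard and a rating of 5 to "half as hard", matching the instructions given to subjects.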
To evaluate the relationship between performance and subjective workload assessment, two simple linear regression analyses using the mean (collapsed over blocks of trials) workload rating as the predictor and mean RMSE as the criterion were performed. One regression was performed on the order manipulation data and the other regression was performed on the bandwidth manipulation data. Scatter plots of these data along with the least squares regression lines are displayed in Figure 1. The correlations between the ratings and RMSE were .860 for the order manipulation data and .409 for the bandwidth manipulation data. Both of these correlations were statistically significant (t(25) = 8.42, p < 0.01; t(34) = 2.62, p < 0.05, respectively). In both cases, higher workload ratings were associated with worse performance. However, the correlation was significantly stronger for the order manipulation than for the bandwidth manipulation (Fisher's Z = 3.198, p < 0.01).
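The statistics reported above can be approximately reproduced with the standard textbook formulas for testing a correlation and for comparing two independent correlations via Fisher's r-to-z transformation. The Python sketch below uses those formulas. It assumes sample sizes of n = df + 2 (27 and 36 data points), which is inferred from the reported degrees of freedom and is not stated in the text, and the function names are hypothetical. The chapter does not say exactly how its values were computed, so small rounding differences are expected.

```python
import math


def t_for_correlation(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


def fisher_z_difference(r1, n1, r2, n2):
    """Z statistic comparing two independent correlations (Fisher r-to-z)."""
    z1 = math.atanh(r1)  # Fisher transformation of the first correlation
    z2 = math.atanh(r2)  # Fisher transformation of the second correlation
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error


# Correlations and degrees of freedom as reported; n = df + 2 is an assumption.
print(t_for_correlation(0.860, 25))
print(t_for_correlation(0.409, 34))
print(fisher_z_difference(0.860, 27, 0.409, 36))
```

Under these assumptions the three values come out close to the reported 8.42, 2.62, and 3.198.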
Consequently, the results of this first experiment were interpreted as supporting the hypothesis that the validity of subjective workload assessments would be increased when the effect of the difficulty manipulation was more strongly represented in perceptual/central processing than in response execution processing.

Figure 1. Scatter plots of workload ratings (mean rating) and RMSE performance for both the order (a) and bandwidth (b) manipulations, with least-squares regression lines plotted.

DISSOCIATION IN DUAL-TASK ENVIRONMENTS

In many real-world environments (e.g., aviation) the operators must perform multiple tasks simultaneously. It is often necessary to evaluate the impact of different configurations of the various tasks. A parallel to this in the laboratory would be the dual-task paradigm. In the following two experiments the major theme was the evaluation of subjective workload assessments as a tool for assessing the impact of various configurations of one or both tasks of a dual-task.

DUAL-TASK EXPERIMENT 1. The first dual-task experiment focused on the effects of automaticity upon subjective assessment (Vidulich and Wickens, 1983; 1985). Ericsson and Simon (1980) suggested that automatic processing has poor phenomenal representation and is therefore not subjectively evaluated accurately. This was tested with a Sternberg memory search task with two types of target/distractor mappings. Inconsistent target/distractor mappings involved a set of six letters ("B", "C", "O", "E", "V", and "I") that could be either targets (memory set items) or distractors (non-members of the memory set) on successive trials. In contrast, the consistent target/distractor mapping used a unique set of six letters that was rigidly divided into targets and distractors. The consistent trials always used the same two letters as targets ("A" and "N") and the other four letters as distractors ("K", "S", "P", and "J"). Previous research has shown conclusively that, with practice, a subject's performance improved much more in the consistent condition than in the inconsistent condition (e.g., Schneider and Shiffrin, 1977). The consistent training is said to develop behavioral automaticity.
The Sternberg memory search task in this experiment utilized a display of two probe letters presented side by side. If either letter was a target, the subject pressed a button indicating its position. If neither was a target, the subject pressed a third button. Subjects were never shown two targets simultaneously. Both the consistent and the inconsistent mapping memory search tasks had three levels: (1) The standard task, in which 40 stimulus pairs were presented with an average inter-stimulus interval (ISI) of 1.5 sec. (2) The perceptually-loaded task, in which the stimuli appeared behind a mask of cross-hatched lines. (3) The rate-change task, in which only 20 stimulus pairs appeared in a trial with an average ISI of 3.0 sec. Three levels of the two mapping consistencies generate a total of six different task conditions. These six conditions were each performed as single-tasks and concurrently with a two-dimensional first-order tracking task. In addition, a monetary bonus manipulation was incorporated. During the experimental sessions, each single-task condition was performed twice: once with a bonus available for excellent performance, once without. Every dual-task trial of the experimental sessions had a bonus available, but for half of the trials the memory search task was the critical determinant for the bonus and for the other half of the trials it was the tracking task. A set of bipolar rating scales was used to assess subjective workload of the tasks. For more detail on the experimental design of this experiment, see Vidulich and Wickens (1983, 1985).
To detect dissociations, the z-score analysis procedure developed by Wickens and Yeh (1982) was employed. First, performance scores and ratings were transformed to z-scores on a subject-by-subject basis. These two sets of z-scores were then entered into an analysis of variance (ANOVA) as two levels of an independent variable called "type of measure." The logic of the z-score analysis technique is that the means and standard deviations of the different measures are made equal statistically (0 and 1, respectively) without changing the ordering of the different conditions within each type of measure. Therefore, any interaction involving the type of measure variable should indicate a different pattern of ordering: a dissociation.

For each subject, z-scores were generated for the two most general rating scales used in the experiment (Overall Workload and Task Difficulty) and one performance score (correct reaction time (RT) to target-present trials). The z-score data were subjected to four five-way ANOVAs (Number of Tasks x Consistency x Perceptually-Loaded or Rate-Change x Pay x Type of Measure). For both the perceptually-loaded and the rate-change manipulations, two ANOVAs were performed: one with the Overall Workload z-scores and the other with the Task Difficulty z-scores.

A dissociation related to the effect of multiple tasks on the validity of subjective workload assessments was detected. The Number of Tasks x Consistency x Type of Measure interactions were significant for the Task Difficulty scale for both the perceptually-loaded (F(1,39) = 20.8, p < 0.0001) and the rate-change (F(1,39) = 11.2, p < 0.002) analyses. This interaction was also significant in the Overall Workload scale of the perceptually-loaded analysis (F(1,39) = 5.0, p < 0.05). One of these interactions is illustrated in Figure 2.
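The within-subject z-score transformation described above can be sketched in a few lines of Python. This is only an illustration of the standardization step, not the ANOVA, and it is not the authors' code. The data values are invented for the example, the function names are hypothetical, and the use of the population standard deviation is an assumption (the text does not say which form was used).

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one subject's scores on one measure (mean 0, SD 1)."""
    center = mean(values)
    spread = pstdev(values)  # population SD; an assumption, see lead-in
    return [(value - center) / spread for value in values]


def measure_gap(ratings, performance):
    """Per-condition difference between rating and performance z-scores.

    Gaps that change sign across conditions suggest that the two measures
    order the conditions differently, i.e. a possible dissociation.
    """
    return [zr - zp for zr, zp in zip(z_scores(ratings), z_scores(performance))]


# Invented data for one subject across four conditions.
ratings = [20.0, 40.0, 45.0, 44.0]          # subjective ratings
reaction_times = [500.0, 650.0, 700.0, 820.0]  # RT in msec

print(z_scores(ratings))
print(measure_gap(ratings, reaction_times))
```

Because the transformation is linear with a positive slope, it leaves the ordering of conditions within each measure unchanged, which is the property the z-score analysis relies on.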
On the left half of Figure 2, it is shown that the effects of the consistency variable on both performance and subjective workload were essentially identical in the single-task trials. Both RTs and workload ratings were smaller in the consistent condition. However, a dissociation between the measures was apparent in the dual-task data displayed on the right-hand portion of Figure 2. The RT data still indicated an advantage for the consistent mapping, but the subjective workload ratings did not. The dual-task tracking RMSE data did not show any effect of the consistency manipulation. Apparently, the presence of the tracking task overwhelmed the subjective distinction between memory search consistency conditions.
[Figure 2 graphic not reproduced. Panels: SINGLE-TASK and DUAL-TASK. Horizontal axis: CONSISTENCY (consistent, inconsistent). Vertical axis ticks run from -1.0 to 1.0. Plotted measures: Task Difficulty rating and reaction time.]
Figure 2. An example of the Number of Tasks x Consistency x Type of Measure interaction. This one is taken from the perceptual-load z-score analysis.
DUAL-TASK EXPERIMENT 2. A third experiment further investigated the question of dissociation (Vidulich and Tsang, 1985a, 1985b). Unlike the previous two experiments, in which the subjective assessment techniques were rather spontaneous constructions, the third experiment used two assessment techniques that had previously been validated by other researchers. The techniques used by the two workload groups were: (1) the Subjective Workload Assessment Technique (SWAT) developed by the Air Force Aerospace Medical Research Laboratory (Reid, Shingledecker, and Eggemeier, 1981) and (2) the NASA-Bipolar technique developed at NASA-Ames Research Center (Hart, Battiste, and Lester, 1984). Subject background was also a variable: half of each workload group were college students and half were pilots. The experiment used two basic tasks. The first task was a tracking task with two different types of difficulty dynamics: either constant bandwidth throughout the trial or bandwidth that changed dynamically within a trial. The second task was a spatial transformation task in which subjects were presented with one of the eight major compass directions (north, northeast, east, etc.) and were instructed to respond with the next position in a clockwise direction. The stimuli for the transformation task were presented either visually or auditorily, and subjects responded either manually or vocally. So, for the transformation task, there were four input/output (I/O) configurations performed by each subject: visual/manual (VM), visual/speech (VS), auditory/manual (AM), and auditory/speech (AS). Subjects performed the tracking task and the transformation task both alone as single-tasks and together as dual-tasks. More detail concerning this experiment is available in Vidulich and Tsang (1985a, 1985b).
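The rule of the spatial transformation task and the four I/O configurations can be stated compactly in code. This is an illustrative sketch only; the names `COMPASS`, `next_clockwise`, and `IO_CONFIGURATIONS` are hypothetical and do not come from the original experiment software.

```python
# The eight major compass directions in clockwise order, starting at north.
COMPASS = ("north", "northeast", "east", "southeast",
           "south", "southwest", "west", "northwest")

def next_clockwise(direction):
    """Return the correct response: the next compass position clockwise."""
    index = COMPASS.index(direction.lower())
    return COMPASS[(index + 1) % len(COMPASS)]

# The four input/output configurations named in the text.
IO_CONFIGURATIONS = {
    "VM": ("visual", "manual"),
    "VS": ("visual", "speech"),
    "AM": ("auditory", "manual"),
    "AS": ("auditory", "speech"),
}

for stimulus in ("north", "east", "northwest"):
    print(stimulus, "->", next_clockwise(stimulus))
```

The modulo step handles the wrap-around from northwest back to north, the only case in which the response is not simply the next item in the list.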
The data analysis strategy used in analyzing the dual transformation-tracking trials was as follows: multivariate analyses of variance (MANOVAs) were performed on the performance measures; a comparable ANOVA was performed on the subjective workload ratings data. The results of the separate analyses were then compared and contrasted. There were two sets of performance data that were analyzed by independent six-variable MANOVAs (Session x Workload Group x Dynamics x Input x Output x Background). The raw RMSE of the tracking task, and the percent error and percent omission of the transformation task were analyzed in one MANOVA. The RMSE decrement, transformation RT decrement, and percent error decrement were analyzed in a second MANOVA. The raw performance MANOVA was expected to be sensitive to factors that influenced task difficulty related to differences in the single-tasks or to the results of dual-task interference. However, since the decrement scores were generated by subtracting the corresponding single-task scores from the dual-task scores, the decrement MANOVA was expected to be especially sensitive to factors that influence dual-task interference.
The most potent effect in the raw performance MANOVA was the input modality of the transformation task (F(1,20) = 27.84, p < 0.01). A visual input advantage was found in the percent error (2.1% vs. 7.2%), the percent omission (4.1% vs. 7.2%), and the RMSE (.411 vs. .418). There was also an interaction between the input modality of the transformation task and the difficulty of the tracking task (F(1,20) = 5.46, p < 0.05). All three dependent measures showed a stronger visual input advantage when the tracking task’s difficulty varied during the trial than when it was constant. There was also a main effect of the transformation task’s output modality in the raw performance MANOVA (F(1,20) = 6.11, p < 0.05). The percent omission was larger (6.9% vs. 4.4%), but the RMSE was smaller (.405 vs. .422) for speech than for manual output. The speech advantage was only minor in the percent error (4.4% vs. 5.0%). Thus, although speech output generally degraded transformation task performance (by causing more omissions), it was less intrusive to the manual control of the tracking task.
The decrement scores MANOVA detected no effect of the transformation task’s input modality. However, there was a significant effect of the transformation task’s output modality (F(1,20) = 11.26, p < 0.01). A speech advantage was detected in all three measures: RT decrement (0 ms vs. 31 ms), percent error decrement (-0.5% vs. 1.3%), and RMSE decrement (.015 vs. .031). These results have two important implications. First, the fact that the effect of input modality was significant in the raw performance, but not in the decrement scores, implies a difference in task difficulty at the single-task level. This was supported by a similar effect detected in the single-task data. Second, that the output effect was significant in the raw performance analysis, and even stronger in the decrement scores, implies that producing manual responses for the transformation task was more disruptive to the manual control of the tracking task than was producing speech responses.
The ratings data for both subjective assessment groups were analyzed with a comparable ANOVA. The main effect of the transformation task’s input modality was significant (F(1,20) = 44.94, p < 0.01), with an average rating of 45 for the visual inputs, and 56 for the auditory inputs. The output modality of the transformation task was involved in two two-way interactions. First, a significant interaction between the transformation task’s input and output modalities (F(1,20) = 4.49, p < 0.05) showed that speech output was
rated as easier than manual output when the input was visual (44 vs. 47), but harder when the input was auditory (57 vs. 54). Second, a significant interaction between assessment group and the transformation task’s output (F(1,20) = 4.71, p < 0.05) showed that speech output was rated as easier than the manual by the NASA-Bipolar group (46 vs. 48), but harder by the SWAT group (56 vs. 53). From Figure 3, a relatively high SWAT rating for the AS configuration appears to be responsible for both of these interactions. Since the SWAT ratings and the NASA-Bipolar ratings mirror each other for the other I/O conditions, there is no obvious explanation for the difference in the AS condition.
The results of this experiment might be interpreted as further support for the findings of the first experiment: subjective workload assessments are sensitive to manipulations that influence the perceptual/central processing demands and relatively insensitive to manipulations that influence response execution demands. On the other hand, these results might reflect a lack of sensitivity to the phenomena of dual-task interference. Further research will be required to determine whether one or both of these mechanisms account for the lack of sensitivity of the ratings to the output modality manipulation. However, regardless of which mechanism is responsible, these results definitely indicate that subjective workload assessments are differentially sensitive to manipulations that affect performance.
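The decrement scores used in the second MANOVA of this experiment are simple differences between dual-task and single-task scores. The sketch below illustrates the computation under stated assumptions: the numbers are hypothetical, the dictionary keys are invented labels, and every measure is assumed to be scored so that larger values mean poorer performance.

```python
def decrement(dual_task_score, single_task_score):
    """Dual-task score minus the corresponding single-task score."""
    return dual_task_score - single_task_score

# Hypothetical scores for one subject in one condition (not from the study).
# For all three measures, larger values mean poorer performance.
single = {"rmse": 0.390, "rt_ms": 900.0, "percent_error": 3.0}
dual = {"rmse": 0.410, "rt_ms": 930.0, "percent_error": 4.5}

decrements = {name: decrement(dual[name], single[name]) for name in single}

for name, value in decrements.items():
    # A positive value is a dual-task cost; a negative value means the
    # measure was better under dual-task than under single-task conditions.
    print(f"{name}: {value:+.3f}")
```

Subtracting the single-task baseline removes differences that already exist when each task is performed alone, which is why the decrement analysis isolates dual-task interference.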
[Figure 3 graphic not reproduced. Horizontal axis: TRANSFORMATION TASK I/O (VM, VS, AM, AS). Plotted groups: NASA-Bipolar and SWAT.]
Figure 3. NASA-Bipolar and SWAT workload ratings across the four I/O transformation-tracking configurations.
DISSOCIATION CAUSED BY MOTIVATIONAL DIFFERENCES
One other aspect of the Vidulich and Wickens (1983, 1985) study is relevant to the present discussion: namely, the effect of motivation on the dissociation between subjective workload assessments and performance. Wickens and Yeh (1983) predicted that, although increased motivation would lead to better performance, it would do so by encouraging subjects to expend greater effort, thereby increasing the experienced workload. The prediction was confirmed in the single-task tracking data of the Vidulich and Wickens (1983, 1985) experiment. The single-task tracking trials with the bonus available exhibited better RMSE performance (t(39) = 23.6, p < 0.0001), but were also assessed as having a higher task difficulty (t(39) = -1.7, p < 0.05), than were the corresponding trials without an available bonus. The implications of this result are important when extended to operational environments. If two task configurations were equally difficult to perform but encouraged different levels of effort, then performance measures might indicate a spurious advantage for the configuration that encouraged greater effort. In such a setting, subjective workload assessments could provide the system designer with a better appreciation of the trade-offs involved.
GENERAL DISCUSSION
In general, the results of the experiments reviewed here argue for an extension of the Ericsson and Simon (1980) viewpoint into the domain of subjective workload assessment. Subjective assessments can tap useful information in many circumstances, but they are not a complete answer to every workload question. Subjective workload assessments seem to be particularly sensitive to the processing load that occurs in what is usually referred to, in most information processing models, as working memory or primary memory. They appear to be considerably less sensitive to differences in load that occur in response execution processing. Fortunately, a major need for workload assessment is currently associated with evaluating the impact of automation on operators serving primarily as system monitors. For such activities it is likely that the heaviest demands are placed on the operators’ decision making capabilities (the working memory system in particular) and that relatively minor demands are placed on their response execution resources. In this setting subjective workload assessments should be a valuable source of information.
However, the results reviewed in the present paper also suggest that careful consideration of experimental design is crucial. In the second experiment the presence of a timeshared tracking task completely obscured the strong effect of the automaticity of the memory search task. In the third experiment, the ratings failed to detect the different levels of interference caused by speech as opposed to manual controls or the effect of a tracking task with dynamically varying difficulty. All of these differences may be somewhat related to a general difficulty that subjects experienced in sorting out phenomena from a number of simultaneous tasks. This could be a problematic deficiency in operational environments where multiple simultaneous tasks are routinely encountered. However, when it is possible, isolating the task of interest as a single-task might increase the sensitivity of subjective workload assessments.
In cases where multiple simultaneous tasks are unavoidable, subjective assessments of the type reviewed in the present paper are apparently effective in estimating the overall level of difficulty of the entire task complex, but do not seem to be capable of assessing the more subtle interplay of the time-shared tasks. However, this lack of sensitivity may result more from the technique of the current assessment scales than from any inherent weakness of subjective assessments. None of the assessment techniques used in the experiments that were reviewed provided the subjects
with a means for differentiating among individual tasks. It remains to be seen whether subjects, properly instructed and provided with appropriate scales, can make distinctions about the workload imposed by specific task components embedded in a more complex task or concerning the best configuration for a single task ensconced among others. This topic should be addressed in future research. In the meantime, subjective workload assessments, as conducted currently, appear to be a valid means for assessing the overall workload inflicted upon the working memory of system operators. As such, they are likely to become increasingly important in the evaluation of systems in which the human operator acts primarily as a monitor and decision-maker rather than an active controller.
REFERENCES
[1] Derrick, W. L., The relationship between processing resource and subjective dimensions of operator workload, in Proceedings of the Human Factors Society 25th Annual Meeting (Human Factors Society, Santa Monica, California, 1981).
[2] Ericsson, K. A., and Simon, H. A., Verbal reports as data, Psychological Review 87 (1980) 215-251.
[3] Hart, S. G., Battiste, V., and Lester, P. T., Popcorn: A supervisory control simulation for workload and performance research, in Proceedings of the Twentieth Annual Conference on Manual Control (NASA-CP-2341, Washington, D.C., 1984).
[4] Judd, C. H., Movement and consciousness, Psychological Review Monograph Supplements 7 (1905) 199-226.
[5] Nisbett, R. E., and Wilson, T. D., Telling more than we can know: Verbal reports on mental processes, Psychological Review 84 (1977) 231-259.
[6] Reid, G. B., Shingledecker, C. A., and Eggemeier, F. T., Application of conjoint analysis measurement to workload scale development, in Proceedings of the Human Factors Society 25th Annual Meeting (Human Factors Society, Santa Monica, 1981).
[7] Schneider, W., and Shiffrin, R. M., Controlled and automatic human information processing: I. Detection, search, and attention, Psychological Review 84 (1977) 1-66.
[8] Vidulich, M. A., and Tsang, P. S., Assessing subjective workload assessment: Comparison of SWAT and the NASA-Bipolar methods, in Proceedings of the Human Factors Society 29th Annual Meeting (Human Factors Society, Santa Monica, California, 1985a).
[9] Vidulich, M. A., and Tsang, P. S., Techniques of subjective workload assessment: A comparison of two methodologies, in Proceedings of the Third Symposium on Aviation Psychology (OSU Aviation Psychology Laboratory, Columbus, Ohio, 1985b).
[10] Vidulich, M. A., and Wickens, C. D., Processing phenomena and the dissociation between subjective and objective workload measures (Engineering-Psychology Laboratory Tech. Rep. EPL-BI-l/ONR-81-1, University of Illinois at Urbana-Champaign, 1983).
[11] Vidulich, M. A., and Wickens, C. D., Subjective workload assessment and voluntary control of effort in a tracking task, in Proceedings of the Twentieth Annual Conference on Manual Control (NASA-CP-2341, Washington, D.C., 1984).
[12] Vidulich, M. A., and Wickens, C. D., Causes of dissociation between subjective workload measures and performance: Caveats for the use of subjective assessments, in Proceedings of the Third Symposium on Aviation Psychology (OSU Aviation Psychology Laboratory, Columbus, Ohio, 1985).
[13] Wickens, C. D., and Derrick, W., The processing demands of higher order manual control: Application of additive factor methodology (Engineering-Psychology Laboratory Tech. Rep. EPL-81-1/ONR-81-1, University of Illinois at Urbana-Champaign, 1981).
[14] Wickens, C. D., Gill, R., Kramer, A., Ross, W., and Donchin, E., The cognitive demands of second order manual control: Applications of the event related potential, in Proceedings of the Seventeenth Annual Conference on Manual Control (JPL Publication #81-95, Pasadena, California, 1981).
[15] Wickens, C. D., and Yeh, Y. Y., The dissociation of subjective ratings and performance, in IEEE 1982 Proceedings of the International Conference on Cybernetics and Society (Institute of Electrical and Electronics Engineers, New York, New York, 1982).
[16] Wickens, C. D., and Yeh, Y. Y., The dissociation between subjective workload and performance: A multiple resource approach, in Proceedings of the Human Factors Society 27th Annual Meeting - Volume 1 (Human Factors Society, Santa Monica, California, 1983).
HUMAN MENTAL WORKLOAD
P.A. Hancock and N. Meshkati (Editors)
© Elsevier Science Publishers B.V. (North-Holland), 1988
INDIVIDUAL DIFFERENCES IN SUBJECTIVE ESTIMATES OF WORKLOAD
Diane L. Damos
Department of Human Factors
Institute of Safety and Systems Management
Los Angeles, California
U.S.A.
This paper reviews six studies that examined the relation between established individual differences constructs and the subjective experience of workload. Because of the lack of data, few conclusions are drawn. Suggestions for future investigations are given.
INTRODUCTION
This chapter presents the results of six experiments that examined the relation between a well-established individual differences construct and the perception of mental workload. It is by necessity brief because so few experiments have been conducted on this topic. As the title states, only studies that used subjective estimates of mental workload are included. This selectivity reflects both my personal bias about the types of measures that are appropriate for reflecting individual differences in workload and the types of studies that have been published. Additionally, I restricted the review to data that were published in some easily usable form.
Before reviewing the evidence for consistent individual differences in the experience of workload, I want to discuss two problems associated with this topic. First, subjective ratings are verbal data. As such, they have all the limitations associated with verbal data described by Ericsson and Simon (1980). For the purposes of this review, the most important limitation discussed by Ericsson and Simon is that only information that is either in short-term memory at the time of recall or retrievable from long-term memory can be reported. Thus, unattended information or information based on cognitive processes that appear to bypass short-term memory (perceptual processes, motor processes, automatic processes, etc.) cannot be rated accurately. These limitations should be borne in mind when evaluating the research presented below. Second, I could locate few test-retest reliability data for any subjective workload technique.
Reid, Eggemeier, and Shingledecker (1982) reported limited test-retest reliability data for the Subjective Workload Assessment Technique (SWAT). Yeh and Wickens (1984a, 1984b) reported test-retest reliability data for nine-point attribute scales and for the ratio ratings they developed. Test-retest data are also given for a modified Cooper-Harper scale in Yeh and Wickens (1984b). Unfortunately, none of the studies discussed below used any of these techniques. Because test-retest data are absolutely critical for demonstrating consistent individual differences in the perception of workload, the reader should interpret all of the studies reviewed below with caution; their results may simply reflect chance differences that would vanish with practice.
With these warnings I present the six studies that have examined the relation between an individual differences construct and subjective estimates of workload. These studies are grouped under three headings: personality traits and behavioral patterns, response strategy, and resource capacity.
PERSONALITY TRAITS AND BEHAVIORAL PATTERNS
Personality Traits
Contrary to my initial expectations, little research examining the relation between mental workload and personality traits has been conducted. Indeed, only two studies were located, both of which examined cognitive complexity (Harvey, Hunt, and Schroder, 1961), a trait that reflects the abstractness of thought. Individuals who score high on tests of cognitive complexity are classified as abstract thinkers. Such individuals categorize data along a large number of dimensions and are good at developing creative solutions to problems. Concrete thinkers, individuals who score low on tests of cognitive complexity, tend to categorize data in an "either/or" fashion and are relatively poor at developing creative solutions to problems.
Robertson (1984) examined the relation between performance on a simulated fire control task, subjective ratings, and cognitive complexity. Twenty subjects completed the experiment. Ten of these subjects had high scores on two tests of cognitive complexity and were classified as abstract thinkers. The other ten subjects had low scores on the two tests and were classified as concrete thinkers. The task required the subject to destroy static targets displayed on a computer screen by firing a projectile. The subject controlled the path of the projectile by entering coordinates determining the direction and elevation of the projectile at "launch." After each firing, the subject saw either the word "hit" on the screen indicating that the projectile had struck the target or a number representing the distance from the target to the impact point of the projectile.
Four levels of difficulty were created by changing the time available to fire at the target and by introducing variability into the path of the projectile. The subject produced subjective estimates of task difficulty (workload) by assuming that the difficulty of a simplified version of the task was 100. The subject then assigned a number to each of the four levels of the experimental task representing its difficulty relative to that of the simplified task. Robertson made two predictions concerning the relative performance of the two groups as a function of task difficulty. She predicted that the abstract thinkers would perform better at the two intermediate difficulty levels than the concrete thinkers. Robertson also predicted that there would be no difference between the two groups at either the highest or the lowest difficulty levels because the task exceeded individual skill levels at the highest level and was too simple to be motivating at the lowest level. Both of these predictions were confirmed. Robertson also made one prediction about the relation between subjective estimates of difficulty and objective task difficulty: objective and subjective difficulty would be correlated for the abstract thinkers but not for the concrete thinkers. This prediction also was supported.
In a second study Robertson and Meshkati (1985) classified subjects according to their cognitive complexity and decision style (Schroder, Driver, and Streufert, 1967). Sixteen subjects performed the same fire control task described above and rated the workload imposed by the task in the same fashion. Robertson and Meshkati found that concrete thinkers tended to adopt less complex decision styles; abstract thinkers tended to adopt
more complex decision styles. As in the preceding experiment, the subjective difficulty ratings of abstract thinkers increased with increases in objective difficulty. In contrast, the difficulty ratings of the concrete thinkers changed very little as a function of objective difficulty.
Behavioral Patterns
The only behavioral pattern investigated to date is the Type A coronary-prone behavior pattern (Friedman and Rosenman, 1974). Individuals who have high scores on tests measuring coronary-prone (Type A) behavior are characterized by an extreme sense of time urgency. As a result, they prefer a more rapid work pace and tend to perform better on tasks that have no deadlines than individuals who have low scores on tests of coronary-prone behavior (Type B individuals).
Damos and Bloem (1985) investigated the relation between performance under single- and dual-task conditions, subjective estimates of workload, and the Type A behavior pattern. They predicted that Type A individuals would perform better under dual-task conditions when the tasks are unpaced than Type B individuals. Damos and Bloem selected nine Type A female and seven Type B female subjects from a larger group of females who had completed a questionnaire examining coronary-prone behavior. All of the subjects completed seven different tasks and four combinations. The seven tasks were time estimation using the production method, memory search (Sternberg, 1969), choice reaction time, delayed choice reaction time, mental arithmetic, letter matching, and a test of visual short-term memory. The four combinations consisted of the visual short-term memory and the mental arithmetic tasks, the time estimation and the memory search tasks, the time estimation and the choice reaction time tasks, and the delayed choice reaction time and the letter matching tasks. All of these tasks were unpaced.
Eight bi-polar adjective scales developed at NASA by Hart and her colleagues (Hart, Battiste, and Lester, 1984; Vidulich and Tsang, 1985) were used to obtain multidimensional estimates of workload. These eight scales were Overall Workload (extremely high/extremely low), Task Difficulty (extremely hard/extremely easy), Time Pressure (excessively rushed/none), Performance (perfect/total failure), Mental/Sensory Effort (impossible/none), Frustration Level (totally exasperated/completely relaxed), Stress Level (extremely tense/completely fulfilled), and Fatigue (wide awake/exhausted). The subjects rated each task and the combinations immediately after completion.
Only one between-group difference was significant under single-task conditions: Type A subjects performed memory searches almost twice as quickly as Type B subjects. In contrast, three of the four task combinations showed at least one significant between-group effect. All of these effects represented faster reaction times for Type A subjects than for Type B subjects. Despite the fact that Type A subjects generally performed better than Type B subjects under dual-task conditions, they reported significantly more frustration than the Type Bs. This trend was reversed under single-task conditions; Type Bs reported significantly more frustration than Type As.
The complex interaction between performance, perceived workload, and Type A/Type B behavior pattern also occurred in a similar study by Damos (1985). Twenty Type A and twenty Type B females performed a mental arithmetic task and a vowel/consonant classification task. Each task was practiced alone before the subject performed the combination. Half of the Type A subjects and half of the Type Bs performed unpaced versions of the tasks and their combination. The other half of the subjects performed paced versions. Workload estimates were obtained by using the same eight bi-polar
adjective scales employed in the previous study. Analyses performed on the single-task data indicated that the Type A subjects responded more slowly to the paced vowel/consonant classification task than Type B subjects but faster than Type Bs to the unpaced version. This result reflects the sense of time urgency that is characteristic of Type A subjects. Under dual-task conditions the Type A subjects responded more quickly on the vowel/consonant task than Type B subjects regardless of pacing. None of the analyses performed on the eight workload scales showed a significant between-group difference, although the analysis of the Mental/Sensory Effort scale indicated a significant group by task interaction. The source of this interaction could not be identified using post hoc analyses. Nevertheless, it appeared that Type A subjects experienced less effort than Type B subjects under dual-task conditions and the same or more effort under single-task conditions.
RESPONSE STRATEGY
Damos (Damos and Wickens, 1980; Damos, Smist, and Bittner, 1983) developed a method for classifying individuals on the basis of the response strategy they use to perform a discrete task combination. Briefly, this system classifies people into four groups: simultaneous, alternating, massed, and mixed responders. Simultaneous responders emit responses to both tasks of the combination concurrently. Alternating responders strictly alternate responses between the two tasks. Massed responders emit several responses to one task before responding to the other. Mixed responders use a combination of the other three response patterns and represent an "other" category.
Damos (1984) required 30 females to perform a delayed choice reaction time task and a letter match task individually and in combination. Subjects also performed a complex monitoring task that required them to count the number of tones of a given frequency during a 60-min period.
Low-, medium-, and high-frequency tones were presented in a random-appearing sequence. As soon as a subject counted four tones of a given frequency, she pressed a key and began counting again. In the low difficulty condition subjects counted only one tone. In the high difficulty condition subjects were required to monitor all three tones simultaneously but independently. An older version of the eight bi-polar adjective scales used in Damos (1985) and Damos and Bloem (1985) was used in this study. This version differed from the version discussed earlier in that the Frustration Level scale was omitted and three additional scales were included: Comfort Level (very high/very low), Achievement Performance (completely satisfied/completely dissatisfied), and Skill Required (very much/none).
The response strategy groups differed significantly only in their dual-task performance of the delayed choice reaction time task. This effect was caused by a significant difference between the alternating response subjects and the massed response subjects. Analyses performed on the workload ratings indicated that significant between-group differences were found on two of the scales, Task Difficulty and General Workload. Further analyses revealed that alternating response subjects experienced significantly lower task difficulty and significantly less workload than mixed response subjects.
INDIVIDUAL DIFFERENCES IN RESOURCE CAPACITY
Estimates of the amount of information processing capacity used to perform a task can be obtained using the primary-secondary task technique. To use this technique, a subject must first perform the task of interest, the primary task, until performance has
approached asymptote. Then the subject must maintain single-task performance levels on the primary task while performing a secondary task. Individual differences in the resource capacity required by the primary task are inferred from performance differences observed in the secondary task.
Bloem and Damos (1985) used the primary-secondary task technique to estimate individual differences in the resources used to perform two different primary tasks. One primary task was a delayed choice reaction time task. This task was performed with a vowel/consonant classification task. The other primary task was a paired-associate recall task, which was performed with a letter matching task. Workload ratings were obtained using the eight bi-polar adjective scales developed at NASA and used in Damos (1985) and Damos and Bloem (1985). Because individual differences in resource capacity are inferred from secondary-task performance, the relation between performance on the secondary task and the workload ratings is of primary interest. Dual-task performance on the letter matching task did not correlate significantly with the ratings on any of the eight workload scales. Performance on the vowel/consonant classification task correlated significantly only with the ratings from the Frustration scale and the Performance scale. These correlations indicated that subjects with better dual-task performance on the vowel/consonant task experienced less frustration and were more satisfied with their performance than those with poorer performance. Although these correlations imply that subjects with more residual capacity (as indicated by better secondary-task performance) experienced less workload, only 2 of 16 possible correlations were significant, making any interpretation of these results problematical.
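The relation between secondary-task performance and a workload rating, as discussed above, is a correlation computed across subjects. The sketch below shows a plain Pearson correlation; the data are hypothetical, the variable names are invented for the example, and the source does not state which correlation coefficient was actually used.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)

# Hypothetical values for six subjects (not from the study):
# dual-task reaction time on the secondary task and a Frustration rating.
secondary_rt_ms = [640.0, 700.0, 720.0, 810.0, 850.0, 900.0]
frustration_rating = [22.0, 30.0, 28.0, 41.0, 39.0, 50.0]

r = pearson_r(secondary_rt_ms, frustration_rating)
print(f"r = {r:.2f}")
```

In a study of this kind, one such coefficient would be computed for each pairing of a secondary task with a workload scale, which is how two tasks and eight scales give 16 possible correlations.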
DISCUSSION
With only six studies available, it is difficult to make any general statements about the relation between individual differences and subjective workload, especially since no general trends are apparent. The reader probably has noted that none of the six studies reviewed above showed strong main effects of group membership; Bloem and Damos (1985) showed the greatest number of significant main effects, 2 out of 16. One of the experiments (Damos, 1985) showed no significant main effects of the group factor. To make matters worse, the two experiments examining the Type A behavior pattern appear to give somewhat contradictory data: in Damos and Bloem (1985) the Type A subjects reported more frustration under dual-task conditions than the Type Bs. In contrast, they appeared to experience less mental/sensory effort than the Type Bs under comparable conditions in Damos (1985), although in both studies Type A subjects had better dual-task performance than Type B subjects. There are many possible explanations for the general lack of significant results. I will briefly discuss three of the more plausible. First, most of the research conducted to date has used small numbers of subjects. Between-group differences may not have been detected because of a lack of statistical power. Second, the scales that have been used in this research may not be sensitive enough to detect between-group differences. Several other well-developed scales are available and could be used in future research, such as SWAT (Reid, Eggemeier, and Shingledecker, 1982; Reid, 1985) and the modified Cooper-Harper scale (Casali and Wierwille, 1984). Third, few individual differences variables have been explored. It may be that the variables that have been investigated are just not that important in the perception of workload.
Despite a somewhat discouraging state of affairs, I do believe that further effort on this topic is warranted. Investigators should examine other well-established individual differences constructs. Additionally, more attention should be given to methodological problems, such as statistical power and test-retest reliability. Finally, the research community should accept the possibility that the effects of interest may be complex functions that will require a great deal of effort to identify.
REFERENCES
Bloem, K., and Damos, D. (1985). Individual differences in secondary task performance and subjective estimation of workload. Psychological Reports, 56, 311-322.
Casali, J., and Wierwille, W. (1984). On the measurement of pilot perceptual workload: A comparison of assessment techniques addressing sensitivity and intrusion issues. Ergonomics, 27, 1033-1050.
Damos, D. (1984). Individual differences in multiple-task performance and subjective estimates of workload. Perceptual and Motor Skills, 59, 567-580.
Damos, D. (1985). The relation between the Type A behavior pattern, pacing, and subjective workload under single- and dual-task conditions. Human Factors, 27, 675-680.
Damos, D., and Bloem, K. (1985). Type A behavior pattern, multiple-task performance, and subjective estimation of mental workload. Bulletin of the Psychonomic Society, 23, 53-56.
Damos, D., Smist, T., and Bittner, A., Jr. (1983). Individual differences in multiple-task performance as a function of response strategy. Human Factors, 25, 215-226.
Damos, D., and Wickens, C. (1980). The identification and transfer of timesharing skill. Acta Psychologica, 46, 15-39.
Ericsson, K., and Simon, H. (1980). Verbal reports as data. Psychological Review, 87, 215-251.
Friedman, M., and Rosenman, R. (1974). Type A behavior and your heart. New York: Knopf.
Hart, S., Battiste, V., and Lester, P. Popcorn: A supervisory control simulation for workload and performance research. In: S. Hart and E. Hartzell (Eds.), Proceedings of the Twentieth Annual Conference on Manual Control (pp. 431-453). Moffett Field, California: NASA (NASA-CP-2341).
Harvey, O., Hunt, D., and Schroder, H. (1961). Conceptual systems and personality organization. New York: Wiley.
Reid, G. (1985). Current status of the development of the subjective workload assessment technique. In: R. Swezey (Ed.), Proceedings of the Human Factors Society 29th Annual Meeting (pp. 220-223). Santa Monica, California: Human Factors Society.
Reid, G., Eggemeier, F., and Shingledecker, C. (1982). Subjective workload assessment technique. In: Frazier, M. and Crombie, R. (Eds.), Proceedings of the Workshop on Flight Testing to Identify Pilot Workload and Pilot Dynamics (pp. 281-288). Edwards Air Force Base, California: Air Force Flight Test Center.
Robertson, M. (1984). Personality differences as a moderator of mental workload behavior: Mental workload performance and strain reactions as a function of cognitive complexity. In: Alluisi, J., De Groot, S. and Alluisi, E. (Eds.), Proceedings of the Human Factors Society 28th Annual Meeting (pp. 690-694). Santa Monica, California: Human Factors Society.
Robertson, M., and Meshkati, N. (1985). Analysis of the effects of two individual differences classification models on experiencing mental workload of a computer generated task: A new perspective to job design and task analysis. In: Swezey, R. (Ed.), Proceedings of the Human Factors Society 29th Annual Meeting (pp. 178-182). Santa Monica, California: Human Factors Society.
Schroder, H., Driver, M., and Streufert, S. (1967). Human information processing. New York: Holt, Rinehart, and Winston.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276-315.
Vidulich, M., and Tsang, P. (1985). Assessing subjective workload assessment: A comparison of SWAT and the NASA-bipolar methods. In: Swezey, R. (Ed.), Proceedings of the Human Factors Society 29th Annual Meeting (pp. 178-182). Santa Monica, California: Human Factors Society.
Yeh, Y., and Wickens, C. (1984a). An investigation of the dissociation between subjective measures of mental workload and performance (Engineering Psychology Research Laboratory Technical Report EPL-84-1/NASA-84-1). Urbana-Champaign: University of Illinois.
Yeh, Y., and Wickens, C. (1984b). The dissociation of subjective measures of mental workload and performance (Engineering Psychology Research Laboratory Technical Report EPL-84-2/NASA-84-2). Urbana-Champaign: University of Illinois.
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1988
THE EFFECT OF GENDER AND TIME OF DAY UPON THE SUBJECTIVE ESTIMATE OF MENTAL WORKLOAD DURING THE PERFORMANCE OF A SIMPLE TASK
P.A. Hancock
Department of Safety Science and Human Factors Department
Institute of Safety and Systems Management
University of Southern California
Los Angeles, CA 90089
ABSTRACT
An open question in mental workload assessment concerns the impact of endogenous factors upon the perceived load of a performance task. To examine one element of this question, twenty-four subjects (12 male, 12 female) performed a simple time estimation task at four different times of day (0800, 1200, 1600, 2000h). Following each session, subjects completed the NASA TLX workload assessment scales. Performance data and physiological response follow patterns previously observed in the literature. Female subjects had a greater intolerance for this repetitive task. Five of twelve female subjects failed to complete the series of exposures. There were no drop-outs among the male subjects. Analysis of the remaining responses indicated that the female subjects rated effort and frustration significantly higher and performance significantly lower than their male counterparts. For the subjective workload responses there were no higher order interactions and no significant effects were found for time of day. Caution concerning the ubiquitous application of these findings is advised in light of a number of potentially confounding influences.
1. INTRODUCTION
The importance of mental workload assessment is becoming progressively clearer (cf. Gopher & Donchin, 1986). Accurate reflections of mental workload can be used to distinguish between competitive designs, and multi-attribute scales can partial operator response to provide engineers and designers with diagnostic information for specific design evaluation (O'Donnell & Eggemeier, 1986). However, it is not in such static comparisons that workload assessment promises to make its greatest contribution. Rather, it is in the dynamic, on-line assessment of an individual operator's response that workload can provide essential information. In other work, we have indicated the central role of mental workload evaluation in the construction and operation of adaptive human-machine systems, as mediated through intelligent interfaces (Chignell & Hancock, 1985; Hancock & Chignell, 1987). With such a perspective, it is clear that factors which serve to influence workload
response need to be elaborated. The relationship, for example, between subjective response and task performance is somewhat complex and has been the subject of considerable empirical attack (see Hart & Staveland, 1987; Vidulich, 1987; Vidulich & Wickens, 1984; Wickens & Derrick, 1981). However, relatively little attention has been paid to endogenous and operator-specific factors which might affect the perception of load, independent of task-related influences. There is reason to believe that such effects are non-trivial. For example, we all recognize that prolonged work influences our approach toward a task and that efficiency changes with fatigue as performance progresses on a continuous task. One such influence that has long been recognized in the human performance literature is the variation in response that accompanies testing at different times of day (Dresslar, 1892; Kleitman, 1939/1963; Colquhoun, 1971). These diurnal effects are often attributed to the actions of two companion oscillators (i.e., body temperature and time of day) which act in synchrony under most conditions (Moore-Ede, Sulzman, & Fuller, 1982). The purpose of the present study was to evaluate the effects of both gender and time of day on subjective and physiological workload responses against the background of performance of a simple repetitive task.
2. METHOD
2.1. Subjects
Twenty-four subjects (12 male, 12 female) were recruited for the experiment through voluntary response. They were students and staff members from the University of Southern California. No subject was under medication or medical treatment at the time of testing.
2.2. Procedure
Each subject reported to the testing facility approximately twenty minutes before the hour at which they were to undertake the experiment. The experimenter attached temperature recording equipment and ensured that the physiological and performance data collection systems were both functional and calibrated. At approximately fifteen minutes to the hour the subject began the estimation task described below. The one hundred performance trials took approximately twenty-five minutes (dependent upon individual subject estimates). After completion of the time-estimation task the subject completed the NASA TLX workload assessment scales. The experimenter then removed the physiological recording equipment and the subject was released at approximately fifteen minutes past the hour. Each subject engaged in one practice session; all practice sessions were given at 1200h. The subject then performed in one of four order groups as described below. Assessment was made at 0800, 1200, 1600, and 2000h for all subjects.
2.3. Tasks
The primary performance task in the present experiment was time estimation. Using the production technique (Bindra & Waksberg, 1956), each subject estimated a period of 11 seconds by depressing a telegraph key. At the termination of a single trial, the experimenter recorded the time produced and the subject commenced a following trial.
Every five trials the experimenter recorded the temperature value. There were 100 trials per session. At the termination of the individual session, the subject completed the NASA TLX workload assessment scales. In this procedure, the subject compares the six defined sources of workload: mental demand, physical demand, temporal demand, performance, effort, and frustration. Each of these is matched against each of the others in a pairwise comparison, and the subject indicates which of the two alternatives represents, to them, a greater source of workload (for a more complete derivation of the NASA TLX workload scales see Hart & Staveland, 1987). The experimenter then took these comparisons and derived a weight for each scale depending upon the number of times the subject selected the attribute in comparison with each of its colleagues. This weighting was then used to multiply the response on the workload rating sheet, which is produced by the subject after each individual testing condition. By multiplying each weight by the corresponding raw rating, given by the subject after each experimental condition, and summing the products, the experimenter derives an overall workload value, which is divided by 15 (the total number of weights) to produce the mean weighted workload score for the condition.
2.4. Design
The present experimental design used an incomplete Latin-square. Six subjects (three male, three female) were randomly assigned to each of four order groups. These represented four different orders of testing. As there are twenty-four possible orders of testing, the actual orders were selected by random drawing. The experiment was completed without subject replacement. Under many circumstances there is little attrition of subject population. However, in this experiment subjects chose to exercise their right of withdrawal. Of the subjects tested, five females exercised this right; none of their male counterparts withdrew.
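The order-group assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the seed, the subject labels, and the helper `split_into_groups` are hypothetical, and the text does not say how the random drawing or the assignment was carried out.

```python
import itertools
import random

TIMES_OF_DAY = ("0800", "1200", "1600", "2000")

# All 4! = 24 possible orders of testing.
all_orders = list(itertools.permutations(TIMES_OF_DAY))

rng = random.Random(1988)  # hypothetical seed, used only for reproducibility

# Four of the 24 orders are selected by random drawing without replacement.
order_groups = rng.sample(all_orders, k=4)


def split_into_groups(subject_ids, n_groups, rng):
    """Shuffle the subjects and deal them into n_groups equal-sized groups."""
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return [shuffled[g::n_groups] for g in range(n_groups)]


males = [f"M{i:02d}" for i in range(1, 13)]    # hypothetical subject labels
females = [f"F{i:02d}" for i in range(1, 13)]

# Three males and three females per order group.
groups = [
    m + f
    for m, f in zip(split_into_groups(males, 4, rng),
                    split_into_groups(females, 4, rng))
]

# Map each subject to the testing order of his or her group.
schedule = {
    subject: order
    for order, members in zip(order_groups, groups)
    for subject in members
}

for order, members in zip(order_groups, groups):
    print(" -> ".join(order), sorted(members))
```

A random draw of this kind does not guarantee that each time of day appears once in each serial position, so order is not necessarily balanced across the four groups.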
2.5. Physiological Measurement
In the present work the physiological assessment technique employed was the measurement of core temperature variation. We (Hancock, Meshkati, & Robertson, 1985) have previously suggested a rationale for the validity and reliability of this measure as an index of mental workload. Despite some inherent limitations, particularly related to the inertia of the signal, we have found in other experimental procedures (see Hancock, 1983; Hancock & Brainard, 1981) that auditory canal temperature reflects global changes in mental workload and is consistent in differentiating task onset and cessation. In the present work, replication of a prior method was undertaken, whereby a thermometer was placed deep in the subject's auditory meatus and worn for the duration of the experiment (Hancock, 1983). The experimenter took periodic readings of the temperature value as displayed on an Arbrook-LaBarge temperature monitor. It is these data that were subjected to analysis.
3. RESULTS
The present work concentrates principally on the results from the subjective workload evaluation as derived through the NASA TLX scales. Concerning physiological and performance data, preliminary analysis indicated that each of these followed the trend that might be expected from the literature (e.g., Kleitman, 1939/1963): body temperature increased with time of day and peaked at the 2000h value.
Time estimates co-varied with this fluctuation such that the mean production interval decreased with ascending body temperature across the times of day investigated. Fuller elaboration of these results is to be presented elsewhere (Hancock, 1987).
3.1. Workload Evaluation
The raw ratings, the weighted ratings from each scale, and the mean weighted ratings of all scales from the NASA TLX instrument were subjected to analysis of variance. For this analysis there were two independent variables (i.e., sex and time of day) and thirteen dependent variables derived from the workload assessment scores. These were, respectively, the raw and weighted responses for each of the six scales (i.e., mental demand, physical demand, temporal demand, performance, frustration, and effort), with the final dependent variable being the summed weighted mean of all scales.
3.1.1. Weighted Responses
There were no significant interactions between sex and time of day for the mean weighted average, or any of the raw or weighted subsidiary scales of the NASA TLX. For three of the weighted attribute scales there were significant differences depending upon the gender of the subject. Females rated both effort and frustration significantly higher (p < .05) than their male counterparts. Also, females rated performance significantly lower (p < .05) than the male subjects. These comparisons are illustrated in Figures 1, 2, and 3. There were no significant effects for time of day upon the scores for any of the scales, or the mean weighted average. This absence of an expected effect is somewhat surprising, given the ubiquity of circadian variation in the efficiency of many human behavioral and subjective responses. However, as reflected in Figure 4, the mean weighted average of workload did not fluctuate significantly across the times of day examined.
3.1.2. Unweighted Responses
Analysis performed on the raw or unweighted responses confirmed the above pattern of results. For the scores derived after each individual session, female subjects rated frustration significantly higher (p < .05) and performance significantly lower (p < .05) than their male counterparts. Although the difference for the effort scale was not significant (p = .17), the weighted scale for effort achieved such a difference through combination with the initial weightings for males and females. This difference between the weightings given by male and female subjects is examined below.
Figure 1. Summed response on the NASA TLX Effort Scale for Male vs. Female Subjects (Males: N = 12, Females: N = 7). Columnar height equals mean response, vertical bar is one half a standard deviation.
Figure 2. Summed response on the NASA TLX Frustration Scale for Male vs. Female Subjects (Males: N = 12, Females: N = 7). Columnar height equals mean response, vertical bar is one half a standard deviation.
Figure 3. Summed response on the NASA TLX Performance Scale for Male vs. Female Subjects (Males: N = 12, Females: N = 7). Columnar height equals mean response, vertical bar is one half a standard deviation.
Figure 4. Mean Weighted Average, Overall NASA TLX Scale vs. Time of Day, 24-hour clock (Males: N = 12, Females: N = 7). Vertical bar equals one standard deviation.
3.1.3. Gender Difference in Scale Weightings
Tables 1 and 2 illustrate the rank order of the respective weightings for each of the six scales. The range on each scale is from 0 to 5, where 5 represents the highest source of workload as perceived by the individual and zero the lowest. Table 1 gives these numerical values, while Table 2 presents a comparative rank ordering. The first observation concerning Table 2 is the clear consistency between male and female subjects. An obvious example is the equivalence between genders for the weightings of the three lowest weighted scales. Note also that although the males tend to rate higher overall (see Table 1), the means for these latter three scales are in close proximity. The only major difference is the view expressed concerning the effort attribute. As represented by the longest diagonal on Table 2, the female subjects gave ascendancy to this scale. The male subjects, alternatively, found effort of only medium concern. This comparative ranking is in part responsible for the significant difference between the genders on the effort scale in the mean weighted average.
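The weighting scheme discussed here (weights from 0 to 5 derived from pairwise comparisons, then a weighted mean divided by 15) can be sketched as below. This is an illustrative reconstruction from the description in the Method section, not the experimenter's actual scoring code. The function names, the hypothetical subject who always prefers the scale higher in a fixed priority list, and the 0-100 rating range are all assumptions made for the example.

```python
from itertools import combinations

SCALES = ("mental demand", "physical demand", "temporal demand",
          "performance", "effort", "frustration")


def weights_from_choices(choices):
    """Count how often each scale was chosen over the 15 pairwise comparisons.

    `choices` maps each unordered pair of scales (a frozenset) to the scale
    the subject judged to be the greater source of workload.
    """
    expected_pairs = {frozenset(pair) for pair in combinations(SCALES, 2)}
    if set(choices) != expected_pairs:
        raise ValueError("expected exactly one choice for each of the 15 pairs")
    weights = dict.fromkeys(SCALES, 0)
    for pair, chosen in choices.items():
        if chosen not in pair:
            raise ValueError("the chosen scale must be a member of its pair")
        weights[chosen] += 1
    return weights  # each weight lies in 0..5; the weights sum to 15


def mean_weighted_workload(weights, ratings):
    """Sum of weight x raw rating, divided by the total of the weights (15)."""
    total = sum(weights[scale] * ratings[scale] for scale in SCALES)
    return total / sum(weights.values())


# Hypothetical subject: always picks the scale higher in this priority list.
priority = ("effort", "mental demand", "performance",
            "temporal demand", "frustration", "physical demand")
choices = {
    frozenset((a, b)): min(a, b, key=priority.index)
    for a, b in combinations(SCALES, 2)
}

# Hypothetical raw ratings on an assumed 0-100 scale.
ratings = {"mental demand": 40, "physical demand": 10, "temporal demand": 30,
           "performance": 50, "effort": 70, "frustration": 60}

weights = weights_from_choices(choices)
score = mean_weighted_workload(weights, ratings)
print(weights)
print(score)
```

Because every one of the 15 comparisons adds exactly one point to some scale, the weights always total 15. The mean values reported in Table 1 are consistent with this, summing to 15.01 for the male subjects and 14.99 for the female subjects.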
                     Male Subjects (N=12)      Female Subjects (N=7)
NASA TLX Scale       Raw Rating     Rank       Raw Rating     Rank
Mental Demand        3.98           1          3.31           2
Physical Demand      1.20           6          0.96           6
Temporal Demand      2.39           4          2.27           4
Performance          3.13           2          2.88           3
Effort               3.07           3          3.65           1
Frustration          1.24           5          1.92           5
Table 1. Raw rating and rank by Subject Gender of the six NASA TLX Subscales.
Rank Order     Male Subjects (N=12)     Female Subjects (N=7)
1              Mental Demand            Effort
2              Performance              Mental Demand
3              Effort                   Performance
4              Temporal Demand          Temporal Demand
5              Frustration              Frustration
6              Physical Demand          Physical Demand
Table 2. Comparative ranking by Subject Gender of the six NASA TLX Subscales.
3.1.4. Time of Day Differences in Scale Weightings
Figure 5 gives the weightings of the six scales by the four times of day examined. The figure presents few significant patterns. The clear differences for gender are retained for both the effort and frustration scales, although the difference on the performance scale appears due to a large divergence between gender scores at 1600h. For the mean values (dotted lines), there are no consistent effects for time of day. While some scales exhibit small increases or decreases through the day, only the performance scale shows a noticeable pattern, where performance is rated a slightly higher source of workload as the day progresses.
Figure 5. Mean weightings (dotted lines) on Each NASA TLX Scale vs. Time of Day (Males: N = 12, Females: N = 7).
4. DISCUSSION
Given the number of workload scales, and the level and number of the independent variables noted, from a parsimonious viewpoint there are thirty-nine opportunities to derive a significant effect in the present experiment. By chance alone, therefore, two of these should provide significant effects. In the current results, there are five significant differences. However, it is unlikely that any of these are due to mere chance for a number of reasons. First, they all occur for one independent variable. Second, the pattern of results reinforces the potentially more important observation, which is the difference in the drop-out rate for the different genders. The observed difference in the rating of performance is also not unexpected given the attitude toward the task of the female subjects that may be inferred from the values on the frustration and effort scales. An alternative explanation may come from consideration of the sex of the experimenter. It has been observed (Rumenik, Capasso, & Hendrick, 1977) that males tend to increase in performance efficiency given a same-sex experimenter. In contrast, female subjects do not perform as well in the presence of a male experimenter. In the present work both experimenters were male. While this effect might account for the difference in the performance scale, it is unlikely that these sex of subject by sex of experimenter interactions account for the observed differences on the effort and frustration scales, which may be more appropriately attributed to the attitude toward the performance task. The absence of any effects for time of day is surprising. In numerous performance experiments, it has been noted that stage of the circadian cycle, which under normal conditions co-varies with time of day, influences performance efficiency (see Colquhoun, 1971; Webb, 1982). If subjective workload assessment were responsive to time of day, it would be hypothesized that workload level would be systematically reduced across the times investigated. This argument assumes that performance increases in efficiency with progressive time of day (between 0800 and 2000h). However, the complexity of the present task is low.
Complexity is defined as the number of free variables to be controlled in order to achieve successful performance. Also difficulty, which may be defined as the point along the performance continuum of each free variable to be controlled, is particularly low for this task. Therefore, if subjective workload followed either performance output or circadian phase, a monotonic decrease in workload value should be expected across time of day. As subjective workload remained constant in the present case, it is more likely to reflect some constant in the performance environment. As noted, this is the task itself, and its complexity and difficulty. Although it may be tempting to conclude that time of day has little effect upon the subjective experience of workload, the present findings do not warrant such a general conclusion. Indeed, there are a number of caveats that should be recognized prior to the application of these findings. A number of these reservations are outlined below. First, the times of day evaluated in the present work represent only a restricted range of the full circadian cycle. The peak of the circadian rhythm occurs at roughly 2000h, but the trough, or lowest point, occurs at approximately 0400h, some four hours earlier than the earliest testing time in this work. This limits the power of the circadian effect which might emerge given comparisons of capability at the peak against performance at the trough. However, a recent study at NASA Ames Research Center has indicated that even at the lowest point of the circadian cycle subjective workload does not vary from a constant value. During the testing period of the present study (0800-2000h), the circadian rhythm is in a state of constant increase. If subjective workload were responsive to rate of change in circadian function, then the present range of times would not be sensitive to such an effect.
It should be noted, however, that performance itself follows the absolute level of the rhythm, making this latter sensitivity somewhat unlikely. Perhaps of greater concern is the lack of change in complexity and difficulty of the performance task. Should time of day effects interact with the character, complexity, and
difficulty of the performance task, and some evidence suggests that they do (e.g., Folkard & Monk, 1980), then the present study would not exhibit such effects. An additional possibility is that the present performance task represents essentially no load and therefore, the tool used to assess a no load condition is rightly insensitive to such a circumstance. However, it should be noted that the subjective response was not universally low on each raw scale. Even where time of day effects occur, they are typically consistent, but relatively small in terms of absolute magnitude. Thus such effects may be masked by unwanted influences intrinsic to the experimental procedure. From a practical perspective one might argue that such fragile effects become of limited importance. However, it is possible that under a concatenation of circumstances (e.g., long-distance, transmeridian flight) where the factors of circadian desynchronosis and fatigue enter into the operational arena, the interactive effects of circadian phase and task demand become the most important influences on the workload experienced by the operator. Elaboration of these effects awaits fuller experimental inquiry. Finally, there are a number of procedural questions which should be considered when evaluating the veracity of the present findings. First, as with all within-subject designs, there is an asymmetric transfer effect embedded in the repetitive exposure of each subject (see Poulton, 1982; Damos & Lyall, 1986). While this effect has a different impact, depending upon the performance circumstances, such a contaminant is embedded in the current design. Due to the simple nature of the task involved, and the subjects’ intimate familiarity with the source of stress (i.e., time of day), such an effect is considered negligible in the present work. There was also a restriction on time of testing: essentially only one subject could be tested per testing week.
As a result it was infeasible to examine all potential order effects and thus this factor is not completely counterbalanced in the present work. Analysis with this factor embedded revealed no significant effect for this influence; therefore it is not considered a problem in this circumstance. Finally, according to some authors (e.g., Simon, 1987), the present investigation contains insufficient independent variables. According to such tenets all variables thought to influence mental workload should be included in the design. As only a restricted number of observations are needed to sample the response matrix for important effects, such a design should not, according to Simon (1987), present insurmountable obstacles. This has yet to be accomplished for mental workload evaluation. In conclusion, the results from the present work indicate the intolerance of female subjects for the repetitive and boring task examined. Should this represent a gender difference in tolerance to work underload, the present study may represent an important finding. The lack of change in subjective workload with change in time of day is an unexpected finding, but may represent the appropriate lack of sensitivity of the workload scale to such a low load situation, and also suggests that task complexity and difficulty are important influences on what operators perceive as mental workload.
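The chance-expectation argument that opens the Discussion (thirty-nine opportunities, about two significant effects expected by chance, five observed) can be laid out as a short calculation. The sketch rests on two assumptions that the text does not state outright: that the thirty-nine opportunities are the thirteen dependent variables crossed with three testable effects (sex, time of day, and their interaction), and that the nominal significance level is .05, as suggested by the reported p-values.

```python
ALPHA = 0.05      # assumed nominal significance level
N_DEPENDENT = 13  # 6 raw scales + 6 weighted scales + overall weighted mean
EFFECTS = ("sex", "time of day", "sex x time of day")  # assumed breakdown

# Significant differences reported in the Results, by effect.
observed = {"sex": 5, "time of day": 0, "sex x time of day": 0}

n_tests = N_DEPENDENT * len(EFFECTS)
expected_total = n_tests * ALPHA
expected_per_effect = N_DEPENDENT * ALPHA

print(f"opportunities: {n_tests}, expected by chance: {expected_total:.2f}")
for effect in EFFECTS:
    print(f"{effect:>18}: observed {observed[effect]}, "
          f"expected by chance {expected_per_effect:.2f}")
```

Laid out this way, the point made in the Discussion is easier to see: chance would be expected to scatter roughly two significant results across all three effects, whereas all five observed differences fall on the sex factor.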
5. REFERENCES
Bindra, D., & Waksberg, H. (1956). Methods and terminology in studies of time estimation. Psychological Bulletin, 53, 155-159.
Chignell, M.H., & Hancock, P.A. (1985). Knowledge-based load leveling and task allocation in human-machine systems. Proceedings of the Annual Conference on Manual Control, 21, 9.1-9.11.
Colquhoun, W.P. (1971). Circadian variations in mental efficiency. In: W.P. Colquhoun (Ed.). Biological rhythms and human performance. (pp. 39-107). Academic Press: London.
Damos, D.L., & Lyall, E.A. (1986). The effect of varying stimulus and response modes and asymmetric transfer on the dual-task performance of discrete tasks. Ergonomics, 29, 519-533.
Dresslar, F.B. (1892). Some influences which affect the rapidity of voluntary movements. American Journal of Psychology, 4, 514-527.
Folkard, S., & Monk, T.H. (1980). Circadian rhythms in human memory. British Journal of Psychology, 71, 295-307.
Gopher, D., & Donchin, E. (1986). Workload: An examination of the concept. In: K. Boff, L. Kaufman, and J.P. Thomas (Eds.). Handbook of perception and human performance. (pp. 41:1-49). New York: Wiley.
Hancock, P.A. (1983). The effect of an induced selective increase in head temperature upon performance of a simple mental task. Human Factors, 25, 441-448.
Hancock, P.A. (1987). The internal clock. Manuscript in preparation.
Hancock, P.A., & Brainard, D.M. (1981). Tympanic temperature: A non-invasive physiological measure of workload. Technical Report for Environmental Devices Corp., MA.
Hancock, P.A., & Chignell, M.H. (1987). Adaptive control in human-machine systems. In: P.A. Hancock (Ed.). Human factors psychology. (pp. 305-345). North-Holland: Amsterdam.
Hancock, P.A., Meshkati, N., & Robertson, M.M. (1985). Physiological reflections of mental workload. Aviation, Space and Environmental Medicine, 56, 1110-1114.
Hart, S.G., & Staveland, L.E. (1987). Development of NASA TLX (Task Load Index): Results of empirical and theoretical research. In: P.A. Hancock and N. Meshkati (Eds.). Human mental workload. North-Holland: Amsterdam.
Kleitman, N. (1939/1963). Sleep and wakefulness. University of Chicago Press: Chicago.
Moore-Ede, M.C., Sulzman, F.M., & Fuller, C.A. (1982). The clocks that time us. Harvard University Press: Cambridge, MA.
O'Donnell, R.D., & Eggemeier, F.T. (1986). Workload assessment methodology. In: K. Boff, L. Kaufman, and J.P. Thomas (Eds.). Handbook of perception and human performance. (pp. 42:1-49). New York: Wiley.
Poulton, E.C. (1982). Influential companions: Effects of one strategy on another in the within-subjects designs of cognitive psychology. Psychological Bulletin, 92, 673-690.
Rumenik, D.K., Capasso, D.R., & Hendrick, C. (1977). Experimenter sex effects in behavioral research. Psychological Bulletin, 84, 852-877.
Simon, C.W. (1987). Will egg-sucking ever become a science? Human Factors Society Bulletin, 30, 1-4.
Vidulich, M.A. (1987). The cognitive psychology of subjective mental workload. In: P.A. Hancock and N. Meshkati (Eds.). Human mental workload. North-Holland: Amsterdam.
Vidulich, M.A., & Wickens, C.D. (1984). Subjective workload assessment and voluntary control of effort in a tracking task. Proceedings of the Annual Conference on Manual Control, 20, 57-71.
Webb, W.B. (1982). Biological rhythms, sleep, and performance. Wiley: New York.
Wickens, C.D., & Derrick, W. (1981). The processing demands of higher order manual control: Applications of additive factor methodology. (Engineering-Psychology Laboratory Technical Report EPL-81-1/ONR-81-1). University of Illinois at Urbana-Champaign.
6. ACKNOWLEDGEMENTS
The research reported here was supported by Grant NCC 2-379 from NASA, the National Aeronautics and Space Administration, through Ames Research Center, Moffett Field, CA 94035. Dr. Michael Vidulich was the technical monitor. The views expressed are those of the author and should not necessarily be construed as those of the sponsoring agency. I would like to thank Drs. M.H. Chignell and A. Marchzak for help with the design and analysis of the present experiment. Raphael O'Donghue, P.J. Smith, and J. Negrete helped with data collection and coding in this study.
HUMAN MENTAL WORKLOAD, P.A. Hancock and N. Meshkati (Editors). © Elsevier Science Publishers B.V. (North-Holland), 1988
AN ECLECTIC AND CRITICAL REVIEW OF FOUR PRIMARY MENTAL WORKLOAD ASSESSMENT METHODS: A GUIDE FOR DEVELOPING A COMPREHENSIVE MODEL
N. Meshkati, Human Factors Department, Institute of Safety and Systems Management, University of Southern California, Los Angeles, CA 90089
and
A. Loewenthal, Lockheed Aeronautical System Co., Burbank, CA 91520
ABSTRACT
Four primary methods of mental workload assessment (Secondary Task, Subjective Rating, Performance Measure, and Physiological) are reviewed, and the latest developments in each are evaluated. Furthermore, based upon a thorough, critical analysis, it is found that all of the methods are very sensitive to the effects of the individual differences factor. Therefore, it is recommended that, in order to develop a comprehensive conceptual paradigm for mental workload measurement, the factor of individual differences in information processing should not only be incorporated in the model, but also be regarded as one of the promising areas for further research.
1. INTRODUCTION
The objective of this study is twofold. First, it reviews the critical issues of four primary mental workload (MWL) assessment methods and maps their potential areas of improvement for future researchers. Second, the factor of "individual differences" and its impact on the results of the above methods are thoroughly investigated and documented. The latter is also used as a basis for and prologue to the development of a more comprehensive and enhanced model of MWL measurement. As Wierwille and Williges (1978) have pointed out, "literature on workload is so diverse that categorization on the part of the reader of this literature is almost intuitive." Furthermore, in order to avoid replication and maintain continuity for the reader, an attempt has been made to link this study to the exhaustive reviews of MWL measurement methods conducted by Ogden, Levine, and Eisner (1978), Williges and Wierwille (1979), Wierwille (1979), Moray (1979; 1982), and Meshkati, Hancock, and Robertson (1984).
Thus, these works should be considered as the foundations of this analysis. However, the scope of this work goes beyond the sheer presentation of a complementary part to these studies, and it includes a review of new investigations performed since their dates of publication.
2. REVIEW
The four MWL measurement methods which have been extensively investigated and evaluated in the above studies are:
1) Secondary Task methods
2) Subjective Rating methods
3) Performance Measure methods
4) Physiological methods
2.1. Remarks on Secondary Task Methods
A secondary task is a task which the operator is asked to do in addition to his/her primary task. If he/she is able to perform well on the secondary task, this is taken to indicate that the primary task is relatively easy; if he/she is unable to perform the secondary task and at the same time maintain the primary task performance, this is taken to indicate that the primary task is more demanding (Knowles, 1963). The difference between the performances obtained under the two conditions is then taken as a measure, or index, of the workload imposed by the primary task.
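The index described above can be sketched in a few lines. This is only an illustration under stated assumptions: it supposes that secondary-task performance is scored once alone and once together with the primary task, and the function name, the sample scores, and the normalization by the single-task score are hypothetical choices rather than anything prescribed by Knowles (1963).

```python
def secondary_task_decrement(single_task_score, dual_task_score):
    """Fractional drop in secondary-task score when the primary task is added.

    Assumption: higher scores mean better secondary-task performance, and
    the score obtained alone serves as the baseline.
    """
    if single_task_score <= 0:
        raise ValueError("single_task_score must be positive")
    return (single_task_score - dual_task_score) / single_task_score


# Hypothetical scores (e.g., items completed per minute on the secondary task).
easy_primary = secondary_task_decrement(50.0, 45.0)  # small drop
hard_primary = secondary_task_decrement(50.0, 30.0)  # large drop
```

Under this reading, a larger decrement is interpreted as a more demanding primary task, subject to the limitations of the method discussed in the remainder of this section.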
In one study, Bloem and Damos (1985) utilized a secondary task technique to examine human performance in complex task combinations. They concluded that, for the tasks used in their experiment, single and dual task performance on the easy primary task provided a better prediction of hard primary task performance than the secondary task performance. In another MWL study, Damos and Bloem (1985) examined the effects of type A and B behavioral patterns on multiple task performance. They reported that under the dual task conditions type A had significantly different performance than type B.
The secondary task as a mental workload assessment technique has several limitations, the most serious being the intrusion aspect of the secondary task. When the secondary task is introduced, performance on the primary task is known to be modified, and usually degraded (Williges and Wierwille, 1979). This problem has been addressed by other investigators such as Welford (1978), who regarded the extra load imposed by the secondary task as a factor that might produce a change of strategy in dealing with the primary task, and thus distort any assessment of the load imposed by the primary task alone. Brown (1978) argued that since the dual task method is essentially a resource-limiting device (i.e., the human processing resources are limited), interference would occur within the processing mechanisms, rather than at the sensory input or motor output. He claimed that there is empirical support for the idea that interference is maximal at the level of response selection. He also believed that the dual task
interference is greater when the tasks share the same response modality rather than different modalities.
The nature of the primary task, its informational load and structural characteristics can seriously affect the efficiency and reduce the utility of the secondary task. Workload may be largely a function of the structural characteristics of a task rather than of the informational load imposed by its component parts (Brown, 1978). Therefore, the more interesting tasks may be relatively inaccessible to study by the dual task methods because their complex structure does not effectively permit a reliable sharing of attention between inputs from both tasks in the dual situation (Ibid.). Based on this observation and unexpected interactions between certain tasks, Ogden et al. (1978) point out that the choice of a secondary task can prove to be a real problem.
Knowles (1963) attempted to provide a set of criteria against which to judge the desirability of a secondary task. These criteria included "noninterference to primary task," "ease of learning," "self-pacing" (so that the secondary task can be neglected in the service of maintaining primary task performance), and "compatibility with the primary task." Ogden et al. (1978) added "sensitivity" and "representation" to the above list. The former criterion implies that the secondary task should be sensitive to manipulations of the primary task conditions (the level of effort required by the primary task), while the latter ensures that the findings based on the particular secondary task would hold up when other secondary tasks were substituted (Ibid.).
According to Kalsbeek (1971), the dual task method can be used in only two ways: first, in the traditional way, which consists of measuring the so-called spare mental capacity; second, in experiments where the main task, to which preference has to be given, is a simple, repetitive one.
For instance, a binary choice task can be regarded as a stress condition in the performance of a secondary task. Brown's (1978) position on this issue was relatively different from Kalsbeek's. He proposed that the dual task method should be used only for the study of individual differences in processing resources available to handle workload. A viable secondary task in this setting, according to Brown, should present discrete stimuli of constant load on a forced, paced schedule and also should compete for processing resources only.
The question of the effects of individual differences and personality factors on secondary task performance has been raised by some investigators such as Gibson and Curran (1974) and Huddleston (1974). They argued that since an additional task will be arousing, it may benefit certain personality traits and be detrimental to the performance of others. Motivational factors and their roles in secondary task performance pose another problem in this area and have been only partially addressed by Kalsbeek and Sykes (1967).
In order to avoid some of the aforementioned problems, Kalsbeek and Ettema (1964) favored the utilization of sinus arrhythmia (i.e., a physiological method) over the dual task method. They argued that sinus arrhythmia is a better predictor of an individual's reserve capacity than the dual task method. They believed that the dual task measure depends on an individual's motivations and training, whereas sinus arrhythmia appeared to give a measure of total mental load (rather than just the load resulting from the task), the very factor which, if unchecked, will lead to the utilization of reserve capacity. Hyndman and Gregory (1975) expressed the same concern and noted that the use of a secondary task in real life situations is obviously precluded by the risk and uncertainty of overloading the individual.
The development of Multiple Resource Theory (Wickens, 1980) challenges some of the theoretical premises of the conventional secondary task method, because that method relies on the limited channel capacity theory and the concept of an undifferentiated pool of attentional resources (cf., Kahneman, 1973). This fact might lead to major revisions in the structure and applications of the secondary task method in future MWL studies.
2.2. Remarks on Subjective Rating Methods
Subjective measures include direct or indirect queries of the individual for his opinion of the workload involved in a task. The easiest way to estimate the mental workload of a person who performs a certain task is to ask him/her what he/she subjectively feels about the mental load level of the task. Since Moray's (1982) study of subjective mental workload there have been at least three major (reported) developments and several additional new studies in this field.
The first one pertains to the Subjective Workload Assessment Technique (SWAT) (Reid, Shingledecker, and Eggemeier, 1981), which has three dimensions: time load, mental effort load and stress load. Since its conception, SWAT has been undergoing systematic development and validation in order to ensure its general applicability and sensitivity as a workload index (cf., Reid, Eggemeier, and Nygren, 1982; Eggemeier, Crabtree, and La Pointe, 1983).
The Modified Cooper-Harper Scale (Wierwille and Casali, 1983; Rahimi, 1982; and Casali and Wierwille, 1983), which is a modified version of Cooper and Harper's (1969) aircraft handling qualities rating scale, is considered the second major development. This scale is applicable to a wider variety of task workloads, especially for systems which load perceptual, mediational and communication activities, and, like the preceding scale, is still in the developmental stage.
The multidimensional "bi-polar" rating scale (Hauser, Childress, and Hart, 1982a) is considered the third major development. It consists of several bi-polar adjective rating scales such as: Overall Workload, Task Difficulty, Time Pressure, Performance, Mental/Sensory Effort, Frustration Level, Stress Level, and Fatigue. It suggests that there is a wide range of interpretations put on the meaning of the term "workload" which stems from individual differences (Moray, 1984).
In a study on twelve pilots the authors (i.e., Hauser, Childress, and Hart, 1982b) utilized this scale and concluded that the subjective rating of factors contributing to workload was generally consistent with the task demands. In a related study, Miller and Hart (1984) investigated the influence of specific navigation related tasks, utilizing the bi-polar scales. They reported that the subjective responses and objective measures of performance reflected a strong association between subjective experience and objective behavior.
Damos and Bloem (1985), in the work reported above, also employed the bi-polar scales to elicit subjective estimates of MWL from subjects with Type A and Type B behavior patterns. They found that Type A subjects reported less frustration and more fatigue under single-task than under dual-task conditions, whereas Type B subjects reported the opposite pattern. The works of Meshkati (1983), Robertson (1984) and Robertson and Meshkati (1985) are other MWL experiments which utilized the subjective ratings method to study the effects
of individual differences. Meshkati (1983) classified 64 participants of his experiment according to their dominant decision styles (i.e., dominant Information Processing Behavior). It was found that there were significant differences in evaluating task difficulty which were reflected in the behavior of the different decision styles' subjective ratings. In order to incorporate individual differences in the MWL assessment model, Robertson (1984) utilized a cognitive complexity or abstractness-concreteness scale. The findings of this study attested to the existence of two significantly different subjective rating patterns associated with subjects' cognitive complexity (i.e., abstract or concrete). The work of Robertson and Meshkati (1985), which employed both of the individual differences classification models (i.e., decision style and cognitive complexity), produced results consistent with the two independent aforementioned studies. Similar dichotomous subjective rating patterns were observed for the two different groups of subjects. The first group consisted of abstract and high conceptually complex decision styles and the second group comprised concrete and low conceptually complex decision styles.
Since subjective rating of the difficulty of a task is primarily a function of the rater's perception, the concept of "perceived difficulty" has to be given importance and analyzed directly. McCormick and Sanders (1982) even argued that the perceived difficulty of work may be more important than the workload level and its accompanying strain. Audley, Rouse, Sanders, and Sheridan (1979) proposed that the perceived difficulty of meeting the task demands should be the primary consideration and that attempts should be made to dissociate this from other facets of the subjective aspects of workload. The perceived difficulty of a task may alter the human operator's attitude towards it.
This, in turn, can affect the time operators would be prepared to spend in the performance of the task and the level of confidence in their decisions (Moray, 1982). The perceived difficulty of the individual is influenced by at least three groups of factors. The first group deals with the context of long-term memory storage including both general experience and memories of similar tasks. The second group is the background factors such as personality traits, habits and general attitudes including likes and dislikes, and aspiration and expectation levels. The third group of factors represents momentary conditions, e.g., one's emotional state, general fatigue, motivation, and the importance of the task, as well as the actual and anticipated success or failure (Borg, Bratfisch, and Dornic, 1971). The perception of difficulty in a practical work setting, especially in a decision making process, is influenced by such factors as the number of alternative actions, insufficient or contradictory data, uncertainty about the consequences of actions, conflicting demands on the outcome of the work, need for feedback, and scarcity of time (Borg, 1978).
The subjective rating of a task's difficulty can also be affected by the situation and job as a whole rather than only by task-induced or individual rater factors. Borg (1978) proposed that it is necessary to point out the set of factors which seem to cause the experience of difficulty in one job, which may be different from those in another job. Another problem associated with subjective rating is the question of the existence of an inverted U-shape relationship between performance and subjective experience of workload as observed by Tulga (1978) and reported by Moray (1982). The structure of the subjective rating scale and its anchoring is a critical factor which affects the sensitivity and validity of this method.
Unless the subjective measures are properly structured, they may serve only as gross indicators of stress level and have little "diagnostic value," i.e., they may not indicate the source or type of workload involved
(Gaume and White, 1975; Robertson and Meshkati, 1985). This observation is echoed by Audley et al. (1979) as well as by Hopkin, Parks, Rohmert, Rault, Soede, and Schmidtke (1979).
Individual differences are considered one of the most influential factors affecting subjective ratings. Moray (1982) realized that "individual differences might be quite considerable." However, he recognized that this issue has never been followed up, despite many hints. Borg et al. (1971) reported that participants who score highest on intelligence tests rate any given problem higher in difficulty than those with lower scores, although there is no difference in time taken to solve. Later, Borg (1978) referred to the perceived difficulty of a task as a factor of great importance in the evaluation of work difficulty and the difference between individuals. Phillip, Reiche, and Kirchner (1971) proposed that in considering subjective ratings it should be noted that the feeling of the rater is not only influenced by the stress situation but also by his "individual capacity for the actual control task."
The foregoing points regarding subjective rating methods led Moray (1982), with reference to many investigators (i.e., Johannsen, Moray, Pew, Rasmussen, Sanders, and Wickens, 1979), to assert: "all subjective load is secondary to such physiological events as heart rate changes, muscle tension, etc."
2.3. Remarks on Performance Measure Methods
Primary task performance may be the most obvious method of workload assessment. If one wants to know how driving is affected by different loading characteristics, e.g., traffic, fatigue, or lane width, one should be able to utilize the driving performance as a criterion (Hicks and Wierwille, 1979).
The lack of sensitivity of performance measures to changes in mental workload levels is one of the major deficiencies of these methods. This issue is raised by many investigators, such as Gaume and White (1975) and Gartner and Murphy (1976). Gaume and White argued that the level of mental workload may increase while the performance is unchanged, so that the performance may not be a valid measure of workload. Gartner and Murphy proposed that an operator may show equal performance for two different configurations, but in reality, the effort level in one system may greatly exceed that in the other. Moreover, Damos (1984), based on her experimental investigation, concluded that there was not even a consistent relation between performance ratings and subjective estimates of overall workload, task difficulty, and mental/sensory effort.
The generalized application of this method to different task situations poses another set of problems, since for each experimental situation, a unique measure must be developed (Hicks and Wierwille, 1979). Williges and Wierwille (1979) also referred to this point and argued that the measures of performance of the primary task are "task-specific." Each time a new situation is examined, new measures must be developed and tested, a problem not shared by several other methods of mental workload assessment. Rouse (1979) classified the performance measures into short-term and long-term measures, where short-term performance does not necessarily reflect workload.
This conclusion is supported by the work of Enstrom and Rouse (1977), who demonstrated that in a particular control and monitoring task, short-term performance on the central task (i.e., RMS error) did not correlate with attention allocation. This very fact was
reiterated by Williges and Wierwille (1979), who also cited a number of studies which supported the same concept. According to the authors, those studies appeared to have been performed at workload levels where the operator had sufficient reserve capacity to adapt to the increased load. Rouse (1979) evaluated long-term performance measures as the indicator of relative workload which would seem to provide, at best, an ordinal scale of workload. He also argued that, unless one is willing to assume that humans always operate at capacity and that all humans have the same capacity, inter-human comparisons are not valid.
Rolfe (1976) proposed that direct performance measures are not always as informative as may be required, due to the changing nature of the operator's tasks in current systems, and the paucity of the assessment of task demand factors. This latter fact was also referred to by Knowles (1963) as a pitfall of performance measurement methods which, in his opinion, seldom reflect the operator's load and usually exhibit only how well some functional system criterion is met. Gaume and White (1975) expressed similar concerns in their study. Williges and Wierwille (1979) concluded that only high workload situations (near operator overload) are discernible by primary task performance measures, while low workload conditions may not be, since at these low levels the operator ordinarily adapts in an effort to maintain output variables at an acceptable level.
The failure of primary task measures and task analysis methods to detect unobservable actions resulting from performing a cognitive task induced Welford (1978) to propose the use of psychophysiological methods. He suggested the adoption of the measures of autonomic activity and EEG activation during task performance as possible alternatives.
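The RMS error cited earlier in this section as a short-term performance measure can be illustrated as follows. This is only a sketch: the function name and the sample tracking data are hypothetical, and the studies reviewed here do not specify how their error scores were computed.

```python
import math


def rms_error(target, actual):
    """Root-mean-square deviation between a target signal and the operator's output."""
    if len(target) != len(actual) or len(target) == 0:
        raise ValueError("target and actual must be non-empty and of equal length")
    squared = [(t - a) ** 2 for t, a in zip(target, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical tracking samples: the operator oscillates one unit around the target.
target_path = [0.0, 0.0, 0.0, 0.0]
operator_path = [1.0, -1.0, 1.0, -1.0]
tracking_score = rms_error(target_path, operator_path)
```

Note that such a score is task-specific in exactly the sense criticized above: an unchanged RMS error does not by itself show that the operator's effort was unchanged.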
2.4. Remarks on Physiological Methods
Individuals who are subjected to some degree of mental workload commonly exhibit changes in a variety of physiological functions. As a result, several researchers have advocated the measurement of these changes to provide an estimate of the level of workload experienced. Since Wierwille's (1979) and Hancock, Meshkati, and Robertson's (1985) reviews of physiological measures of mental workload, there have been several reported studies which utilized one or a combination of physiological methods to assess mental workload. A representative sample of these studies is briefly reviewed here.
Wickens, Heffley, Kramer, and Donchin (1980) used the Event-Related Brain Potential (ERP) as the dependent variable. They showed that the P300 component of the ERP is able to reflect differences between two levels of workload, as well as the task relevance of the stimuli. Isreal, Wickens, Chesney, and Donchin (1980) confirmed the validation of the ERP measures as an indication of systematic difference in task workload. Kramer, Wickens, and Donchin (1983) also found a significant decrease in the amplitude of P300 as a result of an increase in task difficulty.
Sharit and Salvendy (1982) used a parameter of sinus arrhythmia (SA) to assess differences in mental workload between machine-paced and self-paced work. They reported that the SA failed to detect differences in informational load implicit in the tasks due to the attentional characteristics associated with the tasks.
Wierwille and Connor (1983) examined five different physiological measures (mean pulse rate, pulse rate variability, respiration rate, pupil diameter, and voice pattern) elicited by digit shadowing and mental arithmetic tasks. According to their results, only the mean pulse rate demonstrated some limited "sensitivity" to some of the differences in the psychomotor load conditions. In a related study, Casali and Wierwille (1983) monitored respiration rate, heart rate mean, heart rate standard deviation, pupil diameter, and eye blinks and concluded that the sole physiological measure to display sensitivity to changes in communications load is the pupil diameter measure.
There are studies which utilized relatively unconventional and novel physiological approaches to assess human mental workload. Hyypa, Aunola, Lahtela, Lahti, and Marniemi (1983) investigated psychoneuroendocrine responses to mental workload. They were able to find a significant decline of the cortisol and prolactin levels of subjects undergoing psychologically demanding achievement-oriented tasks. Loewenthal (1983) proposed that alveolar gas concentration level could be a "cleaner" physiological measure than the others (e.g., respiratory arrhythmia). In his extensive study, Loewenthal cited several studies that tried to demonstrate a relationship between alveolar gas pressures and mental workload. Hancock (1983) and Hancock, Meshkati, and Robertson (1985) considered tympanic temperature or, more correctly, deep Auditory Canal Temperature (ACT) as an alternative measure which circumvents certain problems associated with other physiological measures. It has been observed that subjects beginning work on a simple mental task, after a period of quiescence, exhibit small but constant increases in ACT (Hancock, 1983; Hancock et al., 1984).
Also, subjects encountering a number of different computational problems embedded in a series of simple mathematical additions show an increase in ACT (Hancock and Brainard, 1981).
The findings of the following works demonstrated once more that, despite criticism, the heart rate variability technique is a promising tool in MWL studies. Meshkati (1983) utilized sinus arrhythmia and observed a significantly higher score under the rest as compared with the loading conditions. Robertson, Hendrick, and Hancock (1984), in their investigation of the role of cognitive complexity of the individuals in responses to a computer-generated mental workload task, found that sinus arrhythmia scores for one group of subjects (i.e., abstract) were significantly higher than those of their (concrete) counterparts. A similar pattern was reported in another study by Robertson and Meshkati (1985). These observations are in agreement with O'Donnell's (1979) evaluation of and recommendation on the application of psychophysiological techniques for certain purposes.
In dealing with physiological methods of assessing mental workload, it should be recognized that many aspects of operator behavior other than mental workload may have an effect on the physiological measures. As Kalsbeek (1971) noted, physiological changes reflect not only the mental workload but the combined influence of stress from the environment, from physical effort, and from the emotional state as well as the mental workload. The author stated that it would be of great interest if one could find physiological indicators related to one moment of conscious control or of several successive ones with variable intervals, and it would be even better if we could understand the neurophysiological implications of a so-called moment of conscious brain control.
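Several of the cardiac measures mentioned in this section (heart rate mean, heart rate standard deviation, and heart rate variability) can be derived from a series of intervals between successive heartbeats. The following sketch assumes such intervals are available in milliseconds; the function names and the sample values are hypothetical, and the successive-difference statistic shown is one common variability index rather than the specific sinus arrhythmia score used in the studies cited.

```python
import math
import statistics


def mean_heart_rate(rr_ms):
    """Mean heart rate in beats per minute from beat-to-beat intervals in ms."""
    return 60000.0 / statistics.fmean(rr_ms)


def rr_standard_deviation(rr_ms):
    """Sample standard deviation of the beat-to-beat intervals (ms)."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive interval differences (ms)."""
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(statistics.fmean([d * d for d in diffs]))


# Hypothetical intervals: more beat-to-beat variation at rest than under load,
# in line with the direction reported above for sinus arrhythmia scores.
rest_rr = [700, 900, 700, 900]
load_rr = [800, 800, 800, 800]
```

In this constructed example both conditions have the same mean heart rate, yet the variability indices differ, which is why mean rate and variability are treated as separate measures.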
Hopkin, Parks, Rohmert, Rault, Soede, and Schmidtke (1979) considered the inability of physiological measures to discriminate between mental workload in information processing and mental workload from other sources, such as emotional factors, as a major limitation on the usefulness of these techniques. This fact is also acknowledged by Hamilton, Mulder, Strasser, and Ursin (1979), who suggested the utilization of special analysis techniques in order to determine the contaminating effects of some "task specific activity." Rolfe (1973) concluded that the true meaning of physiological changes in exposure to mental workload can only be assessed in conjunction with a comprehensive knowledge of task situations. In the same vein, Mulder (1979) hypothesized that different patterns of physiological activities are associated with different types of cognitive functions.
3. EPILOGUE TO THE DISCUSSION OF MENTAL WORKLOAD ASSESSMENT METHODS
As briefly explained earlier in this work, a common and important aspect of all mental workload assessment methods is their relative sensitivity to individual differences. In his study, Moray (1984, p. 41) has reiterated this fact and asserted: "Individual differences in workload research is far more important than has hitherto been acknowledged. Without taking this into account we are seriously delaying the development of a useful measure." A large number of researchers who did not obtain significant results in applying mental workload measurement techniques suggested that either the sample population must be homogenized or, alternatively, personality traits, individual differences, and other related factors should be incorporated in the model. Following is a summary of such expert recommendations.
Kitchin and Graham (1961) referred to the character of the human operator as "a very important area for concentration" and without which the operator's physical and mental abilities are of little value to industry.
Mulder and Mulder (1973) acknowledged the large differences among subjects and therefore recommended single subject analysis. Leplat (1978) stated that the characteristics of personality could intervene in a far-from-negligible manner in regard to workload. Hamilton, Mulder, Strasser, and Ursin (1979) tried to analyze the activation responses as a function of the task characteristics. Furthermore, they acknowledged that the subject's active information processing involves personality traits. Hopkin (1979) considered personality variables as potentially relevant to mental workload. According to Firth (1973), in the real life working environment, individual differences in operators' characteristics very much influence the information processing of the individuals. These differences arise from a combination of past experience, skill, emotional state, motivation, and the estimation of risk and cost inherent in a task. The influence of these individual differences is important, since many of these factors have been shown to directly influence cardiac responses.
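One simple way to act on the recommendation of single subject analysis is to express each subject's scores relative to that subject's own mean and spread before any comparison across conditions. The sketch below is an illustrative assumption, not a procedure taken from the studies cited; the function name, the subject labels, and the ratings are hypothetical.

```python
import statistics


def standardize_within_subject(ratings_by_subject):
    """Convert each subject's ratings to z-scores using that subject's own
    mean and (population) standard deviation.

    A subject whose ratings never vary receives zeros instead of a division error.
    """
    standardized = {}
    for subject, values in ratings_by_subject.items():
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values)
        standardized[subject] = [
            0.0 if spread == 0 else (value - mean) / spread for value in values
        ]
    return standardized


# Hypothetical workload ratings for three task conditions (easy, medium, hard).
# Subject "s2" uses a higher part of the scale than "s1" but orders the tasks alike.
ratings = {"s1": [2, 4, 6], "s2": [6, 8, 10], "s3": [5, 5, 5]}
z_scores = standardize_within_subject(ratings)
```

After standardization, "s1" and "s2" show the same profile across the three conditions, so a difference in how individuals use the rating scale is no longer confounded with a difference between tasks. Whether such a transformation is appropriate depends on the measure and the research question.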
There are some other indications of a relationship between personality traits and physiological reaction parameters, e.g., Rotter's (1966) internal-external locus of control and heart rate control. Ray and Lamb (1974) and Gatchel (1975) found that internal locus of control subjects were better able to increase their heart rate as compared with their external counterparts. Duffy (1962) reported that individual differences in responsiveness have been observed in many forms: in the frequency and amplitude of rhythms in the EEG, in the occurrence of "spontaneous" changes in skin resistance, peripheral blood flow, heart rate, muscle tension, and other functions. The author referred to the work of Armstrong (1938), who detected a correlation between cardiovascular reactivity and emotional stability in 700 Air Corps candidates. Offerhaus (1980), based upon his study of hospital staff (normal subjects) and psychiatric patients, concluded that by employing the concept of heart rate variability, it is possible to differentiate between two pairs of groups of subjects: first, the high anxiety group from the low anxiety one (i.e., psychotic patients from non-patients), and second, the stress reactor group from the non-stress reactor group (i.e., acute patients and neurotic staff from chronic patients and stable staff).
The issue of individual differences and psychological variables and their substantial effects on autonomic responses has been addressed by Cleary (1974); Van Egeren, Headrick, and Hein (1972); and Sutton and Tueting (1975). This concept was experimentally evaluated and confirmed by Bryson and Driver (1972). They found that "cognitively complex" subjects manifest higher GSRs in attending to stimuli. Lykken (1968), in the same regard, referred to two additional areas of consideration of individual differences: the tonic psychophysiological level and the phasic response to specific stimuli.
The effects of personality traits and individual differences on the performance of a mental task bear a great amount of significance. Hopkin (1979) stated that on many occasions, individual differences have precluded general judgments on whether the task-induced workload is excessive as distinct from high. He considered this typical inability to generalize the findings as mostly due to the fact that any given individual characteristic becomes a pertinent factor of workload only insofar as the task being performed brings that characteristic into play. Schroder, Driver, and Streufert (1967) also reiterated this fact by arguing that if the task requires the processing of large amounts of discrepant information, and if this information must be integrated into a flexible, comprehensive system, then it can be expected that the "integratively complex" persons would perform better than integratively simple persons. They also postulated and later demonstrated that superior performance may be expected of a simple person, in an open situation, if the environment is complex and the criterion is simple. Thackray, Jones, and Touchstone (1973) studied the role of personality in the performance decrement and attention. Their results indicated that individuals scoring high on a distractability scale (i.e., extrovert) found it difficult to maintain a uniform rate of performance. This group of subjects exhibited increasing lapses of attention, while introverted ones failed to show any evidence of a decline in attention. Wickens (1979) also regarded the relatively large differences among subjects in time sharing abilities as the cause of substantial variance in dual-task performance. His proposal to tackle this problem was to "calibrate" particular workload measurement techniques for different operators. Furthermore, with reference to Pew (1970), he recommended that at the same time these individual differences might actually be
exploited to enhance system performance by employing them to provide guidelines for assigning operators to specific systems, or by modifying systems to the limitations and strengths of individual operators. Kahneman (1973) rates a system possessing these qualities as "perfect."

4. REFERENCES

Armstrong, H. G. (1938). The blood pressure and pulse rate as an index of emotional stability. American Journal of Medical Science, 195, 211-220.
Audley, R. J., Rouse, W., Sanders, J., and Sheridan, T. (1979). Final report of mathematical modeling group. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 269-285.
Bloem, K. A. and Damos, D. L. (1985). Individual differences in secondary task performance and subjective estimation of workload. Psychological Reports, 56, 311-322.
Borg, G. (1978). Subjective aspects of physical work. Ergonomics, 21, 215-220.
Borg, G., Bratfisch, O., and Dornic, S. (1971). On the problem of perceived difficulty. Scandinavian Journal of Psychology, 12, 249-260.
Brown, I. D. (1978). Dual task methods of assessing workload. Ergonomics, 21, 221-224.
Bryson, J. B. and Driver, M. J. (1969). Conceptual complexity and internal arousal. Psychonomic Science, 17, 71-72.
Casali, J. G. and Wierwille, W. W. (1983). A comparison of rating scale, secondary task, physiological, and primary task workload estimation techniques in a simulated flight task emphasizing communications load. Human Factors, 25, 623-641.
Cleary, P. J. (1974). Description of individual differences in autonomic reactions. Psychological Bulletin, 81, 934-944.
Cooper, G. E. and Harper, R. P., Jr. (1969, April). The use of pilot rating in the evaluation of aircraft handling qualities. Moffett Field, CA: National Aeronautics and Space Administration, Ames Research Center (NASA TN-D-5153).
Damos, D. L. (1984). Individual differences in multiple-task performance and subjective estimates of workload. Perceptual and Motor Skills, 59, 567-580.
Damos, D. L. and Bloem, K. A. (1985). Type A behavior pattern, multiple-task performance and subjective estimation of mental workload. Bulletin of the Psychonomic Society, 23, 53-56.
Duffy, E. (1962). Activation and behavior. New York: Wiley.
Eggemeier, F. T., Crabtree, M. S., and La Pointe, P. A. (1983). The effect of delayed effort on subjective ratings of mental workload. Proceedings of the Human Factors Society, 27, 139-143.
Enstrom, K. D. and Rouse, W. B. (1977). Real time determination of how a human has allocated his attention between control and monitoring tasks. IEEE Transactions on Systems, Man and Cybernetics, 7, 153-161.
Firth, P. A. (1973). Psychological factors influencing the relationship between cardiac arrhythmia and mental load. Ergonomics, 16, 5-16.
Gartner, W. B. and Murphy, M. R. (1976, October). Pilot workload and fatigue: A critical survey of concepts and assessment techniques. Washington, DC: National Aeronautics and Space Administration (Report No. ASD-TR-76-19).
Gatchel, R. J. (1975). Change over training sessions of relationships between locus of control and voluntary heart rate control. Perceptual and Motor Skills, 40, 424-426.
Gaume, J. G. and White, R. T. (1975, December). Mental workload assessment III. Laboratory evaluation of one subjective and two physiological measures of mental workload. Long Beach, CA: McDonnell-Douglas Corporation (Report MDC J7024/01).
Gibson, H. B. and Curran, J. D. (1974). The effect of distraction on a psychomotor task studied with reference to personality. Irish Journal of Psychology, 2, 148-158.
Hamilton, P., Mulder, G., Strasser, H., and Ursin, H. (1979). Final report of the physiological psychology group. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 367-385.
Hancock, P. A. (1983). The effect of an induced selective increase in head temperature upon performance of a simple mental task. Human Factors, 25, 441-448.
Hancock, P. A. (1984). An endogenous metric for the control of perception of brief temporal intervals. Annals of the New York Academy of Sciences, 423, 594-596.
Hancock, P. A. and Brainard, D. M. (1981). Tympanic temperature: A non-invasive physiological measure of workload. Technical Report, Endeco, Inc., MA.
Hancock, P. A., Meshkati, N., and Robertson, M. M. (1985). Physiological reflections of mental workload. Aviation, Space, and Environmental Medicine, 56, 1110-1114.
Hauser, J. R., Childress, M. E., and Hart, S. G. (1982a). Rating consistency and component salience in subjective workload estimation. Paper presented at the 18th Annual Conference on Manual Control, Dayton, Ohio.
Hauser, J. R., Childress, M. E., and Hart, S. G. (1982b). Individual definitions of the term "workload." Paper presented at the 1982 Psychology in the DOD Symposium.
Hicks, T. G. and Wierwille, W. W. (1979). Comparison of five mental workload assessment procedures in a moving-base driving simulator. Human Factors, 21, 129-143.
Hopkin, V. D. (1979). General discussion based upon interactive group sessions. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 484-487.
Hopkin, V. D., Parks, D. L., Rohmert, W., Rault, A., Soede, T., and Schmidtke, H. (1979). Final report of application group. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 469-495.
Huddleston, H. F. (1974). Personality and apparent operator capacity. Perceptual and Motor Skills, 38, 1189-1190.
Hyndman, B. W. and Gregory, J. R. (1975). Spectral analysis of sinus arrhythmia during mental loading. Ergonomics, 18, 255-270.
Hyyppa, M. T., Aungola, S., Lahtela, K., Lahti, R., and Marniemi, J. (1983). Psychoneuroendocrine responses to mental load in an achievement-oriented task. Ergonomics, 23, 1155-1162.
Isreal, J. B., Wickens, C. D., Chesney, G. L., and Donchin, E. (1980). The event-related brain potential as an index of display-monitoring workload. Human Factors, 22, 211-224.
Johannsen, G., Moray, N., Pew, R., Rasmussen, J., Sanders, A., and Wickens, C. D. (1979). Final report of experimental psychology group. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 101-114.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.
Kalsbeek, J. W. H. (1971). Standards of acceptable load in ATC tasks. Ergonomics, 14, 641-650.
Kalsbeek, J. W. H. and Ettema, J. H. (1964). Physiological and psychological evaluation of distraction stress. Proceedings of the 2nd International Congress on Ergonomics, Dortmund, 443-447.
Kalsbeek, J. W. H. and Sykes, R. N. (1967). Objective measurement of mental load. Acta Psychologica, 27, 253-261.
Kitchin, J. B. and Graham, A. (1961). Mental loading of process operators: An attempt to devise a method of analysis and assessment. Ergonomics, 5, 1-15.
Knowles, W. B. (1963). Operator loading tasks. Human Factors, 5, 153-161.
Kramer, W. F., Wickens, C. D., and Donchin, E. (1983). An analysis of the processing requirements of a complex perceptual-motor task. Human Factors, 25, 597-621.
Leplat, J. (1978). Factors determining workload. Ergonomics, 21, 143-149.
Loewenthal, A. (1983, November). Alveolar gas concentration and mental workload. Department of Industrial and Systems Engineering, University of Southern California (Technical Report 89-2).
Lykken, D. T. (1968). Neuropsychology and psychophysiology in personality research. In E. F. Borgatta and W. W. Lambert (Eds.), Handbook of personality theory and research. Chicago: Rand McNally, 413-509.
Meshkati, N. (1983). A conceptual model for the assessment of mental workload based upon individual decision styles. Unpublished Ph.D. dissertation, University of Southern California.
Meshkati, N., Hancock, P. A., and Robertson, M. M. (1984). The measurement of human mental workload in dynamic organizational systems: An effective guide for job design. In H. W. Hendrick and O. Brown (Eds.), Human factors in organizational design and management. Amsterdam: North-Holland.
McCormick, E. J. and Sanders, M. S. (1982). Human factors in engineering and design, 5th edition. New York: McGraw-Hill.
Miller, R. C. and Hart, S. G. (1984, June). Assessing the subjective workload of directional orientation tasks. Proceedings of the 20th Annual Conference on Manual Control.
Moray, N. (1984, May). Mental workload. Proceedings of the 1984 International Conference on Occupational Ergonomics, Toronto, Canada, 41-46.
Moray, N. (1982). Subjective mental workload. Human Factors, 24, 25-40.
Moray, N. (Ed.). (1979). Mental workload: Its theory and measurement. New York: Plenum Press.
Mulder, G. (1979). Sinus arrhythmia and mental workload. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 327-343.
Mulder, G. and Mulder-Hajonides Van Der Meulen, W. R. E. H. (1973). Mental load and the measurement of heart rate variability. Ergonomics, 16, 69-83.
O'Donnell, R. D. (1979, July). Contributions of psychophysiological techniques to aircraft design and other operational problems. (AGARD-AG-244).
Offerhaus, R. E. (1980). Heart rate variability in psychiatry. In R. I. Kitney and O. Rompelman (Eds.), The study of heart rate variability. Oxford: Clarendon Press, 225-238.
Ogden, G. D., Levine, J. M., and Eisner, E. J. (1978, January). Measurement of workload by secondary tasks: A review and annotated bibliography. Washington, DC: Advanced Research Resources Organization. National Aeronautics and Space Administration, Ames Research Center (Contract NAS2-9637).
Pew, R. W. (1970). Comments on 'Promotion of Man': Challenges in sociotechnical systems: Design for the individual operator. Proceedings of the Global Systems Dynamics International Symposium, Charlottesville, NJ: 59-65.
Phillip, V., Reiche, D., and Kirchner, J. (1971). The use of subjective rating. Ergonomics, 14, 611-616.
Rahimi, M. (1982). Evaluation of workload estimation techniques in simulated piloting tasks emphasizing mediational activity. Unpublished doctoral dissertation, Virginia Polytechnic Institute and State University, Blacksburg, VA.
Ray, W. J. and Lamb, S. B. (1974). Locus of control and the voluntary control of heart rate. Psychosomatic Medicine, 36, 180-182.
Reid, G. B., Eggemeier, F. T., and Nygren, T. E. (1982). An individual differences approach to SWAT scale development. Proceedings of the Human Factors Society, 26, 639-642.
Reid, G. B., Shingledecker, C. A., and Eggemeier, F. T. (1981, October). Application of conjoint measurement to workload scale development. Proceedings of the 1981 Human Factors Society Annual Meeting, 522-526.
Robertson, M. M. (1984). Personality differences as a moderator of mental workload behavior: Mental workload performance and strain reaction as a function of cognitive complexity. Proceedings of the 28th Annual Meeting of the Human Factors Society, Santa Monica, CA.
Robertson, M. M., Hendrick, H. W., and Hancock, P. A. (1984). Individual response to a computer generated mental workload task as a function of cognitive complexity. Proceedings of the 1984 International Conference on Occupational Ergonomics, Toronto, Canada.
Robertson, M. M. and Meshkati, N. (1985). Analysis of the effects of two individual differences classification models on experiencing mental workload of a computer generated task: A new perspective to job design and task analysis. Proceedings of the 29th Annual Meeting of the Human Factors Society, Santa Monica, CA.
Rolfe, J. M. (1973). The secondary task as a measure of mental load. In W. T. Singleton, J. G. Fox, and D. Whitfield (Eds.), Measurement of man at work. London: Taylor and Francis, 135-148.
Rolfe, J. M. (1976). The measurement of human response in man vehicle control situations. In T. B. Sheridan and G. Johannsen (Eds.), Monitoring behavior and supervisory control. New York: Plenum Press, 125-137.
Rotter, J. B. (1966). Generalized expectancies for internal versus external control of reinforcement. Psychological Monographs, 80, No. 7.
Rouse, W. B. (1979). Approaches to mental workload. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 255-262.
Schroder, H., Driver, M., and Streufert, S. (1967). Human information processing. New York: Holt, Rinehart, & Winston.
Sharit, J. and Salvendy, G. (1982). External and internal environments II. Reconsideration of the relationship between sinus arrhythmia and information load. Ergonomics, 22, 121-132.
Sutton, S., and Tueting, P. (1975). The sensitivity of the evoked potential to psychological variables. In P. H. Venables and M. J. Christie (Eds.), Research in psychophysiology. New York: Wiley and Sons, 351-363.
Thackray, R. I., Jones, K. N., and Touchstone, R. M. (1973). Personality and physiological correlates of performance decrement on a monotonous task requiring sustained attention. Washington, DC: FAA Office of Aviation Medicine (Report No. AM-73-14).
Tulga, M. K. (1978). Dynamic decision making in multitask supervisory control: Comparison of an optimal algorithm to human behavior. Cambridge, MA: MIT Man Machine Systems Laboratory.
Van Egeren, L. F., Headrick, M. W., and Hein, P. L. (1972). Individual differences in autonomic responses: Illustration of a possible solution. Psychophysiology, 9, 626-633.
Welford, A. T. (1978). Mental workload as a function of demand, capacity, strategy and skill. Ergonomics, 21, 157-167.
Wickens, C. D. (1980). The structure of attentional resources. In R. Nickerson and R. Pew (Eds.), Attention and performance VIII. Englewood Cliffs, NJ: Erlbaum.
Wickens, C. D. (1979). Measures of workload, stress and secondary tasks. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 79-99.
Wickens, C., Heffley, E., Kramer, A., and Donchin, E. (1980). The event-related brain potential as an index of attention allocation in complex displays. Proceedings of the Human Factors Society, 24, 297-301.
Wierwille, W. W. (1979). Physiological measures of aircrew mental workload. Human Factors, 21, 575-593.
Wierwille, W. W. and Casali, J. G. (1983). A validated rating scale for global mental workload measurement applications. Proceedings of the 27th Annual Meeting of the Human Factors Society. Santa Monica, CA: Human Factors Society.
Wierwille, W. W. and Conner, S. A. (1983). Evaluation of 20 workload measures using a psychomotor task in a moving-base aircraft simulator. Human Factors, 25, 1-16.
Wierwille, W. W. and Williges, R. C. (1978, September). Survey and analysis of operator workload assessment techniques. Blacksburg, VA: Systemetrics, Inc. (Final Technical Report No. 5-78-101).
Williges, R. C. and Wierwille, W. W. (1979). Behavioral measures of aircrew mental workload. Human Factors, 21, 549-574.
HUMAN MENTAL WORKLOAD
P.A. Hancock and N. Meshkati (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1988
THE EFFECTS OF INDIVIDUAL DIFFERENCES IN INFORMATION PROCESSING BEHAVIOR ON EXPERIENCING MENTAL WORKLOAD AND PERCEIVED TASK DIFFICULTY: A PRELIMINARY EXPERIMENTAL INVESTIGATION
N. Meshkati
Human Factors Department, Institute of Safety and Systems Management, University of Southern California, Los Angeles, CA 90089
and
A. Loewenthal
Lockheed Aeronautical System Co., Burbank, CA 91520
ABSTRACT

A comprehensive conceptual model for mental workload assessment, which includes the factor of individual differences in information processing (i.e., decision style), is introduced and described. The model is experimentally evaluated and the results are analyzed and reported. It is found that the dependent variables (sinus arrhythmia and subjective rating) are affected by the subject's individual decision styles. It also appears that the perceived task difficulty is a function of the dominant/backup decision style pattern. The sinus arrhythmia scoring method, the structure and content of the tasks, motivational factors, and reserve capacity affect the outcomes of the experiment. These effects are also analyzed and documented.
1. INTRODUCTION

Numerous studies assert that excessive Mental WorkLoad (MWL) results in negative effects on human operators both in terms of performance and physical and psychological well being. Welford (1978) considered two different kinds of symptoms as the result of an unbalanced mental load. First is the "chronic overload," which appears to be implicated in many psychosomatic disorders such as ulcers of the gastrointestinal tract, various neurotic symptoms, hypertension, and other heart ailments. The second one is the complaint of the incumbents that after a few years the job "gets on top of you" and results in a change of job. This is mainly due to the loads which become unbearable for reasons that are essentially emotional. Unbalanced workload is a potential source of stress, and excessive stress tends to disrupt performance. Tikhomirov (1971) has cited some of the negative effects of stress as narrowing the span of attention, forgetting the proper sequence of actions, incorrectly evaluating situations, slow decision making, and
failing to carry out decisions made. Furthermore, according to Turner and Karasek (1984), performance disruption may extend far beyond the original task boundary and, through generalized "tension" or "anxiety," affect the individual's entire psychological orientation. The result may be loss of esteem, social aggression, or sleep disruption. Hyndman (1980) also has echoed these findings and considered stress-induced hypertension due to the mentally demanding occupational task as its potential hazard to health. Sheridan (1980) emphasized the health and safety of the human operator as the principal underlying concern and related that to the basic question of how much the operator can do before performance breaks down. According to Strasser (1979), humans are not like machines, which can be utilized and coupled with men and technical devices without taking into account wishes, necessities and social needs. Thus, in order to optimize safety and performance, job demands, task characteristics and operator's individual capabilities should be analyzed and determined. This knowledge may contribute to better personnel selection and to more efficient system design (Chiles and Alluisi, 1979).
The need for the incorporation of the individual differences into mental workload assessment models has been advocated by various investigators and practitioners in the field (cf. Moray, 1984; Wickens, 1979; Firth, 1973; Meshkati and Loewenthal, 1987). If such a model can be constructed and tested, then it would be a viable remedy which resolves some of the problems associated with individual differences. This study attempts to shed some light on the role of individual differences in decision making and mental workload. There are three objectives. First, the role of individual differences in information processing is studied and the Decision Style model which includes this concept is briefly reviewed. Second, a proposed version of a conceptual workload assessment model based upon that parameter is introduced. Third, the results of experimental evaluation of the workload and its effects are presented and discussed.
2. DECISION STYLE MODEL

In terms of MWL assessment, the human operator can be considered as an input-output system of information processing (Kalsbeek, 1968). This concept is also supported by Rohmert (1979), who described non-physical work as sensing, information processing, decision making, and action generation. These and many other similar assertions lead one to closely relate MWL, information processing, and decision making to each other. All the operations between the point of sensing the stimuli (data) and the point of taking a specific action can be regarded from the perspective of information processing and decision making. However, since individuals have different levels of conceptual structures (i.e., the way an individual receives, processes, and transmits information), this process is fairly individualistic and is referred to as Information Processing Behavior (IPB) (Schroder, Driver, and Streufert, 1967). The complexity of the IPB portrays the manner by which the individual decision maker seeks, acquires, evaluates, integrates, and uses information in order to make a decision (Alawi, 1973). The complexity of the IPB of an individual is determined by three major factors: the complexity of the job environment, the complexity of the organization in which his job is situated, and, most importantly, his cognitive style complexity (Alawi, 1973). The job environment incorporates four basic and task-related
parameters: (a) information load, (b) time pressure, (c) routineness, and (d) autonomy. These are among the critical sources of pressure that any job incumbent experiences in his/her job. Organizational complexity is a function of the complexity of every level of the organization. An organization that is bureaucratic in the Weberian definition is classified as "simple," whereas one that is matricized and highly integrated in nature and character is viewed as "complex."

Cognitive styles are defined as learned thinking habits. Intelligence tries to capture the upper limit of a person's thinking capacity; styles try to measure a person's typical method of thinking in a given situation. Cognitive styles are, therefore, not absolute in any sense. They can be modified by further learning, and there is no absolute best style (Driver, 1983). Schroder et al. (1967) and Driver and Streufert (1969) developed a human information processing model. This model suggests that environmental pressures (or load) systematically affect the complexity of information processing in persons and groups following an inverted U-shaped function. Each individual or group can be considered to have a unique and consistent curvilinear information processing pattern. Environmental Load is defined as the sum of the effects of four basic environmental factors: (a) information complexity in the environment, (b) Noxity or negative input, (c) Eucity or positive input, and (d) uncertainty (Driver, 1979b). Informational complexity is defined as the input to the individual which changes either probabilities or utilities perceived by him. Operationally, it is linked to such input aspects as number of messages per unit of time, complexity of message content, time pressure, number of people supervised, amount of reading, and similar variables.
Noxity is defined as the amount of negative input reaching the individual; operationally, it has been linked to threat, failure, criticism, assault, and similar variables. Eucity is defined as the amount of positive input and has been operationally defined as praise, success, support and the like. Finally, uncertainty refers to the unpredictability of the situation.
The decision style model, which is based upon the foregoing concepts, has two basic dimensions: information use and focus. Information use refers to the amount and complexity of information actually used in thinking. Number of foci is defined as the number of alternatives which are contained in the final solution. Focus is a continuous dimension ranging from unifocus to multifocus (Driver, 1979a and 1979b). Both of the dimensions--information use and focus--can be partitioned into categories for descriptive purposes. The information use dimension can be split at some point between two extremes. At one extreme are those individuals who habitually use as much information as is compatible with non-redundancy (termed maximizers). At the other end are those people who use just enough information to generate one or two useful alternatives (termed satisficers). The maximizer/satisficer dimension suggests low vs. high amount of integration, or the type and amount of connections between information units during analysis. The focus dimension has two extremes: the unifocus, in which a single alternative forms the outcome, and multifocus, in which many different options are final answers. By combining the dimension of information use and the focus dimension, the model seen in Table 1 can be generated.
                                  FOCUS
INFORMATION USE     UNI                        MULTI
Satisficer          DECISIVE                   FLEXIBLE
Maximizer           HIERARCHIC    SYSTEMIC     INTEGRATIVE

TABLE 1. BASIC DECISION STYLES
The Decisive Style (D) is defined as using just enough data to develop a satisfactory answer to which one irrevocably adheres. This style is expected to be very concerned with speed, efficiency, consistency, and achievement of results (Driver and Rowe, 1979; Driver, 1979a). The Flexible Style (F) also employs a minimal or sufficient amount of data and is seen as one engendering the use of enough data to reach one or two alternative conclusions which are constantly open to absorb new data and its reevaluation, and generates new solutions as needed. This style is associated with speed, adaptability, and a certain intuitiveness. The Hierarchic Style (H) presents a sharp contrast to the other two and shows a very high use of all available information to meticulously generate a single optimal plan of action. Then the solution is implemented using an elaborate contingency plan, but it is basically resistant to change. This style is seen as rigorous, analytic, precise, and even perfectionistic. The Integrative Style (I) also uses a large amount of information, but simultaneously generates a number of possible solutions for implementation. There is also a greater tendency to rely on creative synthesis rather than pure logic. This style is highly inventive, emphatic and cooperative. The Systemic Decision Style (S) combines features of Integrative and Hierarchic orientation. This style seems to operate at first as an Integrative in exploring all options; then they shift to a higher order schema to prioritize options more like a Hierarchic. They appear to be more methodical and careful than the Integrative, yet more open than the Hierarchic style.
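The way the two dimensions combine into the four frequently observed styles can be sketched as a simple lookup. This is only an illustration of the classification described above, not the scoring procedure of any published instrument; the names `InformationUse`, `Focus`, and `basic_style` are hypothetical and introduced here for the example.

```python
from enum import Enum


class InformationUse(Enum):
    """Hypothetical labels for the two ends of the information use dimension."""
    SATISFICER = "satisficer"   # uses just enough information
    MAXIMIZER = "maximizer"     # uses as much information as is non-redundant


class Focus(Enum):
    """Hypothetical labels for the two ends of the focus dimension."""
    UNIFOCUS = "unifocus"       # a single alternative forms the outcome
    MULTIFOCUS = "multifocus"   # several options are final answers


# Mapping taken from the style descriptions in the text. The Systemic style
# combines Integrative and Hierarchic features and therefore does not fit a
# single cell of this 2 x 2 lookup; it is deliberately left out.
STYLE_BY_DIMENSIONS = {
    (InformationUse.SATISFICER, Focus.UNIFOCUS): "Decisive",
    (InformationUse.SATISFICER, Focus.MULTIFOCUS): "Flexible",
    (InformationUse.MAXIMIZER, Focus.UNIFOCUS): "Hierarchic",
    (InformationUse.MAXIMIZER, Focus.MULTIFOCUS): "Integrative",
}


def basic_style(information_use, focus):
    """Return the basic decision style for one pair of dimension categories."""
    return STYLE_BY_DIMENSIONS[(information_use, focus)]


print(basic_style(InformationUse.MAXIMIZER, Focus.UNIFOCUS))
```

Treating both dimensions as two-valued is a simplification: the text describes focus as a continuous dimension, so any real classification would need a cut point that this sketch does not specify.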
In conclusion, the model proposes that there are five basic styles out of which four, i.e., D, F, H, and I, are the frequently observed ones. Each person has acquired at least one basic or "dominant" style that normally shows up under moderate environmental load.
For most people, a second or "back up" style emerges in extreme load conditions. The overload or underload can be due to information, which refers to the informational pressure emanating from the job environment that the job incumbent experiences during a period of time (Alawi, 1973). Information load, according to Streufert and Streufert (1978), is the number and kind of external stimuli impinging on the organization within a specific limited time period, and is the variable manipulated in this study. There are two measures in use of decision style: the Driver Decision Style Exercise (DDSE), designed to assess unconscious (operating) style, and the Driver-Streufert Complexity Index (DSCI), aimed at measuring conscious (role) style.
3. THE CONCEPTUAL MODEL AND METHOD

3.1. Experimental Design

In order to incorporate the individual differences parameter in the workload assessment model and study its effects, a two-factor factorial experimental design with repeated measures on one factor was utilized. The levels of the first factor, Decision Style, are the four primary dominant decision styles, and the levels of the second factor were four mental workload levels (i.e., Rest (R), Low (L), Moderate (M), and High (H)). The subjects were nested under the Decision Style factors. The analysis of variance followed the approach used by Winer (1971) for these models. There are two basic sources of variation: between subjects and within subjects. Under the between subjects category there are two sums of squares: Decision Style and subjects within decision styles. Under the within subject heading there are three sums of squares: Mental Workload, interaction of Mental Workload and Decision Styles, and interaction of Mental Workload and subjects within decision styles. Entries for each subject are the sinus arrhythmia score (SA) and subjective rating (SR) on each of the mental workload level tasks. Although concern has been expressed regarding the non-normality and non-homogeneity of the heart rate distribution, according to Graham (1980), for the adult group data, the degree to which these characteristics are present in either heart rate or period (from which sinus arrhythmia scores are derived by linear transformation) distribution will not violate the assumptions of statistical tests. Each of the subjects in each level of the Decision Style factor was observed under four different treatment combinations of Mental Workload factor. The order of presentation of trials was randomized in order to avoid the systematic bias.
In order to detect a difference of one standard deviation between two means at an alpha of 0.05, at least four replicates are necessary (Neter and Wasserman, 1974). This implies the use of at least four subjects in each decision style group. Furthermore, in a pilot study based on Winer's (1971) approach and prior to the actual experiment, the number of subjects in each group was settled at 16 and, consequently, the total at 64.
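The partition of variation described for this design (two between-subjects and three within-subjects sums of squares) can be sketched as follows. This is a minimal illustration using the standard computational formulas for a balanced two-factor design with repeated measures on one factor; it is not the authors' analysis code, the function name `split_plot_anova` and the nested-list data layout are assumptions made for the example, and equal group sizes are assumed.

```python
def split_plot_anova(data):
    """Partition sums of squares for a balanced two-factor design with
    repeated measures on the second factor.

    data[g][s][k] is the score (e.g., SA or SR) of subject s, nested in
    decision-style group g, at workload level k.
    Returns {source: (sum_of_squares, degrees_of_freedom)}.
    """
    p = len(data)          # number of groups (decision styles)
    n = len(data[0])       # subjects per group; equal group sizes assumed
    q = len(data[0][0])    # number of repeated levels (workload)

    scores = [x for group in data for subject in group for x in subject]
    correction = sum(scores) ** 2 / (p * n * q)
    ss_total = sum(x * x for x in scores) - correction

    # Between-subjects variation and its two components.
    subject_totals = [sum(subject) for group in data for subject in group]
    ss_between_subjects = sum(t * t for t in subject_totals) / q - correction
    group_totals = [sum(sum(subject) for subject in group) for group in data]
    ss_style = sum(t * t for t in group_totals) / (n * q) - correction
    ss_subjects_within = ss_between_subjects - ss_style

    # Within-subjects variation and its three components.
    level_totals = [sum(subject[k] for group in data for subject in group)
                    for k in range(q)]
    ss_workload = sum(t * t for t in level_totals) / (p * n) - correction
    cell_totals = [sum(subject[k] for subject in group)
                   for group in data for k in range(q)]
    ss_cells = sum(t * t for t in cell_totals) / n - correction
    ss_interaction = ss_cells - ss_style - ss_workload
    ss_within_subjects = ss_total - ss_between_subjects
    ss_error = ss_within_subjects - ss_workload - ss_interaction

    return {
        "decision_style": (ss_style, p - 1),
        "subjects_within_styles": (ss_subjects_within, p * (n - 1)),
        "workload": (ss_workload, q - 1),
        "workload_x_style": (ss_interaction, (p - 1) * (q - 1)),
        "workload_x_subjects_within_styles": (ss_error, p * (n - 1) * (q - 1)),
    }


# Made-up scores for 2 groups x 2 subjects x 2 levels, for illustration only.
example = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
for source, (ss, df) in split_plot_anova(example).items():
    print(source, ss, df)
```

With four decision styles, 16 subjects per group, and four workload levels, the degrees of freedom returned by this sketch are 3, 60, 3, 9, and 180. In this kind of design the Decision Style effect is tested against subjects within styles, and the two within-subject effects against the workload by subjects-within-styles term.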
3.2. Independent and Dependent Variables
The independent variables were the mental workload and decision styles. The dependent variables were Sinus Arrhythmia (SA), where SA is defined as the variability of instantaneous heart rate, and Subjective Rating (SR). Despite the lack of consensus among researchers on the existence of strong theoretical and empirical bases for heart rate variability, a significant number of investigators consider it one of the most viable psychophysiological indicators of mental workload (see Kalsbeek, 1968, 1971, 1973; Ettema and Zielhuis, 1971; Rohmert and Laurig, 1971; Rohmert, Laurig, and Luzak, 1973; Meers and Verhaegen, 1972; Zwaga, 1973; Boyce, 1976; Strasser, 1977; Strasser, 1979; Opmeer, 1973; Hyndman and Gregory, 1975; Hyndman, 1980; Hancock, Meshkati, and Robertson, 1985). Furthermore, as O'Donnell (1979) has pointed out, if the application of psychophysiological techniques is viewed from a specific "level" and for a certain purpose, "the lack of broad theoretical base is not an insurmountable obstacle." The scoring of SA in this study was based upon the methods set forth by Kalsbeek (1968, 1973). The subjective rating of workload was another dependent variable that was used because of its relevance and wide application in this field. The subjective rating of difficulty was based on the Constant Sum Method. This method was originally proposed by Metfessel (1947) for the purpose of obtaining psychological values on ratio scales. The typical judgment of the subject is to divide a total of 100 points as he thinks they should be divided to represent the difficulties of the mental workload levels in the experiment (Guilford, 1954).
3.3. Experimental Method and Procedures
The experiment consisted of two parts. The first part involved administering the DDSE. The second part of the study consisted of the administration of mental workload tasks to the individual participants.
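The Constant Sum Method described in Section 3.2 can be illustrated with a short Python sketch. It assumes a subject has divided 100 points among the three loaded tasks; the function name `constant_sum_ratios` and the example allocation are hypothetical and only show how such an allocation can be read as ratios of perceived difficulty.

```python
def constant_sum_ratios(points, total=100):
    """Express a constant-sum allocation as ratios relative to the first task.

    points: positive numbers that the subject assigned to each task, in the
    order low, moderate, high. They must add up to `total`.
    """
    if any(p <= 0 for p in points):
        raise ValueError("every task must receive a positive number of points")
    if sum(points) != total:
        raise ValueError("points must add up to the constant sum")
    base = points[0]
    return [p / base for p in points]


# Hypothetical allocation: low = 20, moderate = 30, high = 50 points.
ratios = constant_sum_ratios([20, 30, 50])
```

With this allocation the high workload task is judged 2.5 times as difficult as the low one, which is the kind of ratio-scale statement the method is intended to support.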
Upon arrival at the Human Factors Laboratory and the introductory procedures, the subject was requested to take the DDSE. There was no time limit for this instrument, which usually lasted 15-20 minutes. All the subjects, under 30 years of age, were volunteer male students in the Human Factors area in the department of industrial and systems engineering at a major university. This instrument produces a numerical score for each one of the information processing behaviors utilized by the subject as well as the dominant/back-up decision styles. Based on the results of the DDSE, the participants were assigned to one of the four primary decision style groups. Due to the rarity of dominant Systemic Decision Style in the population, this group was not included in this study. Following the administration of the DDSE, each subject was instrumented with EKG electrodes, and the base line of the waveform was established. The heart interbeat intervals were monitored with a LaFayette EEG/EKG Amplifier Model 76406. Its output signal was modulated and the R-peak of the EKG waveform detected by the Peak Reader (Meshkati, 1983). At this time, five minutes of Resting (R) heart rate values were recorded. At the end of the rest period, the subject was asked to start the first of three experimental tasks. Each task was followed by a 5 minute rest period. There was a time limit of 15 minutes for each task. The subject was informed at the beginning of each trial that it was his responsibility to keep track of and budget his time
in order to be able to complete each task. At the end of the experiment, the subjective rating of the tasks' difficulty was sought and recorded. Each experimental task was a paper and pencil instrument consisting of the description of a hypothetical situation followed by a certain number of items of information and questions requiring decision making regarding the case. These tasks were modified versions of the standardized DDSE, and had the same structure. The participant's responses in the experimental trials were based on his cognitive style when faced with the hypothetical case. As such, these responses should not be judged against a so-called optimal set of decisions or used as a performance criterion. According to Driver (1979a), the environmental load and the information complexity component of the task can be broken down into these parts: the number of inputs or messages per time unit, the complexity of message content, time pressure, and the amount of reading required. The levels of difficulty in the experimental tasks were achieved by altering the number of information items and questions. Although apparently self paced, the 15 minute time limit on each task in conjunction with the increasing number of items of information and questions resulted in a real increase in the environmental load. The complexity of the message content stems from the case description, the number of combinations of items of information, the number of comparisons among them, and their integration in making and reporting a decision. Various complexity levels were achieved by increasing all these factors. The time pressure or time demand is an important component of mental workload (White, 1971). Since the available time for each task was set at 15 minutes, the time pressure was primarily determined by the length of description, the number of items of information, and the number of questions of each task.
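The effect of the fixed time limit can be made concrete with a small worked calculation. The sketch below uses the question counts reported for the three loaded tasks (20, 45, and 54) and the 15 minute limit. It assumes, purely for illustration, that the whole period is spread evenly over the questions and that reading time is ignored; the names used are hypothetical.

```python
TIME_LIMIT_S = 15 * 60  # 15 minute limit per task, in seconds

# Number of decision-making questions per workload level, as reported for the tasks.
QUESTIONS = {"low": 20, "moderate": 45, "high": 54}


def seconds_per_question(n_questions, limit_s=TIME_LIMIT_S):
    """Average time available per question if the limit is spread evenly."""
    return limit_s / n_questions


budget = {level: seconds_per_question(n) for level, n in QUESTIONS.items()}
```

Under these assumptions the available time falls from 45 seconds per question in the low workload task to 20 seconds in the moderate task and under 17 seconds in the high task, before any allowance for the longer case descriptions.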
The low, moderate, and high MWL tasks were designed in increasing order of case description length and complexity. They consisted of 4, 6, and 8 items of information and 20, 45, and 54 decision-making questions respectively. The last component of the load, the amount of reading, exhibited an ascending trend from low to high MWL tasks, since it is determined not only by the number of items of information and the number of questions, but also by the length of description.
4. RESULTS
The analysis of the data was performed in two separate parts. The first part was concerned with the grouping of the subjects based upon their dominant decision styles and involved the determination, by analysis of variance, of statistically significant effects of the independent variables on the dependent ones for the overall decision styles. The second part consisted of the statistical analysis of the variables for each decision style group.
4.1. Dominant Decision Style Grouping
Those effects that were found to be significant at the 0.05 alpha level are presented in Table II.
TABLE II
Composite of the significant effects of the ANOVA tests (grouping is dominant decision style)
[Table body not recoverable from this copy. Legible headings: Dependent Variable, Tail Probability (1). Legible entries: Sinus Arrhythmia (SA), Decision Style, 0.0066, 0.0, 18.25, 2.17, 0.0502.]
Having determined the significant main effects and interactions through the evaluation of appropriate F statistics, the Newman-Keuls Test (Winer, 1971) was applied to establish the relationship between the various levels of each factor. This procedure was repeated for SA and SR, which had the significant main effects. The results of the Newman-Keuls test are given in Table III.
The significant effect in the analysis of the sinus arrhythmia variable was the MWL. The Newman-Keuls Test showed that a significant difference between the level means can be found only between the rest and the load conditions, the latter being indistinguishable from one another. The significant effects in the analysis of the subjective rating variable were MWL and the interaction of subjects nested under decision styles with MWL. The Newman-Keuls Test results (Table III) indicated that in the case of mental workload, there is a significant difference between the subjective ratings of low, moderate, and high MWL.
TABLE III
Newman-Keuls test results (levels underlined by common line do not differ from each other)
Dependent variable   Effect            Newman-Keuls test results
SA                   MWL               HSA  LSA  MSA  RSA
SR                   MWL
SR                   MWL x DEC. ST.    (D,LSR) (H,LSR) (I,LSR) (F,LSR) (F,MSR) (D,HSR) (I,MSR) (I,HSR) (H,MSR) (H,HSR) (F,HSR) (D,HSR)
Key:
RSA : Rest Sinus Arrhythmia score
LSA : Low mental workload's Sinus Arrhythmia score
MSA : Moderate mental workload's Sinus Arrhythmia score
HSA : High mental workload's Sinus Arrhythmia score
LSR : Low mental workload's Subjective Rating
MSR : Moderate mental workload's Subjective Rating
HSR : High mental workload's Subjective Rating
H : Hierarchic decision style
I : Integrative decision style
F : Flexible decision style
D : Decisive decision style
4.2. Results of Variables for Each Dominant Decision Style Group
In this part each dominant decision style group was treated separately. The results of the significant effects are shown in Table IV. The corresponding Newman-Keuls Test results are given in Table V.
TABLE IV
Composite of the significant effects of the ANOVA tests for individual decision styles (* significant effect at alpha = 0.05)
Decision style   Dependent variable   Source of variation   Degrees of freedom   F statistic   Tail probability
Hierarchic       SA                   MWL                   3                    0.56          0.6414
Hierarchic       SR                   MWL                   2                    4.51          0.0194 *
Integrative      SA                   MWL                   3                    4.40          0.0077 *
Integrative      SR                   MWL                   2
Flexible         SA                   MWL                   3
Flexible         SR                   MWL                   2
Decisive         SA                   MWL                   3
Decisive         SR                   MWL                   2                    43.23         0.0000 *
[Values legible in this copy but not assignable to a row: 0.2199, 2.67, 0.0586, 0.2967, 0.7978.]
The Kendall Coefficient of Concordance was also utilized to provide another perspective on the degree of agreement within each dominant decision style group regarding subjective ratings of task difficulties. The results of this test and the associated significance level for each decision style are tabulated in Table VI.
TABLE V
Decision style   Sinus arrhythmia       Subjective rating
Hierarchic       HSA  LSA  MSA  RSA     LSR  MSR  HSR
Integrative      LSA  HSA  MSA  RSA     LSR  MSR  HSR
Flexible         HSA  MSA  LSA  RSA     LSR  MSR  HSR
Decisive         RSA  LSA  HSA  MSA     LSR  MSR  HSR
TABLE VI
Kendall coefficient of concordance
Decision style   Kendall coefficient of concordance (W)   Level of significance (2)
Hierarchic       0.23828                                  0.0221
Integrative      0.14355                                  0.1006
Flexible         0.15234                                  0.0874
Decisive         0.79395                                  0.0000
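The relation between W and its significance level can be sketched in Python. The sketch assumes the usual chi-square approximation, chi-square = m(n - 1)W with n - 1 degrees of freedom, for m = 16 subjects ranking n = 3 tasks; this is an assumption about how the tabulated levels were obtained, although the reported values appear consistent with it. The function names are hypothetical.

```python
import math


def kendall_w(rankings):
    """Kendall coefficient of concordance for m untied rankings of n items.

    rankings: list of m lists, each holding the ranks 1..n given by one judge.
    """
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def w_tail_probability(w, m, n=3):
    """Tail probability of W under the chi-square approximation.

    Only the n = 3 case (2 degrees of freedom) is handled, because the
    chi-square tail then has the closed form exp(-x / 2).
    """
    if n != 3:
        raise ValueError("closed form implemented for n = 3 items only")
    chi_square = m * (n - 1) * w
    return math.exp(-chi_square / 2)


# Coefficients reported for the four decision style groups (16 subjects each).
reported_w = {
    "Hierarchic": 0.23828,
    "Integrative": 0.14355,
    "Flexible": 0.15234,
    "Decisive": 0.79395,
}
p_values = {style: w_tail_probability(w, m=16) for style, w in reported_w.items()}
```

Under this approximation only the Hierarchic and Decisive groups fall below the 0.05 level, in line with the description of their coefficients as significant in the text.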
4.3. Behavior of Each Dominant Decision Style
Hierarchic Decision Style: The subjects did not show any significant difference in the sinus arrhythmia indicator either when they proceeded from rest to the mental load or while performing the three experimental tasks. The subjective rating indicator demonstrated a change between low mental workload on one hand and moderate and high on the other hand. However, it failed to discriminate between moderate and high workload. The Kendall Coefficient of Concordance of the subjective ratings of this decision style was moderately low (W = 0.23) but significant.
Integrative Decision Style: The subjects showed a significant drop in the sinus arrhythmia level at the onset of the first test. This indicator remained relatively steady in the rest of the experiment. The subjective ratings failed to show any change throughout the test. The Kendall Coefficient of Concordance of the subjective ratings for this decision style was the lowest (W = 0.14) among all decision styles.
Flexible Decision Style: The average sinus arrhythmia scores corresponding to the Rest and Low mental workload conditions were significantly different (higher) from the other two conditions, but there was not any significant difference among them. The W of the subjective ratings was 0.15 and was similar to that of the Integrative group.
Decisive Decision Style: The sinus arrhythmia did not change throughout the experiment and therefore exhibited a pattern similar to the one seen in the Hierarchic decision style. However, in terms of the subjective ratings the subjects were able to demonstrate that the three levels of workload significantly differed from each other. The W of the subjective ratings of this decision style was also remarkably high and significant (W = 0.79).
5. DISCUSSION AND CONCLUSIONS
Based on the results of the analysis of variance, the Newman-Keuls Test, and the Kendall Coefficient of Concordance, the following conclusions can be drawn. As can be seen in Table II, the mental workload (MWL) is a significant factor in each ANOVA since it is the term which accounts for inducing the psychophysiological and subjective responses in the subjects.
5.1. Sinus Arrhythmia Measure
The sinus arrhythmia measure was able to detect a significant difference between the rest condition on one hand and all mental load levels on the other hand (Table III). However, it was unable to distinguish among the three mental load levels. This phenomenon can be attributed to one or more of the following factors: (1) the structure and content of the experimental tasks, (2) the decision styles of the subjects, and (3) the scoring method.
Structure and content of the experimental tasks: The experimental tasks were modified versions of the standardized "Driver Decision Style Exercise" and their difficulty was due to their imposing (or environmental) load. It could be hypothesized that the three difficulty levels were not distinctly different from each other and that overlaps existed. Therefore, and in spite of the subjective feeling of the difference in the difficulty levels, the experimental tasks were not able to stimulate a noticeably different psychophysiological response in the subjects.
Another structural aspect of the experimental task was the format of its questions and answers. Each answer consists of making a single choice and recording it on the answer sheet. This process complies more with the unifocus behavior than the multifocus one and could be one of the reasons why the subjects with the unifocus decision style did not exhibit any change in the difficulty induced sinus arrhythmia with respect to the Rest condition (Table IV).
Decision styles of the subjects: As mentioned above, the Hierarchic and Decisive decision styles did not show any significant difference in their sinus arrhythmia, whereas Integratives and Flexibles did (Table IV). Also, the unifocus styles started the experiment with a significantly lower sinus arrhythmia in the Rest conditions as opposed to their multifocus counterparts. This behavior is consistent with their structured styles. The whole experiment, even before its beginning, was ambiguous to them. This "ambiguity" affects the subjective "uncertainty" and the degree of conformity (Streufert and Streufert, 1978). This is also in accordance with Zwaga's (1973) findings which suggested that: (a) The novelty of the experimental situation has an increasing effect on the general level of arousal indices; (b) An increase in activation level can be expected by the novelty of the experimental situation; and (c) The 'complete' experiment should be considered as a stressor, rather than the task period only. This deduction is also supported by Mackie (1977) who argues that some subjects may experience hyper-arousal (3) at the onset of the experiment and thus progress from a hyper to a normal state rather than from a normal to a hyper-aroused state. Also, there are three inter-related theories which attempt to explain the behavior of the decision styles: (a) General Incongruity Adaptation Level (4) (GIAL), (b) Motivational
effects, and (c) Reserve capacity.
General Incongruity Adaptation Level (GIAL): The GIAL defines the optimal incongruity level for an individual (Driver, 1978). The GIAL would motivate cognitive activity whenever the general incongruity currently being experienced by the organism departs from the expected value, i.e., whenever inconsistency is experienced. Thus, any departure from the GIAL value would excite some cognitive activity and only attainment of the GIAL would produce consistency and a consequent cessation of activity. The ambiguity of the experiment and the lack of prior knowledge of it, and the resulting uncertainty, caused the departure from GIAL for Decisive and Hierarchic subjects. This, in turn, changed their activation (5) levels and thus caused slight anxiety, where "anxiety" (6) is viewed as a state of heightened activation (Claridge, 1967). The Decisives and Hierarchics expect order and predictability and avoid uncertainty. Therefore, their learned expectations and adaptation levels determine their General Incongruity Adaptation Level. According to Kalsbeek (1973), Offerhaus (1973) found that the rest value of the sinus arrhythmia of the high anxiety subjects was significantly lower than that of the low anxiety group. Later, Offerhaus (1980) argued that Heart Rate Variability (HRV) is relevant to the question of "How anxious is a person?" and proposed that a change in 'anxiety state' is indicated by a change in the HRV. Thus, the behavior of the rest sinus arrhythmia of the Decisive and Hierarchic subjects was in accordance with their styles and predictability.
Motivational effects: Another approach to the analysis of decision styles behavior is by the study of motivation in general and "incongruity motivation" (7) in particular. This can be done through the analysis of the Decisive and Hierarchic styles structure. In information processing, a concrete structure (8) has comparative certainty and determinant character (Schroder et al., 1967).
Since the structure affects motives, concrete systems have simpler goals. They seek only a few things and require simpler environments, whereas abstract systems seek to attain more things and move into more complex environments (Ibid).
In spite of the above points and the general tendency of the GIAL for Decisive and Hierarchic decision styles, it can be concluded that they were more incongruitively motivated and, therefore, had a lower sinus arrhythmia. This conclusion is consistent with the findings of Kalsbeek and Sykes (1967) that the motivated group of subjects had a constant and higher level of sinus arrhythmia suppression.
Reserve capacity: According to Kalsbeek (1973), task induced information is not the only one handled by the information processing system. There is a constant flow of more general 'situational' information and a constant flow of 'self-generated' information by the brain system. When the intensity of this latter flow changes, the intensity or the fluctuations in intensity of the first flow can be counterbalanced. The author argued that, "changing physiological parameters are not to be expected, although the task induced information flow can show fluctuations ... According to this model one could say that sinus arrhythmia indicates the amount of reserve capacity which a single channel function
has available but which is not occupied. A complete suppression of sinus arrhythmia would mean that there is no reserve capacity left unoccupied". Based upon the above reasoning it can be argued that the 'self-generated information', either directly or indirectly through the utilization of the 'reserve capacity', causes the physiological change (i.e., suppression of sinus arrhythmia). Thus by relating the integrative and conceptual complexity characteristic of the well structured Decisive and Hierarchic decision styles to the self-generated information, one could visualize the underlying rationale of their suppressed sinus arrhythmia throughout the experimental trials.
Scoring method: The failure to detect certain load changes by the SA can be attributed to the scoring method, as sometimes one scoring method reflects a supposed increase or decrease in mental load whereas another one shows no change (Kalsbeek, 1973).
5.2. Subjective Rating Measure
The subjective ratings were influenced by the perceived difficulty of the experimental task and the decision style behavior of the subject. According to Borg (1978), the subjective aspects of the mental load are partly determined by the 'subjective complexity' of the individual. The author also referred to the work of Herbert (1974a and b), who points to the factors affecting the perception of difficulty in practical work settings: "the number of alternative actions, insufficient or contradictory data, uncertainty about the consequences of actions, conflicting demands on the outcome of work, and the need for feedback. Scarcity of time, and the perceived probability of failure, also seem to be important factors in the perception of difficulty." These influential factors parallel the components and functions of 'Environmental Load'. Consequently, the perceived difficulty which is indicated through the subjective rating of each subject is determined partly by the level of the environmental load of the task. In a decision making process, a short and low informational content task seems subjectively more difficult to a maximizer than a more detailed one, and the reverse is true for a satisficer. The number of foci can affect the convergence or concordance of the different subjective rating reports of the subjects. Since reporting the subjective judgment of the tasks' difficulties through the assignment of different weights to them is in itself a choice-making activity, one could say that the multifocus subjects would have multiple sets of weights to report. The process of narrowing down the choices and reporting only one of the equally probable options is a difficult task for them. The conceptual and integrative complexity of the dominant multifocus decision styles influences the process of narrowing down the choices, although reporting the final choice could be a random selection among the remaining alternatives.
The low Kendall Coefficient of Concordance of the Integrative and Flexible decision styles could be attributed to the above effect and to the general behavior of these styles (Table VI).
ACKNOWLEDGEMENT
The authors would like to give special thanks to Dr. Michael J. Driver for his invaluable assistance in this study.
6. FOOTNOTES
(1) Tail Probability is the probability of exceeding the F ratio when group means are equal and the data are sampled from normal distributions with equal population variances (Dixon, 1981).
(2) The null hypothesis that 16 rankings of the subjective rating of the difficulties of each decision style group are unrelated may be rejected at this level of significance (Siegel, 1956).
(3) "Arousal" and "activation" have been frequently used interchangeably (e.g., Wisner, 1973). However, Duffy (1962) defined the levels of activation as: "The extent of release of potential energy, stored in the tissues of the organism, as this is shown in activity or response ... The activation is the arousal which occurs in the absence of overt activity or physical exertion ... Activation is the arousal found when we subtract from measures of activation the effects of physical activity". Pribram and McGuinness (1975) defined arousal as the phasic physiological responses to input and activation as the tonic physiological readiness to respond. They also argue that each one of them is controlled by separate, but interacting, neural systems.
(4) Incongruity refers to any unbalancing input and includes novelty, ambiguity, frustration, uncertainty, risk and conflict (Driver, 1979b).
(5) According to Schroder et al. (1967), activation could mean a goal of obtaining those inputs and outputs that are most efficient or most pleasant to the individual. Pleasure and efficiency are, however, only two among many goals.
(6) This refers to the "normal anxiety" which is very different from neurotic anxiety (Schneiders, 1965). Normal anxiety is, like any anxiety, a reaction to threats to values the individual holds essential to his existence as a personality; but normal anxiety is that reaction which (a) is not disproportionate to the objective threat, (b) does not involve repression or other mechanisms of intrapsychic conflict, and as a corollary to the second point, (c) does not require neurotic defense mechanisms for its management (Ibid).
(7) As one departs in either direction from the GIAL, this may be either amplified or diminished by concomitant emotions (Driver, 1978).
(8) Concrete structures are characterized by compartmentalization and by hierarchical integration of parts or rules (Schroder et al., 1967).
REFERENCES
Alawi, H., (1973). Cognitive, task and organizational complexities in relation to information processing behavior in business managers. Unpublished Doctoral Dissertation. University of Southern California.
Borg, G., (1978). Subjective aspects of physical work. Ergonomics, 21, 215-220.
Boyce, P. R., (1976). Sinus arrhythmia as a measure of mental load. Ergonomics, 17, 177-183.
Chiles, W. D. and Alluisi, E. A., (1979). On the specification of operator or occupational workload with performance-measurement methods. Human Factors, 21, 515-528.
Claridge, G. S., (1967). Personality and arousal. London, Pergamon Press.
Dixon, W. J. (Chief Editor), (1981). BMDP Statistical Software. Los Angeles, University of California Press.
Driver, M., (1978). The general incongruity adaptation level (GIAL). In S. Streufert and S.C. Streufert (Eds.), Behavior in a complex environment. Washington, D.C., V.H. Winston and Sons, 162-206.
Driver, M., (1983). Decision style and organizational behavior: Implications for academia. The Review of Higher Education, 6, 387-406.
Driver, M., (1979a). Person-Environment metastability. I: Decision style reliability. Paper presented at the Joint National Meeting of ORSA/TIMS, Milwaukee.
Driver, M., (1979b). Individual decision making and creativity. In S. Kerr (Ed.) Organizational Behavior. Columbus, Ohio, Grid Publishing, Inc., 59-91.
Driver, M. and Rowe, A., (1979). Decision making styles: A new approach to management decision making. In C. Cooper (Ed.) Behavioral problems in organizations. Englewood Cliffs, New Jersey: Prentice-Hall, 141-182.
Driver, M. and Streufert, S., (1969). Integrative complexity: An approach to individuals and groups as information processing systems. Administrative Science Quarterly, 14, 272-285.
Duffy, E., (1962). Activation and behavior. New York, Wiley.
Ettema, J. H., and Zielhuis, R. L., (1971). Physiological parameters of mental load. Ergonomics, 14, 137-144.
Firth, P. A., (1973). Psychological Factors Influencing the Relationship between Cardiac Arrhythmia and Mental Load. Ergonomics, 16, 5-16.
Graham, F. K., (1980). Representing cardiac activity in relation to time. In I. Martin and P.H. Venables (Eds.) Techniques in psychophysiology. New York, Wiley, 192-197.
Guilford, J. P., (1954). Psychometric methods. New York, McGraw-Hill.
Hancock, P.A., Meshkati, N., and Robertson, M. M., (1985). Physiological reflections of mental workload. Aviation, Space, and Environmental Medicine, 56, 1110-1114.
Herbert, A., (1974a). Measurement of perceived work difficulty. Report from the Institute of Applied Psychology, The University of Stockholm, No. 52.
Herbert, A., (1974b). Factors in the perception of work difficulty. Report from the Institute of Applied Psychology, The University of Stockholm, No. 53.
Hyndman, B. W., (1980). Cardiovascular recovery to psychological stress: a means to diagnose man and task? In R. I. Kitney and O. Rompelman (Eds.) The study of heart-rate variability. Oxford, Clarendon Press, 191-224.
Hyndman, B. W., and Gregory, J. R., (1975). Spectral analysis of sinus arrhythmia during mental loading. Ergonomics, 18, 255-270.
Kalsbeek, J. W. H., (1968). Measurement of mental workload and of acceptable load: Possible applications in industry. The International Journal of Production Research, 7, 33-45.
Kalsbeek, J. W. H., (1971). Standards of acceptable load in ATC tasks. Ergonomics, 14, 641-650.
Kalsbeek, J. W. H., (1973). Do you believe in sinus arrhythmia? Ergonomics, 16, 99-104.
Kalsbeek, J. W. H., (1973). Sinus Arrhythmia and the Dual Task Method in Measuring Mental Load. In W. T. Singleton, J. G. Fox, and D. Whitfield (Eds.) Measurement of man at work. London, Taylor and Francis, 101-113.
Kalsbeek, J. W. H., (1982). Personal Communication.
Kalsbeek, J. W. H., and Ettema, J. H., (1963). Scored regularity of the heart rate pattern and the measurement of perceptual or mental load. Ergonomics, 6, 306.
Mackie, R. R., (1977). Introduction. In R. R. Mackie (Ed.) Vigilance: theory, operational performance and physiological correlates. New York, Plenum Press, 1-26.
Meers, A., and Verhaegen, P., (1972). Sinus Arrhythmia, information transmission and emotional tension. Psychol. Belg., XII-7, 45-53.
Meshkati, N., (1983). A conceptual model for the assessment of mental workload based upon individual decision styles. Unpublished Ph.D. Dissertation. University of Southern California.
Meshkati, N., and Loewenthal, A., (1987). An eclectic and critical review of four primary mental workload assessment methods: A guide for developing a comprehensive conceptual model. In P. A. Hancock and N. Meshkati (Eds.), Human Mental Workload. Amsterdam, North-Holland.
Metfessel, M., (1947). A proposal for quantitative reporting of comparative judgments. Journal of Psychology, 24, 229-235.
Moray, N., May (1984). Mental Workload. Proceedings of the 1984 International Conference on Occupational Ergonomics, Toronto, Canada, 41-46.
Neter, J., and Wasserman, W., (1974). Applied linear statistical models: regression, analysis of variance, and experimental designs. Homewood, Ill., Irwin, Inc.
O'Donnell, R.D., July (1979). Contribution of psychophysiological techniques to aircraft design and other operational problems. AGARD-AG-244.
Offerhaus, R., (1973). Variables psycho-physiologiques et psychiatrie. Communication et SELF symposium "Les variables physiologiques dans la recherche sur la charge mentale".
Offerhaus, R. E., (1980). Heart-rate variability in psychiatry. In R. I. Kitney and O. Rompelman (Eds.) The study of heart-rate variability. Oxford, Clarendon Press, 225-238.
Opmeer, C. H. J. M., (1973). The information content of successive R-R Interval times in the ECG. Preliminary results using Factor Analysis and Frequency Analysis. Ergonomics, 16, 45-97.
Pribram, K. H., and McGuinness, D., (1975). Arousal, activation, and effort in the control of attention. Psychological Review, 82, 116-149.
Rohmert, W., (1979). Human work taxonomy for system applications. In N. Moray (Ed.) Mental workload: Its theory and measurement. New York, Plenum Press, 480-483.
Rohmert, W., and Laurig, W., (1971). Work measurement, Psychological and physiological techniques for assessing operator and workload. International Journal of Production Research, 9, 157-168.
Rohmert, W., Laurig, W., Phillip, V., and Luzak, H., (1973). Heart rate variability and workload measurement. Ergonomics, 16, 33-44.
Schneiders, A.A., (1965). Personality Dynamics and Mental Health. New York, Holt, Rinehart and Winston, Inc.
Schroder, H., Driver, M., and Streufert, S., (1967). Human Information Processing. New York, Holt, Rinehart, Winston.
Sheridan, T.B., (1980). Mental workload - What is it? Why bother with it? Human Factors Society Bulletin, 23.
Siegel, S.S., (1956). Nonparametric statistics for the behavioral sciences. New York, McGraw-Hill.
Strasser, H., April, (1977). Physiological measures of workload - correlations between physiological parameters and operational performance. AGARD Proceedings on Methods to Assess Workload, Cologne, FRG. AGARD-CP-216, (A8-1-A8-8).
Strasser, H., (1979). Measurement of mental workload. In N. Moray (Ed.) Mental workload: Its theory and measurement. New York, Plenum Press, 345-348.
Streufert, S., and Streufert, S. C., (1978). Behavior in the complex environment. Washington, D.C., V.H. Winston and Sons.
Tikhomirov, O. K., January (1971). The Structure of Human Thinking Activity. Translated from Russian-language book, Moscow University Printing House, 1969. J. P. R. S. 52199.
Turner, J. A., and Karasek, R. A., (1984). Software ergonomics: effects of computer application design parameters on operator task performance and health. Ergonomics, 27, 663-690.
Welford, A. T., (1978). Mental Workload As a Function of Demand, Capacity, Strategy and Skill. Ergonomics, 21, 157-167.
White, R.T., September, (1971). Task analysis methods: Review and development of techniques for analyzing mental workload in multiple-task situation. Long Beach, California, McDonnell Douglas Corporation. Report No. MDC-J5291.
Wickens, C., (1979). Measures of Workload, Stress and Secondary Tasks. In N. Moray (Ed.), Mental workload: Its theory and measurement. New York, Plenum Press, 79-99.
Winer, B. J., (1971). Statistical principles in experimental design. New York, McGraw-Hill.
Wisner, A., (1973). Electrophysiological measures for tasks of low energy expenditure. In W. T. Singleton, J. G. Fox, and D. Whitfield (Eds.) Measurement of man at work. London, Taylor and Francis, 61-73.
Zwaga, H. J. G., (1973). Psychophysiological reactions to mental tasks: effort or stress? Ergonomics, 16, 61-67.
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1988
FUZZY ANALYSIS OF SKILL AND RULE-BASED MENTAL WORKLOAD
NEVILLE MORAY, PAUL EISEN, LAURA MONEY AND I.B. TURKSEN
Department of Industrial Engineering
University of Toronto
With the introduction of Rasmussen's taxonomy of skill, rule and knowledge based behaviour the question arises of their relative importance as sources of workload. If workload is rated using fuzzy measurement, it can be shown that the ratings approximate an interval scale. Regression models show that the difficulty of a task with both skill and rule based components can be predicted from the ratings of the difficulty of the skill and rule based components measured separately. The major source of difficulty is the skill based component, with the rule based component modulating the overall task difficulty.
INTRODUCTION
A common and successful method for estimating workload is to use subjective judgements, often in the form of a rating scale which requires operators to put a numerical value on the magnitude of workload (Moray, 1982; O'Donnell and Eggemeier, 1986). One of the reasons for introducing Fuzzy Set measurement was to provide a means for humans to express judgements qualitatively but precisely, formalising the use of verbal judgements (Zadeh, Fu, Tanaka & Shimura, 1975). It is therefore appropriate to ask whether fuzzy concepts could be used effectively in measuring subjective estimates of workload. We have already shown that fuzzy analysis can be applied successfully to the analysis of a single task (Moray, Turksen, King and Waterton, 1987). In this paper we turn to its use in a more complex setting.
Rasmussen (1986) introduced a taxonomy of human performance which divided tasks into those using Skill-based, Rule-based and Knowledge-based behaviour. In this paper we investigate the subjective ratings of skill-based workload, rule-based workload, and the workload caused by performing a skill-based and a rule-based task simultaneously.
The paradigm of skill-based behaviour is a highly practised perceptual motor skill, practised to such an extent that its performance has become more or less "automatic". A rule-based task is one where the operator has been trained to recognize different situations, and respond to each with a set of specific actions governed by a set of rules: "if situation Alpha then perform actions B, C and D in that order". Recognition and cognitive decisions are required, but the choice of actions is deterministic. The overall aim of the present study was to see whether the subjective rating assigned to the act of performing skill and rule based tasks together could be predicted from the judgements made about each singly.
Fuzzy set theory relies on subjective estimates to assign values to task situations, where a certain task situation is evaluated according to a specific dimension of interest, say "difficult tasks." After performing a task, a subject estimates how strongly that task belongs to the "set of difficult tasks." The response is termed the "membership" of that task in the set of difficult tasks. The advantage of fuzzy measurements is that descriptions need not be exact, paralleling the way that people think. If a subject believes that a task is difficult, he generally does not know exactly how difficult, but he can certainly perceive the difficulty. His membership estimate is a confidence rating of the extent to which he feels the task is difficult.
A series of tasks, with their corresponding memberships of "difficult," can be mapped onto a set of axes to form a "membership curve," which thus describes the set of difficult tasks in light of the specific tasks experienced. As an example, assume that there are three tasks being estimated on the "difficult" membership scale, which ranges between 0 and 1. Task #1 has been estimated as belonging to difficult tasks with a strength of 0.1, Task #2 with a strength of 0.4, and Task #3 with a strength of 0.9. These values can be placed on a graph to produce a curve as shown in Figure 1.
Figure 1 - Example of Difficult Membership Curve (horizontal axis: Tasks 1 to 3)
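The three-task example can be written down directly as data. The following is only an illustrative sketch in Python: the function name `membership_curve` and the dictionary layout are assumptions made here and are not part of the original study.

```python
def membership_curve(ratings):
    """Return (task number, membership) pairs sorted by task number.

    `ratings` maps a task number to the rated strength with which that
    task belongs to a fuzzy set (here, the set of "difficult tasks").
    The helper and its name are hypothetical, for illustration only.
    """
    for task, strength in ratings.items():
        # Membership strengths are defined on the scale 0 to 1.
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"membership for task {task} must lie in [0, 1]")
    return sorted(ratings.items())


# The three-task example from the text.
difficult = {1: 0.1, 2: 0.4, 3: 0.9}
curve = membership_curve(difficult)
print(curve)
```

Plotting the second element of each pair against the first gives the kind of curve described for Figure 1.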
One of the reasons for the current interest in workload is that many tasks combine both skill based and rule based behaviour. Examples are flying and navigating a light plane or helicopter which has a single person crew, or driving an automobile while obeying traffic signals. Usually subjective workload ratings are made of the task as a whole. But it is of interest to partition workload between the different types of subtask, since some are more easily automated than others. Conversely, it is of interest to see whether the total workload can be predicted from workload ratings of the constituent tasks.
METHOD
In the tasks described as Skill-based, the subjects had to guide a cursor (a "hovercraft") through a maze (or track) using a position-control (zero-order) joystick. The objective was to traverse the entire length of the track while minimizing time and the number of times the side of the hovercraft touched the side of the track. An audible click sounded each time the wall was hit, and the time was continuously displayed on the video monitor, acting as feedback to the controller. One of three track widths -- Width1, Width2, or Width3 -- was used in conjunction with one of three levels of turbulence -- Turb1, Turb2, or Turb3. Turbulence is random noise generated by the computer affecting the movement of the hovercraft. Each combination produces a different level of difficulty for the controller. Because of the turbulence, and because a "hovercraft" is frictionless, the operator must control the craft at all times. Even when the joystick is centered and no motion is commanded, the vehicle will be blown about at random. So the operator must continually apply control, even when doing some other task. This guaranteed that the tasks had to be performed simultaneously.
For the rule based tasks, the subject was required to control up to three variables according to constraints built into the program. The fuel level, the oil level, the oil pressure, or a combination of these had to be incremented or decremented according to the appropriate rule. The subject controlled these variables using the keyboard. A picture of the track is given in Figure 2, and an example of a Rule in Table 1.
Figure 2 - TYPICAL PORTION OF HOVERCRAFT TRACK.

Table 1 - TYPICAL RULE: CHECKING OIL PRESSURE
During the run, the oil pressure will vary automatically by the program. You must ensure that the oil pressure is above a certain level at all times. If the pressure is too low, the program will crash! There is no limit to the number of increments of oil pressure. Keep oil pressure equal to approximately 1.00. As in the oil level section the oil valve must be opened to increment the oil pressure and must be shut afterwards. The following commands allow you to control the oil pressure.

TYPE IN    EFFECT                    DISPLAY
COP        Check Oil Pressure        (Value)
COV        Check Oil Valve           Open/Shut
OO         Open Oil Valve
SO         Shut Oil Valve
IP         Increment Oil Pressure
The task was presented on an Apple IIe computer. Two screens were used. On one an image was displayed showing the hovercraft travelling along the "river" as in Figure 2. The operator could replace that screen with one showing an instrument panel. The latter showed the status of variables such as fuel level, oil pressure, distance run, etc. The operators could toggle between the screens. The difficulty of the skill-based task was controlled by changing the width of the river and the amount of turbulence to which the craft was subjected. During the run fuel was consumed. The dynamics of the craft were such that fuel consumption was greater if the craft was heavily loaded with crew and cargo, if it travelled at a higher speed, and if the oil pressure or engine temperature were abnormal. Fuel could only be taken on board at certain way-points, and if fuel was low, speed would have to be reduced or cargo (or crew) dumped. In addition, on some runs, faults occurred (which were signalled by an alarm at the bottom of the screen or on the control panel) and these had to be managed. Fault management involved a series of key strokes by which fuel or oil pressure or level would be measured, and fuel or oil supplies taken on board, or crew or cargo dumped. These keystroke sequences differed in complexity in the number and variety of keystrokes required. These corresponded to different levels of difficulty in the rule-based mode of behaviour.
Twelve subjects were used. All were male university undergraduates between the ages of nineteen and twenty-three. Nine of the twelve were engineering students. None had previous exposure to either the Apple joystick or the NASA Hovercraft Simulator. All subjects were paid for their time at a rate of five dollars per hour. Five difficulty levels of perceptual-motor skill were used. These varied from a broad river with very little turbulence (Skill 1) to a river hardly wider than the vehicle with very bad turbulence (Skill 5). Skill 5 was almost impossible to perform. The operators received 5 hours of training distributed over the five skill levels, with slightly less time on Skill 1 and more on Skill 4 and Skill 5 than on the two intermediate levels. Following this they received a further five hours of training on rule based behaviour, using a wide river and moderate turbulence. Upon completion of the initial ten-hour training period, the operators were given a set of instructions describing the method of estimating membership functions. They then completed a brief written test that was reviewed by the experimenters to ensure that the subjects had a sufficient understanding of fuzzy estimates. The subjects were then prepared for the first set of tests - the development of the skill and rule based membership curves.
Each subject was tested on all five levels of skill and on five levels of rules, spanning easy to most difficult. This selection was determined by informal conversation with the subjects after the practice runs. Examples of an easy and a hard rule are given in the appendix to this paper. The condition employed for the rule based tests was a perfectly straight track and negligible turbulence (Turb 1), which required virtually no skill based behaviour. The only joystick control movements necessary during these runs were simple adjustments to the joystick position when entering and exiting the checkpoints. In the Combined condition the operators were required simultaneously to control the movement of the vehicle with the joystick while, when necessary, carrying out keyboard rule-based behaviour. On the basis of the single mode tasks, a selection of 3 levels of skills and 2 levels of rules were combined. Unfortunately no Rules were judged to be really difficult, so only Slight and Moderate Rules were tested.
Half of the subjects were tested on the five levels of skills first; the other half performed the rule based tasks first. The skills and rules were not mixed so that the subjects could make consistent judgements within each dimension. The order of administration of the conditions within each dimension was determined by two identical Youden squares (5 conditions x 6 subjects each). The subjects were asked to judge the perceived task difficulty on three scales representing slight, moderate, and difficult tasks. Thus three separate measures were taken on each test run, for a total of 360 data points used to derive the skill and rule based curves. After each run, the operator was asked how strongly the task just performed belonged to the class of tasks having slight difficulty, how strongly to tasks having moderate difficulty, and how strongly to definitely difficult tasks.
For an estimate of combined skill and rule based tasks, each of three skill levels - Skill 1, Skill 4, and Skill 5 (as described above) - was combined with each of two rule levels - renamed Rule 2 and Rule 5 - for a total of six conditions. All subjects were tested on all conditions, administered according to two Latin squares (6 subjects x 6 conditions each). This order was then repeated for all subjects in a second replication. As above, three estimates were provided on every run, so a total of 432 data points were gathered.
RESULTS
Table 2 is a summary of our data. The statistics of the strengths of memberships of slight, moderate, and difficult effort for skill-based, rule-based and combined skill-plus-rule-based tasks are averaged over 12 operators with 2 replications per operator per condition. The meaning of the "Normalised" data will be explained below.
Table 2 - Summary of Subjective Estimates
Mean
Rule 1 Rule 2 Rule 3 Rule 4 Rule J
0.80 0.77 0.75 0.53 0.51
0.33 0.27 0.30 0.53
Comb 1 0.76
0.43 0.51
Comb 2 Comb 3 Comb 4 Comb S Comb 6
0.65 0.36 0.30 0.10 0.06
030
0.S
0.58 0.34 0.22
Normalized Yean
SD
Slight Moderate Difficult Skill 1 0% 0.42 0.21 Skill2 0.44 0.51 0.33 Skill3 0.20 0.49 0.62 Skill 4 0.13 0.47 0.66 Skill3 0.03 0.22 0.90
Sliaht Modemb Difficult 0.18 023 0.14 0.28 0.19 0.22 022 0.15 023 0.08 0.19 0.15 0.01 0.16 0.08
0.10 0.10 0.10 0.24 0.34
0.13 0.24 0.23 0.22 0.23
0.11 0.19 0.42 0.47 0.79 0.90
0.15
Skill 1 - Turb 1 with Width 1
Skill 2 - Turb 2 with Width 2
Skill 3 - Turb 3 with Width 1
Skill 4 - Turb 1 with Width 3
Skill 5 - Turb 3 with Width 3
0.17 0.18
0.19 0.10 0.07
0.19 0.21
0.16 0.18 0.20 0.18 0.23 0.18 0.18 0.20 0.17
Rule 1 - Oil Level
Rule 2 - Fuel Level
Rule 3 - Fuel Level + Oil Pressure
Rule 4 - Oil Level + Fuel Level
Rule 5 - Oil Level + Fuel Level + Oil Pressure
Slight Moderate Difficult 0.00 1.00 0.69 0.61 1.00 0.20 039 0.23 0.93 0.15 0.86 0.65 0.00 0.00 1.00
0.a
1.00 0.90 0.83 0.07 0.00
0.21 0.00 0.11 1.00 0.82
0.10 0.16 024 0.19 0.14 0.10
1.00 0.84 0.43 0.34 0.06 0.00
0.57 0.81 0.99 1.00 0.32 0.00
0.13 0.12 0.10 0.16
0.00 0.00 0.00 0.58
1.00 0.00 0.10 0.39 0.46
0.87 1.00
Comb 1 - Skill 1 with Rule 2
Comb 2 - Skill 1 with Rule 5
Comb 3 - Skill 4 with Rule 2
Comb 4 - Skill 4 with Rule 5
Comb 5 - Skill 5 with Rule 2
Comb 6 - Skill 5 with Rule 5
The statistical analyses were designed (1) to test the relation between the strengths of membership and the levels of task difficulty; (2) to investigate the qualitative and quantitative differences between skill and rule based behaviour; and (3) to fit a predictive model using regression analysis. In order to choose appropriate statistical and mathematical techniques, we tested the data for weak stochastic transitivity and monotonicity. A weak stochastic transitivity test was performed to see whether all the operators ordered the conditions in the same relative order of difficulty. All possible combinations of 3 points on the membership curves were examined to see if they were correctly ordered, and only one condition departed from weak stochastic transitivity by as much as 20%. That is, in almost every case, the membership of "difficult" at level 2 is greater than at level 1, and the membership at level 3 is greater than the membership at level 2, and so on for the other levels of difficulty. (In the case of "slight" the relationship is of course "less than" rather than greater.) We can therefore assume that the data lie at least on an ordinal scale, and ranking is consistent over operators. A test of weak stochastic monotonicity was also carried out to show that the operators could make reliable distinctions between conditions. For example, if there are six successive points on a curve, called a, b, ... f, and $\delta(x, y)$ represents the numerical difference between any two points x and y, then monotonicity tests the relation: if $\delta(a,b) \geq \delta(d,e)$ and $\delta(b,c) \geq \delta(e,f)$ then $\delta(a,c) \geq \delta(d,f)$.
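The monotonicity condition on six successive points can be checked mechanically. The sketch below is a hedged illustration in Python, not the authors' procedure: the function names are hypothetical, and the "numerical difference" between two points is assumed here to be the absolute difference of their membership values.

```python
def delta(x, y):
    """Numerical difference between two membership values.

    Assumption: taken as the absolute difference; the text does not
    say whether the difference is signed.
    """
    return abs(y - x)


def passes_monotonicity(points):
    """Check the condition on six successive curve points a..f.

    If delta(a, b) >= delta(d, e) and delta(b, c) >= delta(e, f),
    then delta(a, c) >= delta(d, f) must also hold.
    """
    if len(points) != 6:
        raise ValueError("expected six successive points a, b, c, d, e, f")
    a, b, c, d, e, f = points
    premise = delta(a, b) >= delta(d, e) and delta(b, c) >= delta(e, f)
    # The condition is an implication: it fails only when the premise
    # holds and the conclusion does not.
    return (not premise) or delta(a, c) >= delta(d, f)


# Hypothetical curves, invented for illustration (not data from the study).
orderly = [0.0, 0.3, 0.7, 0.75, 0.85, 1.0]
reversal = [0.1, 0.5, 0.2, 0.3, 0.4, 0.6]
print(passes_monotonicity(orderly), passes_monotonicity(reversal))
```

Under this reading, a curve that rises steadily always satisfies the condition, and a violation requires a reversal somewhere along the curve, as in the second example.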
In all conditions half or more of the subjects passed this test, and in some conditions as many as 10 or 11 out of 12 satisfied it. We take this to be satisfactory evidence that the data approximate an interval scale, and therefore statistical manipulations may be validly performed which depend on interval measurement. An ANOVA was performed to investigate the effects of modes, levels of difficulty, and individuals. On the base conditions (Skill alone and Rule alone) one way ANOVAs were used. Table 3 shows the results of a test for individual differences among operators. It is clear that different operators found different rules to vary differently in difficulty. By contrast there were no individual differences with respect to skill based behaviour.
TABLE 3. DIFFERENCE BETWEEN OPERATORS.

Subjective Estimate    F-test
Difficult Skill        .944      n.s.
Moderate Skill         .895      n.s.
Slight Skill           .802      n.s.
Difficult Rule         3.035     p < .005
Moderate Rule          3.112     p < .005
Slight Rule            2.006     p < .05
The results show that the null hypothesis that individual differences are insignificant cannot be rejected for skill based behaviour, but can be rejected for rules. For all subjective estimates of the rules, the individual differences are significant. Different subjects find different rules to vary in difficulty. There seems to be no consistent general perception of difficulty when dealing with rule based behaviour. This could be explained by the use of different strategies. If the subjects use different strategies, then it would be logical to conclude that the corresponding abilities to deal with the rules alter the perceived rule based difficulty of the task. Table 4 shows the ANOVA performed on the levels of difficulty. It is clear that the different conditions are indeed of differing difficulty.
TABLE 4. DIFFERENCE BETWEEN LEVELS OF DIFFICULTY.

Subjective Estimate    F-test
Difficult Skill        17.722    p < .0001
Moderate Skill         5.185     p < .005
Slight Skill           26.528    p < .0001
Difficult Rule         4.409     p < .005
Moderate Rule          5.389     p < .005
Slight Rule            7.606     p < .0001
ANOVAs on the Combined (Skill + Rule) conditions showed that there were both (operator x condition) and (Skill x Rule) interactions. The first was significant at better than p < .05 for slight, moderate and difficult conditions; the second was significant only for the Moderate membership function: for judgements of Difficult and Slight there was no interaction. The importance of establishing the existence of interaction is not merely in identifying that it is present, but that it implies the need for an interaction term in the model. That is, in predicting judgements of the perceived difficulty of Combined modes of operation from the two Base modes, we must choose a way to represent interaction. Several interaction terms have been proposed for Fuzzy logic. They include Min/Max, Probabilistic, Bold Union and Intersection, etc. Since we have established that we can treat our fuzzy judgements as on a linear scale, we can seek a regression model of the general form:

$$C(S, R) = a + b\,[\mathrm{Union}(S, R)] + c\,[\mathrm{Intersection}(S, R)]$$

where
S = Skill component membership estimate
R = Rule component membership estimate
C = Combined membership estimate
a, b, c are real constants.
Preliminary examination of the membership curves showed some anomalies. For example, at some points a slight skill combined with a slight rule gave a judgement of slight membership which was more strongly slight than the slight skill by itself. There was consistently more conservatism in the Base estimates than in the Combined conditions. In an attempt to make the relations between the Base and Combined conditions more orderly, the curves were normalized. The largest and smallest membership values for each curve were set to 1 and 0 respectively, and the rest of the points scaled proportionately between them. For example, the averaged Combination Slight membership ranged from 0.06 to 0.76. The membership strengths for this condition were normalized using the following equation:

Normalized Estimate = (Original Estimate - 0.06)/(0.76 - 0.06)

The normalized membership graphs are shown in Figures 3, 4, and 5.
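The normalization described above is an ordinary min-max rescaling, sketched here in Python under stated assumptions. The function name `normalize` is hypothetical; the six input values are the averaged Combined "Slight" memberships as they can be read from Table 2, and they agree with the range 0.06 to 0.76 quoted in the text.

```python
def normalize(values):
    """Rescale a membership curve so its smallest value maps to 0
    and its largest to 1, with the other points scaled in proportion."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a flat curve cannot be normalized")
    return [(v - lo) / (hi - lo) for v in values]


# Averaged Combined "Slight" memberships for Comb 1 to Comb 6,
# as read from Table 2 (range 0.06 to 0.76).
combined_slight = [0.76, 0.65, 0.36, 0.30, 0.10, 0.06]
normalized = [round(v, 2) for v in normalize(combined_slight)]
print(normalized)  # [1.0, 0.84, 0.43, 0.34, 0.06, 0.0]
```

The rounded results match the normalized Combined "Slight" column reported in Table 2.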
Figure 3 - Normalized Skill
Figure 4 - Normalized Rules
Figure 5 - Normalized Combination
A preliminary review of the data showed a clear pattern, which is apparent in Table 2. The overall difficulty is clearly determined by the Skill-based component of the task, while the Rule-based component modulates that overall level. This impression was strongly borne out by the regression model. The best fitting model was found to be the one using Max/Min logic, and the equation is:

$$C = 0.125 + 0.743\,S - 0.042\,R + 0.193\,\mathrm{Min}(S, R)$$
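To see how the fitted coefficients weight the two components, the equation can be evaluated directly. This is a minimal sketch in Python: the function name `predict_combined` is hypothetical, and the final term is read as Min(S, R), which is how the printed equation appears to stand. Inputs are normalized memberships on the same scale (slight, moderate, or difficult) for the Skill and Rule components.

```python
def predict_combined(s, r):
    """Predicted Combined membership from the Skill (s) and Rule (r)
    memberships, using the coefficients of the fitted equation."""
    return 0.125 + 0.743 * s - 0.042 * r + 0.193 * min(s, r)


# A strongly rated Skill with a weakly rated Rule, and the reverse.
skill_dominant = predict_combined(1.0, 0.0)
rule_dominant = predict_combined(0.0, 1.0)
print(round(skill_dominant, 3), round(rule_dominant, 3))
```

The large coefficient on S and the small one on R reproduce the pattern noted in the text: the Skill component sets the overall level, and the Rule component only modulates it.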
The above equation can be used to predict how strongly the Combined task will be rated as a member of "slight" tasks if the Base conditions are rated for "slight"; how strongly it is rated for "moderate" if the Base conditions are rated for "moderate"; or how strongly for "difficult" if the Base conditions are rated for "difficult". The last of these means that we can answer the question, "If you have a task which includes both Skill-based and Rule-based behaviour, and you know how difficult each seems by itself, how difficult will the combination seem, and how do they interact to cause difficulty to change?" The goodness of fit between the predicted and observed data is shown in Figure 6.
Figure 6 - Model Prediction Compared with Experimental Means (vertical axis: Normalized Membership Value; horizontal axis: Test Conditions). The numbers 1 through 18 correspond to the different estimates (slight, moderate, and difficult) of the combination tests, arranged in ascending rank order.
DISCUSSION
Some further comments are required. The first is to do with generality. We do not believe that our results show that rule-based behaviour is inherently less difficult than skill-based behaviour. The terms as originally introduced by Rasmussen (see, e.g., Rasmussen, 1986) imply that skill-based behaviour, which is skilled because it is "automatic", should not impose much, if any, load on an operator. In our experiment, on the contrary, the Skill-based behaviour was the dominant source of difficulty. This might be interpreted to mean that the 5-10 hours experience of the task was not enough to automatise the skill. But it is more likely that any skill, however practised, can, under the right task demands, be difficult. We do not say that a pilot, landing a light plane in heavy turbulence, is unskilled just because he gives a very high rating of difficulty to the task. It is rather that he makes the appropriate control movements without thinking about it. So it is not surprising that the exercise of even a highly practised skill can be judged difficult given certain task demands.
It is interesting that we could not find really difficult rules. This, again, does not mean that none such exist. However in another experiment to be reported at a later date, we have also found evidence to suggest that Rule-based behaviour is less likely to be judged difficult than Skill- or Knowledge-based behaviour. While one could certainly think up an extremely difficult rule ("In order to choose the appropriate level of oil pressure, mentally take the cube root of the speed", for example), it begins to look as though what we may call "Keyboard Rules", requiring a correct sequence of particular but simple perceptual-motor actions, may, when practised, be a surprisingly light source of workload. If this were to be confirmed in a wider context, it would have some important implications for human-machine system design.
Thirdly, the regression model works on averages. If we look in detail at individual performance, a very much less satisfactory picture emerges. For example, take a single operator's data. For a particular level of Skill, say level 1, take the highest membership curve. Do the same for Rule level 2. One of these might be Moderate, the other Slight. Now ask the question in each case, "If the strongest membership in Skill is Slight, and the strongest membership in Rule is Moderate, what will be the strongest membership when Skill and Rule must both be performed?" In other words, at the level of individual data points, does (Slight + Moderate) equal Slight, Moderate, Difficult, or something in between? The answer is enormously variable. We cannot say with any certainty that C = Max (Slight, Moderate), or (Slight + Moderate), etc. This may be of little importance.
While it would be highly satisfactory to be able to make statements of the form "A Slightly difficult Skill plus a Moderately difficult Rule gives a combined task which is definitely Difficult", it is quite sufficient in practice to say, as does the regression equation, "If you ask people to say how strongly the skilled component alone belongs to tasks which are difficult, and ask the same question with regard to the rule component, you will be able to predict how strongly their combination is regarded as difficult."
A final point concerns practicality. In order to develop the regression model, much work is required. The establishment of membership curves, the collection of enough data to test transitivity and monotonicity, and the ANOVAS, are all required to justify the application of regression to initially fuzzy estimates. The labour involved is not trivial, and is not less than that associated with traditional psychological scaling procedure.
APPENDIX
The three rules were the following:

1. OIL LEVEL CONTROL

During the run, the oil level will automatically decrement by the program. It is your responsibility to ensure that a sufficient oil level is maintained. If the oil level is too low the program will crash! Do not increment oil level above 10. The oil valve must be opened to increment the oil level. If it is not closed afterwards oil will spill and I need not reiterate the dire consequences of such activity. The following commands allow you to control the oil level:

TYPE IN    EFFECT                    DISPLAY
COL        Check Oil Level           (Value)
COV        Check Oil Valve           Open/Shut
OO         Open Oil Valve
SO         Shut Oil Valve
IO         Increment Oil Level

2. FUEL LEVEL

During the run, the fuel level will automatically decrement by the program. It is up to you to ensure that enough fuel is in the car to get you to the next checkpoint. This generally means incrementing the fuel level several times to have sufficient fuel. The maximum fuel which can be added at each checkpoint is just sufficient to guarantee that you can, if you control things well, complete the entire run. The fuel valve must be open to increment the fuel level. If it is not closed afterwards fuel will spill and you will not have enough to get you to the next checkpoint: the program will crash! The following commands allow you to control the fuel level:

TYPE IN    EFFECT                    DISPLAY
CFL        Check Fuel Level          (Value)
CFV        Check Fuel Valve          Open/Shut
OF         Open Fuel Valve
SF         Shut Fuel Valve
IF         Increment Fuel Level

3. OIL PRESSURE

During the run, the oil pressure will vary automatically by the program. You must ensure that the oil pressure is above a certain level at all times. If the pressure is too low, the program will crash! There is no limit to the number of increments of oil pressure. Keep oil pressure equal to approximately 1.00. As in the oil level section the oil valve must be opened to increment the oil pressure, and must be shut afterwards. The following commands allow you to control the oil pressure:

TYPE IN    EFFECT                    DISPLAY
COP        Check Oil Pressure        (Value)
COV        Check Oil Valve           Open/Shut
OO         Open Oil Valve
SO         Shut Oil Valve
IP         Increment Oil Pressure
REFERENCES

MORAY, N. 1982. Subjective Workload. Human Factors, 24, 25-40.
MORAY, N., TURKSEN, I.B., KING, B., & WATERTON, K. 1987. A closed loop causal model of work based on a comparison of fuzzy and crisp measurement techniques. Human Factors (in press).
O'DONNELL, R. & EGGEMEIER, T. 1986. Workload assessment methodology. Boff, K., Kaufmann, L. & Thomas, J. (eds). Handbook of Perception and Human Performance, ch. 42, Wiley, N.Y.
RASMUSSEN, J. 1986. Information Processing and Human-machine Interaction. North-Holland, Amsterdam.
ZADEH, L., FU, ., TANAKA, ., & SHIMURA, M. 1975. Fuzzy sets and their applications to cognitive and decision processes. Academic Press, N.Y.

ACKNOWLEDGEMENTS

This work was carried out under NASA grant NAGW-429 to study the Fuzzy Set Analysis of Workload, contract monitor S. Hart, NASA-AMES.
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1988
TOWARD DEVELOPMENT OF A COHESIVE MODEL OF WORKLOAD
N. Meshkati
Human Factors Department
Institute of Safety and Systems Management
University of Southern California
Los Angeles, CA 90089
ABSTRACT
Mental workload is generally considered to be a multidimensional construct. A preliminary, cohesive and multidimensional model incorporating factors affecting operators' mental workload is proposed. The model consists of two major sections -- Causal Factors and Effect Factors -- each with two primary component groups: Task and Environmental Variables and Operator's Characteristics and Moderating Variables, and Difficulty, Response and Performance Measures and Mental Workload Measures, respectively. This model attempts to demonstrate the relationship and interaction of intrinsic task variables, operator's cognitive processes and information processing behavior on the mental workload construct.
I. PRESENT STATUS OF MENTAL WORKLOAD THEORIES
There are numerous theories attempting to detine mental workload and its major components, and to demonstrate the existing interrelationships (e.g., Jahn, 1973: Welford, 1978; Sanders, 1979; Sheridan and Simpson, 1979; Johannsen, Phendler arid Stein, 1979). These models generally consider mental workload as a multidimensional construct that reflects the interaction of such elements as task and system demands, operator processing capabilities and effort, subjective performance criteria, operator information processing behavior and strategies, and tinally, operator training and prior experience (cf., Eggemeier and O'Donnell. 1982). The concept of multidimensionality is also supported by Wierwille and Williges (1978). White (1971), Gopher (1978). and Meshkati ( 1983). Furthermore, some investigators also maintain that several dimensions of workload such as "input load" or "system load" are themselves multiattributal; e.g., according to Jahns (1973), there are three categories of input load: environmental, situational, and procedural. Thus, one of the fundamental dimensions of a cohesive mental workload theory, which is also a necessary (but not sufficient) condition of viability, should be the unignorable fact of multidimensionality. An observed common aspect of all mental workload assessment methods is their relative and differential sensitivity to the factor of individual differences (and personality traits) There are numerous empirical studies which acknowledge and consequently report the significant effects of this highly influential variable, [c.f., Meshkati, Hancock, and
Robertson (1984); Meshkati and Loewenthal (1987a)]. A representative sample of these studies includes the works of: Mulder and Mulder (1973); Borg, Bratfisch, and Dornic (1971); Firth (1973); Thackray, Jones and Touchstone (1973); Huddleston (1974); Gibson and Curran (1974); Leplat (1978); Hopkin (1979); Hamilton, Mulder, Strasser, and Ursin (1979); Hauser, Childress, and Hart (1982); Robertson (1984); Damos and Bloem (1985); Robertson and Meshkati (1985); Meshkati and Robertson (1985); and Meshkati and Loewenthal (1987b). Moreover, in addition to individual information processing behavior, the motivational level of the human operator also affects the experienced mental workload level and, consequently, is reflected in the subjective assessment (Borg, 1978) and in the physiological responses of the human operator, e.g., differential ERPs for motivated versus non-motivated individuals (Sutton and Tueting, 1975) and differential heart rate variabilities (Kalsbeek and Sykes, 1967). Thus, in order to avoid the "reactive" act of "calibration" of the particular workload measurement techniques based on individual differences [as proposed by Wickens (1979)], a comprehensive proactive mental workload model should systematically take this variable into account. Considering the individual differences variable in the development of a cohesive mental workload model is also strongly recommended by Moray (1984, p. 44): "Without taking this (individual difference) into account, we are seriously delaying the development of a useful measure." A multifaceted individual differences classification model is capable of tracking changes in operator strategies in task performance due to overload and/or underload conditions (Meshkati and Driver, 1984). This would satisfy Wickens' (1984b) recommendation concerning "strategically adapting systems" and their impact on mental workload models.
A comprehensive mental workload model should also include provision to prognosticate task loading factors and prescribe the most appropriate measurement technique(s) accordingly. This is an adaptive, on-line and dynamic process. Coupling this with the required diagnosticity characteristic of the measurement method (see Criteria for a Cohesive Workload Model) determines the choice of measurement technique for the particular intended objective. For instance, if the goal of an evaluation is to determine whether a workload problem exists at all, the model, based on task loading factors (i.e., cognitive, psychomotor, visual, auditory), identifies the most appropriate measurement technique, which could be a "global" measure. However, if information about the differential loading levels of different system designs is required, the model utilizes more diagnostic techniques (for discussion of global and diagnostic measures, see O'Donnell and Eggemeier, 1986).
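The selection logic described in the preceding paragraph can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names `Goal`, `LOADING_FACTORS`, and `select_technique` are hypothetical, and the returned labels are placeholders for classes of technique rather than recommendations of any specific measure.

```python
from enum import Enum


class Goal(Enum):
    """Hypothetical evaluation goals paraphrased from the text."""
    SCREENING = "does a workload problem exist at all?"
    COMPARE_DESIGNS = "how do system designs differ in the load they impose?"


# Task loading factors named in the text.
LOADING_FACTORS = ("cognitive", "psychomotor", "visual", "auditory")


def select_technique(goal, loading_factors):
    """Return placeholder technique classes for a goal (illustrative rule only)."""
    unknown = [f for f in loading_factors if f not in LOADING_FACTORS]
    if unknown:
        raise ValueError(f"unknown loading factor(s): {unknown}")
    if goal is Goal.SCREENING:
        # A single global measure is enough to flag that a problem exists.
        return ["global measure"]
    # Otherwise request one diagnostic measure per loaded resource.
    return [f"diagnostic measure ({f})" for f in loading_factors]
```

For example, `select_technique(Goal.COMPARE_DESIGNS, ["visual", "auditory"])` returns one diagnostic placeholder per loaded resource, whereas a screening goal returns a single global placeholder regardless of the factors supplied.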
Figure 1 depicts the major components of a cohesive workload model and its related measurement variables. This model consists of two sections -- Causal Factors and Effect Factors -- each with two primary component groups. The two primary groups of Causal Factors are: Task and Environmental Variables and Operator's Characteristics and Moderating Variables. The two primary groups of Effect Factors are: Difficulty, Response and Performance Variables and Mental Workload Measures.
[Figure 1. Major components of a comprehensive mental workload model and related assessment variables. Panels: Task & Environmental Variables; Operator's Characteristics & Moderating Variables; Difficulty, Responses, & Performance; Mental Workload Measures.]

Task and Environmental Variables: This group of influential variables includes: Task Criticality and its effects on the ascribed utility of the operator on the outcomes; Physical and Psychological Environmental factors (e.g., noise, vibration, heat, G-force, illumination, threat); intrinsic task-related variables including Amount of Information (e.g., number of messages); Time Pressure (e.g., required vs. available time); Task Structure and its rigidity (e.g., decoding requirements, decision making vs. problem solving); Task Novelty (to the operator); Task Rate/Frequency; Equipment Used; and type of Reward System.

Operator's Characteristics and Moderating Variables: This group includes: Cognitive Capabilities of the individual (e.g., intellect); Motivational States and Personal Utility System (e.g., goals, feedback orientation, attitude toward the task and the utility of the task and its outcomes); Past Experience and Training; Cognitive Complexity, Decision Styles and Personality Traits (e.g., manual dexterity, physical fitness, etc.); Sensory Capabilities (e.g., visual and hearing capabilities); and Arousal State (e.g., level of activation, tolerance of ambiguity and uncertainty, degree of expectancy, experience).

Difficulty, Responses, and Performance: In this grouping we find variables such as: Perceived Task Difficulty and Objective Task Difficulty [cf. Borg (1978) and Borg et al. (1971)]; and Performance determinants (e.g., fatigue, hearing, speed, and accuracy).

Mental Workload Measures: Here we find: Physiological Measures including Autonomic Nervous System-related measures (ANS) and Central Nervous System-related measures (CNS); Subjective Measures (e.g., subjective response); and Performance Measures (e.g., performance time vs. standard time).

Figure 1 attempts only to present a preliminary concept of an ideal cohesive mental workload (assessment) model. Moreover, it intends to map and display the major interrelationships of the interacting variables.
2. COHESIVE MENTAL WORKLOAD MODEL AND CONCURRENT TASKS

There is no doubt that the operator in the modern automated system has to attend to several concurrent tasks. Therefore, the cohesive mental workload model should also consider this fact. There are conceptually different frameworks facilitating analysis of such paradigms. These include: (1) theories which attribute processing restrictions in the human system to limits of a single processing channel (e.g., Broadbent, 1958), or of a single pool of processing resources (e.g., Moray, 1967); and (2) theories which favor a multiple-resource approach to human capacity limitations (Wickens, 1980, 1984a). Available data provide some support for the dimensions of information processing, modalities of perception and codes of information processing and response time. However, according to O'Donnell and Eggemeier (1986), more extensive data are required before definitive conclusions can be drawn regarding the number and types of dimensions required by the multiple resource theory. Multiple resource theory in the operational environment encounters several implementational problems:
I. Overlap in processing resources between the primary and secondary tasks affects the sensitivity of the secondary task measurement technique used. To tackle this problem, several investigators (e.g., Gopher, 1978; Wickens, 1979) have suggested the alternative of establishing a battery of secondary tasks, each tapping a different resource, which would be applied to different primary tasks.
II. Several investigators (e.g., Damos, 1977; Gopher and North, 1977) have reported the development of specific timesharing capabilities during concurrent task performance subsequent to practice on the individual tasks themselves. An important implication of this type of finding is that when such timesharing strategies significantly influence either primary or secondary task performance, workload estimates and conclusions will be specific to that type of primary-secondary task combination (cf. O'Donnell and Eggemeier, 1986).
III. There are a number of sources of single-to-dual task performance decrements that are not directly related to the capacity or resource expenditure associated with either primary or secondary tasks (e.g., Kantowitz and Knight, 1976; Navon and Gopher, 1979; Roediger, Knight and Kantowitz, 1977). These sources have been referred to as "qualitative changes" in single-to-dual performance (Roediger et al., 1977) or "concurrence costs" (Navon and Gopher, 1979). Nonresource interference can be related to such factors as: (1) interference between primary and secondary tasks occasioned by competition for structures or mechanisms within the processing system (e.g., a single memory or motor system); and (2) capacity or resource expenditure unrelated to either task individually, but necessary to coordinate, schedule, or facilitate the concurrent performance. Another observed fact, unrelated to resource expenditure (in secondary task performance), is the possibility that operators will vary their allocation of processing resources to a task as a function of (or in response to) changes in non-task-related conditions. These facts led some investigators (e.g., Gopher, Brickner, and Navon, 1982; Navon and Gopher, 1979, 1980) to propose that the nature of interactions between concurrently performed tasks can best be investigated if both task difficulty and task emphasis are jointly manipulated in a dual task situation.

As mentioned before, a cohesive mental workload model should take concurrent task performance into account, but it should not rely solely on measures which are subject to high variability and narrow in their focus and application. This fact may have led Wierwille (1987) to conclude that "the study of workload in dual tasks is an endless endeavor."
3. CRITERIA FOR A COHESIVE WORKLOAD MODEL
The following criteria for development of a mental workload model could also be used as a guideline for selecting the appropriate measure associated with the application of the model to the operational environment.

(1) Validity
A "good" mental workload model should satisfy three validity constraints: content, predictability, and construct.
(2) Reliability
Reliability in this context refers to repeatability, consistency, and stability of the model and its measurement variable over time and across representative trials.
(3) Sensitivity
Sensitivity refers to the capability of a technique to discriminate significant variations in the workload levels imposed by a task or group of tasks (Eggemeier, 1985).

(4) Diagnosticity
Diagnosticity refers to the capability of a technique to discriminate the amount of workload imposed on different resources or capabilities of the human operator (e.g., cognitive versus motor resources) (Wickens, 1984; Eggemeier and O'Donnell, 1982).

(5) Obtrusiveness
The mental workload measurement technique should not interfere with or cause degradations in ongoing primary task performance.
(6) Focused
The measurement technique should be focused only on the changes in the mental workload levels and should not reflect changes in physical load or artifacts caused by variations in environmental conditions.
(7) Ease of Field Utilization
A "good" mental workload measurement technique should be easily transferable from the laboratory environment to the field situation. Factors which contribute to making a technique cumbersome include: instrumentation, analyst and operator training, and data recording and analysis.
(8) Operator Acceptance
The success of a mental workload model is wholly dependent on operator acceptance and sincere cooperation. This suggests that the psychological profile of the typical end-user population should be understood prior to the development of any measurement technique.

4. REFERENCES

Borg, G. (1978). Subjective aspects of physical work. Ergonomics, 21, 215-220.
Borg, G., Bratfisch, O. and Dornic, S. (1971). On the problem of perceived difficulty. Scandinavian Journal of Psychology, 12, 249-260.
Broadbent, D. (1958). Perception and Communication. Oxford: Pergamon Press.
Damos, D.L. (1977). The development and transfer of timesharing skills. Proceedings of the 21st Annual Meeting of the Human Factors Society, San Francisco, CA.
Damos, D.L. and Bloem, K.A. (1985). Type A behavior pattern, multiple-task performance and subjective estimation of mental workload. Bulletin of the Psychonomic
Society, 23(1), 53-56.
Eggemeier, F.T. (1985). Workload measurement in system design and evaluation. Proceedings of the 29th Annual Meeting of the Human Factors Society, Santa Monica, CA.
Eggemeier, F.T. and O'Donnell, R.D. (1982). A conceptual framework for development of a workload assessment methodology. Text of the remarks made at the 1982 American Psychological Association Annual Meeting.
Firth, P.A. (1973). Psychological factors influencing the relationship between cardiac arrhythmia and mental load. Ergonomics, 16(1), 5-16.
Gibson, H.B. and Curran, J.D. (1974). The effect of distraction on a psychomotor task studied with reference to personality. Irish Journal of Psychology, 2(3), 148-158.
Gopher, D. (1978). Human performance and residual capacity. Proceedings of the Airline Pilot's Association Symposium on Man-Systems Interface: Advances in Workload Study, Washington, D.C., 6-20.
Gopher, D., Brickner, M. and Navon, D. (1982). Different difficulty manipulations interact differently with task emphasis: Evidence for multiple resources. Journal of Experimental Psychology: Human Perception and Performance, 8, 146-157.
Gopher, D. and North, R.A. (1977). Manipulating the conditions of training in timesharing performance. Human Factors, 19, 583-593.
Hamilton, P., Mulder, G., Strasser, H. and Ursin, H. (1979). Final report of the physiological psychology group. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 367-385.
Hauser, J.R., Childress, M.E. and Hart, S.G. (1982). Rating consistency and component reliance in subjective workload estimation. Paper presented at the 18th Annual Conference on Manual Control, Dayton, Ohio.
Hopkin, V.D. (1979). General discussion based upon interactive group sessions. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 484-487.
Huddleston, H.F. (1974). Personality and apparent operator capacity. Perceptual and Motor Skills, 38, 1189-1190.
Jahns, D.W. (1973). A concept of operator workload in manual vehicle operations. Forschungsinstitut für Anthropotechnik, Bericht Nr. 14, Meckenheim.
Johannsen, G., Pfendler, C. and Stein, W. (1976). Human performance and workload in simulated landing-approaches with autopilot failures. In T.B. Sheridan and G. Johannsen (Eds.), Monitoring Behavior and Supervisory Control. New York: Plenum Press.
Kalsbeek, J.W.H. and Sykes, R.N. (1967). Objective measurement of mental load. Acta Psychologica, 27, 253-261.
Kantowitz, B.H. and Knight, J.L. (1976). Testing tapping timesharing, II: Auditory secondary task. Acta Psychologica, 40, 343-362.
Leplat, J. (1978). Factors determining workload. Ergonomics, 21(3), 143-149.
Meshkati, N. (1983). A conceptual model for the assessment of mental workload based upon individual decision styles. Unpublished Ph.D. dissertation, University of Southern California.
Meshkati, N. and Driver, M.J. (1984). Individual information processing behavior in perceived job difficulties: A decision style and job design approach to coping with human mental workload. In H.W. Hendrick and O. Brown (Eds.), Human Factors in Management and Organizational Design, I. Amsterdam: North-Holland.
Meshkati, N., Hancock, P.A. and Robertson, M.M. (1984). The measurement of human mental workload in dynamic organizational systems: An effective guide for job design. In H.W. Hendrick and O. Brown (Eds.), Human Factors in Management and Organizational Design, I. Amsterdam: North-Holland.
Meshkati, N. and Loewenthal, A. (1987a). An eclectic and critical review of four primary mental workload assessment methods: A guide for developing a comprehensive conceptual model. In P.A. Hancock and N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.
Meshkati, N. and Loewenthal, A. (1987b). The effects of individual differences in information processing behavior on experiencing mental workload and perceived task difficulty: A preliminary experimental investigation. In P.A. Hancock and N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.
Meshkati, N. and Robertson, M.M. (1985). Individual differences in experiencing mental workload: A guide for cockpit design evaluators. Proceedings of the 8th Annual Meeting of the Los Angeles Chapter of the Human Factors Society, Los Angeles, CA.
Moray, N. (1984). Mental workload. Proceedings of the 1984 International Conference on Occupational Ergonomics, Toronto, Canada, 41-46.
Moray, N. (1967). Where is attention limited? A survey and a model. Acta Psychologica, 27, 84-92.
Mulder, G. and Mulder-Hajonides van der Meulen, W.R.E.H. (1973). Mental load and the measurement of heart rate variability. Ergonomics, 16(1), 69-83.
Navon, D. and Gopher, D. (1980). Task difficulty, resources, and dual task performance. In R. Nickerson (Ed.), Attention and Performance VIII. Hillsdale, New Jersey: Erlbaum.
Navon, D. and Gopher, D. (1979). On the economy of the human processing system.
Psychological Review, 86, 214-255.
O'Donnell, R.D. and Eggemeier, F.T. (1986). Workload assessment methodology. In L. Kaufman, J. Thomas and K. Boff (Eds.), Handbook of Perception and Human Performance. New York: Wiley Press.
Robertson, M.M. (1984). Personality differences as a moderator of mental workload behavior: Mental workload performance and strain reaction as a function of cognitive complexity. Proceedings of the 28th Annual Meeting of the Human Factors Society, Santa Monica, CA.
Robertson, M.M. and Meshkati, N. (1985). Analysis of the effects of two individual differences classification models on experiencing mental workload of a computer-generated task: A new perspective to job design and task analysis. Proceedings of the 29th Annual Meeting of the Human Factors Society, Santa Monica, CA.
Roediger, H.L., Knight, J.L. and Kantowitz, B.H. (1977). Inferring decay in short-term memory: The issue of capacity. Memory and Cognition, 5, 167-176.
Sanders, A.F. (1979). Some remarks on mental workload. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 41-71.
Sheridan, T.B. and Simpson, R.W. (1979). Toward the Definition and Measurement of the Mental Workload of Transport Pilots: FTL Report No. R79-4. Cambridge: Massachusetts Institute of Technology, Flight Transportation Laboratory.
Sutton, S. and Tueting, P. (1975). The sensitivity of the evoked potential to psychological variables. In P.H. Venables and M.J. Christie (Eds.), Research in Psychophysiology. New York: Wiley and Sons, 351-363.
Thackray, R.I., Jones, K.N. and Touchstone, R.M. (1973). Personality and physiological correlates of performance decrement on a monotonous task requiring sustained attention. FAA Office of Aviation Medicine: Report No. AM-73-14. Washington, D.C.
Welford, A.T. (1978). Mental workload as a function of demand, capacity, strategy and skill. Ergonomics, 21(3), 157-167.
White, R.T. (1971). Task analysis methods: Review and development of techniques for analyzing mental workload in multiple-task situations. McDonnell Douglas Corporation: Report No. MCD-15=, Long Beach, CA.
Wickens, C.D. (1984a). Processing resources in attention. In R. Parasuraman and R. Davies (Eds.), Varieties of Attention. New York: Academic Press.
Wickens, C.D. (1984b). Engineering Psychology and Human Performance. Columbus, Ohio: Charles E. Merrill Publishing Co.
Wickens, C.D. (1980). The structure of attentional resources. In R. Nickerson (Ed.), Attention and Performance VIII. Hillsdale, New Jersey: Erlbaum.
Wickens, C.D. (1979). Measures of workload, stress and secondary tasks. In N. Moray (Ed.), Mental Workload: Its Theory and Measurement. New York: Plenum Press, 79-99.
Wierwille, W.W. (1987). Important remaining issues in mental workload estimation. In P.A. Hancock and N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.
Williges, R.C. and Wierwille, W.W. (1978). Survey and analysis of operator workload assessment techniques. Systemetrics, Inc., Final Technical Report No. S-78-101, Blacksburg, Virginia.
HUMAN MENTAL WORKLOAD
P.A. Hancock and N. Meshkati (Editors)
© Elsevier Science Publishers B.V. (North-Holland), 1988
IMPORTANT REMAINING ISSUES IN MENTAL WORKLOAD ESTIMATION

Walter W. Wierwille
Human Factors Engineering Center
Department of Industrial Engineering and Operations Research
Virginia Polytechnic Institute and State University
Blacksburg, Virginia 24061
USA

Mental workload research, having made initial strides, now appears on the verge of bogging down in details. It is important that momentum in this research area not be lost. Therefore, several new, practical research problems are proposed and justified. Each research problem is presented as an entity. If investigated, the research problems have a high likelihood of payoff and potential for direct application.
INTRODUCTION

In any field of research there is a tendency for researchers to get caught up in the day-to-day problems of proposing, planning, and conducting research experiments and in the documenting and presenting of results. Researchers at most institutions are involved in similar activities, and this creates a degree of competition and a strong impetus to push forward and get experiments done and published. These activities tend to shorten the researcher's horizon and cause him or her to concentrate on the problems immediately at hand. This leaves little time to examine long range goals.

When the editors of this book asked me to contribute a chapter, it seemed at first that perhaps the best contribution would be to report on a specific study or summarize several studies on workload. However, after additional thought it seemed that what was really needed was a chapter on important problems in workload estimation that have received little or no previous attention. Such a chapter might be helpful to others seeking to shift directions somewhat. It might also be helpful to researchers wishing to get started in workload research but in a new area. While there have been numerous literature reviews on workload estimation over the past decade, there appear to have been no papers on future directions that workload research might take. Thus, as a group we researchers appear to have good hindsight but poor foresight.

It is a bit brash for any one researcher to suggest what future directions research should take, and anyone who does so runs the risk of appearing arrogant in front of his or her peers. A committee of at least five should have written this chapter, with the members carefully selected to represent industry, government, and universities as well as engineering and behavioral science, as well as males and females. The findings and recommendations might have been more evenhanded and thorough if done by committee. However, most such committees are reasonably democratic, and I might
have been outvoted in regard to what should be included!

Each individual issue is taken up as a section in the remainder of the chapter. Each issue stands alone and does not depend on the others. The only common thread is a bias toward topics having eventual applicability in system design and evaluation. Workload should not be treated as a pure science. It is an applied science or technology. In any case, whether there is argument or disagreement on the importance of the issues presented, the hope is that their presentation will provoke thought and commentary on new directions for workload research.

THE IMPORTANCE OF MULTIPLE EXPERIMENTS

In workload estimation research most investigators perform one experiment at a time, draw conclusions, and then report the results in the form of a presentation or published paper. Theoretical constructs are developed on the basis of the results, and hopefully, the technology and science of workload estimation are advanced incrementally.

Unfortunately, this more-or-less standard process seems to be causing more problems than it solves. Workload researchers, of which there are perhaps a hundred, the majority in the U.S., have been producing competent technical papers for the past decade. While there is little question that our understanding of workload has increased somewhat, it has not increased in proportion to the number of publications. In fact, there seems to be a decreasing rate of return on time and effort expended. The experiments seem to become more specific and more detailed, while the results obtained appear less usable in applications.

To be sure, in a new field of research the major advances are usually made in the early stages with smaller advances following as the years pass. Eventually all of the important ground gets plowed and researchers move on to a new field. Each of us can name landmark workload studies (though we may not be in total agreement) done during the past decade, but these studies were ones which dealt in generalizable results instead of specifics. They were studies which supposedly provided a basis for further work and refinement. Now work is going forward on the specifics. Unfortunately, what we're finding is that the specifics are so varied and in some cases contradictory that there are few if any further general conclusions to be drawn. Many would agree that workload research is now mired down in its own details and is advancing more slowly.

Is there any way out of this dilemma? Do we continue going after details until administrators become tired of allocating funds for research in the area, or is there an alternative approach that can be taken that can redirect work and make it more useful? If such a redirection is to be undertaken it must have the backing of the majority of researchers in the workload area. Otherwise it will not occur.

While there may be several causes for the lack of progress in the workload area, it does appear that there is at least one major cause. This major cause is not "lack of a workload definition", it is not "lack of importance" of the workload area, it is not usually lack of good experimental
method. Obviously t h e s e may have caused some of t h e problems, b u t they a r e n o t t h e major problem. I t a p p e a r s t h a t t h e major c a u s e of t h e l a c k of advancement of knowledge is l a c k of g e n e r a l i t y of r e s u l t s . Experiments have become v e r y s p e c i f i c , and have focused on d e t e r m i n i n g i n t e r a c t i v e e f f e c t s i n d u a l O K t r i p l e t a s k s , and s i m i l a r t y p e s of e x p e r i m e n t a l paradigms. The problem is t h a t of c o u r s e s u b j e c t s o f t e n respond d i f f e r e n t l y t o d i f f e r e n t combinations of t a s k s elements. Of c o u r s e t h i s needs t o be examined c a r e f u l l y , and of c o u r s e t h e r e is l i t t l e o r n o chance of g e n e r a l i z i n g t h e r e s u l t s . But i f t h e r e s u l t s are n o t g e n e r a l i z a b l e , have we r e a l l y added much t o t h e o v e r a l l u n d e r s t a n d i n g of workload? The s t u d y of workload i n d u a l t a s k s is an e n d l e s s endeavor. I f we assume j u s t twenty f i v e p o s s i b l e t a s k s along one dimension and a n o t h e r twenty f i v e a l o n g a second dimension, t h e r e are 625 e x p e r i m e n t s t h a t must be run t o c o v e r a l l t h e combinations. T h i s assumes we're n o t concerned a b o u t i n s t r u c t i o n s , t a s k omphasis, s t i m u l u s and r e s p o n s e m o d a l i t y , and a number of o t h e r f a c t o r s t h a t can e f f e c t t h e outcomes. The problem very q u i c k l y g e t s o u t of hand. Aside from t h e d u a l - t a s k d i m e n s i o n a l i t y problem, t h e r e is a second major problem, t h a t of i n d i v i d u a l d i f f e r e n c e s . Most r e s e a r c h e r s would a g r e e t h a t human o p e r a t o r s v a r y enormously i n t h e way they p e r c e i v e , m e d i a t e , and respond t o t a s k s . 
T r a i n i n g , e x p e r i e n c e , p e r s o n a l i t y , range of a p t i t u d e s , g e n d e r , e m o t i o n a l s t a t e , a g e , p e r c e p t u a l s t y l e , sensory-motor a b i l i t i e s , s e m a n t i c i n t e r p r e t a t i o n of i n s t r u c t i o n s and d e f i n i t i o n s , and e t h n i c and c u l t u r a l d i f f e r e n c e s are some of t h e f a c t o r s e n t e r i n g i n t o t h i s v a r i a t i o n . I n d i v i d u a l d i f f e r e n c e s have been r e c o g n i z e d e v e r s i n c e r e s e a r c h e r s began using s t a t i s t i c s i n b e h a v i o r a l s t u d i e s . Indeed, one of t h e major r e a s o n s f o r u s i n g good e x p e r i m e n t a l d e s i g n and s t a t i s t i c a l methods is so t h a t c o n c l u s i o n s can be drawn about t h e g e n e r a l ( o r u s e r ) p o p u l a t i o n even though t h e r e a r e i n d i v i d u a l d i f f e r e n c e s w i t h i n t h e sample p o p u l a t i o n . I t seems t h e n t h a t OUK e f f o r t s have gone from t h e g e n e r a l t o t h e s p e c i f i c , and t h a t w h i l e we have l e a r n e d a good d e a l , we a r e n o t much c l o s e r t o any g e n e r a l i t i e s than we were s e v e r a l y e a r s ago. Some might a r g u e t h a t in f a c t t h e r e are no g e n e r a l i t i e s i n workload, and t h a t e v e r y e x p e r i m e n t a l s i t u a t i o n and o p e r a t o r is unique i n r e g a r d t o workload. But i f t h i s is so, we should simply admit t h a t workload e v a l u a t i o n is f u t i l e and move on t o o t h e r problems t h a t are s o l v a b l e . The f a c t is t h a t t h e r e a r e some g e n e r a l i t i e s . A v a r i e t y of workload t e c h n i q u e s have now been demonstrated t o have g l o b a l s e n s i t i v i t y t o o p e r a t o r l o a d i n g . 
Other techniques are known to work well in general categories of tasks, such as manual control (Hart, 1975; Reid, Shingledecker, and Eggemeier, 1981; Wierwille, Casali, Connor, and Rahimi, 1985). Furthermore, understanding of what "drives" various measures is beginning to emerge. It can be safely said that progress has been made. How were these general results obtained and why has progress appeared to slow? The answer to this question is pivotal in deciding where we go from here.
It seems that the general results have been obtained by performing multiple experiments covering different aspects of human behavior in systems. No single experiment is sufficient to draw general conclusions. Multiple experiments must be run before the generalities begin to appear. As indicated at the beginning of this section, researchers tend to perform one study at a time and then report it. And, as the experiments have become
more detailed, they have also become less transferable or generalizable. Perhaps the detailed results of a given experiment are better understood, but actually little overall advancement of knowledge in the general area of workload has been made.
The key, then, seems to be that several experiments must be performed before any general conclusions can be drawn. And, unless the results are generalizable across experiments, they are not very useful. As researchers we employ proper experimental design and statistical techniques so that we can generalize the effects of a specific experiment to population means. However, such procedures do not allow us to draw inferences about other experiments or other situations. Therefore, multiple experiments must be conducted. For generalizability across situations, the experiments using different situations must show a consistent pattern. If there is no such pattern, then the results are not generalizable and are of limited value in terms of workload evaluation.
It is suggested then that experimenters run groups of experiments, with each individual experiment directed toward a given application area. Results should then be reported for the individual experiments as well as those that appear consistent across experiments.
The latter are by far the most important, for they indicate the degree of generality in the results. While planning and carrying out multiple experiments is time-consuming and costly, such a procedure should in the long run actually produce more rapid progress in the study of workload.
THE CONCEPT OF FULL MENTAL LOAD AND ITS IMPLICATIONS FOR SYSTEM DESIGN
When the load that the human operator of a system experiences is primarily cognitive, the operator's response to that load can be viewed as fundamentally different from that for other types of load. The difference is that with cognitive load there may be a relatively inelastic upper limit. Examination seems to indicate that indeed there is a fundamental difference in most types of cognitive load as compared with other types of load. The difference is that the operator is fully loaded until the problem at hand is solved or resolved. Thereafter, load drops rapidly until the next problem situation arises.
To be sure, as Siegel and Wolf (1969) have indicated, stress affects problem solving ability. Under mild stress, an operator can reduce the time necessary to solve or resolve a problem. But there is a limit to this time reduction, after which problem solving degrades rapidly.
Operator workload in a system can be viewed as resulting from demanding that the operator perform more than a "full load". In rough terms a full load is the maximum cognitive load the operator can handle with adequate time available. For example, in navigation problems, it takes a certain amount of time to obtain the correct answer. During the time that the operator is solving the problem, he or she is fully occupied.
Any additional tasks or other distractions will simply increase the amount of time required to obtain the correct answer. If the time available to obtain the answer is shorter than the time required by the operator to obtain that answer, the
operator is likely to experience both stress and high workload (or overload).
The main point being made here is that for many real tasks which involve cognitive load, there is actually a dichotomous form of operator performance. Either the operator is fully occupied in a problem solving task, or the operator is idling at relatively low load, waiting for the next problem to appear or occur. Furthermore, if time available is shorter than time required, the operator experiences stress, overload, and potentially high error rates.
Unfortunately, our present models and constructs do not seem to fit this form of cognitive load accurately. Present rating scales have rating levels which transform operator loading into numerical values. There are no descriptors such as full load, overload, and idling load. Furthermore, while many investigators have recognized the importance of time-stress, somehow our current methods of measurement do not seem to correctly assess this aspect. Namely, they do not isolate time-stress and its association with full load. There is an inherent assumption that shortening the time to perform a task may cause a more-or-less proportional increase in workload. On the other hand, if the concept of full load is a valid one, shortening the time to perform a task should cause an abrupt shift from satisfactory (or acceptable) load to overload.
In other words, once the time available to solve or resolve a problem is less than that required, a shift should occur from satisfactory to unsatisfactory load.
Good operators do learn to handle overloads using various strategies. These may include estimating an answer or correct response rather than actually solving for one, delaying the performance of less important tasks, or delaying performing the cognitive task at hand until there is sufficient time to perform it correctly. However, when any of these strategies is in use, the operator is effectively overcoming system deficiencies in terms of imposed task demands. If the system requires this type of behavior on the part of the operator, it is an indication that the system is not well designed.
Another way of looking at this situation is that as workload researchers we are attempting to examine the wrong problem. Instead of evaluating workload using metrics of various kinds, we should focus some of our attention on the design characteristics of systems that do not impose more than full cognitive load.
Historically the workload problem arose because of complaints and uneasiness about the way systems were imposing loads on operators. Questions were then asked about how workload should be measured, and then researchers went out and started researching workload estimation.
While on the surface this may appear to be a logical progression of events, it may not in fact be logical. The real cause of the problem is inadequate attention to operator loading in system design. Systems were designed, workload problems then arose, and then human factors researchers were called in to solve the problems. If the systems had been properly designed at the outset, workload-related problems might not have arisen.
What is being suggested here is that some of the present emphasis on workload evaluation should be shifted toward design procedures which ensure that workload-related problems do not arise. Specifically, procedures need to be developed that ensure that operators and users are not required to perform under conditions calling for more than full cognitive load. If
such design procedures can be evolved, workload complaints may very well disappear.
It should be pointed out that design procedures for control of operator load are not the province of engineers per se. They are the province of human factors specialists and behavioral researchers. Those researchers who are currently involved in development of workload estimation procedures are the most qualified to conduct research on design procedures that eliminate workload-related difficulties at the outset.
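The dichotomous view of cognitive load described in this section can be sketched in a few lines of code. The following is only an illustration under stated assumptions: the names `LoadState` and `classify_load`, and the use of seconds as the time unit, are hypothetical and do not come from the chapter. The sketch encodes the idea that an operator is either idling, fully loaded, or overloaded, and that overload appears abruptly once time available falls below time required.

```python
from enum import Enum


class LoadState(Enum):
    """Descriptors suggested by the full-load concept (names are illustrative)."""
    IDLING = "idling load"
    FULL = "full load"
    OVERLOAD = "overload"


def classify_load(time_required_s, time_available_s, problem_active=True):
    """Classify operator load for one problem-solving episode.

    Assumption: load is dichotomous while a problem is active, so the
    shift to overload is abrupt rather than proportional to time pressure.
    """
    if not problem_active:
        # No problem at hand: the operator waits at relatively low load.
        return LoadState.IDLING
    if time_available_s < time_required_s:
        # Less time than the operator needs: stress and overload.
        return LoadState.OVERLOAD
    # Enough time: the operator is fully occupied until the problem is solved.
    return LoadState.FULL


if __name__ == "__main__":
    for available in (90, 60, 59):
        print(available, classify_load(60, available).value)
```

Under this sketch, shortening the available time from 90 to 60 seconds leaves the classification unchanged, while one further second pushes the episode into overload, which is the abrupt shift the full-load concept would predict.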
TASK ANALYTIC METHODS AND MOMENTARY WORKLOAD
Task analytic methods of workload estimation are the primary means of estimating workload when the system is in the conceptual stages. Because no system or physical simulation of the system exists, experimental methods of workload estimation cannot be used. Task analytic procedures usually involve theoretical estimates of human operator resources required per unit time, compared with estimates of human operator resources available during the same unit of time. When required resources approach or exceed available resources, a workload problem is said to exist. Examples of task analytic methods include WECC (workload evaluation of cockpit crews) (Boylan, 1974); HOS (human operator simulator) (Lane, Werry, and Strieb, 1977); and CAFES (computer-aided function allocation evaluation system) (Parks and Springer, 1975).
There is little doubt that the task analytic methods are valuable human engineering tools. Since there appear to be no alternative design procedures, system designers have little choice but to use these methods in one form or another. There is every reason to believe that the task analytic methods work reasonably well and that they do indeed assess and predict with some degree of accuracy the workload actually experienced by system operators, once the system has been fabricated and tested.
Generally speaking, those designers who use task analytic methods in a conceptual design also check their designs in simulation or actual hardware at a later time. The usual procedure for checking workload involves the use of rating scales and questionnaires. Rating scales are used to obtain numerical (ordinal) ratings of workload, and questionnaires requesting comments are used to uncover workload-related deficiencies or "hot spots" and other problems in the operator-machine system.
The rating scales in general are used to assess workload for a mission or scenario segment, usually between two and ten minutes in length. Because of the length of the segment, the rating values obtained represent average workload over that segment. If there are peaks and valleys, that is, moment-to-moment fluctuations in load, the operator is quite likely to rate on the basis of average load. Recent work by Straveland, Hart, and Yeh (1985) tends to confirm this, even when intra-task difficulty is varied.
To assess momentary fluctuation in workload, designers have relied on the attendant questionnaire information. Hot spots and similar problems represent a way of uncovering momentary overloads on the operator. While such questionnaire information is very helpful, particularly in the absence of other moment-to-moment information, it certainly represents a round-about
procedure for examining momentary workload.
In the recent past a number of researchers have alluded to the problem of momentary workload. Geer (1976) for example has stated:
The point of the analysis is to discover significant workload conditions including peaks, not to mask them out. The analyst is also cautioned not to average workload over the time increments being considered. A workload estimate of 100% and an estimate of 50% for two sequential tasks occurring within a given time increment must be considered an overall estimate of 100% (not 75%).
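Geer's caution can be made concrete with a short calculation. The sketch below is illustrative only: the function names `increment_estimate` and `flag_overloads`, and the 100% limit used as a default, are assumptions introduced here rather than part of any published task analytic method. It contrasts the peak-based estimate Geer recommends with the averaged estimate he warns against.

```python
def increment_estimate(task_estimates_pct):
    """Return (peak, mean) workload for the tasks inside one time increment.

    Following Geer's caution, the peak is taken as the overall estimate;
    the mean is returned only to show what averaging would hide.
    """
    if not task_estimates_pct:
        raise ValueError("at least one task estimate is required")
    peak = max(task_estimates_pct)
    mean = sum(task_estimates_pct) / len(task_estimates_pct)
    return peak, mean


def flag_overloads(timeline, limit_pct=100.0):
    """List the indices of time increments whose peak reaches the limit.

    `timeline` is a list of increments, each a list of task estimates (%).
    The limit is a hypothetical threshold chosen for illustration.
    """
    return [i for i, tasks in enumerate(timeline)
            if increment_estimate(tasks)[0] >= limit_pct]


if __name__ == "__main__":
    peak, mean = increment_estimate([100, 50])
    print(f"peak = {peak}%, mean = {mean}%")
    print(flag_overloads([[40, 30], [100, 50], [60, 70]]))
```

For the two sequential tasks in Geer's example, the peak is 100% while the mean is 75%, so an averaged estimate would mask the overload condition that the peak exposes.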
Wierwille (1981) provided a framework for momentary workload in which certain analytical techniques could be called upon to aid in providing stability of the momentary (or short-time) workload estimates. Subsequently, Antin and Wierwille (1984) reported very encouraging results in estimating momentary workload for intervals as short as five seconds in length. Thus, momentary workload estimation appears feasible, and in all likelihood is a better indicator of task difficulty than is average workload.
It seems not to have been recognized that there is a connection between the task analytic methods used for preliminary design and momentary workload estimation methods for experimental determination of workload. The connection is that momentary workload techniques can be used to check and verify the estimates obtained from the task analytic methods.
The need for momentary workload techniques appears self-evident. Short periods of operator overload can lead to unusual operator behaviors such as postponement or neglect of task elements, acceptance of less precision in performing tasks, or failure to perform tasks completely. In fact, it is possible to argue convincingly that momentary workload assessment is more important than average workload assessment. For example, supposedly, operator errors increase during periods of overload. If average workload is moderate, but contains peaks of overload, a designer may not be able to determine the cause of errors and may actually overlook the overload condition if only average measures of workload are taken. This is the point that Geer (1976) made.
Thus, while momentary workload stands on its own as a needed, highly useful evaluation technique, its potential connection with task-analytic methods makes it even more valuable. It is proposed then that task analytic methods and momentary workload methods be studied simultaneously in an effort to develop workload estimation schemata involving both preliminary design (via some form of task analytic procedure) and subsequent experimental verification (via some form of momentary workload estimation procedure).
If developed properly, such a combination would eliminate the need for relying on questionnaire data to uncover system workload hot spots. It would also bring workload estimation into the arena where most applications are occurring. Unless workload techniques eventually find their way into applications, they will be looked upon as intellectual exercises performed by academics and other theoreticians. By merging the task analytic and momentary estimation procedures, a powerful workload
estimation technology should emerge.
WORKLOAD ESTIMATION BASED ON NORMAL OPERATING RECORDS
Many technical personnel are involved in test and evaluation procedures for new systems. Each of the military services has extensive facilities for performing test and evaluation, and other government and industrial concerns also have such facilities. Personnel engaged in test and evaluation are often faced with problems of estimating mental workload. New systems tend to be complex, they usually involve computer interfaces and similar data input/output devices, and the systems themselves have been procured through a process of competitive bidding. Evaluation of workload thus falls on personnel who must protect the concerns of their employing agencies and the lives and property of people involved in using the systems or otherwise affected by them.
While no formal interview process has been performed, a comment often heard from these personnel is that they need workload estimation techniques that are embedded in the system. Specifically, they would like to be able to evaluate workload based on "operating records." Such records include recordings of verbal communications, operator inputs to the system, system responses back to the operator, and system computer data carried on data busses. Fundamentally, the data available are data that workload researchers would call "performance" or "primary task" data.
However, because most new systems are computer controlled, these performance data are now more abundant. Specifically, the data bus or busses can be tapped and "listened to" by peripheral processing equipment. For example, in a high performance aircraft, the aircraft system bus might be made available to a portable processor that plugs into the bus, and monitors and analyzes data with the objective of evaluating workload.
As is well known, estimation of workload based on performance has limitations. Task performance does not usually begin to degrade until operator workload is very high. This occurs because of the operator's adaptation to increased load and the mustering of additional resources (effort). Also, workload metrics based on performance must be individually tailored to the specific system. The metrics used for a high performance aircraft would have to be different from those used for a nuclear power plant. However, these limitations may appear larger than they really are when attempting to measure workload from operating records.
Assume for the moment that it is true that system performance does not begin to degrade until operator workload is high. By measuring system performance, then, it should at least be possible to dichotomize workload as very high or less than very high. Beyond this, additional information can be obtained. When the system operator musters additional effort to meet the increased demands of the system, that operator changes strategy or otherwise alters the way in which the task is performed. Such a change is likely to result in some observable change in the operator's output.
Our own research on primary task workload measures has shown that operator output variables likely to be sensitive to this shift in strategy include the number of control movements in manual tasks and response time in perceptual and cognitive tasks. More specifically, in manual tasks, an operator must apply more inputs and corrections to hold system output error level within specified limits. In cognitive tasks, cognitive processing load manifests
itself as increased time to resolve and respond. Thus, by examining operator responses as well as system responses, it may be possible to resolve workload into four or five different ordinal categories. This level of resolution would be a great help to test and evaluation personnel.
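One way such an ordinal resolution might be derived from operating records is sketched below. Everything specific in it is an assumption made for illustration: the record fields, the baseline values, and the cut points (1.25, 1.5, 2.0) are hypothetical and would have to be tailored to the particular system. The sketch first dichotomizes on degraded system performance, then uses operator output (control movement rate and response time relative to a baseline) to separate the lower categories.

```python
from dataclasses import dataclass


@dataclass
class OperatingRecord:
    """One analysis window taken from normal operating records (hypothetical)."""
    performance_degraded: bool    # system output outside specified limits
    control_moves_per_min: float  # operator control inputs, manual tasks
    response_time_s: float        # operator response time, cognitive tasks


def ordinal_workload(rec, baseline_moves_per_min, baseline_response_time_s,
                     cut_points=(1.25, 1.5, 2.0)):
    """Return an ordinal workload category from 1 (lowest) to 5 (very high).

    Category 5 is assigned when system performance has degraded. Otherwise
    the larger of the two operator-output ratios is compared against the
    illustrative cut points to give categories 1 through 4.
    """
    if rec.performance_degraded:
        return 5
    ratio = max(rec.control_moves_per_min / baseline_moves_per_min,
                rec.response_time_s / baseline_response_time_s)
    # Count how many cut points the ratio has reached.
    return 1 + sum(ratio >= cut for cut in cut_points)


if __name__ == "__main__":
    window = OperatingRecord(False, control_moves_per_min=32.0,
                             response_time_s=1.1)
    print(ordinal_workload(window, baseline_moves_per_min=20.0,
                           baseline_response_time_s=1.0))
```

The choice to take the larger of the two ratios is a design simplification; a real measure would likely need a different form for each system mode.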
A number of years ago a rather straightforward study of air traffic controller workload concluded that total radio transmission time combined with traffic count was a good indicator of controller workload (Melton, 1979). This measure could be computed from available recordings, and it was not intrusive. As such, the measure represented a workload estimation technique based on normal operating records, causing no interference in the operator performance of the task. And, it provided a useful estimate of workload.
Today, most new human operator interfaces are computerized in one way or another. Nevertheless, the problem of workload estimation remains. Complexity in estimating workload is created as a result of multiple modes of operation. Many modern systems can be operated in a variety of modes ranging from fully automatic to fully manual. Each mode has a level of workload associated with it, and the operator's task complement can shift from that of supervisor or monitor to active controller. Furthermore, at a given level of automation the system may perform a variety of tasks associated with various subsystems. Thus, workload measurement becomes dependent on the mode of operation and the subsystems being updated or used. Incident workload may change rapidly with demands on the system, level of automation, and subsystems in use. As a result, no single, simple measure of performance may accurately reflect workload level.
Much more complex measures may have to be derived, and they may have to change form with system mode.
In spite of these obstacles, there remains a definite, strong need to obtain workload estimates from normal operating records. These measures must be moderately accurate, they must not intrude, and they must take into account the fact that the operator's role changes in modern systems. Any research results in this area are likely to receive a warm reception from test and evaluation personnel. And, it should be remembered that the systems being tested and evaluated are the ones that control our defense systems, our commercial air traffic, our nuclear plants, our power systems, and our communication networks. Perhaps the effort is worth the trouble.
EFFECTS OF LEARNING AND PROFICIENCY ON WORKLOAD
The human ability to learn and become proficient at a given task represents a largely uninvestigated area of workload research. This ability appears to be highly elastic in that proficiency level can continue to rise with time. In fact, learning curves can be viewed as plots of proficiency versus time for simple tasks. Recent work has shown that for many simple tasks incremental learning continues with distributed practice over as much as thirty days (Kennedy, Bittner, Carter, Krause, Harbeson, McCafferty, Pepper, and Wiker, 1981). More complex tasks probably induce incremental learning over even longer periods of time, because humans discover subtleties which allow them to continue to improve.
We have all been involved in activities which at first we found extremely
difficult or impossible. These activities may have overwhelmed us when they were first introduced. Examples would include typing, performing long division, driving a manual transmission automobile, solo flying, playing a musical instrument, programming a microcomputer, and using a script editor on a terminal. If asked about workload level shortly after being introduced to one of these activities, we would probably indicate that the level was very high. However, after having performed the tasks routinely every day for several months, we would no doubt indicate that workload level had decreased, in some cases perhaps even to the point of low workload.
It seems imperative that workload researchers come to grips with the problem of learning and proficiency. Any construct (workload in this case) that is so extremely sensitive to an intervening variable (learning) clearly must account for that variable. To date, competent mental workload research has simply controlled for learning in experiments by several means. These include rigidly enforcing equal practice periods, or practice to criterion, counterbalancing of conditions, control of experience level (which is assumed to affect proficiency), and also control of age or other specific background variables that may affect learning of specific types of skills.
Without t h e s e c o n t r o l s , t h e r e s u l t s of t h e workload experiment would be confounded w i t h l e a r n i n g and p r o f i c i e n c y . With t h e s e c o n t r o l s , t h e workload a s p e c t under s t u d y c a n be p r o p e r l y examined; however, l e a r n i n g and p r o f i c i e n c y may s t i l l i n c r e a s e s u b j e c t v a r i a n c e . In o t h e r words, t h e experiment is p r o p e r l y d e s i g n e d t o examine t h e independent v a r i a b l e o r v a r i a b l e s of i n t e r e s t w i t h l e a r n i n g c o n t r o l l e d . The experiment d o e s n o t a l l o w t h e d a t a t o be examined in r e g a r d t o l e a r n i n g , e x c e p t perhaps p o s t hoc
.
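Counterbalancing of conditions, one of the controls listed above, is often implemented with a Latin square. The following is a minimal sketch in Python, assuming four hypothetical condition labels (A to D) and a simple cyclic construction; neither the labels nor the construction is taken from the chapter.

```python
def cyclic_latin_square(conditions):
    """Return one presentation order per subject group (assumed cyclic design).

    Row r is the condition list rotated left by r positions, so every
    condition occupies every serial position exactly once across the rows.
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical condition labels, used only for illustration.
orders = cyclic_latin_square(["A", "B", "C", "D"])
for group, order in enumerate(orders, start=1):
    print(f"group {group}: {' '.join(order)}")
```

A cyclic square of this kind balances serial position only; it does not balance which condition immediately follows which, so a different construction would be needed if carryover effects were a concern.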
Another way of viewing learning and proficiency is as a time-varying effect. Figure 1 shows a hypothetical plot of practice and a specific independent variable on a workload metric. On day zero, say with minimal practice, level A induces the highest workload and D induces the lowest. However, after nine practice sessions (one on each day) the four levels of the independent variable induce approximately the same relatively low workload level.

Figure 1. Hypothetical effect of practice on a workload metric for a specific independent variable. (Vertical axis: workload measure; horizontal axis: day, 0 to 9; one curve per level of the independent variable.)

If the experiment were terminated at the end of the first day, the conclusion would be that A (and possibly B and C) induce a higher loading than does D. In fact, what has been measured is initial workload level, not sustained or practiced workload level. It could be argued that level A required the longest learning time. But, in terms of sustained workload, it cannot be argued A produces higher loading effects. This hypothetical example shows the profound effect that learning and proficiency may have on workload estimation.
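The hypothetical example can be made concrete with a small simulation. The sketch below is illustrative only: the exponential learning curve, the learning rate, and every numeric value are assumptions chosen to mimic the pattern described for Figure 1, not data or a model from the chapter.

```python
import math

# Hypothetical parameters: every number here is an illustrative assumption.
ASYMPTOTE = 2.0                                      # practiced workload shared by all levels
INITIAL = {"A": 9.0, "B": 7.0, "C": 5.5, "D": 4.0}   # assumed day-0 workload per level
RATE = 0.6                                           # assumed learning rate per day


def workload(level: str, day: int) -> float:
    """Assumed exponential learning curve decaying toward a common asymptote."""
    return ASYMPTOTE + (INITIAL[level] - ASYMPTOTE) * math.exp(-RATE * day)


def spread(day: int) -> float:
    """Difference between the highest and lowest workload across levels on one day."""
    values = [workload(level, day) for level in INITIAL]
    return max(values) - min(values)


if __name__ == "__main__":
    for day in (0, 3, 6, 9):
        row = ", ".join(f"{lvl}={workload(lvl, day):.2f}" for lvl in INITIAL)
        print(f"day {day}: {row}  (spread {spread(day):.2f})")
```

Under these assumptions the levels differ widely on day zero and are nearly indistinguishable by day nine, so an experiment stopped after the first day would report a difference that reflects initial rather than practiced workload.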
It would seem then that experimenters should go beyond controlling for learning effects in workload. Instead, amount of practice should be treated as an independent variable along with the usual independent variable or variables in workload research, namely loading. By introducing this added dimension, a much clearer picture of incident workload might emerge. In many situations it might be found, for example, that what was thought to be high workload was in fact a result of initial learning, and that after sufficient practice workload settled at an acceptable, moderate level.

The concept of proficiency and practice in regard to workload can be stretched a bit further to show how long term effects of practice and time on task can be examined. Suppose that the hypothetical experiment shown in Figure 1 is continued over a much longer period, say for 30 days. Under such conditions the results might begin to show the effects of every day boredom and lack of attention. If the workload metric is a rating scale, the values obtained might continue to decrease to a relatively low value, suggesting that the operator is experiencing lower workload. If the metric is performance related, on the other hand, the metric might begin to show an increase resulting from inattention (Wiener, Curry, and Faustina, 1984).
The main point in all of this is that workload metrics probably would exhibit substantial shifts in levels over time, starting from initial practice, through moderate proficiency, and then on into the boredom associated with every day performance. While the introduction of longitudinal (time-on-task) variables into workload research may be expensive and time consuming, there is little question that a good deal of information can be gained by doing so. Other than the "endurance" studies performed to examine the effects of aircrew flight schedules on performance, and physiological variables (e.g. Hale, McNee, Ellis, Bollinger, and Hartman, 1974; Kennedy, et al. 1981) there are no known studies of practice and duration on workload metrics. A few well-done studies in this area might greatly improve our understanding of what workload really is and the degree to which it is dependent on practice and time-on-task effects.
REFERENCES

Antin, J. F. and Wierwille, W. W. (1984) Instantaneous measures of mental workload: an initial investigation. Santa Monica, CA: Human Factors Society. Proceedings of the Human Factors Society 28th Annual Meeting (October), 6-10.

Boylan, R. J. (1974) Introduction to Boeing operator workload and workspace evaluation models. Seattle, WA: Boeing Aerospace Co., Report No. D180-17526-1 (January).

Geer, C. W. (1976) Analyst's guide for the analysis sections of MIL-H-46855. Seattle, WA: Boeing Aerospace Co. Report D180-19476-1, p. 75, (June).

Lane, N. E., Werry, R. J., and Strieb, M. (1977) The human operator simulator: estimation of workload reserve using a simulated secondary task. Neuilly sur Seine, France: AGARD. Proceedings of the AGARD Conference on Methods to Assess Workload, Report No. AGARD-CPP-216 (April).

Hale, H. B., McNee, R. C., Ellis, J. P., Bollinger, R. R., and Hartman, B. O. (1974) Endocrine-metabolic indices of aircrew workload: an analysis across studies. Neuilly sur Seine, France: AGARD. Proceedings of the AGARD Conference on Simulation and Study of High Workload Operations, Report No. AGARD-CP-146 (April) A10-1 to A10-6.

Hart, S. G. (1975) Time estimation as a secondary task to measure workload-attention sharing effect on operator performance. NASA Ames Research Center: Proceedings of the Eleventh Conference on Manual Control, Report No. NASA-TMS-62-464.

Kennedy, R. S., Bittner, A. C., Jr., Carter, R. C., Krause, M., Harbeson, M. M., McCafferty, D. B., Pepper, R. L., and Wiker, S. F. (1981). Performance evaluation tests for environmental research (PETER): Collected papers. New Orleans, La., Naval Biodynamics Laboratory, Research Report No. NBDL-80R008, (July).

Melton, C. E. (1979). Workload and stress in air traffic controllers. In B. O. Hartman and R. E. McKenzie (Eds.) Survey of Methods to Assess Workload. Neuilly sur Seine, France: AGARD Report No. 246 (August) 137-144.

Parks, D. L. and Springer, W. E. (1975) Human factors engineering analytic process definition and criterion development for CAFES. Seattle, WA: Boeing Aerospace Co., Report No. D180-18750-1 (June).

Reid, G. B., Shingledecker, C. A., and Eggemeier, F. T. (1981) Application of conjoint measurement to workload scale development. Santa Monica, CA: Proceedings Human Factors Society Twenty Fifth Annual Meeting, 522-526.

Siegel, A. I. and Wolf, J. J. (1969) Man-machine simulation models: Psychosocial and performance interaction. New York, John Wiley and Sons.

Straveland, L., Hart, S. G., Yeh, Y. Y. (1985) Memory and subjective workload assessment. Moffett Field, CA: NASA Ames Research Center. Proceedings of the 21st Annual Conference on Manual Control. (In press.)

Wiener, E. L., Curry, R. E., and Faustina, M. L. (1984). Vigilance and task load: In search of the inverted U. Human Factors 26(2), 215-222.

Wierwille, W. W. (1981) Instantaneous mental workload: concept and potential methods for measurement. New York, NY: IEEE. Proceedings of the International Conference on Cybernetics and Society (October), 604-608.

Wierwille, W. W., Casali, J. G., Connor, S. A., and Rahimi, M. (1985) Evaluation of the sensitivity and intrusion of mental workload estimation techniques. In W. B. Rouse (Ed.) Advances in Man-Machine Systems, Volume 11, Greenwich, Conn.: JAI Press. (In press.)
HUMAN MENTAL WORKLOAD P.A. Hancock and N. Meshkati (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1988
A BIBLIOGRAPHIC LISTING OF MENTAL WORKLOAD RESEARCH
P.A. Hancock, T. Mihaly, M. Rahimi and N. Meshkati
Department of Safety Science and Human Factors Department Institute of Safety and Systems Management University of Southern California
Los Angeles, CA 90089
1. INTRODUCTION

This chapter presents a bibliographic listing of papers concerned with the investigation of human mental workload. The purpose of this listing is to provide individuals with a reference resource of the more recent research contributions. The primary orientation of the work concerns the listing of research articles that have appeared in the last decade. Although we have drawn from numerous sources, the responsibility for the present selection is ours. We have restricted citations largely to those which focus directly on the workload issue, rather than those studies which have used mental workload assessment as an adjunct to investigation of an allied area. In the sections which follow, a brief precis is given of the contents of the bibliography, the major sources of mental workload literature, and publication growth. We are aware that any trends noted depend directly on the initial process of selection. However, it is our belief that the current listing provides a representative sample of work accomplished.
2. RESOURCES USED TO CREATE THE BIBLIOGRAPHIC LISTING

In creating this index we relied heavily on three sources. Initially, we extracted a number of citations from the report by Wierwille and Williges (1980), which provides a detailed breakdown of the mental workload area into several classifications. The work of Clement (1978) and his colleagues was the source for many of the citations concerning primary task performance, with supplementary information provided by the report of Jex and Clement (1977). For the majority of the remaining references we collected citations from a compilation of mental workload literature produced by Sandra Hart of NASA Ames Research Center. In addition to these selections, we have included all contributions to the edited volume by Moray (1979), as well as the references from the other chapters of the present text.
3. DESCRIPTION OF PUBLICATION SOURCES

Two tabulations were performed to describe the reference listing. The first tabulation indicates the number of papers per publication type. Each citation was classified according to its type, i.e., journal article, chapter in book, government report, dissertation, etc. Table 1 presents the results of this tabulation.
Table 1. Different Publication Types Represented in the Listing

PUBLICATION TYPE            NUMBER OF CITATIONS
Journal Article             214
Paper in Proceedings        199
U.S. Government Report       66
Chapter in Book              58
Report in Industry           16
Bulletin                      3
Dissertation                  1
Table 2. Major Sources of Mental Workload Literature

SOURCE                                          NUMBER OF CITATIONS
Human Factors                                   45
Ergonomics                                      43
Aviation, Space and Environmental Medicine      31
IEEE Systems, Man, and Cybernetics              13
Journal of Experimental Psychology               6
Acta Psychologica                                5
This analysis of the references by publication type revealed that practitioners have disseminated their research findings, for the most part, in the form of journal articles (214) and papers in proceedings (199). The three main sources of proceedings papers were: Proceedings of the Human Factors Society Annual Meeting (67), the Proceedings of the Conference on Manual Control (38), and the Proceedings of the Symposium on Aviation Psychology (32). As shown in Table 1, chapters on mental workload accounted for over 50 citations, with a slightly larger percentage of citations published in the form of government reports. Of the 66 reports included in the index, most summarized research projects supported by NASA or the Air Force. To reveal which current journals have published the majority of mental workload literature, a separate frequency count was done. Table 2 presents the leading six sources of publications. In the present listing, Human Factors published the greatest number of articles relating to workload, closely followed by Ergonomics. As Table 2 indicates, substantially fewer articles were published by mainstream psychological journals.
4. PUBLICATION GROWTH
In terms of the volume of mental workload-related papers published, the data show that the number of publications per three-year period since 1961 has increased dramatically. The number of publications has grown from just a few papers during the 1960s to over a hundred papers in recent years. The vertical bars in Figure 1, which reach above 100 for periods 1979-1981, 1982-1984, and 1985-1987, clearly demonstrate the steady interest among practitioners in this area of research.
Figure 1. Number of mental workload papers published per three-year interval. Period from 1961-1987. (Bar chart; horizontal axis: three-year interval; vertical axis: number of papers, 0 to 140.)
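The three-year grouping behind Figure 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions: intervals are taken to start in 1961, which agrees with the periods named in the text (1979-1981, 1982-1984, 1985-1987), and the sample years are hypothetical rather than the chapter's data.

```python
from collections import Counter

FIRST_YEAR = 1961   # assumed start of the first interval, per the text
WIDTH = 3           # years per interval


def interval_label(year: int) -> str:
    """Map a publication year to its three-year interval, e.g. 1980 -> '1979-1981'."""
    start = FIRST_YEAR + WIDTH * ((year - FIRST_YEAR) // WIDTH)
    return f"{start}-{start + WIDTH - 1}"


def papers_per_interval(years):
    """Count publications in each three-year interval."""
    return Counter(interval_label(year) for year in years)


# Hypothetical publication years, used only to exercise the functions.
sample_years = [1962, 1974, 1979, 1980, 1981, 1983, 1985, 1987, 1987]
for label, count in sorted(papers_per_interval(sample_years).items()):
    print(label, count)
```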
5. CONTENT AREA DECOMPOSITION OF BIBLIOGRAPHIC LISTING

The papers found in this reference resource deal with topics ranging from human error, behavioral responses to mental workload, and individual differences to system design, quantification of mental workload, and air traffic control. The number of papers related to the specific content areas listed in Table 3 were tallied and the results are presented in Figure 2. If a paper discussed more than one content area, then it was counted in each of the categories; for example, several reports addressed both physiological and subjective measures of mental workload. Of the over 500 papers included in this bibliographic listing, three topic areas were most frequently addressed: mental workload assessment techniques (120), physiological measures of mental workload (93), and performance (84). Subjective measures of mental workload (72), aviation-related research (65), dual task/time-sharing (57), industrial and system applications (52), and mental workload modelling and simulation (42) were also well-represented in the literature. Various performance parameters were examined in the literature, including controlling, monitoring, attentional capacity, time estimation, and memory.
Figure 2. Number of publications per content area. Individual content areas designated in Table 3. (Bar chart; horizontal axis: content areas A through X; vertical axis: number of publications, 0 to 140.)
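The counting rule described in this section, where a paper that discusses several content areas is counted once in each, can be sketched as follows. The paper identifiers and their codings are hypothetical; only the letter codes (B, D, E) follow the order of content areas in Table 3.

```python
from collections import Counter


def tally_content_areas(papers):
    """Count papers per content area; a paper coded with k areas adds 1 to each."""
    counts = Counter()
    for areas in papers.values():
        counts.update(set(areas))   # set() guards against a code listed twice
    return counts


# Hypothetical papers coded with Table 3 letters (B = physiological measures,
# D = subjective measures, E = aviation); these are not real entries.
papers = {
    "paper-1": ["B", "D"],
    "paper-2": ["D"],
    "paper-3": ["B", "D", "E"],
}
counts = tally_content_areas(papers)
print(counts["B"], counts["D"], counts["E"])
print(sum(counts.values()), "tallies from", len(papers), "papers")
```

Because of this multiple counting, the per-area totals can sum to more than the number of papers in the listing.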
6. SUMMARY
It is not possible to provide a comprehensive listing of an evolving area such as mental workload, and there are probably many English and foreign language listings that have eluded our survey. While all mistakes in citation should be construed as our responsibility, it is our hope that a number of workers will find this a helpful resource in their own research endeavors. We encourage experts in other human factors-related fields to create similar listings for their specific areas of research. We also welcome readers to suggest improvements, corrections, and additions to this listing for consideration in any future revision.
Table 3. Number of Publications per Content Area

      CONTENT AREA                                              NUMBER OF PUBLICATIONS
A     Mental workload (MWL) assessment techniques               120
B     Physiological MWL measures                                 93
C     Performance parameters                                     84
D     Subjective mental workload measures                        72
E     Aviation: cockpit design, flight deck operations, etc.     65
F     Time-sharing, dual task situations                         57
G     Industrial and system applications                         52
H     Mental workload modelling and simulation                   42
I     Controlling and monitoring                                 35
J     System design and performance                              33
K     Attention, resource allocation                             23
L     Individual differences                                     20
M     MWL literature reviews                                     19
N     Adaptation to MWL, behavioral response                     18
O     Workload scaling, developing MWL indices                   16
P     Mental fatigue                                             13
Q     Communications                                             11
R     Training                                                   10
S     Air traffic control                                        10
T     Human error                                                 8
U     Psychological stress                                        8
V     Time estimation                                             8
W     Memory                                                      6
X     Perception                                                  6
ACKNOWLEDGMENTS
In the creation of the present chapter, we are pleased to acknowledge the role of the following paper: Kulkarni, J., and Karwowski, W. (1986). Research Guide of Application of Fuzzy Set Theory to Human Factors. In: Applications of Fuzzy Set Theory in Human Factors, W. Karwowski and A. Mital, (Eds.). Amsterdam: Elsevier Science Publishers. (pp. 395-400). This paper served as a stimulus and framework with respect to our bibliographic listing. The authors wish to thank Nancy Knabe, Carolyn Bjerke, Cuong Chu, and George Rodenburg for their valuable assistance in gathering references and preparing the graphs and tables. Production of this work was supported in part by Grant NCC 2-379 from NASA Ames Research Center to the first author; Sandra Hart and Michael Vidulich were the technical monitors for the grant.
REFERENCE LISTING
Aasman, J., Mulder, G., & Mulder, L. (1987). Operator effort and the measurement of heart rate variability. Human Factors, 29, 161-170.

Aasman, J., Wijers, A., Mulder, G., & Mulder, L. (1987). Measuring mental fatigue in normal daily working routines. In: P.A. Hancock and N. Meshkati, (Eds.). Human mental workload. Amsterdam: North-Holland.

Acton, W. H., Crabtree, M., Simons, J. C., Gomer, F. E., & Eckel, J. S. (1983). Quantification of crew workload imposed by communications-related tasks in commercial transport aircraft. Proceedings of the Human Factors Society, 27, 239-243.

Alkov, R. A., Borowsky, M. S., & Gaynor, J. A. (1983). Pilot error as a symptom of inadequate stress coping. Proceedings of the Symposium on Aviation Psychology, 2, 401-406.

Allen, R. W., Stein, A. C., & Jex, H. R. (1981). Detecting human operator impairment with a psychomotor task. Proceedings of the Annual Conference on Manual Control, 17, 611-626.

Antin, J. F., & Wierwille, W. W. (1984). Instantaneous measures of mental workload: An initial investigation. Proceedings of the Human Factors Society, 28, 6-10.

Arbak, C. J., Shew, R. L., & Simons, J. C. (1984). The use of reflective SWAT for workload assessment. Proceedings of the Human Factors Society, 28, 959-962.

Aretz, A. J. (1983). A comparison of manual and vocal response modes for the control of aircraft systems. Proceedings of the Human Factors Society, 27, 97-101.

Armstrong, G. C. (1985). Computer-aided analysis of in-flight physiological measurement. Behavior Research Methods, Instruments and Computers, 17, 183-185.

Bainbridge, L. (1974). Problems in the assessment of mental load. Travail Humain, 37, 279-302.

Bainbridge, L. (1978). Forgotten alternatives in skill and work-load. Ergonomics, 21, 169-185.

Barnes, R. M. (1978). Physiological arousal and workload during autoland procedures. Proceedings of Aerospace Medical Association Annual Meeting, 3, 37-38.

Baron, S., & Levison, W. H. (1975). An optimal control methodology for analyzing the effects of display parameters of performance and workload in manual flight control. IEEE Transactions on Systems, Man and Cybernetics, 7, 457-472.

Bateman, R. P., Acton, W. H., & Crabtree, M. S. (1984). Workload and performance: Orthogonal measures. Proceedings of the Human Factors Society, 28, 678-554.

Battiste, V., & Hart, S. G. (1985). Predicted versus experienced workload and performance on a supervisory control task. Proceedings of the Symposium on Aviation Psychology, 3, 255-262.

Beatty, J. (1979). Pupillometric methods of workload evaluation: Present status and future possibilities. In: R. Auffret (Ed.), Survey of methods to assess workload. (AGARD Proceedings 246) London: Harford House, (pp. 103-110).

Berg, S. L., & Sheridan, T. B. (1985). Effect of Time Span and Task Load on Pilot Mental Workload. (Final Report for NASA Grant NAG 2-227) Washington, D.C.: National Aeronautics and Space Administration.

Berg, S. L., & Sheridan, T. B. (1985). Measuring workload differences between short-term memory and long-term memory scenarios in a simulated flight environment. Proceedings of the Annual Conference on Manual Control, 20, 397-416.

Berg, S. L., & Sheridan, T. B. (1985). The impact of physical and mental tasks on pilot mental workload. Proceedings of the Annual Conference on Manual Control, 21.

Berkhout, J. Sources of Inter-individual Differences in the Perceived Difficulty of F-18 Pilot Subtasks Broken Down According to Mechanisms of Learning and Execution (Final Report for NASA Contract NO0 123-C 159) Vermillion, SD: University of South Dakota.

Biferno, M. A. (1985). Mental Workload Measurement: Event-Related Potentials and Ratings of Workload and Fatigue. (NASA CR-177354) Washington, D.C.: National Aeronautics and Space Administration.

Bird, K. L. (1981). Subjective rating scales as a workload assessment technique. Proceedings of the Annual Conference on Manual Control, 17.

Bisseret, A. (1971). Analysis of mental processes involved in air traffic control. Ergonomics, 14, 565-570.

Bitterman, M. E., & Soloway, E. (1946). The relation between frequency of blinking and effort expended in mental work. Journal of Experimental Psychology, S, 134-136.
Bloem, K. A., & Damos, D. L. (1985). Individual differences in secondary task performance and subjective estimation of workload. Psychological Reports, 56, 311-322.

Blyx, A. S., Stromme, S. B., & Ursin, H. (1974). Additional heart rate as an indicator of psychological activation. Aerospace Medicine, 45, 1219-1222.

Borg, G. (1978). Subjective aspects of physical and mental load. Ergonomics, 21, 215-220.

Bortolussi, M., Hart, S. G., & Shively, R. J. (1987). Measuring moment-to-moment pilot workload using synchronous presentations of secondary tasks in a motion-base simulator. Proceedings of the Symposium on Aviation Psychology, 3, 651-657.

Bortolussi, M. R., Kantowitz, B. H., & Hart, S. G. (1986). Measuring pilot workload in a motion base trainer. Applied Ergonomics, 17, 278-283.

Boy, G. A., & Tessier, C. (1982). Message: An expert system for aircraft crew workload assessment. Proceedings of the Symposium on Aviation Psychology, 2, 207-222.

Boyce, P. R. (1974). Sinus arrhythmia as a measure of mental load. Ergonomics, 17, 177-183.

Boyd, S. P. (1983). Assessing the validity of SWAT as a workload measurement instrument. Proceedings of the Human Factors Society, 27, 124-128.

Bradshaw, J. L. (1968). Load and pupillary change in continuous processing tasks. British Journal of Psychology, 59, 265-271.

Braune, R., & Wickens, C. D. (1983). Individual differences and age-related changes in the time-sharing ability of aviators. Proceedings of the Human Factors Society, 27, 117-120.

Braune, R. J., & Wickens, C. D. (1983). The functional age profile: An objective decision criterion for the assessment of pilot performance capacities and capabilities. Proceedings of the Symposium on Aviation Psychology, 2, 437-444.

Braune, R., & Wickens, C. D. (1985). Time-sharing revised: Test of a componential model for the assessment of individual differences. Proceedings of the Symposium on Aviation Psychology, 3, 271-278.

Brenner, M., Branscomb, H. H., & Schwartz, G. E. (1979). Psychological stress evaluator: Two tests of a vocal measure. Psychophysiology, 16, 351-357.
Brenner, M., Shipp, T., Doherty, E. T., & Morrissey, P. (1985). Voice measures of psychological stress: Laboratory and field data. In: I. R. Titze, and R. C. Scherer (Eds.), Vocal fold physiology: biomechanics, acoustics, and phonatory control, (pp. 239-248). Denver, CO: The Denver Center for the Performing Arts.

Brichin, M., & Hampejsova, O. (1970). Result of two kinds of mental load measurements. Ceskoslovenska Psychologie, 14, 19-31. (In Czechoslovakian).

Broadbent, D. E. (1982). Task combination and selective intake of information. Acta Psychologica, 3, 253-290.

Broadbent, D. E., Cooper, P. F., Fitzgerald, P. & Parkes, K. R. (1982). The cognitive failures questionnaire (CFQ) and its correlates. British Journal of Clinical Psychology, 21, 1-16.

Brown, E. L., Stone, C., & Pearce, W. E. (1975). Improving cockpits through flight crew workload measurement. Proceedings of the Advanced Aircrew Display Symposium, 2.

Brown, I. D. (1972). Dual task methodology of assessing work-load. Ergonomics, 21, 221-224.

Brown, I. D. (1962). Measuring the spare mental capacity of car drivers by a subsidiary auditory task. Ergonomics, 5, 247-250.

Brown, I. D. (1965). A comparison of two subsidiary tasks used to measure fatigue in car drivers. Ergonomics, 8, 467-473.

Burke, M. (1980). Workload reduction: Control theoretic approaches. Proceedings of the Meeting of the Aviation, Space and Environmental Medicine Society.

Burke, M. W., Gilson, R. D., & Jagacinski, R. J. (1980). Multi-modal processing for visual workload relief. Ergonomics, 23, 961-975.

Burton, R. R., Storm, W. F., Johnson, L. W., & Leverett, S. D. (1977). Stress responses of pilots flying high-performance aircraft during aerial combat maneuvers. Aviation, Space and Environmental Medicine, 48, 301-307.

Butterbaugh, L., & Warner, D. (1981). Pilot Workload: A Survey of Operational Problems. Wright-Patterson Air Force Base, Ohio, AFWAL.

Caplan, R. D., & Jones, K. W. (1975). Effects of work-load, role ambiguity and type of personality on anxiety, depression and heart rate. Journal of Applied Psychology, 60, 713-719.
Casali, J. G., & Wierwille, W. W. (1982). A sensitivity/intrusion comparison of mental workload estimation techniques using a flight task emphasizing perceptual piloting activity. Proceedings of the IEEE International Conference on Cybernetics and Society, 598-602.

Casali, J. G., & Wierwille, W. W. (1983). Communications-imposed pilot workload: A comparison of sixteen estimation techniques. Proceedings of the Symposium on Aviation Psychology, 2, 223-234.

Casali, J. G., & Wierwille, W. W. (1983). A comparison of rating scale, secondary-task, physiological, and primary task workload estimation techniques in a simulated flight task emphasizing communications load. Human Factors, 25, 623-642.

Casali, J. G., & Wierwille, W. W. (1986). On the measurement of pilot perceptual workload: A comparison of assessment techniques addressing sensitivity and intrusion issues. Ergonomics, 21, 1033-1050.

Casper, P. A., & Kantowitz, B. H. (1987). Estimating the cost of mental loading in a bimodal divided-attention task: Combining reaction time, heart-rate variability, and signal-detection theory. Proceedings of the 1987 Mental State Estimation Workshop. Hampton, VA: NASA-Langley Research Center.

Casper, P. A., Shively, R. J., & Hart, S. G. (1986). A microprocessor-based system for selecting workload assessment measures. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 1054-1059.

Chignell, M., & Hancock, P. A. (1985). A knowledge-based adaptive mechanism for task load leveling. Proceedings of the Annual Conference on Manual Control, 21, 9.1-9.11.

Chignell, M. H., & Hancock, P. A. (1986). Horn clause representations in human machine systems with adaptive control. In: W. Karwowski, (Ed.). Trends in Ergonomics/Human Factors III. Amsterdam: North-Holland, (pp. 249-256).

Chignell, M. H., & Hancock, P. A. (1986). Comparison of mental workload and available capacity in complex person-machine systems. In: W. Karwowski and A. Mital (Eds.). Fuzzy methods and techniques in ergonomics research. Amsterdam: North-Holland, (pp. 271-288).

Childress, M. E. (1984). Subjective scales for workload evaluation: Critical aspects and new directions for research. Proceedings of the Annual Conference on Manual Control, 19, 1-2.

Childress, M. E., Hart, S. G., & Bortolussi, M. R. (1982). The reliability and validity of flight task workload ratings. Proceedings of the Human Factors Society, 26, 319-323.

Chiles, W. D., & Alluisi, E. A. (1979). On the specification of operator or occupational workload with performance-measurement methods. Human Factors, 21, 515-528.

Chiles, W. D., Jennings, A. E., & Alluisi, E. A. (1979). Measuring and scaling of workload in complex performance. Aviation, Space and Environmental Medicine, 50, 376-381.

Chiles, W. D. (1977). Objective Methods for Developing Indices of Pilot Workload. (FAA-AM-77-15) Washington, D.C.: Federal Aviation Administration.

Chubb, G. P. (1983). Emotive disruptions: Performance implications. Proceedings of the Symposium on Aviation Psychology, 2, 413-420.

Clement, W. F. (1977). Annotated Bibliography of Procedures which Assess Primary Task Performance in Some Manner as the Basic Element of a Workload Measurement Procedure. (Technical Report No. 1104-2). Mountain View, CA: Systems Technology, Inc.

Cohen, A. D. (1982). The Hughes design analysis system and instructor workload in operational trainers. Proceedings of the Human Factors Society, 28, 364-368.

Cooper, G. E. (1957). Understanding and interpreting pilot opinion. Aeronautical Engineering Review, IS, 47-52.

Cooper, G. E., & Harper, R. P. (1969). The Use of Pilot Rating in the Evaluation of Aircraft Handling Qualities (NASA TN-D-5153) Washington, D.C.: National Aeronautics and Space Administration.

Cooper, G. E., White, M. D., & Lauber, J. K. (1980). (Eds.), Resource Management of the Flight Deck: Proceedings of a NASA/Industry Workshop. (NASA CP 2120) Washington, D.C.: National Aeronautics and Space Administration.

Conrad, R. (1953). Some effects on performance of changes in perceptual load. Journal of Experimental Psychology, 44, 313-322.

Corlett, E. N. (1973). Cardiac arrhythmia as a field technique. Some comment on a recent symposium. Ergonomics, 16, 3-4.

Cote, D. O., Krueger, G. P., & Simmons, R. R. (1983). Helicopter copilot workload during nap-of-the-earth flight. Proceedings of the Symposium on Aviation Psychology, 2, 289-298.
Courtright, J. F., & Kuperman, G. (1984). Use of SWAT in USAF System T & E. Proceedings of the Human Factors Society, 28, 700-704.

Crabtree, M. S., Bateman, R. P., & Acton, W. H. (1984). Benefits of using objective and subjective workload measures. Proceedings of the Human Factors Society, 28, 950-953.

Crosby, J. V., & Parkinson, S. R. (1979). A dual task investigation of pilots’ skill level. Ergonomics, 22, 1301-1313.

Curry, R. E. (1979). Mental load in monitoring tasks. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press. (pp. 117-124).

Curry, R. E. (1985). The Introduction of New Cockpit Technology: A Human Factors Study. (NASA TM 86659) Washington, D.C.: National Aeronautics and Space Administration.

Curreri, L. V. (1985). Formula for a better understanding of pilot performance. Proceedings of the Symposium on Aviation Psychology, 5, 451-458.

Damos, D. L. (1978). Residual attention as a predictor of pilot performance. Human Factors, 20, 435-440.

Damos, D. (1984). Classification schemes for individual differences in multiple-task performance and subjective estimates. Proceedings of the Annual Conference on Manual Control, 20, 97-104.

Damos, D. L. (1984). Examining the Relation between Subjective Estimates of Workload and Individual Differences in Performance. (NASA CR-2341) Washington, D.C.: National Aeronautics and Space Administration.

Damos, D. L. (1984). Individual differences in multiple-task performance and subjective estimates of workload. Perceptual and Motor Skills, 59, 567-580.

Damos, D. L. (1985). The relation between the type A behavior pattern, pacing, and subjective workload in single and dual-task conditions. Human Factors, 27, 675-681.

Damos, D. L. (1987). Individual differences in subjective estimates of workload. In: P.A. Hancock & N. Meshkati (Eds.), Human mental workload. Amsterdam: North-Holland.

Damos, D., & Bloem, K. A. (1985). Type A behavior pattern, multiple-task performance, and subjective estimation of mental workload. Bulletin of the Psychonomic Society, 23, 53-56.
Damos, D. L., & Wickens, C. D. (1980). The identification and transfer of timesharing skills. Acta Psychologica, 43, 15-39.
Danev, S., Radneva, R., & Zlatarove, I. (1975). Change in heart rate variability due to informational physical and emotional loads in laboratory and field conditions. Activitas Nervosa Superior, 11, 187-188.
Danev, S. C., & Vartna, G. F. (1970). Information load and time stress: Some psychophysiological consequences. TNO-Nieuws, 25, 389-395.
Davis, D. R. (1964). Psychological mechanisms in pilot error. In: A. Cassie, S. D. Fokkema, & J. B. Parry (Eds.), Aviation Psychology: Studies on accident liability, proficiency criteria, and personnel selection. The Hague: Mouton & Co. (pp. 11-23).
Defayolle, M., Dinand, J. P., & Gentil, M. T. (1973). Average evoked potential in relation to attitude, mental load, and intelligence. In: W. T. Singleton, J. G. Fox, & D. Whitfield (Eds.), Measurement of Man at Work. London: Taylor and Francis, (pp. 81-91).
Derrick, W. L. (1981). The relationship between processing resource and subjective dimensions of operator workload. Proceedings of the Human Factors Society, 25, 532-536.
Derrick, W. L., & McCloy, T. M. (1982). Tracking bandwidth manipulations and processing resource cost. Proceedings of the Human Factors Society, 26, 26-30.
Derrick, W. L., & Wickens, C. D. (1984). A Multiple Processing Resource Explanation of the Subjective Dimensions of Operator Workload. (EPL-84-2/ONR-84-1) Urbana-Champaign: University of Illinois, Engineering Psychology Research Laboratory.
Detro, S. D. (1985). Subjective assessment of pilot workload in the advanced fighter cockpit. Proceedings of the Symposium on Aviation Psychology, 2, 247-254.
Donchin, E. (1984). The use of ERP's to monitor non-conscious mentation. Proceedings of the Annual Conference on Manual Control, 20, 1-19.
Donchin, E., Hart, S. G., & Hartzell, E. J. (1987). Executive Summary: Workshop on Workload and Training, an Examination of their Interactions. (NASA CR 89459) Washington, D.C.: National Aeronautics and Space Administration.
Dully, F. E. (1983). The life style keys to flight deck performance of the naval aviator: Another window (SAE Technical Paper series No. 831529). The Second Aerospace Behavioral Engineering Technology Conference.
Eckel, J. S., & Crabtree, M. S. (1983). Analytic and subjective assessments of operator workload imposed by communications tasks in transport aircraft. Proceedings of the Symposium on Aviation Psychology, 2, 237-242.
Eggemeier, F. T. (1987). Properties of workload assessment techniques. In: P. A. Hancock & N. Meshkati (Eds.), Human mental workload. Amsterdam: North-Holland.
Eggemeier, F. T. (1981). Development of a secondary task workload assessment battery. Proceedings of the IEEE Conference on Systems, Man and Cybernetics, 410-414.
Eggemeier, F. T. (1981). Current issues in subjective assessment of workload. Proceedings of Human Factors Society, 25, 513-517.
Eggemeier, F. T., & Stadler, M. A. (1984). Subjective workload assessment in a spatial memory task. Proceedings of the Human Factors Society, 28, 680-684.
Eggemeier, F. T. (1984). Workload metrics for system evaluation. Proceedings of NATO Defense Research Group Panel VIII Workshop: Applications of System Ergonomics to Weapon System Development. Shrivenham, England. C.5-C.20.
Eggemeier, F. T., Crabtree, M. S., & LaPointe, P. A. (1983). The effect of delayed report on subjective ratings of mental workload. Proceedings of Human Factors Society, 27, 139-143.
Eggemeier, F. T., Crabtree, M. S., Zingg, J. J., Reid, G. B., & Shingledecker, C. A. (1982). Subjective workload assessment in a memory update task. Proceedings of Human Factors Society, 26, 643-647.
Eggemeier, F. T., Shingledecker, C. A., & Crabtree, M. S. (1985). Workload measurement in system design and evaluation. Proceedings of the Human Factors Society, 29, 215-684.
Eggleston, R. G., & Kulwicki, P. V. (1984). A technology forecasting and assessment method for evaluating system utility and operator workload. Proceedings of Human Factors Society, 28, 31-35.
Eggleston, R. G., & Quinn, T. J. (1984). A preliminary evaluation of a projective workload assessment procedure. Proceedings of Human Factors Society, 3, 695-699.
Elkin, P. A., Klochkov, A. M., & Zhelenniakov, V. D. (1971). Application of EEG spectral characteristics and derivatives in aviation physiology practices. Zhurnal Vysshei Nervnoi Deiatel'nosti, 21, 560-565 (In Russian).
Ellis, G. A., & Roscoe, A. H. (1982). The airline pilots view of flight deck workload: a preliminary study using a questionnaire. (Technical Memorandum FS (B) 465). London, England: Royal Aircraft Establishment, Controller HMSO London.
Ellis, S. R. (1982). Contingency in visual scanning of cockpit traffic displays. Proceedings of the Human Factors Society, 26, 1005-1009.
Ellis, S. R., & Stark, L. (1981). Pilot scanning patterns while viewing cockpit displays of traffic information. Proceedings of the Annual Conference on Manual Control, 17, 517-524.
Enstrom, K. D., & Rouse, W. B. (1977). Real-time determination of how a human has allocated his attention between control and monitoring tasks. IEEE Transactions on Systems, Man and Cybernetics, 7, 153-161.
Ephrath, A. R., & Curry, R. E. (1977). Detection by pilots of system failures during instrument landings. IEEE Transactions on Systems, Man and Cybernetics, 7, 841-848.
Ettema, J. H. (1969). Blood pressure changes during mental load experiments in man. Psychotherapy and Psychosomatics, 17, 191-195.
Ettema, J. H., & Zielhuis, R. L. (1971). Physiological parameters of mental load. Ergonomics, 14, 137-144.
Fadden, D. M. (1982). Boeing model 767 flight deck workload assessment methodology. Proceedings of the SAE Guidance and Control System Meeting. Williamsburg, VA.
Farber, E., & Gallagher, V. (1972). Attentional demands as a measure of the influence of visibility conditions on driving task difficulty. Highway Research Record, 414, 1-5.
Firth, P. A. (1973). Psychological factors influencing the relationship between cardiac arrhythmia and mental loads. Ergonomics, 16, 5-16.
Fisk, A. D., Derrick, W. D., & Schneider, W. (1983). The assessment of workload: Dual task methodology. Proceedings of the Human Factors Society, 27, 229-233.
Fournier, B. A., & Stager, P. (1976). Concurrent validation of a dual-task selection test. Journal of Applied Psychology, 61, 589-595.
Foushee, H. C. (1984). Dyads and triads at 35,000 feet: Factors affecting group processes and aircrew performance. American Psychologist, 39, 885-893.
Foushee, H. C. (1982). The role of communications, socio-psychological, and personality factors in the maintenance of crew coordination. Aviation, Space and Environmental Medicine, 53, 1062-1066.
Foushee, H. C., & Manos, K. L. (1981). Information transfer within the cockpit: Problems in intracockpit communications. In: C. E. Billings & E. S. Cheaney (Eds.), Information Transfer Problems in the Aviation System. (NASA TP 1875) Moffett Field, CA: NASA-Ames Research Center.
Frankenhaeuser, M., & Johansson, G. (1976). Task demand as reflected in catecholamine excretion and heart rate. Journal of Human Stress, 2, 15-23.
Frolov, N. I. (1976). Evaluation of the working capacity of a pilot during flight duty. Voenno-Meditsinskii Zhurnal, 54-68. (In Russian).
Furedy, J. J. (1987). Beyond heart rate in the cardiac psychophysiological assessment of mental effort: The T-wave amplitude component of the electrocardiogram. Human Factors, 29, 183-194.
Gabriel, R. F. (1977). Some potential errors in human information processing during approach and landing. In: Kitay (Ed.), Air Line Pilots Association Symposium on Human Factors Emphasizing Human Performance, Workload and Communications. Washington, D.C.
Gardner, R. M., Beltramo, J. S., & Krinsky, R. (1975). Pupillary changes during encoding storage and retrievals of information. Perceptual and Motor Skills, 41, 951-955.
Gartner, W. B., & Murphy, M. R. (1976). Pilot workload and fatigue: A critical survey of concepts and assessment techniques. (NASA TN D-8365) Washington, D.C.: National Aeronautics and Space Administration Ames Research Center.
Garvey, W. S., & Taylor, F. V. (1959). Interaction among operator variables system dynamics and task-induced stress. Journal of Applied Psychology, 43, 79-84.
Gaume, J. G., & White, M. D. (1975). Mental Workload Assessment, I. Laboratory investigation of decision-making and short-term memory in a multiple-task situation. (DAC Report MDC 5662101) Long Beach, CA: Douglas Aircraft Company.
Gaume, J. G., & White, M. D. (1975). Mental Workload Assessment, II. Physiological measures of mental workload: Report of three preliminary laboratory tests. (DAC Report MDC J7023/01) Long Beach, CA: Douglas Aircraft Company.
Gaume, J. G., & White, M. D. (1975). Mental Workload Assessment, III. Laboratory evaluation of one subjective and two physiological measures of mental workload. (DAC Report MDC J7023/01) Long Beach, CA: Douglas Aircraft Company.
Gerathewohl, S. J. (1976). Optimization of crew effectiveness in future cockpit design: Biomedical implications. Aviation, Space and Environmental Medicine, 47, 1182-1187.
Gerathewohl, S. J. (1979). New approaches and results in the assessment of pilot and aircrew workload. Proceedings of the XIIIth Conference of the Western European Association for Aviation Psychology. Kollekolle, Denmark, 24-28.
Gerathewohl, S. J., Brown, E. L., Burke, J. E., Kimball, K. A., Lowe, W. F., & Stackhouse, S. P. (1978). Inflight measurement of pilot workload: A panel discussion. Aviation, Space and Environmental Medicine, 49, 810-822.
Gevins, A. S. (1984). Use of neuroelectric measures to assess cognitive workload. Proceedings of the Human Factors Society, 26, 36.
Giffin, W. C., & Rockwell, T. H. (1983). Computer-aided testing of pilot response to critical in-flight events. Proceedings of the Symposium on Aviation Psychology, 2, 331-342.
Giffin, W. C., Rockwell, T. H., & Smith, P. E. (1985). A review of critical in-flight events research methodology. Proceedings of the Symposium on Aviation Psychology, 3, 321-328.
Gilliland, K., Shingledecker, C., Wilson, G., & Peio, K. (1984). Effect of workload on the auditory evoked brainstem response. Proceedings of the Human Factors Society, 28, 37-39.
Glass, A. (1966). Comparison of the effect of hard and easy mental arithmetic upon blocking of the occipital alpha rhythm. Quarterly Journal of Experimental Psychology, 18, 142-152.
Goerres, H. P. (1977). Subjective stress assessment: A new simple method to determine pilot workload. Aviation, Space and Environmental Medicine, 48, 588-564.
Goguen, J. A., Linde, C., & Murphy, M. (1984). Crew communications as a factor in aviation accidents. (NASA CP-2341). Proceedings of the Annual Conference on Manual Control, 20, 217-248.
Goldstein, I. L., Dorfman, P. W., & Price, A. (1978). Speed and load stress as a determinate of performance in a time sharing task. Human Factors, 20, 603-609.
Gomer, F. E., & O'Donnell, R. D. (1976). The application of evoked potential data to the evaluation of visual target recognition performance. Proceedings of the Aerospace Medical Association, 184-185.
Gopher, D. (1981). Performance tradeoffs under time-sharing conditions: The ability of human operators to release resources by lowering their standards of performance. Proceedings of the IEEE International Conference on Cybernetics and Society, 609-613.
Gopher, D. (1982). A selective attention test as a predictor of success in flight training. Human Factors, 24, 173-178.
Gopher, D., & North, R. A. (1977). Manipulating the conditions of training in timesharing performance. Human Factors, 19, 583-593.
Gopher, D. (1984). Workload Book: Assessment of operator workload in engineering systems. (NASA CR-166596) Washington, D.C.: National Aeronautics and Space Administration.
Gopher, D. (1984). Measurement of workload: Physics, psychophysics, and metaphysics. Proceedings of the Annual Conference on Manual Control, 20, 55.
Gopher, D., & Braune, R. (1984). On the psychophysics of workload: Why bother with subjective measures? Human Factors, 26, 519-532.
Gopher, D., Brickner, M., & Navon, D. (1982). Different difficulty manipulations interact differently with task emphasis: Evidence for multiple resources. Journal of Experimental Psychology: Human Perception and Performance, 8, 146-157.
Gopher, D., Chillag, N., & Arzi, N. (1985). The influence of voluntary effort, context, and anchor task on the subjective estimate of load. (HEIS-85-2). Haifa, Israel: Technion.
Gopher, D., Chillag, N., & Arzi, N. (1985). The psychophysics of workload: A second look at the relationship between subjective measures and performance. Proceedings of the Human Factors Society, 29, 640-644.
Gopher, D., & Donchin, E. (1986). Workload - An examination of the concept. In: K. Boff, L. Kaufman & J. P. Thomas (Eds.), Handbook of perception and human performance, New York: Wiley & Sons.
Gopher, D., & Spitz, G. (1982). Attention control in complex information processing &. (HEIS-82-7). Haifa, Israel: Research Center for Work Safety and Human Engineering.
Govindaraj, T., Poturalski, R. J., Vikmanis, M. M., & Ward, S. L. (1981). A model for human attention allocation strategies in situations with competing criteria. Proceedings of the IEEE International Conference on Cybernetics and Society, 475-478.
Graeber, R. C., Foushee, H. C., & Lauber, J. K. (1984). Dimensions of flight crew performance decrements: Methodological implications for field research. In: J. Cullen, J. Siegrist & H. M. Wegmann (Eds.), Breakdown in human adaptation to 'stress'. Towards a multidisciplinary approach. Volume I. The Hague: Martinus Nijhoff Publishers, 584-605.
Graeber, R. C., Foushee, H. C., Gander, P. H., & Noga, G. W. (1985). Circadian rhythmicity and fatigue in flight operations. Journal of Occupational and Environmental Health, 7, 122-129.
Green, R., & Flux, R. (1977). Auditory communication and workload. (AGARD-CP-216), Cologne, Federal Republic of Germany, 18-22 April.
Gunning, D. (1978). Time estimation as a technique to measure workload. Proceedings of the Human Factors Society, 22, 41-45.
Hacker, W., Plath, H. E., Richter, P., & Zimmer, K. (1978). Internal representation of task structure and mental load of work: Approaches and methods of assessment. Ergonomics, 21, 187-194.
Hacker, W. (1974). Determining the psychic workload. Present status and perspectives. Sozialistische Arbeitswissenschaft, 18, 17-28. (In German).
Hale, C. R. (1982). Representing human cognition in complex man-machine environments. Proceedings of the Human Factors Society, 26, 334-338.
Hale, H. B., Anderson, C. A., Williams, E. W., & Tanner, E. (1968). Endocrine-metabolic effects of unusually long or frequent flying missions in C-130 or C-135 aircraft. Aerospace Medicine, 39, 561-570.
Hale, H. B., Hartman, B. O., Harris, D. A., Williams, E. W., Miranda, R. E., & Hosenfeld, J. M. (1972). Time zone entrainment and flight stressors as interactants. Aerospace Medicine, 43, 1089-1094.
Hamilton, P. (1979). Process entropy and cognitive control: mental load in internalized thought processes. In: N. Moray (Ed.), Mental Workload: Its Theory and Measurement, New York: Plenum Press, 289-298.
Hancock, P. A. (1986). The role of temporal factors in workload prediction. Proceedings of IEEE International Conference on Systems, Man and Cybernetics (pp. 1049-1053).
Hancock, P. A. (1986). On the use of time: The irreplaceable resource. In: O. Brown and H. Hendrick (Eds.), Human Factors in Organizational Design and Management II. Amsterdam: North-Holland. (pp. 83-89).
Hancock, P. A. (1987). Arousal theory, stress, and performance: Problems of incorporating energetic aspects of behavior into human-machine systems function. In: L. S. Mark, J. S. Warm, & R. L. Huston (Eds.), Ergonomics and Human Factors: Recent Research. Amsterdam: Springer-Verlag. (pp. 173-179).
Hancock, P. A. (1987). The effect of gender and time of day upon the subjective estimate of mental workload during the performance of a simple task. In: P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.
Hancock, P. A., & Chignell, M. H. (1986). Toward a theory of mental workload: Stress and adaptability in human-machine systems. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 378-383.
Hancock, P. A., & Chignell, M. H. (1987). Adaptive control in human-machine systems. In: P. A. Hancock (Ed.). Amsterdam: North-Holland.
Hancock, P. A., & Chignell, M. H. (1986). Input information requirements for an adaptive human-machine system. Proceedings of the Psychology in the DOD Symposium, 10.
Hancock, P. A., & Carson, D. M. (in press). The time trap: Temporal incongruities under stressful conditions. Aviation, Space and Environmental Medicine.
Hancock, P. A., Meshkati, N., & Robertson, M. M. (1985). Physiological reflections of mental workload. Aviation, Space and Environmental Medicine, 56, 1110-1114.
Harper, R. P., & Cooper, G. E. (1986). Handling qualities and pilot evaluation. AIAA Journal of Guidance, Control and Dynamics, 9, 515-529.
Harris, D. A., Pegram, C. V., & Hartman, B. O. (1971). Performance and fatigue in experimental double-crew transport missions. Aerospace Medicine, 42, 980-986.
Harris, R. L., & Mixon, R. W. (1979). Advanced transport operation effects on pilot scan patterns. Proceedings of the Human Factors Society, 23, 347-351.
Harris, R. L., Tole, J. R., Ephrath, A. R., & Stephens, A. T. (1982). How a new instrument affects pilots' mental workload. Proceedings of the Human Factors Society, 26, 1010-1013.
Hart, S. G. (1975). Time estimation as a secondary task to measure workload. Proceedings of the Annual Conference on Manual Control, 11, 64-77.
Hart, S. G. (1978). Subjective time estimation as an index of workload. In: D. Kitay (Ed.), Proceedings of the Symposium on Man-System Interface: Advances in Workload Study. Washington, D.C.: Air Line Pilots Association, 115-131.
Hart, S. G. (1982). Theoretical basis for workload assessment research at NASA-Ames Research Center. Proceedings of the Workshop on Flight Testing to Identify Pilot Workload and Pilot Dynamics, (AFTEC-TR-82-5), 455-470.
Hart, S. G. (1986). The relationship between workload and training: An introduction. Proceedings of the Human Factors Society, 30, 1116-1120.
Hart, S. G. (1986). Theory and measurement of human workload. In: J. Zeidner (Ed.), Human Productivity Enhancement: Training and Human Factors in Systems Design, I, New York: Praeger, 396-456.
Hart, S. G. (1987). Measurement of pilot workload. In: A. Roscoe (Ed.), AGARDograph on Pilot Workload Assessment (AGARDograph No. 282). Neuilly sur Seine, France: AGARD.
Hart, S. G., Battiste, V., Chesney, M., Ward, M., & McElroy, M. (1987). Responses of Type A and Type B individuals performing a supervisory control simulation. In: G. Salvendy (Ed.), Proceedings of Second International Conference on Human-Computer Interaction. Amsterdam: Elsevier.
Hart, S. G., Battiste, V., & Lester, P. T. (1984). POPCORN: A supervisory control simulation for workload and performance research. Proceedings of the Annual Conference on Manual Control, 20, 431-454.
Hart, S. G., & Bortolussi, M. R. (1984). Pilot errors as a source of workload. Human Factors, 26, 545-556.
Hart, S. G., & Chappell, S. (1983). Pilot communications as a source and indicator of workload. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics.
Hart, S. G., & Chappell, S. L. (1983). Influence of pilot workload and traffic information on pilot's situation awareness. Proceedings of the Annual Conference on Manual Control, 18, 522-544.
Hart, S. G., & Childress, M. E. (1983). Influence of pilot workload and traffic information on pilot's situation awareness. Proceedings of the Annual Conference on Manual Control, 19, 4-26.
Hart, S. G., Childress, M. E., & Bortolussi, M. (1981). Defining the subjective experience of workload. Proceedings of the Human Factors Society, 25, 527-531.
Hart, S. G., Childress, M. E., & Hauser, J. R. (1982). Individual definitions of the term "workload". Proceedings of the Psychology in the DOD Symposium, 478-485.
Hart, S. G., Hauser, J. R., & Lester, P. T. (1984). Inflight evaluation of four measures of pilot workload. Proceedings of the Human Factors Society, 28, 945-949.
Hart, S. G., & Hauser, J. R. (1987). Inflight application of three pilot workload measurement techniques. Aviation, Space and Environmental Medicine, 58, 402-410.
Hart, S. G., McPherson, D., & Loomis, L. L. (1978). Time estimation as a secondary task to measure workload: Summary of research. Proceedings of the Annual Conference on Manual Control, 14, 693-712.
Hart, S. G., Sellers, J. J., & Guthart, G. (1984). The impact of response selection and response execution difficulty on the subjective experience of workload. Proceedings of the Human Factors Society, 28, 732-736.
Hart, S. G., & Sheridan, T. B. (1984). Pilot workload, performance, and aircraft control automation. Proceedings of the AGARD Symposium on Human Factors Considerations in High Performance Aircraft - Conference Proceedings No. 371. Neuilly sur Seine, France: NATO - Advisory Group for Aerospace Research and Development.
Hart, S. G., Shively, R. J., Vidulich, M. A., & Miller, R. C. (1986). The effects of stimulus modality and task integrality: Predicting dual-task performance and workload from single task levels. Proceedings of the Annual Conference on Manual Control, 21, 5.1-5.18.
Hart, S. G., & Staveland, L. E. (1987). Development of the NASA-Task Load Index (NASA-TLX): Results of empirical and theoretical research. In: P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: Elsevier.
Hartman, B. O., Albanese, R. A., & Humphrees, G. B. (1985). Reliability of military pilots: Problems and prospects. Proceedings of the Symposium on Aviation Psychology, 3, 297-308.
Hartman, B., Hughes, H., Samn, S., Albanese, R., & Lozano, P. (1983). Cockpit workload is the tip of the iceberg. Proceedings of the Symposium on Aviation Psychology, 2, 109-114.
Hauser, J. R., Childress, M. E., & Hart, S. G. (1983). Rating, consistency and component salience in subjective workload estimation. Proceedings of the Annual Conference on Manual Control, 18, 127-149.
Hauser, J. R., & Hart, S. G. (1983). The effect of feedback on subjective and objective measures of workload and performance. Proceedings of the Human Factors Society, 27, 144.
Hawkins, H., & Ketchum, R. D. (1980). The Case Against Secondary Task Analyses of Mental Workload. (TR-6) Washington, D.C.: Office of Naval Research.
Hay, G. C., House, C. D., & Sulzer, R. L. (1978). Summary of Department of Transportation/Federal Aviation Administration 1977/1978 Task Force on Crew Workload Report. (FAA-EM-78-15) Washington, D.C.: Federal Aviation Administration.
Hay, G. C., Sulzer, R. L., & Gold, F. (1981). Update of the FAA Task Force on Crew Workload Accident Analysis. (FAA-ASF-81-1) Washington, D.C.: Federal Aviation Administration.
Hay, G. C., Sulzer, R. L., & Cox, W. J. (1981). Flight Crewmember Workload Evaluation. (FAA-RD-80-129) Washington, D.C.: Department of Transportation.
Heffley, R. K. (1983). Modelling Pilot Workload for Aircraft Flying Qualities Analysis. (N62269-82-R-0712) Warminster, PA: Naval Air Development Center.
Heffley, R. K. (1983). Pilot workload in the total pilot-vehicle-task system. Proceedings of the Human Factors Society, 26, 234-238.
Helm, W. R. (1981). Psychometric measures of task difficulty under varying levels of information load. Proceedings of the Human Factors Society, 25, 518-521.
Helm, W. R., Fishburne, R. P. Jr., & Waag, W. L. (1978). Channel capacity and locus of interference under dual task conditions. Perceptual and Motor Skills, 46, 659-666.
Helmreich, R. L., Foushee, H. C., Bensen, R., & Russini, W. (1985). Cockpit resource management: Exploring the attitude-performance linkage. Proceedings of the Symposium on Aviation Psychology, 3, 445-450.
Hemingway, J. C. (1984). An Experimental Evaluation of the Sternberg Task as a Workload Metric for Helicopter Flight Handling Qualities (FHQ Research). (NASA TM 85884) Washington, D.C.: National Aeronautics and Space Administration.
Hess, R. A. (1977). Prediction of pilot opinion ratings using an optimal pilot model. Human Factors, 19, 459-475.
Hicks, R. E., Miller, G. W., Gaes, G., & Bierman, K. (1977). Concurrent processing demands and the experience of time-in-passing. American Journal of Psychology, 90, 431-446.
Hicks, T. G., & Wierwille, W. W. (1979). Comparison of five mental workload assessment procedures in a moving-base driving simulator. Human Factors, 21, 129-143.
Higgins, T. H. (1981). A Systems Engineering Evaluation for Piloted Aircraft and Other Man Operated Vehicles and Machines. A Unifying Set of Hypotheses for Dynamic System Test and Evaluation: The Rating of System Performance, System Load, and System Work and Their Interrelationships. (FAA-RD-81-30) Washington, D.C.: Federal Aviation Administration.
Higgins, T., Chignell, M. H., & Hancock, P. A. (1987). Task analysis and workload evaluation in a carrier landing sequence. In: P. A. Hancock & M. H. Chignell (Eds.), Intelligent interfaces: theory, research and design. Amsterdam: North-Holland.
Hill, S. G., Plamondon, B. D., Wierwille, W. W., Lysaght, R. J., Dick, A. O., & Bittner, A. C. (1987). Analytic techniques for the assessment of operator workload. Proceedings of the Human Factors Society, 31, 368-372.
Holland, M. K., & Tarlow, G. (1972). Blinking and mental load. Psychological Reports, 31, 119-127.
Hopkin, V. D. (1979). Mental workload in air traffic control. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 381-386.
Hopkins, H. A. (1973). Establishing priorities during flight deck operation. Proceedings of the Royal Aeronautical Society's Symposium on "Flight Deck Environment and Pilot Workload".
Horst, R. L., Munson, R. C., & Ruchkin, D. S. (1984). Event-related potential indices of workload in a single-task paradigm. Proceedings of the Human Factors Society, 28, 727-731.
Howitt, J. S., Hay, A. E., Shergold, G. R., & Ferres, H. M. (1978). Workload and Fatigue-in-Flight EEG Changes. Aviation, Space and Environmental Medicine, 43, 1197-1202.
Howitt, J. S. (1973). The assessment of pilot workload. Proceedings of Royal Aeronautical Society's Symposium on "Flight Deck Environment and Pilot Workload".
Hurst, M. W., & Rose, R. M. (1978). Objective workload and behavioral response in airport radar control rooms. Ergonomics, 21, 559-565.
Hyndman, B. W., & Gregory, J. R. (1975). Spectral analysis of sinus arrhythmia during mental loading. Ergonomics, 18, 255-270.
Imhoff, D. L., & Levine, J. M. (1981). Perceptual-motor and cognitive performance task battery for pilot selection. (AFHRL-TR-80-77) Brooks AFB, TX: Air Force Systems Command.
Inbar, C. R., & Eden, C. (1976). Psychological stress evaluators. Correlation with voice tremor. Biological Cybernetics, 3, 165-167.
Inomata, O. (1977). An evaluation of heart rate variability in different levels of mental loading. Journal of Human Ergology, 6, 208-210.
Ioseliani, K. K. (1987). Information-activation relationships and mental performance of operators. Kosmicheskaya Biologiya I Aviakosmicheskaya Meditsina, 21(1), 17-21. (In Russian).
Ioseliani, K. K. (1985). Psychic adaptation and work capacity during simulated weightlessness. Kosmicheskaya Biologiya I Aviakosmicheskaya Meditsina, 19(1), 19-24. (In Russian).
Ioseliani, K. K. (1980). Evaluation and prediction of mental performance of neurotically affected flight personnel. Kosmicheskaya Biologiya I Aviakosmicheskaya Meditsina, 14(1), 68-72. (In Russian).
Ioseliani, K. K. (1975). Study of variation in psychic performance of flying personnel during hypertension to predict their profession. Kosmicheskaya Biologiya I Aviakosmicheskaya Meditsina, 9(6), 65-70. (In Russian).
Israel, J. B., Chesney, G. L., Wickens, C. D., & Donchin, E. (1980). P300 and tracking difficulty: Evidence for multiple resources in dual-task performance. Psychophysiology, 17, 259-273.
Israel, J. B., Wickens, C. D., Chesney, G. L., & Donchin, E. (1980). The event-related brain potential as an index of display-monitoring workload. Human Factors, 22, 211-224.
Ivanov-Muromskii, K. A., & Lukianove, O. N. (1975). Mapping in the state of operational stress. Fiziologiia Cheloveka, 1, 459-568. (In Russian).
Jahns, D. W. (1973). Operator workload. What is it and how should it be measured? In: K. D. Cross, & J. H. McGrath (Eds.), Crew System Design.
Jennings, A. E., & Chiles, W. D. (1977). An investigation of time-sharing ability as a factor in complex performance. Human Factors, 19, 535-547.
Jensen, R. S., & Chappell, S. (1984). Pilot performance and Workload Assessment: An Analysis of Pilot Errors. (NASA CR-xxxx) Washington, D.C.: National Aeronautics and Space Administration.
Jensen, R. S., & Marsh, R. W. Simulator Tests of Pilot Performance in Terminal Area Navigation Operations: Effects of Various Airborne System Characteristics. (FAA-RD-76-99) Washington, D.C.: Federal Aviation Administration.
Jex, H. R. (1979). A proposed set of standardized sub-critical tasks for tracking workload calibration. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 179-188).
Jex, H. R. (1967). Two applications of a critical-instability task to secondary workload research. IEEE Transactions of Human Factors in Electronics, 8, 279-282.
Jex, H. R., & Clement, W. F. (1979). Defining and measuring perceptual-motor workload in manual control tasks. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 125-179).
Jex, H. R., & Clement, W. F. (1977). On Defining and measuring perceptual-motor workload in manual control tasks. (TR-1104-1) Mountain View, CA: Systems Technology, Inc.
Johanssen, G. (1979). Workload and workload measurement. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 3-12).
Johnson, J. A. (1985). Fatigue, stress, or the pre-occupied mind has a direct relationship to the human factors' accidents. Proceedings of the Symposium on Aviation Psychology, 2, 625-630.
Johnson, W. B., & Rouse, W. B. (1982). Analysis and classification of human errors in troubleshooting live aircraft power plants. IEEE Transactions on Systems, Man and Cybernetics, 12, 389-393.
Jones, E. C. Jr., & Schuster, D. H. (1970). Design and development of an adaptive auditory and distractive stressor. IEEE Transactions on Man-Machine Systems, 11, 161-163.
Juris, M., & Velden, M. (1977). The pupillary response to mental overload. Physiological Psychology, 5, 421-424.
Kahneman, D., Beatty, J., & Pollack, I. (1967). Perceptual deficit during a mental task. Science, 157, 218-219.
Kahneman, D., Tversky, B., Shapiro, D., & Crider, A. (1969). Pupillary, heart rate, and skin resistance changes during a mental task. Journal of Experimental Psychology, 79, 164-167.
Kahneman, D., & Beatty, J. (1967). Pupillary responses in a pitch discrimination task. Perception & Psychophysics, 2, 101-105.
Kalsbeek, J. W. H. (1969). Measurement of mental work load and of acceptable load. Possible application in industry. International Journal of Production Research, 2, 33-45.
Kalsbeek, J. W. H. (1971). Standards of acceptable load in a task. Ergonomics, 14, 641-650.
Kalsbeek, J. W. H. (1973). Do you believe in sinus arrhythmia? Ergonomics, 16, 99-104.
Kalsbeek, J. W. H. (1973). Sinus arrhythmia and the dual task method in measuring mental load. In: W. T. Singleton, J. G. Fox, & D. Whitfield (Eds.), Measurement of Man at Work. London: Taylor and Francis, (pp. 101-113).
Kalsbeek, J. W. H. (1964). On the measurement of deterioration in performance caused by distraction stress. Ergonomics, 7, 187-195.
Kalsbeek, J. W. H., & Sykes, R. N. (1967). Objective measurement of mental load. Acta Psychologica, 22, 253-261.
Kantor, J. E. Methodology to Assess Psychological Stress and its Impact in the Air Combat Environment. (AFHRL-TR-78-3). Brooks Air Force Base: Human Resources Laboratory.
Kantowitz, B. H. (1987). Mental workload. In: P. A. Hancock (Ed.), Human factors psychology. Amsterdam: North-Holland.
Kantowitz, B. H. (1987). Defining and measuring pilot mental workload. Proceedings of
the 1987 Mental State Estimation Workshop. Hampton, VA: NASA-Langley Research Center.
Kantowitz, B. H., & Casper, P. A. (in press). Human workload in aviation. In: E. Wiener & D. Nagel (Eds.), Human factors in aviation. New York: Academic Press.
Kantowitz, B. H., Hart, S. G., & Bortolussi, M. R. (1983). Measuring pilot workload in a moving-base simulator: I. Asynchronous secondary choice-reaction task. Proceedings of the Human Factors Society, 27, 319-322.
Kantowitz, B. H., Hart, S. G., Bortolussi, M. R., Shively, R. J., & Kantowitz, S. C. (1984). Measuring pilot workload in a moving-base simulator: II. Building levels of load. Proceedings of the Annual Conference on Manual Control, 20, 359-372.
Katz, J. G. (1980). Pilot Workload in the Air Transport Environment: Measurement, Theory, and the Influence of Air Traffic Control. (FTL Report R80-3) Cambridge, MA: Massachusetts Institute of Technology.
Katz, J. G., & Simpson, R. (1980). Pilot workload in the air transport environment: Theory, measurement, and the influence of air traffic control. Proceedings of the Annual Conference on Manual Control, 16, 213.
Kennedy, R. S., Bittner, A., Harbeson, M., & Jones, M. B. (1981). Perspectives in Performance Evaluation Tests for Environmental Research (PETER): Collected Papers. (NBDL-80R004) New Orleans: Naval Biodynamics Laboratory.
Kessel, C. J., Brickner, M., Allon, Z., & Seidmann, A. (1983). Digital modeling of pilot workload in high speed high performance aircraft. Proceedings of the Symposium on Aviation Psychology, 2, 279-282.
Kirkpatrick, M., Malone, T. B., & Andrews, P. J. (1984). Development of an interactive microprocessor based workload evaluation model (SIMWAM). Proceedings of the Human Factors Society, 28, 78-80.
Kitay, D. (1978). Executive Summary. Crew Workload in the Air Carrier Cockpit. Washington, D.C.: Air Line Pilots Association.
Klein, K. E., & Wegmann, H. W. (1980). Significance of circadian rhythms in aerospace operations. (AGARDograph No. 247.) London: Harford House.
Klotzbucher, E., & Roloff, D. (1977). The effect of mental work with and without time pressure on selected physiological functions. Zeitschrift für die gesamte Hygiene und ihre Grenzgebiete, 23, 8-11, (In German).
Knowles, W. B. (1963). Operator loading tasks. Human Factors, 5, 155-161.
Kramer, A. F. (in press). Event-related brain potentials as indices of cognitive workload and attentional allocation. Proceedings of the AGARD Symposium: Electrical and Magnetic Properties of the Central Nervous System: Research in Clinical Applications in Aerospace Medicine. Neuilly sur Seine, FR: AGARD.
Kramer, A. F. (in press). Review of physiological methods for measuring pilot workload. Proceedings of the Workshop on the Assessment of Crew Workload Measurement Methods, Techniques, and Procedures: Preliminary Selection of Measures. Wright-Patterson Air Force Base, OH.
Kramer, A. F., Sirevaag, E. J., & Braune, R. A. (1987). A psychophysiological assessment of operator workload during simulated flight missions. Human Factors, 29, 145-160.
Kramer, A. F., & Strayer, D. L. (1987). A Componential Analysis of Changes in Human Information Processing During the Development of Automaticity. (Progress Report for NAG 2-369). Moffett Field, CA: NASA-Ames Research Center, Aerospace Human Factors Research Division.
Kramer, A. F., Wickens, C. D., & Donchin, E. (1985). Processing of stimulus properties: Evidence for dual-task integrality. Journal of Experimental Psychology: Human Perception and Performance, 11, 393-408.
Krivolave, J. (1968). Pulse rate and information load during typing. Activitas Nervosa Superior, 10, 172-176, (In Czechoslovakian).
Krol, J. P. (1971). Variation in ATC-work load as a function of variation in cockpit workload. Ergonomics, 14, 585-590.
Krzanowski, W. J., & Nicholson, A. N. (1972). Analysis of pilot assessment of workload. Aerospace Medicine, 43, 993-997.
Kundiev, I. I., Navakatikian, A. O., Tomashevskaia, L. I., Derkach, V. S., & Kovaleva, A. I. (1976). Stressful mental activity and the regulatory state in the cardiovascular system. Fiziologiia Cheloveka, 2, 433-440. (In Russian).
Kulkarn, J., & Karwowski, W. (1986). Research guide to applications of fuzzy set theory in human factors. In: W. Karwowski and A. Mital (Eds.), Applications of fuzzy set theory in human factors. Amsterdam: Elsevier Science Publishers.
Kuroda, I., Fujiwara, O., Okamura, N., & Utsuki, N. (1976). Method for determining pilot stress through analysis of voice communication. Aviation, Space and Environmental
Medicine, 47, 528-533.
Lane, N. E., & Streib, M. I. The human operator simulator: Workload estimation using a simulated secondary task. AGARD A11-1-12.
Lauber, J. K. (1984). Resource management in the cockpit. Air Line Pilot, 20-38.
Laurell, H., & Lisper, H. O. (1978). A validation of subsidiary reaction time against detection of roadside obstacles during prolonged driving. Ergonomics, 21, 81-88.
Leplat, J. (1978). Factors determining work-load. Ergonomics, 21, 143-149.
Lester, P. T., & Palmer, E. A. (1983). Reaction time studies of separation violation detection with cockpit traffic displays. Proceedings of the Annual Conference on Manual Control, 19, 133-41.
Levine, J. M., Ogden, C. D., & Eisner, E. J. Measurement of Workload by Secondary Tasks. (Final report for NAS2-9637). Washington, D.C.: Advanced Research Resources Organization.
Levison, W. H. (1979). A model for mental workload in tasks requiring continuous information processing. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 189-218.
Liu, Y., & Wickens, C. D. (1987). Mental Workload and Cognitive Task Automation: An Evaluation of Subjective and Time Estimation Metrics. (ERL-87-2/NASA-87-2) Champaign: University of Illinois, Engineering-Psychology Research Laboratory.
Lindholm, E., & Sisson, N. (1985). Physiological assessment of pilot workload in simulated and actual flight environments. Behavior Research Methods, Instruments & Computers, 17, 191-194.
Lisper, H. O., Laurell, H., & Stening, G. (1973). Effects of experience of the drive on heart-rate respiration-rate and subsidiary reaction time in a three hour continuous driving task. Ergonomics, 16, 501-506.
Logan, G. D. (1979). On the use of a concurrent memory load to measure attention and automaticity. Journal of Experimental Psychology: Human Perception and Performance, 5, 189-207.
Lorens, S. A., Jr., & Darrow, C. W. (1976). Eye movements, EEG, GSR, and EKG during mental multiplications. Electroencephalography and Clinical Neurophysiology, 14, 739-746.
Luczak, H. (1971). The use of simulator for testing individual mental working capacity. Ergonomics, 14, 651-660.
Luczak, H., & Laurig, W. (1973). An analysis of heart rate variability. Ergonomics, 16, 85-97.
Luczak, H., & Rohmert, W. (1976). Adaptation reaction of workers in ergonomic field studies of information processing work potentials. European Journal of Applied Physiology, 35, 33-47, (In German).
Lyman, E. G., & Orlady, H. W. (1981). Fatigue and Associated Performance Decrements in Air Transport Operations. (NAS2-166167). Washington, D.C.: National Aeronautics and Space Administration.
Machac, M. (1971). Mental load, fatigue, and recovering. Psychologie, 6, 72-79, (In Czechoslovakian).
Madni, A. M., & Lyman, J. (1983). Model-based estimation and prediction of task-imposed mental workload. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 314-317.
Majendie, A. M. A. (1962). Automatic landing: The role of the human pilot. Aerospace Engineering, 24-34.
Mane, A., & Wickens, C. D. (1986). The effects of task difficulty and workload on training. Proceedings of the Human Factors Society, 30, 1124-1127.
Martin, J., Long, J., & Broome, D. (1984). The division of attention between a primary tracking task and secondary tasks of pointing with a stylus or speaking in a simulated ship’s-gunfire-control task. Ergonomics, 22, 397-408.
Matthews, M. (1986). The influence of visual workload history on visual performance. Human Factors, 28, 623-632.
McCauley, M. E., Kennedy, R. S., & Bittner, A. C. Development of Performance Evaluation Tests for Environmental Research (PETER): Time Estimation. Perceptual and Motor Skills, 51, 655-665.
McCormick, D. (1985). Simulated terrain following flight: Visual and radar correlation requirements. Proceedings of the Symposium on Aviation Psychology, 3, 499-504.
McCoy, C. E. (1985). The role of fundamental expression in instrument flight performance. Proceedings of the Symposium on Aviation Psychology, 2, 419-426.
McDonnell, J. D. (1969). An application of measurement methods to improve the quantitative nature of pilot rating scales. IEEE Transactions on Man-Machine Systems, 10, 81-92.
McDonnell, J. D. (1968). Pilot Rating Techniques for the Estimation and Evaluation of Handling Qualities. (AFFDL-TR-68-76) Wright-Patterson Air Force Base, OH: Flight Dynamics Laboratory.
McRuer, D. T., Clement, W. F., & Allen, R. W. (1981). A theory of human error. Proceedings of the Annual Conference on Manual Control, 11.
McRuer, D. T., & Jex, H. R. (1967). A review of quasi-linear pilot models. IEEE Transactions on Human Factors in Electronics, 8, 231-249.
Menalson, D., Curry, R. E., Howell, J. D., & Connelly, M. E. (1973). The effect of communications and traffic situation displays on pilots awareness of traffic in the terminal area. Proceedings of the Annual Conference on Manual Control, 19, 25-39.
Melton, C. E., McKenzie, J. M., Kelln, J. R., Hoffman, S. M., & Saldivar, J. T. (1975). Effect of a general aviation trainer on the stress of flight training. Aviation, Space and Environmental Medicine, 46, 1-5.
Merhav, S. J., & Orna, B. Y. (1976). Control augmentation and work load reduction by kinesthetic information from the manipulator. IEEE Transactions on Systems, Man and Cybernetics, 6, 825-835.
Meshkati, N. (1987). Toward development of comprehensive theories of mental workload. In: P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.
Meshkati, N. (1987). Heart rate variability and mental workload assessment. In: P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.
Meshkati, N., & Driver, M. J. (1984). Individual information processing behavior in perceived job difficulties: A decision style and job design approach to coping with human mental workload. In: H. W. Hendrick & O. Brown, Jr. (Eds.), Human Factors in Organizational Design and Management. Amsterdam: North-Holland.
Meshkati, N., Hancock, P. A., & Robertson, M. M. (1984). The measurement of human mental workload in dynamic organizational systems: An effective guide for job design. In: H. W. Hendrick & O. Brown, Jr. (Eds.), Human Factors in Organizational Design and Management. Amsterdam: North-Holland.
Meshkati, N., & Loewenthal, A. (1987). An eclectic and critical review of four primary mental workload assessment methods: A guide for developing a comprehensive conceptual model. In: P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.
Meshkati, N., & Loewenthal, A. (1987). The effects of individual differences in information processing behavior on experiencing mental workload and perceived task difficulty: An experimental approach. In: P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland.
Meshkati, N., & Robertson, M. M. (1985). Individual differences in experiencing mental workload: A guide for cockpit workload evaluators. Proceedings of the Los Angeles Human Factors Society, 8.
Metzler, T. R., & Shingledecker, C. A. (1982). Register of Research in Progress on Mental Workload. (AFAMRL-TR-82-42) Dayton, OH: Wright-Patterson Air Force Base, Aerospace Medical Research Laboratory.
Meyer, R. E. (1974). Stress and the air traffic controller. Revue de Medecine Aeronautique et Spatiale, 13, 97-106.
Micalizzi, J., & Wickens, C. D. (1980). The Application of Additive Factors Methodology to Workload Assessment in a Dynamic System Monitoring Task. (EPL-80-2/ONR-80-2) Urbana-Champaign: University of Illinois, Engineering Psychology Research Laboratory.
Michon, J. A. (1964). A note on the measurement of perceptual motor load. Ergonomics, 7, 461-463.
Michon, J. A. (1966). Tapping regularity as a measurement of perceptual motor load. Ergonomics, 9, 401-412.
Miller, R. C., & Hart, S. G. (1984). Assessing the subjective workload of directional orientation tasks. Proceedings of the Annual Conference on Manual Control, 20.
Milord, J. T., & Perry, R. P. (1977). A methodological study of overload. Journal of General Psychology, 97, 131-137.
Mital, A., & Ulgen, O. M. (1982). Mental stress quantification and identification decision modeling. Proceedings of the Human Factors Society, 26, 474-478.
Mohler, S. R., & Sulzer, R. (1981). Elements of aircrew workload. Human Factors Bulletin. Arlington, VA: Flight Safety Foundation, Inc.
Mohler, S. R. (1979). Mental function in safe pilot performance. Human Factors Bulletin. Arlington, VA: Flight Safety Foundation, Inc.
Moise, S. L., Jr. (1980). Development of Neurophysiological and Behavioral Metrics of Human Performance. (AFAMRL-TR-80-39). Bolling Air Force Base, D.C.: Air Force Office of Scientific Research.
Monty, R. A., & Ruby, W. J. Effects on added workload of compensatory tracking for maximum terrain following. Human Factors, 7, 207-214.
Moray, N. (1982). Subjective mental workload. Human Factors, 24, 25-40.
Moray, N. (1979). Models and measures of mental workload. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 13-22).
Moray, N., Eisen, P., Creco, G., Krushelnycky, E., Money, L., Muir, B., Noy, I., Shein, F., Turksen, B., & Waldon, L. (1986). Fuzzy and vector measurement of workload. Proceedings of the Human Factors Society, 30, 1121-1124.
Moray, N., Eisen, P., Money, L., & Turksen, I. (1987). Fuzzy analysis of skill and rule-based mental workload. In: P. A. Hancock & N. Meshkati (Eds.), Human mental workload. Amsterdam: North-Holland.
Moray, N., Turksen, B., Adie, P., Drascic, D., Eisen, P., Krushelnycky, E., Money, L., Schonert, H., & Thornton, C. (1986). Progress in mental workload measurement. Proceedings of the Human Factors Society, 30, 1040-1043.
Moray, N., & Waterton, K. (1982). A fuzzy model of rather heavy workload. Proceedings of the Annual Conference on Manual Control, 18, 120-126.
Morehead, D. R., & Rouse, W. B. (1983). Computer-aided information seeking: Assessing the value of information. Proceedings of the Human Factors Society, 27, 855-859.
Morris, N. M., & Rouse, W. B. (1985). An experimental approach to validating a theory of human error in complex systems. Proceedings of the Human Factors Society, 29.
Morris, N. M., Rouse, W. B., & Frey, P. R. (1985). Adaptive Aiding for Symbiotic Human-computer Control: Conceptual Model and Experimental Approach. (AFAMRL-TR-84-072) Dayton, OH: Wright-Patterson Air Force Base, Aerospace Medical Research Laboratory.
Morris, N. M., Rouse, W. B., & Ward, S. L. (1985). Experimental evaluation of adaptive
task allocation in an aerial environment. Proceedings of the IFAC/IFIP/IFORS/IEA Conference on Analysis, Design, and Evaluation of Man-Machine Systems.
Moss, R. W., & Tofte, P. (1974). C-130 Crew Workload Assessment Program, Volume I: Summary of Results. (AFFDL-TM-74-185-FGR) Dayton, OH: Wright-Patterson Air Force Base, Flight Dynamics Laboratory.
Mulder, G. (1980). The Heart of Mental Effort: Studies in the Psychophysiology of Mental Work. Unpublished Doctoral Dissertation, University of Groningen, The Netherlands.
Mulder, G. (1979). Mental load, mental effort, and attention. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 299-326).
Mulder, G. (1979). Sinusarrhythmia and mental workload. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 327-344).
Mulder, G., & Mulder-Hajonides van der Meulen, W. R. E. H. (1973). Mental load and the measurement of heart rate variability. Ergonomics, 16, 69-83.
Murphy, E. D., & Mitchell, C. M. (1984). Cognitive attributes to guide display design in automated command-and-control systems. Proceedings of the Human Factors Society, 28, 418-422.
Murphy, M. R., Randle, R. J., Tanner, T. A., Frankel, R. M., Goguen, J. A., & Linde, C. (1984). The measurement of crew coordination and decision making factors and their relationships to flight task performance. Proceedings of the Annual Conference on Manual Control, 20.
Murphy, M. R. (1980). Analysis of eighty-four commercial aviation incidents: Implications for a resource management approach to crew training. Proceedings of the Annual Reliability and Maintainability Symposium.
Nakamura, M., Okaue, M., & Hori, H. (1974). The change of heart rate during mental work. Aeromedical Laboratory Reports, 14, 181-190. Tokyo, Japan: Japan Air Self Defense Force. (In Japanese).
Nataupsky, M., & Abbott, T. S. (1987). Comparison of workload measures on computer-generated primary flight displays. Proceedings of the Human Factors Society, 31.
Navon, D., & Gopher, D. (1979). On the economy of the human-processing system. Psychological Review, 84, 214-255.
Navon, D., & Gopher, D. (1980). Task difficulty, resources, and dual task performance. In: R. S. Nickerson (Ed.), Attention and Performance VIII. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nicholson, A. N. (1973). Aircrew workload during the approach and landing. Aeronautical Journal, 77, 286-289.
Nicholson, A. N., Hill, L. E., Borland, R. G., & Krzanowski, W. J. (1973). Influence of workload on the neurological state of a pilot during the approach and landing. Aerospace Medicine, 44, 146-152.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.
North, D. M. (1977). YC-14 designed to cut pilot workload. Aviation Week and Space Technology, 18.
North, R. A., & Gopher, D. (1976). Measures of attention as predictors of flight performance. Human Factors, 18, 1-14.
Notestine, J. C. (1984). Subjective workload assessment and effect of delayed ratings in a probability learning task. Proceedings of the Human Factors Society, 28, 685-689.
Noyer, A. (1971). Mental fatigue and palmar skin resistance. Travail Humain, 34, 289-298.
O'Donnell, R. D. (1979). Contributions of Psychophysiological Techniques to Aircraft Design and other Operational Problems. (AGARDograph 244). London: Harford House.
O'Donnell, R. D., & Eggemeier, F. T. (1986). Workload assessment methodology. In: K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance. New York: Wiley.
Ogden, C. D., Levine, J. M., & Eisner, E. J. (1979). Measurement of workload by secondary tasks. Human Factors, 21, 529-548.
Ohara, S. (1970). Change of tracking performance respiration and heart rate during experimentally induced anxiety. Aeromedical Laboratory Reports, 11, 198-205. Tokyo, Japan: Japan Air Self Defense Force. (In Japanese).
Onstott, E. D., & Faulkner, W. H. (1977). Prediction of pilot reserve attention capacity during air-to-air target tracking. Proceedings of the Annual Conference on Manual
Control, 13.
Onstott, E. D., Warner, J. S., & Hodgkinson, J. (1984). Maximum normalized rate as a flying qualities parameter. Proceedings of the Annual Conference on Manual Control, 20.
Opmeer, C. H. J. M., & Krol, J. P. (1973). Toward an objective assessment of cockpit workload: I-physiological variables during different flight phases. Aerospace Medicine, 44, 527-532.
Opmeer, C. H. J. M. (1973). The information content of successive RR-interval time in the ECG: Preliminary results using factor analysis and frequency analysis. Ergonomics, 16, 105-112.
Orlady, H. W. (1982). Flight crew performance when pilot-flying and pilot-not-flying duties are exchanged. Proceedings of the Human Factors Society, 26, 307-311.
Pardon, N. (1977). Methods of evaluation of mental load. Cahier de Medecine Interprofessionnelle, 65, 19-38. (In French).
Parks, D. L. (1977). Current workload methods and emerging challenges. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 387-416).
Pattipati, K. R., Ephrath, A. R., & Kleinman, D. L. (1979). Analysis of Human Decision-making in Multi-task Environments. (TR-79-15) Storrs, CT: University of Connecticut, School of Engineering.
Pausder, H. J., & Gerdes, R. M. (1982). The Effects of Pilot Stress Factors on Handling Quality Assessments During U.S./German Helicopter Agility Flight Tests. (NASA TM 84294) Washington, D.C.: National Aeronautics and Space Administration.
Pew, R. W. (1979). Secondary tasks and workload measurement. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 23-28).
Phatak, A. V. (1983). Review of model-based methods for pilot performance and workload assessment. (Contract No. NAS2-11318).
Phatak, A. V. (1985). Technical requirements for benchmark simulator-based TERPS evaluation (Contract No. NAS2-11973). Moffett Field, CA: NASA-Ames Research Center, Analytical Mechanics Associates, Inc.
Phatak, A. V., Mehra, R. K., & Day, C. N. (1975). Application of system identification to
modeling the human controller under stress conditions. IEEE Transactions on Automatic Control.
Phatak, A., Weinert, H., Segall, I., & Day, C. (1976). Identification of a modified optimal control model for the human operator. Automatica, 12, 31-41.
Pierson, W. R., Mercer, C. R., & Susser, L. L. (1973). The internal environment and flight deck layout. Proceedings of the Royal Aeronautical Society's Symposium on "Flight Deck Environment and Pilot Workload".
Porubcansky, C. A. (1983). Speech technology: Present and future applications in the airborne environment. Proceedings of the Symposium on Aviation Psychology, 2, 85-94.
Pope, A., & Bowles, R. L. (1982). A program for assessing pilot mental state in flight simulators. Proceedings of the AIAA Aerospace Sciences Meeting, 8.
Poston, A. M. (1978). A Survey of Existing Computer Programs for Aircrew Workload Assessment. (USAHEL TM 13-78). Aberdeen Proving Ground: Human Engineering Laboratory.
Pottler, S. S., & Acton, W. H. (1985). Relative contributions of SWAT dimensions to overall subjective workload ratings. Proceedings of the Symposium on Aviation Psychology, 2, 231-238.
Price, D. L. (1975). The effects of certain gimbal orders on target acquisition and workload. Human Factors, 17, 571-576.
Rahimi, M., & Wierwille, W. W. (1982). Evaluation of the sensitivity and intrusion of workload estimation techniques in piloting tasks emphasizing mediational activity. Proceedings of the IEEE International Conference on Cybernetics and Society, 593-597.
Rahimi, M., Wierwille, W. W., & Casali, J. G. (1984). The experimental evaluation of mediational workload measurement. Proceedings of the International Conference on Occupational Ergonomics.
Rashman, S. M. (1972). The function of external respiration in mental activity. Fiziologishnia Zhurnal, 18, 362-366 (In Ukrainian).
Rasmussen, J. (1979). Reflection on the concept of operator workload. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 29-40).
Rasmussen, J. (1983). Skills, rules, and knowledge: Signals, signs, and symbols, and
other distinctions in human performance models. IEEE Transactions on Systems, Man and Cybernetics, 13.
Rault, A. (1976). Pilot workload analysis. In: T. B. Sheridan and G. Johannsen (Eds.), Monitoring behavior and supervisory control. New York: Plenum Press, (pp. 139-153).
Rault, A. (1979). Measurement of pilot workload. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 417-422).
Rehmann, J. T. (1981). Pilot Performance Measurement: An Annotated Bibliography. (FAA-EM-82-16) Washington, D.C.: Federal Aviation Administration.
Rehmann, J. T., Stein, E. S., & Rosenberg, B. L. (1983). Subjective pilot workload assessment. Human Factors, 25, 297-307.
Reid, G. B. (1985). Current status of the development of the Subjective Workload Assessment Technique. Proceedings of the Human Factors Society, 29, 220-223.
Reid, G. B., Eggemeier, F. T., & Nygren, T. E. (1982). An individual differences approach to SWAT scale development. Proceedings of the Human Factors Society, 26, 639-642.
Reid, G. B., & Nygren, T. E. (1987). The subjective workload assessment technique: A scaling procedure for measuring mental workload. In: P. A. Hancock & N. Meshkati (Eds.), Human mental workload. Amsterdam: North-Holland.
Reid, G. B., Nygren, T. E., & Eggemeier, F. T. (1981). Development of multidimensional subjective measures of workload. Proceedings of the IEEE International Conference on Cybernetics and Society, 403-406.
Reid, G. R., Shingledecker, C. A., & Eggemeier, F. T. (1981). Application of conjoint measurement to workload scale development. Proceedings of the Human Factors Society, 25, 522-526.
Renault, B., Ragot, R., Lesevre, N., & Remond, A. (1982). Onset and offset of brain events as indices of mental chronometry. Science, 215, 1413-1415.
Repperger, D. W., Rogers, D. B., van Patten, R. E., & Frazier, J. (1982). A study of task difficulty with a subjective rating scale. Proceedings of the Workshop on Flight Testing to Identify Pilot Workload and Pilot Dynamics. (AFFTC-TR-82-5) Edwards Air Force Base, CA: Air Force Flight Test Center, 499-515.
Riley, D. D., & Breitmaier, W. A. (1985). Cockpit information requirements analysis: A
mission orientation. Proceedings of the Symposium on Aviation Psychology, 2, 55-63.
Robertson, M. M. (1984). Personality differences as a moderator of mental workload behavior: Mental workload performance and strain reactions as a function of cognitive complexity. Proceedings of the Human Factors Society, 28, 690-694.
Robertson, M. M., Hendrick, H. W., & Hancock, P. A. (1984). Individual response to a computer generated mental workload task as a function of cognitive complexity. Proceedings of the International Conference on Occupational Ergonomics, 531-535.
Robertson, M. M., & Meshkati, N. (1985). Analysis of the effects of two individual differences classification models on experiencing mental workload and task difficulty. Proceedings of the Human Factors Society, 29, 178-181.
Rohmert, W. (1979). Determination of stress and strain at real work places: Methods and results of field studies with air traffic control officers. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 423-443).
Rohmert, W. (1971). An international symposium of objective assessment of workload in air traffic control tasks. Ergonomics, 14, 545-547.
Rohmert, W., Laurig, W., Philipp, U., & Luczak, H. (1973). Heart rate variability and work-load measurement. Ergonomics, 16, 33-44.
Rolfe, J. M., & Lindsay, S. J. E. (1973). Flight deck environment and pilot workload: Biological measures of workload. RAF Institute of Aviation Medicine.
Rolfe, J. M. (1973). The secondary task as a measure of mental load. In: W. T. Singleton, J. G. Fox, & D. Whitfield (Eds.), Measurement of Man at Work. London: Taylor & Francis, (pp. 135-148).
Rolfe, J. M. (1971). Multiple task performance operator overloads. Occupational Psychology, 45, 125-132.
Roscoe, A. H. (1978). Stress and workload in pilots. Aviation, Space and Environmental Medicine, 49, 630-636.
Roscoe, A. H. (1982). Heart rate as an in-flight measure of pilot workload. Proceedings of the Workshop on Flight Testing to Identify Pilot Workload and Pilot Dynamics. (AFFTC-TR-82-5) Edwards Air Force Base, CA: Flight Test Center, 338-349.
Roscoe, A. H., Ellis, G., Reid, L. D., & Chiles, W. D. (1978). Assessing pilot workload. AGARD-AG-233, (A-A05-587).
Roscoe, S. N. (1974). Assessment of pilotage error in airborne area navigation procedures. Human Factors, 16, 223-228.
Roscoe, A. H. (1976). Use of pilot heart rate measurement in flight evaluation. Aviation, Space and Environmental Medicine, 47, 86-90.
Roscoe, A. H., & Grieve, B. S. (1986). The impact of new technology on pilot workload ratings. Society of Automotive Engineers. Technical Paper.
Rosenberg, B., Stein, E. S., & Rehmann, J. T. (1981). Critical Tracking Task Workload Rating Study. (FAA-EM-81-13) Washington, D.C.: Federal Aviation Administration.
Rotondo, G. (1978). Workload and operational fatigue in helicopter pilots. Aviation, Space and Environmental Medicine, 49, 430-436.
Rouse, W. B. (1979). Approaches to mental workload. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, 255.
Rouse, W. B. (1983). Elements of human error. Preprints of the NATO Conference on Human Error. Bellagio, Italy. (September 1983).
Rouse, W. B. (1985). Optimal allocation of system development resources to reduce and/or tolerate human error. IEEE Transactions on Systems, Man and Cybernetics, 15, 620-630.
Rouse, W. B., & Hammer, J. M. (1982). Design of an intelligent computer-aided cockpit. Proceedings of the IEEE International Conference on Cybernetics and Society, 449-453.
Rouse, W. B., & Morris, N. M. (1985). Conceptual design of a human error tolerant interface for complex engineering systems. Proceedings of the IFAC/IFIP/IFORS/IEA Conference on Analysis, Design, and Evaluation of Man-Machine Systems, 2.
Rehmann, J. T., Stein, E. S., & Rosenberg, B. L. (1983). Subjective pilot workload assessment. Human Factors, 25, 297-307.
Sanders, A. F. (1979). Some remarks on mental load. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 41-77).
Sanders, A. F. (1983). Towards a model of stress and human performance. Acta Psychologica, 53, 61-97.
Sanders, M. G., Simmons, R. R., & Hofmann, M. A. (1979). Visual workload of the copilot/navigator during terrain flights. Human Factors, 21, 369-383.
Sanders, M. G., Burden, R. T., Simmons, R. R., Lees, M. A., & Kimball, K. A. (1978). Pilot performance/workload during a high hover maneuver with a stability augmentation system. Proceedings of the Aerospace Medical Association.
Sanders, M. G., Hofmann, M. A., Simmons, R. R., & DeBonis, J. N. (1979). Visual workload of the copilot/navigator during terrain flight. In: R. Auffret (Ed.), Studies on Pilot Workload. (AGARD Conference Proceedings No. 217) London: Harford House.
Savage, R. E., Wierwille, W. W., & Cordes, R. E. (1978). Evaluating the sensitivity of various measures of operator workload using random digits as a secondary task. Human Factors, 20, 649-654.
Sayers, B. M. (1973). Analysis of heart rate variability. Ergonomics, 16, 17-32.
Schiflett, S. G. (1980). Evaluation of a Pilot Workload Assessment Device to Test Alternate Display Formats and Control Handling Qualities. (SY-33R-80). Patuxent River, MD: Naval Air Test Center.
Schiflett, S. G., & Loikith, G. J. (1980). Voice Stress Analysis as a Measure of Operator Workload. (TM 79-3 SY). Patuxent River, MD: Naval Air Test Center.
Schiflett, S. G. (1983). Theoretical development of an adaptive secondary task to measure pilot workload for flight evaluations. Proceedings of the Human Factors Society, 27, 602-605.
Schlegel, R. E. (1985). Training characteristics of the criterion task set workload assessment battery. Proceedings of the Human Factors Society, 29, 770-773.
Schlegel, R. E. (1986). To Determine the Existence of a Subjective Measure of Excessive Mental Workload in a Single Cognitive Task. (Final Report - PO 246442) Wichita, KS: Boeing Military Airplane Company.
Schmidt, D. K. (1976). On modeling ATC work load and sector capacity. Journal of Aircraft, 13, 531-537.
Schmidt, D. K. (1978). Queuing analysis of the air traffic controller’s workload. IEEE Transactions on Systems, Man and Cybernetics, 8, 492-498.
Schouten, J. F., Kalsbeek, J. W. H., & Leopold, F. F. (1962). On the evaluation of perceptual and mental load. Ergonomics, 5, 251-260.
Schwartz, J. J., & Ekkers, C. L. (1976). Estimation of task loading by observing and regulating complex technical systems. Men Orderneming, 76, 85-108. (In Dutch).
Scucchi, G. D., & Sells, S. B. (1969). Information load and three-man flight crews: An examination of the traditional organization in relation to current and developing airliners. Aerospace Medicine.
Sebej, F., & Biro, V. (1978). Effect of psychic load on the course of respiration. Studia Psychologica, 20, 67-71.
Sekiguchi, C., Handa, Y., Gotoh, M., Kurihara, Y., Nagasawa, Y., & Kuroda, I. (1979). Frequency analysis of heart rate variability under flight conditions. Aviation, Space and Environmental Medicine, 50, 625-634.
Sekiguchi, C., Handa, Y., Gotoh, M., Kurihara, Y., Nagasawa, A., & Kuroda, I. (1978). Evaluation methods of mental workload under flight conditions: Relationship to heart rate variability. Aviation, Space and Environmental Medicine, 49, 920-925.
Sem-Jacobsen, C. W. (1976). Monitoring of the heart failure and pilot load/overload by the Vesla Seat Pad. Aviation, Space and Environmental Medicine, 47, 441-444.
Senders, J. (1979). Axiomatic model of workload. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 263-267).
Senders, J. W. (1964). The human operator as a monitor and controller of multidegree of freedom systems. IEEE Transactions on Human Factors in Electronics, 5, 2-5.
Senders, J. W. (1983). On the nature and source of human error. Proceedings of the Symposium on Aviation Psychology, 2, 421-428.
Shepherd, W. T. (1985). Cockpit speech interference considerations. Proceedings of the Symposium on Aviation Psychology, 2, 411-418.
Sheridan, T. B. (1979). Experimentation in Supervisory Control and Flight Management. (Status Report for NSG-2 1180) Cambridge, MA: Massachusetts Institute of Technology.
Sheridan, T. B., & Berg, S. (1984). Supervisory workload: Monitoring of overlapping tasks. Proceedings of the Annual Conference on Manual Control, 20.
Sheridan, T. B., & Simpson, R. W. (1979). Toward the Definition and Measurement of the Mental Workload of Transport Pilots. (Final Report DOT-OS-70055) Cambridge: Massachusetts Institute of Technology.
Sheridan, T. B., & Stassen, H. (1979). Definitions, models and measures of human workload. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press. (pp. 219-234).
Shingledecker, C. A., Crabtree, M. S., & Acton, M. S. (1982). Standardized tests for the evaluation and classification of workload metrics. Proceedings of the Human Factors Society, 26, 648-651.
Shingledecker, C. A., Crabtree, M. S., & Eggemeier, F. T. (1985). Methods and systems for measuring human performance capabilities. Proceedings of the Human Factors Society, 29, 210-214.
Shingledecker, C. A., Crabtree, M. S., & Simons, J. C. (1980). Subsidiary Radio Communications Tasks for Workload Assessment in R&D Simulations: I. Conceptual Development and Task Workload Scaling. (AFAMRL-TR-80-126) Dayton, OH: Wright-Patterson Air Force Base: Systems Research Laboratory, Inc.
Shingledecker, C. A., Crabtree, M. S., Simons, J. C., Courtright, J. F., & O’Donnell, R. D. (1980). Subsidiary Radio Communications Tasks for Workload Assessment in R&D Simulations: I. Task Development and Workload Scaling. (AFAMRL-TR-80-126) Dayton, OH: Wright-Patterson Air Force Base, Aeromedical Research Laboratory.
Shingledecker, C. A., & Eggemeier, F. T. (1985). Methods and systems for measuring human performance capabilities. Proceedings of the Human Factors Society, 29, 210-214.
Shively, R. J. (1986). Application of mental workload methodology to human-computer interaction. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 907-911.
Shively, R. J., Battiste, V., Hamerman-Matsumoto, J., Pepitone, D. D., Bortolussi, M. R., & Hart, S. G. (1987). Inflight evaluation of pilot workload measures for rotorcraft. Proceedings of the Symposium on Aviation Psychology, 4.
Silverstein, L. D., Comer, F. E., Crabtree, M. S., & Acton, W. H. (1984). A comparison of analytic and subjective techniques for estimating communications-related workload during commercial transport flight operations. (NASA CR-2341) Washington, D.C.: National Aeronautics and Space Administration.
Simmons, R. R. (1979). Methodological considerations of visual workloads of helicopter pilots. Human Factors, 21, 353-367.
Simonov, P. V., Frolov, M. V., & Ivanov, E. A. (1980). Psychophysiological monitoring of operator’s emotional stress in aviation and astronautics. Aviation,
Space and Environmental Medicine.
Simonov, P. V., & Frolov, M. V. (1977). Analysis of the human voice as a method of controlling emotional state. Achievement and goals. Aviation, Space and Environmental Medicine, 48, 23-25.
Simonov, P. V., Frolov, M. V., & Sviridov, E. P. (1975). Characteristics of the electrocardiogram under physical and emotional stress in man. Aviation, Space and Environmental Medicine, 46, 141-143.
Simpson, C. A. (1985). Speech variability effects on recognition accuracy associated with concurrent task performance by pilots. Proceedings of the Symposium on Aviation Psychology, 2, 87-102.
Skipper, J. H., Rieger, C. A., & Wierwille, W. W. (1986). Evaluation of decision-tree rating scales for mental workload estimation. Ergonomics, 29, 585-599.
Smit, J., & Wewerinke, P. H. (1978). An analysis of helicopter pilot control behavior and workload during instrument flying tasks. (AGARD-CP-255). AGARD Operational Helicopter Aviation Medicine, 30.1-30.11.
Soede, M. (1979). On mental load and reduced mental capacity: Some considerations concerning laboratory and field investigations. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 445-468).
Soliday, S. M., & Schohan, B. (1965). Task loading of pilots in simulated low-altitude high-speed flights. Human Factors, 7, 45-53.
Soulsby, E. P. (1983). Human operator workload: A survey of concepts and assessment techniques. Storrs: University of Connecticut.
Sperandio, J. C. (1971). Variations of operator's strategies and regulating effects of workload. Ergonomics, 14, 571-577.
Sperandio, J. C. (1978). The regulation of working methods as a function of work-load among air traffic controllers. Ergonomics, 21, 195-201.
Speyer, J. J., & Fort, A. (1983). Workload assessment for two-man crew certification. Proceedings of the Symposium on Aviation Psychology, 2, 185-200.
Spicuzza, R. J., Pinkus, A. R., & O’Donnell, R. D. (1974). Development of Performance Assessment Methodology for the Digital Avionics Information System. Dayton, OH: Systems Research Laboratory, Inc.
Spyker, D.
A., Stackhouse, S. P., Khalafalla, A. S., & McLane, R. C. (1971). Development of Techniques for Measuring Pilot Workload. (NASA CR-1888) Washington, D.C.: National Aeronautics and Space Administration.
Stackhouse, S. P., & Petersen, J. R. (1972). Measuring Information Processing Workloads. (29501-3004) Minneapolis, MN: Honeywell Systems and Research Division.
Stager, P., & Zufelt, K. (1972). Dual-task method in determining load differences. Journal of Experimental Psychology, 3, 113-115.
Stamford, B. A. (1976). Validity and reliability of subjective ratings of perceived exertion during work. Ergonomics, 19, 53-60.
Staveland, L., Hart, S. G., & Yeh, Y.-Y. (1985). Memory and subjective workload assessment. Proceedings of the Annual Conference on Manual Control, 21.
Stein, E. S., & Rosenberg, B. L. (1981). The in-flight measurement of pilot workload (a preliminary study). Atlantic City, NJ: Federal Aviation Administration Technical Center.
Stephens, A. T., Tole, J. R., Ephrath, A., & Young, L. R. (1980). Pilot eye scanning behavior as an index of mental loading. Proceedings of the Annual Northeast Bioengineering Conference, 8.
Stern, J. A., & Skelly, J. J. (1984). The eyeblink and workload considerations. Proceedings of the Human Factors Society, 28, 942-944.
Strasser, H. (1979). Measurement of mental workload. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 345-348).
Strasser, H. (1977). Psychological measures of workload correlation between physiological parameter and operational performance. (AGARD CP-216). pp. A8/1-A8/7.
Tanaka, K., Buharali, A., & Sheridan, T. B. (1983). Mental workload in supervisory control of automated aircraft. Proceedings of the Conference on Manual Control, 19, 40-58.
Teiger, C. (1978). Regulation of activity: An analytical tool for studying work-load in perceptual motor tasks. Ergonomics, 21, 203-213.
Tole, J. R., Ephrath, A., Stephens, A., & Young, L. R. (1980). Workload and pilot eye scanning behavior. Proceedings of the Conference on Manual Control, 16.
Towne, D. M. (1985). Cognitive workload in fault diagnosis. (Report No. ONR-107).
Los Angeles: University of Southern California, Behavioral Technology Laboratories (Contract No. N00014-80-C-0493).
Townsend, J. T. (1987). Toward a Dynamic Mathematical Theory of Mental Workload in POPCORN (Progress Report for NAG 2-307) Moffett Field, CA: NASA-Ames Research Center, Aerospace Human Factors Research Division.
Townsend, J. T., Kadlec, H., & Kantowitz, B. H. (in press). Popeye: A production rule-based model of multitask supervisory control (POPCORN). Proceedings of the 1987 Mental State Estimation Workshop. Hampton, VA: NASA-Langley Research Center.
Trumbo, D., & Noble, M. (1970). Secondary task effects on serial verbal learning. Journal of Experimental Psychology, 85, 418-424.
Trumbo, D., Noble, M., & Swink, J. (1967). Secondary tasks interference in the performance of tracking tasks. Journal of Experimental Psychology, 73, 232-240.
Tsang, P. S. (1986). Display/control integrality and time-sharing performance. Proceedings of the Human Factors Society, 30, 445-449.
Tsang, P. S., Hart, S. G., & Vidulich, M. A. (1987). The effects of display/control I/O, compatibility and integrality on dual-task performance and subjective workload. Information Management and Decision Making in Advanced Airborne Weapon Systems. (AGARD Proceedings No. 414). Loughton, Essex, England: AGARD, 5.1-5.9.
Tsang, P. S., & Johnson, W. W. (in press). Automation: Changes in cognitive demands and mental workload. Proceedings of the Symposium on Aviation Psychology, 4.
Turksen, I. B., Moray, N., & Fuller, K. (in press). A linguistic rule-based expert system for mental workload. In: H. J. Bullinger & H. J. Warnecke (Eds.), Toward the Factory of the Future.
Ursano, R. J. (1980). Stress and adaptation: the interaction of the pilot personality and disease. Aviation, Space and Environmental Medicine, 51(11), 1245-1249.
Ursin, H., & Ursin, R. (1979). Physiological indicators of mental load. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press. (pp. 349-365).
van Gigch, J. P. (1970). A model for measuring the information processing rate and mental load of complete activities. Canadian Operational Research Society Journal, 8, 116-128.
van Gigch, J. P. (1970). Applications of a model used in calculating the mental load of workers in industry. Canadian Operational Research Society Journal, 8, 176-184.
Vidulich, M. A. (1986). Response modalities and time-sharing performance. Proceedings of the Human Factors Society, 30, 337-341.
Vidulich, M. A. (1987). The cognitive psychology of subjective mental workload. In: P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: Elsevier.
Vidulich, M. A., & Pandit, P. (1986). Training and subjective workload in a category search task. Proceedings of the Human Factors Society, 30, 1133-1136.
Vidulich, M. A., & Pandit, P. (in press). Individual differences and subjective workload assessment. Proceedings of the Symposium on Aviation Psychology, 4.
Vidulich, M. A., & Tsang, P. S. (1986). Techniques of subjective workload assessment: A comparison of SWAT and the NASA-Bipolar methods. Ergonomics, 29, 1385-1398.
Vidulich, M. A., & Tsang, P. S. (1985). Assessing subjective workload assessment: A comparison of SWAT and the NASA-Bipolar methods. Proceedings of the Human Factors Society, 29.
Vidulich, M. A., & Wickens, C. D. (1982). The influence of S-C-R compatibility and resource competition on performance of threat-evaluation and fault diagnosis. Proceedings of the Human Factors Society, 26, 223-226.
Vidulich, M. A., & Wickens, C. D. (1983). Processing Phenomena and the Dissociation between Subjective and Objective Workload Measures. (EPL-83-2/ONR-83-2). Champaign: University of Illinois, Engineering-Psychology Research Laboratory.
Vidulich, M. A., & Wickens, C. D. (1984). Subjective workload assessment and voluntary control of effort in a tracking task. Proceedings of the Conference on Manual Control, 20.
Vidulich, M. A., & Wickens, C. D. (1985). Stimulus-Central-Processing-Response compatibility guidelines for the optimal use of speech technology. Behavior Research Methods, Instruments & Computers, 17, 243-249.
Vidulich, M. A., & Wickens, C. D. (1986). Causes of dissociation between subjective workload measures and performance: Caveats for the use of subjective assessments. Applied Ergonomics, 17, 291-296.
Vicente, K. J., Thornton, D. C., & Moray, N. (1987). Spectral analysis of sinus arrhythmia: A measure of effort. Human Factors, 29, 171-182.
Volle, M. A. (1978). Work fatigue and frequency of critical flicker fusion. Ergonomics, 21, 551-558. (In French).
Waller, M. C. (1976). An Investigation of Correlation Between Pilot Scanning Behavior and Workload Using Stepwise Regression Analysis. (NASA TM X-3344) Washington, D.C.: National Aeronautics and Space Administration.
Weiss, S. M., Boggs, G., Lehto, M., Shodja, S., & Martin, D. J. (1982). Computer system response time and psychophysiological stress II. Proceedings of the Human Factors Society, 26.
Welford, A. T. (1978). Mental workload as a function of demand, capacity, strategy and skill. Ergonomics, 21, 151-167.
Wempe, T. E., & Baty, D. L. (1968). Human information processing rates during certain multiaxis tracking tasks with a concurrent auditory task. IEEE Transactions on Man-Machine Systems, 9, 129-138.
Wewerinke, P. H. (1977). Performance and workload analysis of in-flight helicopter tasks. Proceedings of the Conference on Manual Control, 13, 106-107.
Wewerinke, P. H. (1974). Human operator workload for various control situations. Proceedings of the Conference on Manual Control, 10, 167-192.
Whitaker, L. A. (1979). Dual-task interference as a function of cognitive processing load. Acta Psychologica, 43, 71-84.
White, S. A., McKinnon, D. P., & Lyman, J. (1985). Modified Petri net sensitivity to workload manipulations. Proceedings of the Conference on Manual Control, 21, 3.1-3.17.
Wickens, C. D. (1979). Measures of workload, stress and secondary tasks. In: N. Moray (Ed.), Mental workload: Its theory and measurement. New York: Plenum Press, (pp. 79-99).
Wickens, C. D. (in press). Review of performance-based measures of pilot workload. Proceedings of the Workshop on the Assessment of Crew Workload Measurement Methods, Techniques, and Procedures: Preliminary Selection of Measures.
Wickens, C. D., & Derrick, W. (1981). Workload measurement and multiple resources. IEEE Transactions on Systems, Man and Cybernetics, 600-603.
Wickens, C. D., Hyman, F., Dellinger, J., Taylor, H., & Meador, M. (1985). The Sternberg memory search task as an index of pilot workload. Proceedings of the Symposium on Aviation Psychology, 2, 287-294.
Wickens, C. D., & Kessel, D. (1979). The effects of participatory mode and task workload on the detection of dynamic system failures. IEEE Transactions on Systems, Man and Cybernetics, 9.
Wickens, C., Kramer, A., Vanasse, L., & Donchin, E. (1983). Performance of concurrent tasks: A psychophysiological analysis of the reciprocity of information-processing resources. Science, 221, 1080-1082.
Wickens, C. D., Sandry, D. L., & Vidulich, M. A. (1983). Compatibility and resource competition between modalities of input, central processing, and output. Human Factors, 25, 227-248.
Wickens, C. D., Vidulich, M., & Sandry, D. (1981). Factors influencing the performance advantage of speech technology. Proceedings of the Human Factors Society, 25, 705-709.
Wickens, C. D., Vidulich, M. A., & Sandry-Garza, D. (1984). Principles of S-C-R compatibility with spatial and verbal tasks: The role of display-control location and voice-interactive display-control interfacing. Human Factors, 26, 533-544.
Wickens, C. D., & Yeh, Y.-Y. (1983). The dissociation between subjective workload and performance: A multiple resource approach. Proceedings of the Human Factors Society, 27, 244-247.
Wickens, C. D., & Yeh, Y.-Y. (1986). A multiple resources model of workload prediction and assessment. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 1044-1048.
Wierwille, W. W. (1979). Physiological measures of aircrew mental workload. Human Factors, 21, 575-593.
Wierwille, W. W. (1981). Instantaneous mental workload: Concept and potential methods for measurement. Proceedings of the IEEE International Conference on Cybernetics and Society, 84-88.
Wierwille, W. W. (1982). Determination of sensitive measures of pilot workload as a function of the type of piloting task (Report No. AFFTC-TR-82-5). In: M. L. Frazier, & Captain R. B. Crombie (Eds.), Proceedings of the workshop on flight testing to identify pilot workload and pilot dynamics.
Wierwille, W. W. (1983). Comparative Evaluation of Workload Estimation Techniques in Piloting Tasks. (NASA CR-166496) Washington, D.C.: National Aeronautics and Space Administration.
Wierwille, W. W. (1987). Important remaining issues in mental workload estimation. In: P. A. Hancock & N. Meshkati (Eds.), Human mental workload. Amsterdam: North-Holland.
Wierwille, W. W., & Casali, J. G. (1983). A validated rating scale for global mental workload measurement applications. Proceedings of the Human Factors Society, 27, 129-133.
Wierwille, W. W., Casali, J. G., Connor, S. A., & Rahimi, M. (1985). Evaluation of the sensitivity and intrusion of mental workload estimation techniques. Advances in Man-Machine Systems Research, 2, 51-127.
Wierwille, W. W., & Connor, S. A. (1983). Evaluation of twenty workload assessment measures using a psychomotor task in a motion-base aircraft simulator. Human Factors, 25, 1-16.
Wierwille, W. W., & Connor, S. A. (1983). The sensitivity of twenty measures of pilot mental workload in a simulated ILS task. Proceedings of the Conference on Manual Control, 19, 150-162.
Wierwille, W. W., & Gutmann, J. C. (1978). Comparison of primary and secondary task measures as a function of simulated vehicle dynamics and driving conditions. Human Factors, 20, 233-244.
Wierwille, W. W., Gutmann, J. C., Hicks, T. C., & Muto, W. H. (1977). Secondary task measurement of workload as a function of simulated vehicle dynamics and driving conditions. Human Factors, 19, 557-565.
Wierwille, W. W., Rahimi, M., & Casali, J. G. (1985). Evaluation of 16 measures of mental workload using a simulated flight task emphasizing mediational activity. Human Factors, 27, 489-502.
Wierwille, W. W., Skipper, J. H., & Rieger, C. A. (1984). Decision tree rating scales for workload estimation: Theme and variations. Proceedings of the Conference on Manual Control, 20.
Wierwille, W. W., & Williges, B. H. (1980). An Annotated Bibliography on Operator Mental Workload Assessment. (Technical Report SY-27R-80) Patuxent River, MD: Naval Air Test Center.
Wierwille, W. W., & Williges, R. (1978). Survey of Operator Workload Assessment Techniques. (Report S-78-101) Blacksburg, VA: Systemetrics. (Final Technical Report Contract no. N00421-77-C-0083) Patuxent River, MD: Naval Air Test Center.
Wierwille, W. W., Williges, R. C., & Schiflett, S. G. (1979). Aircrew workload assessment techniques. In: H. O. Hartman & R. E. McKenzie (Eds.), Survey of methods to assess workload, (pp. 19-53).
Wightman, D. C., & Lintern, G. (1984). Part-training strategies in simulated carrier landing final approach training. Proceedings of the Human Factors Society, 29.
Wildervanck, C., Mulder, G., & Michon, J. A. (1978). Mapping mental load in car driving. Ergonomics, 21, 225-229.
Williges, R. C., & Wierwille, W. W. (1979). Behavioral measures of aircrew mental workload. Human Factors, 21, 549-574.
Wilson, G. F. (1981). Steady state evoked potentials and subject performance in operational environments. Proceedings of the IEEE International Conference on Cybernetics and Society, 407-409.
Wilson, G. F. (1985). A neurophysiological test battery for workload assessment. Proceedings of the Human Factors Society, 29, 224.
Wilson, G. F., & O’Donnell, R. D. (1987). Measurement of operator workload with the neuropsychological workload test battery. In: P. A. Hancock and N. Meshkati (Eds.), Human mental workload. Amsterdam: North-Holland.
Wilson, G. F., & O’Donnell, R. D. (1982). Transient evoked potential and eye movement recordings during simulated emergencies. Proceedings of the Human Factors Society, 26, 652-653.
Wolf, J. D. (1978). Crew workload assessment: Development of a Measure of Operator Workload. (AFFDL-TR-78-165) Dayton, OH: Wright-Patterson Air Force Base, Flight Dynamics Laboratory.
Yeh, Y.-Y., & Wickens, C. D. (1984). Why do performance and subjective workload measures dissociate? Proceedings of the Human Factors Society, 28.
Yeh, Y.-Y., & Wickens, C. D. (1985). An Investigation of the Dissociation between Subjective Measures of Mental Workload and Performance. (EPL-84-1/NASA-84-1) Urbana-Champaign, IL: Engineering-Psychology Research Laboratory.
Yeh, Y.-Y., & Wickens, C. D. (1985). The Dissociation of Subjective Measures of Mental Workload and Performance. (EPL-84-2/NASA-84-2). Urbana-Champaign, IL: Engineering-Psychology Research Laboratory.
Yeh, Y.-Y., & Wickens, C. D. (1984). The Dissociation of Subjective Measures of Mental Workload and Performance. (NASA CR-2341) Washington, D.C.: National Aeronautics and Space Administration.
Yeh, Y.-Y., Wickens, C. D., & Hart, S. G. (1985). The effect of varying task difficulty on subjective workload. Proceedings of the Human Factors Society, 29.
Zeier, H. (1979). Concurrent physiological activity of driver and passenger when driving with and without automatic transmission in heavy city traffic. Ergonomics, 22, 799-810.
Zeitlin, L. R., & Finkelman, J. M. (1975). Research note: Subsidiary task techniques of digit generation and digit recall as indirect measures of operator loading. Human Factors, 17, 218-220.
Zwaga, H. J. G. (1973). Psychophysiological reactions to mental tasks. Effect of stress. Ergonomics, 16, 61-67.